Intervals query
Returns documents based on the order and proximity of matching terms.
The intervals query uses matching rules, constructed from a small set of
definitions. These rules are then applied to terms from a specified field.
The definitions produce sequences of minimal intervals that span terms in a body of text. These intervals can be further combined and filtered by parent sources.
Example request
The following intervals search returns documents containing my
favorite food without any gap, followed by hot water or cold porridge in the
my_text field.
This search would match a my_text value of "my favorite food is cold
porridge" but not "when it's cold my favorite food is porridge".
resp = client.search(
query={
"intervals": {
"my_text": {
"all_of": {
"ordered": True,
"intervals": [
{
"match": {
"query": "my favorite food",
"max_gaps": 0,
"ordered": True
}
},
{
"any_of": {
"intervals": [
{
"match": {
"query": "hot water"
}
},
{
"match": {
"query": "cold porridge"
}
}
]
}
}
]
}
}
}
},
)
print(resp)
response = client.search(
body: {
query: {
intervals: {
my_text: {
all_of: {
ordered: true,
intervals: [
{
match: {
query: 'my favorite food',
max_gaps: 0,
ordered: true
}
},
{
any_of: {
intervals: [
{
match: {
query: 'hot water'
}
},
{
match: {
query: 'cold porridge'
}
}
]
}
}
]
}
}
}
}
}
)
puts response
const response = await client.search({
query: {
intervals: {
my_text: {
all_of: {
ordered: true,
intervals: [
{
match: {
query: "my favorite food",
max_gaps: 0,
ordered: true,
},
},
{
any_of: {
intervals: [
{
match: {
query: "hot water",
},
},
{
match: {
query: "cold porridge",
},
},
],
},
},
],
},
},
},
},
});
console.log(response);
POST _search
{
"query": {
"intervals" : {
"my_text" : {
"all_of" : {
"ordered" : true,
"intervals" : [
{
"match" : {
"query" : "my favorite food",
"max_gaps" : 0,
"ordered" : true
}
},
{
"any_of" : {
"intervals" : [
{ "match" : { "query" : "hot water" } },
{ "match" : { "query" : "cold porridge" } }
]
}
}
]
}
}
}
}
}
Top-level parameters for intervals
<field> - (Required, rule object) Field you wish to search. The value of this parameter is a rule object. The rules described below are match, prefix, wildcard, regexp, fuzzy, range, all_of, and any_of.
match rule parameters
The match rule matches analyzed text.
query - (Required, string) Text you wish to find in the provided <field>.
max_gaps - (Optional, integer) Maximum number of positions between the matching terms. Terms further apart than this are not considered matches. Defaults to -1. If unspecified or set to -1, there is no width restriction on the match. If set to 0, the terms must appear next to each other.
ordered - (Optional, Boolean) If true, matching terms must appear in their specified order. Defaults to false.
analyzer - (Optional, string) Analyzer used to analyze terms in the query. Defaults to the top-level <field>'s analyzer.
filter - (Optional, interval filter rule object) An optional interval filter.
use_field - (Optional, string) If specified, match intervals from this field rather than the top-level <field>. Terms are analyzed using the search analyzer from this field. This allows you to search across multiple fields as if they were all the same field; for example, you could index the same text into stemmed and unstemmed fields, and search for stemmed tokens near unstemmed ones.
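The interaction between max_gaps and ordered can be sketched outside Elasticsearch. The following is a toy model only, not the Lucene implementation: it assumes whitespace tokenization with lowercasing instead of a real analyzer, and the function name toy_match is hypothetical.

```python
from itertools import product


def toy_match(text, query, max_gaps=-1, ordered=False):
    """Toy model of the match rule on whitespace-split, lowercased tokens."""
    tokens = text.lower().split()
    terms = query.lower().split()
    if not terms:
        return False
    # Positions at which each query term occurs in the text.
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # each query term needs its own position
        if ordered and list(combo) != sorted(combo):
            continue  # terms must appear in query order
        # Positions inside the span that are not taken by a query term.
        gaps = (max(combo) - min(combo) + 1) - len(combo)
        if max_gaps == -1 or gaps <= max_gaps:
            return True
    return False
```

With max_gaps set to 0 and ordered set to true, this model accepts "my favorite food is cold porridge" for the query "my favorite food", and rejects "my favorite tasty food" because one position separates the terms.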
prefix rule parameters
The prefix rule matches terms that start with a specified set of characters.
This prefix can expand to match at most indices.query.bool.max_clause_count
terms (see the search settings). If the prefix matches more terms,
Elasticsearch returns an error. You can use the
index_prefixes option in the field mapping to avoid this
limit.
prefix - (Required, string) Beginning characters of terms you wish to find in the top-level <field>.
analyzer - (Optional, string) Analyzer used to normalize the prefix. Defaults to the top-level <field>'s analyzer.
use_field - (Optional, string) If specified, match intervals from this field rather than the top-level <field>. The prefix is normalized using the search analyzer from this field, unless a separate analyzer is specified.
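As a rough illustration of the expansion limit, the sketch below expands a prefix against a small list of indexed terms and fails when the expansion exceeds a limit. The function expand_prefix and its arguments are hypothetical; Elasticsearch performs this expansion internally against the index, and the error it returns is not a Python exception.

```python
def expand_prefix(prefix, indexed_terms, max_clause_count):
    """Toy expansion: collect the distinct indexed terms starting with prefix."""
    matches = sorted(t for t in set(indexed_terms) if t.startswith(prefix))
    if len(matches) > max_clause_count:
        # Stand-in for the error Elasticsearch returns when the limit is exceeded.
        raise ValueError(
            f"prefix {prefix!r} expands to {len(matches)} terms, "
            f"limit is {max_clause_count}"
        )
    return matches
```

For example, expanding "por" over the terms porridge, pork, and water yields two terms, which passes with a limit of 2 and fails with a limit of 1.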
wildcard rule parameters
The wildcard rule matches terms using a wildcard pattern. This pattern can
expand to match at most indices.query.bool.max_clause_count
terms (see the search settings). If the pattern matches more terms,
Elasticsearch returns an error.
pattern - (Required, string) Wildcard pattern used to find matching terms. This parameter supports two wildcard operators:
  ?, which matches any single character
  *, which matches zero or more characters, including an empty one
  Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.
analyzer - (Optional, string) Analyzer used to normalize the pattern. Defaults to the top-level <field>'s analyzer.
use_field - (Optional, string) If specified, match intervals from this field rather than the top-level <field>. The pattern is normalized using the search analyzer from this field, unless analyzer is specified separately.
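The two operators behave like the same operators in shell-style globbing, so Python's fnmatch module can approximate them for illustration. This is an analogy only: fnmatch also supports bracket expressions such as [abc], which are not among the operators listed above, and the term list here is made up.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed terms.
terms = ["hot", "hat", "hoot", "ht", "shot"]

# ? stands for exactly one character.
single = [t for t in terms if fnmatchcase(t, "h?t")]

# * stands for zero or more characters, so "ht" also matches.
multi = [t for t in terms if fnmatchcase(t, "h*t")]

print(single)
print(multi)
```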
regexp rule parameters
The regexp rule matches terms using a regular expression pattern.
This pattern can expand to match at most indices.query.bool.max_clause_count
terms (see the search settings).
If the pattern matches more terms, Elasticsearch returns an error.
pattern - (Required, string) Regexp pattern used to find matching terms. For a list of operators supported by the regexp pattern, see Regular expression syntax.
  Avoid using wildcard patterns, such as .* or .*?+. This can increase the iterations needed to find matching terms and slow search performance.
analyzer - (Optional, string) Analyzer used to normalize the pattern. Defaults to the top-level <field>'s analyzer.
use_field - (Optional, string) If specified, match intervals from this field rather than the top-level <field>. The pattern is normalized using the search analyzer from this field, unless analyzer is specified separately.
fuzzy rule parameters
The fuzzy rule matches terms that are similar to the provided term, within an
edit distance defined by Fuzziness. If the fuzzy expansion matches more than
indices.query.bool.max_clause_count
terms (see the search settings), Elasticsearch returns an error.
term - (Required, string) The term to match.
prefix_length - (Optional, integer) Number of beginning characters left unchanged when creating expansions. Defaults to 0.
transpositions - (Optional, Boolean) Indicates whether edits include transpositions of two adjacent characters (ab → ba). Defaults to true.
fuzziness - (Optional, string) Maximum edit distance allowed for matching. See Fuzziness for valid values and more information. Defaults to auto.
analyzer - (Optional, string) Analyzer used to normalize the term. Defaults to the top-level <field>'s analyzer.
use_field - (Optional, string) If specified, match intervals from this field rather than the top-level <field>. The term is normalized using the search analyzer from this field, unless analyzer is specified separately.
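To make the effect of transpositions and prefix_length concrete, here is a minimal sketch of an edit-distance check. It uses the optimal string alignment variant of edit distance, which is an assumption made for illustration and may differ from Lucene's behavior on some inputs. The function names edit_distance and fuzzy_matches are hypothetical.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance; with transpositions, an adjacent swap costs one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def fuzzy_matches(term, candidate, max_edits, prefix_length=0, transpositions=True):
    """True if candidate keeps the first prefix_length characters of term
    and lies within max_edits edits of it."""
    if term[:prefix_length] != candidate[:prefix_length]:
        return False
    return edit_distance(term, candidate, transpositions) <= max_edits
```

In this model, ab and ba are one edit apart when transpositions is true and two edits apart when it is false. Setting prefix_length to 1 rejects morridge as a match for porridge even though the two terms differ by a single edit.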
range rule parameters
The range rule matches terms contained within a provided range.
This range can expand to match at most indices.query.bool.max_clause_count
terms (see the search settings).
If the range matches more terms, Elasticsearch returns an error.
gt - (Optional, string) Greater than: match terms greater than the provided term.
gte - (Optional, string) Greater than or equal to: match terms greater than or equal to the provided term.
lt - (Optional, string) Less than: match terms less than the provided term.
lte - (Optional, string) Less than or equal to: match terms less than or equal to the provided term.
You must provide one of gt or gte, and one of lt or lte.
analyzer - (Optional, string) Analyzer used to normalize the range terms. Defaults to the top-level <field>'s analyzer.
use_field - (Optional, string) If specified, match intervals from this field rather than the top-level <field>.
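The bounds can be read as an ordinary comparison on terms. The sketch below assumes plain Python string ordering and assumes that exactly one lower bound and exactly one upper bound are given; the function in_term_range is hypothetical and ignores analysis.

```python
def in_term_range(term, gt=None, gte=None, lt=None, lte=None):
    """Toy check of whether term falls inside the given bounds."""
    # Assumption: exactly one of gt/gte and exactly one of lt/lte is set.
    if (gt is None) == (gte is None):
        raise ValueError("provide exactly one of gt or gte")
    if (lt is None) == (lte is None):
        raise ValueError("provide exactly one of lt or lte")
    lower_ok = term > gt if gt is not None else term >= gte
    upper_ok = term < lt if lt is not None else term <= lte
    return lower_ok and upper_ok
```

With gte set to cold and lt set to hot, the term cold is inside the range and the term hot is outside it.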
all_of rule parameters
The all_of rule returns matches that span a combination of other rules.
intervals - (Required, array of rule objects) An array of rules to combine. All rules must produce a match in a document for the overall source to match.
max_gaps - (Optional, integer) Maximum number of positions between the matching terms. Intervals produced by the rules further apart than this are not considered matches. Defaults to -1. If unspecified or set to -1, there is no width restriction on the match. If set to 0, the terms must appear next to each other. Internal intervals can have their own max_gaps values. In this case, the internal intervals are first found using their own max_gaps values, and are then combined to check whether the gaps between them satisfy the max_gaps value of the all_of rule. For examples of how max_gaps works, see max_gaps in all_of ordered and unordered rule.
ordered - (Optional, Boolean) If true, intervals produced by the rules must appear in the order in which they are specified. Defaults to false. If ordered is false, intervals can appear in any order, including overlapping with each other.
filter - (Optional, interval filter rule object) Rule used to filter returned intervals.
any_of rule parameters
The any_of rule returns intervals produced by any of its sub-rules.
intervals - (Required, array of rule objects) An array of rules to match.
filter - (Optional, interval filter rule object) Rule used to filter returned intervals.
filter rule parameters
The filter rule returns intervals based on a query. See
Filter example for an example.
after - (Optional, query object) Query used to return intervals that follow an interval from the filter rule.
before - (Optional, query object) Query used to return intervals that occur before an interval from the filter rule.
contained_by - (Optional, query object) Query used to return intervals contained by an interval from the filter rule.
containing - (Optional, query object) Query used to return intervals that contain an interval from the filter rule.
not_contained_by - (Optional, query object) Query used to return intervals that are not contained by an interval from the filter rule.
not_containing - (Optional, query object) Query used to return intervals that do not contain an interval from the filter rule.
not_overlapping - (Optional, query object) Query used to return intervals that do not overlap with an interval from the filter rule.
overlapping - (Optional, query object) Query used to return intervals that overlap with an interval from the filter rule.
script - (Optional, script object) Script used to return matching documents. This script must return a boolean value, true or false. See Script filters for an example.
Notes
Filter example
The following search includes a filter rule. It returns documents that have
the words hot and porridge within 10 positions of each other, without the
word salty in between:
resp = client.search(
query={
"intervals": {
"my_text": {
"match": {
"query": "hot porridge",
"max_gaps": 10,
"filter": {
"not_containing": {
"match": {
"query": "salty"
}
}
}
}
}
}
},
)
print(resp)
response = client.search(
body: {
query: {
intervals: {
my_text: {
match: {
query: 'hot porridge',
max_gaps: 10,
filter: {
not_containing: {
match: {
query: 'salty'
}
}
}
}
}
}
}
}
)
puts response
const response = await client.search({
query: {
intervals: {
my_text: {
match: {
query: "hot porridge",
max_gaps: 10,
filter: {
not_containing: {
match: {
query: "salty",
},
},
},
},
},
},
},
});
console.log(response);
POST _search
{
"query": {
"intervals" : {
"my_text" : {
"match" : {
"query" : "hot porridge",
"max_gaps" : 10,
"filter" : {
"not_containing" : {
"match" : {
"query" : "salty"
}
}
}
}
}
}
}
}
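The effect of not_containing in the search above can be pictured with intervals written as (start, end) token positions, both inclusive. This is a simplified sketch with hand-computed positions; the helper names contains and not_containing are hypothetical and do not correspond to client functions.

```python
def contains(outer, inner):
    """True if inner lies entirely within outer (inclusive positions)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def not_containing(candidates, filter_intervals):
    """Keep candidate intervals that contain none of the filter intervals."""
    return [
        c for c in candidates
        if not any(contains(c, f) for f in filter_intervals)
    ]


# "hot salty porridge": hot=0, salty=1, porridge=2.
# The hot ... porridge span (0, 2) contains salty at (1, 1), so it is dropped.
kept_a = not_containing([(0, 2)], [(1, 1)])

# "hot porridge salty": hot=0, porridge=1, salty=2.
# The span (0, 1) does not contain salty at (2, 2), so it is kept.
kept_b = not_containing([(0, 1)], [(2, 2)])
```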
Script filters
You can use a script to filter intervals based on their start position, end
position, and internal gap count. The following filter script uses the
interval variable with the start, end, and gaps methods:
resp = client.search(
query={
"intervals": {
"my_text": {
"match": {
"query": "hot porridge",
"filter": {
"script": {
"source": "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
}
}
}
}
}
},
)
print(resp)
response = client.search(
body: {
query: {
intervals: {
my_text: {
match: {
query: 'hot porridge',
filter: {
script: {
source: 'interval.start > 10 && interval.end < 20 && interval.gaps == 0'
}
}
}
}
}
}
}
)
puts response
const response = await client.search({
query: {
intervals: {
my_text: {
match: {
query: "hot porridge",
filter: {
script: {
source:
"interval.start > 10 && interval.end < 20 && interval.gaps == 0",
},
},
},
},
},
},
});
console.log(response);
POST _search
{
"query": {
"intervals" : {
"my_text" : {
"match" : {
"query" : "hot porridge",
"filter" : {
"script" : {
"source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
}
}
}
}
}
}
}
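For readers who prefer to reason about the condition outside the script language, the same predicate can be written in Python. The Interval type below is a hypothetical stand-in for the interval variable and exists only for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for the script's interval variable."""
    start: int
    end: int
    gaps: int


def passes_script(interval):
    """Mirror of: interval.start > 10 && interval.end < 20 && interval.gaps == 0"""
    return interval.start > 10 and interval.end < 20 and interval.gaps == 0
```

An interval covering positions 12 to 13 with no gaps passes. One that starts at position 10, or one that has a gap, does not.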
Minimization
The intervals query always minimizes intervals, to ensure that queries can
run in linear time. This can sometimes cause surprising results, particularly
when using max_gaps restrictions or filters. For example, take the
following query, searching for salty contained within the phrase hot
porridge:
resp = client.search(
query={
"intervals": {
"my_text": {
"match": {
"query": "salty",
"filter": {
"contained_by": {
"match": {
"query": "hot porridge"
}
}
}
}
}
}
},
)
print(resp)
response = client.search(
body: {
query: {
intervals: {
my_text: {
match: {
query: 'salty',
filter: {
contained_by: {
match: {
query: 'hot porridge'
}
}
}
}
}
}
}
}
)
puts response
const response = await client.search({
query: {
intervals: {
my_text: {
match: {
query: "salty",
filter: {
contained_by: {
match: {
query: "hot porridge",
},
},
},
},
},
},
},
});
console.log(response);
POST _search
{
"query": {
"intervals" : {
"my_text" : {
"match" : {
"query" : "salty",
"filter" : {
"contained_by" : {
"match" : {
"query" : "hot porridge"
}
}
}
}
}
}
}
}
This query does not match a document containing the phrase "hot porridge is
salty porridge", because the intervals returned by the match rule for hot
porridge only cover the initial two terms in this document, and these do not
overlap the intervals covering salty.
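The behavior described above can be reproduced with a small sketch. It assumes whitespace tokenization and treats a candidate interval as non-minimal when it contains another candidate; the helper minimal_intervals is hypothetical and only approximates what the query does internally.

```python
def minimal_intervals(intervals):
    """Drop every interval that contains another, different candidate."""
    unique = set(intervals)
    return sorted(
        iv for iv in unique
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in unique)
    )


tokens = "hot porridge is salty porridge".split()
hot = [i for i, t in enumerate(tokens) if t == "hot"]            # [0]
porridge = [i for i, t in enumerate(tokens) if t == "porridge"]  # [1, 4]
salty = [(i, i) for i, t in enumerate(tokens) if t == "salty"]   # [(3, 3)]

# Every span that covers one hot and one porridge: (0, 1) and (0, 4).
candidates = [(min(h, p), max(h, p)) for h in hot for p in porridge]

# Only (0, 1) survives minimization, because (0, 4) contains it.
minimal = minimal_intervals(candidates)

# salty at (3, 3) is not contained by (0, 1), so nothing matches.
contained = [
    s for s in salty
    if any(m[0] <= s[0] and s[1] <= m[1] for m in minimal)
]
```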
max_gaps in all_of ordered and unordered rule
The following intervals search returns documents containing my
favorite food without any gap, followed by cold porridge that
can have at most 4 tokens between "cold" and "porridge". When
combined in the outer all_of rule, these two inner intervals
must have at most 1 gap between them.
Because the all_of rule has ordered set to true, the inner
intervals are expected to be in the provided order. Thus,
this search would match a my_text value of "my favorite food is cold
porridge" but not "when it's cold my favorite food is porridge".
resp = client.search(
query={
"intervals": {
"my_text": {
"all_of": {
"ordered": True,
"max_gaps": 1,
"intervals": [
{
"match": {
"query": "my favorite food",
"max_gaps": 0,
"ordered": True
}
},
{
"match": {
"query": "cold porridge",
"max_gaps": 4,
"ordered": True
}
}
]
}
}
}
},
)
print(resp)
const response = await client.search({
query: {
intervals: {
my_text: {
all_of: {
ordered: true,
max_gaps: 1,
intervals: [
{
match: {
query: "my favorite food",
max_gaps: 0,
ordered: true,
},
},
{
match: {
query: "cold porridge",
max_gaps: 4,
ordered: true,
},
},
],
},
},
},
},
});
console.log(response);
POST _search
{
"query": {
"intervals" : {
"my_text" : {
"all_of" : {
"ordered" : true,
"max_gaps": 1,
"intervals" : [
{
"match" : {
"query" : "my favorite food",
"max_gaps" : 0,
"ordered" : true
}
},
{
"match" : {
"query" : "cold porridge",
"max_gaps" : 4,
"ordered" : true
}
}
]
}
}
}
}
}
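The outer max_gaps check for the ordered case can be sketched as follows. This is a simplified model under stated assumptions: inner intervals are given as (start, end) token positions, both inclusive, in the order the rules are listed, and the outer gap count is taken to be the number of positions left uncovered between consecutive inner intervals. The function ordered_all_of_gaps is hypothetical.

```python
def ordered_all_of_gaps(inner_intervals):
    """Count positions between consecutive inner intervals.

    Returns None if the intervals are not strictly in the given order.
    """
    gaps = 0
    for (_, prev_end), (next_start, _) in zip(inner_intervals, inner_intervals[1:]):
        if next_start <= prev_end:
            return None  # next interval starts before the previous one ends
        gaps += next_start - prev_end - 1
    return gaps


# "my favorite food is cold porridge":
# my favorite food = (0, 2), cold porridge = (4, 5); only "is" lies between.
first = ordered_all_of_gaps([(0, 2), (4, 5)])

# "when it's cold my favorite food is porridge":
# my favorite food = (3, 5), cold ... porridge = (2, 7); wrong order.
second = ordered_all_of_gaps([(3, 5), (2, 7)])
```

In the first text the outer gap count is 1, which satisfies max_gaps of 1. In the second text the inner intervals are not in order, so the ordered rule produces no match.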
Below is the same query, but with ordered set to false. This means that
the intervals can appear in any order and can even overlap with each other.
Thus, this search would match a my_text value of "my favorite food is cold
porridge", as well as "when it's cold my favorite food is porridge".
In "when it's cold my favorite food is porridge", the cold ... porridge interval
overlaps with the my favorite food interval.
resp = client.search(
query={
"intervals": {
"my_text": {
"all_of": {
"ordered": False,
"max_gaps": 1,
"intervals": [
{
"match": {
"query": "my favorite food",
"max_gaps": 0,
"ordered": True
}
},
{
"match": {
"query": "cold porridge",
"max_gaps": 4,
"ordered": True
}
}
]
}
}
}
},
)
print(resp)
const response = await client.search({
query: {
intervals: {
my_text: {
all_of: {
ordered: false,
max_gaps: 1,
intervals: [
{
match: {
query: "my favorite food",
max_gaps: 0,
ordered: true,
},
},
{
match: {
query: "cold porridge",
max_gaps: 4,
ordered: true,
},
},
],
},
},
},
},
});
console.log(response);