Add ngram updates #189

Open
wants to merge 1 commit into
base: main
6 changes: 3 additions & 3 deletions dgraph/concepts/index-tokenize.mdx
Expand Up @@ -27,6 +27,6 @@ property. E.g. if a Book Node has a Title attribute, and you add a "term" index,
each word (term) in the text will be indexed. The word "Tokenizer" derives its
name from tokenizing operations to create this index type.

Similary if the Book has a publicationDateTime you can add a day or year index.
The "tokenizer" here extracts the value to be indexed, which may be the day or
hour of the dateTime, or only the year.
Similarly, if the Book has a publicationDateTime you can add a day or year
index. The "tokenizer" here extracts the value to be indexed, which may be the
day or hour of the dateTime, or only the year.
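
The idea of a dateTime "tokenizer" extracting only part of the value can be sketched as follows. This is a minimal illustration, not Dgraph's implementation: the function names `year_token` and `day_token` are hypothetical, and the shape of the returned tokens is an assumption.

```python
from datetime import datetime
from typing import Tuple


def year_token(value: datetime) -> int:
    """Hypothetical 'year' tokenizer: index only the year of the value."""
    return value.year


def day_token(value: datetime) -> Tuple[int, int, int]:
    """Hypothetical 'day' tokenizer: index the year, month, and day."""
    return (value.year, value.month, value.day)


# A publicationDateTime value; the time of day is dropped by both tokenizers.
published = datetime(2021, 3, 14, 9, 30)
print(year_token(published))
print(day_token(published))
```

Two books published on different days of the same year would share a `year` token but have different `day` tokens, which is why the choice of index affects which lookups are cheap.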
31 changes: 28 additions & 3 deletions dgraph/dql/functions.mdx
Expand Up @@ -80,8 +80,8 @@

Index Required: `term`

Matches strings that have any of the specified terms in any order; case
insensitive.
Matches strings that have any of the specified terms in any order (case
insensitive).
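
The matching rule described in this hunk (any term, any order, case insensitive) can be sketched in a few lines. This is only an illustration of the semantics: splitting on whitespace is an assumed simplification of term tokenization, and `terms` and `any_of_terms` are hypothetical helper names, not Dgraph functions.

```python
def terms(text: str) -> set:
    """Lowercase the text and split it on whitespace (assumed simplification)."""
    return {word.lower() for word in text.split()}


def any_of_terms(value: str, query: str) -> bool:
    """True if the value shares at least one term with the query."""
    return bool(terms(value) & terms(query))


print(any_of_terms("The Quick brown fox", "FOX cat"))
print(any_of_terms("The quick brown fox", "cat dog"))
```

Because both sides are reduced to sets of lowercase terms, neither word order nor letter case affects the result.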

#### Usage at root

Expand Down Expand Up @@ -117,6 +117,31 @@
}
```

## N-gram search

Syntax Examples: `ngram(predicate, "a string of text")`

Schema Types: `string`

Index Required: `ngram`

The `ngram` index tokenizes a string into shingles (contiguous sequences of n
words), with support for stop word removal and stemming. The `ngram` function
matches strings that contain the given sequence of terms.

Check failure on line 128 in dgraph/dql/functions.mdx (GitHub Actions / Trunk Check): vale(error) [new] Did you really mean 'tokenizes'?

#### Usage at root

Check notice on line 132 in dgraph/dql/functions.mdx (GitHub Actions / Trunk Check): markdownlint(MD001) [new] Heading levels should only increment by one level at a time

Check notice on line 132 in dgraph/dql/functions.mdx (GitHub Actions / Trunk Check): markdownlint(MD024) [new] Multiple headings with the same content

Query example: all nodes that have a `name` containing `quick`, `brown`, and
`fox`.

```json
{
me(func: ngram(name@en, "quick brown fox")) {
name@en
}
}
```
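
To make "shingles" and contiguous matching concrete, here is a minimal sketch. It is not Dgraph's implementation: the stop-word list is an assumption, stemming is omitted, and `tokens`, `shingles`, and `ngram_match` are hypothetical names. The diff does not state which values of n the index uses, so the sketch simply uses the length of the query.

```python
from typing import List, Tuple

# Assumed, illustrative stop-word list; the real list is not given in the diff.
STOP_WORDS = {"the", "a", "an"}


def tokens(text: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words (stemming omitted)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def shingles(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_match(value: str, query: str) -> bool:
    """True if the query terms appear in the value as one contiguous sequence."""
    q = tokens(query)
    if not q:
        return False
    return tuple(q) in shingles(tokens(value), len(q))


print(shingles(["quick", "brown", "fox"], 2))
print(ngram_match("The quick brown fox jumps", "quick brown fox"))
print(ngram_match("quick fox brown", "quick brown fox"))
```

Under these assumptions, a value that contains all three words in a different order does not match, which is the difference between sequence matching and term matching.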

## Regular expressions

Syntax Examples: `regexp(predicate, /regular-expression/)` or case insensitive
Expand Down Expand Up @@ -474,7 +499,7 @@
}
```

## uid
## UID

Syntax Examples:

Expand Down
1 change: 1 addition & 0 deletions dgraph/dql/indexes.mdx
Expand Up @@ -43,6 +43,7 @@ The indices available for strings are as follows.
| `le`, `ge`, `lt`, `gt` | `exact` | Allows faster sorting. |
| `allofterms`, `anyofterms` | `term` | Allows searching by a term in a sentence. |
| `alloftext`, `anyoftext` | `fulltext` | Matching with language specific stemming and stopwords. |
| `ngram` | `ngram` | Contiguous sequence matching (shingles) with stop word removal and stemming. |
| `regexp` | `trigram` | Regular expression matching. Can also be used for equality checking. |

<Warning>
Expand Down
1 change: 1 addition & 0 deletions dgraph/graphql/schema/dgraph-schema.mdx
Expand Up @@ -67,6 +67,7 @@ enum DgraphIndex {
term
fulltext
trigram
ngram
regexp
year
month
Expand Down
73 changes: 47 additions & 26 deletions dgraph/graphql/schema/directives/search.mdx
Expand Up @@ -85,15 +85,13 @@

```graphql
queryAuthor(filter: { name: { eq: "Diggy" } } ) {
posts(filter: { title: { anyofterms: "GraphQL" }}) {
title
}
# Review comment (Copilot AI, Aug 21, 2025): The code block is incomplete - it's missing the closing brace and content. The removal of lines 88-89 appears to have left this query example broken.
}
```

Dgraph can build search types with the ability to search between a range. For
example with the above Post type with datePublished field, a query can find
publish dates within a range
example, with the preceding Post type with the `datePublished` field, a query
can find publish dates within a range.

```graphql
query {
Expand All @@ -104,8 +102,8 @@
```

Dgraph can also build GraphQL search ability to find match a value from a list.
For example with the above Author type with the name field, a query can return
the Authors that match a list
For example with the preceding Author type with the name field, a query can
return the Authors that match a list

```graphql
queryAuthor(filter: { name: { in: ["Diggy", "Jarvis"] } } ) {
Expand All @@ -115,13 +113,13 @@

There's different search possible for each type as explained below.

### Int, Float and DateTime
### Int, float and dateTime

| argument | constructed filter |
| -------- | ------------------------------------------------- |
| none | `lt`, `le`, `eq`, `in`, `between`, `ge`, and `gt` |

Search for fields of types `Int`, `Float` and `DateTime` is enabled by adding
Search for fields of types `Int`, `Float` and `dateTime` is enabled by adding
`@search` to the field with no arguments. For example, if a schema contains:

```graphql
Expand Down Expand Up @@ -187,7 +185,7 @@
}
```

### DateTime
### dateTime

| argument | constructed filters |
| --------------------------------- | ------------------------------------------------- |
Expand All @@ -198,14 +196,14 @@
defaults to year, but once you understand your data and query patterns, you
might want to changes that like `@search(by: [day])`.

### Boolean
### Boolean fields

| argument | constructed filter |
| -------- | ------------------ |
| none | `true` and `false` |

Booleans can only be tested for true or false. If `isPublished: Boolean @search`
is in the schema, then the search allows
Boolean fields can only be tested for `true` or `false`. If
`isPublished: Boolean @search` is in the schema, then the search allows

```graphql
filter: { isPublished: true }
Expand All @@ -229,6 +227,7 @@
| `regexp` | `regexp` (regular expressions) |
| `term` | `allofterms` and `anyofterms` |
| `fulltext` | `alloftext` and `anyoftext` |
| `ngram` | `ngram` |

- _Schema rule_: `hash` and `exact` can't be used together.

Expand All @@ -250,7 +249,7 @@
}
```

to find users with names lexicographically after "Diggy".
to find users with names lexicographically after "Diggy."

#### String regular expression search

Expand Down Expand Up @@ -283,12 +282,8 @@
}
```

will match all posts with both "GraphQL and "tutorial" in the title, while
`anyofterms: "GraphQL tutorial"` would match posts with either "GraphQL" or
"tutorial".

`fulltext` search is Google-stye text search with stop words, stemming. etc. So
`alloftext: "run woman"` would match "run" as well as "running", etc. For
example, to find posts that talk about fantastic GraphQL tutorials:

Review comment (Copilot AI, Aug 21, 2025): The removal of the explanation about allofterms vs anyofterms behavior (lines 284-286) leaves the term search section incomplete. The documentation should explain the difference between these two important search operations.

Review comment (Copilot AI, Aug 21, 2025): The removal of the explanation about fulltext search behavior (lines 286-288) removes important context about how fulltext search differs from term search, including stemming and stop word handling.

Suggested change:
example, to find posts that talk about fantastic GraphQL tutorials:
Fulltext search differs from term search in that it supports stemming and stop word removal. This means that queries using fulltext search will match words with similar roots (e.g., "tutorial" and "tutorials") and ignore common stop words (e.g., "the", "and", "is"). For example, to find posts that talk about fantastic GraphQL tutorials:

```graphql
Expand All @@ -297,6 +292,32 @@
}
```

#### String ngram search

The `ngram` index tokenizes a string into contiguous sequences of n words, with
support for stop word removal and stemming. N-gram search matches if the indexed
string contains the given sequence of terms.

Check failure on line 295 in dgraph/graphql/schema/directives/search.mdx (GitHub Actions / Trunk Check): vale(error) [new] Did you really mean 'ngram'?

Check failure on line 297 in dgraph/graphql/schema/directives/search.mdx (GitHub Actions / Trunk Check): vale(error) [new] Did you really mean 'tokenizes'?

If the schema has

```graphql
type Post {
title: String @search(by: [ngram])
...
}
```

then

```graphql
query {
queryPost(filter: { title: { ngram: "quick brown fox" } } ) { ... }
}
```

will match all posts that contain the contiguous sequence "quick brown fox" in
the title.

#### Strings with multiple searches

It is possible to add multiple string indexes to a field. For example to search
Expand All @@ -310,7 +331,7 @@
}
```

### Enums
### enums

| argument | constructed searches |
| -------- | --------------------------------------------------------------------- |
Expand All @@ -319,8 +340,8 @@
| `exact` | `lt`, `le`, `eq`, `in`, `between`, `ge`, and `gt` (lexicographically) |
| `regexp` | `regexp` (regular expressions) |

Enums are serialized in Dgraph as strings. `@search` with no arguments is the
same as `@search(by: [hash])` and provides `eq` and `in` searches. Also
enum fields are serialized in Dgraph as strings. `@search` with no arguments is
the same as `@search(by: [hash])` and provides `eq` and `in` searches. Also
available for enums are `exact` and `regexp`. For hash and exact search on
enums, the literal enum value, without quotes `"..."`, is used, for regexp,
strings are required. For example:
Expand Down Expand Up @@ -387,7 +408,7 @@
}
```

#### near
#### Near

The `near` filter matches all entities where the location given by a field is
within a distance `meters` from a coordinate.
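
The "within a distance" test can be pictured with a great-circle distance check. This is a sketch under stated assumptions: the diff does not say how Dgraph computes distance, so the haversine formula here is a stand-in, and `haversine_m`, `is_near`, and the `(latitude, longitude)` tuple layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_near(point, center, meters: float) -> bool:
    """True if point lies within `meters` of center; both are (lat, lon) tuples."""
    return haversine_m(point[0], point[1], center[0], center[1]) <= meters


san_francisco = (37.7749, -122.4194)
los_angeles = (34.0522, -118.2437)
print(is_near(san_francisco, san_francisco, 1))
print(is_near(los_angeles, san_francisco, 100_000))
```

With these coordinates the two cities are several hundred kilometers apart, so a 100 km radius around one excludes the other.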
Expand All @@ -408,7 +429,7 @@
}
```

#### within
#### Within

The `within` filter matches all entities where the location given by a field is
within a defined `polygon`.
Expand Down Expand Up @@ -441,7 +462,7 @@
}
```

#### contains
#### Contains

The `contains` filter matches all entities where the `Polygon` or `MultiPolygon`
field contains another given `point` or `polygon`.
Expand Down Expand Up @@ -489,7 +510,7 @@
}
```

#### intersects
#### Intersects

The `intersects` filter matches all entities where the `Polygon` or
`MultiPolygon` field intersects another given `polygon` or `multiPolygon`.
Expand Down Expand Up @@ -579,8 +600,8 @@
but you can filter and paginate them.

<Note>
Union queries do not support the `order` argument. The results will be ordered
by the `uid` of each node in ascending order.
Union queries don't support the `order` argument. The results will be ordered
by the UID of each node in ascending order.
</Note>

For example, the following schema will enable to query the `members` union field
Expand Down