
An Introduction to Indexes for MongoDB Atlas Search


Imagine reading a long book like “A Song of Ice and Fire,” “The Lord of the Rings,” or “Harry Potter.” Now imagine that there was a specific detail in one of those books that you needed to revisit. You wouldn’t want to search every page of such a long book to find it. Instead, you’d want to use the book’s index to quickly locate what you were looking for. The same concept of indexing the content of a book carries over to MongoDB Atlas Search in the form of search indexes.

Atlas Search makes it easy to build fast, relevant, full-text search on top of your data in the cloud. It’s fully integrated, fully managed, and available with every MongoDB Atlas cluster running MongoDB version 4.2 or higher.

Correctly defining your indexes is important because they are responsible for making sure that you’re receiving relevant results when using Atlas Search. There is no one-size-fits-all solution, and different indexes will bring you different benefits.

In this tutorial, we’re going to get a gentle introduction to creating indexes that will be valuable for various full-text search use cases.

Before we get too invested in this introduction, it’s important to note that Atlas Search uses Apache Lucene. This means that search indexes are not unique to Atlas Search, and if you’re already comfortable with Apache Lucene, your existing knowledge of indexing will transfer. Either way, this tutorial can act as a solid refresher.

Understanding the Data Model for the Documents in the Example

Before we start creating indexes, we should probably define what our data model will be for the example. In an effort to cover various indexing scenarios, the data model will be complex.

Take the following for example:

{
    "_id": "cea29beb0b6f7b9187666cbed2f070b3",
    "name": "Pikachu",
    "pokedex_entry": {
        "red": "When several of these Pokemon gather, their electricity could build and cause lightning storms.",
        "yellow": "It keeps its tail raised to monitor its surroundings. If you yank its tail, it will try to bite you."
    },
    "moves": [
        {
            "name": "Thunder Shock",
            "description": "A move that may cause paralysis."
        },
        {
            "name": "Thunder Wave",
            "description": "An electrical attack that may paralyze the foe."
        }
    ],
    "location": {
        "type": "Point",
        "coordinates": [-127, 37]
    }
}

The above example document is about Pokemon, but Atlas Search can be used on whatever documents are part of your application.

Example documents like the one above allow us to use text search, geo search, and potentially others. For each of these different search scenarios, the index might change.

When we create an index for Atlas Search, it is created at the collection level.

Statically Mapping Fields in a Document or Dynamically Mapping Fields as the Schema Evolves

There are two ways to map fields within a document when creating an index:

  • Dynamic Mappings
  • Static Mappings

If your document schema is still changing or your use case doesn’t allow for it to be rigidly defined, you might choose to map your document fields dynamically. A dynamic mapping automatically maps fields as new data is inserted.

Take the following for example:

{
    "mappings": {
        "dynamic": true
    }
}

The above JSON represents a valid index. When you add it to a collection, you are essentially mapping every field that exists in the documents and any field that might exist in the future.
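One rough way to picture what `"dynamic": true` covers is to collect every string-valued path across the documents in a collection. The sketch below is a simplified, hypothetical model (`stringPaths` and `dynamicallyCoveredPaths` are our own names, not Atlas Search APIs): it only looks at string values, treats arrays of objects with dot notation, and does not reproduce what Lucene actually stores. The `raichu` document is made up for illustration. It shows why a document with a brand-new field is covered without touching the index definition.

```javascript
// Simplified, hypothetical model of a document-level dynamic mapping.
// Assumption: only string values are considered here; real dynamic
// mappings also cover other supported types such as numbers and dates.

// Collect the dot-notation path of every string value in a document.
function stringPaths(value, prefix = "", out = new Set()) {
  if (typeof value === "string") {
    out.add(prefix);
  } else if (Array.isArray(value)) {
    // Array elements share their parent's path, as with dot notation in queries.
    for (const item of value) stringPaths(item, prefix, out);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      stringPaths(child, prefix ? `${prefix}.${key}` : key, out);
    }
  }
  return out;
}

// Under "dynamic": true, every string path in every document is covered,
// including paths that only show up in documents inserted later.
function dynamicallyCoveredPaths(documents) {
  const covered = new Set();
  for (const doc of documents) {
    for (const path of stringPaths(doc)) covered.add(path);
  }
  return [...covered].sort();
}

// The example document from earlier (without _id).
const pikachu = {
  name: "Pikachu",
  pokedex_entry: {
    red: "When several of these Pokemon gather, their electricity could build and cause lightning storms.",
    yellow: "It keeps its tail raised to monitor its surroundings. If you yank its tail, it will try to bite you.",
  },
  moves: [
    { name: "Thunder Shock", description: "A move that may cause paralysis." },
    { name: "Thunder Wave", description: "An electrical attack that may paralyze the foe." },
  ],
  location: { type: "Point", coordinates: [-127, 37] },
};

// Hypothetical later document with a field the first one never had.
const raichu = { name: "Raichu", nickname: "Sparky" };

console.log(dynamicallyCoveredPaths([pikachu]));
console.log(dynamicallyCoveredPaths([pikachu, raichu]));
```

The second call picks up `nickname` even though no index definition changed, which is the convenience (and the cost) of a dynamic mapping.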

We can do a simple search using this index like the following:

db.pokemon.aggregate([
    {
        "$search": {
            "text": {
                "query": "thunder",
                "path": ["moves.name"]
            }
        }
    }
]);

We didn’t explicitly define the fields for this index, but attempting to search for “thunder” within the moves array will give us matching results based on our example data.
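Part of why the lowercase query “thunder” matches “Thunder Shock” is that text is analyzed before it is indexed and searched; the default lucene.standard analyzer, among other things, splits text into words and lowercases them. The sketch below is a deliberately crude, hypothetical approximation of that idea (`roughTokens` and `textMatches` are illustrative names), and splitting on non-alphanumeric characters is far simpler than Lucene’s real tokenizer.

```javascript
// Crude, hypothetical stand-in for an analyzer: lowercase the text and
// split it on anything that isn't a letter or digit. Lucene's standard
// analyzer does considerably more than this.
function roughTokens(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Treat a value as matching if it shares at least one token with the query.
function textMatches(query, value) {
  const valueTokens = new Set(roughTokens(value));
  return roughTokens(query).some((token) => valueTokens.has(token));
}

const moveNames = ["Thunder Shock", "Thunder Wave"];
console.log(moveNames.filter((name) => textMatches("thunder", name)));
console.log(textMatches("thunder", "Pikachu"));
```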

To be clear, dynamic mappings can be applied at the document level or the field level. At the document level, a dynamic mapping automatically indexes all common data types. At both levels, it automatically indexes all new and existing data.

While convenient, dynamically mapping every field of a document comes at a cost. These indexes take up more disk space and may perform worse than an index that maps only the fields you need.

The alternative is to use a static mapping, in which case you specify the fields to map and what type of fields they are. Take the following for example:

{
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": {
                "type": "string"
            }
        }
    }
}

In the above example, the only field within our document that is being indexed is the name field.

The following search query would return results:

db.pokemon.aggregate([
    {
        "$search": {
            "text": {
                "query": "pikachu",
                "path": ["name"]
            }
        }
    }
]);

If we try to search on any other field within our document, we won’t get any results, because those fields are neither statically mapped nor covered by a dynamic mapping.

There is, however, a way to get the best of both worlds if we need it.

Take the following which uses static and dynamic mappings:

{
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": {
                "type": "string"
            },
            "pokedex_entry": {
                "type": "document",
                "dynamic": true
            }
        }
    }
}

In the above example, we are still using a static mapping for the name field. However, we are using a dynamic mapping on the pokedex_entry field. The pokedex_entry field is an object, so every field within that object gets the dynamic mapping treatment. This means all sub-fields are automatically mapped, as well as any new fields that might appear in the future. This is useful when you want to choose which top-level fields to map, but still map every field within a particular object.

Take the following search query as an example:

db.pokemon.aggregate([
    {
        "$search": {
            "text": {
                "query": "pokemon",
                "path": ["name", "pokedex_entry.red"]
            }
        }
    }
]);

The above search will return results if “pokemon” appears in the name field or the red field within the pokedex_entry object.
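The query above reaches into an object with pokedex_entry.red, and the earlier query reached into an array with moves.name. The hypothetical `valuesAtPath` helper below mimics that dot-notation resolution in plain JavaScript, fanning out across array elements; it only illustrates the idea and is not how Atlas Search resolves paths internally.

```javascript
// Hypothetical helper: resolve a dot-notation path against a document,
// fanning out across arrays the way dot notation treats arrays of
// embedded documents. Not how Atlas Search resolves paths internally.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const segment of path.split(".")) {
    const next = [];
    for (const item of current) {
      // Step into each element of an array, or into the value itself.
      for (const element of Array.isArray(item) ? item : [item]) {
        if (element !== null && typeof element === "object" &&
            Object.prototype.hasOwnProperty.call(element, segment)) {
          next.push(element[segment]);
        }
      }
    }
    current = next;
  }
  // Report the elements of a final array value individually.
  return current.flatMap((value) => (Array.isArray(value) ? value : [value]));
}

// A subset of the example document.
const pikachu = {
  name: "Pikachu",
  pokedex_entry: {
    red: "When several of these Pokemon gather, their electricity could build and cause lightning storms.",
    yellow: "It keeps its tail raised to monitor its surroundings. If you yank its tail, it will try to bite you.",
  },
  moves: [
    { name: "Thunder Shock", description: "A move that may cause paralysis." },
    { name: "Thunder Wave", description: "An electrical attack that may paralyze the foe." },
  ],
};

console.log(valuesAtPath(pikachu, "pokedex_entry.red"));
console.log(valuesAtPath(pikachu, "moves.name"));
```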

When using a static mapping, you need to specify a type for the field or have dynamic set to true on the field. If you only specify a type, dynamic defaults to false. If you only specify dynamic as true, then Atlas Search can automatically default certain field types (e.g., string, date, number).
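To compare the dynamic, static, and mixed definitions from this section side by side, here is a simplified, hypothetical model (`isCovered` is our own helper, not an Atlas Search API) that walks an index definition and decides whether a dot-notation path is indexed. It assumes a path is covered if each segment is either listed under fields or sits beneath a level where dynamic is true; it ignores field types and every other rule Atlas Search actually applies.

```javascript
// Simplified, hypothetical model: is a dot-notation path covered by an
// index definition? Only "dynamic" flags and nested "fields" are checked;
// field types and every other Atlas Search rule are ignored.
function isCovered(definition, path) {
  let level = definition.mappings;
  for (const segment of path.split(".")) {
    const field =
      level.fields && Object.prototype.hasOwnProperty.call(level.fields, segment)
        ? level.fields[segment]
        : undefined;
    if (field === undefined) {
      // Not listed explicitly: covered only if this level is dynamic.
      return level.dynamic === true;
    }
    level = field; // statically mapped, keep walking
  }
  return true;
}

// The three index definitions used so far in this tutorial.
const dynamicIndex = { mappings: { dynamic: true } };
const staticIndex = {
  mappings: { dynamic: false, fields: { name: { type: "string" } } },
};
const mixedIndex = {
  mappings: {
    dynamic: false,
    fields: {
      name: { type: "string" },
      pokedex_entry: { type: "document", dynamic: true },
    },
  },
};

for (const path of ["name", "moves.name", "pokedex_entry.red"]) {
  console.log(path, {
    dynamic: isCovered(dynamicIndex, path),
    static: isCovered(staticIndex, path),
    mixed: isCovered(mixedIndex, path),
  });
}
```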

Atlas Search Indexes for Complex Fields within a Document

With the basics of dynamic versus static mappings out of the way, we can now focus on more complicated or specific scenarios.

Let’s first take a look at what our fully mapped index would look like for the document in our example:

{
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": {
                "type": "string"
            },
            "moves": {
                "type": "document",
                "fields": {
                    "name": {
                        "type": "string"
                    },
                    "description": {
                        "type": "string"
                    }
                }
            },
            "pokedex_entry": {
                "type": "document",
                "fields": {
                    "red": {
                        "type": "string"
                    },
                    "yellow": {
                        "type": "string"
                    }
                }
            },
            "location": {
                "type": "geo"
            }
        }
    }
}

In the above example, we are using a static mapping for every field within our documents. An interesting thing to note is the moves array and the pokedex_entry object in the example document. Even though one is an array and the other is an object, both are mapped with the document type. While writing searches isn’t the focus of this tutorial, searching an array and an object would be similar, using dot notation in both cases.
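Writing a fully static mapping by hand gets tedious as documents grow. As a hypothetical convenience (not an Atlas Search feature; `inferStaticMapping` is our own name), the sketch below infers a mapping shaped like the one above from a single sample document: strings become string, GeoJSON Point objects become geo, and objects or arrays of objects become document with nested fields. It assumes one representative document and skips _id, numbers, booleans, and anything else it doesn’t recognize, so its output is a starting point to review rather than a finished index.

```javascript
// Hypothetical helper (not an Atlas Search feature): infer a static
// mapping from one sample document. Only strings, GeoJSON Points, objects,
// and arrays are handled; numbers, booleans, and other types are skipped.
const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

const isGeoJsonPoint = (v) =>
  isPlainObject(v) && v.type === "Point" && Array.isArray(v.coordinates);

function inferField(value) {
  if (typeof value === "string") return { type: "string" };
  if (isGeoJsonPoint(value)) return { type: "geo" };
  if (Array.isArray(value)) {
    // Arrays of objects map like one embedded document; keys from every
    // element are merged so none are missed.
    const objects = value.filter(isPlainObject);
    if (objects.length > 0) return inferField(Object.assign({}, ...objects));
    // Arrays of strings are mapped as strings.
    if (value.some((v) => typeof v === "string")) return { type: "string" };
    return undefined;
  }
  if (isPlainObject(value)) return { type: "document", fields: inferFields(value) };
  return undefined; // unsupported in this sketch
}

function inferFields(obj) {
  const fields = {};
  for (const [key, child] of Object.entries(obj)) {
    const definition = inferField(child);
    if (definition !== undefined) fields[key] = definition;
  }
  return fields;
}

function inferStaticMapping(doc) {
  // Leave _id out, as the hand-written index above does.
  const { _id, ...rest } = doc;
  return { mappings: { dynamic: false, fields: inferFields(rest) } };
}

const pikachu = {
  _id: "cea29beb0b6f7b9187666cbed2f070b3",
  name: "Pikachu",
  pokedex_entry: {
    red: "When several of these Pokemon gather, their electricity could build and cause lightning storms.",
    yellow: "It keeps its tail raised to monitor its surroundings. If you yank its tail, it will try to bite you.",
  },
  moves: [
    { name: "Thunder Shock", description: "A move that may cause paralysis." },
    { name: "Thunder Wave", description: "An electrical attack that may paralyze the foe." },
  ],
  location: { type: "Point", coordinates: [-127, 37] },
};

console.log(JSON.stringify(inferStaticMapping(pikachu), null, 2));
```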

Had any of the fields been nested deeper within the document, the same approach would be applied. For example, we could have something like this:

{
    "mappings": {
        "dynamic": false,
        "fields": {
            "pokedex_entry": {
                "type": "document",
                "fields": {
                    "gameboy": {
                        "type": "document",
                        "fields": {
                            "red": {
                                "type": "string"
                            },
                            "yellow": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        }
    }
}

In the above example, the pokedex_entry field was changed slightly to have another level of objects. This probably isn’t a realistic way to model this dataset, but it gets the point across about mapping more deeply nested fields.
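Deeply nested mappings are easy to get wrong by hand because every level repeats the same document type and fields wrapper. The sketch below is a hypothetical helper (`mappingFromPaths` is our own name, not an Atlas Search API) that builds that nesting from dot-notation paths and produces the same definition as the example above. It assumes every intermediate segment should be a document field with dynamic left unset.

```javascript
// Hypothetical helper (not an Atlas Search API): build a static mapping
// from dot-notation paths. Every intermediate segment becomes a
// "document" field with its own nested "fields".
function mappingFromPaths(leafDefinitions) {
  const root = { dynamic: false, fields: {} };
  for (const [path, definition] of Object.entries(leafDefinitions)) {
    const segments = path.split(".");
    const leaf = segments.pop();
    let level = root;
    // Walk, creating intermediate "document" levels as needed.
    for (const segment of segments) {
      if (!level.fields[segment]) {
        level.fields[segment] = { type: "document", fields: {} };
      }
      level = level.fields[segment];
      if (!level.fields) {
        throw new Error(`"${segment}" is already mapped as a non-document field`);
      }
    }
    // Attach the leaf definition at the last segment.
    level.fields[leaf] = definition;
  }
  return { mappings: root };
}

const definition = mappingFromPaths({
  "pokedex_entry.gameboy.red": { type: "string" },
  "pokedex_entry.gameboy.yellow": { type: "string" },
});
console.log(JSON.stringify(definition, null, 2));
```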

Changing the Options for Specific Mapped Fields

Up until now, each of the indexes has only had its types defined in the mapping, so the default options are being applied to every field. Options are a way to refine the index further based on your data to ultimately get more relevant search results. Let’s play around with some of the options within the mappings of our index.

Most of the fields in our example use the string data type, which offers quite a few options. Let’s see what some of those are.

{
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": {
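                "type": "string",
                "analyzer": "lucene.spanish",
                "searchAnalyzer": "lucene.spanish",
                "ignoreAbove": 3000
            }
        }
    }
}

In the above example, we are specifying that we want to use the lucene.spanish language analyzer on the name field instead of the default lucene.standard analyzer. The analyzer option controls how the field’s value is processed when it is indexed, and searchAnalyzer controls how query text is processed at search time; setting both keeps the two consistent. We’re also saying that the name field should not be indexed if the field value is greater than 3000 characters.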

The 3000-character limit is just an arbitrary number for this example, but adding a limit, depending on your use case, could improve performance or reduce the index size.
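The ignoreAbove rule itself is simple enough to state as code. The sketch below is a hypothetical model of the check described above (a value longer than the limit is left out of the index); `passesIgnoreAbove` is our own name, counting characters as Unicode code points is an assumption on our part, and nothing here describes how Atlas Search implements the check internally.

```javascript
// Hypothetical model of the ignoreAbove option: a string longer than the
// limit is left out of the index. Assumption: characters are counted as
// Unicode code points.
function passesIgnoreAbove(value, ignoreAbove) {
  if (ignoreAbove === undefined) return true; // no limit configured
  return [...value].length <= ignoreAbove;
}

console.log(passesIgnoreAbove("Pikachu", 3000));
console.log(passesIgnoreAbove("a".repeat(3001), 3000));
```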

In a future tutorial, we’re going to explore the finer details of search analyzers and what they can accomplish.

These are just some of the available options for the string data type. Each data type will have its own set of options. If you want to use the default for any particular option, it does not need to be explicitly added to the mapped field.

You can learn more about the data types and their indexing options in the official documentation.

Conclusion

You just received what was hopefully a gentle introduction to creating indexes to be used in Atlas Search. To use Atlas Search, you will need at least one index on your collection, even if it is a default dynamic index. However, if you know your schema and are able to create static mappings, they are usually the better choice for fine-tuning relevancy and performance.

To learn more about Atlas Search indexes and the various data types, options, and analyzers available, check out the official documentation.

To learn how to build more on Atlas Search, check out my other tutorials: Building an Autocomplete Form Element with Atlas Search and JavaScript and Visually Showing Atlas Search Highlights with JavaScript and HTML.

Have a question or feedback about this tutorial? Head to the MongoDB Community Forums and let’s chat!

This content first appeared on MongoDB.

Nic Raboy

Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in C#, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Unity. Nic writes about his development experiences related to making web and mobile development easier to understand.