August 10th, 2023

Introducing Text Indexes in Azure Cosmos DB for MongoDB vCore

Sudhanshu Vishodia
Senior Product Manager

In modern databases, efficient querying of text-based data is pivotal to delivering smooth user experiences and extracting insights from textual content. To address this need, Azure Cosmos DB for MongoDB vCore now offers text indexing. In this post, we’ll look at text indexes in Azure Cosmos DB for MongoDB vCore: why they matter, how to create them, and what they can do.

What Are Text Indexes?

A text index is a specialized data structure within Azure Cosmos DB for MongoDB vCore that improves the performance of text-based queries. Text indexes optimize querying over textual data such as articles, documents, comments, and other content-rich sources. Using techniques such as tokenization, stemming, and stop-word removal, a text index makes text-based searches faster and more efficient.
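To make those three techniques concrete, here is a minimal JavaScript sketch of a toy inverted index. It is illustrative only: the stop-word list, the suffix-stripping `stem` function, and the `buildIndex` helper are assumptions made for this example, not the analyzer that Azure Cosmos DB for MongoDB vCore actually uses.

```javascript
// Toy text-analysis pipeline: tokenization, stop-word removal, crude stemming.
// Everything here is hypothetical and far simpler than a real analyzer.
const STOP_WORDS = new Set(["a", "an", "and", "the", "is", "are", "for", "of", "to", "we"]);

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Naive stemmer: strips the first matching suffix from a short, fixed list.
function stem(token) {
  for (const suffix of ["ing", "ers", "er", "s"]) {
    if (token.length > suffix.length + 2 && token.endsWith(suffix)) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

// Full pipeline: tokenize, drop stop words, then stem what remains.
function analyze(text) {
  return tokenize(text).filter((t) => !STOP_WORDS.has(t)).map(stem);
}

// Inverted index: maps each analyzed term to the set of document ids containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of analyze(doc.text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.id);
    }
  }
  return index;
}

const index = buildIndex([
  { id: 1, text: "Senior Software Engineer" },
  { id: 2, text: "We are seeking engineers for the platform" },
]);

console.log(analyze("We are seeking engineers"));
console.log([...index.get("engine")]);
```

Because “Engineer” and “engineers” both reduce to the same stem in this sketch, a lookup for either form finds both documents without scanning their text.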

Defining a Text Index

Let’s envision a scenario where we are developing a job search platform. The job listings are structured as JSON documents, each containing fields such as job_title, company, location, description, skills_required, employment_type, and posted_date. Let’s walk through the process of creating a text index for this job search platform.

Consider the following example job listing:

{
  "_id": ObjectId("628b45f8e91234cdef567890"),
  "job_title": "Senior Software Engineer",
  "company": "Big Tech Innovations Inc.",
  "location": "San Francisco, CA",
  "description": "We're seeking a skilled Senior Software Engineer...",
  "skills_required": ["Java", "JavaScript", "React", "Node.js"],
  "employment_type": "Full-time",
  "posted_date": "2023-08-08T00:00:00Z"
}

To create a text index on the job_title, company, location, and description fields, you can use the following syntax:

db.job_listings.createIndex({job_title: "text", company: "text", location: "text", description: "text"});

With this index in place, text searches cover all four of these job-related fields. Only one text index is allowed per collection.

Text Index: Options and Weights

Text indexes in Azure Cosmos DB for MongoDB vCore come with a range of options that let you fine-tune their behavior. You can define the default language for text analysis, assign weights to fields to prioritize certain attributes, and configure case sensitivity.

In our job search platform scenario, you could create a text index with options like this:

db.job_listings.createIndex(
  {
    job_title: "text",
    company: "text",
    location: "text",
    description: "text"
  },
  {
    default_language: "english",
    weights: { job_title: 10, description: 5 },
    caseSensitive: false
  }
);

This example demonstrates a text index spanning multiple fields, with explicit weights assigned to the job_title (10) and description (5) fields. Because job_title is weighted twice as heavily as description, matches in job titles carry more significance in search scoring, which improves the relevance of search results.
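As a rough intuition for how weights influence ranking, the JavaScript sketch below scores a document by counting query-term matches in each indexed field and multiplying by that field's weight. This is a simplified stand-in, not the scoring formula the service uses. The `scoreDocument` and `countMatches` helpers, the sample documents, and the default weight of 1 for fields without an explicit weight are all assumptions made for illustration.

```javascript
// Simplified weighted scoring: NOT the real textScore formula.
const WEIGHTS = { job_title: 10, description: 5 };
const INDEXED_FIELDS = ["job_title", "company", "location", "description"];

// Count how many tokens in `text` are equal to one of the query terms.
function countMatches(text, terms) {
  const tokens = (text || "").toLowerCase().match(/[a-z0-9]+/g) || [];
  return tokens.filter((t) => terms.includes(t)).length;
}

// Sum weight * matches over every indexed field.
function scoreDocument(doc, query) {
  const terms = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  let score = 0;
  for (const field of INDEXED_FIELDS) {
    const weight = WEIGHTS[field] ?? 1; // assumed default for unlisted fields
    score += weight * countMatches(doc[field], terms);
  }
  return score;
}

// Hypothetical listings: one matches in the title, the other in the description.
const titleHit = {
  job_title: "Software Engineer",
  company: "Acme",
  location: "Remote",
  description: "Build services",
};
const descriptionHit = {
  job_title: "Team Lead",
  company: "Acme",
  location: "Remote",
  description: "Manage each software engineer",
};

console.log(scoreDocument(titleHit, "software engineer"));       // 20
console.log(scoreDocument(descriptionHit, "software engineer")); // 10
```

Both listings contain the same two query terms, yet the title match scores higher purely because of the field weight.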

To look at the score of documents in the query result, you can use the $meta projection operator along with the textScore field in your query projection.

db.job_listings.find(
  { $text: { $search: "Senior Software Engineer" } },
  { score: { $meta: "textScore" } }
)
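Once each result carries a score field, ranking is a descending sort on that field. The JavaScript sketch below does this client-side over a plain array shaped like the output of the query above; the `rankByScore` helper, the sample documents, and the score values are made up for illustration. In MongoDB the same ordering is usually requested server-side with `.sort({ score: { $meta: "textScore" } })`; whether that is supported here is worth verifying against the service documentation.

```javascript
// Hypothetical results, shaped like documents returned with a projected score.
const results = [
  { job_title: "Software Engineer", score: 1.5 },
  { job_title: "Senior Software Engineer", score: 3.0 },
  { job_title: "Engineering Manager", score: 0.75 },
];

// Sort a copy by descending score so the original array stays untouched.
function rankByScore(docs) {
  return [...docs].sort((a, b) => b.score - a.score);
}

const ranked = rankByScore(results);
console.log(ranked.map((d) => d.job_title));
```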

Executing Text Searches

The $text operator, used within a query, matches a search string against all fields in the text index and retrieves the relevant documents: in our example, the pertinent job listings.

Consider this query as an example:

db.job_listings.find({ $text: { $search: "Senior Software Engineer" } });

This query will return documents from the “job_listings” collection containing the terms “Senior,” “Software,” and “Engineer” in any order.

Conclusion

Text indexes in Azure Cosmos DB for MongoDB vCore optimize text-based searches. Whether you’re developing a job search platform, a content-rich application, or any use case involving textual data, the power of text indexes is at your disposal within Azure Cosmos DB for MongoDB vCore.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed NoSQL and relational database for modern app development with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. To stay in the loop on Azure Cosmos DB updates, follow us on Twitter, YouTube, and LinkedIn.

Author

Sudhanshu is a Program Manager on the Azure Cosmos DB team, specializing in MongoDB offerings.
