Introducing vCore-based Azure Cosmos DB for MongoDB’s Latest Features

Khelan Modi

April 5th, 2024

vCore-based Azure Cosmos DB for MongoDB is adding new AI capabilities to its integrated vector database: a generally available HNSW vector index, semantic caching with LangChain, and filtered vector search. Together, these features help developers build faster, more scalable applications.

HNSW Vector Index: Now Generally Available!

Azure Cosmos DB for MongoDB announces the general availability of the HNSW vector index. HNSW (Hierarchical Navigable Small World) is a graph-based index for fast, accurate approximate nearest neighbor search over high-dimensional data, which suits tasks such as image recognition and dynamic recommendation systems. It is designed to make applications quicker and more scalable.

Implementing HNSW in Azure Cosmos DB

Developers can now create an HNSW vector index by specifying vector-hnsw as the index kind during index creation. The index definition also requires the maximum number of connections per layer (m), the size of the dynamic candidate list used during construction (efConstruction), the similarity metric (similarity), and the number of vector dimensions (dimensions). For detailed guidance, refer to the Azure Cosmos DB documentation.

{  
    "createIndexes": "<collection_name>", 
    "indexes": [ 
        { 
            "name": "<index_name>", 
            "key": { 
                "<path_to_property>": "cosmosSearch" 
            }, 
            "cosmosSearchOptions": {  
                "kind": "vector-hnsw",  
                "m": <integer_value>,  
                "efConstruction": <integer_value>,  
                "similarity": "<string_value>",  
                "dimensions": <integer_value>  
            }  
        }  
    ]  
}
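As a rough sketch of how the command above might be issued from Python, the helper below fills in the template and returns a plain dictionary. The helper name build_hnsw_index_command, the collection and index names, and the parameter values (m=16, efConstruction=64, "COS" for cosine similarity, 1536 dimensions) are illustrative assumptions, not recommendations from this post.

```python
from typing import Any, Dict


def build_hnsw_index_command(
    collection_name: str,
    index_name: str,
    vector_path: str,
    m: int,
    ef_construction: int,
    similarity: str,
    dimensions: int,
) -> Dict[str, Any]:
    """Fill in the createIndexes template for a vector-hnsw index."""
    # "createIndexes" is inserted first so it remains the first key,
    # which is how a database command document names its command.
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                "key": {vector_path: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-hnsw",
                    "m": m,
                    "efConstruction": ef_construction,
                    "similarity": similarity,
                    "dimensions": dimensions,
                },
            }
        ],
    }


# Hypothetical names and values, chosen only for illustration.
command = build_hnsw_index_command(
    collection_name="products",
    index_name="vector_hnsw_index",
    vector_path="Embedding",
    m=16,
    ef_construction=64,
    similarity="COS",
    dimensions=1536,
)
```

With a connected PyMongo client, the dictionary could then be sent to the server with client["<database_name>"].command(command).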

Optimizing HNSW Usage: Tips and Warnings

The HNSW index offers clear advantages, but it must be deployed carefully to avoid problems such as memory shortages. HNSW is supported only on M40 cluster tiers and above.

Leveraging Semantic Caching with LangChain

The LangChain integration adds semantic caching backed by Azure Cosmos DB for MongoDB. A semantic cache stores earlier responses alongside the embeddings of the prompts that produced them, so a semantically similar query can be answered from the cache instead of being recomputed, which reduces latency. Check out the documentation for more information.
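To make the idea concrete, here is a minimal, self-contained sketch of what a semantic cache does conceptually: it keeps responses keyed by prompt embeddings and returns a stored response when a new prompt's embedding is similar enough. This is not the LangChain or Azure Cosmos DB API. The class SimpleSemanticCache, the 0.9 threshold, and the toy three-dimensional embeddings are all assumptions made for illustration; a real deployment would delegate the similarity lookup to the vector index.

```python
import math
from typing import List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimpleSemanticCache:
    """In-memory stand-in for a vector-backed semantic cache (hypothetical)."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def update(self, embedding: List[float], response: str) -> None:
        """Remember the response produced for a prompt embedding."""
        self._entries.append((embedding, response))

    def lookup(self, embedding: List[float]) -> Optional[str]:
        """Return the most similar cached response at or above the threshold."""
        best_score = self.threshold
        best_response: Optional[str] = None
        for stored, response in self._entries:
            score = cosine_similarity(embedding, stored)
            if score >= best_score:
                best_score = score
                best_response = response
        return best_response


cache = SimpleSemanticCache(threshold=0.9)
cache.update([1.0, 0.0, 0.0], "Seattle is in Washington State.")

hit = cache.lookup([0.98, 0.05, 0.0])  # close to the cached prompt embedding
miss = cache.lookup([0.0, 1.0, 0.0])   # orthogonal, so nothing is returned
```

A linear scan like this only works for a handful of entries; the point of pairing the cache with a vector index is that the nearest-neighbor lookup stays fast as the cache grows.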

Refined Search Capabilities for Complex Queries using Filtered Vector Search

Azure Cosmos DB for MongoDB has extended its vector search with pre-filtering. The filter expression in the search spec compares an indexed, single-path field and acts as a pre-filter that narrows the scope of the vector search. Developers can therefore exclude irrelevant data from the outset, which focuses the search and improves efficiency. For example, suppose a developer wants a vector search that targets only the users in their collection who live in Seattle. Adding a “filter” expression to the search spec that matches documents whose “Location” field is “Seattle” limits the vector search to those users. Such a search spec might look like this:

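    # Vector search restricted to documents whose Location is Seattle.
    # Assumes collection, query_embedding, and num_results are already defined.
    results = collection.aggregate([
        {
            "$search": {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "Embedding",
                    "k": num_results,
                    # Pre-filter: $in keeps only documents located in Seattle.
                    "filter": {"Location": {"$in": ["Seattle"]}}
                },
                "returnStoredSource": True
            }
        },
        {
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT"
            }
        }
    ])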

With this feature, developers can narrow down searches from the start by comparing an indexed field using comparison, logical, and pattern-matching operators such as $eq, $gt, $lt, $gte, $lte, $ne, $in, $nin, $and, $or, and $regex. This makes Cosmos DB for MongoDB vCore more flexible and makes retrieving relevant data from complex datasets more efficient. Check out the documentation for more details.
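As an illustration of combining several of these operators, the sketch below builds the same kind of aggregation pipeline with a compound filter. The helper build_filtered_search_pipeline, the Age field, and the sample values are hypothetical, and the sketch assumes every filtered field is indexed.

```python
from typing import Any, Dict, List, Sequence


def build_filtered_search_pipeline(
    query_embedding: Sequence[float],
    path: str,
    k: int,
    filter_expr: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Build a cosmosSearch pipeline whose vector search is pre-filtered."""
    return [
        {
            "$search": {
                "cosmosSearch": {
                    "vector": list(query_embedding),
                    "path": path,
                    "k": k,
                    "filter": filter_expr,
                },
                "returnStoredSource": True,
            }
        },
        {
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT",
            }
        },
    ]


# Hypothetical filter: users in Seattle or Redmond who are at least 18.
seattle_area_adults = {
    "$and": [
        {"Location": {"$in": ["Seattle", "Redmond"]}},
        {"Age": {"$gte": 18}},
    ]
}

pipeline = build_filtered_search_pipeline(
    query_embedding=[0.1, 0.2, 0.3],  # toy embedding for illustration
    path="Embedding",
    k=5,
    filter_expr=seattle_area_adults,
)
```

The resulting list could be passed to collection.aggregate(pipeline) on a PyMongo collection, in the same way as the example above.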

Get Started with Azure Cosmos DB for free!

Azure Cosmos DB is a fully managed NoSQL, relational, and vector database for modern app development with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. Learn more about Azure Cosmos DB for MongoDB vCore’s free tier here. To stay in the loop on Azure Cosmos DB updates, follow us on Twitter, YouTube, and LinkedIn.

Get Started now!