October 20th, 2025

Announcing latest Azure Cosmos DB Python SDK: Powering the Future of AI with OpenAI

Theo van Kraay
Principal Program Manager

We’re thrilled to announce the stable release of Azure Cosmos DB Python SDK version 4.14.0! This release brings together months of innovation and collaboration, featuring ground-breaking capabilities that have been battle-tested in production environments. Many of these features were developed in close partnership with OpenAI, who rely heavily on Cosmos DB to store chat data for ChatGPT at massive scale.

What Makes This Release Special

After extensive beta testing, we’re proud to deliver a stable release that combines performance, intelligence, and developer productivity. The features in this release have been proven in real-world scenarios, including powering some of the most demanding AI workloads in the world.

🚀 Major New Features

Semantic Reranking – AI powered document intelligence (Preview)

One of the most exciting additions is our new Semantic Reranking API, currently in private preview, which brings AI-powered document reranking directly to your Cosmos DB containers. This feature uses Azure’s inference services to rank documents by their semantic relevance to a query. To be onboarded to the Semantic Reranking private preview, sign up here. For more information, contact us at CosmosDBSemanticReranker@Microsoft.com. Check out our demo sample here to test drive this and other semantic search features in Python for Azure Cosmos DB.

from azure.cosmos import CosmosClient

# Initialize your client
client = CosmosClient(endpoint, key)
container = client.get_database_client("MyDatabase").get_container_client("MyContainer")

# Perform semantic reranking
results = container.semantic_rerank(
    context="What is the capital of France?",
    documents=[
        "Berlin is the capital of Germany.",
        "Paris is the capital of France.", 
        "Madrid is the capital of Spain."
    ],
    options={
        "return_documents": True,
        "top_k": 10,
        "batch_size": 32,
        "sort": True
    }
)

# Results are intelligently ranked by relevance
print(results)
# Output:
# {
#   "Scores": [
#     {
#       "index": 1,
#       "document": "Paris is the capital of France.",
#       "score": 0.9921875
#     },
#     ...
#   ]
# }

This feature enables you to build more intelligent applications that can understand context and meaning, not just keyword matching. Perfect for RAG (Retrieval-Augmented Generation) patterns in AI applications.
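
As a rough illustration of the RAG pattern mentioned above, the sketch below takes a result in the shape printed in the example (a "Scores" list with index, document, and score) and keeps only the best-scoring documents for a prompt. The helper name top_documents, the min_score cutoff, and the two lower scores in the sample data are made up for illustration; only the first score comes from the example output.

```python
from typing import Any, Dict, List


def top_documents(rerank_result: Dict[str, Any], limit: int, min_score: float = 0.0) -> List[str]:
    """Return up to `limit` document texts, highest score first."""
    scores = rerank_result.get("Scores", [])
    ranked = sorted(scores, key=lambda entry: entry["score"], reverse=True)
    return [entry["document"] for entry in ranked if entry["score"] >= min_score][:limit]


# Hypothetical result in the shape printed above (the two lower scores are invented).
sample_result = {
    "Scores": [
        {"index": 1, "document": "Paris is the capital of France.", "score": 0.9921875},
        {"index": 0, "document": "Berlin is the capital of Germany.", "score": 0.0412},
        {"index": 2, "document": "Madrid is the capital of Spain.", "score": 0.0375},
    ]
}

# Build the context section of a prompt from the two most relevant documents.
context_block = "\n".join(top_documents(sample_result, limit=2))
prompt = (
    "Answer using only this context:\n"
    f"{context_block}\n\n"
    "Question: What is the capital of France?"
)
```

Filtering on a score threshold as well as a count keeps weakly related documents out of the prompt when only one or two candidates are actually relevant.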

Read Many Items – Optimized Batch Retrieval

The new read_items API revolutionizes how you retrieve multiple documents, offering significant performance improvements and cost savings over individual point reads.

# Define the items you want to retrieve
item_list = [
    ("item1", "partition1"),
    ("item2", "partition1"), 
    ("item3", "partition2")
]

# Retrieve all items in a single optimized request
items = list(container.read_items(
    item_list=item_list,
    max_degree_of_parallelism=4,
    max_items_per_batch=100
))

# The SDK intelligently groups items by partition and uses
# optimized backend queries (often IN clauses) to minimize
# network round trips and RU consumption

Performance Benefits:

  • Reduces network round trips by up to 90%
  • Lower RU consumption compared to individual reads
  • Intelligent query optimization based on partition distribution
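
To make the grouping idea more concrete, here is a simplified sketch of how a list of (id, partition key) pairs could be split into per-partition batches before being sent to the backend. This is not the SDK’s implementation: plan_batches is a hypothetical helper, and it groups by partition key value, whereas the real client may group by physical partition.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

ItemRef = Tuple[str, str]  # (item id, partition key value)


def plan_batches(item_list: List[ItemRef], max_items_per_batch: int) -> Iterator[Tuple[str, List[str]]]:
    """Yield (partition_key, ids) batches, each holding at most max_items_per_batch ids."""
    if max_items_per_batch < 1:
        raise ValueError("max_items_per_batch must be at least 1")
    ids_by_partition: Dict[str, List[str]] = defaultdict(list)
    for item_id, partition_key in item_list:
        ids_by_partition[partition_key].append(item_id)
    for partition_key, ids in ids_by_partition.items():
        # Split each partition's ids into fixed-size chunks.
        for start in range(0, len(ids), max_items_per_batch):
            yield partition_key, ids[start:start + max_items_per_batch]


batches = list(plan_batches(
    [("item1", "partition1"), ("item2", "partition1"), ("item3", "partition2")],
    max_items_per_batch=100,
))
```

With the three items from the example, this plan produces two batches instead of three separate point reads, which is where the savings in round trips come from.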

Automatic Write Retries – Enhanced Resilience

Say goodbye to manual retry logic for write operations! The SDK now includes built-in retry capabilities for write operations that encounter transient failures.

# Enable retries at the client level
client = CosmosClient(
    endpoint,
    key,
    retry_write=1  # Number of automatic retries for eligible write failures
)

# Or enable per-request
container.create_item(
    body=my_document,
    retry_write=1  # Automatic retry on timeouts/server errors
)

What Gets Retried:

  • Timeout errors (408)
  • Server errors (5xx status codes)
  • Transient connectivity issues

Smart Retry Logic:

  • Single-region accounts: One additional attempt to the same region
  • Multi-region accounts: Cross-regional failover capability
  • Patch operations require explicit opt-in due to potential non-idempotency
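
For readers curious what this replaces, the sketch below shows the kind of manual retry loop you would otherwise write yourself. It is an illustration under assumptions, not the SDK’s internal logic: TransientWriteError, is_retryable, and write_with_retries are hypothetical names, and the retryable set simply mirrors the list above (408 and 5xx).

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientWriteError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"write failed with status {status_code}")
        self.status_code = status_code


def is_retryable(status_code: int) -> bool:
    # Mirrors the list above: timeouts (408) and server errors (5xx).
    return status_code == 408 or 500 <= status_code <= 599


def write_with_retries(operation: Callable[[], T], retry_write: int) -> T:
    """Run `operation`, retrying up to `retry_write` extra times on retryable errors."""
    attempts_left = retry_write
    while True:
        try:
            return operation()
        except TransientWriteError as error:
            if attempts_left <= 0 or not is_retryable(error.status_code):
                raise
            attempts_left -= 1


# Usage: a write that times out once and then succeeds.
calls: List[int] = []


def flaky_write() -> str:
    calls.append(1)
    if len(calls) == 1:
        raise TransientWriteError(408)
    return "created"


result = write_with_retries(flaky_write, retry_write=1)
```

Note that retry_write counts additional attempts, so a value of 1 allows two attempts in total.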

Enhanced Developer Experience

Client-Level Configuration Options

Custom User Agent: Identify your applications in telemetry:

# Set custom user agent suffix for better tracking
client = CosmosClient(
    endpoint, 
    key,
    user_agent_suffix="MyApplication/1.0"
)

Throughput Bucket Headers: Assign requests to a throughput bucket so that a workload’s share of the container’s throughput can be capped (see here for more information on throughput buckets):

# Assign requests to throughput bucket 2
client = CosmosClient(
    endpoint, 
    key,
    throughput_bucket=2  # Set at client level
)

# Or set per request
container.create_item(
    body=document,
    throughput_bucket=2
)

Excluded Locations: Fine-tune regional preferences:

# Exclude specific regions at client level
client = CosmosClient(
    endpoint, 
    key,
    excluded_locations=["West US", "East Asia"]
)

# Or exclude regions for specific requests
container.read_item(
    item="item-id",
    partition_key="partition-key", 
    excluded_locations=["Central US"]
)

Return Properties with Container Operations

Streamline your workflows with the new return_properties parameter:

# Get both the container proxy and properties in one call
container, properties = database.create_container(
    id="MyContainer",
    partition_key=PartitionKey(path="/id"),
    return_properties=True
)

# Now you have immediate access to container metadata
print(f"Container RID: {properties['_rid']}")
print(f"Index Policy: {properties['indexingPolicy']}")

Feed Range Support in Queries

Scope queries to individual feed ranges so that they can be processed in parallel:

# Get feed ranges for parallel processing
feed_ranges = list(container.read_feed_ranges())

# Query specific feed ranges for optimal parallelism
for feed_range in feed_ranges:
    items = container.query_items(
        query="SELECT * FROM c WHERE c.category = @category",
        parameters=[{"name": "@category", "value": "electronics"}],
        feed_range=feed_range
    )
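
The loop above runs the feed ranges one after another. A minimal sketch of running them concurrently with a thread pool follows; query_feed_range and query_all_feed_ranges are hypothetical helpers, and the sketch assumes the synchronous container client can safely be shared across threads in your setup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def query_feed_range(container: Any, feed_range: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the category query against one feed range and materialize the results."""
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.category = @category",
        parameters=[{"name": "@category", "value": "electronics"}],
        feed_range=feed_range,
    ))


def query_all_feed_ranges(
    container: Any,
    feed_ranges: List[Dict[str, Any]],
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Query every feed range on a thread pool and flatten results in feed-range order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_range = pool.map(lambda fr: query_feed_range(container, fr), feed_ranges)
        return [item for items in per_range for item in items]
```

Because each feed range covers a disjoint slice of the container, the per-range results can be concatenated without de-duplication.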

Enhanced Change Feed: More flexible change feed processing:

from datetime import datetime, timedelta, timezone

# New change feed mode support for fine-grained control
change_feed_iter = container.query_items_change_feed(
    feed_range=feed_range,
    mode="LatestVersion",  # Explicit change feed mode
    start_time=datetime.now(timezone.utc) - timedelta(hours=1)
)

Vector Embedding Policy Management

Enhanced support for AI workloads with vector embedding policy updates:

# Update indexing policy for containers with vector embeddings
indexing_policy = {
    "indexingMode": "consistent",
    "vectorIndexes": [
        {
            "path": "/vector",
            "type": "quantizedFlat"
        }
    ]
}

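# Now you can replace indexing policies even when vector embeddings are present
# (replace_container is called on the database; partition_key must match the existing container)
database.replace_container(
    container=container,
    partition_key=PartitionKey(path="/id"),
    indexing_policy=indexing_policy
)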

Advanced Query Capabilities

Weighted RRF for Hybrid Search: Enhance your search relevance with Reciprocal Rank Fusion:

# Use weighted RRF in hybrid search queries
query = """
SELECT c.id, c.title, c.content 
FROM c 
WHERE CONTAINS(c.title, "machine learning") 
ORDER BY RANK RRF(VectorDistance(c.embedding, @vector),
                  FullTextScore(c.content, "artificial intelligence"),
                  [0.7, 0.3])
"""

items = container.query_items(query=query, parameters=[
    {"name": "@vector", "value": search_vector}
])
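
To build some intuition for what the weights do, here is a small standalone sketch of weighted Reciprocal Rank Fusion over two ranked lists. It uses the commonly cited form weight / (k + rank) with k = 60; the constant and the exact formula used by the Cosmos DB query engine are assumptions here, and the document ids and rankings are hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_rrf(rankings: Sequence[List[str]], weights: Sequence[float], k: int = 60) -> Dict[str, float]:
    """Fuse ranked id lists: each list adds weight / (k + rank) to a document's score."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    fused: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (k + rank)
    return fused


vector_ranking = ["doc-a", "doc-b", "doc-c"]     # hypothetical order by vector distance
full_text_ranking = ["doc-b", "doc-c", "doc-a"]  # hypothetical order by full-text score

fused = weighted_rrf([vector_ranking, full_text_ranking], weights=[0.7, 0.3])
best_first = sorted(fused, key=fused.get, reverse=True)
```

With weights of 0.7 and 0.3, doc-a (first in the vector ranking) comes out on top; with equal weights, doc-b would win instead, which shows how the weights let one signal dominate the other.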

Computed Properties (Now GA)

Computed Properties have graduated from preview to general availability:

# Define computed properties for efficient querying
computed_properties = [
    {
        "name": "lowerCaseName", 
        "query": "SELECT VALUE LOWER(c.name) FROM c"
    }
]

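# Replace container with computed properties
# (replace_container is called on the database; partition_key must match the existing container)
database.replace_container(
    container=container,
    partition_key=PartitionKey(path="/id"),
    computed_properties=computed_properties
)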

# Query using computed properties for better performance
items = container.query_items(
    query="SELECT * FROM c WHERE c.lowerCaseName = 'john doe'"
)

Reliability and Performance Improvements

Advanced Session Management

The SDK now includes sophisticated session token management:

  • Automatically optimizes session tokens
  • Sends only relevant partition-local tokens for reads
  • Eliminates unnecessary session tokens for single-region writes
  • Improves performance and reduces request size

Circuit Breaker Support

Enable partition-level circuit breakers for enhanced fault tolerance:

import os

# Enable circuit breaker via environment variable
os.environ['AZURE_COSMOS_ENABLE_CIRCUIT_BREAKER'] = 'true'

# The SDK will automatically isolate failing partitions
# while keeping healthy partitions available
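
If the circuit breaker pattern is new to you, the sketch below shows the general idea at partition granularity: after a number of consecutive failures a partition is marked unavailable for a cooldown period, while other partitions keep serving requests. This is a conceptual illustration only. The class name, thresholds, and cooldown behavior are assumptions and do not describe the SDK’s internals.

```python
import time
from typing import Callable, Dict


class PartitionCircuitBreaker:
    """Tracks consecutive failures per partition and blocks unhealthy ones for a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures: Dict[str, int] = {}
        self._opened_at: Dict[str, float] = {}

    def allow_request(self, partition_id: str) -> bool:
        opened_at = self._opened_at.get(partition_id)
        if opened_at is None:
            return True
        if self._clock() - opened_at >= self._cooldown_seconds:
            # Cooldown elapsed: close the circuit and let traffic probe the partition again.
            del self._opened_at[partition_id]
            self._failures[partition_id] = 0
            return True
        return False

    def record_success(self, partition_id: str) -> None:
        self._failures[partition_id] = 0

    def record_failure(self, partition_id: str) -> None:
        count = self._failures.get(partition_id, 0) + 1
        self._failures[partition_id] = count
        if count >= self._failure_threshold:
            self._opened_at[partition_id] = self._clock()
```

A caller would check allow_request before routing to a partition and report the outcome afterwards; in a multi-region account, a blocked partition’s requests could be sent to another region instead.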

Enhanced Error Handling

More resilient retry logic with cross-regional capabilities.

Monitoring and Diagnostics

Enhanced Logging and Diagnostics

Automatic failover improvements:

  • Better handling of bounded staleness consistency
  • Cross-region retries when no preferred locations are set
  • Improved database account call resilience

import logging
from azure.cosmos import CosmosHttpLoggingPolicy

# Set up enhanced logging
logging.basicConfig(level=logging.INFO)
client = CosmosClient(
    endpoint,
    key,
    logging_policy=CosmosHttpLoggingPolicy(logger=logging.getLogger())
)

The OpenAI Connection

Many of these features were developed in collaboration with OpenAI, who use Cosmos DB extensively for ChatGPT’s data storage needs. This partnership ensures our SDK can handle:

  • Massive Scale: Billions of operations per day
  • Low Latency: Sub-10ms response times for AI workloads
  • High Availability: 99.999% uptime requirements
  • Global Distribution: Seamless worldwide data replication

When you use the Python SDK for Azure Cosmos DB, you’re leveraging the same technology that powers some of the world’s most advanced AI applications.

Real-World Impact

Performance Benchmarks

Based on testing with synthetic workloads:

  • Read Many Items: Up to 85% reduction in latency for batch retrieval scenarios
  • Write Retries: 99.5% reduction in transient failure impact
  • Session Optimization: 60% reduction in session token overhead
  • Circuit Breaker: 90% faster recovery from partition-level failures

Cost Optimization

  • Reduced RU Consumption: Batch operations can reduce costs by up to 40%
  • Fewer Network Calls: Significant bandwidth savings in high-throughput scenarios
  • Optimized Retries: Intelligent retry logic prevents unnecessary RU charges

Breaking Changes (Important!)

If you have been using beta versions of the Python SDK released since the last stable version (4.9.0), there is one breaking change:

Changed retry_write Parameter Type

# Before (4.13.x and earlier)
retry_write = True  # boolean

# After (4.14.0)
retry_write = 3  # integer (number of retries)

This change aligns with other retry configuration options and provides more granular control.

Migration Guide

Upgrading from any beta higher than 4.9.0 to 4.14.0

  1. Update your dependencies:

    pip install azure-cosmos==4.14.0
    
  2. Update retry_write usage (if applicable):

    # Old way
    client = CosmosClient(endpoint, key, retry_write=True)
    
    # New way  
    client = CosmosClient(endpoint, key, retry_write=3)
    
  3. Leverage new features (optional but recommended):

    • Take advantage of read_items for batch operations

    • Enable automatic write retries for resilience

    • Use return_properties to reduce API calls
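
If your configuration still stores retry_write as a boolean (step 2 above), a small shim can convert it during migration. The sketch below is one possible approach: to_retry_count is a hypothetical helper, and mapping True to a single retry is an assumption you should adjust to your own needs.

```python
from typing import Union


def to_retry_count(retry_write: Union[bool, int], retries_when_true: int = 1) -> int:
    """Convert a legacy boolean retry_write setting to the integer form used by 4.14.0."""
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(retry_write, bool):
        return retries_when_true if retry_write else 0
    if retry_write < 0:
        raise ValueError("retry_write must not be negative")
    return retry_write


legacy_setting = True  # value read from an older configuration
migrated_setting = to_retry_count(legacy_setting)
```

The migrated value can then be passed as retry_write when constructing the client or on individual requests.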

What’s Next?

This release establishes the foundation for even more exciting AI-focused features coming in future versions:

  • Enhanced vector search capabilities
  • Advanced semantic search integration
  • Expanded AI inference service integrations
  • Performance optimizations for RAG patterns

Get Involved

Have feedback or questions? We’d love to hear from you!


Ready to upgrade? Install Azure Cosmos DB Python SDK v4.14.0 today and experience the power of AI-enhanced database operations!

pip install --upgrade azure-cosmos==4.14.0

The future of AI-powered applications starts with the right data foundation. With the latest Cosmos DB Python SDK, you have the tools to build intelligent, scalable, and resilient applications that can handle anything the world throws at them.

Author

Theo van Kraay
Principal Program Manager

Principal Program Manager on the Azure Cosmos DB engineering team. Currently focused on AI, programmability, and developer experience for Azure Cosmos DB.
