We are excited to announce a significant enhancement to Azure Cosmos DB, bringing substantial cost savings and performance improvements to our users. The new binary encoding feature is now available for new containers and will soon be available for existing ones.
What is Binary Encoding?
Binary encoding converts JSON or other text-based data formats into a compact binary representation. Instead of storing data as human-readable text, it stores it as a sequence of bytes, which takes up less space and can be processed more quickly by computers.
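To make the idea concrete, here is a toy sketch in Python. It is not Azure Cosmos DB's actual encoding format, which this post does not describe. It packs the same four integers twice, once as JSON text and once as fixed-width 32-bit binary values using the standard `struct` module, to show why a binary layout can take fewer bytes than text.

```python
import json
import struct

# Toy illustration only: this is NOT the Azure Cosmos DB binary format.
values = [1000000, 2000000, 3000000, 4000000]

# Text representation: every digit, comma, space, and bracket costs a byte.
text_bytes = json.dumps(values).encode("utf-8")

# Binary representation: a 4-byte unsigned count followed by four
# little-endian signed 32-bit integers, 4 bytes each.
binary_bytes = struct.pack("<I", len(values)) + struct.pack(f"<{len(values)}i", *values)

# Decoding the binary form recovers the original values exactly.
(count,) = struct.unpack_from("<I", binary_bytes, 0)
decoded = list(struct.unpack_from(f"<{count}i", binary_bytes, 4))

print(len(text_bytes), len(binary_bytes), decoded == values)  # prints: 36 20 True
```

Each seven-digit number costs seven bytes as text plus separators, but always four bytes in this binary layout.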
Key Benefits
- Storage Savings: With binary encoding, you can expect an average 20% reduction in document size, and up to 70% for large documents with many nested objects and arrays. This means you can store more data in the same amount of storage.
- Enhanced Performance for Large Queries: The new binary encoding feature is especially beneficial for queries with aggregations or those that return large result sets, where performance improvements are most noticeable. This makes it an ideal solution for applications that demand high throughput and low latency.
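As a rough back-of-the-envelope check, the sketch below applies the quoted reduction percentages to a current data size. It assumes the reduction applies uniformly to your stored document data; the helper `estimate_encoded_size_gb` and the 500 GB figure are hypothetical, and other storage such as indexes is not modeled. Real savings depend on the shape of your documents.

```python
def estimate_encoded_size_gb(current_gb: float, reduction: float) -> float:
    """Estimate document storage after binary encoding.

    Hypothetical helper: assumes `reduction` (0.20 for 20%) applies
    uniformly to the current document data size.
    """
    if current_gb < 0:
        raise ValueError("current_gb must be non-negative")
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return current_gb * (1.0 - reduction)


current = 500.0  # hypothetical container size in GB
typical = estimate_encoded_size_gb(current, 0.20)  # average reduction quoted above
best = estimate_encoded_size_gb(current, 0.70)     # large, deeply nested documents
print(f"typical: {typical:.0f} GB, best case: {best:.0f} GB")  # prints: typical: 400 GB, best case: 150 GB
```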
When will it be available?
Binary encoding is automatically enabled for all new Azure Cosmos DB containers, in both new and existing database accounts. Existing containers will be automatically re-encoded over several months, starting in 2025. Customers who monitor their container sizes should expect storage metrics to decrease once a container is re-encoded. No action is required from users, and there will be no database service interruptions. You will receive a follow-up email notifying you when your existing containers are scheduled to be re-encoded.
Performance Improvements
All existing SDKs are supported, and no application changes are required. However, newer SDKs bring incremental performance benefits: upcoming SDK versions will add optimizations that extend these improvements, so watch for our next releases. You can check the impact of binary encoding on some example queries in the table below.
Conclusion
We are thrilled to bring this new feature to Azure Cosmos DB users. Binary encoding is part of our larger commitment to providing the best possible performance and cost efficiency for our customers. Customers can achieve even greater performance and cost efficiency by combining it with Reserved Capacity.
Stay tuned for more updates and enhancements as we continue to innovate and improve Azure Cosmos DB.
FAQ
Question 1: How does this impact Azure Cosmos DB’s document size limit?
Answer: Azure Cosmos DB encodes your documents before checking their size, so a document whose JSON representation is larger than 2 MB may still be ingested if its encoded size is smaller than 2 MB.
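The ordering described in this answer, encode first and then check the size, can be modeled as follows. This is a hypothetical sketch: it assumes 2 MB means 2 × 1024 × 1024 bytes and that the bound is inclusive. Because only the service knows the real encoded size, the encoded size here is an assumed input (the 20% average reduction quoted earlier), not something computed.

```python
import json

MAX_DOCUMENT_BYTES = 2 * 1024 * 1024  # assumed: 2 MB = 2,097,152 bytes


def json_size_bytes(document: dict) -> int:
    """UTF-8 size of the compact JSON text representation of a document."""
    return len(json.dumps(document, separators=(",", ":")).encode("utf-8"))


def passes_size_check(encoded_size_bytes: int) -> bool:
    """Hypothetical model: the limit is applied to the *encoded* size."""
    return encoded_size_bytes <= MAX_DOCUMENT_BYTES


# A numeric-heavy document whose compact JSON text is about 2.4 MB.
document = {"id": "big-doc", "readings": list(range(1_000_000, 1_300_000))}
text_size = json_size_bytes(document)

# Assumed encoded size: 20% smaller, matching the average quoted above.
assumed_encoded_size = int(text_size * 0.8)

print(text_size > MAX_DOCUMENT_BYTES)           # the JSON text alone exceeds the limit
print(passes_size_check(assumed_encoded_size))  # the assumed encoded size fits
```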
Question 2: Is there a benefit for Write operations?
Answer: Yes. The smaller your documents are, the fewer request units (RUs) your write operations consume.
Question 3: In the future, after my data is encoded, will I be able to reduce RU/s?
Answer: Probably. After your data is re-encoded, check your new RU/s and storage (GB) usage; combined with partition merge, you may be able to reduce your provisioned throughput.
Question 4: Is there any change in the format of the data?
Answer: Yes. Trailing zeros are removed, so 1.0 becomes 1, as if it were an integer. Integers beyond the 64-bit integer range are converted to double-precision floating point. If you don't want this conversion, you can store those very large integers as strings and convert them in your queries, for example with a computed property.
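The sketch below models those two effects in Python so you can see which of your values would change. `normalize_number` is a hypothetical helper written for illustration, not part of any Azure Cosmos DB SDK; it assumes the 64-bit range is the signed range [-2^63, 2^63 - 1] and that only the two rules from this answer apply.

```python
import json

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def normalize_number(value):
    """Hypothetical model of the number handling described above.

    Not an Azure Cosmos DB API: it only mimics the two documented effects.
    """
    if isinstance(value, bool):
        return value  # bool is a subclass of int in Python; leave it alone
    if isinstance(value, float):
        # Integral floats within the int64 range lose their trailing ".0".
        if value.is_integer() and INT64_MIN <= value <= INT64_MAX:
            return int(value)
        return value
    if isinstance(value, int) and not INT64_MIN <= value <= INT64_MAX:
        return float(value)  # outside the int64 range: becomes a double
    return value


print(json.dumps(1.0), json.dumps(normalize_number(1.0)))  # prints: 1.0 1

big = 2**64 + 1  # does not fit in a signed 64-bit integer
stored = normalize_number(big)
print(int(stored) == big)  # prints: False (the double rounds to 2**64)

# Workaround from the answer above: keep huge integers as strings.
as_text = str(big)
print(int(as_text) == big)  # prints: True
```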
Question 5: Which SDKs provide the best performance optimization?
Answer: Upcoming releases of the .NET and Java SDKs will provide the best performance. The latest .NET SDK version already implements some of the improvements. Keep track of upcoming releases to see when the binary encoding optimizations are implemented. SDKs for other languages will implement optimizations in the future; please check their roadmaps.
About Azure Cosmos DB
Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.
Is there a tool that could be used to estimate the storage savings, for example from an existing JSON document?
Hello! Not right now. However, you can create a new collection, ingest a document, and check the compression. The query benefits also need to be measured with a test. Thanks!
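For the test suggested in this reply, a small helper can record the JSON baseline of your sample documents before ingestion and compute the savings once you read the container's storage from its metrics. This is a hypothetical sketch: the helper names are illustrative, the stored size is a number you read yourself from the portal or metrics (nothing is fetched here), and compact UTF-8 JSON is assumed to be a fair baseline. Storage metrics may include more than the raw document bodies, so treat the result as approximate.

```python
import json


def json_baseline_bytes(documents: list[dict]) -> int:
    """Total UTF-8 size of the documents serialized as compact JSON."""
    return sum(
        len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        for doc in documents
    )


def savings_percent(baseline_bytes: int, stored_bytes: int) -> float:
    """Percentage reduction of the stored size relative to the JSON baseline."""
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    return 100.0 * (baseline_bytes - stored_bytes) / baseline_bytes


sample = [{"id": str(i), "values": [i, i * 2, i * 3]} for i in range(100)]
baseline = json_baseline_bytes(sample)

# Hypothetical reading taken from the container's storage metrics after ingestion.
stored_reading = int(baseline * 0.8)
print(f"baseline: {baseline} bytes, saving: {savings_percent(baseline, stored_reading):.1f}%")
```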