I’m happy to announce that SQL Server 2025 CTP 2.1 is now available, and it brings significant improvements to DiskANN support. While DiskANN remains in preview, each release continues to remove limitations and boost performance. In this release, we’ve made a particularly notable leap in vector index creation speed, with more enhancements on the horizon.
What’s New in DiskANN
DiskANN is Microsoft’s algorithm for large-scale vector search and recommendation systems. It’s designed to scale to web-sized datasets while maintaining high recall and performance. With SQL Server 2025, DiskANN is fully integrated into the engine, allowing developers to use familiar T-SQL syntax to build intelligent, AI-powered applications.
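The published DiskANN algorithm builds a proximity graph over the vectors (called Vamana) and answers a query with a greedy best-first search that starts at an entry node and keeps hopping to neighbors closer to the query. The sketch below is a toy, in-memory Python illustration of that search step only, with a hypothetical hand-built graph. It is not how SQL Server implements DiskANN, which also has to build the graph and keep it efficient on disk.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(graph, vectors, entry, query, k, beam_width):
    """Greedy best-first search over a proximity graph.

    graph: dict mapping node id -> list of out-neighbor ids
    vectors: dict mapping node id -> vector
    entry: id of the node where every search starts
    Returns up to k node ids, nearest first.
    """
    expanded = set()
    # Candidate list of (distance, node), kept sorted and capped at beam_width.
    candidates = [(l2(vectors[entry], query), entry)]
    while True:
        frontier = [c for c in candidates if c[1] not in expanded]
        if not frontier:
            break  # every candidate in the beam has been expanded
        _, node = min(frontier)  # closest candidate not yet expanded
        expanded.add(node)
        known = {n for _, n in candidates}
        for nb in graph[node]:
            if nb not in known:
                candidates.append((l2(vectors[nb], query), nb))
        candidates.sort()
        candidates = candidates[:beam_width]
    return [n for _, n in candidates[:k]]


# Hypothetical toy data: six 1-D points and a hand-built graph.
vectors = {i: [float(i)] for i in range(6)}
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4]}
print(greedy_search(graph, vectors, entry=0, query=[4.2], k=2, beam_width=3))
# → [4, 5]
```

The search touches only a few nodes instead of all of them, which is what lets graph-based ANN scale; `beam_width` trades extra work for a better chance of finding the true nearest neighbors.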
Key improvements for DiskANN in this release include:
- Much faster DiskANN vector index creation
- No more schema modification (Sch-M) lock during index creation
- New sys.vector_indexes catalog view
- Support for the updated TDS 8.0 protocol, so vector data can be sent and received in binary format (drivers that take advantage of this will follow soon)
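To give a rough sense of why binary transfer matters for vectors, the Python sketch below compares the size of a vector serialized as a JSON array against the same values packed as 4-byte float32 numbers. It assumes float32 elements and models only the raw element payload; the actual TDS vector format adds its own header and is not reproduced here. The 1536-dimension vector is a hypothetical example.

```python
import json
import struct


def encode_text(vec):
    """Encode a vector as a JSON array, the way it travels as text."""
    return json.dumps(vec).encode("utf-8")


def encode_binary(vec):
    """Pack a vector as little-endian float32 values (4 bytes each).

    Assumption: raw element payload only; the real TDS vector format
    adds its own header, which is not modeled here.
    """
    return struct.pack(f"<{len(vec)}f", *vec)


def decode_binary(data):
    """Unpack bytes produced by encode_binary back into a list of floats."""
    return list(struct.unpack(f"<{len(data) // 4}f", data))


vec = [0.123456789] * 1536  # hypothetical 1536-dimension embedding
# Binary is 1536 * 4 = 6144 bytes; the JSON text is over three times larger here.
print(len(encode_text(vec)), len(encode_binary(vec)))
```

Besides being smaller, the binary form avoids formatting and parsing floating-point text on both ends of the connection.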
To showcase DiskANN end-to-end, I created a sample based on a dataset originally built by our friends on the Postgres team. Thanks to the developer experience improvements in SQL Server 2025, porting the sample took me less than 30 minutes. Along the way, I also got to highlight some of the new features:
- External Models
- Azure AI Integration
- Invoking REST Endpoints
- Native JSON Data Type
One of my favorite tricks is using sp_invoke_external_rest_endpoint to download JSON Lines files directly from GitHub. It’s a small thing, but it really shows how SQL Server is evolving into a modern, developer-friendly platform: give me a REST endpoint and I’ll move the world!
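The sample performs this download in T-SQL with sp_invoke_external_rest_endpoint. For readers new to JSON Lines (one JSON document per line), here is a minimal Python sketch of the same fetch-and-parse idea using only the standard library. It is not the sample's code; the function names and the URL are hypothetical placeholders.

```python
import json
import urllib.request


def parse_jsonl(text):
    """Parse JSON Lines: one JSON value per line, blank lines ignored."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_jsonl(url, timeout=30):
    """Download a JSON Lines file with an HTTP GET and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_jsonl(resp.read().decode("utf-8"))


# Hypothetical usage against a raw GitHub file:
# rows = fetch_jsonl("https://raw.githubusercontent.com/<owner>/<repo>/<branch>/data.jsonl")
sample = '{"id": 1, "title": "first"}\n{"id": 2, "title": "second"}\n'
print(parse_jsonl(sample))
```

Because each line is an independent JSON document, a file can be split on newlines and loaded row by row, which is what makes the format convenient for bulk-loading a table.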
Exact vs Approximate Vector Search
The sample also gave me a great opportunity to talk about a common question: when should you use Exact Search (ENN) versus Approximate Search (ANN)?
In small datasets (like the one in the sample, with fewer than 10,000 rows), Exact Search performs just as well as Approximate Search. But as your dataset grows, ANN becomes essential. A good rule of thumb: if you’re working with more than 50,000 vectors after applying predicates, ANN will likely give you better performance without sacrificing too much accuracy.
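Exact search means scoring every candidate vector against the query, so its cost grows linearly with the number of rows; approximate search trades a little accuracy for visiting far fewer vectors. A common way to quantify "without sacrificing too much accuracy" is recall@k: the fraction of the true top-k neighbors that the ANN result recovers. The Python sketch below shows both ideas, assuming cosine distance as the metric; the rows and ids are hypothetical, and the 10,000 and 50,000 figures above remain the author's rule of thumb, not something this code derives.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def exact_knn(rows, query, k):
    """Exact (ENN) search: score every row and return the k nearest ids.

    rows: list of (id, vector) pairs. Cost grows linearly with len(rows).
    """
    scored = sorted((cosine_distance(vec, query), rid) for rid, vec in rows)
    return [rid for _, rid in scored[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that an ANN result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rows = [(1, [1.0, 0.0]), (2, [0.9, 0.1]), (3, [0.0, 1.0]), (4, [-1.0, 0.0])]
query = [1.0, 0.05]
truth = exact_knn(rows, query, 2)
print(truth)                       # → [1, 2]
print(recall_at_k([1, 3], truth))  # a hypothetical ANN answer → 0.5
```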
SQL Server supports both ENN and ANN. And thanks to its sophisticated query processor, ENN can be heavily optimized when predicates help reduce the number of vectors to scan.
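One way to picture why predicates help exact search: if the filter is applied before distances are computed, only the matching rows need to be scored. The Python sketch below makes that arithmetic visible by returning how many vectors were actually compared. The row shape and the `category` column are hypothetical, cosine distance is an assumed metric, and the real execution plan is chosen by SQL Server's query optimizer, not by code like this.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def filtered_exact_knn(rows, query, k, predicate):
    """Apply the predicate first, then run an exact scan on the survivors.

    rows: list of dicts with at least "id" and "embedding" keys.
    Returns (top-k ids, number of vectors actually compared).
    """
    survivors = [r for r in rows if predicate(r)]
    scored = sorted(
        (cosine_distance(r["embedding"], query), r["id"]) for r in survivors
    )
    return [rid for _, rid in scored[:k]], len(survivors)


# Hypothetical table: 1,000 rows spread evenly over 10 categories.
rows = [
    {"id": i, "category": i % 10, "embedding": [1.0, float(i)]}
    for i in range(1, 1001)
]
ids, compared = filtered_exact_knn(rows, [1.0, 0.0], 2, lambda r: r["category"] == 3)
print(ids, compared)  # → [3, 13] 100
```

A selective filter shrinks the scan from 1,000 distance computations to 100 here, which is why exact search can stay competitive well past the raw table size when predicates are selective.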
Try It Yourself
You can explore the public preview of DiskANN and try the full sample here: Azure SQL DiskANN Sample
Let me know what you think, and stay tuned for more updates as we continue to evolve the SQL Server engine into the most enterprise-ready, developer-friendly, AI-integrated database platform yet.