In today’s data-driven world, delivering precise and contextually relevant search results is critical. SQL Server and Azure SQL Database now enable this through Hybrid Search—a technique that combines traditional full-text search with modern vector similarity search. This allows developers to build intelligent, AI-powered search experiences directly inside the database engine.
Funny enough one of the key algorithms needed for implementing Hybrid Search, BM25, has been included in SQL Server is almost forever, without much fanfare: RAG pattern and similarity search is bringing it to the light.
What Is Hybrid Search?
Hybrid Search blends two key search paradigms:
-
Full-Text Search (FTS): Uses the
FREETEXTTABLE
function, which ranks results using the BM25 algorithm. This provides high-quality keyword-based relevance scoring. -
Vector Search: Introduced via the new
VECTOR
data type andVECTOR_DISTANCE()
function, vector search enables semantic matching of embeddings—ideal for scenarios involving natural language queries.
By combining both, you can return results that are relevant not only because of exact term matches but also because of their underlying meaning.
Re-Ranking with RRF
Once you’ve run both search strategies, you can improve the final result list by applying Reciprocal Rank Fusion (RRF)—a simple, effective method for combining ranked lists from different sources.
While SQL Server and Azure SQL don’t offer a native RRF()
function, RRF is straightforward to implement manually with basic SQL logic. The formula is simple:
1 / (rank + k)
Where rank is the position of the item in its respective list, and k
is a constant (commonly 60, though it can be tuned for your scenario, I like to set it equal to the k
of requested results)
This makes it easy to write RRF logic directly in SQL, without needing any custom functions or extensions. You can assign RRF scores to each document using a SELECT statement, then sort by the final combined score.
Sample Implementation
You can find a working example of Hybrid Search with manual RRF in the Azure SQL + OpenAI samples repo. Here’s a simplified version of the process:
-- Keyword (Fulltext) search
SELECT TOP(@k)
id,
RANK() OVER (ORDER BY ft_rank DESC) AS rank,
INTO
#keyword_search
FROM
(
SELECT TOP(@k)
id,
ftt.[RANK] AS ft_rank,
FROM
dbo.wikipedia_articles_embeddings w
INNER JOIN
FREETEXTTABLE(dbo.wikipedia_articles_embeddings, *, @q) AS ftt ON w.id = ftt.[KEY]
ORDER BY
ft_rank DESC
) AS freetext_documents
ORDER BY
rank ASC
-- Semantic search
SELECT TOP(@k)
id,
RANK() OVER (ORDER BY cosine_distance) AS rank
INTO
#semantic_search
FROM (
SELECT TOP(@k)
id,
VECTOR_DISTANCE('cosine', @e, content_vector_ada2) AS cosine_distance
FROM
dbo.wikipedia_articles_embeddings w
ORDER BY
cosine_distance
) AS similar_documents
-- Reciprocal Rank Fusion (RRF)
SELECT TOP(@k)
COALESCE(ss.id, ks.id) AS id,
ss.rank AS semantic_rank,
ks.rank AS keyword_rank,
COALESCE(1.0 / (@k + ss.rank), 0.0) +
COALESCE(1.0 / (@k + ks.rank), 0.0) AS score
FROM
#semantic_search ss
FULL OUTER JOIN
#keyword_search ks ON ss.id = ks.id
ORDER BY
score DESC
This approach gives more weight to results that rank highly in either method, ensuring that strong keyword matches or highly semantically similar items surface at the top.
Practical Benefits of Hybrid Search + RRF
By combining full-text and vector-based search, and applying RRF for re-ranking, you’re able to build smarter and more flexible search solutions directly within SQL Server or Azure SQL. This approach is not only technically efficient but also developer-friendly and highly adaptable to real-world use cases.
The Reciprocal Rank Fusion (RRF) algorithm was introduced in the 2009 paper Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods by Cormack et al. It’s a simple yet powerful method to merge ranked result lists from different sources, giving each source a fair chance to influence the final ranking.
Here are some key reasons why this pattern is worth adopting:
- No External Tools Required: Everything runs within SQL Server or Azure SQL—no need to call out to a vector database or separate search engine.
- Simple to Implement: Even without a native RRF function, the scoring logic is minimal and easy to express in SQL.
- More Relevant Results: Hybrid search enhances both precision and recall by blending lexical and semantic signals.
- Cost-Effective: Leveraging built-in database features reduces architecture complexity and operational costs.
- Scalable: Because it all happens within the database engine, performance scales predictably with your data and infrastructure.
Final Thoughts
Hybrid Search with BM25, vector similarity, and RRF-style re-ranking unlocks a new level of intelligence in your SQL-based search applications. With native support for full-text and vector queries, and easy-to-implement ranking logic, you can build powerful, production-ready search experiences using just SQL.
Give it a try with the sample code and explore how to take your application’s search capabilities to the next level—without ever leaving the database.
0 comments
Be the first to start the discussion.