January 6th, 2026

Semantic Reranking with Azure SQL, SQL Server 2025 and Cohere Rerank models

Davide Mauri
Principal Product Manager

Supporting re‑ranking has been one of the most common requests lately. While not always essential, it can be a valuable addition to a solution when you want to improve the precision of your results. Unfortunately, there isn’t a universal, standardized API for a “re‑rank” call across providers, so the most reliable approach today is to issue a manual REST request and build the payload according to the documentation of the re‑ranker you choose.

How a Re-ranking Model Improves Retrieval

Vector search is excellent for quickly finding likely matches, but it can still surface items that aren't the best answer. A re‑ranker, typically a cross‑encoder, takes the user's query together with each candidate document, scores the pair for semantic relevance, and then sorts the list so the most useful items rise to the top. This two‑stage pattern, retrieve and then re‑rank, can significantly enhance RAG pipelines and enterprise search by improving relevance and reducing noise. Joe Sack wrote a great article about this, "From 15th Place to Gold Medal", if you want to learn more.

Azure SQL DB Vector Search Sample

The Azure Samples repository for Azure SQL DB Vector Search demonstrates how to build retrieval with native vector functions, hybrid search, EF Core/SqlClient usage, and more, giving you the first-stage retrieval that produces candidates to feed a re‑ranker. You can plug any re‑ranker behind it via a REST call. I have updated the existing DiskANN sample to include re-ranking, and I have also created a completely new example using the SemanticShoresDB sample database (which was also created by Joe!)

  • SemanticShoresDB Reranking Sample: Uses Joe Sack’s sample database for a more realistic dataset. His excellent post, From 15th Place to Gold Medal, explains why semantic ranking matters and how it can dramatically improve relevance.
  • Wikipedia Reranking Sample: Provides a simple kickstart and demonstrates a full end-to-end scenario that combines hybrid search (vector search plus full-text search) with semantic re‑ranking.

Making the Re-rank REST call

Cohere's Rerank models, also available through Azure AI Foundry, accept a query and a list of documents and return a re‑ordered list in which each item carries a relevance score. The essence of the payload looks like this:

{
  "model": "rerank-v3.5",
  "query": "Reset my SQL login password",
  "documents": [
    { "text": "How to change a password in Azure SQL Database..." },
    { "text": "Troubleshooting login failures in SQL Server..." },
    { "text": "Granting permissions to users..." }
  ],
  "top_n": 3
}

The response contains results with index and relevance_score, which you then use to reorder your candidate set before building the final context for your RAG answer. If you want to pass more than just text to the re-ranker, say, for example, the Id of a product or an article in addition to its description, you need to pass everything as YAML. So, YAML inside JSON. An interesting approach if you ask me 🤣:

{
  "model": "Cohere-rerank-v4.0-fast",
  "query": "cozy bungalow with original hardwood and charm",
  "top_n": 3,
  "documents": [
    "Id: 48506\nContent: <text>",
    "Id: 29903\nContent: <text>",
    "Id: 12285\nContent: <text>"
  ]
}
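Once the payload is ready, the REST call itself can be issued directly from T-SQL with sp_invoke_external_rest_endpoint. A minimal sketch, where the endpoint URL is a hypothetical placeholder (use your deployment's rerank URL) and a database scoped credential with that same name is assumed to exist:

```sql
DECLARE @payload NVARCHAR(MAX) = N'{ ... }';  -- the rerank request body shown above
DECLARE @response NVARCHAR(MAX);

-- Call the rerank endpoint; the credential carries the API key or managed identity.
EXEC sp_invoke_external_rest_endpoint
    @url = 'https://<your-endpoint>.services.ai.azure.com/v2/rerank',  -- hypothetical, deployment-specific
    @method = 'POST',
    @credential = [https://<your-endpoint>.services.ai.azure.com],
    @payload = @payload,
    @response = @response OUTPUT;

SELECT @response;  -- JSON envelope; the model output is under $.result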

One of the most powerful recent updates to the SQL engine is its ability to manipulate JSON and strings directly within T-SQL. This is helpful for building the payload required to communicate with models like Cohere’s Rerank. Thanks to features like REGEXP_SUBSTR, JSON Path expressions, and the string concatenation operator ||, constructing complex JSON or YAML structures is now straightforward and efficient. These capabilities allow you to dynamically assemble the query, documents, and parameters for the REST call without leaving the database context, making integration with external AI services much easier.
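As a sketch of how those features combine, the documents array from the example above could be assembled like this, assuming a hypothetical #candidates table with Id and Content columns (STRING_ESCAPE handles the JSON escaping, and the || operator does the concatenation):

```sql
DECLARE @query NVARCHAR(MAX) = N'cozy bungalow with original hardwood and charm';

-- Build the "Id: <id>\nContent: <text>" strings and aggregate them into a JSON array.
-- The aggregation order matters, because the rerank response is positional.
DECLARE @documents NVARCHAR(MAX) = (
    SELECT '[' ||
        STRING_AGG(
            CAST('"' || STRING_ESCAPE('Id: ' || CAST(Id AS VARCHAR(10)) || CHAR(10) || 'Content: ' || Content, 'json') || '"' AS NVARCHAR(MAX)),
            ','
        ) WITHIN GROUP (ORDER BY Id) ||
    ']'
    FROM #candidates
);

DECLARE @payload NVARCHAR(MAX) =
    '{' ||
    '"model": "Cohere-rerank-v4.0-fast",' ||
    '"query": "' || STRING_ESCAPE(@query, 'json') || '",' ||
    '"top_n": 3,' ||
    '"documents": ' || @documents ||
    '}';
```

Since the Id is embedded in each document string, it can be recovered later from the positional results even if the order changes.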

Interpreting the result returned by the Cohere model, once the REST call has been made via sp_invoke_external_rest_endpoint, is also an interesting challenge, as the returned results are position-based:

"results": [
    {
      "index": 4,
      "relevance_score": 0.812034
    },
    {
      "index": 0,
      "relevance_score": 0.8075214
    },
    {
      "index": 1,
      "relevance_score": 0.80415994
    }
  ],

which means that the new ability of SQL Server 2025 and Azure SQL to extract a specific item from a JSON array comes in very handy:

SELECT 
    -- Use REGEXP_SUBSTR to extract the ID from the document text
    CAST(REGEXP_SUBSTR(
        JSON_VALUE(@documents, '$[' || [index] || ']'),  -- Get nth document using JSON path
        'Id: (\d*)\n', 1, 1, '', 1  -- Extract the numeric ID
    ) AS INT) AS property_id,
    *
FROM 
    OPENJSON(@response, '$.result.results')
    WITH (
        [index] INT,
        [relevance_score] DECIMAL(18,10)
    )

Putting It All Together

The general approach is simple: retrieve candidates in Azure SQL using vector functions and optionally combine with full-text for hybrid retrieval, call your chosen re‑ranker via REST with the query and documents, reorder based on the returned relevance scores, and finally assemble the grounded context for your LLM. While not mandatory, adding re‑ranking can be a valuable enhancement to improve precision and deliver more relevant answers.
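The reorder step can be sketched as follows, assuming @response holds the envelope returned by sp_invoke_external_rest_endpoint and a hypothetical #candidates table (Id, Content) still holds the documents in the order they were sent:

```sql
-- Parse the positional results, map each index back to the candidate it
-- refers to, and return the candidates sorted by semantic relevance.
WITH ranked AS (
    SELECT [index], [relevance_score]
    FROM OPENJSON(@response, '$.result.results')
    WITH (
        [index] INT,
        [relevance_score] DECIMAL(18,10)
    )
)
SELECT c.Id, c.Content, r.relevance_score
FROM ranked AS r
INNER JOIN (
    SELECT Id, Content,
           ROW_NUMBER() OVER (ORDER BY Id) - 1 AS position  -- must match the order used to build "documents"
    FROM #candidates
) AS c
    ON c.position = r.[index]
ORDER BY r.relevance_score DESC;
```

The top rows of this result are then what you feed into the grounded context for the LLM.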

Check out the GitHub repo here: https://github.com/Azure-Samples/azure-sql-db-vector-search to start evaluating adoption of semantic re-ranking in your solutions.

Author

Davide Mauri
Principal Product Manager

Principal Product Manager for Azure SQL Database, focusing on developers and AI, with a career in IT dating back to 1997 and the prestigious Data Platform MVP award earned for 12 consecutive years.
