{"id":10732,"date":"2025-07-17T09:50:15","date_gmt":"2025-07-17T16:50:15","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=10732"},"modified":"2025-07-17T09:50:15","modified_gmt":"2025-07-17T16:50:15","slug":"build-reliable-go-applications-configuring-azure-cosmos-db-go-sdk-for-real-world-scenarios","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/build-reliable-go-applications-configuring-azure-cosmos-db-go-sdk-for-real-world-scenarios\/","title":{"rendered":"Build reliable Go applications: Configuring Azure Cosmos DB Go SDK for real-world scenarios"},"content":{"rendered":"<p>When building applications that interact with databases, developers frequently encounter scenarios where default SDK configurations don&#8217;t align with their specific operational requirements. They need to customize SDK behavior to address real-world challenges like network instability, performance bottlenecks, debugging complexity, monitoring requirements, and more. These factors become even more pronounced when working with a massively scalable, cloud-native, distributed database like Azure Cosmos DB.<\/p>\n<p>This blog post explores how to customize and configure the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos\" data-href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos\">Go SDK for Azure Cosmos DB<\/a>\u00a0beyond its default settings, covering techniques for modifying client behavior, implementing custom policies, accessing operational metrics, etc. 
These enable developers to build more resilient applications, troubleshoot issues effectively, and gain deeper insights into their database interactions.<\/p>\n<p>The Go SDK for Azure Cosmos DB\u00a0is built on top of the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/azcore\" target=\"_blank\" rel=\"noopener noreferrer\">core Azure Go SDK package<\/a>, which implements several patterns that are applied throughout the SDK. The core SDK is designed to be quite customizable, and its configurations can be applied with the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#ClientOptions\" target=\"_blank\" rel=\"noopener noreferrer\">ClientOptions<\/a> struct when creating a new Azure Cosmos DB client object using <a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#NewClient\" target=\"_blank\" rel=\"noopener noreferrer\">NewClient<\/a> (and other similar functions). If you peek inside the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/azcore\/policy#ClientOptions\" target=\"_blank\" rel=\"noopener noreferrer\">azcore.ClientOptions<\/a> struct, you will notice that it has many options for configuring the HTTP client, retry policies, timeouts, and other settings. In this blog, we will cover how to make use of (and extend) these common options when building applications with the Go SDK for Azure Cosmos DB.<\/p>\n<p><div class=\"alert alert-primary\">I have provided code snippets throughout this blog. Refer to this <a href=\"https:\/\/github.com\/abhirockzz\/cosmosdb-go-sdk-config-examples\">GitHub repository<\/a> for runnable examples.<\/div><\/p>\n<h2>Retry policies<\/h2>\n<p>Common retry scenarios are handled in the SDK. 
Here is a summary of errors for which retries are attempted:<\/p>\n<table style=\"width: 87.5617%; height: 163px;\">\n<thead>\n<tr style=\"height: 23px;\">\n<th style=\"width: 44.9259%; height: 23px;\"><strong>Error Type \/ Status Code<\/strong><\/th>\n<th style=\"width: 54.1619%; height: 23px;\"><strong>Retry Logic<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"height: 47px;\">\n<td style=\"width: 44.9259%; height: 47px; text-align: center;\"><strong>Network Connection Errors<\/strong><\/td>\n<td style=\"width: 54.1619%; height: 47px; text-align: center;\">Retry after marking endpoint unavailable and waiting for\u00a0<code>defaultBackoff<\/code>.<\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"width: 44.9259%; height: 47px; text-align: center;\"><strong>403 Forbidden<\/strong>\u00a0(with specific substatuses)<\/td>\n<td style=\"width: 54.1619%; height: 47px; text-align: center;\">Retry after marking endpoint unavailable and updating the endpoint manager.<\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 44.9259%; height: 23px; text-align: center;\"><strong>404 Not Found<\/strong>\u00a0(specific substatus)<\/td>\n<td style=\"width: 54.1619%; height: 23px; text-align: center;\">Retry by switching to another session or endpoint.<\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 44.9259%; height: 23px; text-align: center;\"><strong>503 Service Unavailable<\/strong><\/td>\n<td style=\"width: 54.1619%; height: 23px; text-align: center;\">Retry by switching to another preferred location.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><div class=\"alert alert-success\">You can explore the source code in <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-go\/blob\/main\/sdk\/data\/azcosmos\/cosmos_client_retry_policy.go\">cosmos_client_retry_policy.go<\/a> if you want to see the details of how the retry policy is implemented.<\/div><\/p>\n<p>The upcoming sections demonstrate some of these in action.<\/p>\n<h3><a 
href=\"https:\/\/dev.to\/abhirockzz\/how-to-configure-and-customize-the-go-sdk-for-azure-cosmos-db-97a#nonretriable-errors\" name=\"nonretriable-errors\"><\/a>Non-Retriable Errors<\/h3>\n<p>When a request fails with a non-retriable error, the SDK does not retry the operation. This is useful for scenarios where the error indicates that the operation cannot succeed.<\/p>\n<p>For example, here is a function that tries to read a database that does not exist.<\/p>\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func retryPolicy1() {\r\n\r\n    c, err := auth.GetClientWithDefaultAzureCredential(\"https:\/\/demodb.documents.azure.com:443\/\", nil)\r\n\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n    azlog.SetListener(func(cls azlog.Event, msg string) {\r\n\r\n        \/\/ Log retry-related events\r\n        switch cls {\r\n        case azlog.EventRetryPolicy:\r\n            fmt.Printf(\"Retry Policy Event: %s\\n\", msg)\r\n\r\n        }\r\n    })\r\n\r\n    \/\/ Set logging level to include retries\r\n    azlog.SetEvents(azlog.EventRetryPolicy)\r\n\r\n    db, err := c.NewDatabase(\"i_dont_exist\")\r\n\r\n    if err != nil {\r\n        log.Fatal(\"NewDatabase call failed\", err)\r\n    }\r\n    _, err = db.Read(context.Background(), nil)\r\n\r\n    if err != nil {\r\n        log.Fatal(\"Read call failed: \", err)\r\n    }\r\n\r\n}<\/code><\/pre>\n<p>The\u00a0<code>azcore<\/code>\u00a0logging implementation is configured using\u00a0<code>SetListener<\/code>\u00a0and\u00a0<code>SetEvents<\/code> to write retry policy event logs to standard output. 
Refer to the <a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#readme-logging\">Logging<\/a> section in the azcosmos package README for details.<\/p>\n<p><div class=\"alert alert-primary\">The <code>auth<\/code> helper functions used in these snippets (such as <code>auth.GetClientWithDefaultAzureCredential<\/code> and <code>auth.GetEmulatorClientWithAzureADAuth<\/code>) are part of the <a href=\"https:\/\/github.com\/abhirockzz\/cosmosdb-go-sdk-helper\">cosmosdb-go-sdk-helper<\/a> package.<\/div><\/p>\n<p>Here are the logs from code execution:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">\/\/....\r\nRetry Policy Event: exit due to non-retriable status code\r\nRetry Policy Event: =====&gt; Try=1 for GET https:\/\/demodb.documents.azure.com:443\/dbs\/i_dont_exist\r\nRetry Policy Event: response 404\r\nRetry Policy Event: exit due to non-retriable status code\r\nRead call failed: GET https:\/\/demodb-region.documents.azure.com:443\/dbs\/i_dont_exist\r\n--------------------------------------------------------------------------------\r\nRESPONSE 404: 404 Not Found\r\nERROR CODE: 404 Not Found\r\n\/\/...<\/code><\/pre>\n<p>When a request is made to read a non-existent database, the SDK gets a\u00a0<strong>404<\/strong>\u00a0(not found) response for the database. This is recognized as a non-retriable error, so the SDK does not retry the request. Retries are only performed for retriable errors (like network issues or certain status codes). The operation failed because the database does not exist.<\/p>\n<h3><a href=\"https:\/\/dev.to\/abhirockzz\/how-to-configure-and-customize-the-go-sdk-for-azure-cosmos-db-97a#retriable-errors-invalid-account\" name=\"retriable-errors-invalid-account\"><\/a>Retriable Errors<\/h3>\n<p>When a request fails with a retriable error, the SDK automatically retries the operation based on the retry policy. This is useful for transient errors that may resolve themselves after a few attempts.<\/p>\n<p>This function tries to create an Azure Cosmos DB client using an invalid account endpoint. 
It sets up logging for retry policy events and attempts to create a database.<\/p>\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func retryPolicy2() {\r\n\r\n    c, err := auth.GetClientWithDefaultAzureCredential(\"https:\/\/iamnothere.documents.azure.com:443\/\", nil)\r\n\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n    azlog.SetListener(func(cls azlog.Event, msg string) {\r\n\r\n        \/\/ Log retry-related events\r\n        switch cls {\r\n        case azlog.EventRetryPolicy:\r\n            fmt.Printf(\"Retry Policy Event: %s\\n\", msg)\r\n\r\n        }\r\n    })\r\n\r\n    \/\/ Set logging level to include retries\r\n    azlog.SetEvents(azlog.EventRetryPolicy)\r\n\r\n    _, err = c.CreateDatabase(context.Background(), azcosmos.DatabaseProperties{ID: \"test\"}, nil)\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n}<\/code><\/pre>\n<p>In the logs, you can see how the SDK handles retries when the endpoint is unreachable:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">\/\/....\r\nRetry Policy Event: error Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nRetry Policy Event: End Try #1, Delay=682.644105ms\r\nRetry Policy Event: =====&gt; Try=2 for GET https:\/\/iamnothere.documents.azure.com:443\/\r\nRetry Policy Event: error Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nRetry Policy Event: End Try #2, Delay=2.343322179s\r\nRetry Policy Event: =====&gt; Try=3 for GET https:\/\/iamnothere.documents.azure.com:443\/\r\nRetry Policy Event: error Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nRetry Policy Event: End Try #3, Delay=7.177314269s\r\nRetry Policy Event: =====&gt; Try=4 for GET 
https:\/\/iamnothere.documents.azure.com:443\/\r\nRetry Policy Event: error Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nRetry Policy Event: MaxRetries 3 exceeded\r\nfailed to retrieve account properties: Get \"https:\/\/iamnothere.docume\r\n<\/code><\/pre>\n<p>Each failed attempt is logged, and the SDK retries the operation several times (<strong>three<\/strong>\u00a0times to be specific), with increasing delays between attempts. After exceeding the maximum number of retries, the operation fails with an error indicating the host could not be found &#8211; the SDK automatically retries transient network errors before giving up.<\/p>\n<p>But you don&#8217;t have to stick to the default retry policy. You can customize the retry policy by setting the\u00a0<code>azcore.ClientOptions<\/code> when creating the Azure Cosmos DB client.<\/p>\n<h3>Configurable Retries<\/h3>\n<p>Let&#8217;s say you want to set a custom retry policy with a maximum of\u00a0<strong>two<\/strong>\u00a0retries and a delay of\u00a0<strong>one second<\/strong>\u00a0between retries. 
You can do this by creating a\u00a0<code>policy.RetryOptions<\/code>\u00a0struct and passing it to the\u00a0<code>azcosmos.ClientOptions<\/code>\u00a0when creating the client.<\/p>\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func retryPolicy3() {\r\n\r\n    retryPolicy := policy.RetryOptions{\r\n        MaxRetries: 2,\r\n        RetryDelay: 1 * time.Second,\r\n    }\r\n\r\n    opts := azcosmos.ClientOptions{\r\n        ClientOptions: policy.ClientOptions{\r\n            Retry: retryPolicy,\r\n        },\r\n    }\r\n\r\n    c, err := auth.GetClientWithDefaultAzureCredential(\"https:\/\/iamnothere.documents.azure.com:443\/\", &amp;opts)\r\n\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n    log.Println(c.Endpoint())\r\n\r\n    azlog.SetListener(func(cls azlog.Event, msg string) {\r\n\r\n        \/\/ Log retry-related events\r\n        switch cls {\r\n        case azlog.EventRetryPolicy:\r\n            fmt.Printf(\"Retry Policy Event: %s\\n\", msg)\r\n\r\n        }\r\n    })\r\n\r\n    azlog.SetEvents(azlog.EventRetryPolicy)\r\n\r\n    _, err = c.CreateDatabase(context.Background(), azcosmos.DatabaseProperties{ID: \"test\"}, nil)\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n}<\/code><\/pre>\n<p>Each failed attempt is logged, and the SDK retries the operation according to the custom policy \u2014 only\u00a0<strong>two<\/strong>\u00a0retries, with a\u00a0<strong>1-second<\/strong>\u00a0delay after the first attempt and a longer delay after the second. 
After reaching the maximum number of retries, the operation fails with an error indicating the host could not be found.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">Retry Policy Event: =====&gt; Try=1 for GET https:\/\/iamnothere.documents.azure.com:443\/\r\n\/\/....\r\nRetry Policy Event: error Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nRetry Policy Event: End Try #1, Delay=1.211970493s\r\nRetry Policy Event: =====&gt; Try=2 for GET https:\/\/iamnothere.documents.azure.com:443\/\r\nRetry Policy Event: error Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nRetry Policy Event: End Try #2, Delay=3.300739653s\r\nRetry Policy Event: =====&gt; Try=3 for GET https:\/\/iamnothere.documents.azure.com:443\/\r\nRetry Policy Event: error Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nRetry Policy Event: MaxRetries 2 exceeded\r\nfailed to retrieve account properties: Get \"https:\/\/iamnothere.documents.azure.com:443\/\": dial tcp: lookup iamnothere.documents.azure.com: no such host\r\nexit status 1<\/code><\/pre>\n<p><div class=\"alert alert-info\">Note: The first attempt is not counted as a retry, so the total number of attempts is three (1 initial + 2 retries). <\/div><\/p>\n<\/div>\n<\/div>\n<div>\n<h3><a href=\"https:\/\/dev.to\/abhirockzz\/how-to-configure-and-customize-the-go-sdk-for-azure-cosmos-db-97a#fault-injection\" name=\"fault-injection\"><\/a>Fault Injection<\/h3>\n<p>You can customize this further by creating custom policies to inject faults into the request pipeline. 
This is useful for testing how your application handles various error scenarios without needing to rely on actual network failures or service outages.<\/p>\n<p>Here, we use a custom policy (<code>FaultInjectionPolicy<\/code>) that simulates a network error on each request attempt with a configurable probability.<\/p>\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">type FaultInjectionPolicy struct {\r\n    failureProbability float64 \/\/ e.g., 0.3 for 30% chance to fail\r\n}\r\n\r\n\/\/ Implement the Policy interface\r\nfunc (f *FaultInjectionPolicy) Do(req *policy.Request) (*http.Response, error) {\r\n    if rand.Float64() &lt; f.failureProbability {\r\n        \/\/ Simulate a network error\r\n        return nil, &amp;net.OpError{\r\n            Op:  \"read\",\r\n            Net: \"tcp\",\r\n            Err: errors.New(\"simulated network failure\"),\r\n        }\r\n    }\r\n    \/\/ no failure - continue with the request\r\n    return req.Next()\r\n}<\/code><\/pre>\n<\/div>\n<div class=\"highlight js-code-highlight\">\n<p>This function configures the Azure Cosmos DB client to use this policy, sets up logging for retry events, and attempts to create a database.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func retryPolicy4() {\r\n\r\n    opts := azcosmos.ClientOptions{\r\n        ClientOptions: policy.ClientOptions{\r\n            PerRetryPolicies: []policy.Policy{&amp;FaultInjectionPolicy{failureProbability: 0.6}},\r\n        },\r\n    }\r\n\r\n    c, err := auth.GetClientWithDefaultAzureCredential(\"https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\", &amp;opts)\r\n\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n    azlog.SetListener(func(cls azlog.Event, msg string) {\r\n\r\n        \/\/ Log retry-related events\r\n        switch cls {\r\n        case 
azlog.EventRetryPolicy:\r\n            fmt.Printf(\"Retry Policy Event: %s\\n\", msg)\r\n\r\n        }\r\n    })\r\n\r\n    \/\/ Set logging level to include retries\r\n    azlog.SetEvents(azlog.EventRetryPolicy)\r\n\r\n    _, err = c.CreateDatabase(context.Background(), azcosmos.DatabaseProperties{ID: \"test_1\"}, nil)\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n}\r\n<\/code><\/pre>\n<p>Take a look at the logs generated when this code is run &#8211; each request attempt fails due to the simulated network error. The SDK logs each retry, with increasing delays between attempts. After reaching the maximum number of retries (default = 3), the operation fails with an error indicating a simulated network failure.<\/p>\n<p><div class=\"alert alert-primary\">This can change depending on the failure probability you set in the <code>FaultInjectionPolicy<\/code>. In this case, we set it to 0.6 (60% chance of failure), so you may see different results each time you run the code. 
<\/div><\/p>\n<\/div>\n<div>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">Retry Policy Event: =====&gt; Try=1 for GET https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\r\n\/\/....\r\nRetry Policy Event: MaxRetries 0 exceeded\r\nRetry Policy Event: error read tcp: simulated network failure\r\nRetry Policy Event: End Try #1, Delay=794.018648ms\r\nRetry Policy Event: =====&gt; Try=2 for GET https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\r\nRetry Policy Event: error read tcp: simulated network failure\r\nRetry Policy Event: End Try #2, Delay=2.374693498s\r\nRetry Policy Event: =====&gt; Try=3 for GET https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\r\nRetry Policy Event: error read tcp: simulated network failure\r\nRetry Policy Event: End Try #3, Delay=7.275038434s\r\nRetry Policy Event: =====&gt; Try=4 for GET https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\r\nRetry Policy Event: error read tcp: simulated network failure\r\nRetry Policy Event: MaxRetries 3 exceeded\r\nRetry Policy Event: =====&gt; Try=1 for GET https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\r\nRetry Policy Event: error read tcp: simulated network failure\r\nRetry Policy Event: End Try #1, Delay=968.457331ms\r\n2025\/05\/05 19:53:50 failed to retrieve account properties: read tcp: simulated network failure\r\nexit status 1\r\n<\/code><\/pre>\n<p><div class=\"alert alert-success\">Do take a look at <a href=\"http:\/\/learn.microsoft.com\/en-us\/azure\/developer\/go\/azure-sdk-core-concepts#custom-http-pipeline-policies\">Custom HTTP pipeline policies<\/a> in the Azure SDK for Go documentation for more information on how to implement custom policies. 
<\/div><\/p>\n<\/div>\n<div><\/div>\n<\/div>\n<div><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/07\/request-response-pipeline-flow.png\"><img decoding=\"async\" class=\" wp-image-10734 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/07\/request-response-pipeline-flow-300x71.png\" alt=\"request response pipeline flow image\" width=\"499\" height=\"118\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/07\/request-response-pipeline-flow-300x71.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/07\/request-response-pipeline-flow-768x182.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/07\/request-response-pipeline-flow.png 903w\" sizes=\"(max-width: 499px) 100vw, 499px\" \/><\/a><\/div>\n<div>\n<h2>HTTP-level customizations<\/h2>\n<p>There are scenarios where you may need to customize the HTTP client used by the SDK. 
For example, when using the <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/emulator\">Azure Cosmos DB emulator<\/a> locally, you may want to skip certificate verification to connect without SSL errors during development or testing.<\/p>\n<p><code>TLSClientConfig<\/code>\u00a0allows you to customize TLS settings for the HTTP client, and setting\u00a0<code>InsecureSkipVerify: true<\/code> disables certificate verification. This is not recommended for production, but it is handy for testing.<\/p>\n<\/div>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func customHTTP1() {\r\n    \/\/ Create a custom HTTP client that skips TLS certificate verification (development only)\r\n    client := &amp;http.Client{\r\n        Transport: &amp;http.Transport{\r\n            TLSClientConfig: &amp;tls.Config{InsecureSkipVerify: true},\r\n        },\r\n    }\r\n\r\n    clientOptions := &amp;azcosmos.ClientOptions{\r\n        ClientOptions: azcore.ClientOptions{\r\n            Transport: client,\r\n        },\r\n    }\r\n\r\n    c, err := auth.GetEmulatorClientWithAzureADAuth(\"http:\/\/localhost:8081\", clientOptions)\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n    _, err = c.CreateDatabase(context.Background(), azcosmos.DatabaseProperties{ID: \"test\"}, nil)\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n}<\/code><\/pre>\n<p>All you need to do is pass the custom HTTP client to the\u00a0<code>ClientOptions<\/code> struct when creating the Azure Cosmos DB client. The SDK will use this for all requests.<\/p>\n<p>Another scenario is when you want to set a custom header for all requests to track requests or add metadata. 
All you need to do is implement the\u00a0<code>Do<\/code>\u00a0method of the\u00a0<code>policy.Policy<\/code>\u00a0interface and set the header in the request:<\/p>\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">type CustomHeaderPolicy struct{}\r\n\r\nfunc (c *CustomHeaderPolicy) Do(req *policy.Request) (*http.Response, error) {\r\n    correlationID := uuid.New().String()\r\n    req.Raw().Header.Set(\"X-Correlation-ID\", correlationID)\r\n    return req.Next()\r\n}<\/code><\/pre>\n<p>Looking at the logs, notice the custom header\u00a0<code>X-Correlation-ID<\/code>\u00a0is added to each request:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">\/\/...\r\nRequest Event: ==&gt; OUTGOING REQUEST (Try=1)\r\n   GET https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\r\n   Authorization: REDACTED\r\n   User-Agent: azsdk-go-azcosmos\/v1.3.0 (go1.23.6; darwin)\r\n   X-Correlation-Id: REDACTED\r\n   X-Ms-Cosmos-Sdk-Supportedcapabilities: 1\r\n   X-Ms-Date: Tue, 06 May 2025 04:27:37 GMT\r\n   X-Ms-Version: 2020-11-05\r\n\r\nRequest Event: ==&gt; OUTGOING REQUEST (Try=1)\r\n   POST https:\/\/ACCOUNT_NAME-region.documents.azure.com:443\/dbs\r\n   Authorization: REDACTED\r\n   Content-Length: 27\r\n   Content-Type: application\/query+json\r\n   User-Agent: azsdk-go-azcosmos\/v1.3.0 (go1.23.6; darwin)\r\n   X-Correlation-Id: REDACTED\r\n   X-Ms-Cosmos-Sdk-Supportedcapabilities: 1\r\n   X-Ms-Date: Tue, 06 May 2025 04:27:37 GMT\r\n   X-Ms-Documentdb-Query: True\r\n   X-Ms-Version: 2020-11-05\r\n\/\/....<\/code><\/pre>\n<div class=\"highlight js-code-highlight\">\n<div>\n<h2 id=\"query-and-index-metrics\" class=\"code-line\" dir=\"auto\" data-line=\"548\">Query and Index Metrics<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"550\">The Go SDK provides a way to access query and index metrics, which can help you optimize your queries and understand their performance 
characteristics.<\/p>\n<h3 id=\"query-metrics\" class=\"code-line\" dir=\"auto\" data-line=\"552\">Query Metrics<\/h3>\n<p>When executing queries, you can get basic metrics about the query execution. The Go SDK exposes these metrics through the\u00a0<code>QueryMetrics<\/code>\u00a0field of the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#QueryItemsResponse.QueryMetrics\" target=\"_blank\" rel=\"noopener noreferrer\">QueryItemsResponse<\/a>\u00a0struct. The metrics include execution timings, the number of documents retrieved, and more.<\/p>\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func queryMetrics() {\r\n    \/\/.... \r\n    container, err := c.NewContainer(\"existing_db\", \"existing_container\")\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n    query := \"SELECT * FROM c\"\r\n    pager := container.NewQueryItemsPager(query, azcosmos.NewPartitionKey(), nil)\r\n\r\n    for pager.More() {\r\n        queryResp, err := pager.NextPage(context.Background())\r\n        if err != nil {\r\n            log.Fatal(\"query items failed:\", err)\r\n        }\r\n\r\n        \/\/ QueryMetrics is a pointer, so guard against nil before dereferencing\r\n        if queryResp.QueryMetrics != nil {\r\n            log.Println(\"query metrics:\\n\", *queryResp.QueryMetrics)\r\n        }\r\n        \/\/....\r\n    }\r\n}<\/code><\/pre>\n<\/div>\n<\/div>\n<p>The query metrics are provided as a raw string of semicolon-separated key-value pairs, which is easy to parse. 
Here is an example:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">totalExecutionTimeInMs=0.34;queryCompileTimeInMs=0.04;queryLogicalPlanBuildTimeInMs=0.00;queryPhysicalPlanBuildTimeInMs=0.02;queryOptimizationTimeInMs=0.00;VMExecutionTimeInMs=0.07;indexLookupTimeInMs=0.00;instructionCount=41;documentLoadTimeInMs=0.04;systemFunctionExecuteTimeInMs=0.00;userFunctionExecuteTimeInMs=0.00;retrievedDocumentCount=9;retrievedDocumentSize=1251;outputDocumentCount=9;outputDocumentSize=2217;writeOutputTimeInMs=0.02;indexUtilizationRatio=1.00\r\n<\/code><\/pre>\n<p>Here is a breakdown of the metrics you can obtain from the query response:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">| Metric                         | Unit  | Description                                                  |\r\n| ------------------------------ | ----- | ------------------------------------------------------------ |\r\n| totalExecutionTimeInMs         | ms    | Total time taken to execute the query, including all phases. |\r\n| queryCompileTimeInMs           | ms    | Time spent compiling the query.                              |\r\n| queryLogicalPlanBuildTimeInMs  | ms    | Time spent building the logical plan for the query.          |\r\n| queryPhysicalPlanBuildTimeInMs | ms    | Time spent building the physical plan for the query.         |\r\n| queryOptimizationTimeInMs      | ms    | Time spent optimizing the query.                             |\r\n| VMExecutionTimeInMs            | ms    | Time spent executing the query.                              |\r\n| indexLookupTimeInMs            | ms    | Time spent looking up indexes.                               |\r\n| instructionCount               | count | Number of instructions executed for the query.               
|\r\n| documentLoadTimeInMs           | ms    | Time spent loading documents from storage.                   |\r\n| systemFunctionExecuteTimeInMs  | ms    | Time spent executing system functions in the query.          |\r\n| userFunctionExecuteTimeInMs    | ms    | Time spent executing user-defined functions in the query.    |\r\n| retrievedDocumentCount         | count | Number of documents retrieved by the query.                  |\r\n| retrievedDocumentSize          | bytes | Total size of documents retrieved.                           |\r\n| outputDocumentCount            | count | Number of documents returned as output.                      |\r\n| outputDocumentSize             | bytes | Total size of output documents.                              |\r\n| writeOutputTimeInMs            | ms    | Time spent writing the output.                               |\r\n| indexUtilizationRatio          | ratio | Ratio of index utilization (1.0 means fully utilized).       |\r\n<\/code><\/pre>\n<div class=\"crayons-article__main \">\n<div id=\"article-body\" class=\"crayons-article__body text-styles spec__body\" data-article-id=\"2464803\">\n<h3 id=\"index-metrics\" class=\"code-line\" dir=\"auto\" data-line=\"609\">Index Metrics<\/h3>\n<p class=\"code-line\" dir=\"auto\" data-line=\"611\">Indexing metrics show both utilized indexed paths and recommended indexed paths. You can use the indexing metrics to optimize query performance, especially in cases where you aren&#8217;t sure how to modify the indexing policy.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"613\">To enable indexing metrics in the Go SDK, set\u00a0<code>PopulateIndexMetrics<\/code>\u00a0to\u00a0<code>true<\/code>\u00a0in the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#QueryOptions\" data-href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#QueryOptions\">QueryOptions<\/a>. 
Index metrics data in the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#QueryItemsResponse\" data-href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos#QueryItemsResponse\">QueryItemsResponse<\/a>\u00a0is\u00a0<code>base64<\/code>\u00a0encoded and needs to be decoded before it can be used.<\/p>\n<p dir=\"auto\" data-line=\"613\"><div class=\"alert alert-warning\">Enabling indexing metrics incurs overhead, so it should be done only when debugging slow queries and is not recommended in production.<\/div><\/p>\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">pager := container.NewQueryItemsPager(\"SELECT c.id FROM c WHERE CONTAINS(LOWER(c.description), @word)\", azcosmos.NewPartitionKey(), &amp;azcosmos.QueryOptions{\r\n\t\tPopulateIndexMetrics: true,\r\n\t\tQueryParameters: []azcosmos.QueryParameter{\r\n\t\t\t{\r\n\t\t\t\tName:  \"@word\",\r\n\t\t\t\tValue: \"happy\",\r\n\t\t\t},\r\n\t\t},\r\n\t})\r\n\r\n\tif pager.More() {\r\n\t\tpage, err := pager.NextPage(context.Background())\r\n\t\tif err != nil {\r\n\t\t\tlog.Fatal(\"query items failed: \", err)\r\n\t\t}\r\n\r\n\t\t\/\/ process results\r\n\r\n\t\t\/\/ IndexMetrics is a pointer, so guard against nil before decoding\r\n\t\tif page.IndexMetrics != nil {\r\n\t\t\tdecoded, err := base64.StdEncoding.DecodeString(*page.IndexMetrics)\r\n\t\t\tif err != nil {\r\n\t\t\t\tlog.Fatal(\"failed to decode index metrics: \", err)\r\n\t\t\t}\r\n\t\t\tlog.Println(\"Index metrics\", string(decoded))\r\n\t\t}\r\n\t}<\/code><\/pre>\n<div>\n<div>Once decoded, the index metrics are available in JSON format. 
For example:<\/div>\n<\/div>\n<div>\n<pre class=\"prettyprint language-json\"><code class=\"language-json\">{\r\n    \"UtilizedSingleIndexes\": [\r\n        {\r\n            \"FilterExpression\": \"\",\r\n            \"IndexSpec\": \"\/description\/?\",\r\n            \"FilterPreciseSet\": true,\r\n            \"IndexPreciseSet\": true,\r\n            \"IndexImpactScore\": \"High\"\r\n        }\r\n    ],\r\n    \"PotentialSingleIndexes\": [],\r\n    \"UtilizedCompositeIndexes\": [],\r\n    \"PotentialCompositeIndexes\": []\r\n}<\/code><\/pre>\n<h2>OpenTelemetry support<\/h2>\n<p>The Azure Go SDK supports distributed tracing via OpenTelemetry. This allows you to collect, export, and analyze traces for requests made to Azure services, including Azure Cosmos DB.<\/p>\n<p>The\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/tracing\/azotel\" target=\"_blank\" rel=\"noopener noreferrer\">azotel package<\/a>\u00a0is used to connect an instance of OpenTelemetry&#8217;s\u00a0<code>TracerProvider<\/code> to an Azure SDK client (in this case Azure Cosmos DB). 
You can then configure the <code>TracingProvider<\/code>\u00a0in\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/azcore\/policy#ClientOptions\" target=\"_blank\" rel=\"noopener noreferrer\">azcore.ClientOptions<\/a>\u00a0to enable automatic propagation of trace context and emission of spans for SDK operations.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func getClientOptionsWithTracing() (*azcosmos.ClientOptions, *trace.TracerProvider) {\r\n    exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())\r\n    if err != nil {\r\n        log.Fatalf(\"failed to initialize stdouttrace exporter: %v\", err)\r\n    }\r\n    tp := trace.NewTracerProvider(trace.WithBatcher(exporter))\r\n\r\n    otel.SetTracerProvider(tp)\r\n\r\n    op := azcosmos.ClientOptions{\r\n        ClientOptions: policy.ClientOptions{\r\n            TracingProvider: azotel.NewTracingProvider(tp, nil),\r\n        },\r\n    }\r\n    return &amp;op, tp\r\n}\r\n<\/code><\/pre>\n<p>The above function creates a\u00a0<code>stdout<\/code>\u00a0exporter for OpenTelemetry (prints traces to the console). 
It sets up a\u00a0<code>TracerProvider<\/code>, registers it as the global tracer provider, and returns a\u00a0<code>ClientOptions<\/code>\u00a0struct with the\u00a0<code>TracingProvider<\/code> set, ready to be used with the Azure Cosmos DB client.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-go\"><code class=\"language-go\">func tracing() {\r\n\r\n    op, tp := getClientOptionsWithTracing()\r\n    defer func() { _ = tp.Shutdown(context.Background()) }()\r\n\r\n    c, err := auth.GetClientWithDefaultAzureCredential(\"https:\/\/ACCOUNT_NAME.documents.azure.com:443\/\", op)\r\n\r\n    \/\/....\r\n\r\n    container, err := c.NewContainer(\"existing_db\", \"existing_container\")\r\n    if err != nil {\r\n        log.Fatal(err)\r\n    }\r\n\r\n    tracer := otel.Tracer(\"tracer_app1\")\r\n\r\n    ctx, span := tracer.Start(context.Background(), \"query-items-operation\")\r\n    defer span.End()\r\n\r\n    query := \"SELECT * FROM c\"\r\n    pager := container.NewQueryItemsPager(query, azcosmos.NewPartitionKey(), nil)\r\n\r\n    for pager.More() {\r\n        queryResp, err := pager.NextPage(ctx)\r\n        if err != nil {\r\n            log.Fatal(\"query items failed:\", err)\r\n        }\r\n\r\n        for _, item := range queryResp.Items {\r\n            log.Printf(\"Queried item: %+v\\n\", string(item))\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>The above function calls\u00a0<code>getClientOptionsWithTracing<\/code> to get tracing-enabled options and a tracer provider, and ensures the tracer provider is shut down at the end (which flushes any pending traces). It then creates an Azure Cosmos DB client with tracing enabled and executes a query against a container. The SDK call is traced automatically and, in this case, exported to stdout.<\/p>\n<p><div class=\"alert alert-success\">You can plug in any OpenTelemetry-compatible tracer provider, and traces can be exported to various backends. 
<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/tracing\/azotel#pkg-overview\">Here is a snippet<\/a> for Jaeger exporter. <\/div><\/p>\n<p>The traces are quite large \u2013 here is a small snippet of the output:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">\/\/...\r\n{\r\n        \"Name\": \"query_items democontainer\",\r\n        \"SpanContext\": {\r\n                \"TraceID\": \"39a650bcd34ff70d48bbee467d728211\",\r\n                \"SpanID\": \"f2c892bec75dbf5d\",\r\n                \"TraceFlags\": \"01\",\r\n                \"TraceState\": \"\",\r\n                \"Remote\": false\r\n        },\r\n        \"Parent\": {\r\n                \"TraceID\": \"39a650bcd34ff70d48bbee467d728211\",\r\n                \"SpanID\": \"b833d109450b779b\",\r\n                \"TraceFlags\": \"01\",\r\n                \"TraceState\": \"\",\r\n                \"Remote\": false\r\n        },\r\n        \"SpanKind\": 3,\r\n        \"StartTime\": \"2025-05-06T17:59:30.90146+05:30\",\r\n        \"EndTime\": \"2025-05-06T17:59:36.665605042+05:30\",\r\n        \"Attributes\": [\r\n                {\r\n                        \"Key\": \"db.system\",\r\n                        \"Value\": {\r\n                                \"Type\": \"STRING\",\r\n                                \"Value\": \"cosmosdb\"\r\n                        }\r\n                },\r\n                {\r\n                        \"Key\": \"db.cosmosdb.connection_mode\",\r\n                        \"Value\": {\r\n                                \"Type\": \"STRING\",\r\n                                \"Value\": \"gateway\"\r\n                        }\r\n                },\r\n                {\r\n                        \"Key\": \"db.namespace\",\r\n                        \"Value\": {\r\n                                \"Type\": \"STRING\",\r\n                                \"Value\": 
\"demodb-gosdk3\"\r\n                        }\r\n                },\r\n\/\/.....<\/code><\/pre>\n<p><div class=\"alert alert-primary\">Refer to <a href=\"https:\/\/github.com\/open-telemetry\/semantic-conventions\/blob\/v1.27.0\/docs\/database\/cosmosdb.md\">Semantic Conventions for Azure Cosmos DB<\/a> <\/div><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2>Conclusion<\/h2>\n<p>The Go SDK for Azure Cosmos DB is designed to be flexible and customizable, allowing you to tailor it to your specific needs. In this blog, we covered how to configure and customize the Go SDK for Azure Cosmos DB. We looked at retry policies, HTTP-level customizations, OpenTelemetry support, and how to access metrics.<\/p>\n<p>For more information, refer to the\u00a0<a href=\"https:\/\/pkg.go.dev\/github.com\/Azure\/azure-sdk-for-go\/sdk\/data\/azcosmos\" target=\"_blank\" rel=\"noopener noreferrer\">package documentation<\/a>\u00a0and the\u00a0<a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-go\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository<\/a> for the Go SDK. I hope you find this useful!<\/p>\n<h2>About Azure Cosmos DB<\/h2>\n<article id=\"post-10622\" class=\"middle-column pe-xl-198\" data-clarity-region=\"article\">\n<div class=\"entry-content sharepostcontent \" data-bi-area=\"body_article\" data-bi-id=\"post_page_body_article\">\n<p>Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. 
To stay in the loop on Azure Cosmos DB updates, follow us on\u00a0<a href=\"https:\/\/twitter.com\/AzureCosmosDB\" target=\"_blank\" rel=\"noopener\">X<\/a>,\u00a0<a href=\"https:\/\/aka.ms\/AzureCosmosDBYouTube\" target=\"_blank\" rel=\"noopener\">YouTube<\/a>, and\u00a0<a href=\"https:\/\/www.linkedin.com\/company\/azure-cosmos-db\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a>.<\/p>\n<p>To easily build your first database, watch our\u00a0<a href=\"https:\/\/youtube.com\/playlist?list=PLmamF3YkHLoLLGUtSoxmUkORcWaTyHlXp\" target=\"_blank\" rel=\"noopener\">Get Started videos<\/a>\u00a0on YouTube and explore ways to\u00a0<a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/optimize-dev-test\" target=\"_blank\" rel=\"noopener\">dev\/test free.<\/a><\/p>\n<\/div>\n<\/article>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>When building applications that interact with databases, developers frequently encounter scenarios where default SDK configurations don&#8217;t align with their specific operational requirements. They need to customize SDK behavior to address real-world challenges like network instability, performance bottlenecks, debugging complexity, monitoring requirements, and more. 
These factors become even more pronounced when working with a massively scalable, [&hellip;]<\/p>\n","protected":false},"author":181737,"featured_media":10742,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14,1935],"tags":[499],"class_list":["post-10732","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-core-sql-api","category-go-sdk","tag-azure-cosmos-db"],"acf":[],"blog_post_summary":"<p>When building applications that interact with databases, developers frequently encounter scenarios where default SDK configurations don&#8217;t align with their specific operational requirements. They need to customize SDK behavior to address real-world challenges like network instability, performance bottlenecks, debugging complexity, monitoring requirements, and more. These factors become even more pronounced when working with a massively scalable, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/10732","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/181737"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=10732"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/10732\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/10742"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=10732"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"ht
tps:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=10732"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=10732"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}