{"id":3319,"date":"2021-08-23T09:50:51","date_gmt":"2021-08-23T16:50:51","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=3319"},"modified":"2021-08-23T12:31:42","modified_gmt":"2021-08-23T19:31:42","slug":"cost-optimized-metrics-through-log-analytics","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/cost-optimized-metrics-through-log-analytics\/","title":{"rendered":"Cost Optimized Metrics through Log Analytics"},"content":{"rendered":"<p>Azure Cosmos DB is designed for high throughput and low latency workloads, with the ability to serve millions of requests per second and billions of requests a day. In addition to seeking a scalable database that can serve massive volumes of traffic, users also look for granular server-side metrics for monitoring the health and performance of their mission-critical workloads built on Azure Cosmos DB.<\/p>\n<p><span style=\"font-size: 1rem;\">Azure Cosmos DB can be integrated as an opt-in feature with <\/span><a style=\"background-color: #f7f7f9; font-size: 1rem;\" href=\"https:\/\/aka.ms\/AAdjkgm\" target=\"_blank\" rel=\"noopener\">Azure Log Analytics<\/a><span style=\"font-size: 1rem;\"> \u2013 <\/span>an Azure service that provides querying, monitoring, and alerting capabilities for Azure metrics.<\/p>\n<p>However, with billions of requests to the database, coupled with the need to retain this data for application performance analysis, the data footprint of these metrics can start to add up. Log Analytics charges for ingesting and retaining data rather than for the compute required to query and retrieve metrics, and those ingestion and retention costs can grow large for a high-throughput database.<\/p>\n<p>&nbsp;<\/p>\n<h3>Savings with the new pipeline<\/h3>\n<p>Azure Cosmos DB recently launched improvements that significantly reduce the cost of the Log Analytics integration by migrating to a new metrics pipeline called Resource Specific tables. 
Previously, all Azure services pushed metrics to a common table in Log Analytics, which made the format of that table rigid. Furthermore, additional data was needed to distinguish the metrics of each service in the common table, leading to a larger data footprint for metrics.<\/p>\n<p>With the move to the new pipeline, Azure Cosmos DB metrics are now available in \u201cResource Specific\u201d tables with a schema of their own. This has significantly lowered the data footprint and, by extension, the cost of ingesting and retaining Azure Cosmos DB metrics in Log Analytics.<\/p>\n<p>With a simple toggle in the UI along with minor changes to the queries, metrics can now be retrieved through Log Analytics at a significantly reduced cost. Tabulated below are cost savings for the most frequently used diagnostic settings in Azure Cosmos DB.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Cost-Optimized-Log-Analytics.png\"><img decoding=\"async\" class=\"wp-image-3324 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Cost-Optimized-Log-Analytics-300x136.png\" alt=\"Image Cost Optimized Log Analytics\" width=\"828\" height=\"377\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Cost-Optimized-Log-Analytics-300x136.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Cost-Optimized-Log-Analytics-768x349.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Cost-Optimized-Log-Analytics.png 782w\" sizes=\"(max-width: 828px) 100vw, 828px\" \/><\/a><span style=\"font-size: 10pt;\"><strong><em>Figure 1: Expected savings for each Diagnostic Setting capturing server-side metrics for Azure Cosmos DB<\/em><\/strong><\/span><\/p>\n<p>&nbsp;<\/p>\n<h3>Additional Improvements<\/h3>\n<p>Previously, all metrics were 
published as strings requiring the casting of numeric values. Additionally, column names for different categories of diagnostics were not uniform. With resource specific tables, naming conventions are consistent and numeric fields no longer need to be cast to be queried effectively.<\/p>\n<p>&nbsp;<\/p>\n<h3>Querying Server-Side Metrics<\/h3>\n<p>Detailed below are a flowchart and queries, highlighting performance monitoring with Log Analytics.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Metrics-Debugging-Flowchart.png\"><img decoding=\"async\" class=\"wp-image-3320 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Metrics-Debugging-Flowchart-292x300.png\" alt=\"Image Metrics Debugging Flowchart\" width=\"840\" height=\"862\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Metrics-Debugging-Flowchart-292x300.png 292w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Metrics-Debugging-Flowchart-768x789.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Metrics-Debugging-Flowchart-24x24.png 24w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Metrics-Debugging-Flowchart-48x48.png 48w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/Metrics-Debugging-Flowchart.png 781w\" sizes=\"(max-width: 840px) 100vw, 840px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><strong><span style=\"font-size: 10pt;\"><em>Figure 2 &#8211; Flowchart for debugging application latencies by leveraging Azure Cosmos DB metrics in Log Analytics<\/em><\/span><\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>Log Analytics (Resource Specific) Queries for each of the steps above:<\/p>\n<ol>\n<li style=\"text-align: left;\">Spikes in server latencies?\n<pre 
class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize max(DurationMs) by bin(TimeGenerated, 10m)\r\n| render timechart<\/pre>\n<\/li>\n<li>Few or multiple physical partitions?\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize max(DurationMs) by bin(TimeGenerated, 10m), PartitionId\r\n| render timechart<\/pre>\n<\/li>\n<li>Throttled requests in the time window?\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| where StatusCode == 429\r\n| summarize count() by bin(TimeGenerated, 10m)\r\n| render timechart<\/pre>\n<\/li>\n<li>Larger request volume in time window?\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize count() by bin(TimeGenerated, 10m)\r\n| render timechart<\/pre>\n<\/li>\n<li>Fetch specific requests that spiked\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize count() by bin(TimeGenerated, 10m), OperationName\r\n| render timechart<\/pre>\n<\/li>\n<li>Higher RU\/s per operation in time window\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize max(RequestCharge) by bin(TimeGenerated, 10m), OperationName\r\n| render timechart<\/pre>\n<\/li>\n<li>Larger payload size for write operations?\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| where OperationName in (\"Create\", \"Upsert\", \"Delete\", \"Execute\")\r\n| summarize max(RequestLength) by bin(TimeGenerated, 10m), OperationName\r\n| render 
timechart<\/pre>\n<\/li>\n<li>Larger response size for reads?\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| where OperationName in (\"Read\", \"Query\")\r\n| summarize max(ResponseLength) by bin(TimeGenerated, 10m), OperationName\r\n| render timechart<\/pre>\n<\/li>\n<li>Is a common logical partition the culprit?\n<pre class=\"prettyprint\">CDBPartitionKeyRUConsumption\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize sum(RequestCharge) by PartitionKey, PartitionKeyRangeId<\/pre>\n<pre class=\"prettyprint\">\/\/ 9.1 Is a common logical partition consuming more storage than others?\r\nCDBPartitionKeyStatistics\r\n| where TimeGenerated &gt; now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize StorageConsumed = sum(SizeKb) by PartitionKey\r\n| order by StorageConsumed desc<\/pre>\n<\/li>\n<li>Server side timeouts observed?\n<pre class=\"prettyprint\">CDBDataPlaneRequests\r\n| where TimeGenerated &gt;= now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| where StatusCode == 408\r\n| summarize count() by bin(TimeGenerated, 10m)\r\n| render timechart<\/pre>\n<\/li>\n<li>(&amp;12) Jump in cross region calls? 
Observed in a few or several client VMs?\n<pre class=\"prettyprint\">\/\/ Confirm the regions against which Client IPs are sending requests to Cosmos DB\r\n\/\/ Setting the region name in the SDK's 'UserAgent' can also be used to check for cross region calls\r\nCDBDataPlaneRequests\r\n| where TimeGenerated &gt;= now(-6h)\r\n| where AccountName == \"LA-METRICS-DEMO\"\r\n| summarize count() by ClientIpAddress, RegionName<\/pre>\n<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h3>Next Steps<\/h3>\n<p>To learn more about the new resource-specific tables in Azure Log Analytics, see <a href=\"https:\/\/aka.ms\/AAdjkgh\">Monitoring Azure Cosmos DB with Resource Specific Tables in Log Analytics<\/a>.<\/p>\n<p>Also, stay tuned to this blog for an upcoming post on monitoring client-side performance through diagnostic metrics in the <a href=\"https:\/\/aka.ms\/AAdjcug\">Azure Cosmos DB Java v4 Client<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Azure Cosmos DB is designed for high throughput and low latency workloads, with the ability to serve millions of requests per second and billions of requests a day. In addition to seeking a scalable database that can serve massive volumes of traffic, users also look for granular server-side metrics for monitoring the health and performance [&hellip;]<\/p>\n","protected":false},"author":64774,"featured_media":3324,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14],"tags":[1788],"class_list":["post-3319","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-core-sql-api","tag-log-analytics"],"acf":[],"blog_post_summary":"<p>Azure Cosmos DB is designed for high throughput and low latency workloads, with the ability to serve millions of requests per second and billions of requests a day. 
In addition to seeking a scalable database that can serve massive volumes of traffic, users also look for granular server-side metrics for monitoring the health and performance [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3319","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/64774"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=3319"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3319\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/3324"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=3319"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=3319"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=3319"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}