{"id":4976,"date":"2022-10-04T07:00:33","date_gmt":"2022-10-04T14:00:33","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=4976"},"modified":"2024-07-11T16:32:52","modified_gmt":"2024-07-11T23:32:52","slug":"azure-synapse-link-support-for-cosmos-db-gremlin-api-now-in-preview","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/azure-synapse-link-support-for-cosmos-db-gremlin-api-now-in-preview\/","title":{"rendered":"Azure Synapse Link support for Azure Cosmos DB Gremlin API now in preview"},"content":{"rendered":"<p>Azure Cosmos DB&#8217;s Gremlin API combines the power of graph database algorithms with highly scalable, managed infrastructure to provide a unique, flexible solution to most common data problems associated with lack of flexibility and relational approaches. For more information, click <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/graph\/graph-introduction\">here<\/a>.<\/p>\n<h2><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/10\/SL-for-Gremlin.png\"><img decoding=\"async\" class=\"size-full wp-image-4978 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/10\/SL-for-Gremlin.png\" alt=\"Image SL for Gremlin\" width=\"1140\" height=\"583\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/10\/SL-for-Gremlin.png 1140w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/10\/SL-for-Gremlin-300x153.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/10\/SL-for-Gremlin-1024x524.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/10\/SL-for-Gremlin-768x393.png 768w\" sizes=\"(max-width: 1140px) 100vw, 1140px\" \/><\/a><\/h2>\n<h2>Use Cases<\/h2>\n<p>The objective of this new capability is to unlock Graph Analytics workloads, so that customers can analyze 
the relationships between their graph entities. Typical use cases are social networks, recommendation engines, Customer 365, telecommunications networks, supply chain, and IoT. For more information, click <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/graph\/graph-introduction#scenarios-that-use-gremlin-api\">here<\/a>.<\/p>\n<p>As an example, customers can now use Azure Synapse Link for network analysis such as centrality, connectivity, shortest path, and community detection. This can be achieved by:<\/p>\n<ul>\n<li>Batch graph analytics that write results elsewhere, potentially bulk updating some graph properties on existing vertices\/edges or creating new edges for the discovered relationships.<\/li>\n<li>Large scan and aggregation reporting, which avoids expensive group() and sort() cross-partition Gremlin queries that are likely to have a high RU cost and slow performance. The objective is to produce tabular reporting on graph data, populating reports and dashboards.<\/li>\n<\/ul>\n<h2>How to enable Synapse Link for Gremlin API<\/h2>\n<p>Currently, customers can use the Azure CLI to enable Synapse Link for Gremlin API. PowerShell will be supported soon. 
The required steps are:<\/p>\n<p><strong>First, enable Synapse Link in your Gremlin Database account:<\/strong><\/p>\n<pre class=\"prettyprint\">az cosmosdb update --capabilities EnableGremlin --name MyCosmosDBGremlinDatabaseAccount --resource-group MyResourceGroup --enable-analytical-storage true<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Then, enable Synapse Link in your graph:<\/strong><\/p>\n<pre class=\"prettyprint\">az cosmosdb gremlin graph update -g MyResourceGroup -a MyCosmosDBGremlinDatabaseAccount -d MyGremlinDB -n MyGraph --analytical-storage-ttl -1<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Do you need to create a Gremlin database account, database, or graph?<\/strong><\/p>\n<p>Check <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/scripts\/cli\/gremlin\/create#run-the-script\">these<\/a> Gremlin CLI scripts. Please note that you can also enable Synapse Link when creating your Gremlin database account and your graph. Just use <strong><em>--enable-analytical-storage true<\/em><\/strong> with <strong><em>az cosmosdb create<\/em><\/strong> to create your Synapse Link enabled database account and <strong><em>--analytical-storage-ttl -1<\/em><\/strong> with <strong><em>az cosmosdb gremlin graph create<\/em><\/strong> to create your Synapse Link enabled graph.<\/p>\n<p>For more information about Synapse Link time to live (ttl) and analytical store data retention, click <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/analytical-store-introduction#analytical-ttl\">here<\/a>. After you enable analytical store in your graph, you can view the analytical ttl in the Azure portal Data Explorer.<\/p>\n<p>Another important detail is that the well-defined schema representation is the default option for Gremlin API. 
For more information about schema representation, click <a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/analytical-store-introduction#schema-representation\">here<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<h2>How to analyze your data with Synapse Workspaces<\/h2>\n<p>You need to use Azure Synapse Workspaces to analyze Cosmos DB data through Synapse Link. Azure Synapse Studio is the tool within the workspace that is used to create SQL queries or Spark notebooks. This is true for all Cosmos DB APIs that support Synapse Link: SQL, MongoDB, and Gremlin.<\/p>\n<p>Since Synapse Link for Gremlin API is in preview, there are some limitations in Azure Synapse Studio:<\/p>\n<h3>Linked Service<\/h3>\n<p>A linked service isn\u2019t required to use Azure Synapse Link for Cosmos DB, but it\u2019s a great option to reduce coding and better visualize your Cosmos DB data. The steps to create a linked service for your Gremlin API are:<\/p>\n<ul>\n<li>Create a Linked Service and choose the <strong>Azure<\/strong> <strong>Cosmos DB (SQL API) <\/strong>type.<\/li>\n<li>Instead of using the default <strong>\u201cFrom Azure subscription\u201d<\/strong> option, use <strong>\u201cEnter manually\u201d<\/strong>.<\/li>\n<li>Copy\/paste the <strong>.NET SDK URI<\/strong> and the account key for your Gremlin API account. You can use either key, primary or secondary.<\/li>\n<li>Enter your graph database name.<\/li>\n<\/ul>\n<p>Now you will be able to see your linked service in the data explorer tree view. Currently you won&#8217;t be able to see your graphs listed in the tree view, but you can still query your data.<\/p>\n<h3>Querying data with Azure Synapse SQL Serverless<\/h3>\n<p>You can query your graphs using Azure Synapse SQL serverless. 
Let\u2019s assume these names for a hypothetical scenario:<\/p>\n<ul>\n<li>The database account name is MyGremlinAccount<\/li>\n<li>The database name is MyGremlinDB<\/li>\n<li>The graph name is MyGraph<\/li>\n<\/ul>\n<p>You can query your data using two different syntaxes:<\/p>\n<h4>OPENROWSET with your Gremlin API account key<\/h4>\n<pre class=\"prettyprint\">SELECT TOP 10 * FROM OPENROWSET(\r\n    'CosmosDB',\r\n    'Account=MyGremlinAccount;Database=MyGremlinDB;Key=&lt;your-account-key&gt;',\r\n    MyGraph) as MyGraph\r\nGO<\/pre>\n<p>&nbsp;<\/p>\n<h4>OPENROWSET with credential<\/h4>\n<pre class=\"prettyprint\">CREATE CREDENTIAL MyGremlinCredential WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '&lt;your-account-key&gt;'\r\nGO\r\n\r\nSELECT TOP 10 *\r\nFROM OPENROWSET(\r\n    PROVIDER = 'CosmosDB',\r\n    CONNECTION = 'Account=MyGremlinAccount;Database=MyGremlinDB',\r\n    OBJECT = 'MyGraph',\r\n    SERVER_CREDENTIAL = 'MyGremlinCredential'\r\n    ) as MyGraph\r\nGO<\/pre>\n<p>Please note that the credential is created once and saves you from pasting your database account key into every single query. For more information about Synapse Link and SQL Serverless, click <a href=\"https:\/\/docs.microsoft.com\/azure\/synapse-analytics\/sql\/query-cosmos-db-analytical-store?toc=%2Fazure%2Fcosmos-db%2Ftoc.json&amp;bc=%2Fazure%2Fcosmos-db%2Fbreadcrumb%2Ftoc.json&amp;tabs=openrowset-credential#overview\">here<\/a>.<\/p>\n<h3>Querying data with Azure Synapse Spark<\/h3>\n<p>The example below uses GraphFrames, a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. 
For more information, click <a href=\"https:\/\/graphframes.github.io\/graphframes\/docs\/_site\/index.html\">here<\/a>.<\/p>\n<pre class=\"prettyprint\">import org.apache.spark.sql._\r\nimport org.apache.spark.sql.functions._\r\nimport org.graphframes._\r\nimport org.graphframes.GraphFrame\r\n\r\nval df_olap = spark.read.format(\"cosmos.olap\").option(\"spark.synapse.linkedService\", \"&lt;Your-Linked-Service-Name&gt;\").option(\"spark.cosmos.container\", \"MyGraph\").load()\r\n\r\n\/\/display first 10 entries\r\n\/\/display(df_olap)\r\n\r\nvar vertices = df_olap.filter($\"_sink\".isNull).select($\"id\", $\"name\",$\"age\".getItem(0).getItem(\"_value\").as(\"age\"))\r\n\r\nval df_edges = (df_olap.filter($\"_sink\".isNotNull).drop(\"_isEdge\"))\r\n\r\nvar edges = df_edges.select(\"_vertexId\", \"_sink\", \"label\")\r\n\r\nedges = edges.withColumnRenamed(\"_vertexId\", \"src\")\r\n\r\nedges = edges.withColumnRenamed(\"_sink\", \"dst\")\r\n\r\nedges = edges.withColumnRenamed(\"label\", \"relationship\")\r\n\r\n\/\/Optional\r\ndisplay(vertices)\r\n\/\/Optional\r\ndisplay(edges)\r\n\r\nval graph = GraphFrame(vertices, edges)\r\n\r\n\/\/ Label Propagation algorithm to detect communities of vertices based on their connections. 
\r\n\r\nimport org.apache.spark.sql.DataFrame\r\n\r\nval result = graph.labelPropagation.maxIter(5).run()\r\nresult.select(\"id\", \"name\", \"label\").orderBy(\"label\").show()<\/pre>\n<p>&nbsp;<\/p>\n<p>Please note that unlike Synapse SQL serverless, Synapse Spark can take advantage of a linked service.<\/p>\n<p>For more information about Synapse Link and Spark, click <a href=\"https:\/\/docs.microsoft.com\/azure\/synapse-analytics\/synapse-link\/how-to-query-analytical-store-spark-3?toc=%2Fazure%2Fcosmos-db%2Ftoc.json&amp;bc=%2Fazure%2Fcosmos-db%2Fbreadcrumb%2Ftoc.json\">here<\/a>.<\/p>\n<h2>Conclusion<\/h2>\n<p>Now customers can create powerful graph analytics workloads to unlock BI, insights, and advanced analytics on top of Azure Cosmos DB Gremlin API data. Stay tuned to our <a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/\">blog<\/a> for more updates about Azure Synapse Link for Cosmos DB. And please contact our <a href=\"mailto:cosmosdbsynapselink@microsoft.com\">team<\/a> with any questions that you may have.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Azure Cosmos DB&#8217;s Gremlin API combines the power of graph database algorithms with highly scalable, managed infrastructure to provide a unique, flexible solution to most common data problems associated with lack of flexibility and relational approaches. For more information, click here. 
Use Cases The objective of this new capability is to unlock Graph Analytics workloads, [&hellip;]<\/p>\n","protected":false},"author":21894,"featured_media":4978,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1216,12,17],"tags":[1743],"class_list":["post-4976","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analytics","category-announcements","category-gremlin-api","tag-gremlin"],"acf":[],"blog_post_summary":"<p>Azure Cosmos DB&#8217;s Gremlin API combines the power of graph database algorithms with highly scalable, managed infrastructure to provide a unique, flexible solution to most common data problems associated with lack of flexibility and relational approaches. For more information, click here. Use Cases The objective of this new capability is to unlock Graph Analytics workloads, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/4976","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/21894"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=4976"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/4976\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/4978"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=4976"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/
v2\/categories?post=4976"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=4976"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}