{"id":9391,"date":"2022-01-31T08:00:52","date_gmt":"2022-01-31T16:00:52","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/?p=9391"},"modified":"2022-05-26T15:37:31","modified_gmt":"2022-05-26T22:37:31","slug":"combining-microsoft-graph-data-connect-data-sets-in-azure-synapse-analytics","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/combining-microsoft-graph-data-connect-data-sets-in-azure-synapse-analytics\/","title":{"rendered":"Combining Microsoft Graph Data Connect data sets in Azure Synapse Analytics"},"content":{"rendered":"<p>This article covers the process of combining two data sets extracted via an Azure Synapse pipeline using Microsoft Graph Data Connect (MGDC). The steps to deploy the baseline Azure Synapse Analytics workspace to follow this demo are described in my blog <a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/microsoft-graph-data-connect-deploying-azure-synapse-analytics\" target=\"_blank\" rel=\"noopener\">here<\/a>. For users who are not familiar with Azure Synapse analytics, it is a solution that provides a full Extract\/Transform\/Load (ETL) stack for your data.<\/p>\n<p>We will demo two different approaches to combining data sets via Azure Synapse Analytics pipelines. The first approach will use Synapse Spark Notebook with PySpark scripts, while the second one will use the no-code Data flow approach instead. In both cases we will demo how to effectively combine the Microsoft 365 Messages data set, which contains emails, with the Microsoft 365 Users data set. We will be joining the two data sets on the pUser property which is common to both sets and identifies the record of specific users.<\/p>\n<h2>Option 1 &#8211; Using Synapse Spark Notebook<\/h2>\n<p>To get started, we will need to create a new Synapse pipeline. To do so, navigate to your Azure Synapse workspace, and open the Synapse Studio. From the main workbench, click on the <strong>Integrate<\/strong> button in the left navigation bar.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-1.png\"><img decoding=\"async\" class=\"wp-image-9418 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-1.png\" alt=\"Azure Synapse Analytics' Integrate feature\" width=\"400\" height=\"471\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-1.png 660w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-1-255x300.png 255w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 1 &#8211; Azure Synapse Analytics&#8217; Integrate feature<\/em><\/span><\/p>\n<p>Beside the <strong>Integrate<\/strong> header, click on the <strong>+<\/strong> button and select <strong>Pipeline<\/strong> from the drop-down menu.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-2.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9417\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-2-1024x687.png\" alt=\"- Creating a new Azure Synapse Analytics pipeline\" width=\"640\" height=\"429\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-2-1024x687.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-2-300x201.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-2-768x515.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-2.png 1119w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 2 &#8211; Creating a new Azure Synapse Analytics pipeline<\/span><\/em><\/p>\n<p>We now need to add two <strong>Copy Data<\/strong> activities to our pipeline: one that will copy the Users (BasicDataSet_v0.User_v1) data set and one that will copy the Messages (BasicDataSet_v0.Message_v1) one. In our case, we have defined both activities so that they will return all fields from both data sets. For more details on how to configure Microsoft Graph Data Connect via the Copy Data activity, you can refer to the following article: <a href=\"https:\/\/docs.microsoft.com\/en-us\/graph\/data-connect-quickstart?tabs=Microsoft365&amp;tutorial-step=4\">Build your first Microsoft Graph Data Connect application<\/a>.<\/p>\n<p>For our demos, the extracted data will be copied to an Azure Data Lake Storage Gen 2 location. Also, it is important to note that the activity to extract information about messages will have a dynamic filter on the sentDateTime so that it only extracts information from the past 24 hours. Figure 3 below shows the new pipeline with the 2 Copy Data activities on the workbench.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-3.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9416\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-3-1024x776.png\" alt=\"Azure Synapse Analytics pipeline with MGDC copy data activities\" width=\"640\" height=\"485\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-3-1024x776.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-3-300x227.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-3-768x582.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-3.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 3 &#8211; Azure Synapse Analytics pipeline with MGDC copy data activities<\/span><\/em><\/p>\n<p>Every time our pipeline executes, it will retrieve all emails that were sent in the past 24 hours (based on the dynamic filter mentioned previously) and will retrieve information about all users and have it stored in our storage account as binary files with JSON rows. We\u2019ve configured our pipeline so that each activity copies its extracted data in a folder named accordingly (e.g., messages and users folders). For our scenario, we will schedule our pipeline to automatically get triggered every 24 hours, which means that every day it will generate a new binary file containing all emails from the past day. The issue, however, is that the same thing will happen with the users data extract, which will result in duplicate users being listed across files. We want to make sure that every time we run our pipeline we start with a fresh and most recent list of all users in our environment. To enforce this, we will add a new <strong>Delete<\/strong> activity (under the General category) and ensure all files are deleted from our <strong>users<\/strong> folder as a prerequisites to extracting the users data set.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-4.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9415\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-4-1024x608.png\" alt=\"Adding a new Delete activity to our pipeline\" width=\"640\" height=\"380\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-4-1024x608.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-4-300x178.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-4-768x456.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-4.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 4 &#8211; Adding a new Delete activity to our pipeline<\/span><\/em><\/p>\n<p>We will configure the delete activity to delete every file under the specified location as shown in Figure 5 below.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-5.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9414\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-5-1024x587.png\" alt=\"Delete all files in the specified location\" width=\"640\" height=\"367\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-5-1024x587.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-5-300x172.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-5-768x440.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-5.png 1188w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 5 &#8211; Delete all files in the specified location<\/em><\/span><\/p>\n<p>Under the <strong>Logging settings<\/strong> tab, we will simply uncheck the <strong>Enable logging<\/strong> checkbox.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-6.png\"><img decoding=\"async\" class=\"aligncenter wp-image-9413\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-6.png\" alt=\"Disabling logging on the delete activity\" width=\"600\" height=\"236\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-6.png 943w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-6-300x118.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-6-768x302.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 6 &#8211; Disabling logging on the delete activity<\/em><\/span><\/p>\n<p>We are now ready to tackle the bulk of pipeline\u2019s logic. From the <strong>Activities<\/strong> list, under the <strong>Synapse<\/strong> category, drag and drop a new <strong>Notebook<\/strong> activity onto the workbench, and make the two copy data activity prerequisites (see Figure 7).<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-7.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9412\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-7-1024x597.png\" alt=\"Add a new Synapse Runbook to the pipeline\" width=\"640\" height=\"373\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-7-1024x597.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-7-300x175.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-7-768x448.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-7.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 7 &#8211; Add a new Synapse Runbook to the pipeline<\/em><\/span><\/p>\n<p>From the workbench, select the newly added Notebook activity and in the property panel at the bottom, select the <strong>Settings<\/strong> tab. From there, click on the <strong>+ New<\/strong> button to create a new notebook.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-8.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9411\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-8-1024x742.png\" alt=\"Creating a new Azure Synapse Notebook\" width=\"640\" height=\"464\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-8-1024x742.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-8-300x217.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-8-768x556.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-8.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 8 &#8211; Creating a new Azure Synapse Notebook<\/span><\/em><\/p>\n<p>This will launch the Synapse Notebook editor. In the top menu bar, click on <strong>Attach to<\/strong> and select your Apache Spark pool from the list.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-9.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9410\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-9-1024x337.png\" alt=\"Selecting Apache Spark Pool to run an Azure Synapse Notebook\" width=\"640\" height=\"211\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-9-1024x337.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-9-300x99.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-9-768x252.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-9.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 9 &#8211; Selecting Apache Spark Pool to run an Azure Synapse Notebook<\/em><\/span><\/p>\n<p>In the script editor, copy and paste the following PySpark script, making sure to update the value of the <strong>filePathMessages<\/strong> and <strong>filePathUsers<\/strong> variables with the path to the location where your data is being extracted.<\/p>\n<pre class=\"prettyprint\">from pyspark.sql import functions as f\r\nfilePathMessages = \"wasbs:\/\/&lt;messages folder location&gt;\"\r\nfilePathUsers = \"wasbs:\/\/&lt;users folder location&gt;\"\r\nstorageAccountName = \"&lt;storage account name&gt;\"\r\nstorageAccountKey = \"&lt;storage account key&gt;\"\r\nfileStorageUrl = \"fs.azure.account.key.\" + storageAccountName + \r\n\".blob.core.windows.net\" \r\n\r\nspark.conf.set(fileStorageUrl, storageAccountKey)\r\n\r\nmessageDataframe = spark.read.json(filePathMessages).withColumn(\"sender_name\", \r\nf.col(\"sender.emailAddress.name\")).withColumn(\"sender_address\", \r\nf.col(\"sender.emailAddress.address\")).withColumn(\"from_name\", \r\nf.col(\"from.emailAddress.name\")).withColumn(\"from_address\", \r\nf.col(\"from.emailAddress.address\")).withColumn(\"to_name\", \r\nf.col(\"toRecipients.emailAddress.name\")).withColumn(\"to_address\", \r\nf.col(\"toRecipients.emailAddress.address\"))\r\ntry:\r\n    <em>  # There is a chance that all entries in the ccRecipients column be empty \r\nwhich will detect the schema as being a string array.<\/em>\r\n <em>     # If it's the case, simply add the derived name and address columns as null<\/em>\r\n      data_type = dict(messageDataframe.dtypes)['ccRecipients']\r\n      if data_type == \"array&lt;string&gt;\":\r\n            messageDataframe = messageDataframe.withColumn(\"cc_name\", f.lit(\"\"))\r\n            messageDataframe = messageDataframe.withColumn(\"cc_address\", \r\nf.lit(\"\"))\r\n      else:\r\n            messageDataframe = messageDataframe.withColumn(\"cc_name\", \r\nf.col(\"ccRecipients.emailAddress.name\"))\r\n            messageDataframe = messageDataframe.withColumn(\"cc_address\", \r\nf.col(\"ccRecipients.emailAddress.address\"))\r\n\r\n<em>      # There is a chance that all entries in the bccRecipients column be empty \r\nwhich will detect the schema as being a string array.\r\n      # If it's the case, simply add the derived name and address columns as null<\/em>\r\n      data_type = dict(messageDataframe.dtypes)['bccRecipients']\r\n      if data_type == \"array&lt;string&gt;\":\r\n            messageDataframe = messageDataframe.withColumn(\"bcc_name\", f.lit(\"\"))\r\n            messageDataframe = messageDataframe.withColumn(\"bcc_address\", f.lit(\"\"))\r\n      else:\r\n            messageDataframe = messageDataframe.withColumn(\"bcc_name\", \r\nf.col(\"bccRecipients.emailAddress.name\"))\r\n            messageDataframe = messageDataframe.withColumn(\"bcc_address\", \r\nf.col(\"bccRecipients.emailAddress.address\"))\r\nfinally:\r\n      messageDataframe = messageDataframe.drop(\"from\", \"sender\", \"toRecipients\", \r\n\"ccRecipients\", \"bccRecipients\")\r\n\r\nusersDataframe = spark.read.json(filePathUsers).withColumn(\"skuIds\", \r\nf.col(\"assignedLicenses.skuId\")).withColumn(\"assignedPlanDates\", \r\nf.col(\"assignedPlans.assignedDateTime\")).withColumn(\"assignedPlanServices\", \r\nf.col(\"assignedPlans.service\")).withColumn(\"puser2\", \r\nf.col(\"puser\")).drop(\"createddatetime\",\"datarow\",\"id\",\"padditionalInfo\",\"ptenant\", \r\n\"rowInformation\", \"userrow\", \"puser\", \"pagerow\")\r\nresults = messageDataframe.join(usersDataframe, messageDataframe.puser == \r\nusersDataframe.puser2,\"inner\").drop(\"puser2\")\r\n\r\nresults.write.mode(\"overwrite\").saveAsTable(\"Results\")<\/pre>\n<p>&nbsp;<\/p>\n<p>The script will flatten the JSON\u2019s hierarchy and rename some fields that are duplicated between the two data sets. It will then merge the two data sets and store them in a new table named <strong>Results<\/strong> in our default Synapse Lake database. We are now ready to publish our pipeline and then trigger its execution.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-10.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9409\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-10-1024x360.png\" alt=\"Manually trigger an Azure Synapse pipeline\" width=\"640\" height=\"225\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-10-1024x360.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-10-300x105.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-10-768x270.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-10.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 10 &#8211; Manually trigger an Azure Synapse pipeline<\/em><\/span><\/p>\n<p>Executing the pipeline can take several minutes to an hour to complete its execution once approvers have approved the request, depending on the size of the data you are trying to export. Make sure you do not have any pending approvals for your pipeline so that the pipeline can successfully execute. Once its execution succeeds you can go and browse the default Synapse Lake database to visualize the combined data. To do so, click on the <strong>Data<\/strong> button in the left navigation (database icon), expand the <strong>Lake database<\/strong> menu, then the <strong>default<\/strong> database. Expand the <strong>table<\/strong> folder and click on the ellipses beside the newly created <strong>Results<\/strong> database. From the flyout menu, select <strong>New SQL Script<\/strong> and then <strong>Select TOP 100 rows<\/strong>.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-11.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9408\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-11-1024x635.png\" alt=\"Select Top 100 Rows in Lake Database\" width=\"640\" height=\"397\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-11-1024x635.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-11-300x186.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-11-768x476.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-11.png 1269w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 11 &#8211; Select Top 100 Rows in Lake Database<\/em><\/span><\/p>\n<p>Next, execute the query by clicking on the <strong>Run<\/strong> button at the top of the query editor.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-12.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9407\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-12-1024x323.png\" alt=\"Executing a SQL query in Azure Synapse Analytics\" width=\"640\" height=\"202\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-12-1024x323.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-12-300x95.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-12-768x242.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-12.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 12 &#8211; Executing a SQL query in Azure Synapse Analytics<\/span><\/em><\/p>\n<p>This will take a few seconds to execute. Once it succeeds, you should be able to review the merged data in your lake database.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-13.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9406\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-13-1024x454.png\" alt=\"Image MGDC Synapse blog image 13\" width=\"640\" height=\"284\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-13-1024x454.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-13-300x133.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-13-768x341.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-13.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 13 &#8211; Reviewing Merge Data in Query Editor<\/span><\/em><\/p>\n<h2>Option 2 \u2013 Using the No Code Data Flow Activity<\/h2>\n<p>If you are less familiar with Spark coding or if you simply want a no code solution for your pipeline, you can use the <strong>Data Flow<\/strong> activity inside of your pipeline to perform transformation on your extracted data. For this demo, we will clone the pipeline we created with Option 1 above, with the one difference that we will remove the <strong>Notebook<\/strong> activity and replace it with a <strong>Data flow <\/strong>activity instead. The figure below shows the overview of what the pipeline will look like.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-14.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9405\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-14-1024x598.png\" alt=\"Replacing the Notebook activity by a Data flow activity\" width=\"640\" height=\"374\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-14-1024x598.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-14-300x175.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-14-768x449.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-14.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 14 &#8211; Replacing the Notebook activity by a Data flow activity<\/span><\/em><\/p>\n<p>To edit the flow\u2019s logic, select the new Data flow activity from the workbench and in the property panel at the bottom navigate to the <strong>Settings<\/strong> tab. From there, click on the <strong>+ New<\/strong> button to create a new flow.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9404\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-1014x1024.png\" alt=\"Creating a new Data flow\" width=\"640\" height=\"646\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-1014x1024.png 1014w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-297x300.png 297w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-150x150.png 150w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-768x776.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-24x24.png 24w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-48x48.png 48w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15-96x96.png 96w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-15.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 15 &#8211; Creating a new data flow<\/span><\/em><\/p>\n<p>This will launch the data flow editor. The first thing we want to do is load the data we\u2019ve exported as binary file and flatten their hierarchies. Let us start with the Messages data set. From the workbench, click on the arrow beside the <strong>Add Source<\/strong> shape and select <strong>Add Source<\/strong> from the flyout menu.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-16.png\"><img decoding=\"async\" class=\"aligncenter wp-image-9403\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-16.png\" alt=\"Adding a new data source to our flow\" width=\"600\" height=\"224\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-16.png 805w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-16-300x112.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-16-768x287.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 16 &#8211; Adding a new data source to our flow<\/em><\/span><\/p>\n<p>From the property panel at the bottom, click on the <strong>+ New<\/strong> button beside the <strong>Dataset<\/strong> field.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-17.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9402\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-17-1024x590.png\" alt=\"Creating a new data set link\" width=\"640\" height=\"369\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-17-1024x590.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-17-300x173.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-17-768x442.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-17.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 17 &#8211; Creating a new data set link<\/span><\/em><\/p>\n<p>From the Integration data set blade, select <strong>Azure Data Lake Storage Gen2<\/strong> and click on the <strong>Continue<\/strong> button at the bottom.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-18.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9401\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-18-978x1024.png\" alt=\"New Azure Data Lake Storage Gen2 integration data set\" width=\"640\" height=\"670\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-18-978x1024.png 978w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-18-286x300.png 286w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-18-768x804.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-18-24x24.png 24w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-18.png 1223w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\">Figure 18 &#8211; New Azure Data Lake Storage Gen2 integration data set<\/span><\/p>\n<p>On the data format selection blade, select <strong>JSON<\/strong> and click on the <strong>Continue <\/strong>button at the bottom.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-19.png\"><img decoding=\"async\" class=\"aligncenter wp-image-9400 size-large\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-19-904x1024.png\" alt=\"Selecting JSON format for our integration data set\" width=\"640\" height=\"725\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-19-904x1024.png 904w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-19-265x300.png 265w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-19-768x870.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-19.png 1191w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 19 &#8211; Selecting JSON format for our integration data set<\/em><\/span><\/p>\n<p>On the <strong>Set properties<\/strong> blade, select the linked service you\u2019ve previously created to your Azure Data Lake Storage location, specify the path to the folder where your messages data is being exported and click on the <strong>OK<\/strong> button at the bottom.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-20.png\"><img decoding=\"async\" class=\"aligncenter wp-image-9399\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-20-535x1024.png\" alt=\"Selecting files location for our integration data set\" width=\"450\" height=\"861\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-20-535x1024.png 535w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-20-157x300.png 157w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-20.png 651w\" sizes=\"(max-width: 450px) 100vw, 450px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 20 &#8211; Selecting files location for our integration data set<\/em><\/span><\/p>\n<p>Repeat the same process, but this time for the Users data set. Make sure you set the path to point to the Users folder. At this stage, you should have a Data Flow that looks like the following:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-21.png\"><img decoding=\"async\" class=\"aligncenter wp-image-9398 size-large\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-21-1024x370.png\" alt=\"Data flow with just two sources\" width=\"640\" height=\"231\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-21-1024x370.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-21-300x108.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-21-768x277.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-21.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 21 &#8211; Data flow with just two sources<\/span><\/em><\/p>\n<p>Next, we need to combine the two data sources. To do so, click on the <strong>+<\/strong> sign on the right of the Messages source in the workbench. From the list of actions, select <strong>Join<\/strong>.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-22.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9397\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-22-1024x671.png\" alt=\"Adding a Join Action to a data flow\" width=\"640\" height=\"419\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-22-1024x671.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-22-300x197.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-22-768x504.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-22.png 1243w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 22 &#8211; Adding a Join Action to a data flow<\/em><\/span><\/p>\n<p>In the editor panel at the bottom, select the <strong>Users source <\/strong>in the <strong>Right Stream<\/strong> drop down. For the <strong>Join condition<\/strong> make sure to select the <strong>pUser<\/strong> field for both sources.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-23.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9396\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-23-1024x380.png\" alt=\"Configuring a Join Action on the pUser Field\" width=\"640\" height=\"238\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-23-1024x380.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-23-300x111.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-23-768x285.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-23.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 23 &#8211; Configuring a Join Action on the pUser Field<\/em><\/span><\/p>\n<p>Because both data sets will contain fields that are named the same, running the flow as-is would throw errors complaining about duplicate columns (e.g. pageRow). To remediate to the issue, we will need to add a select action after the Join action to skip duplicate columns since we do not need them both for our scenario. In the workbench, beside the Join action added above, click on the <strong>+ <\/strong>sign and select the <strong>Select<\/strong> action from the list.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-24.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9395\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-24-1024x790.png\" alt=\"Adding a Select Action to a data flow\" width=\"640\" height=\"494\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-24-1024x790.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-24-300x231.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-24-768x593.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-24.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 24 &#8211; Adding a Select Action to a data flow<\/em><\/span><\/p>\n<p>Simply leave the default values, which by default will skip the duplicate columns.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-25.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9394\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-25-1024x451.png\" alt=\"Configuring the Select Action\" width=\"640\" height=\"282\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-25-1024x451.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-25-300x132.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-25-768x338.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-25.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em><span style=\"font-size: 10pt;\">Figure 25 &#8211; Configuring the Select Action<\/span><\/em><\/p>\n<p>The last step in our data flow is to determine where to store the combined data. In our case, we will be selecting a new workspace as the data sink. In the workbench, beside the newly added Select action, click on the <strong>+<\/strong> sign and select <strong>Sink <\/strong>from the list.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-26.png\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-9393\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-26-1024x594.png\" alt=\"Adding a Sink Action to a data flow\" width=\"640\" height=\"371\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-26-1024x594.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-26-300x174.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-26-768x445.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-26.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 26 &#8211; Adding a Sink Action to a data flow<\/em><\/span><\/p>\n<p>In the editor panel at the bottom, select <strong>Workspace DB<\/strong> from the Sink type list, select the <strong>default<\/strong> database from the database list, and type in <strong>Results<\/strong> as the name of the table. Note that this will create a new table named Results in the default database and store the combined data sets in it.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-27.png\"><img decoding=\"async\" class=\"aligncenter wp-image-9392 size-large\" src=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-27-1024x498.png\" alt=\"Configuring the Sink Action for our data flow\" width=\"640\" height=\"311\" srcset=\"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-27-1024x498.png 1024w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-27-300x146.png 300w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-27-768x374.png 768w, https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-content\/uploads\/sites\/73\/2022\/01\/MGDC-Synapse-blog_image-27.png 1287w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><em>Figure 27 &#8211; Configuring the Sink Action for our data flow<\/em><\/span><\/p>\n<p>This completes the steps required to combine our data sets using a Data flow activity. You can now publish your pipeline and execute it. Once it successfully completes, you can follow the steps from the previous section (Option 1) to check the merged data using the <strong>SELECT TOP 100 ROWS<\/strong> query.<\/p>\n<h2>Conclusion<\/h2>\n<p>In this article, we have shown you two ways of combining data sets extracted from Microsoft Graph Data Connect using Azure Synapse Analytics. While developers may tend to prefer the Spark Notebook approach (Option 1) over the low-code approach described in Option 2, note that both approaches will result in the same outcome. You should choose the approach you and your organization are more familiar with. The data we have exported and combined could then be connected to a Power BI dashboard using its Azure Synapse Analytics connector to generate valuable Insights and Analysis for your organization.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article covers the process of combining two data sets extracted via an Azure Synapse pipeline using Microsoft Graph Data Connect (MGDC).<\/p>\n","protected":false},"author":72603,"featured_media":9451,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3],"tags":[61,161,31],"class_list":["post-9391","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-graph","tag-azure","tag-azure-synapse","tag-microsoft-graph-data-connect"],"acf":[],"blog_post_summary":"<p>This article covers the process of combining two data sets extracted via an Azure Synapse pipeline using Microsoft Graph Data Connect (MGDC).<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/posts\/9391","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/users\/72603"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/comments?post=9391"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/posts\/9391\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/media\/9451"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/media?parent=9391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/categories?post=9391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/microsoft365dev\/wp-json\/wp\/v2\/tags?post=9391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}