December 16th, 2021

Microsoft Graph Data Connect: Demystifying User Scopes

Microsoft Graph Data Connect allows you to extract data in bulk from your Microsoft 365 environments using Azure Data Factory pipelines. When creating a pipeline to extract Microsoft 365 data using Microsoft Graph Data Connect, you need to define what I refer to as a “Data Contract”. That contract represents the scope, properties and filters that you will apply to the data that you’re retrieving. Users will always need to define which output columns they wish to extract (e.g., toRecipients and ccRecipients for email messages, ReplyTo and Body for Teams chat messages) and, for most data sets, they will also need to specify a date range for the objects to be extracted. Some of the data sets available in Microsoft Graph Data Connect also allow users to specify a user scope for the extraction (e.g., the messages, Teams chat and contacts data sets).

User Scope

The user scope option lets you either specify that you wish to extract objects for All users in the Office 365 tenant or to Select groups from the Office 365 tenant. Selecting the latter option will allow you to select one or more Azure Active Directory Security or Microsoft 365 groups. In talking to customers and partners, we’ve come to realize that there is some confusion around how this user scope option works, and what effect it will have on the resulting data extracted. This article covers different scenarios to set the record straight and help you understand how to properly leverage the user scope in your Microsoft Graph Data Connect solutions.

Examples

To better illustrate how the Microsoft Graph Data Connect user scope works, we will use a few examples designed in one of our test Microsoft 365 tenants. In our scenarios, we have created a new Azure Active Directory security group named Alex and Allan which contains two members from our organization: Alex Wilber and Allan Deyoung. Both members have been assigned an Office 365 license and have been exchanging emails with their peers.

We have built an Azure Data Factory pipeline which extracts information using the Microsoft Graph Data Connect BasicDataSet_v0.Message_v1 data set which represents all emails in users’ mailboxes. We filtered the Output columns to only keep the id, from, toRecipients, ccRecipients and bccRecipients fields. In the Date filter section, we only selected the past 24 hours as the timespan to retrieve emails based on their sent date. Using the User scope option, we selected the newly created security group named Alex and Allan as the only group of users to extract information from.

Figure 1 – Overview of the Copy Data activity’s source
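As a rough sketch of what the configuration described above might look like when expressed as a Copy Data source definition, the snippet below builds the source as a Python dictionary and prints it as JSON. The property names (Office365Source, allowedGroups, dateFilterColumn, startTime, endTime, outputColumns) reflect my understanding of the Azure Data Factory Office 365 connector and should be checked against its current documentation. The group ID is a hypothetical placeholder, and the data set name (BasicDataSet_v0.Message_v1) would be set separately on the data set, not on the source.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical placeholder: object ID of the "Alex and Allan" security group.
GROUP_ID = "00000000-0000-0000-0000-000000000000"

# Date filter: the past 24 hours, expressed in UTC.
end = datetime.now(timezone.utc).replace(microsecond=0)
start = end - timedelta(hours=24)


def iso(ts: datetime) -> str:
    """Format a UTC timestamp as an ISO 8601 string with a trailing Z."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


# Assumed shape of the Copy Data activity's source (verify against the docs).
source = {
    "type": "Office365Source",
    "allowedGroups": [GROUP_ID],          # user scope: selected groups only
    "dateFilterColumn": "sentDateTime",   # filter emails on their sent date
    "startTime": iso(start),
    "endTime": iso(end),
    "outputColumns": [
        {"name": name}
        for name in ("id", "from", "toRecipients", "ccRecipients", "bccRecipients")
    ],
}

print(json.dumps(source, indent=2))
```

Taken together, the user scope, the date filter and the output columns make up the “Data Contract” for this extraction.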

Now that we have our pipeline defined, let us take a look at various instances of emails revolving around Alex and Allan. The following table summarizes various test emails that were sent across members of our test organization and whether or not these are going to be captured by our pipeline.

From | To | CC | BCC | Included by MGDC Messages?
Nestor Wilke | Alex Wilber | - | - | ✔️
Nestor Wilke | Allan Deyoung | - | - | ✔️
Nestor Wilke | Alex Wilber | J.Smith@external.com | - | ✔️
Nestor Wilke | Miriam Graham | - | - | -
Nestor Wilke | Miriam Graham | Alex Wilber | - | ✔️
Nestor Wilke | Miriam Graham | Allan Deyoung | - | ✔️
Nestor Wilke | Miriam Graham | - | Allan Deyoung | ✔️
Allan Deyoung | Megan Graham | - | - | ✔️
Allan Deyoung | Megan Graham | Patti Fernandez | - | ✔️
Allan Deyoung | Megan Graham | - | Patti Fernandez | ✔️
Allan Deyoung | Alex Wilber | - | - | ✔️
Allan Deyoung | J.Smith@external.com | - | - | ✔️
Alex Wilber | Megan Graham | - | - | ✔️

Table 1 – Scenarios for the messages data set
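The pattern in Table 1 can be modeled as a simple predicate. The sketch below is not how Microsoft Graph Data Connect is implemented; it only mirrors the observed results. The Email class and the in_messages_extract function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    sender: str
    to: list[str] = field(default_factory=list)
    cc: list[str] = field(default_factory=list)
    bcc: list[str] = field(default_factory=list)


def in_messages_extract(email: Email, scope: set[str]) -> bool:
    """True if at least one in-scope user is the sender or any kind of recipient."""
    participants = {email.sender, *email.to, *email.cc, *email.bcc}
    return bool(participants & scope)


# Members of the "Alex and Allan" security group.
SCOPE = {"Alex Wilber", "Allan Deyoung"}

only_others = Email("Nestor Wilke", to=["Miriam Graham"])
alex_on_cc = Email("Nestor Wilke", to=["Miriam Graham"], cc=["Alex Wilber"])
allan_sends = Email("Allan Deyoung", to=["J.Smith@external.com"])

print(in_messages_extract(only_others, SCOPE))  # False
print(in_messages_extract(alex_on_cc, SCOPE))   # True
print(in_messages_extract(allan_sends, SCOPE))  # True
```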

From the table above, we can see that we are essentially returning every object where Alex Wilber or Allan Deyoung appears in at least one of the From, To, CC or BCC fields, even if other actors are involved. Only objects that exclusively involve actors other than Alex and Allan are filtered out. Let us now consider the same examples, but this time for the Sent Items data set.

The Sent Items data set only contains emails that were sent from the selected users’ mailboxes. Using this data set instead of the Messages data set can sometimes be a great way to filter out noise from external emails and reduce duplication. We can see from Table 2 that we are getting different results than with the Messages data set:

From | To | CC | BCC | Included by MGDC SentItems?
Nestor Wilke | Alex Wilber | - | - | -
Nestor Wilke | Allan Deyoung | - | - | -
Nestor Wilke | Alex Wilber | J.Smith@external.com | - | -
Nestor Wilke | Miriam Graham | - | - | -
Nestor Wilke | Miriam Graham | Alex Wilber | - | -
Nestor Wilke | Miriam Graham | Allan Deyoung | - | -
Nestor Wilke | Miriam Graham | - | Allan Deyoung | -
Allan Deyoung | Megan Graham | - | - | ✔️
Allan Deyoung | Megan Graham | Patti Fernandez | - | ✔️
Allan Deyoung | Megan Graham | - | Patti Fernandez | ✔️
Allan Deyoung | Alex Wilber | - | - | ✔️
Allan Deyoung | J.Smith@external.com | - | - | ✔️
Alex Wilber | Megan Graham | - | - | ✔️

Table 2 – Scenarios for the sent items data set
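The results in Table 2 suggest that, for the Sent Items data set, only the sender matters. A minimal sketch of that rule follows; the function name and the sample rows are chosen for illustration and are not part of any Microsoft Graph Data Connect API.

```python
# Members of the "Alex and Allan" security group.
SCOPE = {"Alex Wilber", "Allan Deyoung"}

# (From, To) pairs taken from a few rows of Table 2.
emails = [
    ("Nestor Wilke", "Alex Wilber"),
    ("Nestor Wilke", "Miriam Graham"),
    ("Allan Deyoung", "Megan Graham"),
    ("Allan Deyoung", "J.Smith@external.com"),
    ("Alex Wilber", "Megan Graham"),
]


def in_sent_items_extract(sender: str, scope: set[str]) -> bool:
    """True only when the email was sent by an in-scope user."""
    return sender in scope


captured = [(s, t) for s, t in emails if in_sent_items_extract(s, SCOPE)]
for sender, recipient in captured:
    print(f"{sender} -> {recipient}")
```

Note that the email from Nestor Wilke to Alex Wilber is dropped here, even though the Messages data set would include it, because Alex is only a recipient.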

The third data set is the newly released Inbox data set. This data set represents all emails that are currently sitting in a user’s Inbox folder. It is very important to note that if a user uses folders inside of their mailbox for email classification, the emails in these folders will not be captured by the Inbox data set. The following two screenshots represent Alex Wilber’s mailbox. Figure 2 shows the emails that Alex has stored directly within his Inbox folder, whereas Figure 3 depicts emails that were moved to a folder named Project Arkana.

Figure 2 – Alex Wilber’s Inbox folder

Figure 3 – Alex Wilber’s Project Arkana folder content

Let us now configure our Azure Data Factory pipeline and extract emails for today using the Inbox data set. Just like the two previous examples, I am setting a sentDateTime filter to extract only emails for today, and I am using the User scope filter to only extract items for Alex and Allan.

Figure 4 – Inbox pipeline configuration

Folder | From | Subject | Included by MGDC Inbox?
Inbox | Nestor Wilke | Email from Nestor 1 | ✔️
Inbox | Allan Deyoung | Email from Allan 2 | ✔️
Inbox | Allan Deyoung | Email from Allan 3 | ✔️
Project Arkana | Allan Deyoung | Email from Allan 1 | -
Project Arkana | Nestor Wilke | Email from Nestor 2 | -

Table 3 – Scenarios for the inbox data set
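The behavior shown in Table 3 can be sketched as a filter on the folder an email currently sits in. This is only a model of the observed results, with Alex Wilber’s mailbox represented as a hypothetical list of (folder, from, subject) tuples.

```python
# Alex Wilber's mailbox, as shown in Figures 2 and 3: (folder, from, subject).
alex_mailbox = [
    ("Inbox", "Nestor Wilke", "Email from Nestor 1"),
    ("Inbox", "Allan Deyoung", "Email from Allan 2"),
    ("Inbox", "Allan Deyoung", "Email from Allan 3"),
    ("Project Arkana", "Allan Deyoung", "Email from Allan 1"),
    ("Project Arkana", "Nestor Wilke", "Email from Nestor 2"),
]


def in_inbox_extract(folder: str) -> bool:
    """True only for emails that sit directly in the Inbox folder."""
    return folder == "Inbox"


captured = [subject for folder, _, subject in alex_mailbox if in_inbox_extract(folder)]
skipped = [subject for folder, _, subject in alex_mailbox if not in_inbox_extract(folder)]

print("Captured:", captured)
print("Skipped:", skipped)
```

The sender plays no role in this model: emails from Allan and Nestor appear on both sides, and only the folder decides the outcome.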

Get Started Today

Hopefully this article shed some light on how the User scope filter works inside of Microsoft Graph Data Connect and helped you better understand the differences between the Messages, Sent Items and Inbox data sets. If you want to learn more about Microsoft Graph Data Connect, you can refer to our official documentation and tutorials, starting with the Overview of Microsoft Graph Data Connect on Microsoft Docs.

Happy Coding!
