Introduction
In today’s data-driven world, businesses need a structured approach to managing foundational data assets. The Unified Data Model (UDM) provides a scalable and governed framework for modeling key entities while keeping data assets maintainable and extensible.
In this post, we will use a hypothetical business entity as an example to demonstrate how UDM effectively structures data.
We’ll model a Base Profile, an Extension, and a Dimension to show how the same data assets can be reused across multiple scenarios.
We will also explore how the UDM approach simplifies data storage, making data easier to query and reducing the time required to build future scenarios. Additionally, we will discuss how validating at every step shortens the time needed to identify problems and reduces potential restatement costs.
Hypothetical Business Scenario: Modeling the “Game Developer Profile”
Imagine we are a gaming company that wants to use data to better understand our game developers and the challenges they encounter.
Our strategy involves creating a Game Developer Profile and segmenting the data based on various aspects, such as:
- Region
- Age group
- Game pricing
- Customer game count
- Other relevant developer attributes
Let’s break down how this data can be structured using Base Profiles, Extensions, and Dimensions to improve clarity and implementation.
Step 1: Creating the Base Profile
Let’s establish a foundational profile for this use case. A Profile represents a standard business concept, such as a user or a purchase order. Most organizational data assets can be linked to or directly define these profile entities.
Structuring data in this way:
- Simplifies data discovery and usage
- Avoids redundancy and repetitive definitions
- Provides a scalable foundation for extensions
In our system, game developers are a fundamental business entity, so they are modeled as a Profile in UDM.
Game Developer Profile Schema
Column Name | Data Type | Nullable | Privacy Category | Description |
---|---|---|---|---|
DeveloperId | GUID | No | Internal | Unique identifier for each developer |
- Primary Key: DeveloperId
- Team Responsible: Game Analytics Team
- Business Context: This dataset will monitor all game developers across all platforms.
- Use Case: This profile will lay a foundation for various extensions, such as:
  - Developer financial performance analysis
  - Engagement analytics
  - User behavior tracking
Note that we set the data type of DeveloperId to GUID with the aim of improving performance when joining with other data assets.
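To make the shape of the base profile concrete, here is a minimal Python sketch. It is not part of UDM itself: the class `GameDeveloperProfileRow` and the helper `build_profile` are hypothetical names, and the only thing taken from the schema above is the single non-nullable DeveloperId key.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GameDeveloperProfileRow:
    """One row of the base profile: only the key, no descriptive attributes."""
    developer_id: uuid.UUID  # DeveloperId, non-nullable primary key


def build_profile(raw_ids):
    """Parse raw ID strings into profile rows, rejecting duplicate keys."""
    seen = set()
    rows = []
    for raw in raw_ids:
        dev_id = uuid.UUID(raw)  # raises ValueError if raw is not a valid GUID
        if dev_id in seen:
            raise ValueError(f"duplicate DeveloperId: {dev_id}")
        seen.add(dev_id)
        rows.append(GameDeveloperProfileRow(dev_id))
    return rows
```

Keeping the profile this small is what lets every descriptive attribute live in an extension that joins back on DeveloperId.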
Step 2: Introducing the Developer Core Properties Extension
Let’s extend the newly created profile with additional developer core properties.
An Extension is a data asset that enhances a Profile by adding new properties without modifying the base profile definition. Extensions help capture frequently changing or event-driven data associated with the base profile.
In this context, we will introduce an extension for game developers that includes attributes that change slowly over time. This approach keeps the core profile lean and efficient, while allowing extensions to operate independently. The extension helps answer questions like:
- “Who is the developer?”
- “What are their key attributes?”
Developer Core Properties Extension Schema
Column Name | Data Type | Nullable | Privacy Category | Description |
---|---|---|---|---|
DeveloperId | GUID | No | Internal | Unique identifier for each developer |
DeveloperName | String | No | Public | Name of the game developer |
FoundedYear | DateTime | Yes | Public | Year the company was founded |
CountryId | Long | Yes | Public | Foreign key linking to the Country Dimension |
TotalGamesPublished | Int | Yes | Public | Total number of games published by the developer |
PrimaryGenre | String | Yes | Public | The main game genre the developer specializes in |
- Associated Base Profile: Game Developer Profile
- Join Cardinality: 1:1
- Primary Key: DeveloperId
- Responsible Team: Game Analytics Team
- Business Scenario: Tracks developer key attributes over time.
- Use Case: Provides insights into developer attributes and publishing activity
Note
- This extension’s join cardinality with the Game Developer Profile is 1:1, meaning each developer has exactly one corresponding row.
- The extension includes CountryId, which links to the Country Dimension to ensure geographic standardization.
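One way to validate the 1:1 cardinality is to compare the keys on both sides before the extension is published. The sketch below is an assumption about how such a check could look, not a UDM API; `check_one_to_one` is a hypothetical helper that works on plain lists of DeveloperId values.

```python
from collections import Counter


def check_one_to_one(profile_ids, extension_ids):
    """Return the problems that break a 1:1 join between profile and extension.

    profile_ids: DeveloperId values in the base profile.
    extension_ids: DeveloperId values in the extension, one per row.
    """
    profile = set(profile_ids)
    counts = Counter(extension_ids)
    return {
        # Extension keys that appear on more than one row.
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        # Extension keys with no matching profile row.
        "orphans": sorted(k for k in counts if k not in profile),
        # Profile keys with no extension row at all.
        "missing": sorted(profile - set(counts)),
    }
```

An empty list under every key means each developer has exactly one extension row; any non-empty list points at the rows to investigate.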
Step 3: Introducing the Country Dimension
Instead of storing Country as a free-text attribute alongside the developer's other properties, we normalize this data using a Dimension.
Why Use a Dimension?
- Ensures consistency across datasets.
- Prevents data duplication and redundancy.
- Optimizes performance by using foreign keys instead of raw text values.
- Allows easy updates without affecting other datasets.
For this use case, we link the developer’s country to a standardized Country Dimension, ensuring uniformity.
Country Dimension Schema
Column Name | Data Type | Nullable | Privacy Category | Description |
---|---|---|---|---|
CountryId | Long | No | Internal | Unique identifier for the country (e.g., ISO 3166-1 numeric code) |
CountryName | String | No | Public | Full name of the country |
Region | String | Yes | Public | Geographic region (e.g., North America, Europe) |
Subregion | String | Yes | Public | More granular geographic grouping (e.g., Western Europe, Southeast Asia) |
- Primary Key: CountryId
- Team Responsible: Microsoft Sales Data Team
- Business Scenario: Provides a single source of truth for geographic data.
- Use Case: Used in reporting and analytics for geographic segmentation.
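The sketch below illustrates how a foreign key into the dimension can be validated and resolved. The two sample rows are illustrative only (their IDs are ISO 3166-1 numeric codes), and the helpers `unknown_country_ids` and `region_of` are hypothetical names, not part of any UDM tooling.

```python
# Illustrative rows only; the IDs are ISO 3166-1 numeric codes.
COUNTRY_DIMENSION = {
    840: {"CountryName": "United States", "Region": "North America", "Subregion": None},
    276: {"CountryName": "Germany", "Region": "Europe", "Subregion": "Western Europe"},
}


def unknown_country_ids(extension_rows, dimension=COUNTRY_DIMENSION):
    """Return CountryId values used by extension rows that the dimension lacks.

    CountryId is nullable in the extension, so None is skipped rather than flagged.
    """
    used = {row["CountryId"] for row in extension_rows if row["CountryId"] is not None}
    return sorted(used - set(dimension))


def region_of(country_id, dimension=COUNTRY_DIMENSION):
    """Look up the Region for a CountryId, or None if the ID is null or unknown."""
    entry = dimension.get(country_id)
    return entry["Region"] if entry else None
```

Because every dataset stores only the ID, renaming a country or regrouping a region is a change to one dimension row rather than to every dataset that mentions it.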
Step 4: Creating an Extension for Revenue Insights
Instead of adding revenue-related attributes directly to the Game Developer Profile, we create an Extension to store financial data separately.
Game Developer Revenue Extension Schema
Column Name | Data Type | Nullable | Privacy Category | Description |
---|---|---|---|---|
DeveloperId | GUID | No | Internal | Foreign key linking to Game Developer Profile |
RevenueMonth | String | No | Internal | Reporting month (YYYY-MM) |
TotalRevenue | Float | Yes | Internal | Total revenue generated by the developer |
NumberOfTransactions | Int | Yes | Internal | Number of game purchases contributing to revenue |
Platform | String | Yes | Internal | The platform where revenue was generated (PC, Console, Mobile) |
- Associated Base Profile: Game Developer Profile
- Join Cardinality: 1:Many
- Responsible Team: Game Analytics Team
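Since this extension holds many rows per developer, two checks are natural: that RevenueMonth really follows the YYYY-MM format, and that rows roll up cleanly per DeveloperId. The following is a minimal sketch under those assumptions; both function names are hypothetical, and rows are represented as plain dictionaries keyed by the column names above.

```python
import re
from collections import defaultdict

# YYYY-MM with a month between 01 and 12.
_MONTH = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])")


def is_valid_revenue_month(value):
    """True if value is exactly YYYY-MM with a valid month number."""
    return _MONTH.fullmatch(value) is not None


def revenue_by_developer(rows):
    """Sum TotalRevenue per DeveloperId across months and platforms.

    Rows with a null TotalRevenue are skipped, so a developer with only
    null revenue is omitted. A malformed RevenueMonth raises ValueError
    so that bad data is caught early.
    """
    totals = defaultdict(float)
    for row in rows:
        if not is_valid_revenue_month(row["RevenueMonth"]):
            raise ValueError(f"bad RevenueMonth: {row['RevenueMonth']!r}")
        if row["TotalRevenue"] is not None:
            totals[row["DeveloperId"]] += row["TotalRevenue"]
    return dict(totals)
```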
Step 5: Querying the Structured Data
Using U-SQL, we can efficiently analyze top-earning game developers by country:
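// DeveloperName and CountryId live in the Developer Core Properties Extension,
// so the profile is joined to that extension before the Country Dimension.
@DeveloperRevenue =
    SELECT d.DeveloperId,
           p.DeveloperName,
           c.CountryName,
           r.RevenueMonth,
           r.TotalRevenue
    FROM GameDeveloperProfile AS d
    INNER JOIN DeveloperCorePropertiesExtension AS p ON d.DeveloperId == p.DeveloperId
    INNER JOIN GameDeveloperRevenueExtension AS r ON d.DeveloperId == r.DeveloperId
    INNER JOIN CountryDimension AS c ON p.CountryId == c.CountryId
    WHERE r.RevenueMonth == "2025-01";

OUTPUT @DeveloperRevenue
TO "/reports/top_earning_developers_by_country.csv"
USING Outputters.Csv();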
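For readers without a U-SQL environment, the same join can be tried locally. The sketch below uses Python's built-in `sqlite3` module with a few made-up rows. Table names mirror the assets described in this post, with `DeveloperCorePropertiesExtension` assumed as the table name for the core properties extension, since per the schemas above that is where DeveloperName and CountryId live. Unlike the U-SQL query, it also sums revenue across platforms and sorts the result, which is an addition for illustration.

```python
import sqlite3


def top_earners_by_country(conn, month):
    """Return (CountryName, DeveloperName, Revenue) rows for one month,
    summed across platforms and ordered by revenue, highest first."""
    return conn.execute(
        """
        SELECT c.CountryName, p.DeveloperName, SUM(r.TotalRevenue) AS Revenue
        FROM GameDeveloperProfile AS d
        JOIN DeveloperCorePropertiesExtension AS p ON p.DeveloperId = d.DeveloperId
        JOIN GameDeveloperRevenueExtension AS r ON r.DeveloperId = d.DeveloperId
        JOIN CountryDimension AS c ON c.CountryId = p.CountryId
        WHERE r.RevenueMonth = ?
        GROUP BY c.CountryName, p.DeveloperName
        ORDER BY Revenue DESC
        """,
        (month,),
    ).fetchall()


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE GameDeveloperProfile (DeveloperId TEXT PRIMARY KEY);
        CREATE TABLE DeveloperCorePropertiesExtension (
            DeveloperId TEXT PRIMARY KEY, DeveloperName TEXT NOT NULL, CountryId INTEGER);
        CREATE TABLE CountryDimension (
            CountryId INTEGER PRIMARY KEY, CountryName TEXT NOT NULL);
        CREATE TABLE GameDeveloperRevenueExtension (
            DeveloperId TEXT NOT NULL, RevenueMonth TEXT NOT NULL,
            TotalRevenue REAL, Platform TEXT);
        INSERT INTO GameDeveloperProfile VALUES ('dev-1'), ('dev-2');
        INSERT INTO DeveloperCorePropertiesExtension VALUES
            ('dev-1', 'Studio A', 840), ('dev-2', 'Studio B', 276);
        INSERT INTO CountryDimension VALUES (840, 'United States'), (276, 'Germany');
        INSERT INTO GameDeveloperRevenueExtension VALUES
            ('dev-1', '2025-01', 100.0, 'PC'),
            ('dev-1', '2025-01', 50.0, 'Mobile'),
            ('dev-2', '2025-01', 200.0, 'Console'),
            ('dev-2', '2024-12', 999.0, 'PC');
        """
    )
    return conn
```

Calling `top_earners_by_country(build_demo_db(), "2025-01")` returns one row per developer for that month, with the two platform rows for Studio A combined into a single total.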
Why use UDM for this?
By structuring our data using UDM principles:
- Scalability – The Game Developer Profile remains lean, avoiding unnecessary updates due to frequently changing attributes.
- Performance – Queries are more efficient since extensions allow us to store and access dynamic data separately.
- Governance – Using a Country Dimension ensures that geographic data is standardized and centrally managed.
- Consistency – Referencing the Country Dimension by CountryId avoids data duplication and prevents inconsistencies in country names across different datasets.
- Easy Maintenance – Because each extension has its own validations, issues are easy to isolate and fix.
Would you structure your business data differently? Share your thoughts in the comments!