October 2nd, 2025

Leveraging Fabric Eventhouse to Store Great Expectations Validation Results

Dipika Varpe
Senior Software Engineer

In a previous post, we explored how to perform data validations with Great Expectations (GX) in Microsoft Fabric. We walked through how to set up the validation framework and ensure your data meets the quality standards expected. Now, in this follow-up post, we’ll show how to store the GX validation results in Fabric Eventhouse and how to easily query the data for insights. We’ll also demonstrate how Eventhouse can act as an audit store, keeping track of all the validation results and providing real-time monitoring capabilities.

Why Store GX Validation Results in Fabric Eventhouse?

Integrating GX validation results with Fabric Eventhouse brings powerful capabilities for organizations seeking scalable, observable, and well-governed data validation workflows. While other databases may offer basic data storage, Eventhouse is uniquely suited to the nature of validation data due to the following:

  • Native Event Model for Observability: Eventhouse is purpose-built for storing and querying event-based data — making it a natural fit for capturing GX validation runs as discrete, timestamped events. This allows for real-time monitoring, anomaly detection, and pattern analysis over time.
  • Integrated with the Microsoft Fabric Ecosystem: If you’re already using Microsoft Fabric for data engineering or analytics, storing GX validation results in Eventhouse allows seamless integration with your existing pipelines, dashboards (Power BI), and governance tools — reducing friction and increasing visibility across teams.
  • Efficient Querying at Scale: Validation data tends to grow quickly, especially in CI/CD or automated testing contexts. Eventhouse’s optimized architecture for high-throughput ingest and fast analytical queries ensures that you can access both recent and historical validation results efficiently, even at scale.
  • Centralized and Immutable Audit Trail: Unlike traditional relational databases that may require custom schema design for auditing, Eventhouse offers out-of-the-box support for event history and change tracking, making it easier to maintain a robust and transparent audit trail for compliance and debugging.

Setting Up Fabric Eventhouse

Before we can store the validation results from Great Expectations, we need to set up Fabric Eventhouse. Here’s how you can do it:

  1. Create an Eventhouse in Microsoft Fabric:
    • Navigate to the Microsoft Fabric portal.
    • In the Fabric UI, go to Eventhouse and click on Create New Eventhouse.
    • Choose an appropriate name.
    • Click Create to provision your Eventhouse.

    Create Eventhouse in Fabric

This process will automatically create a KQL database with the same name inside Eventhouse, which we are going to use to store our data validation results.

Storing GX Validation Results in Eventhouse

Now that we have Eventhouse ready, let’s look at how to store your GX validation results in Eventhouse after running your data validation.

In the previous post, we executed data validations using Great Expectations (GX) and obtained the validation results. Now, let’s see how to store this validation result in Eventhouse for future analysis and auditing.

💡 Tip: You can continue working in the same notebook you created in the previous blog post. Just add the following steps to enrich it further by storing and analyzing the validation outputs.

We will use the following validation result from the previous blog post:

{
  "results": [
    {
      "success": true,
      "expectation_config": {
        "type": "expect_column_values_to_be_unique",
        "kwargs": {
          "batch_id": "test_data_source-test_data",
          "column": "ID"
        }
      },
      "result": {
        "element_count": 5,
        "unexpected_count": 0,
        "unexpected_percent": 0.0
      }
    },
    {
      "success": false,
      "expectation_config": {
        "type": "expect_column_values_to_be_between",
        "kwargs": {
          "batch_id": "test_data_source-test_data",
          "column": "Score",
          "min_value": 80.0
        }
      },
      "result": {
        "element_count": 5,
        "unexpected_count": 1,
        "unexpected_percent": 20.0,
        "partial_unexpected_list": [
          78.9000015258789
        ]
      }
    }
  ]
  // ... additional metadata and summary fields
}
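
If you’re starting from a fresh notebook rather than continuing the one from the previous post, the sketch below shows one way such a result object can be produced with the GX 1.x fluent API. Treat it as a condensed recap under stated assumptions: the sample data, datasource and asset names, and expectations are chosen here to match the result shown above, not copied from the previous post.

import great_expectations as gx
import pandas as pd

# Sample data assumed for illustration (one Score below the 80.0 threshold)
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                   "Score": [85.0, 91.2, 78.9, 88.4, 95.1]})

context = gx.get_context()

# Register the data and build a batch (names carried over from the previous post)
data_source = context.data_sources.add_pandas("test_data_source")
data_asset = data_source.add_dataframe_asset(name="test_data")
batch_definition = data_asset.add_batch_definition_whole_dataframe("test_batch")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Define the two expectations shown in the result above
suite = gx.ExpectationSuite(name="test_suite")
suite.add_expectation(gx.expectations.ExpectColumnValuesToBeUnique(column="ID"))
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(column="Score", min_value=80.0)
)

# validation_results is the object used throughout the next sections
validation_results = batch.validate(suite)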

Preparing the Data

We will convert this result into a Spark DataFrame and enrich it with some additional data fields that will help us identify each validation run in the future. Here’s the code for preparing the data:

from pyspark.sql.types import StructType, StructField, StringType
from datetime import datetime
import json
import uuid

# Define the schema for the Spark DataFrame to ensure type consistency

schema = StructType([
    StructField("GX Validation ID", StringType(), True),   # Unique ID for each validation entry
    StructField("GX Validation Rule", StringType(), True), # Rule type and parameters applied
    StructField("Validation date", StringType(), True),    # Timestamp of validation run
    StructField("Data Record ID", StringType(), True),     # ID of the validated record/rule
    StructField("Detailed Result", StringType(), True),    # Full detailed result (JSON string)
    StructField("Result", StringType(), True)              # Overall success/failure ("True"/"False")
])

# Build the list of dictionaries based on validation_results (assuming validation_results is already defined)
gx_data = []

# Get metadata for the validation
data_source = validation_results["meta"]["active_batch_definition"]["datasource_name"]
validation_date = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Loop through the validation results and structure the data
for result in validation_results["results"]:
    gx_data.append({
        "GX Validation ID": str(uuid.uuid4()),  # Use uuid4 to generate unique IDs
        "GX Validation Rule": f'{result["expectation_config"]["type"]}, kwargs = {result["expectation_config"]["kwargs"]}',
        "Validation Date": validation_date,
        "Data Record ID": result["expectation_config"]["id"],
        "Detailed Result": json.dumps(result["result"]),
        "Result": str(result["success"]),  # Store success as a string ("True"/"False")
    })

# Create a Spark DataFrame from the list of dictionaries and schema
validation_df = spark.createDataFrame(gx_data, schema=schema)

# Display the DataFrame to inspect the data
display(validation_df)

This will generate output as follows:

Data Preparation Result

Storing the Data in Eventhouse

Next, we’ll store this data into the KQL database (Eventhouse) for querying and analysis. The following code shows how to write the DataFrame to Eventhouse using batch ingestion:

# Replace with your connection details
kustoUri = "<YOUR_CLUSTER_URI>"
database_name = "<YOUR_DATABASE_NAME>"

accessToken = mssparkutils.credentials.getToken(kustoUri)

# Write the validation results to the KQL database (Eventhouse)
validation_df.write \
  .format("com.microsoft.kusto.spark.datasource") \
  .option("kustoCluster", kustoUri) \
  .option("kustoDatabase", database_name) \
  .option("kustoTable", "MyEventTable") \
  .option("accessToken", accessToken) \
  .option("tableCreateOptions", "CreateIfNotExist") \
  .mode("Append") \
  .save()

Note: The above example uses batch ingestion, which is well suited for periodic data uploads (e.g., hourly or daily validations). For real-time or near real-time scenarios, Eventhouse (Kusto) also supports streaming ingestion. ➡️ Learn more about streaming ingestion in the official documentation.

Query the Data Using KQL (Fabric Notebook in PySpark)

Now that your data is written to Eventhouse, you can query it using KQL in your Fabric Notebook. Here’s a simple KQL query to fetch 5 records from the table:

kustoQuery = "MyEventTable | take 5"

# Replace with your connection details
kustoUri = "<YOUR_CLUSTER_URI>"
database_name = "<YOUR_DATABASE_NAME>"

accessToken = mssparkutils.credentials.getToken(kustoUri)

# Query Eventhouse using KQL in Fabric Notebook
kustoDf = spark.read \
    .format("com.microsoft.kusto.spark.synapse.datasource") \
    .option("accessToken", accessToken) \
    .option("kustoCluster", kustoUri) \
    .option("kustoDatabase", database_name) \
    .option("kustoQuery", kustoQuery) \
    .load()

# Display the results
kustoDf.show()

This query retrieves 5 records from MyEventTable in your Eventhouse. Replace the <YOUR_CLUSTER_URI> and <YOUR_DATABASE_NAME> placeholders with your actual cluster URI and database name.
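
Because every run is stored with its rule, timestamp, and detailed result, you can also ask audit-style questions rather than just sampling rows. As a sketch, the following pulls only the failed validations from the last seven days; it assumes the table and column names created above and reuses the connection variables already defined in the notebook:

# Hypothetical audit query: failed validations from the last 7 days
failed_query = """
MyEventTable
| where tolower(Result) == 'false'
| where todatetime(['Validation date']) > ago(7d)
| project ['GX Validation ID'], ['GX Validation Rule'], ['Validation date'], ['Detailed Result']
"""

failed_df = spark.read \
    .format("com.microsoft.kusto.spark.synapse.datasource") \
    .option("accessToken", accessToken) \
    .option("kustoCluster", kustoUri) \
    .option("kustoDatabase", database_name) \
    .option("kustoQuery", failed_query) \
    .load()

# Display the failed validations
failed_df.show(truncate=False)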

Visualizing the Data

View the Data Inside Eventhouse Using QuerySet

You can view the data directly in KQL by using the following queries:

  1. Simple KQL Query to View the Data: Use this query to view the top 100 rows from your MyEventTable:

    MyEventTable
    | take 100

    View Data using KQL Queryset

  2. KQL Query to View Nested JSON Data: If you have nested data (such as the Detailed Result column), you can flatten it using KQL:

    MyEventTable
    | extend result_json = parse_json(['Detailed Result'])
    | extend 
        element_count = toint(result_json.element_count),
        unexpected_count = toint(result_json.unexpected_count),
        unexpected_percent = todouble(result_json.unexpected_percent),
        partial_unexpected_list = tostring(result_json.partial_unexpected_list)
    | project 
        validation_id = ['GX Validation ID'],
        validation_rule = ['GX Validation Rule'],
        validation_date = ['Validation date'],
        result = Result,
        element_count, 
        unexpected_count, 
        unexpected_percent, 
        partial_unexpected_list

    This will flatten the Detailed Result field and expose element_count, unexpected_count, unexpected_percent, and so on as separate columns.

    KQL Query Result

  3. KQL Query to Render Charts: You can easily visualize KQL query results as charts in tools that support the render operator, such as the KQL Queryset in Microsoft Fabric.

      MyEventTable
      | extend status = iff(tolower(Result) == "true", "✅ Passed", "❌ Failed")
      | summarize total = count() by status
      | render piechart

KQL Result Pie Chart

Similarly, you can visualize the data in various formats like bar charts, line charts, scatter plots, or even as tables, depending on the data and the type of analysis you wish to perform.

➡️ Learn more about the `render` operator.

Visualizing Your Data in Power BI

Once your GX validation results are stored in Fabric Eventhouse, you can easily visualize them using Power BI — and the best part is, you can kick it all off directly from the Fabric UI itself!

Steps to Create a Power BI Report from Fabric

  1. Run a KQL query in the Queryset (inside Fabric).
  2. Once the results are displayed, simply click on “Create Power BI report”.
  3. The query results will automatically be used to create a semantic model.
  4. From there, you can design your Power BI report — add pie charts, bar graphs, filters, and any visuals you like.

Steps to Create a Power BI Report from Fabric

This integration makes it incredibly quick and intuitive to go from raw validation results to a full-blown analytics dashboard—all without leaving the Fabric ecosystem.

Conclusion

By integrating Great Expectations with Fabric Eventhouse, you not only store your validation results but also gain the ability to run complex queries, visualize the data, and build powerful dashboards. This setup enables you to monitor your data validation efforts in real-time, providing you with the insights needed to ensure data quality and make informed decisions.

If you haven’t yet explored the possibilities of Fabric Eventhouse and Great Expectations together, now is the perfect time to start. Your data management strategy will benefit from more transparency, easier querying, and the ability to make better, faster decisions.


Author

Dipika Varpe
Senior Software Engineer

I’m a passionate software developer who loves solving problems with technology. I work with large data, cloud, and AI to build practical solutions. I enjoy learning new things and sharing ideas with other professionals. Connecting with peers and exchanging knowledge truly inspires me.