{"id":16408,"date":"2025-10-02T00:00:00","date_gmt":"2025-10-02T07:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/ise\/?p=16408"},"modified":"2025-10-02T06:04:51","modified_gmt":"2025-10-02T13:04:51","slug":"leveraging-fabric-eventhouse-to-store-great-expectations-validation-results","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/leveraging-fabric-eventhouse-to-store-great-expectations-validation-results\/","title":{"rendered":"Leveraging Fabric Eventhouse to Store Great Expectations Validation Results"},"content":{"rendered":"<p>In a <a href=\"https:\/\/devblogs.microsoft.com\/ise\/data-validations-with-great-expectations-in-ms-fabric\/\">previous post<\/a>, we explored how to perform data validations with Great Expectations (GX) in Microsoft Fabric. We walked through how to set up the validation framework and ensure your data meets the quality standards expected. Now, in this follow-up post, we\u2019ll show how to store the GX validation results in <strong>Fabric Eventhouse<\/strong> and how to easily query the data for insights. We\u2019ll also demonstrate how Eventhouse can act as an <strong>audit store<\/strong>, keeping track of all the validation results and providing real-time monitoring capabilities.<\/p>\n<h2>Why Store GX Validation Results in Fabric Eventhouse?<\/h2>\n<p>Integrating GX validation results with Fabric Eventhouse brings powerful capabilities for organizations seeking scalable, observable, and well-governed data validation workflows. While other databases may offer basic data storage, Eventhouse is uniquely suited to the nature of validation data due to the following:<\/p>\n<ul>\n<li><strong>Native Event Model for Observability<\/strong>: Eventhouse is purpose-built for storing and querying event-based data \u2014 making it a natural fit for capturing GX validation runs as discrete, timestamped events. This allows for real-time monitoring, anomaly detection, and pattern analysis over time.<\/li>\n<li><strong>Integrated with the Microsoft Fabric Ecosystem<\/strong>: If you&#8217;re already using Microsoft Fabric for data engineering or analytics, storing GX validation results in Eventhouse allows seamless integration with your existing pipelines, dashboards (Power BI), and governance tools \u2014 reducing friction and increasing visibility across teams.<\/li>\n<li><strong>Efficient Querying at Scale<\/strong>: Validation data tends to grow quickly, especially in CI\/CD or automated testing contexts. Eventhouse\u2019s optimized architecture for high-throughput ingest and fast analytical queries ensures that you can access both recent and historical validation results efficiently, even at scale.<\/li>\n<li><strong>Centralized and Immutable Audit Trail<\/strong>: Unlike traditional relational databases that may require custom schema design for auditing, Eventhouse offers out-of-the-box support for event history and change tracking, making it easier to maintain a robust and transparent audit trail for compliance and debugging.<\/li>\n<\/ul>\n<h2>Setting Up Fabric Eventhouse<\/h2>\n<p>Before we can store the validation results from Great Expectations, we need to set up <strong>Fabric Eventhouse<\/strong>. Here\u2019s how you can do it:<\/p>\n<ol>\n<li><strong>Create an Eventhouse in Microsoft Fabric<\/strong>:\n<ul>\n<li>Navigate to the Microsoft Fabric portal.<\/li>\n<li>In the <strong>Fabric UI<\/strong>, go to <strong>Eventhouse<\/strong> and click on <strong>Create New Eventhouse<\/strong>.<\/li>\n<li>Choose the appropriate name.<\/li>\n<li>Click <strong>Create<\/strong> to provision your Eventhouse.<\/li>\n<\/ul>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/10\/create-eventhouse.png\" alt=\"Create Eventhouse in Fabric\" \/><\/li>\n<\/ol>\n<p>This process will automatically create a <code>KQL database<\/code> with the same name inside Eventhouse, which we are going to use to store our data validation results.<\/p>\n<h2>Storing GX Validation Results in Eventhouse<\/h2>\n<p>Now that we have Eventhouse ready, let&#8217;s look at how to store your GX validation results in Eventhouse after running your data validation.<\/p>\n<p>In the previous post, we executed data validations using Great Expectations (GX) and obtained the validation results. Now, let\u2019s see how to store this validation result in Eventhouse for future analysis and auditing.<\/p>\n<p>\ud83d\udca1 <strong>Tip<\/strong> : You can continue working in the same notebook you created in the previous blog post. Just add the following steps to enrich it further by storing and analyzing the validation outputs.<\/p>\n<p>We will use the following <a href=\"https:\/\/devblogs.microsoft.com\/ise\/data-validations-with-great-expectations-in-ms-fabric\/#validation-results\"><strong>validation result<\/strong><\/a> from previous blog post:<\/p>\n<pre><code class=\"language-json\">{\r\n  \"results\": [\r\n    {\r\n      \"success\": true,\r\n      \"expectation_config\": {\r\n        \"type\": \"expect_column_values_to_be_unique\",\r\n        \"kwargs\": {\r\n          \"batch_id\": \"test_data_source-test_data\",\r\n          \"column\": \"ID\"\r\n        }\r\n      },\r\n      \"result\": {\r\n        \"element_count\": 5,\r\n        \"unexpected_count\": 0,\r\n        \"unexpected_percent\": 0.0\r\n      }\r\n    },\r\n    {\r\n      \"success\": false,\r\n      \"expectation_config\": {\r\n        \"type\": \"expect_column_values_to_be_between\",\r\n        \"kwargs\": {\r\n          \"batch_id\": \"test_data_source-test_data\",\r\n          \"column\": \"Score\",\r\n          \"min_value\": 80.0\r\n        }\r\n      },\r\n      \"result\": {\r\n        \"element_count\": 5,\r\n        \"unexpected_count\": 1,\r\n        \"unexpected_percent\": 20.0,\r\n        \"partial_unexpected_list\": [\r\n          78.9000015258789\r\n        ]\r\n      }\r\n    }\r\n  ]\r\n  \/\/ ... additional metadata and summary fields\r\n}<\/code><\/pre>\n<h3>Preparing the Data<\/h3>\n<p>We will convert this result into a <strong>Spark DataFrame<\/strong> and enrich it with some additional data fields that will help us identify each validation run in the future. Here&#8217;s the code for preparing the data:<\/p>\n<pre><code class=\"language-python\">from pyspark.sql.types import StructType, StructField, StringType\r\nfrom datetime import datetime\r\nimport json\r\nimport uuid\r\n\r\n# Define the schema for the Spark DataFrame to ensure type consistency\r\n\r\nschema = StructType([\r\n    StructField(\"GX Validation ID\", StringType(), True),   # Unique ID for each validation entry\r\n    StructField(\"GX Validation Rule\", StringType(), True), # Rule type and parameters applied\r\n    StructField(\"Validation date\", StringType(), True),    # Timestamp of validation run\r\n    StructField(\"Data Record ID\", StringType(), True),     # ID of the validated record\/rule\r\n    StructField(\"Detailed Result\", StringType(), True),    # Full detailed result (JSON string)\r\n    StructField(\"Result\", StringType(), True)              # Overall success\/failure (\"True\"\/\"False\")\r\n])\r\n\r\n# Build the list of dictionaries based on validation_results (assuming validation_results is already defined)\r\ngx_data = []\r\n\r\n# Get metadata for the validation\r\ndata_source = validation_results[\"meta\"][\"active_batch_definition\"][\"datasource_name\"]\r\nvalidation_date = datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\r\n\r\n# Loop through the validation results and structure the data\r\nfor result in validation_results[\"results\"]:\r\n    gx_data.append({\r\n        \"GX Validation ID\": str(uuid.uuid4()),  # Use uuid4 to generate unique IDs\r\n        \"GX Validation Rule\": f'{result[\"expectation_config\"][\"type\"]}, kwargs = {result[\"expectation_config\"][\"kwargs\"]}',\r\n        \"Validation Date\": validation_date,\r\n        \"Data Record ID\": result[\"expectation_config\"][\"id\"],\r\n        \"Detailed Result\": json.dumps(result[\"result\"]),\r\n        \"Result\": str(result[\"success\"]),  # Store success as a string (\"True\"\/\"False\")\r\n    })\r\n\r\n# Create a Spark DataFrame from the list of dictionaries and schema\r\nvalidation_df = spark.createDataFrame(gx_data, schema=schema)\r\n\r\n# Display the DataFrame to inspect the data\r\ndisplay(validation_df)<\/code><\/pre>\n<p>This will generate output as follows:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/10\/data-prep-result.png\" alt=\"Data Preparation Result\" \/><\/p>\n<h3>Storing the Data in Eventhouse<\/h3>\n<p>Next, we&#8217;ll store this data into the <strong>KQL database<\/strong> (Eventhouse) for querying and analysis. The following code shows how to write the DataFrame to Eventhouse using <strong>batch ingestion<\/strong>:<\/p>\n<pre><code class=\"language-python\"># Replace your connection details here\r\nkustoUri = \"&lt;YOUR_CLUSTER_URI&gt;\"\r\ndatabase_name = \"&lt;YOUR_DATABASE_NAME&gt;\"\r\n\r\naccessToken = mssparkutils.credentials.getToken(kustoUri)\r\n\r\n# Write to validation result in KQL DB\r\nvalidation_df.write \\\r\n  .format(\"com.microsoft.kusto.spark.datasource\") \\\r\n  .option(\"kustoCluster\", kustoUri) \\\r\n  .option(\"kustoDatabase\", database_name) \\\r\n  .option(\"kustoTable\", \"MyEventTable\") \\\r\n  .option(\"tableCreationMode\", \"CreateIfNotExist\") \\\r\n  .option(\"accessToken\", accessToken )\\\r\n  .option(\"tableCreateOptions\", \"CreateIfNotExist\") \\\r\n  .mode(\"Append\") \\\r\n  .save()<\/code><\/pre>\n<blockquote><p><strong>Note:<\/strong>\nThe above example uses <strong>batch ingestion<\/strong>, which is well suited for periodic data uploads (e.g., hourly, daily validations). For <strong>real-time or near real-time<\/strong> data ingestion, Eventhouse (Kusto) also supports <strong>streaming ingestion<\/strong>.\n\u27a1\ufe0f Learn more about streaming ingestion <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/data-explorer\/ingest-data-streaming\">here in the official documentation<\/a>.<\/p><\/blockquote>\n<h3>Query the Data Using KQL (Fabric Notebook in PySpark)<\/h3>\n<p>Now that your data is written to <strong>Eventhouse<\/strong>, you can query it using <strong>KQL<\/strong> in your <strong>Fabric Notebook<\/strong>. Here&#8217;s a simple KQL query to fetch 5 records from the table:<\/p>\n<pre><code class=\"language-python\">kustoQuery = \"MyEventTable | take 5\"\r\n\r\n# Replace your connection details here\r\nkustoUri = \"&lt;YOUR_CLUSTER_URI&gt;\"\r\ndatabase_name = \"&lt;YOUR_DATABASE_NAME&gt;\"\r\n\r\naccessToken = mssparkutils.credentials.getToken(kustoUri)\r\n\r\n# Query Eventhouse using KQL in Fabric Notebook\r\nkustoDf = spark.read \\\r\n    .format(\"com.microsoft.kusto.spark.synapse.datasource\") \\\r\n    .option(\"accessToken\", accessToken) \\\r\n    .option(\"kustoCluster\", kustoUri) \\\r\n    .option(\"kustoDatabase\", database_name) \\\r\n    .option(\"kustoQuery\", kustoQuery) \\\r\n    .load()\r\n\r\n# Display the results\r\nkustoDf.show()<\/code><\/pre>\n<p>This query retrieves 5 records from the <strong>MyEventTable<\/strong> in your Eventhouse. Replace the <code>&lt;YOUR_CLUSTER_URI&gt;<\/code> and <code>&lt;DATABASE&gt;<\/code> placeholders with your actual cluster URI and database name.<\/p>\n<h2>Visualizing the Data<\/h2>\n<h3>View the Data Inside Eventhouse Using QuerySet<\/h3>\n<p>You can view the data directly in <strong>KQL<\/strong> by using the following queries:<\/p>\n<ol>\n<li><strong>Simple KQL Query to View the Data<\/strong>:\nUse this query to view the top 100 rows from your <code>MyEventTable<\/code>:<\/p>\n<pre><code class=\"language-kql\">MyEventTable\r\n| take 100<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/10\/view-data-kql-queryset.png\" alt=\"View Data using KQL Queryset\" \/><\/li>\n<li><strong>KQL Query to view Nested JSON Data<\/strong>:\nIf you have nested data (such as DetailedResult), you can flatten it using KQL:<\/p>\n<pre><code class=\"language-kql\">MyEventTable\r\n| extend result_json = parse_json(['Detailed Result'])\r\n| extend \r\n    element_count = toint(result_json.element_count),\r\n    unexpected_count = toint(result_json.unexpected_count),\r\n    unexpected_percent = todouble(result_json.unexpected_percent),\r\n    partial_unexpected_list = tostring(result_json.partial_unexpected_list)\r\n| project \r\n    validation_id = ['GX Validation ID'],\r\n    validation_rule = ['GX Validation Rule'],\r\n    validation_date = ['Validation date'],\r\n    result = Result,\r\n    element_count, \r\n    unexpected_count, \r\n    unexpected_percent, \r\n    partial_unexpected_list<\/code><\/pre>\n<p>This will flatten the <code>Detailed Result<\/code> field and give you an element_count, unexpected_count, unexpected_percent etc.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/10\/nested-json-example.png\" alt=\"KQL Query Result\" \/><\/li>\n<li><strong>KQL Query to render the Charts<\/strong>:\nYou can easily visualize KQL query results as charts in tools that support the <code>render<\/code> operator, such as KQL Queryset in <strong>Microsoft Fabric<\/strong>.<\/p>\n<pre><code class=\"language-kql\">  MyEventTable\r\n  | extend status = iff(tolower(Result) == \"true\", \"\u2705 Passed\", \"\u274c Failed\")\r\n  | summarize total = count() by status\r\n  | render piechart<\/code><\/pre>\n<\/li>\n<\/ol>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/10\/piechart.png\" alt=\"KQL Result Pie Chart\" \/><\/p>\n<pre>Similarly, you can visualize the data in various formats like bar charts, line charts, scatter plots, or even as tables, depending on the data and the type of analysis you wish to perform.\r\n\r\n\u27a1\ufe0f <a href=\"https:\/\/learn.microsoft.com\/en-us\/kusto\/query\/render-operator?view=microsoft-fabric\">Learn more about the `render` operator.<\/a><\/pre>\n<h3>Visualizing Your Data in Power BI<\/h3>\n<p>Once your GX validation results are stored in <strong>Fabric Eventhouse<\/strong>, you can easily visualize them using <strong>Power BI<\/strong> \u2014 and the best part is, you can kick it all off directly from the <strong>Fabric UI<\/strong> itself!<\/p>\n<h3>Steps to Create a Power BI Report from Fabric<\/h3>\n<ol>\n<li><strong>Run a KQL query<\/strong> in the Queryset (inside Fabric).<\/li>\n<li>Once the results are displayed, simply click on <strong>&#8220;Create Power BI report&#8221;<\/strong>.<\/li>\n<li>The query results will automatically be used to create a <strong>semantic model<\/strong>.<\/li>\n<li>From there, you can design your Power BI report \u2014 add pie charts, bar graphs, filters, and any visuals you like.<\/li>\n<\/ol>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/10\/create-power-bi-report.png\" alt=\"Steps to create power bi report from fabric\" \/><\/p>\n<p>This integration makes it incredibly quick and intuitive to go from raw validation results to a full-blown analytics dashboard\u2014all without leaving the Fabric ecosystem.<\/p>\n<h3>Conclusion<\/h3>\n<p>By integrating <strong>Great Expectations<\/strong> with <strong>Fabric Eventhouse<\/strong>, you not only store your validation results but also gain the ability to run complex queries, visualize the data, and build powerful dashboards. This setup enables you to monitor your data validation efforts in real-time, providing you with the insights needed to ensure data quality and make informed decisions.<\/p>\n<p>If you haven\u2019t yet explored the possibilities of <strong>Fabric Eventhouse<\/strong> and <strong>Great Expectations<\/strong> together, now is the perfect time to start. Your data management strategy will benefit from more transparency, easier querying, and the ability to make better, faster decisions.<\/p>\n<h3>References<\/h3>\n<ul>\n<li><a href=\"https:\/\/docs.greatexpectations.io\/docs\/core\/introduction\/\">Great Expectations Overview<\/a>\nLearn about Great Expectations, the powerful data validation and documentation tool.<\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/real-time-intelligence\/eventhouse\">Fabric Eventhouse Overview<\/a>\nExplore how Fabric Eventhouse helps with real-time event processing and storage.<\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/kusto\/query\/?view=microsoft-fabric\">Kusto Query Language Overview<\/a>\nGet familiar with Kusto Query Language (KQL) for querying your data in Fabric.<\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/real-time-intelligence\/create-powerbi-report\">Create Power BI Reports from Fabric Eventhouse<\/a>\nLearn how to create insightful Power BI reports directly from Fabric Eventhouse data.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A step-by-step guide to storing data validation results into the Eventhouse &amp; visualize data using queries and dashboard.<\/p>\n","protected":false},"author":176854,"featured_media":16409,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,3451],"tags":[3400],"class_list":["post-16408","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cse","category-ise","tag-ise"],"acf":[],"blog_post_summary":"<p>A step-by-step guide to storing data validation results into the Eventhouse &amp; visualize data using queries and dashboard.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/16408","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/176854"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=16408"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/16408\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/16409"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=16408"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=16408"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=16408"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}