Enhancing reliability in Microsoft Fabric and Azure Synapse through load testing

Predrag Vlatković

March 4th, 2024

Introduction

At Microsoft, our commitment to modernizing services remains steadfast. We strive to make services faster and more reliable, often through extensive testing. Here we’ll cover how we’ve been using Azure Load Testing (ALT) to ensure the reliability of Microsoft Fabric and Azure Synapse.

Azure Synapse Analytics is an analytics service that accelerates time to insight across data warehouses and big data systems. It brings together the best of SQL technologies used in enterprise data warehousing, Apache Spark technologies for big data, and Azure Data Explorer for log and time series analytics.

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence. It offers a comprehensive suite of services including data lake, data engineering, and data integration all in one place. Azure Synapse and Microsoft Fabric use the same underlying architecture for querying data.

Goal

The main goal is to subject the SQL analytics runtime of Microsoft Fabric and Azure Synapse to an overwhelming amount of load and stress, scrutinizing both the server side and the APIs used by the client side for any issues. We focus on identifying and rectifying potential issues within the product. Load testing is an integral part of the sign-off procedure, so we need to stress test the system every day. The aim is to validate that a specific version of the code, when operated in a particular environment, meets the criteria established for the best user experience.

Requirements

Selecting the right tool for this task was paramount. We had a checklist of requirements, some of which include:

The ability to conduct a substantial number of concurrent queries within a specified timeframe, ranging from a few thousand to several hundred thousand.
The testing tool should support parameterization for easy modification of load intensity, test endpoints, and overall test behavior.
It should support assertions on test results and enable the creation of client telemetry that can be correlated with server telemetry.
The system must utilize Azure for test execution and scalability, run tests seamlessly from Azure DevOps pipelines, store test definitions in a source control repository, guarantee secure execution without revealing secrets, and be extendable with custom-written components.

Azure Load Testing (ALT) fulfilled all our criteria. Azure Load Testing is a fully managed load-testing service that enables you to generate high-scale load. It has high-fidelity support for JMeter, a widely used open-source tool designed to measure performance and load test applications. With JMeter, we could perform the necessary queries and API calls for testing purposes. The service integrates with Azure Repos and Azure Pipelines, provides support for Azure Key Vault, and allows the incorporation of custom code to extend its functionality. Notably, the experience is consistent between running tests in the local environment and in Azure Load Testing: the test script (jmx) requires no alterations.

Approach

Test scenario

Our tests examine the resilience of Microsoft Fabric and Azure Synapse under high stress across varying data sizes, queries, scenarios, and environments. Operations such as workspace creation, data loading, query execution, and workspace deletion are all exercised. Tests are run daily and weekly, ensuring a comprehensive evaluation under different conditions.

How our testing project is organized

The process for each new test begins on a developer’s computer, where they establish the specific scenario they wish to test. Individual test suites are organized into separate folders within a repository. These folders contain jmx files, which are executable both locally and through pipelines. The specifics of test execution are outlined in ALT YAML files and JMeter properties files. These files detail various aspects of the test, such as:

The environment in which the test will run (e.g., test, stage, prod)
The dataset or database used
The number of concurrent users
Wait times
Ramp-up time
Duration of the test

The JMeter test is designed to verify that the correct results are returned after each query execution.
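
The test parameters listed above could be captured in a JMeter properties file, which uses plain key=value lines. The sketch below renders such a file from a dictionary. The property names and values are hypothetical, chosen only for illustration; the post does not give the names used in the real test suites.

```python
def render_properties(params: dict) -> str:
    """Render key=value lines in the properties-file style that JMeter reads."""
    lines = [f"{key}={params[key]}" for key in sorted(params)]
    return "\n".join(lines) + "\n"


# Hypothetical property names and values (assumptions, not from the post).
scenario = {
    "environment": "test",
    "database": "sample_db",
    "concurrent_users": 500,
    "think_time_ms": 250,
    "ramp_up_seconds": 120,
    "duration_seconds": 1800,
}

properties_text = render_properties(scenario)
print(properties_text)
```

Keeping one such file per scenario means the load profile can change without touching the jmx script itself.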

Upon a test’s completion, the developer pushes it to the repository. A pull request approval triggers an artifact creation pipeline, which not only builds binaries and performs all unit tests but also publishes all new files to an artifact storage. Subsequently, these files become accessible to the Main Pipeline. This pipeline acquires the artifacts, downloads the required Azure Key Vault secrets, and conducts the ALT tests. Following the test’s conclusion, it transmits client telemetry to a dedicated Azure Synapse Analytics workspace. This workspace processes the incoming user data, combines it with the client telemetry, and makes the consolidated data available for analysis and visualization through Power BI.
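
As a rough illustration of combining client telemetry with server-side data, the sketch below joins the two record sets on a shared identifier. The field names (`request_id`, `status`, `elapsed_ms`) and the in-memory join are assumptions made for this example; the post does not describe how the Synapse workspace actually performs this step.

```python
def correlate(client_rows, server_rows, key="request_id"):
    """Attach server-side status and timing to each client record by shared key."""
    server_by_id = {row[key]: row for row in server_rows}
    merged = []
    for client in client_rows:
        server = server_by_id.get(client[key])
        merged.append({
            **client,
            # None marks a client request with no matching server record.
            "server_status": server["status"] if server else None,
            "server_ms": server["elapsed_ms"] if server else None,
        })
    return merged


# Hypothetical sample records.
client_rows = [
    {"request_id": "a1", "label": "select_small", "client_ms": 130},
    {"request_id": "b2", "label": "select_large", "client_ms": 990},
]
server_rows = [{"request_id": "a1", "status": "ok", "elapsed_ms": 95}]

merged = correlate(client_rows, server_rows)
for row in merged:
    print(row)
```

Requests that appear on the client but have no server record stand out immediately, which is one reason to emit a correlation key from the test client.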

We have defined the tests in JMeter against both Microsoft Fabric and Azure Synapse. The extensibility of JMeter allows us to customize the script to fit our requirements. We have reused the same test script across many scenarios by using JMeter user properties. We have also implemented some custom samplers for JMeter. Custom samplers target loads, extract specific client telemetry, and modify the internal state of our test environment and services. Crucially, Azure Load Testing provides a safe and secure environment for running custom samplers.
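
To illustrate reusing one script across scenarios, the sketch below builds a local JMeter command line in which each scenario differs only in the properties passed with `-J`. Inside the jmx file, a value would then be read with JMeter's `__P` function, for example `${__P(concurrent_users,10)}`. The file names and property names are hypothetical, and this covers only a local run; it is not the authors' pipeline.

```python
import shlex


def jmeter_command(test_plan, results_file, properties):
    """Build a non-GUI JMeter invocation with properties passed via -J."""
    cmd = ["jmeter", "-n", "-t", test_plan, "-l", results_file]
    for name, value in sorted(properties.items()):
        cmd.append(f"-J{name}={value}")
    return cmd


# Same (hypothetical) test plan, two load profiles.
daily = jmeter_command(
    "query_load.jmx", "daily.jtl",
    {"concurrent_users": 1000, "duration_seconds": 3600},
)
weekly = jmeter_command(
    "query_load.jmx", "weekly.jtl",
    {"concurrent_users": 5000, "duration_seconds": 14400},
)

print(shlex.join(daily))
print(shlex.join(weekly))
```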

Automated test execution

All our test plan files are committed to the source control repository. The established pipeline template in Azure Pipelines allows the execution of multiple test suites. Several pipelines are in place: one runs on a daily schedule and another on a weekly basis. A custom tool has been developed to convert results for compatibility with NUnit. Azure Pipelines can read and publish these results. Moreover, we generate metadata for the test results and transfer both the results and metadata to blob storage. Once all the results are received in the Synapse workspace, they undergo processing, and a report is generated in Power BI.
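
The conversion tool itself is not described in this post. As a minimal sketch of the idea, the code below turns a JMeter CSV results file into a simplified NUnit 3 style XML document. It assumes the CSV has `label`, `elapsed`, `success`, and `responseCode` columns, and it emits only a small subset of NUnit attributes, so a real publisher may require more.

```python
import csv
import io
import xml.etree.ElementTree as ET


def jtl_to_nunit_xml(jtl_csv: str, suite_name: str) -> str:
    """Convert JMeter CSV sample rows into a simplified NUnit 3 style XML string."""
    rows = list(csv.DictReader(io.StringIO(jtl_csv)))
    failed = sum(1 for r in rows if r["success"].lower() != "true")
    overall = "Failed" if failed else "Passed"
    run = ET.Element("test-run", {
        "name": suite_name,
        "testcasecount": str(len(rows)),
        "total": str(len(rows)),
        "passed": str(len(rows) - failed),
        "failed": str(failed),
        "result": overall,
    })
    suite = ET.SubElement(run, "test-suite", {
        "type": "TestSuite", "name": suite_name, "result": overall,
    })
    for index, r in enumerate(rows, start=1):
        ok = r["success"].lower() == "true"
        case = ET.SubElement(suite, "test-case", {
            "id": str(index),
            "name": r["label"],
            "result": "Passed" if ok else "Failed",
            "duration": f"{int(r['elapsed']) / 1000:.3f}",  # ms to seconds
        })
        if not ok:
            failure = ET.SubElement(case, "failure")
            ET.SubElement(failure, "message").text = (
                f"response code {r['responseCode']}"
            )
    return ET.tostring(run, encoding="unicode")


# Hypothetical sample results.
sample_jtl = (
    "label,elapsed,success,responseCode\n"
    "select_small,120,true,200\n"
    "select_large,950,false,500\n"
)
nunit_xml = jtl_to_nunit_xml(sample_jtl, "daily_query_load")
print(nunit_xml)
```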

Results

The ability to export the results enables us to build custom dashboards that meet our specific requirements. In our case, Power BI has been an effective tool for generating these reports. It gives us insight into the different errors reported during a test run.

For each identified error, we generate a work item to facilitate resolution. Our observations encompass a spectrum of errors, including client-side connection issues and server-side errors. Reviewing these reports as an integral part of our sign-off procedure has proven instrumental in addressing product issues and reducing overall server-side errors. While our day-to-day focus remains on result analysis, the surrounding processes, such as pipeline execution, test case generation, test execution, and result publication, are fully automated.
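
The error grouping behind such a report could be sketched as follows: failed samples are counted per distinct combination of query label and response code, so each combination becomes one candidate work item. The record fields and sample values are assumptions for illustration; the actual reports are built in Power BI.

```python
from collections import Counter


def summarize_errors(samples):
    """Count failed samples per (label, response_code) pair."""
    return Counter(
        (s["label"], s["response_code"])
        for s in samples
        if not s["success"]
    )


# Hypothetical sample results from one test run.
samples = [
    {"label": "select_small", "response_code": "200", "success": True},
    {"label": "select_large", "response_code": "500", "success": False},
    {"label": "select_large", "response_code": "500", "success": False},
    {"label": "load_data", "response_code": "timeout", "success": False},
]

error_counts = summarize_errors(samples)
for (label, code), count in error_counts.most_common():
    print(f"{label} [{code}]: {count} failure(s)")
```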

Conclusion

Testing Microsoft Fabric and Azure Synapse using Azure Load Testing has made these services more reliable and established a solid system for continuous improvement. By checking the results daily and relying on automation wherever possible, we make sure everything works as expected. This comprehensive approach supports our goal of stressing the system to ensure it can handle the expected load, and it keeps us at the forefront of adopting the latest technology.

To get started with your own load testing journey, visit Azure Load Testing.