Load testing AAD-based authentication for Azure Cache for Redis
At Microsoft, we continue working on modernizing our services to make them faster, more reliable, and up to date with the latest technologies. In this blog post, we’ll cover how Azure Load Testing helped ensure that the Azure Active Directory (AAD) based authentication mechanism for Azure Cache for Redis met the performance criteria.
Azure Cache for Redis is a fully managed, in-memory cache that enables high-performance, scalable architectures. In May 2023, Azure Cache for Redis launched a password-free authentication mechanism by integrating with AAD. This integration also included role-based access control, provided through the access control lists (ACLs) supported in open-source Redis.
Azure Cache for Redis is a powerful distributed cache that handles tens of thousands of connections concurrently. Since the applications connecting to Redis are extremely sensitive to latency, it was crucial to minimize the time taken for authentication and establishing new connections. Because of the single-threaded nature of Redis, delays in connection creation can lead to adverse effects on the Redis Server and result in high CPU utilization.
Introducing AAD token-based authentication enhances security with password-less connections, but it also adds a step during authentication to validate the client’s token. It was important to ensure that the performance of AAD token-based authentication was on par with that of the existing access key-based authentication. The goal was to benchmark AAD-authenticated connections to the cache under high load and compare the results with access key-based connections under similar stress conditions.
Previously, we used the Redis benchmarking tool to assess performance in various stressful conditions. However, reliance solely on the tool became cumbersome as manual effort increased with the number of test scenarios. Consequently, we began exploring alternative methods to achieve high-quality testing with maximum automation capabilities.
We chose Azure Load Testing, a service that runs test plans based on JMeter scripts. We developed Java sampler classes to match our test plan requirements, used Maven to manage dependencies and to compile and package the code into a JAR file, and tested locally on Apache JMeter to validate the workflow before integrating with Azure Load Testing. Since committing AAD client IDs and client secrets to our code repo is unsafe, Azure Load Testing's integration with Azure Key Vault, our secret store, let us keep those credentials out of the repo. Connecting to Key Vault is not straightforward with the Redis benchmarking tool: it requires writing additional code to set up the Key Vault client, which in turn adds an extra authentication layer for connecting to the Azure Key Vault resource.
After local validation and testing, we set up the tests on Azure Load Testing to enable automated runs. To create each test, we uploaded the JAR file containing the samplers, the JMX test plan, and the properties file with the configuration settings. We also generated concurrent load by selecting multiple engine instances, from 6 to 15 depending on the test scenario, and added over 200 virtual users. Each test was configured to run for 30 minutes so that we could get a good picture of both client-side and server-side metrics. The portal experience was very developer friendly, which allowed us to set up the tests quickly.
Since we were introducing an authentication feature, we had to test every cache tier (SKU) that we offer: Basic, Standard, and Premium. With the range of SKUs and scenarios, the number of tests grew to over 50, which made automation crucial. As the test count grew, running tests manually from the portal became tedious. We then discovered the Azure Pipelines integration for Azure Load Testing, which allowed us to run all the tests with a single click. Following the documentation, it was simple to set up: we added each test as a job and provided the corresponding test configuration YAML file. This gave us full automation across all these tests, which had been almost impossible with the benchmarking tool.
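To illustrate how quickly such a suite grows, here is a minimal Python sketch that enumerates a test matrix. The tier names come from this post; the scenario labels, auth mode labels, test IDs, and the `configs/` file paths are hypothetical names chosen for illustration, not our actual pipeline layout. This particular combination yields 24 tests; adding further dimensions would push the count higher.

```python
from itertools import product

# Tier names are from the post; the other labels are illustrative assumptions.
TIERS = ["Basic", "Standard", "Premium"]
AUTH_MODES = ["access-key", "aad-token"]
SCENARIOS = ["throughput", "invalid-token-flood", "periodic-reauth", "failover"]


def build_test_matrix(tiers=TIERS, auth_modes=AUTH_MODES, scenarios=SCENARIOS):
    """Return one dict per test, each pointing at a hypothetical config file."""
    matrix = []
    for tier, auth, scenario in product(tiers, auth_modes, scenarios):
        test_id = f"{tier}-{auth}-{scenario}".lower()
        matrix.append({
            "test_id": test_id,
            "tier": tier,
            "auth": auth,
            "scenario": scenario,
            # Hypothetical path: one YAML test configuration per pipeline job.
            "config_file": f"configs/{test_id}.yaml",
        })
    return matrix


if __name__ == "__main__":
    tests = build_test_matrix()
    print(f"{len(tests)} tests")
    for test in tests[:3]:
        print(test["test_id"], "->", test["config_file"])
```

A list like this could be used to generate one pipeline job per entry, so that adding a scenario means editing one list rather than many job definitions.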
We set up tests to measure throughput and operations per second under high load. Specifically, we simulated DDoS (Distributed Denial of Service) attacks by deliberately sending incorrect auth tokens to the server at a high rate. We also carried out stress testing that re-authenticated every two minutes while the server was under high load. Finally, we simulated a cache failover by rebooting the primary node at maximum connections and high load, which allowed us to capture the time taken to reconnect to the new primary node.
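The failover measurement boils down to timing how long a client keeps retrying before a connection succeeds again. The Python sketch below shows that idea in isolation. It is not our JMeter sampler code: `measure_reconnect_seconds`, its parameters, and the `try_connect` callable are hypothetical names, and the real connect call would come from whatever Redis client you use.

```python
import time


def measure_reconnect_seconds(try_connect, timeout_s=60.0, retry_delay_s=0.1,
                              clock=time.monotonic, sleep=time.sleep):
    """Retry try_connect() until it returns True; return the elapsed seconds.

    try_connect is a caller-supplied callable (an assumption of this sketch)
    that returns True on success and returns False or raises ConnectionError
    while the node is unavailable. Raises TimeoutError after timeout_s.
    """
    start = clock()
    while True:
        try:
            if try_connect():
                return clock() - start
        except ConnectionError:
            pass  # Node is still unavailable; fall through and retry.
        if clock() - start >= timeout_s:
            raise TimeoutError(f"no successful connection within {timeout_s}s")
        sleep(retry_delay_s)


if __name__ == "__main__":
    # Simulated primary node that accepts connections on the fourth attempt.
    attempts = {"count": 0}

    def fake_connect():
        attempts["count"] += 1
        if attempts["count"] < 4:
            raise ConnectionError("primary not ready")
        return True

    elapsed = measure_reconnect_seconds(fake_connect, retry_delay_s=0.01)
    print(f"reconnected after {attempts['count']} attempts in {elapsed:.3f}s")
```

Passing the clock and sleep functions as parameters keeps the timing logic testable without real delays.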
The test results provided exciting insights into the different scenarios, with both client-side and server-side metrics visible in a single, consolidated dashboard. Per our test plan, we evaluated metrics such as throughput, operations per second, connection rate, server load, latency, and response time for the different cases mentioned above.
Our primary objective was to compare the results between AAD token-based and access key-based connections under identical stress conditions, and we successfully achieved this goal.
The screenshots below, from one of our test runs, capture server-side metrics for access key-based and AAD token-based connections to Azure Cache for Redis.
[Screenshot: server-side metrics, access key-based connection]
[Screenshot: server-side metrics, AAD token-based connection]
As the screenshots show, cache latency and operations per second were similar in both scenarios. Across the scenarios we tested, the performance of AAD authentication-based connections was comparable to that of access key-based connections.
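If you export latency samples from two runs, a comparison like this can also be expressed as a simple check. The Python sketch below is one way to do that and is not the analysis we ran: the 10% tolerance, the function names, and the sample values are all assumptions, and the data in the demo is synthetic placeholder data rather than results from our tests.

```python
from statistics import mean, quantiles


def summarize(latencies_ms):
    """Return the mean and 95th percentile of a list of latency samples."""
    p95 = quantiles(latencies_ms, n=100)[94]  # 95th of the 99 cut points
    return {"mean": mean(latencies_ms), "p95": p95}


def relative_diff(baseline, candidate):
    """Fractional change of candidate relative to baseline (0.05 means +5%)."""
    return (candidate - baseline) / baseline


def is_on_par(baseline_ms, candidate_ms, tolerance=0.10):
    """True if candidate latency is at most `tolerance` worse than baseline.

    The check is one-sided because lower latency is better. The default
    tolerance is an arbitrary assumption for this sketch.
    """
    base, cand = summarize(baseline_ms), summarize(candidate_ms)
    return all(relative_diff(base[k], cand[k]) <= tolerance for k in base)


if __name__ == "__main__":
    # Synthetic placeholder samples, NOT measurements from the post.
    access_key_ms = [1.0 + 0.01 * (i % 10) for i in range(100)]
    aad_token_ms = [x * 1.03 for x in access_key_ms]
    print("access key:", summarize(access_key_ms))
    print("AAD token: ", summarize(aad_token_ms))
    print("on par:", is_on_par(access_key_ms, aad_token_ms))
```

Comparing a tail percentile as well as the mean matters here, because latency-sensitive clients tend to notice the slowest requests first.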
Having seen the value of stress testing Azure Cache for Redis with Azure Load Testing, we will continue to run these tests. It now takes just one click to run them, and they complete in a couple of hours with detailed analysis. That is a big improvement over the earlier flow, where we spent around 8-10 hours running the benchmarking tool and analyzing the results.
If you’re ready to evaluate the performance of your Azure Cache for Redis resource using AAD token-based authentication like we’ve discussed in this post, visit Azure Load Testing here to get started.