{"id":16194,"date":"2025-05-08T00:00:00","date_gmt":"2025-05-08T07:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/ise\/?p=16194"},"modified":"2025-08-07T06:11:11","modified_gmt":"2025-08-07T13:11:11","slug":"external-data-handling-learnings","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/external-data-handling-learnings\/","title":{"rendered":"Integration testing with Dapr and Testcontainers"},"content":{"rendered":"<h1>Introduction<\/h1>\n<p>In this article, we&#8217;ll walk through setting up a <a href=\"https:\/\/docs.pytest.org\/en\/stable\/\">Pytest<\/a> integration test framework for a multi-component system utilizing <a href=\"https:\/\/dapr.io\/\">Dapr<\/a> and <a href=\"https:\/\/testcontainers.com\/\">Testcontainers<\/a>. This setup was initially developed for a customer and later extracted into a sample project to serve as a foundational starting point.<\/p>\n<h2>Background<\/h2>\n<p>As part of our most recent engagement, we took some time to create a testing framework for the customer that would cover <code>unit tests<\/code>, <code>end-to-end tests<\/code> and <code>integration tests<\/code>. For our use case, we decided that an <code>integration test<\/code> was going to be &#8220;Any test that works across two or more components, but not all of them&#8221;, as that&#8217;d be an <code>e2e<\/code> test. <\/p>\n<p>Furthermore, we wanted to test the real services, or as close to them as we could get. For us, that meant Dapr, <a href=\"https:\/\/testcontainers.com\/modules\/redis\/?language=python\">Redis<\/a>, <a href=\"https:\/\/testcontainers.com\/modules\/cosmodb\/\">Cosmos<\/a>, and a <a href=\"https:\/\/testcontainers.com\/modules\/azurite\/\">Storage Account<\/a>. The last two aren&#8217;t covered in this article. 
<\/p>\n<p>The problem with testing a real service is that it can be polluted with data from tests and other developers, and every test needs to account for this, which leads to extra boilerplate code in each one. For example, a test might check that a table has one more row after an insert, but what if something else has done an insert in between? True, we can keep track of IDs and such, but it&#8217;d be a lot easier if we had the guarantee that we&#8217;re testing on a brand-new, clean environment every time. This example might be easy, but you can imagine how the complexity increases with multiple services that the dev team is accessing for their own development. <\/p>\n<p>Most real services are, and rightfully so, very well secured behind RBAC, VNets, firewalls, credentials&#8230; you name it. How can we design a test framework that can test the integration with these services without having to go through the sometimes painful process of connecting a dev machine or a pipeline to a real system? <\/p>\n<p>The clue was in the title of this article: <a href=\"https:\/\/testcontainers.com\/\">Testcontainers<\/a>. That solved it for us, and I hope this article helps you decide whether our approach would also work for you and your customers. <\/p>\n<p>To help with the clarity of this article, and to have a working and usable sample, we have distilled the solution into a bare-bones repository that still maintains most of the components that were used, albeit with the names changed. Here&#8217;s a simplified diagram that illustrates the application we&#8217;ll be using for this demo. <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/08\/application-diagram-scaled.png\" alt=\"Application Diagram\" \/><\/p>\n<h2>Test requirements<\/h2>\n<p>A test is only useful if it gets run. 
We were quite mindful about how we designed the test framework to ensure that both we and the customer were empowered to write and run more tests, more easily. <\/p>\n<ul>\n<li><em>Easy to run:<\/em> <code>Pytest<\/code> integrates well with most IDEs and test plugins.<\/li>\n<li><em>Minimal configuration:<\/em> Avoid maintaining duplicate configurations.<\/li>\n<li><em>Real dependencies:<\/em> Use actual ports and <em>SDKs<\/em> instead of mocks.<\/li>\n<li><em>Test locally:<\/em> Ensure tests run both on a developer\u2019s machine and in CI\/CD pipelines.<\/li>\n<\/ul>\n<h2>Project Structure<\/h2>\n<p>The following is a high-level representation of the project structure, showing key components for integration testing:<\/p>\n<pre><code class=\"language-plain\">\u251c\u2500\u2500 dapr-components\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 pubsub-component.yaml\n\u251c\u2500\u2500 order-processor\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 Dockerfile\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 app.py\n\u251c\u2500\u2500 order-publisher\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 Dockerfile\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 app.py\n\u2514\u2500\u2500 tests\n    \u251c\u2500\u2500 conftest.py\n    \u251c\u2500\u2500 docker-images\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 dapr\n    \u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 Dockerfile\n    \u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 pubsub-component.yaml\n    \u2502\u00a0\u00a0 \u2514\u2500\u2500 orders\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 Dockerfile\n    \u2502\u00a0\u00a0     \u2514\u2500\u2500 run.sh\n    \u2514\u2500\u2500 test_orders.py<\/code><\/pre>\n<h3>Dockerfile<\/h3>\n<p>The <code>Dockerfile<\/code> that builds the <code>order-processor<\/code> and <code>order-publisher<\/code> images is quite simple: it&#8217;s just a <code>Python<\/code> image with the app copied into it. 
<\/p>\n<pre><code class=\"language-Dockerfile\">FROM python:3.11-slim\n\nCOPY requirements.txt .\nRUN pip install -r requirements.txt\n\nWORKDIR \/app\nCOPY . \/app\n\nEXPOSE 8000\nCMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]<\/code><\/pre>\n<h3>order-publisher<\/h3>\n<p>This app just sends a <code>JSON<\/code> payload to Dapr from a <code>POST<\/code> endpoint. It&#8217;s a simple app that doesn&#8217;t do much, but it&#8217;s enough for our sample. <\/p>\n<pre><code class=\"language-py\">import json\n\nfrom dapr.clients import DaprClient\nfrom fastapi import FastAPI, Request\n\napp = FastAPI()\n\n@app.post(\"\/order\")\nasync def publish_order(request: Request):\n    order_data = await request.json()\n\n    # Starting the client here uses the default values and the ones in the\n    # environment to create the client, therefore there's no extra config here.\n    with DaprClient() as client:\n        client.publish_event(\n            pubsub_name='orders-pubsub',\n            topic_name='orders',\n            data=json.dumps(order_data),\n        )\n    return {\"status\": \"Order published\"}<\/code><\/pre>\n<h3>order-processor<\/h3>\n<p>The order processor registers a subscription to the <code>orders<\/code> topic with <code>Dapr<\/code>; once a message is received, we just print it to the console. 
<\/p>\n<pre><code class=\"language-py\">import json\n\nfrom cloudevents.http import from_http\nfrom flask import Flask, jsonify, request\n\napp = Flask(__name__)\n\n# Register Dapr pub\/sub subscriptions\n@app.route('\/dapr\/subscribe', methods=['GET'])\ndef subscribe():\n    subscriptions = [{\n        'pubsubname': 'orders-pubsub',\n        'topic': 'orders',\n        'route': 'orders'\n    }]\n    print('Dapr pub\/sub is subscribed to: ' + json.dumps(subscriptions))\n    return jsonify(subscriptions)\n\n# Dapr subscription in \/dapr\/subscribe sets up this route\n@app.route('\/orders', methods=['POST'])\ndef orders_subscriber():\n    event = from_http(request.headers, request.get_data())\n\n    # More complicated processing would go here in a real application\n    print('Received order: ' + event.data, flush=True)\n    return 'OK', 200\n\napp.run(port=8001)<\/code><\/pre>\n<h3>Dapr components<\/h3>\n<p>And finally, the component that configures <code>Redis<\/code> to be used with <code>Dapr<\/code>.<\/p>\n<pre><code class=\"language-yaml\">apiVersion: dapr.io\/v1alpha1\nkind: Component\nmetadata:\n  name: orders-pubsub\n  namespace: ${NAMESPACE}\nspec:\n  type: pubsub.redis\n  version: v1\n  # Variables documented here\n  # https:\/\/docs.dapr.io\/reference\/components-reference\/supported-pubsub\/setup-redis-pubsub\/\n  metadata:\n    - name: redisHost\n      value: 127.0.0.1:6379<\/code><\/pre>\n<h2>Proposed workflow<\/h2>\n<p>As a brief summary, when an <code>integration test<\/code> runs, we use <code>Pytest<\/code> and <code>testcontainers<\/code> to build, configure and start the necessary containers to run the test in an isolated environment. When the test finishes, we clean everything up for the next run. 
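<\/p>\n<p>One detail worth calling out from the processor code above: Dapr delivers messages wrapped in a CloudEvents envelope, which is why the handler parses the request with <code>from_http<\/code> and reads <code>event.data<\/code> instead of the raw body. Here&#8217;s a stdlib-only sketch of roughly what arrives at the subscriber&#8217;s route (simplified and illustrative; real envelopes carry more fields, such as <code>id<\/code> and <code>source<\/code>): <\/p>\n<pre><code class=\"language-python\">import json\n\n# What the publisher sends\norder = {\"order_id\": \"1\", \"item\": \"item1\"}\n\n# Roughly what Dapr delivers to the subscriber's route (illustrative)\nenvelope = {\n    \"specversion\": \"1.0\",\n    \"type\": \"com.dapr.event.sent\",\n    \"pubsubname\": \"orders-pubsub\",\n    \"topic\": \"orders\",\n    \"data\": json.dumps(order),  # the original payload, still a JSON string\n}\n\n# event.data in the processor is that string, which is why it can be\n# concatenated directly into the log line\nassert json.loads(envelope[\"data\"]) == order<\/code><\/pre>\n<p>This is also why a test can later wait for the literal log line <code>Received order: {\"order_id\": \"1\", \"item\": \"item1\"}<\/code>. 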
<\/p>\n<h3>Diagram<\/h3>\n<p>We&#8217;ll jump to the walkthrough very quickly, but first let&#8217;s go through a diagram of what happens when a test runs: <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2025\/08\/test-run-flow-scaled.png\" alt=\"Test run flow\" \/><\/p>\n<h2>Code Walkthrough<\/h2>\n<h3>TL;DR<\/h3>\n<p>This is what happens when you run the tests:<\/p>\n<pre><code class=\"language-plain\">1. Build standalone images\n2. Build Dapr image\n3. Create test image with Standalone and Dapr\n4. For each test\n   1. Create a Docker Network\n   2. Start and configure Containers\n   3. Run the test\n   4. Cleanup<\/code><\/pre>\n<h3>Using Fixtures to control setup and teardown<\/h3>\n<p>We decided to use <a href=\"https:\/\/docs.pytest.org\/en\/6.2.x\/fixture.html\">fixtures<\/a> to control the flow of the test. Fixtures allow us to run bits of code automatically, and they have a feature that&#8217;s incredibly useful for this framework: we can define the scope in which each fixture is called. <\/p>\n<p>We use the <code>session<\/code> scope for code that we want to run only once per <code>session<\/code>. In this case we&#8217;re building the Docker images, which is something we only need to do once. Also notice the <code>autouse<\/code> parameter, which ensures this fixture runs regardless of which test starts the session. 
<\/p>\n<pre><code class=\"language-python\">@pytest.fixture(scope=\"session\", autouse=True)\ndef images():\n    # This is just a helper function to simplify the code\n    def create_docker_image(path: str, tag: str, buildargs: dict = None) -&gt; DockerImage:\n        return DockerImage(path=path, tag=tag).build(buildargs=buildargs)\n\n    # Build the base images as they'd be deployed in production\n    create_docker_image(\".\/order-processor\", processor_base)\n    create_docker_image(\".\/order-publisher\", publisher_base)\n\n    # This uses the base images and extends them to include test-specific\n    #  dependencies. In this case... just Dapr but it could also include other\n    #  things such as az cli or test volumes for sample payloads\n    create_docker_image(\".\/tests\/docker-images\/dapr\", dapr)\n    create_docker_image(\".\/tests\/docker-images\/orders\", processor,\n        buildargs={\"image\": processor_base})\n<\/code><\/pre>\n<p>The rest of the fixtures in <code>conftest.py<\/code> have <code>function<\/code> scope, which means that they&#8217;re created and destroyed for every test execution. This guarantees that tests and side effects will not pollute each other. <\/p>\n<p>Another thing to note is the use of the keyword <code>with<\/code>, which in combination with <code>yield<\/code> allows us to keep the containers in scope and in <code>Docker<\/code> until execution resolves the <code>with<\/code> at the end of the test. <code>Testcontainers<\/code> will make sure to stop and clean up the containers at the end of the <code>with<\/code> scope. 
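<\/p>\n<p>To make those mechanics concrete, here&#8217;s a minimal, container-free sketch of the same pattern: everything after the <code>yield<\/code> (or after the <code>with<\/code> block unwinds) is teardown, and it runs whether the test passes or fails. The names here are illustrative, not from the sample: <\/p>\n<pre><code class=\"language-python\">from contextlib import contextmanager\n\nevents = []\n\n@contextmanager\ndef fake_container(name):\n    events.append(f\"start {name}\")  # what __enter__ does for a real container\n    try:\n        yield name\n    finally:\n        events.append(f\"stop {name}\")  # __exit__ cleans up, even on failure\n\n# Simulating what pytest does around a single test\nwith fake_container(\"redis-integration\") as container:\n    events.append(f\"test using {container}\")\n\nassert events == [\n    \"start redis-integration\",\n    \"test using redis-integration\",\n    \"stop redis-integration\",\n]<\/code><\/pre>\n<p>A fixture that wraps its <code>yield<\/code> in a <code>with<\/code> block gets exactly this behaviour for free, which is why none of the tests in this framework need their own cleanup code. 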
<\/p>\n<pre><code class=\"language-python\">@pytest.fixture(scope=\"function\")\ndef network():\n    with Network() as network:\n        yield network\n\n@pytest.fixture(scope=\"function\")\ndef redis(network):\n    with (RedisContainer(image=\"redis:7.4.2-alpine\")\n        .with_network(network)\n        .with_name(\"redis-integration\")\n        .with_bind_ports(6379, 6380)) as redis_container:\n            yield redis_container<\/code><\/pre>\n<p>Having these fixtures properly configured makes it very easy to cherry-pick which containers a test needs to start. In this snippet we can see how this particular test would start two containers: <code>redis<\/code> and <code>publisher<\/code>. Note that the cleanup is managed entirely by the fixture and not the test. <\/p>\n<pre><code class=\"language-python\">def test_order_publisher(redis, publisher_container):\n    # Contents omitted for simplicity\n    assert True<\/code><\/pre>\n<h3>Building the images<\/h3>\n<p>Each app has its own <code>Dockerfile<\/code>, which creates a standalone image that can be started locally or deployed to <code>Kubernetes<\/code>. This isn&#8217;t enough for our test, as we need it to work with <code>Dapr<\/code>. It&#8217;s true that we should also have the <code>Dapr CLI<\/code> installed locally for development, but if we were to use that one, there&#8217;d be conflicts with app names and ports. Let&#8217;s remember that one of our priorities is to test the real configuration and dependencies as much as possible. <\/p>\n<p>Starting the test containers in their own isolated <code>Docker Network<\/code> lets us isolate the test while maintaining the same configuration, but we lose access to <code>Dapr<\/code> without extra work and configuration. 
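<\/p>\n<p>One thing to keep in mind with this setup is that two addressing schemes coexist: containers on the test network reach each other by container name (for example <code>redis-integration:6379<\/code>), while code running on the host goes through the bound port (<code>6380<\/code> in the <code>redis<\/code> fixture above). The bound port also gives tests a convenient back door for asserting on broker state directly. Here&#8217;s a sketch, assuming the <code>redis<\/code> Python package is installed as a test dependency: <\/p>\n<pre><code class=\"language-python\">def test_redis_is_reachable(redis):\n    # RedisContainer.get_client() returns a client already pointed at the\n    # host-mapped port, so the test doesn't need to hardcode 6380\n    client = redis.get_client()\n    assert client.ping()<\/code><\/pre>\n<p>We don&#8217;t use this in the sample, but it can be handy when debugging why a message never made it to the processor. 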
<\/p>\n<p>We decided to embed Dapr and the app in the same container, which Dapr tells us is OK to do, albeit <a href=\"https:\/\/docs.dapr.io\/operations\/hosting\/self-hosted\/self-hosted-with-docker\/#run-both-app-and-dapr-in-a-single-docker-container\">for development purposes only<\/a>. <\/p>\n<p>We start with a basic <code>Dapr<\/code> image:<\/p>\n<pre><code class=\"language-Dockerfile\">\nFROM alpine:edge\n\n# Install dapr CLI\nRUN apk add --no-cache bash\nADD https:\/\/raw.githubusercontent.com\/dapr\/cli\/master\/install\/install.sh \/tmp\/install.sh\nRUN \/bin\/bash \/tmp\/install.sh\n\n# Install daprd\nARG DAPR_BUILD_DIR\nCOPY $DAPR_BUILD_DIR \/opt\/dapr\nENV PATH=\"\/opt\/dapr\/:${PATH}\"\nRUN dapr init --slim\n\n# Copy the Dapr components\nWORKDIR \/components\nCOPY pubsub-component.yaml .\n\nEXPOSE 3500\n<\/code><\/pre>\n<p>Extending that image with the app&#8217;s dependencies leaves us with a Dockerfile that pulls the standalone app&#8217;s image, therefore testing the real application, and adds the testing dependencies on top. For this sample that&#8217;s just <code>Dapr<\/code>, but for our customer we also included the <code>Azure CLI<\/code> in this step, which allowed us to keep the security of the standalone app intact. 
<\/p>\n<pre><code class=\"language-Dockerfile\"># We can reuse the entire image if we just parameterize some of the values\n# In more complex scenarios this might not be enough, or maintainable\nARG image\n\n# This is built by us\nFROM dapr:integration AS builder\n\nFROM $image\n\nUSER root\n\nRUN adduser --disabled-password --gecos '' nonroot\n\nWORKDIR \/app\nENV PYTHONPATH=\/app\n\n# Copy dependencies from the first stage\nCOPY --from=builder \/usr\/local\/bin\/dapr \/usr\/local\/bin\/dapr\nCOPY --from=builder \/root\/.dapr \/home\/nonroot\/.dapr\nCOPY --from=builder \/opt\/dapr \/opt\/dapr\nCOPY --from=builder \/components \/components\n\nCOPY run.sh \/app\/run.sh\nRUN chmod +x \/app\/run.sh\n\nUSER nonroot\n\nENTRYPOINT [\"\/app\/run.sh\"]<\/code><\/pre>\n<h3>Starting the containers<\/h3>\n<p><code>Testcontainers<\/code> gives us a set of handy functions to configure a container&#8217;s name, network and environment. You might also consider mounting a volume with an environment file to avoid hardcoding the values. <\/p>\n<pre><code class=\"language-python\">@pytest.fixture(scope=\"function\")\ndef processor_container(network):\n    with (DockerContainer(processor)\n        .with_network(network)\n        .with_name(\"processor\")\n        .with_bind_ports(8001, 8001)\n        .with_env(\"app_id\", \"order-processor\")\n        .with_env(\"port\", \"8001\")\n        .with_env(\"dapr_http_port\", \"3501\")\n        .with_env(\"dapr_grpc_port\", \"50002\")\n        ) as processor_container:\n\n        # Wait for the application to start. There are many ways to do this,\n        # but checking the logs seems simple enough to me\n        wait_for_logs(processor_container, \"You're up and running! Both \"\\\n            \"Dapr and your app logs will appear here.\")\n\n        yield processor_container<\/code><\/pre>\n<h3>Running the tests<\/h3>\n<p>The tests can be started by running <code>pytest<\/code> in the console or, quite possibly, through your IDE&#8217;s test window. <\/p>\n<p>The main thing that we need to pay attention to is the test parameters, which, as you&#8217;ve probably noticed, share the same names as the <code>fixtures<\/code> defined up above and in <code>conftest.py<\/code>. This is how we tell <code>pytest<\/code> and our test framework which containers we want to build, start and use. <\/p>\n<pre><code class=\"language-python\">def test_order_publisher_processor(base_publisher_url, redis,\n    publisher_container, processor_container):\n\n    response = requests.post(f\"{base_publisher_url}\/order\",\n        json={\"order_id\": \"1\", \"item\": \"item1\"})\n    assert response.status_code == 200\n\n    wait_for_logs(processor_container,\n        \"Received order: {\\\"order_id\\\": \\\"1\\\", \\\"item\\\": \\\"item1\\\"}\")\n\n    assert True<\/code><\/pre>\n<p>As you can see from the code above, this test requests <code>base_publisher_url<\/code>, <code>redis<\/code>, <code>publisher_container<\/code> and <code>processor_container<\/code>. By the time execution enters the actual test function, the <code>fixtures<\/code> will have taken care of the entire setup. This makes it really easy to create new tests with customisable dependencies. <\/p>\n<h3>Future improvements<\/h3>\n<p>Here are a few things that we found and shared with the customer during our engagement. These were suggestions for them to take on and implement after we disengaged. I think it&#8217;s a good idea to share these here as well, even though they&#8217;re not in the provided sample. 
<\/p>\n<h4>Use env files<\/h4>\n<p>Right now this sample has hardcoded environment values, which are supplied like this:<\/p>\n<pre><code class=\"language-python\">.with_env(\"app_id\", \"order-processor\")\n.with_env(\"port\", \"8001\")\n.with_env(\"dapr_http_port\", \"3501\")\n.with_env(\"dapr_grpc_port\", \"50002\")<\/code><\/pre>\n<p>A good improvement would be to mount the same <code>.env<\/code> file that&#8217;s used for either local development or <code>QA<\/code>. This would ensure that environment values are kept, updated and maintained in a single place. One way to do this would be to add it as a <code>volume<\/code> to the container and make the app load it on startup. <\/p>\n<pre><code class=\"language-python\">.with_volume_mapping()<\/code><\/pre>\n<h4>Use marks<\/h4>\n<p>Because of the way that <code>pytest<\/code> discovers tests, when we run <code>pytest<\/code>, every single test will fire. To optimise this, we can use <a href=\"https:\/\/docs.pytest.org\/en\/stable\/example\/markers.html\">marks<\/a> to let <code>pytest<\/code> know which components each test uses. Let&#8217;s illustrate with an example. <\/p>\n<pre><code class=\"language-python\">@pytest.mark.a\ndef test_component_a():\n    pass\n\n@pytest.mark.a\n@pytest.mark.b\ndef test_component_a_b():\n    pass\n\n@pytest.mark.b\ndef test_component_b():\n    pass<\/code><\/pre>\n<p>By having these marks defined, we can run <code>pytest -m {mark}<\/code> and target specific tests. If this is configured correctly, one can imagine a pipeline that detects which components have been modified and fires only the tests for the components that need testing. <\/p>\n<h2>Conclusion<\/h2>\n<p>By integrating Dapr with Testcontainers and Pytest, we established a reliable testing framework that ensures isolated, repeatable, and real-world scenario testing. 
This approach minimizes configuration overhead while maintaining production-like conditions for validation.<\/p>\n<p>We hope this article provides useful insights for implementing integration testing in your own projects.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog post discusses setting up a Pytest integration test framework for a system using Dapr and Testcontainers. This framework was initially set up for a customer to suit their needs and has since been extracted into a sample project to provide a starting point.<\/p>\n","protected":false},"author":187801,"featured_media":16317,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,3451],"tags":[60,3443,3597,300,3596,3598,3346],"class_list":["post-16194","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cse","category-ise","tag-azure","tag-dapr","tag-pytest","tag-python","tag-redis","tag-testcontainers","tag-testing"],"acf":[],"blog_post_summary":"<p>This blog post discusses setting up a Pytest integration test framework for a system using Dapr and Testcontainers. 
This framework was initially set up for a customer to suit their needs and has since been extracted into a sample project to provide a starting point.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/16194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/187801"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=16194"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/16194\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/16317"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=16194"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=16194"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=16194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}