{"id":71324,"date":"2025-07-24T22:47:36","date_gmt":"2025-07-25T06:47:36","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/devops\/?p=71324"},"modified":"2025-07-24T22:47:36","modified_gmt":"2025-07-25T06:47:36","slug":"from-manual-testing-to-ai-generated-automation-our-azure-devops-mcp-playwright-success-story","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/devops\/from-manual-testing-to-ai-generated-automation-our-azure-devops-mcp-playwright-success-story\/","title":{"rendered":"From Manual Testing to AI-Generated Automation: Our Azure DevOps MCP + Playwright Success Story"},"content":{"rendered":"<p>In today\u2019s fast-paced software development cycles, manual testing often becomes a significant bottleneck. Our team was facing a growing backlog of test cases that required repetitive manual execution\u2014running the entire test suite every sprint. This consumed valuable time that could be better spent on exploratory testing and higher-value tasks.<\/p>\n<p>We set out to solve this by leveraging Azure DevOps\u2019 new <a href=\"https:\/\/devblogs.microsoft.com\/devops\/azure-devops-mcp-server-public-preview\/\" target=\"_blank\">MCP server<\/a> integration with GitHub Copilot to automatically generate and run end-to-end tests using <a href=\"https:\/\/playwright.dev\/\" target=\"_blank\">Playwright<\/a>. This powerful combination has transformed our testing process:<\/p>\n<ul>\n<li><strong>Faster test creation<\/strong> with AI-assisted code generation<\/li>\n<li><strong>Broader test coverage<\/strong> across critical user flows<\/li>\n<li><strong>Seamless CI\/CD integration<\/strong>, allowing hundreds of tests to run automatically<\/li>\n<li><strong>On-demand test execution<\/strong> directly from the Azure Test Plans experience (associating Playwright JS\/TS tests with manual test cases is coming soon. 
Keep an eye on our <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/devops\/release-notes\/2025\/testplans\/sprint-258-update\" target=\"_blank\">release notes<\/a> for the announcement.)<\/li>\n<\/ul>\n<p>By automating our testing pipeline, we\u2019ve significantly reduced manual effort, improved test reliability, and accelerated our release cycles. In this post, we\u2019ll share how we did it.<\/p>\n<h1>How We Turn Test Cases into Automated Scripts (Step-by-Step)<\/h1>\n<p>Enabling this AI-driven workflow required a few pieces to come together. Here\u2019s how the process works from start to finish: <a href=\"https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/06\/Screenshot-2025-06-26-at-8.58.38.png\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/06\/Screenshot-2025-06-26-at-8.58.38.png\" alt=\"Screenshot showing the steps to generate automated Playwright tests.\" width=\"930\" height=\"1004\" class=\"aligncenter size-full wp-image-71355\" srcset=\"https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/06\/Screenshot-2025-06-26-at-8.58.38.png 930w, https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/06\/Screenshot-2025-06-26-at-8.58.38-278x300.png 278w, https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/06\/Screenshot-2025-06-26-at-8.58.38-768x829.png 768w\" sizes=\"(max-width: 930px) 100vw, 930px\" \/><\/a><\/p>\n<p>By following the above loop for each test case (and you can do it in bulk by passing an entire Test Suite to GitHub Copilot), we gradually turned an entire manual test suite into an automated one (we have hundreds of test cases for our own domain alone, and over a thousand test cases for the entire project). The MCP server and Copilot essentially handled the heavy lifting of writing code, while our team oversaw the process and made minor adjustments. 
It felt almost like magic \u2013 describing a test in plain English and getting a runnable automated script in return!<\/p>\n<h1>Challenges and Lessons Learned<\/h1>\n<ul>\n<li><strong>The prompt is king!<\/strong> It goes without saying &#8211; how you prompt the AI matters. A clear, specific prompt yields better results. In our case, breaking the task into two prompts (\u201cfetch test case\u201d then \u201cgenerate script\u201d) produced more reliable code than a single combined prompt. We also sometimes had to experiment with phrasing \u2013 e.g. using the exact wording \u201cconvert the above test case steps to Playwright script\u201d worked better than a vaguer command. In addition, make sure to point the model to relevant code\/files where you have existing tests. The more references you give, the more accurate the newly generated script will be. It\u2019s a bit of an art, but the more we used it, the more we developed a feel for what phrasing GitHub Copilot responds best to. Thankfully, our test case descriptions were usually detailed and structured, which made it easier for the AI to identify the sequence of actions.<\/li>\n<li><strong>Quality of context:<\/strong> You\u2019ll need to spend extra time on one of two things:<\/li>\n<\/ul>\n<ol>\n<li>\n<p>Either improve your test cases in Azure DevOps by writing clearer, more detailed steps,<\/p>\n<\/li>\n<li>\n<p>Or spend more time fixing the generated scripts later.<\/p>\n<\/li>\n<\/ol>\n<p>If you choose to improve the test cases, make sure they are specific. 
Here are some examples of vague and specific steps:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/07\/example-test-case-steps.png\" alt=\"A table with examples of vague and specific test case steps.\" width=\"1232\" height=\"596\" class=\"aligncenter size-full wp-image-71550\" srcset=\"https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/07\/example-test-case-steps.png 1232w, https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/07\/example-test-case-steps-300x145.png 300w, https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/07\/example-test-case-steps-1024x495.png 1024w, https:\/\/devblogs.microsoft.com\/devops\/wp-content\/uploads\/sites\/6\/2025\/07\/example-test-case-steps-768x372.png 768w\" sizes=\"(max-width: 1232px) 100vw, 1232px\" \/><\/p>\n<ul>\n<li><strong>Handling of Non-Textual Steps:<\/strong> Some test scenarios involve graphics or media (for example, \u201cverify the chart looks correct\u201d or checking an image). The current Copilot agent cannot interpret images or visual assertions \u2013 its domain is text. Our proof-of-concept runs confirmed that if a test step said \u201ccompare screenshot,\u201d the AI would not magically do image comparison. The workaround is to adjust such steps to something verifiable via the DOM or data (or handle those cases manually for now). In practice, this was a minor limitation \u2013 the vast majority of our test steps were things like \u201cclick this\u201d or \u201center that,\u201d which the AI handles well. But it\u2019s good to be aware: for purely visual verifications, you\u2019ll need to supplement with traditional methods, or use Playwright\u2019s screenshot assertions with predefined baseline images.<\/li>\n<\/ul>\n<h1>Appendix<\/h1>\n<h2>Prompts<\/h2>\n<p>Below are the two prompts that can help you get started. 
After you generate the scripts, tweak them until they run successfully locally. Once you are happy with the scripts, you can create an <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/playwright-testing\/quickstart-automate-end-to-end-testing\" target=\"_blank\">Azure Pipeline<\/a> to execute them on a regular basis.<\/p>\n<p>Make sure to tailor the prompts to your specific needs and context &#8211; this will help Copilot generate higher-quality scripts.<\/p>\n<p><strong>Prompt 1:<\/strong><\/p>\n<pre><code class=\"plaintext\">Get me the details of the test cases (do not action anything yet, just give me the details of each test case).\n\nTest Information:\n\n*   ADO Organization: Org_Name\n\n*   Project: Project_Name\n\n*   Test Plan ID: Test_Plan_ID\n\n*   Test Suite ID: Test_Suite_ID\n\n<\/code><\/pre>\n<p>After Copilot gets the details for each test case via the MCP server, use the following prompt:<\/p>\n<p><strong>Prompt 2:<\/strong><\/p>\n<pre><code class=\"plaintext\">Imagine you are an experienced Software Engineer helping me write high-quality Playwright test scripts in TypeScript based on the test cases I provided. Please go over the task twice to make sure the scripts are accurate and reliable. Avoid making things up and do not hallucinate. 
Use all the extra information outlined below to write the best possible scripts, tailored to my project.\n\n# Project Context\n\nLook at the \"Project_name\" folder to get more insights (if your project is quite large, use the section below to be more concrete and reference specific folders\/files).\n\nMy project structure includes:\n\n*   Authentication helpers: \/\/*Add\/folder\/path*\n\n*   Existing sample tests: \/\/*Add\/folder\/path*\n\n*   Playwright config: \/\/*Add\/folder\/path*\n\n*   Test Structure: \/\/*Add\/folder\/path\/test-1656280.spec.ts*\n\n*   The project\u2019s UX components are in the following folder: \/\/*Add\/folder\/path*.\n\n# Test Structure Requirements\n\nFor each test, please follow this structure:\n\n1.  Clear test description using *'test.describe()'* blocks\n\n2.  Proper authentication setup before any page navigation\n\n3.  Robust selector strategies with multiple fallbacks\n\n4.  Detailed logging for debugging\n\n5.  Screenshot captures at key points for verification\n\n6.  Proper error handling with clear error messages\n\n7.  Appropriate timeouts and wait strategies\n\n8.  Verification\/assertion steps that match the test case acceptance criteria\n\n# Robustness Requirements\n\nEach test should include:\n\n1.  Retry mechanisms for flaky UI elements\n\n2.  Multiple selector strategies to find elements\n\n3.  Explicit waits for network idle and page load states\n\n4.  Clear logging of each test step\n\n5.  Detailed error reporting and screenshots on failure\n\n6.  Handling of unexpected dialogs or notifications\n\n7.  Timeout handling with clear error messages\n\n# Environmental Considerations\n\nThe tests will run in:\n\n*   CI\/CD pipeline environments\n\n*   Headless mode by default\n\n*   Potentially with network latency\n\n*   Different viewport sizes\n\n# Example Usage\n\nPlease provide a complete implementation with:\n\n1.  Helper functions for authentication and common operations\n\n2.  
Full test implementation for each test case\n\n3.  Comments explaining complex logic\n\n4.  Guidance on test execution\n\n# Authentication Approach\n\nIn order to execute the tests, we need to authenticate with the application. Use the auth approach below:\n\n\/\/{you need to define the authentication steps \u2013 if this is already defined for your project, instruct Copilot how to use it. If your scenarios do not require auth, you can remove this part from the prompt.}\n\n# Configuration Reference\n\nFor timeouts, screenshot settings, and other configuration options, please refer to:\n\n\/\/{Add a reference to a specific file, etc. for better context}\n\nI want these tests to be maintainable, reliable, and provide clear feedback when they fail.<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In today\u2019s fast-paced software development cycles, manual testing often becomes a significant bottleneck. Our team was facing a growing backlog of test cases that required repetitive manual execution\u2014running the entire test suite every sprint. This consumed valuable time that could be better spent on exploratory testing and higher-value tasks. We set out to solve this [&hellip;]<\/p>\n","protected":false},"author":176848,"featured_media":71385,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,252],"tags":[7287,7262],"class_list":["post-71324","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops","category-testing","tag-automated-testing","tag-azure-devops"],"acf":[],"blog_post_summary":"<p>In today\u2019s fast-paced software development cycles, manual testing often becomes a significant bottleneck. Our team was facing a growing backlog of test cases that required repetitive manual execution\u2014running the entire test suite every sprint. 
This consumed valuable time that could be better spent on exploratory testing and higher-value tasks. We set out to solve this [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/posts\/71324","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/users\/176848"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/comments?post=71324"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/posts\/71324\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/media\/71385"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/media?parent=71324"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/categories?post=71324"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/devops\/wp-json\/wp\/v2\/tags?post=71324"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}