
By July 2025, digital accessibility will no longer be just a goal – it will be a requirement. With AI and cutting-edge technologies, we can simplify the journey toward inclusivity and legal compliance.
This piece explores both the why behind accessibility and the how of using AI to help. We’ll delve into a technical proof-of-concept, discussing the architecture and specific tools like Semantic Kernel and OmniParser. Expect a moderate level of technical detail suitable for developers and AI practitioners looking into practical applications.
The Importance of Accessibility
A moral imperative and a legal obligation – accessibility has become a fundamental requirement for digital products and services. According to Eurostat, 101 million or 1 in 4 adults in the European Union have a disability (Source: Europa Consilium). This number is expected to rise as the population ages. These individuals often encounter barriers that hinder equal participation in education, employment, and everyday life. At the heart of the European Accessibility Act (EAA) and Germany’s Barrierefreiheitsstärkungsgesetz (BFSG) is the commitment to eliminate these barriers in the digital space. These aren’t just abstract values, either, as they come with enforceable deadlines.
Why Accessibility Design is Critical Today
Statistical evidence highlights why designing accessible websites is essential – not just for compliance, but to create meaningful digital experiences and gain a competitive advantage. For example:
- 70% of users with disabilities will abandon a website if it isn’t accessible (Source: Click-Away Pound).
- More than 2.2 billion people worldwide have visual impairments, which means over 27% of the global population can benefit from using screen readers to navigate the web (Source: Microsoft).
- Businesses could unlock a market of $13 trillion by embracing disability inclusion, with web accessibility being a crucial component of this strategy (Source: Forbes).
- Digital products that fully comply with WCAG 2.0 are expected to have a 50% higher market performance than their competitors (Source: Gartner).
- Businesses with non-compliant digital platforms risk legal challenges, fines, and reputational harm. In 2024, in the US alone, there were over 4,000 lawsuits related to non-compliance (Source: accessibility.works).
There are many more statistics that underscore the importance of accessibility, but the bottom line is clear: accessibility is a strategic imperative.
The Role of AI in Accessible Development
AI offers the potential to automate, streamline, and scale the analysis of websites against international accessibility guidelines like WCAG (Web Content Accessibility Guidelines). By leveraging machine learning and natural language processing models, AI-powered tools can examine web content for typical accessibility issues such as:
- Missing or insufficient alt text for images.
- Improper semantic structure in HTML or ARIA (Accessible Rich Internet Applications) tags.
- Inaccessible user journeys, such as broken keyboard-only navigation paths.
- Form fields without proper types, labels, or instructions.
In addition to analyzing the content, AI can interpret visual elements such as layout, color contrast, and text size to identify potential issues that could impact users with visual or cognitive impairments. Using AI tools such as Microsoft’s OmniParser, we can visually extract and analyze the elements of a website, which in turn helps the language models understand the context of the content.
Analyzing Accessibility
As part of our quarterly innovation days at Xebia, my colleagues and I came up with the idea to explore how AI, in its current state, could help us analyze the accessibility of a website. The idea was to leverage modern tools and frameworks to identify accessibility issues and provide actionable insights. The goal was not just to detect problems, but to demonstrate how the findings from multiple agents could be combined into a comprehensive accessibility report. At the end of that day we had a working proof of concept that could analyze a website for accessibility issues.
Overview of the Solution
The solution we built is centered around two major components, where the analysis part takes the spotlight (we’ll dive deeper into the details later):
- Capturing the website content and visuals. Using a headless browser, we obtain the fully rendered HTML source code and a screenshot of the website.
- Analyzing the captured results. This involves using Semantic Kernel’s process framework to orchestrate the analysis workflow with multiple specialized AI agents.
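To sketch what this orchestration looks like, the snippet below wires up a simplified version of the workflow with Semantic Kernel's Process Framework. The framework is still experimental, and the step classes and event names here are invented for illustration rather than taken from the actual POC code:

    using Microsoft.SemanticKernel;

    // Build the parent process and register the (hypothetical) steps.
    ProcessBuilder processBuilder = new("AccessibilityAnalysis");
    var captureStep = processBuilder.AddStepFromType<CaptureWebsiteStep>();
    var htmlStep = processBuilder.AddStepFromType<HtmlAnalysisStep>();
    var visualStep = processBuilder.AddStepFromType<VisualAnalysisStep>();
    var aggregateStep = processBuilder.AddStepFromType<AggregateResultsStep>();

    // The external "StartAnalysis" event triggers the capture step.
    processBuilder
        .OnInputEvent("StartAnalysis")
        .SendEventTo(new ProcessFunctionTargetBuilder(captureStep));

    // Fan the captured content out to both analysis branches.
    captureStep.OnEvent("WebsiteCaptured")
        .SendEventTo(new ProcessFunctionTargetBuilder(htmlStep));
    captureStep.OnEvent("WebsiteCaptured")
        .SendEventTo(new ProcessFunctionTargetBuilder(visualStep));

    // Both branches report their findings to the aggregation step.
    htmlStep.OnEvent("HtmlAnalysisCompleted")
        .SendEventTo(new ProcessFunctionTargetBuilder(aggregateStep));
    visualStep.OnEvent("VisualAnalysisCompleted")
        .SendEventTo(new ProcessFunctionTargetBuilder(aggregateStep));

    // Run the process for a given URL (assumes a configured Kernel instance).
    KernelProcess process = processBuilder.Build();
    await process.StartAsync(kernel, new KernelProcessEvent { Id = "StartAnalysis", Data = "https://example.com" });

    // A step is a class with [KernelFunction] methods that can emit events, e.g.:
    public class CaptureWebsiteStep : KernelProcessStep
    {
        [KernelFunction]
        public async Task CaptureAsync(KernelProcessStepContext context, string url)
        {
            var captured = await WebsiteCapture.CaptureAsync(url); // hypothetical Playwright wrapper
            await context.EmitEventAsync(new() { Id = "WebsiteCaptured", Data = captured });
        }
    }

The event-driven wiring is what makes it easy to add further analyzers later: a new agent is just another step subscribed to the "WebsiteCaptured" event.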
For the Proof of Concept, we selected a set of analyzers to focus on specific WCAG criteria that demonstrate the potential of this system. The POC includes the following analyzers:
- HTML Content Analysis:
  - Alt text analysis.
  - Semantic code validation.
  - Keyboard navigation checks.
  - Form validation.
- Visual Analysis:
  - Color contrast analysis.
  - Font size and readability evaluation.
Instead of building a complete testing suite to cover every WCAG criterion, we intentionally scoped the POC to include these six analyzers. This allowed us to test the concept and validate the architecture without overwhelming scope or complexity.
To house the solution, we implemented it as a Minimal API using .NET 9. Minimal APIs in .NET provide a lightweight framework for quickly developing APIs with just the core essentials.
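To give a sense of how small the hosting layer can stay, an endpoint for this could look roughly like the snippet below. Note that the AccessibilityAnalyzer service and its AnalyzeAsync method are placeholder names for this article, not the actual types from the repository:

    var builder = WebApplication.CreateBuilder(args);

    // Register the (placeholder) analyzer service that wraps capture and analysis.
    builder.Services.AddSingleton<AccessibilityAnalyzer>();

    var app = builder.Build();

    // A single endpoint: POST a URL, receive the structured accessibility report as JSON.
    app.MapPost("/analyze", async (AnalyzeRequest request, AccessibilityAnalyzer analyzer) =>
    {
        var report = await analyzer.AnalyzeAsync(request.Url);
        return Results.Ok(report);
    });

    app.Run();

    public record AnalyzeRequest(string Url);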
From a high-level perspective, the process consists of two stages – capturing the rendered website, then analyzing the captured content – which the following sections walk through.
Capturing the Rendered Website
A simple HTTP request to the URL of the website is not enough to retrieve the website’s content or take an accurate screenshot. Websites today often rely heavily on client-side JavaScript for rendering dynamic elements such as buttons, menus, or even text content. To ensure the page is rendered completely as intended, we need to simulate a user environment via a headless browser.
For this Proof of Concept, we used the Playwright library. Playwright loads the website in a headless browser, giving JavaScript the chance to execute and ensuring a fully rendered page. Once rendered, the HTML source is captured along with a full-page screenshot.
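In code, the capture step boils down to only a few Playwright calls. The snippet below is a simplified sketch of that step (the URL and option values are examples, and the POC may structure this differently):

    using Microsoft.Playwright;

    var url = "https://example.com";

    // Launch a headless Chromium instance, let client-side JavaScript run,
    // then grab the fully rendered HTML and a full-page screenshot.
    using var playwright = await Playwright.CreateAsync();
    await using var browser = await playwright.Chromium.LaunchAsync(new() { Headless = true });

    var page = await browser.NewPageAsync();
    await page.GotoAsync(url, new() { WaitUntil = WaitUntilState.NetworkIdle });

    string renderedHtml = await page.ContentAsync();
    byte[] screenshot = await page.ScreenshotAsync(new() { FullPage = true });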
Structured Data and LLM
The structured data generated by OmniParser, along with the screenshot captured by Playwright, forms the foundation for the visual analysis. By including the structured data in the LLM’s prompt alongside the screenshot, we aim to give the LLM additional contextual information about the website’s visual elements, such as their content, placement, and interactivity.
While we hypothesize that this dual-input approach enhances the LLM’s ability to evaluate accessibility issues, this has not yet been conclusively tested. In theory, the structured data could provide explicit spatial relationships and content details, helping the LLM analyze visual elements more comprehensively. Combined with the screenshot, these inputs might allow the LLM to cross-reference visual and semantic contexts to identify potential issues with layout, contrast, or interactivity.
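To make this concrete, the sketch below shows how such a dual-input prompt could be assembled with Semantic Kernel's chat completion abstractions. The variable names (kernel, visualAnalyzerSystemPrompt, omniParserJson, screenshot) are assumptions for this example, and the ImageContent overload shown requires a reasonably recent Semantic Kernel version:

    using Microsoft.SemanticKernel;
    using Microsoft.SemanticKernel.ChatCompletion;

    var chat = kernel.GetRequiredService<IChatCompletionService>();

    var history = new ChatHistory();
    history.AddSystemMessage(visualAnalyzerSystemPrompt); // defines the agent's role and expected JSON output format

    // One user message carrying both OmniParser's structured element data and the screenshot.
    history.AddUserMessage(new ChatMessageContentItemCollection
    {
        new TextContent(
            "Analyze this page for color contrast issues (WCAG 1.4.3). " +
            "Detected UI elements:\n" + omniParserJson),
        new ImageContent(screenshot, "image/png"),
    });

    var response = await chat.GetChatMessageContentAsync(history);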
Working with the Results
To ensure that the output from each analysis step is structured and easy to process, we include a system message in every prompt that defines the expected output format. This approach standardizes the results from each analyzer, making it easy to deserialize the data.
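As an illustration (the exact wording used in the POC may differ), such a system message could look like this:

    You are an accessibility expert. Analyze the provided content against the listed
    WCAG success criteria. Respond ONLY with a JSON object in the following format and
    include no text outside of it:
    { "analysis": [ { "id": "...", "error": "...", "category": "...",
                      "description": "...", "detail": "...", "location": "..." } ] }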
Once all analysis steps and their sub-processes are complete, the parent process aggregates the results from both the HTML Content Analysis and Visual Analysis sub-processes. The final output is a structured JSON object that includes:
{
  "analysis": [
    {
      "id": "2.1.1", // WCAG success criterion reference, e.g., "1.1.1"
      "error": "Code.Keyboard.Focusable", // unique error code; prefixed 'Code' if the HTML was analyzed, 'Visual' if the screenshot was analyzed
      "category": "Keyboard Accessibility", // category of the agent performing the analysis
      "description": "Element is not keyboard-focusable.", // very brief description of the issue, acts as the title
      "detail": "All interactive elements must be accessible via keyboard navigation.", // description of the issue
      "location": "Line 10, Column 20: <span id=\"foo\">Bar</span>" // location of the issue in the code, or a description of where in the screenshot if a screenshot was provided
    }
  ]
}
To help illustrate how the results appear in practice, here’s an example of the JSON output generated by the analysis process:
{
  "analysis": [
    {
      "id": "1.1.1",
      "error": "Code.Image.MissingAlt",
      "category": "Alt Text for Images",
      "description": "Meaningful image lacks alt text.",
      "detail": "The image at <img src=\"og-image.png\"> is meaningful but does not have alt text. Add descriptive alt text.",
      "location": "Element in meta tag with attribute: property='og:image'",
      "information": []
    },
    {
      "id": "2.4.3",
      "error": "Code.Keyboard.TabOrder",
      "category": "Keyboard Accessibility",
      "description": "Incorrect Tab Order value used.",
      "detail": "`tabindex=\"-1\"` on non-modal elements disrupts logical navigation as it removes them from the tab order.",
      "location": "Line 10: <span tabindex=\"-1\"></span>",
      "information": []
    },
    {
      "id": "3.1.2",
      "error": "Forms.Missing.Label",
      "category": "Form Accessibility",
      "description": "Missing form labels.",
      "detail": "The 'Search' button lacks a proper label or an aria-label to describe its purpose.",
      "location": "Line containing `<button aria-label='Search'>`.",
      "information": []
    },
    {
      "id": "1.3.1",
      "error": "Forms.Placeholder.Misuse",
      "category": "Form Accessibility",
      "description": "Placeholder as label.",
      "detail": "The 'Search' button uses the placeholder 'Search' as the only means of labeling, which is an incorrect accessibility practice.",
      "location": "Line containing `<span class='DocSearch-Button-Placeholder'>Search</span>`.",
      "information": []
    },
    ...
  ],
  "summary": {
    "failed": 12
  }
}
The list of analysis results is returned to the API function called by the user and sent back as a structured JSON response. Our Proof of Concept ends here, and we did not implement a UI to visualize the results. However, the JSON output can be easily used in a front-end framework or reporting tool to display the findings.
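For example, on the consuming side the response can be mapped onto a few simple record types. The type and property names below are illustrative and simply mirror the JSON structure shown above (the json variable is assumed to hold the response body):

    using System.Text.Json;

    // Map the aggregated response onto simple record types (property names match the
    // JSON shown above; case-insensitive matching handles the lower-case names).
    var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
    AccessibilityReport? report = JsonSerializer.Deserialize<AccessibilityReport>(json, options);

    public record AccessibilityReport(List<AnalysisResult> Analysis, ReportSummary Summary);
    public record AnalysisResult(string Id, string Error, string Category, string Description, string Detail, string Location);
    public record ReportSummary(int Failed);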
Conclusion
This POC demonstrated the potential of combining Semantic Kernel, OpenAI models, and tools like OmniParser to automate website accessibility analysis. The solution achieved solid results as a foundational starting point, particularly in analyzing HTML structure and visual elements for WCAG compliance.
We relied on Semantic Kernel to handle the orchestration of the analysis workflow. Its process framework simplified the implementation of the multi-step process by enabling state management, event handling, and coordination between agents. This modular design made it easy to develop and expand the system without introducing unnecessary complexity.
The OpenAI models performed well in certain areas, such as diagnosing alt text issues and semantic code validation, but they were not without limitations. Token limits presented challenges when sending larger sections of the HTML source to multiple agents, requiring careful adjustments to avoid truncating important content. To control costs, we transitioned from GPT-4 to GPT-3.5 during development. Both GPT-4 and GPT-3.5 occasionally produced hallucinations, such as generating non-existent HTML code. These issues show the need for additional layers of validation when using language models for tasks that require high accuracy.
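One inexpensive validation layer – not implemented in the POC, but sketched here as an idea – is to discard findings whose cited markup does not actually occur in the captured HTML. AnalysisResult is the record type from the deserialization sketch above:

    // Keep only findings whose cited markup actually appears in the captured HTML.
    static IEnumerable<AnalysisResult> FilterUnverifiedFindings(IEnumerable<AnalysisResult> results, string renderedHtml)
    {
        foreach (var result in results)
        {
            // Locations look like "Line 10: <span tabindex=\"-1\"></span>" – extract the snippet part.
            var snippetStart = result.Location.IndexOf('<');
            if (snippetStart < 0 || renderedHtml.Contains(result.Location[snippetStart..]))
            {
                yield return result;
            }
        }
    }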
There were also some unexpected challenges with integrating OmniParser. The instance hosted on Hugging Face returned errors during testing, which required us to self-host OmniParser to proceed.
Despite these challenges, the results showed that AI can – at this stage – uncover a range of accessibility issues and provide insights that are actionable for developers. The structured JSON outputs from our system lay the groundwork for integration into frontend tools or reporting platforms.
What Could Be Improved
While the demo showed promising results, there are several areas that could be refined for future iterations, including:
Specialized Models for Accessibility Analysis
Currently, general-purpose language models like GPT-3.5 and GPT-4 are being used for analysis. While these models performed well, developing or fine-tuning specialized models focused specifically on accessibility issues could improve accuracy and reduce the chances of hallucination. A model trained on accessibility-specific datasets would also be better equipped to handle nuances in WCAG compliance.
Optimized Token Usage
Token limits were one of the key challenges we encountered when processing larger sections of source code in parallel. This limitation required careful adjustments to ensure no critical information was truncated before analysis. One way to address this issue might be to minify the HTML code prior to sending it to the models, reducing token usage. However, further testing would be needed to verify that this doesn’t degrade detection accuracy, especially for scenarios where formatting or element relationships play an important role.
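As a very rough idea of what such a preprocessing step could look like, here is a naive whitespace collapse – emphatically not a spec-compliant HTML minifier:

    using System.Text.RegularExpressions;

    // Naive whitespace collapse to save tokens – NOT a proper HTML minifier,
    // and it will mangle whitespace-sensitive content such as <pre> blocks.
    static string CollapseWhitespace(string html)
    {
        var withoutComments = Regex.Replace(html, "<!--.*?-->", string.Empty, RegexOptions.Singleline);
        return Regex.Replace(withoutComments, @"\s+", " ").Trim();
    }

Note that collapsing whitespace also changes line and column positions, so the reported location values would then have to point at the minified source or use element selectors instead.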
Another solution is to explore Large Language Models with larger context windows. For example, newer models like Gemini 2.0 Pro offer a context window of up to 2 million tokens, far exceeding the limits of GPT-3.5 or GPT-4.
In addition, Azure OpenAI Service provides flexibility in managing token usage. Token rate limits can be adjusted via Azure AI Foundry for the selected LLM deployment, allowing developers to fine-tune the settings to better suit specific workloads. By tweaking these settings, it’s possible to mitigate some of the practical limits imposed during input processing and optimize workflows for more complex, large-scale analysis.
Discovering Playwright MCP
While writing this article, I came across Microsoft’s Playwright MCP repository, which provides LLMs with the ability to analyze and interact with web pages using Playwright.
MCP (Model Context Protocol) acts as a standardized way for AI applications to connect with data sources and tools. Anthropic describes MCP as a “USB-C for AI applications”, simplifying how LLMs exchange context with external systems.
This would be a whole article on its own.
Playwright MCP allows LLMs to navigate web pages, fill out forms, click buttons, and extract data. A key feature is its ability to use the accessibility tree of a webpage, which represents UI elements as seen by assistive technologies. This lets the tool analyze UI interfaces without requiring the LLM to process raw HTML directly. Playwright MCP also supports screenshot-based analysis through a visual mode, making it a flexible option for different types of workflows.
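Getting started is lightweight: an MCP-capable client typically only needs a small configuration entry pointing at the server. The snippet below shows the common mcpServers configuration shape with the published @playwright/mcp package; the exact keys may vary per client:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}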
Given its ability to interact with the accessibility tree, Playwright MCP could complement OmniParser in future iterations. By providing a direct, standards-compliant mechanism to evaluate UI elements, it could reduce the reliance on standalone visual parsing tools and on the raw HTML source, potentially simplifying the architecture while improving results.
Definitely worth checking out!
Final Thoughts
With the July 2025 deadline for compliance with accessibility standards in the EU fast approaching, the need to prioritize digital accessibility has never been greater. AI-driven tools, such as the ones explored in this POC, can play a growing role in supporting developers and accessibility experts by identifying potential issues and providing actionable insights.
While AI is not a replacement for static analysis tools or human expertise, it can complement existing efforts by detecting areas that might otherwise be overlooked and speeding up the auditing process. Despite its current limitations, the progress achieved here demonstrates how AI can meaningfully contribute to accessibility analysis today – and its role is only set to grow as we refine these tools further.
All the code can be found on GitHub: https://github.com/xebia/AccessibilityAnalyzer