AI will soon deliver code that passes automatic testing by default


Thanks to Michael Fairchild and Microsoft, we now have accessibility benchmarks for a selected range of AI models, where the models were tested with automatic accessibility tools after generating sample code. The project is called A11y LLM Eval (opens in new window), and it provides quite a few insights, both direct and indirect.

When I checked the shopping home page results I was pleasantly surprised, especially considering that the methodology deliberately did not prompt the AI to make things accessible. It is funny that a shopping home page was selected; it seems I am not the only one who thinks shopping websites are among the worst when it comes to accessibility.

I will not comment on the differences between models, you can check that out for yourself, but it is evident that OpenAI leads. Model comparison will forever be a moving target, so it is amazing to have this data (and the possibility to refresh it, expand it, and add models). I only wish that stakeholders behind the different model providers would compete here as well, and that accessibility will not be as forgotten as it sadly sometimes is. Perhaps this benchmark can be a motivator, if we spread awareness enough.

Color contrast failures are (still) the most common

Please take a moment to consider some important things here.

First, automatic accessibility testing tools have a limited capability to test for accessibility. Therefore, when we say that color contrast is the most common failure, we need to be aware that automatic testing only offers a limited set of rules. If we had the time to manually check all the sites, we might find that some other issue is more common, at least theoretically.

The other thing is that AI (well, LLMs) does not do math directly, so I really doubt that it does the “thinking” before it “defines” the colors (unless the prompt is explicit about it at least, and even then I would not trust it to respect the color contrast calculations). At the same time, I also suspect that LLMs simply learned from bad code anyway (considering that poor contrasts are everywhere).
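For illustration, here is a minimal sketch (in TypeScript, assuming hex colors as input) of the WCAG 2.x contrast ratio math that a model would actually have to perform to “think” about its color choices:

```ts
// Minimal sketch of the WCAG 2.x contrast ratio calculation for sRGB hex colors.

/** Convert an 8-bit sRGB channel to its linearized value. */
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

/** Relative luminance of a hex color such as "#1a73e8". */
function relativeLuminance(hex: string): number {
  const [r, g, b] = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16));
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

/** WCAG contrast ratio (always >= 1) between two hex colors. */
export function contrastRatio(fg: string, bg: string): number {
  const [lighter, darker] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

// Example: light grey text on white fails the 4.5:1 AA threshold for normal text.
console.log(contrastRatio("#999999", "#ffffff").toFixed(2)); // ≈ 2.85
```

A grey-on-white combination that “looks fine” in the training data can easily land below the 4.5:1 AA threshold, which is exactly the kind of failure the benchmark keeps finding.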

So I am not surprised, but I also see that we can (and must) do something about it, quite easily.

Helping AI to deliver better accessibility, again

As mentioned, I suspect that AI is not doing the math that would prevent poor contrasts, but we have the tools to help it do so. It would be best and most effective if AI vendors made sure of that on their side, but while we wait for that (sorry to say, we should not hold our breath), we can use Model Context Protocol (MCP) tools that check the contrast ratios before the AI delivers the code. Not only for contrast, of course, but since AI seems to struggle with contrast a lot, we must definitely include it.
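As a rough sketch only: such a tool could look something like the following, assuming the official TypeScript MCP SDK (@modelcontextprotocol/sdk), a hypothetical tool name check_contrast, and the contrastRatio() helper from the previous snippet saved as ./contrast.js:

```ts
// Rough sketch of a contrast-checking MCP server exposing one tool over stdio.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { contrastRatio } from "./contrast.js"; // WCAG helper from the previous snippet

const server = new McpServer({ name: "contrast-check", version: "0.1.0" });

// Tool the AI agent can call before it commits to a color pair.
server.tool(
  "check_contrast",
  {
    foreground: z.string().describe("Hex color, e.g. #333333"),
    background: z.string().describe("Hex color, e.g. #ffffff"),
    largeText: z.boolean().default(false),
  },
  async ({ foreground, background, largeText }) => {
    const ratio = contrastRatio(foreground, background);
    const threshold = largeText ? 3 : 4.5; // WCAG 2.x AA thresholds
    const verdict = ratio >= threshold ? "passes" : "fails";
    return {
      content: [
        {
          type: "text",
          text: `${foreground} on ${background}: ${ratio.toFixed(2)}:1, ${verdict} WCAG AA (needs ${threshold}:1).`,
        },
      ],
    };
  }
);

// Expose the tool over stdio so a local coding agent can use it.
await server.connect(new StdioServerTransport());
```

An AI agent wired to a server like this could then be instructed to call the tool for every color pair before writing it into the CSS.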

Google DevTools MCP is a good start to provide AI with better context, but it’s not enough.

We could have dedicated MCPs for contrast. I think we should consider a WCAG-first approach, but we could perhaps also be smart and use the Accessible Perceptual Contrast Algorithm (APCA, opens in new window) in a way where the AI prioritizes APCA but makes sure color combinations do not fail WCAG, to deliver conformance.
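Again just a sketch of the idea, not a definitive implementation: the selection logic could prefer the best APCA score while never accepting a pair that fails WCAG. The apcaLc scorer is injected here as a placeholder for a real APCA implementation (for example the apca-w3 package), and contrastRatio() is the WCAG helper from the first snippet:

```ts
// Sketch of "APCA-first, WCAG-conformant" color selection.
import { contrastRatio } from "./contrast.js";

type ColorPair = { foreground: string; background: string };

/**
 * Pick the pair with the strongest APCA Lc value, but only among pairs
 * that also meet the WCAG 2.x AA ratio, so conformance is never lost.
 */
function pickAccessiblePair(
  candidates: ColorPair[],
  apcaLc: (fg: string, bg: string) => number, // placeholder for a real APCA scorer
  wcagThreshold = 4.5
): ColorPair | undefined {
  return candidates
    .filter((p) => contrastRatio(p.foreground, p.background) >= wcagThreshold)
    .sort(
      (a, b) =>
        Math.abs(apcaLc(b.foreground, b.background)) -
        Math.abs(apcaLc(a.foreground, a.background))
    )[0];
}
```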

This kind of MCP is only useful if we do not have an accessible design system at hand, and if we have the freedom of color selection, of course. If we do have an accessible design system at hand, there are multiple ways to simply use it with the help of MCP and, ideally, never worry about color contrast again (as it should already be checked and implemented in the design system).

A reminder to conclude – human knowledge is essential

Satisfying automatic accessibility testing is nice, just like passing the syntax linting checks for your code. But passing linting does not mean you have no bugs or security issues. Likewise, passing automatic accessibility testing just means that your code survived a few dozen static code pattern checks; it does not mean that the website is actually accessible, and it absolutely does not mean that it is usable. We still need humans to check, test, and verify. Even if AI agents can use the browser, AI still benefits from good semantics and beyond.

Sure, more things can be checked with AI that go beyond the default code syntax checks of automatic testing. We can check the before and after states of components for a bit better coverage, we can check the language of the page and of its parts, and we can check more and more things, but in the end we still need human knowledge at all stages: from design to code, to content, to testing and beyond.

And to conclude: people with disabilities do not care if a website passes automatic tests, or even if it technically conforms to WCAG. They came to do or learn something, and we, who want more people to be able to do or learn something, need to bear the responsibility. With AI or without it.

Author: Bogdan Cerovac

I am an IAAP-certified Web Accessibility Specialist (since 2020) and was a Google-certified Mobile Web Specialist.

I work as a digital agency co-owner, web developer, and accessibility lead.

After work, I am the sole entrepreneur behind IDEA-lab Cerovac (Inclusion, Diversity, Equity and Accessibility lab). Check out my Accessibility Services if you want me to help you with digital accessibility.

Also head of the expert council at Institute for Digital Accessibility A11Y.si (in Slovenian).

I live and work in Norway (🇳🇴), am originally from Slovenia (🇸🇮), and love exploring the globe (🌐).

Nurturing the web since 1999, and this blog since 2019.

More about me and how to contact me: