"Provide context and AI will do better" is a logical and widely accepted idea, for AI as for many other things. And I agree, up to a point – until I see obvious signs of models fighting me and the context I provided.
I still do my homework and check different approaches beyond (hopefully better than) prompting – at least for a couple of models that I have found promising and affordable (some of them open source). Markdown files as guardrails (some also call them skills); Model Context Protocol (MCP) with static rule analysis, or direct feedback loops from the browser; agents forcing AI to consider accessibility ("skills" in the loop); Retrieval-Augmented Generation (RAG) with carefully selected sources; reinforcement learning or amplification; and so on.
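To make the "static rule analysis in the loop" idea above concrete, here is a minimal sketch: run a trivial static accessibility rule over AI-generated HTML and turn the findings into feedback text for the next prompt. The single rule shown (img elements need an alt attribute) is a hypothetical placeholder of my own; real tooling covers far more, and the function names are invented for illustration.

```python
# Minimal sketch of static accessibility rules feeding back into an AI loop.
# The rule set here is deliberately tiny and hypothetical.
from html.parser import HTMLParser


class ImgAltChecker(HTMLParser):
    """Collects violations of one rule: every <img> needs an alt attribute."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append("img element is missing an alt attribute")


def a11y_feedback(generated_html: str) -> str:
    """Turn static-rule findings into text a model could receive as context."""
    checker = ImgAltChecker()
    checker.feed(generated_html)
    if not checker.violations:
        return "No violations found by the static rules."
    return "Fix these accessibility issues and regenerate:\n" + "\n".join(
        f"- {v}" for v in checker.violations
    )


print(a11y_feedback('<img src="logo.png"><img src="x.png" alt="Logo">'))
```

In a real loop, the returned feedback string would be appended to the conversation so the model must address the violations before its output is accepted.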
And it helps – often, or at least sometimes – until it does not. A recent academic article, Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? (opens in new window), even found that bolting on context manually can sometimes harm end results (and increase inference costs).
Once again I feel that we, as end users of the models, have only limited possibilities, and that model owners/trainers are the ones responsible for shifting accessibility to the left. Which is a hard thing for them to do – if they just crawl the web uncritically, they adopt its inaccessibility (and other biases, yes). Without accessibility specialists and people with disabilities on board and in control, they will not have a chance at shifting it left, at least not in a sustainable way.
Sure, crawling WCAG (the normative, and perhaps non-normative, parts?), crawling the ARIA documentation, crawling the ARIA Authoring Practices Guide (please check a post from Eric Bailey about LLMs and the APG, opens in new window) and then feeding the model was most probably a step taken years ago. But it does not take much testing to see the negative effects of built-in biases from other sources – seemingly obsolete, sometimes even suggesting bad practices – with models not only hallucinating but being flat-out wrong. It seems that their "weights" are not working as they should, and that the sources were certainly not verified by accessibility subject matter specialists.
I clearly see this as another missed opportunity to "shift accessibility to the left" – and yes, you are right, not only accessibility, but many other vital concepts that matter to humanity.
I am not sure (though I suspect not) whether the big players in the accessibility field are building their own models from "scratch" – from verified sources, from their own peer-reviewed manual audits, for example. I can only hope, because I still miss having that possibility. Perhaps it is right around the corner, or perhaps it has failed due to too little training data or other issues. Time will tell.
Benchmarking models helps – and it shows that accessibility must cost less when it comes to AI
Different models are trained on different sources (that is another, often difficult subject I will not dive into here, but I wish authors' rights were respected by default), and competition is fierce – making authorship even less respected, I am afraid. Beyond that, we also need to advocate louder for accessibility benchmarking and for improved regulation. Some regulations I have seen, like the EU AI Act, are too broad when it comes to accessibility; I wish they would promote this shift left, but we need many more voices on different levels to hope for that, I am afraid.
AIMAC: The AI Model Accessibility Checker (opens in new window) and other benchmarks are extremely useful, and it looks like AI will soon deliver code that passes automated testing by default – but unfortunately it seems this will only be true for the more expensive models (at least for some time; luckily, some exceptions are already visible). Sure, we can put some kind of context-aware orchestration in place that selects between different models to optimize for cost and quality. But that is, again, just treating the symptoms, because accessibility was not integrated.
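The cost/quality orchestration mentioned above could look roughly like this sketch: pick the cheapest model that still clears an accessibility quality bar. The model names, prices, and benchmark scores below are hypothetical placeholders, not real benchmark data from AIMAC or anywhere else.

```python
# Sketch of cost/quality model routing driven by accessibility benchmark
# scores. All figures are invented placeholders for illustration.
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float    # hypothetical USD price
    a11y_benchmark_score: float  # hypothetical 0.0-1.0 benchmark result


MODELS = [
    Model("cheap-model", 0.0005, 0.55),
    Model("mid-model", 0.003, 0.78),
    Model("premium-model", 0.015, 0.92),
]


def pick_model(min_a11y_score: float) -> Model:
    """Return the cheapest model that meets the accessibility quality bar."""
    candidates = [m for m in MODELS if m.a11y_benchmark_score >= min_a11y_score]
    if not candidates:
        # No model is good enough: fall back to the best available score
        # (in practice this result should also be flagged for human review).
        return max(MODELS, key=lambda m: m.a11y_benchmark_score)
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


print(pick_model(0.75).name)  # → mid-model
```

The point of the sketch is exactly the problem described in the text: as long as only the expensive tiers clear the bar, the router must either pay more or accept worse accessibility.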
And here, dear reader, I miss the regulators. Accessibility is a human right – it must not be reserved for expensive AI models. That goes against shifting it left, and it means AI will spread inaccessibility exponentially if accessibility is not also financially accessible.
Therefore I hope that benchmarking, together with regulation, will help shift accessibility further to the left when it comes to making/training the models. Benchmarking is currently usable for cost optimization, but it should be used to improve the accessibility results of all models. Due to the nature of both AI technology and accessibility itself, we will still need human verification – I do not believe AI will independently just "work" for all scenarios any time soon – but it can help a lot, and it can reach far better accessibility than we currently have if training is more aware. I think most of us who are actively curious already recognize the positive impacts, but also the missing parts.
Context is still essential, but it seems it needs to be shifted to the left, not only bolted on – and with this we are yet again facing the same kind of issue we already fight elsewhere, which says a lot about the low accessibility maturity of big tech on a system level. Luckily, also thanks to the effects of legislation like the European Accessibility Act and similar, we now have better momentum, and awareness is greater than ever. I hope this will have spill-over effects that open hearts and wallets among AI model trainers.