How I imagine a modern automatic accessibility testing tool

What would I have in an automatic accessibility testing tool if I could have anything that is possible with today’s technology?
Well, I would start at the beginning: a clear scope and known priorities. We can’t always cover both when we have to choose where to focus. Next, I would like to teach the tool, so that it becomes more and more independent. And because I like to stand behind my decisions, I would like to use the blockchain to prove my efforts and fixes. Words can be empty; deeds talk.

Blockchain, artificial intelligence, machine learning, deep learning, neural networks and other “hot” technologies offer so much. But they are still not our daily companion, at least not to their full potential.

So I wanted to brainstorm a bit around these possibilities applied to automatic accessibility testing. How could we leverage them to help us build tools that can go beyond current WCAG and accessibility coverage?

We need help with scope on sites that we don’t know well or don’t have the data about

When I do audits it makes sense to map the target scope and WCAG-EM offers some good points, but when we don’t know anything about the site we have to audit we can still miss very important parts.

So my first idea for an improved tool would be to help me with scope. That would be fairly easy to achieve with normal crawling and static code analysis, but adding some simple machine learning to it would get us way further along.

We could try to map page complexity with the help of computer vision, find the different possible problems, collect their screenshots together with the code, and mark them for manual review.

Defining the priorities of an audit can be quite difficult, sometimes even impossible. Some customers define the scope for us, but when we audit different success criteria we notice that we can’t audit everything because we don’t know what the site contains. Are there forms on the site? Lists? Quotes? Tables? And so on. If we have to find them manually, we can spend a lot of time just locating them.

So automatic tools can save some time already in the scoping phase. Defining what to test after getting scope figured out is way more efficient.
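As a rough illustration of the static-analysis part of scoping, here is a minimal sketch that answers the "are there forms, lists, quotes, tables?" questions from HTML that a crawler has already fetched. The tag-to-content-type mapping and the function names are my own assumptions, not part of any existing tool, and it only sees static markup, not content rendered by JavaScript.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to the content types an auditor asks about.
CONTENT_TYPES = {
    "form": "forms",
    "ul": "lists",
    "ol": "lists",
    "dl": "lists",
    "blockquote": "quotes",
    "q": "quotes",
    "table": "tables",
    "video": "multimedia",
    "audio": "multimedia",
}


class ScopeInventory(HTMLParser):
    """Counts the content types found in one page's static HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        content_type = CONTENT_TYPES.get(tag)
        if content_type:
            self.counts[content_type] += 1


def inventory_page(html):
    """Returns a Counter of content types for a single page."""
    parser = ScopeInventory()
    parser.feed(html)
    parser.close()
    return parser.counts


def inventory_site(pages):
    """Maps each content type to the URLs where it occurs.

    `pages` is a dict of URL -> HTML string, assumed to come from a crawler.
    """
    found = {}
    for url, html in pages.items():
        for content_type in inventory_page(html):
            found.setdefault(content_type, []).append(url)
    return found
```

Calling `inventory_site({"/contact": "<form>...</form>"})` would return `{"forms": ["/contact"]}`, which is already enough to decide which success criteria need sample pages.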

Supervised accessibility auditing to make fully automatic audits with enough data

Manual review is a fact. We don’t have the tools that would run and just provide us with factual results. I’ve written about problems with automatic accessibility testing and we still need to manually check most of the results.

We often concentrate on individual components, removing duplicates that are used on many pages to save time and to focus on distinct components rather than on every occurrence. This kind of review would give the algorithms their samples for reinforcement learning, for future similar cases.

In practical terms, let’s imagine a datepicker component that’s not built semantically, but our tool can quickly find that there are multiple onclick events on it and also that it has something to do with getting date input from users.
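The first half of that detection can be sketched with static analysis alone. The example below flags elements that carry an inline `onclick` attribute but are not natively interactive, and guesses whether they are date-related from a few keywords. The keyword list and the output fields are assumptions for illustration, and handlers attached with `addEventListener` would need a real browser to discover.

```python
from html.parser import HTMLParser

# Elements that are natively interactive and keyboard-operable.
NATIVE_INTERACTIVE = {"a", "button", "input", "select", "textarea", "summary"}
# Hypothetical keyword hints that a component deals with dates.
DATE_HINTS = ("date", "calendar", "picker")


class ClickHandlerFinder(HTMLParser):
    """Collects non-semantic elements that have an inline onclick handler."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "onclick" not in attributes or tag in NATIVE_INTERACTIVE:
            return
        # Attribute values can be None (valueless attributes), hence `or ""`.
        haystack = " ".join(
            (attributes.get(name) or "") for name in ("class", "id", "onclick")
        ).lower()
        self.candidates.append({
            "tag": tag,
            "line": self.getpos()[0],
            "has_role": "role" in attributes,
            "focusable": "tabindex" in attributes,
            "date_related": any(hint in haystack for hint in DATE_HINTS),
        })


def find_candidates(html):
    """Returns the elements worth queueing for manual review."""
    finder = ClickHandlerFinder()
    finder.feed(html)
    finder.close()
    return finder.candidates
```

A candidate with `has_role` and `focusable` both false is the typical "clickable div" that mouse users can operate and keyboard users cannot.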

Manual review marks the problems, and everything is saved in the database – the HTML, the CSS, the screenshot and also the existing accessibility tree. We also add code examples on how to fix issues. Everything is then available next time the tool discovers a similar pattern / component.
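To make the "available next time" part concrete, here is a minimal sketch of such a record and a lookup. "Similar pattern" is reduced here to "same tag structure", which is a deliberately naive assumption; a real tool would more likely use learned similarity over code and screenshots. All class, field, and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Attributes that are kept in the fingerprint because they change behavior.
STRUCTURAL_ATTRIBUTES = ("role", "onclick", "tabindex")


class _Skeleton(HTMLParser):
    """Reduces markup to its tag structure, ignoring text and attribute values."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        names = sorted(name for name, _ in attrs if name in STRUCTURAL_ATTRIBUTES)
        self.parts.append(tag + "[" + ",".join(names) + "]")

    def handle_endtag(self, tag):
        self.parts.append("/" + tag)


def fingerprint(html):
    """Hashes the structural skeleton of a component's markup."""
    skeleton = _Skeleton()
    skeleton.feed(html)
    skeleton.close()
    return hashlib.sha256("|".join(skeleton.parts).encode("utf-8")).hexdigest()


@dataclass
class ReviewedComponent:
    """Everything the manual review saved about one component."""
    html: str
    css: str
    screenshot_path: str
    accessibility_tree: dict
    issues: list = field(default_factory=list)
    fix_example: str = ""


class ReviewStore:
    """In-memory stand-in for the database of reviewed components."""

    def __init__(self):
        self._by_fingerprint = {}

    def save(self, component):
        key = fingerprint(component.html)
        self._by_fingerprint[key] = component
        return key

    def lookup(self, html):
        """Returns an earlier review of a structurally identical component, if any."""
        return self._by_fingerprint.get(fingerprint(html))
```

With this, a datepicker cell reviewed once is recognized again on another page even when the visible day number and the handler arguments differ.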

Ideally, with enough data, we would get to a point where tools can mark failures of similar components and maybe even test them out with keyboard simulation and mouse simulation. That would extend the tool’s capabilities to detect problems and at least flag them for manual verification. Or maybe even flag them as absolute failures, given a high enough probability.

Mixing user interface graphics with code and structuring the data would improve future audits and increase stability and precision. It would need time and manual supervision, but in the long run we could make a tool that really helps us. A tool that pinpoints the worst problems, filters out obvious ones, and makes us focus on real problems.

In theory we could maybe connect our user experience monitoring (heatmaps, completion rates, failures) with such a tool and feed it user experience data as well, to flag additional problems that were actual barriers to actual people. That would be somewhat delicate concerning personal data, but on generic websites it could maybe be viable. Feeding algorithms with both expert audits and end user experiences would prioritize problem solving much more efficiently.

Before my imagination goes too far, I would like to list some things that should not be too complicated to analyze:

  • Language & language of parts – should be very easy to check if language in code matches language in content,
  • Expand / collapse missing semantics – testing a component, checking whether it expands / collapses visually, and then verifying the semantics and accessibility tree,
  • Button vs link – empty links or links with no real destination that are used for triggering JavaScript should be buttons, and buttons that navigate should be links,
  • Text in images – computer vision, character recognition and some machine learning would flag such images,
  • Contrast of interactive elements – taking into consideration different screen sizes (media queries in CSS and some generic sizes would for sure cover much of it), flagging it for manual review when in doubt,
  • Captions real time ok – video and audio processing is getting cheaper and more efficient – so it could be possible to verify that captions are done well, or at least flag problems,
  • Reflow problems – checking the visuals on specific screen dimensions that are defined in the WCAG and marking problems and potential problems,
  • Alternative texts for interactive elements – sometimes could be done automatically, for example by checking icons that are missing alternative text against a dictionary of icons and their meanings,
  • List missing semantics – checking visual lists that are not marked with semantic list HTML,
  • OCR/CV and accessibility tree comparison – to discover which parts of UI are not present in the accessibility tree or are missing semantics,
  • Link text in context – are links conforming to WCAG on level AA? Checking context in programmatically detected vicinity,
  • Screen orientation – visual checks for problems based on screen orientation,
  • Zoom problems – visual checks for problems when zooming in,
  • “Rage clicks” to discover hidden interactivity, capture it and analyze differences – potentially finding hidden form fields, hidden multimedia, and other hidden content,
  • Identify input purpose – checking automatically if autocomplete is needed, and if a form field is, for example, of type tel and not text,
  • Keyboard traps check – simulating keyboard navigation and checking all possible paths, to catch possible keyboard traps,
  • Keyboard navigate to all interactives – to check that all are accessible with keyboard,
  • Page titled based on context – text analysis to catch titles that are not valid for the page,
  • Focus order – visually checking focus order and finding problems,
  • Focus visible – visually checking that all interactive elements get a focus indication and that it has sufficient contrast,
  • Consistent identification – checking that components with the same functionality are also identified consistently,
  • Labels or instructions – checking that they are there and suiting the context,
  • Name, role, value checks based on visuals – comparing general components in the DOM, in the accessibility tree, and visually, and marking the differences for audit.
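Some items on this list are plain arithmetic once the colors are known. As one example, the contrast item can be sketched with the relative luminance and contrast ratio formulas from WCAG 2.x. The "manual review" band around the threshold is my own assumption (to allow for anti-aliasing and rounding when colors are sampled from screenshots), and the function names are hypothetical.

```python
def _linearize(channel):
    """Converts one 8-bit sRGB channel to linear light, as defined in WCAG 2.x."""
    value = channel / 255
    if value <= 0.03928:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    red, green, blue = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return (
        0.2126 * _linearize(red)
        + 0.7152 * _linearize(green)
        + 0.0722 * _linearize(blue)
    )


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    luminances = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    lighter, darker = luminances
    return (lighter + 0.05) / (darker + 0.05)


def classify_contrast(foreground, background, minimum=3.0, margin=0.1):
    """Returns 'pass', 'fail', or 'manual review' when the ratio is borderline.

    minimum=3.0 matches non-text contrast (SC 1.4.11); use 4.5 for normal
    text (SC 1.4.3). The margin is an assumed tolerance, not a WCAG value.
    """
    ratio = contrast_ratio(foreground, background)
    if abs(ratio - minimum) <= margin:
        return "manual review"
    return "pass" if ratio > minimum else "fail"
```

Running the same check per breakpoint, with colors resolved for each media query, would cover the screen-size part of that item.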

These are just things that I would like to get from modern automatic analysis, based on my limited knowledge of machine learning and artificial intelligence, combined with computer vision, optical character recognition, supervised and unsupervised machine learning and so on.

Blockchain for reports and proof of truth

Blockchain could be used to provide proof of remediation based on accessibility statements, so that progress would be tracked by a technology that makes cheating very difficult, if not impossible. We know that writing accessibility statements is easy, and promising too much in them is even easier considering the effort needed to really audit the whole system. So having a blockchain-backed log of fixes (and problems) would really add to our transparency. Just be sure to choose solutions that don’t use excessive energy to do it. Otherwise we are contributing to environmental problems, and we don’t want to do that!
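The tamper-evidence idea behind such a log can be shown with a simple hash chain, where every entry includes the hash of the previous one. This is only the basic building block and an assumption about how a remediation log could look: a real blockchain adds distribution and consensus, which are left out here, and the field names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body):
    """Hashes an entry's content in a stable (sorted-key) JSON form."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, issue, status, date):
    """Appends a problem or fix to the log, chained to the previous entry."""
    previous_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {
        "issue": issue,
        "status": status,
        "date": date,
        "previous_hash": previous_hash,
    }
    entry = {**body, "hash": _entry_hash(body)}
    log.append(entry)
    return entry


def verify(log):
    """Returns True only if no entry was changed, removed, or reordered."""
    previous_hash = GENESIS_HASH
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["hash"] != _entry_hash(body):
            return False
        previous_hash = entry["hash"]
    return True
```

If somebody later edits an old entry from "open" to "fixed", `verify` fails for the whole log, so the history behind an accessibility statement cannot be quietly rewritten. A chain like this costs almost no energy, since there is no mining involved.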

Conclusion – a lot of possibilities but still in the beginning

I know that companies are doing some of the things already, and I know that it will take time to make it efficient – as we need a lot of data first.

So the future seems bright, but please stay away from companies that are promising too much.

Automatic WCAG conformance by running a script is only possible if we have problems that can be detected and solved with the script alone.

That should make sense – if you can measure it you can (perhaps) fix it. But with the current state of automatic accessibility testing, scripts alone can’t even guarantee the passes, and sometimes not even the failures, of WCAG.

Human learning and experience are still the only path for now! But we can teach the machines to do some parts and build on that to be more effective together.

Author: Bogdan Cerovac

I am an IAAP-certified Web Accessibility Specialist (since 2020) and was a Google-certified Mobile Web Specialist.

I work as a digital agency co-owner, web developer, and accessibility lead.

Sole entrepreneur behind IDEA-lab Cerovac (Inclusion, Diversity, Equity and Accessibility lab) after work. Check out my Accessibility Services if you want me to help you with digital accessibility.

Also head of the expert council at Institute for Digital Accessibility (in Slovenian).

Living and working in Norway (🇳🇴), originally from Slovenia (🇸🇮), loves exploring the globe (🌐).

Nurturing the web since 1999, this blog since 2019.