In my opinion, all countries should have a central registry of how accessible their public sector is. It would save a lot of time if such registries were then used to compare overall accessibility (provided they used the same methodology). This opinion of mine is tightly connected with the eGovernment Benchmark analysis, because I think some parts of the benchmark could be much more efficient with a central database. Automatic accessibility testing is certainly one of them.
The eGovernment Benchmark analyses eGovernment in the 27 European Union Member States, but also in Iceland, Norway, Switzerland, Albania, Montenegro, North Macedonia, Serbia and Türkiye.
This study evaluates online public services on four dimensions, which consist of 14 underlying indicators, broken down into 48 survey questions. Accessibility is not a scored factor yet, but was added as side information (a pilot indicator). This post discusses only accessibility, though. The other parts are obviously also important, but we wanted to focus on accessibility alone.
eGovernment Benchmark 2023 includes a pilot benchmark for accessibility
eGovernment Benchmark 2023 (opens in new window) includes a lot of data and comparisons, and one of them is accessibility. Accessibility is a so-called pilot indicator, which means it does not play a decisive role in the actual eGovernment benchmark for now; they just tried to check whether it is doable (so we can speculate it will possibly be included in the next benchmark).
Eight in ten eGovernment websites violate one or more of the selected WCAG criteria
15,000 eGovernment websites were assessed against 8 out of 50 WCAG 2.1 success criteria. We will come back to the actual WCAG success criteria assessed, but for now let's check what was stated.
Roughly eight in ten (approximately 12,000) of all public sector websites defined as eGovernment in the 35 countries are not accessible and violate at least one of the selected WCAG criteria. Due to the limitations of automatic accessibility testing, only 8 out of 50 WCAG 2.1 success criteria were tested. This does not mean that the remaining 18% are accessible, as only manual testing against all 50 success criteria can establish whether they really are, but it does still provide an indication of overall accessibility.
# | Country | % of websites with no detected failures |
---|---|---|
1 | Norway | 84 |
2 | The Netherlands | 83 |
3 | Sweden | 78 |
4 | Finland | 73 |
5 | Denmark | 67 |
6 | Luxembourg | 57 |
7 | Spain | 51 |
8 | Austria | 47 |
9 | Poland | 46 |
10 | Hungary | 44 |
11 | Germany | 39 |
12 | Ireland | 35 |
13 | France | 34 |
14 | Italy | 34 |
15 | Malta | 33 |
16 | Estonia | 33 |
17 | Belgium | 32 |
18 | Czech Republic | 20 |
19 | Portugal | 18 |
20 | Slovenia | 17 |
21 | Switzerland | 17 |
22 | Cyprus | 16 |
23 | Latvia | 16 |
24 | Croatia | 16 |
25 | Bulgaria | 15 |
26 | Iceland | 15 |
27 | Slovakia | 13 |
28 | Lithuania | 8 |
29 | Greece | 6 |
30 | Montenegro | 4 |
31 | Romania | 4 |
32 | Albania | 0 |
33 | North Macedonia | 0 |
34 | Serbia | 0 |
35 | Türkiye | 0 |
Table 1 was made out of the bar chart provided in the benchmark. As mentioned, the percentages don't mean that those websites are accessible; they only mean that the specific failures were not detected.
8 selected WCAG success criteria used in the analysis
The Benchmark claims they tested 8 WCAG 2.1 A and AA success criteria, with the help of the axe browser extension. We will come back to these claims later in the post, but for now let's check what was stated.
WCAG success criterion | % of websites with no detected failures |
---|---|
WCAG 1.1.1 – Non-text Content | 71 |
WCAG 1.4.3 – Contrast (Minimum) | 48 |
WCAG 2.4.2 – Page Titled | 99 |
WCAG 2.4.4 – Link Purpose (In Context) | 54 |
WCAG 3.1.1 – Language of Page | 85 |
WCAG 3.1.2 – Language of Parts | 84 |
WCAG 4.1.1 – Parsing | 67 |
WCAG 4.1.2 – Name, Role, Value | 100 |
Table 2 (converted from a bar chart) indicates that all of the 18% of the 15,000 eGovernment websites (2,700) passed WCAG 4.1.2, and almost all passed WCAG 2.4.2.
Now that we have the numbers, I would like to point out some important details that will hopefully make more sense of the overall situation in this accessibility benchmark.
Some views of mine on the accessibility pilot indicator
It was not totally clear from the provided documentation whether the testing of websites relied solely upon the results from the axe browser extension, or whether somebody actually checked that those results were reliable and not false positives. Automatic accessibility testing is far from perfect. Sometimes it can report false positives and sometimes also false negatives. So trusting the tool 100%, with no manual check, can sometimes produce incorrect results. Sure, it is the tool makers' intention to prevent this, and axe does amazing work in that respect, but bugs do happen and axe is no exception.
Therefore I think it's important to reflect on some relevant parts of the report, explain potential problems and suggest some improvements.
The 8 selected WCAG success criteria are not really covered in full, only in part
Automatic accessibility testing can’t test all of WCAG. It can only discover some WCAG failures. It’s also not possible to claim that whole WCAG success criterion is passed if we only use automatic accessibility testing. We can discover failures but we can’t claim passes when we use automatic accessibility testing alone. The only possible exception is the WCAG 4.1.1 Parsing success criterion. But with WCAG 2.2 we also got an update to WCAG 2.1 that defines 4.1.1 as always passing for websites, so it’s actually not relevant anymore. It was relevant at the time of the analysis though.
I will provide some examples to make it more understandable.
- 1.1.1 Non-text Content can't be passed with automatic accessibility testing alone. Alternative texts need human verification; the tests can only catch missing attributes, suspicious alternative texts (like a filename) and very obvious failures (see the short code sketch after this list).
- 1.4.3 Contrast (Minimum) can't be passed with automatic accessibility testing alone either, as there are complex situations in code that are difficult to check even for experienced accessibility auditors. Once again, we can only discover the obvious failures, and in some cases automatic tests even miss those (false negatives).
- 2.4.2 Page Titled is, once again, not possible to check fully with automatic testing alone. Whether the title is present is simple for an automatic test, but is the title also descriptive? It takes a human to verify that, at least for now.
- 2.4.4 Link Purpose (In Context) is very similar to 2.4.2, and again it can't really be passed using automatic accessibility testing alone.
- 3.1.1 Language of Page and 3.1.2 Language of Parts are sometimes quite complex to analyze, even manually. Whether they are set is one thing, but are they set correctly? Do they reflect reality? Especially in the case of 3.1.2 it can be impossible to rely solely on the automatic test.
- 4.1.2 Name, Role, Value failures are still very difficult to detect solely with automatic accessibility testing, as such tools seldom compare the visible user interface with the code. At least free solutions like the axe browser extension don't do it. So stating that all sites with no errors passed 4.1.2 isn't realistic either.
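To make the first point more concrete, here is a minimal sketch of my own (not part of the benchmark's tooling) using Playwright and the axe-core engine. The page content is invented for illustration; the point is that an automated check can confirm an alt attribute exists, while deciding whether it is actually descriptive still takes a human.

```typescript
// Minimal illustration: a page whose image has an alt attribute that is
// technically present but not descriptive. The automated presence check
// is satisfied; judging the quality of the text is left to a human.
import { chromium } from 'playwright';
import { AxeBuilder } from '@axe-core/playwright';

async function main(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Invented example page, for illustration only.
  await page.setContent(`
    <html lang="en">
      <head><title>Demo page</title></head>
      <body><img src="chart.png" alt="image"></body>
    </html>
  `);

  // Run only the rule related to text alternatives for images (WCAG 1.1.1).
  const results = await new AxeBuilder({ page })
    .withRules(['image-alt'])
    .analyze();

  // Zero violations here would only mean "some alternative text exists";
  // it cannot tell us whether 1.1.1 is truly satisfied.
  console.log('image-alt violations:', results.violations.length);

  await browser.close();
}

main();
```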
Once again, we need to be careful when using automatic accessibility testing. It can discover up to approximately 30% of WCAG failures, but it can almost never be the only way of establishing that they are passed. I notice that this is commonly misunderstood, and it's also not very clear from the eGovernment Benchmark.
When we need to be certain before concluding that WCAG is passed, we need to rely not only on automatic but also on manual tests. Even with promising so-called artificial intelligence, context-aware and computer-vision-enabled tools, we are still waiting for automatic accessibility testing that doesn't need human verification.
Checking webpages manually with the axe extension was a big waste of time
According to their Method Paper (PDF, opens in new window), accessibility testing was done by running a browser extension manually, page by page. The methodology doesn't mention that the results were interpreted and checked manually by accessibility specialists, so we can conclude that all results were just written to a central data store (it seems it was a spreadsheet). This means that somebody needed to open a webpage, run the extension and then save or, in the worst case, manually note down the data.
That could easily be automated and would, in my opinion, save a lot of time (and money). Once again, a central registry would be ideal in this case as well. Simply querying the registry based on common factors would make this a simple task, provided such registries existed and held quality data.
Conclusion and suggestions
Don’t present a single test as whole WCAG success criterion
Automatic accessibility testing includes dozens of specific tests, depending on the tool. Those tests cover parts of WCAG success criteria, but as mentioned they can't cover a whole success criterion.
Therefore, presenting a single test as passing a whole WCAG success criterion isn't realistic. Passing one test out of many can't assure that the whole success criterion is passed. We should be aware of this when we present results like these.
Don’t test manually when you can automate
Using a browser extension instead of running a script that automates accessibility checking is a waste of time. Running a script takes seconds; opening the webpage, running the extension and then taking notes takes at least a minute, often more.
Besides the obvious room for human error when writing down the results (it's not mentioned whether they used a tool to collect the results from the extension, so I speculate they had to write things down manually), we also save a lot of time when we automate tests that can be automated. With the default axe-core engine that powers the browser extension, it would be quite simple to write a script that tests and records accessibility issues automatically, saving a lot of time and money and also preventing human errors.
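To give an idea of what that could look like, here is a rough sketch, not the benchmark's actual setup: a script that drives a browser with Playwright, runs the axe-core engine on each URL and stores the findings in a file. The URL list and the output format are my assumptions; in practice the input could come from a registry or spreadsheet and the output could go into a database.

```typescript
// Rough sketch of automating what the extension does page by page:
// scan a list of URLs with axe-core via Playwright and record the findings.
import { writeFileSync } from 'node:fs';
import { chromium } from 'playwright';
import { AxeBuilder } from '@axe-core/playwright';

// Hypothetical input; in practice this would come from a registry or spreadsheet.
const urls = [
  'https://example.gov/service-one',
  'https://example.gov/service-two',
];

type Finding = { url: string; ruleId: string; impact: string | null; nodes: number };

async function scanAll(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const findings: Finding[] = [];

  for (const url of urls) {
    await page.goto(url, { waitUntil: 'load' });

    // Limit the scan to WCAG 2.1 A/AA rules, roughly mirroring the benchmark's scope.
    const results = await new AxeBuilder({ page })
      .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'])
      .analyze();

    for (const violation of results.violations) {
      findings.push({
        url,
        ruleId: violation.id,
        impact: violation.impact ?? null,
        nodes: violation.nodes.length,
      });
    }
  }

  await browser.close();

  // Store the results centrally instead of noting them down by hand.
  writeFileSync('axe-results.json', JSON.stringify(findings, null, 2));
}

scanAll();
```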
Use central registry for accessibility testing results
As mentioned at the beginning, such reports could be much more efficient if countries used central registries and checked for accessibility regularly. Such a registry would enable reports, statistics, trend analysis and more in a fraction of the time needed when analyses and reports have to be custom made.
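As a purely hypothetical sketch of what that could enable: if such a registry exposed its scan results through, say, a simple HTTP API (the endpoint and field names below are invented), a per-country overview like Table 1 would boil down to a small query and aggregation.

```typescript
// Hypothetical sketch only: the registry endpoint and record fields are invented.
// It computes, per country, the share of websites with no detected failures.
type RegistryRecord = {
  country: string;     // e.g. "Norway"
  url: string;         // assessed website
  violations: number;  // automated failures found in the latest scan
};

async function cleanShareByCountry(endpoint: string): Promise<Map<string, number>> {
  // Node 18+ provides a global fetch; the endpoint is an assumption, not a real service.
  const response = await fetch(endpoint);
  const records = (await response.json()) as RegistryRecord[];

  const totals = new Map<string, { sites: number; clean: number }>();
  for (const record of records) {
    const entry = totals.get(record.country) ?? { sites: 0, clean: 0 };
    entry.sites += 1;
    if (record.violations === 0) entry.clean += 1;
    totals.set(record.country, entry);
  }

  // Percentage of websites with no detected failures, per country (as in Table 1).
  const shares = new Map<string, number>();
  for (const [country, { sites, clean }] of totals) {
    shares.set(country, Math.round((clean / sites) * 100));
  }
  return shares;
}
```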
I hope that eGovernment Benchmark 2024 will include accessibility as a key indicator. Furthermore, I also hope that they will improve the methodology and reporting, automate the automatic accessibility testing and perhaps even cooperate with existing registries that are already doing the same accessibility testing on a country scale.
Nevertheless, I am happy that after years of the Web Accessibility Directive, accessibility will finally get more attention and, with that, I hope, also better awareness and execution.
Just wondering how these numbers are calculated. For example, if a website has non-text content on most pages, but fails on 2% of them, does the site fail on this checkpoint? We all know that accessibility is a continuum and not just a binary yes/no, so knowing a little more about the methodology would be helpful.
Hi Tim, thank you for your comment.
According to the Method Paper, chapter 5.3.3 Accessibility foundations, they used the axe extension to test each URL.
In the eGovernment Benchmark 2023 – Non-scored Indicators document we can also read:
The axe extension reports all failures per URL, so I think that a single failing success criterion on a single URL is enough to say that the whole website fails it.
This is also very typical of the WCAG Evaluation Methodology version 1.0 (the latest W3C Working Group Note, from 10 July 2014, and the best W3C offers at the moment).
You have mentioned a more realistic methodology that is similar to the WCAG 3 one, but that is still an early draft and can change, so we will have to wait years for it to become a W3C Recommendation…
Dear Bogdan,
thank you for this research. Can I ask you to write Belgium instead of Belgia? I would like to share the results of your work, but the Belgian government may not be happy with the country name cited as such 😉
Régine, thank you for pointing this out, totally understand.
I’ve fixed Belgium and some other names 😉.