Be careful when using generative artificial intelligence to produce code


AI is a fact. Some say it will improve accessibility, and I hope it will, but please consider the facts and new regulation.

Using Large Language Models (LLMs) and other generative artificial intelligence (let’s call it AI for the sake of brevity) is without any doubt already becoming normal everywhere, perhaps even more so in the software development lifecycle. Integration happens on many levels, for example in code editors (IDEs), version control, pull request automation, quality assurance and automatic testing, and sometimes by using AI web or app chat interfaces directly.

It seems that AI can’t be avoided anymore, and that’s mostly great news for personalization of content, speed in prototyping and always having a helpful and knowledgeable partner at our side.

A lot of blog posts go into the prospects, and I wanted to add my point of view on the matter, trying to be realistically optimistic and to warn about possible dangers. After doing some tests with different workflows where I explicitly required the code to conform to WCAG 2.2 at levels A and AA, I observed that some scenarios work better than others.

None were good enough for me to suggest them to people who don’t have basic accessibility knowledge, though. And that is the biggest problem. Ideally we shouldn’t even need to explicitly require accessibility – it should just be the default. But as AI learns from existing code that is not accessible, it can’t really do better. I hope that larger accessibility companies can help with supervised teaching of AI, feeding it manually verified, accessible code that it can “learn” from. I’ve tested some products that are supposed to do that, but I still discovered seemingly random problems that I wish wouldn’t occur. Sometimes subtle, sometimes obvious, but still issues that can multiply inaccessibility when used in the real world.

Until AI can produce “stable accessibility”, we need to be careful. Perhaps that is obvious to most people, but I still see over-promising claims like “production-ready” that I just can’t trust after testing with the same inputs and getting different results (often problematic). Randomization, derived from the so-called temperature, seems to be essential for AI, and the same randomization seems to be its problem.
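To make the role of temperature a bit more concrete, here is a minimal Python sketch of temperature-based sampling. It is a toy illustration under my own assumptions, not how any particular product implements it: the function names and the three scores are hypothetical, and real models pick from a vocabulary of many thousands of tokens, often with additional sampling rules on top.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Randomly pick one candidate, weighted by its temperature-adjusted probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical scores for three candidate continuations of the same prompt.
    logits = [2.0, 1.0, 0.5]
    rng = random.Random()
    for temperature in (0.2, 1.0, 2.0):
        picks = [sample_index(logits, temperature, rng) for _ in range(10)]
        print(temperature, picks)
```

With a low temperature the highest-scoring candidate is picked almost every time, while a higher temperature spreads the choices across all candidates. That is the mechanism behind getting different code from the same prompt.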

We are already getting academic studies about AI and Accessibility with similar conclusions, and we need more of them to systematize findings and improve awareness.

Use AI responsibly – even with regulation we need to do our part

With AI implemented in more and more products, we need to make sure it will not increase inaccessibility. Even if some vendors market it as production-ready, we need to do our own due diligence.

I love the fact that legislation is recognizing the negative effects of AI on people with disabilities. The European AI Act (Regulation (EU) 2024/1689), the first regulation on artificial intelligence in the EU, tries to protect people from discrimination.

I am not a lawyer and this is not legal advice, but here is one of the relevant parts of the AI Act that comes to mind:

It is therefore essential that providers ensure full compliance with accessibility requirements, including Directive (EU) 2016/2102 of the European Parliament and of the Council (38) and Directive (EU) 2019/882. Providers should ensure compliance with these requirements by design.

Part of Recital 80 of the EU AI Act, which refers to the Web Accessibility Directive and the European Accessibility Act.

I would really like to think that this requires not only AI interfaces to be accessible, but also their end products. Even if that is not the case (yet), we will see the effects of the Web Accessibility Directive (WAD) and the European Accessibility Act (EAA) anyway, when parts generated with AI become a part of products (and services) that are in the scope of the WAD and EAA.

As always – build knowledge and awareness, cooperate with accessibility specialists and people with disabilities, don’t just rely on AI!

Author: Bogdan Cerovac

I am an IAAP-certified Web Accessibility Specialist (since 2020) and was a Google-certified Mobile Web Specialist.

I work as a digital agency co-owner, web developer and accessibility lead.

Sole entrepreneur behind IDEA-lab Cerovac (Inclusion, Diversity, Equity and Accessibility lab) after work. Check out my Accessibility Services if you want me to help you with digital accessibility.

Also head of the expert council at Institute for Digital Accessibility A11Y.si (in Slovenian).

Living and working in Norway (🇳🇴), originally from Slovenia (🇸🇮), and love exploring the globe (🌐).

Nurturing the web since 1999, this blog since 2019.


2 thoughts on “Be careful when using generative artificial intelligence to produce code”

  1. I used generative AI, specifically ChatGPT, to write code and create small tools. The code output is not fully accessible, so I need to be specific with my requirements in the prompts and then correct certain aspects of the code. However, it has significantly helped speed up the process of writing usable and accessible code, with only about 20% needing manual adjustments.

    I agree that generative AI is not yet fully capable of creating accessible code simply by asking it to ‘create a contact form.’ We need to be specific about the requirements.

    Thanks for this wonderful write-up!

    1. Thank you for sharing your experience.

      I like using it for grunt tasks and boilerplate, often with ok results (besides poor accessibility).
      Randomness of quality (in general, not just accessibility) causes me to re-do a lot of things a lot of times, with specific prompts, and that is the scary part.

      My biggest concern is that even when I am very prescriptive about accessibility (conform to these WCAG success criteria, or even deeper), it often fails to deliver, even with multiple repetitions. It feels like we need luck to get quality, and that is very unfortunate.

      Over-confidence is dangerous as well – when people trust it without understanding, but I think that we are now very aware of this.

      Overall – a superb tool that makes us work faster, but it needs 100% human supervision, where the human is knowledgeable enough to detect issues.
