PDF is less accessible than HTML



Assistive technology support for HTML semantics is way better than for PDF. If you are a Mac user who relies on a screen reader, you may be forced to experience the missing semantics of even the most accessible PDFs. So maybe it is time to move more PDF documents to HTML?

After learning more about creating accessible documents, I came to the conclusion that Portable Document Format (PDF) is less accessible than HyperText Markup Language (HTML). This may seem a bold statement at first, but after some hands-on experience and research I can only confirm it. There are several points worth noting, and I will try to go through them to save you some time (I hope).

HTML semantics is much better

Tagging a PDF, whether automatically or manually, exposes the differences at once. This may not be a big surprise. Today's HTML is used for much more than documents, so it needs much richer semantics to support different interactions, for example modal windows, expandable sections and so on. PDF is not very interactive and does not need all that functionality, but the fact remains that HTML semantics are far richer than PDF's. It is also unlikely that PDF will ever get all of HTML's semantics, because there is no need for them.

The existing tags let authors add quite enough semantics to PDFs, but they still limit creativity. Sometimes headings, paragraphs, lists, links, tables and images may not be enough, although for a static document they may suffice. Content providers who know they need to tag their PDFs therefore have a smaller set of semantics available and are more limited in what they can create. In a way, this even means that PDF leaves less room for creativity than HTML.

Tagged PDF support with assistive technologies on Mac is disappointing

After really doing my best to understand PDF tagging, I wanted to test the results empirically. Automated testing tools are a good way to discover obvious errors and get basic feedback before testing documents manually with assistive technologies such as screen readers, so I started with those.

Tagged PDF comes in multiple variants, and creators can choose which standard to follow. I will not go into details for now, but there are several ways to make PDF files accessible, so tagging can be done according to different standards. I aimed for PDF/Universal Accessibility (PDF/UA, formally ISO 14289), as it seemed to be the most “universally accessible” format, and I thought that targeting its latest version would be best. The so-called “Matterhorn Protocol 1.1” lists the failure conditions for PDF/UA-1, and it was a good starting point.

After fixing the obvious semantic problems, I tried to apply the correct tags and then tested the document with automated tools such as the PDF Accessibility Checker (PAC).

The last step was to test the PDF with assistive technologies, and I started on Windows with NVDA and JAWS. It seemed to work fine: I could navigate the PDF almost like a web page, all the semantics came through, and it really showed the effort put into tagging. But then I tested it with VoiceOver on Mac, and it was quite a disappointment. VoiceOver's support for PDFs seemed extremely limited: it only read the text and did not appear to announce any of the tags that worked with JAWS and NVDA. I set out to find where I had gone wrong and then learned that this is a common problem for all Mac users.

Conclusion – prefer HTML over PDF to make content accessible

Tagged PDF support on Mac is a real disappointment, and it is beyond my understanding why such giant companies do not do something about it. Nevertheless, when comparing the whole process of making content accessible, I would pick HTML over PDF any day. So my advice remains: pick HTML over PDF if you want better accessibility. Maybe the invoice your client wants as a PDF could really be an HTML page? Could the commercial flyer be HTML too? Maybe we can even have our contracts in HTML format someday?

Author: Bogdan Cerovac

I am an IAAP-certified Web Accessibility Specialist (since 2020) and was a Google-certified Mobile Web Specialist.

I work as a digital agency co-owner, web developer and accessibility lead.

After work, I am the sole entrepreneur behind IDEA-lab Cerovac (Inclusion, Diversity, Equity and Accessibility lab). Check out my Accessibility Services if you want me to help you with digital accessibility.

I am also head of the expert council at the Institute for Digital Accessibility A11Y.si (in Slovenian).

Living and working in Norway (🇳🇴), originally from Slovenia (🇸🇮), loves exploring the globe (🌐).

Nurturing the web from 1999, this blog from 2019.
