OpenAI finally launched its agent called Operator that can browse the web for you, not only browse, but also perform tasks for you. After some experience with headless browsers, end-to-end browser automation, and automatic accessibility testing engines, I write “finally” because the technology for this is actually not very advanced, at least when we consider that they made large language models.
It is not a secret that they use some kind of automation to run the browser, one that can understand the content and context of the webpage based on the visuals alone. Basically: visit a website, take a screenshot, try to understand what is in the screenshot, and then follow the instructions to click an element, write into forms, and so on.
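To make that loop concrete, here is a minimal sketch of how such a screenshot-driven agent could be wired together with Playwright. The `ask_vision_model` helper is a hypothetical placeholder for whatever vision-capable model does the “understanding”; it is not Operator’s actual API.

```python
# A rough sketch of a screenshot-driven automation loop. The only thing the
# "agent" ever sees is pixels: it never inspects the DOM.
from playwright.sync_api import sync_playwright


def ask_vision_model(screenshot: bytes, instruction: str) -> dict:
    """Hypothetical placeholder: ask a vision-capable model what to do next.

    Expected to return something like {"action": "click", "x": 120, "y": 340},
    {"action": "type", "x": 200, "y": 400, "text": "hello"} or {"action": "done"}.
    """
    raise NotImplementedError("Plug in your own vision model call here.")


def run_task(url: str, instruction: str, max_steps: int = 10) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        for _ in range(max_steps):
            screenshot = page.screenshot()  # the model only "sees" this image
            step = ask_vision_model(screenshot, instruction)
            if step["action"] == "click":
                page.mouse.click(step["x"], step["y"])
            elif step["action"] == "type":
                page.mouse.click(step["x"], step["y"])
                page.keyboard.type(step["text"])
            elif step["action"] == "done":
                break
        browser.close()
```

The important point is that the script never touches selectors or the accessibility tree; everything it knows comes from the screenshot.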
These are things that people involved in browser automation have known since (at least) 2004, when one of the best-known tools for it, Selenium, got its start. Computer vision is way older as well.
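For contrast, the classic Selenium-style approach addresses the page through the DOM rather than through pixels. A minimal sketch (the URL and element ids are made up for illustration):

```python
# The traditional, selector-driven way: the script knows the page structure
# up front and targets elements through the DOM, not through screenshots.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")  # example URL
driver.find_element(By.ID, "username").send_keys("jane")  # example ids
driver.find_element(By.ID, "password").send_keys("secret")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
driver.quit()
```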
So it makes total sense that a vendor that has the technology to “understand” the context can also automate browsers (and operating systems).
Visual accessibility is essential for computer vision
As mentioned above – besides the most important tool that enables “understanding”, we also need automation and computer vision. And all three actually need at least basic accessibility.
The first thing that comes to mind is recognition of elements, and with it color contrast. I’ve done some quick tests with very poor contrast and was quite surprised that ChatGPT managed to “read” text with very, very low contrast ratios (even 1.1:1). But my tests used components like buttons in isolation, and we have all experienced websites that sometimes even end up with white text on a white background (remember image backgrounds that can cause white on white or black on black in some situations?).
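For reference, the ratio I am quoting is the WCAG 2.x contrast ratio. A small sketch of how it is computed from two sRGB colors:

```python
# WCAG 2.x contrast ratio between two sRGB colors, e.g. text vs. background.
def relative_luminance(rgb: tuple[int, int, int]) -> float:
    def channel(c: int) -> float:
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Very light grey text on white: roughly the 1.1:1 territory mentioned above.
print(round(contrast_ratio((244, 244, 244), (255, 255, 255)), 2))
```

Anything around 1.1:1 is far below the 4.5:1 that WCAG requires for normal body text, which is what made the result surprising.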
So, if a website is not accessible and the agent “bot” is so lucky that the automatically moving carousel (or hero video) is just at the “right” position, we may have an issue.
There are of course other things that come to mind, like the document outline, labels, and so on, but I am guessing that icon-only buttons and links will potentially have a huge negative impact on the efficiency of AI agents. The more innovative the icons, the worse the results, I guess. Until AI “learns” enough, at least.
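A rough way to spot that category of problem (buttons and links whose only content is an icon and that carry no accessible name) could look something like the sketch below. It is a simplified heuristic built on Playwright, not the full accessible-name computation, and the URL is just an example.

```python
# Flag buttons and links that have no visible text and no aria-label /
# aria-labelledby / title, i.e. nothing for assistive technology (or a
# vision-only agent) to "read". Simplified heuristic only.
from playwright.sync_api import sync_playwright


def icon_only_offenders(url: str) -> list[str]:
    offenders = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        for el in page.locator("a, button").all():
            text = el.inner_text().strip()
            label = (
                el.get_attribute("aria-label")
                or el.get_attribute("aria-labelledby")
                or el.get_attribute("title")
            )
            if not text and not label:
                offenders.append(el.evaluate("e => e.outerHTML"))
        browser.close()
    return offenders


if __name__ == "__main__":
    for html in icon_only_offenders("https://example.com"):  # example URL
        print(html)
```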
It also seems that CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) – one of the most problematic parts of the web – is now more of a burden for us people than it is for the computers. Perhaps we can start calling it Completely Agitating Pointless Test for Common Human Agitation instead of the original version, as agents will apparently have no trouble identifying as humans from now on.
Based on current information, agents don’t check the code behind the page and rely on the visuals alone, so everything visual is also important to them. It will be interesting to see how they progress with time, but it is quite clear that they will have a major role in helping people out, even as testing tools.