On the stagnation of screen readers

If you have sight, imagine that in every digital interface, the visuals are beamed directly into your eyes, into the center and peripheral vision, blocking out much else, and demanding your attention. The “visuals” are mostly text, with a few short animations every once in a while, and only on some interfaces. You can’t move it, unless you want to move everything else, like videos and games. You can’t put it in front of you, to give you a little space to think and consider what you’re reading. You can’t put it behind you. You can make it softer, but there comes a point where it’s too soft and blurry to see.

Also imagine that there is a form of art that 95% of other humans can produce and consume, but for you is either blank or filled with meaningless letters and numbers ending in .JPEG, .PNG, .BMP, or other computer jargon, and the only way to perceive it is to humbly ask that the image be converted to the only form of input your digital interface can understand: straight, plain text. This same majority of people have access to everything digital technology has to offer. You, though, have access to very little in comparison. Your interface cannot interpret anything that isn’t created in a standards-compliant way. And this culture, full of those who need to stand out, doesn’t like standards.

There is, though, a digital interface built by Apple which uses machine learning to try to understand this art, but that’s Apple only, and they love control too much to share that with other interfaces on other companies’ systems. And there are open source machine learning models, but the people who could use them are too busy fixing their interfaces to work around breaks in operating system behaviour and UI bugs to research that. Or you could pay $1099, or $100 per year, for an interface that can describe the art, by sending it to online services of course, and get a tad bit more beauty from the pervasive drab, plain text.

Now, you can lessen the problem of eye strain, blocked-out noise, and general information fatigue by using a kind of projector, but other people see it too, and it’s very annoying to those who don’t need this interface, with its bright, glaring lights, moving quickly, dizzyingly fast. It moves in a straight line, hypnotically predictable, but you must keep up, you must understand. Your job relies on it. You rely on it for everything else too. You could save up for one of those expensive interfaces that show things more like print on a page… if the page had only one small line and was rather slow to read, but even that is dull. No font, no true headings, no beauty. Just plain, white-on-black text, everywhere. Lifeless. Without form and void. Deformed and desolate. Still, it would make reading a little easier, even if it is slower. But you don’t want to be a burden to others or annoy them, and you’ve gotten so used to the close, direct, heavy mode of the less disruptive output that you’re almost great at it. But is that the best for you? Is that all technology can do? Can we not do better?

This is what blind people deal with every day. From the ATM to the desktop workstation, screen readers output mono, flat, immovable, unchanging, boring speech. There is no HRTF (head-related transfer function) processing for screen readers, so speech cannot be placed in the space around the listener. Only one can describe images without needing to send them to online services. Only a few more can describe images at all. TalkBack, a mobile screen reader for Android, and ChromeVox, the screen reader on Chromebooks, can’t even detect text in images, let alone describe images. Update: TalkBack can read text and icons now, but not describe images. ChromeVox still can’t do any of that. All of them read from top to bottom, left to right, unless they are told otherwise. And they have to be specifically told about everything, or it’s not there. We can definitely do better than this.