The debate over AI rages on, and I find myself caring less and less as the tug
of war grows fiercer. One side says that AI is a threat to humanity; the other
says that AI can do lots of amazing stuff and definitely couldn’t take our
jobs. No, AI cannot take our jobs. Rich people can take our jobs and give them
to AI, though.
This post isn’t going to be about the rich people that bend AI, and anything
else they can, to their will. This post is about why I use large language
models, especially multimodal ones, and why I find them so useful. A lot of
people without disabilities, particularly those who aren’t blind, probably won’t
understand this. That’s okay. I’m writing this for myself, and for those who
haven’t gotten to use this kind of technology yet.
Text-Only Models
ChatGPT was the first large language model I used. It introduced me to the idea,
and to the model’s issues. It couldn’t give an accurate list of screen
reader commands. But it could tell me a nice story about a kitten who drinks out
of the sink. From the start, I wondered if I could feed the model images. I
tried with ASCII art, but it wasn’t very good at describing that. I tried with
Braille art, but it wasn’t good at that either. I even tried with an SVG, but it
couldn’t fit the whole thing into the chat box.
I was disappointed, but I kept trying different tasks. It was able to explain
the output of some Linux commands, like top, which doesn’t read well with a screen
reader. It was even able to generate a little Python script that turned a CSV
file into an HTML table.
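For readers curious what a script like that might look like, here is a minimal sketch in Python. It is not the script ChatGPT actually produced for me. The function name csv_to_html_table, and the choice to treat the first row as column headers, are assumptions made for illustration.

```python
import csv
import html
import io


def csv_to_html_table(csv_text: str) -> str:
    """Convert CSV text into an HTML table (hypothetical helper).

    Assumes the first row of the CSV holds the column headers.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        # Nothing to show: return an empty table.
        return "<table></table>"

    header, *body = rows
    lines = ["<table>", "  <thead>", "    <tr>"]
    # Escape every cell so characters like & and < stay valid HTML.
    lines += [f'      <th scope="col">{html.escape(cell)}</th>' for cell in header]
    lines += ["    </tr>", "  </thead>", "  <tbody>"]
    for row in body:
        lines.append("    <tr>")
        lines += [f"      <td>{html.escape(cell)}</td>" for cell in row]
        lines.append("    </tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "name,species\nMittens,cat\nTom & Jerry,cat and mouse\n"
    print(csv_to_html_table(sample))
```

The header cells are marked with scope="col" so that a screen reader can announce the column name as you move through the table.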
As ChatGPT improved, I found more uses for it. I could ask it to generate a
description of a video game character, or describe scenes from games or TV
shows. But I still wanted it to describe images.
My Fascination with Images
I’ve always wanted to know what things look like. I’ve been blind since birth,
so I’ve never seen anything. From video games to people to my surroundings, I’ve
always wondered what things look like. I guess that’s a little strange in the
blind community, but I’ve always been a little strange in any community. So many
blind people don’t care what their computer interface looks like, or what
animations are like, or even if there is formatting information in a document or
book. I do. I love learning about what apps look like, or what a website looks
like. I love reading with formatted Braille or speech, and learning about
different animations used in operating systems and apps. I find plain screen
reader speech, without sounds and such, to be boring.
So, when I heard about the Be My Eyes Virtual Volunteer program, I was excited.
I could finally learn what things look like. I could finally learn what apps and
operating systems look like. I could send it pictures of my surroundings, and
get detailed descriptions of them. I could send it pictures of my computer
screen, and understand what’s there and how it’s laid out. I could even send it
pictures from Facebook or Twitter, and get more than a bland description of the
most important parts of the image.
I began trying the app, with saved pictures and screenshots. The AI, GPT-4’s
multimodal model, gave excellent descriptions. I finally learned what my old cat
looks like. I learned what app interfaces like Discord look like. I sent it
screenshots of video games from Dropbox, and learned what some video game
characters and locations look like.
Now, it’s not always perfect. Sometimes it imagines details that aren’t there.
Sometimes it doesn’t get the text right in an image. If a large language model
is a blurry picture of the web, I’d rather have that than a blank canvas. I’d
rather see a little than not at all. And that’s what these models give me. No,
it’s not real sight. I wouldn’t want to wait a good 30 seconds to get a
description of each frame of my life. But it’s something. And it’s something
that I’ve never had before.
Feeding the Beast
A lot of people will say that these models just harvest our data. They do. A lot
of people will then say that I shouldn’t be feeding their Twitter posts, video
games, interfaces, comic books, and book covers into the models. My only
response to that is that if all these things were accessible to me, I wouldn’t
have to feed them to the models. So if you don’t want your pictures in
OpenAI’s next batch of training data, add descriptions to them. If you don’t
want your video game pictures used in the next GPT model, make your game
accessible. If you don’t want your book covers used in the next GPT model, add a
description to them. That’s just all there is to it. I’m not giving up this new
ability to understand visual stuff.
So this is in response to this blog post regarding Mr. Beast’s blindness video, which shows the perspective of a person who still has some remaining vision. I, however, have none. I am completely blind. I wanted to write a response that shows my perspective on the video about the possibility of regaining sight. I speak for myself, not for anyone else in the blind community or culture. I’ll also talk about the need for a Blind culture, and the cheapening of such culture by these types of videos.
The video, titled “1000 Blind People See for the First Time,” was hard for me to watch, in several ways. First, the title is a lie. I’ll go through that later. Second, a lot of it is visual in nature, with very little description. Third, it had no audio description of any kind. Audio description is where a video has another audio track that plays alongside the original, in which a person describes the events that happen during the video. Not just the text, or the main idea as in this video, but the people, places, and actions in the video. Sometimes, YouTube videos have “separate but equal” versions with descriptions. I tried searching for one but could not find it, using my admittedly slower Mac with Safari and VoiceOver. Still, if this video was about blind people, it should be suitable for blind people as well. It reminds me of the accessibility overlay companies, which will gladly post images without descriptions on social media.
The Big Lies
Let’s pick apart the title of the video. I’m not going to concern myself with 1000. That seems cruel to me, just picking an arbitrary number like that, but let’s move on. “Blind people.” Were these people blind? No, they weren’t. In the culture I live in, they would be called “low vision” or “visually impaired.” Mr. Beast cured people with cataracts. In the worst circumstance, maybe that could mean blind. But in the video, there were people who had been blind for four months. Imagine having a disability for four months. Comparing that to my life of absolutely no vision is very harmful. It cheapens the lives of those who are actually blind, and makes me feel as if I’m not even worthy to be called blind, or that my experiences are absolutely worthless, that my work is worthless, and that my life is meaningless.
Let’s move on to the next part. “For the First Time.” Again, a lot of these people could see before. Maybe they couldn’t see perfectly, but their eyes could perceive enough to live mostly normal lives. They could see people’s faces, with glasses. In fact, one of the people in the video said something like “Well, I don’t need these anymore.” I don’t know for sure, because again the video wasn’t described, but I’m pretty sure she was referring to glasses. You know what? If I could see well, even enough to not need a screen reader, I’d happily, happily take glasses over what I have now. If I needed glasses and a screen magnifier to see a computer or television screen, I’d take that in a heartbeat.
The Cheapening of Blind Culture
The next time you meet a Deaf person, I want you to ask them how they would respond if a person with some hearing loss approached them and claimed to be Deaf as well. Now, I don’t know any Deaf people personally, and am not Deaf myself. But I’m pretty sure it wouldn’t go over well, to say the least. The reason I think this is that I never hear any hard of hearing people call themselves Deaf. Why? Because their experience is not the same as a Deaf person’s. The same applies to Blind people, if indeed we want to be a stronger culture. We must be allowed to have our own words, our own experiences, our own culture. To do otherwise is to weaken and cheapen the bond shared by all totally blind people.
If, instead, we allow videos like this to claim our words, our terms, and our experiences, we’ll need to retreat to where Autistic people are now, having to call themselves “Actually Autistic.” Why? Because the broader culture claimed their word and their way of finding and talking about one another, and explaining themselves. So now they have to use another hashtag to show who they actually are. And honestly, we have so little power as it is. If we allow the term “blind” to be used for those who have usable vision, the general population will then think that there are no people that can’t see at all, and when we tell them that we cannot see, they’ll think we’re lying. This already happens sometimes to me. And it’s yet another slap in the face.
And here we get to the biggest problem with this video: its effect on the general population. Now, if people don’t look any further, which many conservatives have proven they will not, they will think that blindness is curable in 10 minutes, which it’s not. There are so many causes of blindness, and so many do not have cures. A lot of blind or low vision people do not want to be cured, being fine with the life they live or the vision that they do have. And now, sighted people have a video to point to. “Dude, it’s just ten minutes,” they’ll say. “Are you so anti-vaxxxxx that you won’t even stop being a burden on society?” Parents will guilt-trip their kids. Husbands will guilt-trip their wives. And for what? A cure that will more than likely not apply to them.
And what of people, like me, who want to see even as low vision people do? What of people who would give much for a cure, who are shown this video and asked “Hey dude, check your mail! Maybe you were one of the 1000! Maybe you won!” Just another slap in the face. An undescribed video, an arbitrary number, a single cure, for those who are not actually blind.
Another aspect of this is that we do want to experience the world. Why else would we want pictures described, or to be able to watch videos or television, or play video games? Yes, we have our own culture. We have our audio books, text to speech voices, audio games, and screen readers. But we want to know the sighted culture too. Sure, some may not want to see, enjoying who they are. Others beg their god or science to be able to see. But whatever way we experience the world, whether through sight, visual interpretation, or reading books about the world, we love experiencing it. Even those who do not want to see the world still live in it. But this video, with its outright lies and false hope and capitalistic choosing of just 1000 people, doesn’t help anyone. From the low vision people who were “left behind,” to the blind people for whom there is no cure, to the general public who will now have another excuse to shun us, it does more harm, I think, than good.