Recently I’ve been seeing entirely new creators in my YouTube feed, on video thumbnails with a tiny marking in the corner that says “auto-dubbed.” Oh, neat! This means I can see much more content from around the world, and I like that idea.
That is, I liked it until I clicked on one of these videos. While it’s a great idea, the execution leaves something to be desired.
The Promise of Autodubbing
YouTube’s new AI-powered dubbing feature is the culmination of several technologies that have reached the point where something like this is possible. Advanced AI can now transcribe spoken words in pretty much any language with a high degree of accuracy. Machine translation can then turn that transcript into another language. AI technology can also synthesize a voice from that text that is essentially indistinguishable from a real human voice. Even more impressive, you can clone the voice of the speaker in the original language, so it sounds like they are speaking the target language in their own voice.
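To make that chain of technologies concrete, here is a minimal sketch of how the stages could fit together: speech is transcribed into timed segments, each segment is translated, and the translated text is synthesized back into audio. This is only an illustration under my own assumptions, not how YouTube actually builds it. Every name here (`Segment`, `auto_dub`, and the `fake_*` stand-ins) is hypothetical, and the stand-ins exist only so the sketch runs without real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One stretch of speech, with timestamps so the dub can be placed back in sync."""
    start: float  # seconds from the start of the video
    end: float
    text: str


def auto_dub(
    audio: bytes,
    transcribe: Callable[[bytes], List[Segment]],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> List[Tuple[float, float, bytes]]:
    """Chain the three stages: speech-to-text, translation, text-to-speech."""
    dubbed = []
    for segment in transcribe(audio):
        translated_text = translate(segment.text)
        dubbed.append((segment.start, segment.end, synthesize(translated_text)))
    return dubbed


# Toy stand-ins (hypothetical); a real system would call actual models here.
def fake_transcribe(audio: bytes) -> List[Segment]:
    return [Segment(0.0, 1.5, "hola"), Segment(1.5, 3.0, "gracias")]


def fake_translate(text: str) -> str:
    return {"hola": "hello", "gracias": "thank you"}[text]


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are audio


result = auto_dub(b"", fake_transcribe, fake_translate, fake_synthesize)
print(result)
```

Keeping the timestamps on each segment is what lets the synthesized speech be laid back over the video at the right moments.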
Combine this with the massive amount of compute power at YouTube, and you get auto-dubbing. There’s a whole world of non-English YouTube content out there, and, of course, a huge part of the world that doesn’t speak English. So an auto-dubbing feature like this has the potential to unlock all of YouTube for anyone. I’ve long seen people on X use AI-powered dubbing tools on the videos posted there, so there’s clearly demand for this. But how well does it work?
The Quality Issues Are Real
First, I have to note that YouTube seems pretty aware of all the shortcomings with this feature at the moment. If you look at the official help document for automatic dubbing, it lists nearly all of them.
One big issue I have is how the dub replaces pretty much the entire audio track. So any sound that’s not spoken dialog goes away. It does come back here and there in segments with no speech, but for the most part these dubbed videos sound empty. I know it’s possible to erase the original voice from the video without removing the rest of the sound, so I hope we’ll see a version of auto-dubbing that can restore the original audio mix to these videos. Heck, I wouldn’t mind some sort of AI that can just fix the audio mix in YouTube videos, because a lot of YouTubers don’t know how to mix their audio.
The other problem is a little more subtle, and it’s also acknowledged by YouTube. The dubbed voice is lifeless, and doesn’t seem to try to match the tone or energy of the original voice. In this way, it sounds pretty much like the sort of live dubbing you get at the United Nations, with the translator sounding pretty uninterested. YouTube is clearly working on this, but the dubs I’ve heard don’t show it yet.
There’s no attempt to match the audio to the speaker’s mouth movements as far as I can tell, which is a good thing. That makes it odd that the dubs sound like the type that is trying to match mouth movements, adding in filler like “right?”, “you know?”, or “OK?” Of course, for all I know, those were all in the original audio track. But since I don’t speak the original language, I can’t know. All I know is the dubs sound awkward because of it.
Besides this, there are all the artifacts you get from raw AI-generated voices, such as reading the abbreviation “MB” as “em-bee” instead of “megabyte.” I can’t blame YouTube for this in particular, though. Without humans tweaking the AI, text-to-speech always has these issues in my experience.
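One common way to reduce this kind of artifact is to normalize the text before it reaches the voice synthesizer, expanding abbreviations into the words a person would actually say. The sketch below is a simplified illustration, assuming a small hand-written lookup table. The table `SPOKEN_UNITS` and the function `expand_units` are hypothetical names of my own, and I have no idea whether YouTube’s pipeline does anything like this.

```python
import re

# Hypothetical lookup table: written abbreviation -> singular spoken form.
SPOKEN_UNITS = {"MB": "megabyte", "GB": "gigabyte", "km": "kilometer"}

# Only match an abbreviation that directly follows a number, so words
# like "MBA" are left alone.
_UNIT_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s?(" + "|".join(map(re.escape, SPOKEN_UNITS)) + r")\b"
)


def _speak(match: re.Match) -> str:
    number, unit = match.group(1), match.group(2)
    word = SPOKEN_UNITS[unit]
    # Use the singular only for exactly "1"; everything else is plural.
    return f"{number} {word}" if number == "1" else f"{number} {word}s"


def expand_units(text: str) -> str:
    """Replace number-plus-abbreviation pairs with their spoken form."""
    return _UNIT_PATTERN.sub(_speak, text)


print(expand_units("The file is 500 MB."))
```

A fixed table like this only covers the cases someone thought to add, which is part of why human tweaking still matters.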
The Conflict Between Access and Accuracy
These quality issues are annoying, but they can be improved over time. Ultimately, it’s an issue of refinement for the technology. What’s not so easy to solve is trust in the actual translation. Machine translation has come a long, long way since the early days of reading gadget manuals in broken English. However, even expert human translators find it a tricky job and make mistakes.
The bottom line is that even when I watch an auto-dubbed video, I have no idea whether I can trust the information in that video or not. This is no different from subtitles, of course, but dubs are more popular than subs, as even this grouchy anime fan has to admit. So the scope for misinformation is much greater thanks to this feature, especially if YouTube has taken it as a green light to push videos into the feeds of people who speak a different language from the source material.
I would hate to be taken to task for one of my own YouTube videos for something I didn’t actually say! I’ve written before that trust is the one thing AI can’t solve by throwing more money and tech at the problem, and this might be a prime example.
How YouTube Could Make It Work
I like the auto-dubbing feature and think it has a lot of potential, but there are a few things YouTube could implement right away that would make it better.
First, I’d like to see an easy up-front way to toggle off auto-dubs showing up in my feed or in searches. Make it as easy as filtering out YouTube Shorts, for example. The second thing I’d like to see is some sort of AI confidence score in the translation, and perhaps more importantly, human ratings from people who speak both languages.
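Here is one rough way those two signals could be combined into a single number, purely as a sketch. It assumes the model reports a confidence between 0 and 1 and that bilingual viewers rate the dub from 1 to 5 stars. The function `trust_score` and its `prior_weight` parameter are hypothetical names of my own. The idea is that the score starts at the model’s confidence and shifts toward the human average as more ratings come in.

```python
from typing import Sequence


def trust_score(
    model_confidence: float,
    human_ratings: Sequence[int],
    prior_weight: float = 5.0,
) -> float:
    """Blend model confidence (0 to 1) with 1-5 star ratings from bilingual viewers.

    prior_weight is how many human ratings the model's own confidence
    counts for; it is an arbitrary choice for this sketch.
    """
    if not 0.0 <= model_confidence <= 1.0:
        raise ValueError("model_confidence must be between 0 and 1")
    if any(r < 1 or r > 5 for r in human_ratings):
        raise ValueError("ratings must be between 1 and 5")

    # Map 1-5 stars onto the same 0-1 scale as the model confidence.
    normalized = [(r - 1) / 4 for r in human_ratings]
    total = prior_weight * model_confidence + sum(normalized)
    return total / (prior_weight + len(normalized))


print(trust_score(0.8, []))               # no ratings yet: model confidence only
print(trust_score(0.8, [5, 5, 5, 5, 5]))  # strong human ratings pull the score up
```

With no ratings the score is just the model’s confidence, so a new dub is not penalized for being new, while a pile of one-star ratings from bilingual viewers can still drag an overconfident translation down.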
Having human ratings on the quality of the dub would boost trust immensely. Letting those same people give specific feedback the AI can use would be a double win. So far, I give the whole auto-dubbing feature a C. Not bad, but could be better.