Being able to make Alexa say anything you want with Home Assistant has always been one of my favorite features. But I quickly tired of her default voice and decided to replace it with a far more soothing Irish voice through Home Assistant’s cloud text-to-speech (TTS) service. Recently, I started to wonder if I could replace that voice with an even better one—my own.
I’ve been playing around with ElevenLabs for a while
ElevenLabs is a software company that offers a range of AI voice tools. At its heart is a TTS service that will turn any text into natural-sounding speech in a wide variety of voices. It’s possible to give cues to change the emotional tone of the speech to make it sound exactly how you want it to. The results can be genuinely impressive.
Another feature offered by ElevenLabs is voice cloning. Using as little as ten seconds of audio, it’s possible to create your own custom voice that you can use with the TTS engine. Once you’ve cloned a voice, you can get it to say whatever you want just by typing out the text that you want it to say.
There are also some other useful features, including a tool that can generate music purely based on text prompts. ElevenLabs has a free tier, but with significant limitations on the features and on how much text you can convert to speech. I pay $5 a month for the Starter plan, which gives me enough credits to generate up to an hour of speech each month.
You won’t be able to reproduce this method without a paid ElevenLabs account. The free version of ElevenLabs doesn’t let you clone voices or use custom voices for text-to-speech.
Using my own voice in my smart home
I’ve previously cloned my voice with Apple’s Personal Voice feature. Unfortunately, you can’t use this voice in any meaningful way. It’s not possible to replace Siri’s voice with your own using Personal Voice, for example.
I knew that I could clone my voice using ElevenLabs, so I gave it a try to see if it would sound good enough to use. I asked an AI chatbot to generate a two-minute script for me that would capture different tones of voice and all of the key phonetic sounds.
I clicked the “Create or Clone a Voice” button in ElevenLabs, selected “Instant Voice Clone,” and recorded myself reading the script in 30-second chunks. After each recording, an icon indicated if there was enough audio to create a good clone. I kept recording until the green circle was full; it took six 30-second recordings.
Once I clicked “Next,” the voice was created in just a few seconds. I tested it out by typing some text, and the results were good; it sounded remarkably close to my own voice. It wasn’t a perfect clone; the odd word would sound a little different from how I would say it, but most of the time, the speech was scarily similar.
All I needed was a way for Home Assistant to generate speech using that cloned voice, and I’d be able to use my own voice to make announcements through my Echo smart speakers around the house.
The ElevenLabs integration makes generating speech simple
The beauty of Home Assistant is that whatever you want to do, you can be almost certain that someone else has tried it first and created an integration to make it simple. That was exactly the case here. There is an ElevenLabs integration that generates text-to-speech through ElevenLabs using any of your saved voices. You just need the API key for your account and the Voice ID for the voice you want to use.
After installing the integration, I tested the feature out on my Apple HomePod mini, as I’ve had fewer issues using this device for TTS announcements in the past. I found that I could get Home Assistant to say anything in my voice through my HomePod mini by calling the “tts.speak” action with ElevenLabs as the target, my HomePod as the media player, the text I wanted to hear as the message, and the voice ID of my custom voice in the options.
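For anyone who prefers to see that action call spelled out, here is a minimal Python sketch that builds the same “tts.speak” request for Home Assistant’s REST API. The entity IDs, base URL, token, and voice ID are all placeholders, and the `voice` key under `options` is my assumption about how the ElevenLabs integration expects the voice ID, so check it against your own setup before relying on it.

```python
import json
import urllib.request


def build_tts_speak_payload(tts_entity, media_player, message, voice_id):
    """Build the service data for a tts.speak call.

    Assumption: the ElevenLabs integration reads the voice ID from
    options["voice"]; verify this against the integration's documentation.
    """
    return {
        "entity_id": tts_entity,                 # the TTS entity (the target)
        "media_player_entity_id": media_player,  # the speaker to play on
        "message": message,                      # the text to speak
        "options": {"voice": voice_id},          # the custom voice to use
    }


def build_request(base_url, token, payload):
    """Wrap the payload in a POST request to Home Assistant's REST API."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Every value below is a hypothetical placeholder.
    payload = build_tts_speak_payload(
        tts_entity="tts.elevenlabs",
        media_player="media_player.homepod_mini",
        message="Dinner is ready.",
        voice_id="YOUR_VOICE_ID",
    )
    request = build_request("http://homeassistant.local:8123", "YOUR_TOKEN", payload)
    print(request.full_url)
    print(json.dumps(payload, indent=2))
    # To actually send it, uncomment the next line:
    # urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload first, and switching speakers later only means changing the `media_player` argument.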
Getting my Echo devices to play my voice was the hardest part
Now that I could get my voice to play on my HomePod mini, I was sure I had cracked it. I changed the media player to one of my Echo speakers exposed by the Alexa Media Player integration and tried again. Unfortunately, instead of hearing my voice, I got a message in Alexa’s standard voice saying, “I’m having trouble accessing your Simon Says skill right now.”
I spent a long time trying to fix this problem, with little success. This is a common issue with the Alexa Media Player integration, as the Echo devices don’t like audio unless it’s in a specific format. I just couldn’t seem to get it to work.
Then, as with most tech problems, I realized that there was a potentially simple solution that I should have tried hours earlier. I changed the target media player from the one exposed by the Alexa Media Player integration to the one exposed by Music Assistant. For whatever reason, this worked perfectly. I can now say anything I want in my own voice through all of my smart speakers. For example, my trash day announcement, which fires the first time anyone enters the kitchen on a Friday morning, now tells me which type of waste I need to put out that day, in my own voice.
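As a rough illustration of the kind of message that announcement speaks, here is a small Python sketch that picks the waste type for a given Friday. The two-week rotation, the waste names, and the wording are all hypothetical, not my actual schedule or automation. In Home Assistant itself, this logic would more likely live in a template or automation.

```python
from datetime import date
from typing import Optional

# Hypothetical two-week rotation keyed by ISO week parity;
# replace it with your own collection schedule.
WASTE_BY_WEEK_PARITY = {0: "recycling", 1: "general waste"}


def trash_day_message(today: date) -> Optional[str]:
    """Return the announcement text on Fridays, or None on other days."""
    if today.weekday() != 4:  # Monday is 0, so Friday is 4
        return None
    week_number = today.isocalendar()[1]  # ISO week number
    waste = WASTE_BY_WEEK_PARITY[week_number % 2]
    return f"It's trash day. Please put out the {waste}."


if __name__ == "__main__":
    print(trash_day_message(date.today()))
```

The returned string is what would be passed as the message to the “tts.speak” action.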
Using my own voice to replace Alexa’s was initially just an experiment, but it works really well and makes announcements feel much more personal. You can potentially use this method to clone any voice within reason, such as those of other family members.
You should bear in mind that cloning other people’s voices without permission has legal and ethical implications and could result in your ElevenLabs account being suspended. ElevenLabs has the rights to iconic voices such as Judy Garland and John Wayne, but these are intended for commercial use. If you are willing to pay, however, then you could license the official voice of Michael Caine and turn Alexa into your own Alfred Pennyworth from the Batman movies.