Voicery makes synthesized voices sound more like people – TechCrunch
Advancements in AI technology have paved the way for breakthroughs in speech recognition, natural language processing and machine translation. A new startup called Voicery now wants to leverage those same advancements to improve speech synthesis, too. The result is a fast, versatile speech engine that sounds more human, and less like a robot. Its machine voices can then be used wherever a synthesized voice is needed, including in new applications like automatically generated audiobooks or podcasts, voice-overs, TV dubs and elsewhere.
Before starting Voicery, co-founder Andrew Gibiansky worked at Baidu Research, where he led the deep learning speech synthesis team.
While there, the team developed cutting-edge techniques in the field of machine learning, published papers on speech built from deep neural networks and artificial speech generation, and commercialized its technology in production-quality systems for Baidu.
Now, Gibiansky is bringing that same skill set to Voicery, where he's joined by co-founder Bobby Ullman, who previously worked at Palantir on databases and scalable systems.
“In the time that I was at Baidu, what became very evident is that the revolution in deep learning and machine learning was about to happen to speech synthesis,” explains Gibiansky. “In the past five years, we’ve seen that these new techniques have brought amazing gains in computer vision, speech recognition and in other industries, but it hasn’t yet happened with synthesizing human speech. We saw that if we could use this new technology to build speech synthesis engines, we could do it so much better than everything that currently exists.”
Specifically, the company is leveraging newer deep learning technologies to create better synthesized voices more quickly than before.
In fact, the founders built Voicery’s speech synthesis engine in just two-and-a-half months.
Unlike traditional voice synthesizing solutions, where a single person records hours upon hours of speech that’s then used to create the new voice, Voicery trains its system on hundreds of voices at once.
It can also use varying amounts of speech input from any one person. Because of how much data it takes in, the system sounds more human as it learns the proper pronunciations, inflections and accents from a wider variety of source voices.
The company claims its voices are nearly indistinguishable from humans; it even published a quiz on its website that asks visitors to see if they can identify which ones are synthesized and which are real. I found that you’re still able to identify the voices as machines, but they’re much better than the machine reader voices you may be used to.
Of course, given the rapid pace of technology development in this space, not to mention the fact that the team built their system in a matter of months, one has to wonder why the major players in voice computing couldn’t just do something similar with their own in-house engineering teams.
However, Gibiansky says that Voicery has the advantage of being the first out of the gate with technology that capitalizes on these machine learning advancements.
“None of the currently published research is quite good enough for what we wanted to do, so we had to extend that a fair bit,” he notes. “Now we have several voices that are ready, and we’re starting to find customers to partner with.”
Voicery already has a few customers piloting the technology, but nothing to announce at the moment, as those talks are in various stages.
The company charges an upfront fee to develop a new voice for a customer, then a per-usage fee.
The technology can be used where voice systems exist today, like in translation apps, GPS navigation apps, voice assistant apps or screen readers, for example. But the team also sees the potential for it to open up new markets, given the ease of creating synthesized voices that really sound like people. This includes things like synthesizing podcasts, reading the news (think: Alexa’s “Flash Briefing”), TV dub-ins, voices for characters in video games and more.
“We can move into spaces that fundamentally haven’t been using the technology because it hasn’t been high enough quality. And we have some interest from companies that are looking to do this,” says Gibiansky.
Voicery, based in San Francisco, is bootstrapped save for the funding it received by participating in Y Combinator’s Winter 2018 class. It’s looking to raise more funds after YC’s Demo Day.