Meta Showcases New ‘Voicebox’ Text-to-Speech Translation Tool

On the surface at least, Meta’s latest AI advancement doesn’t seem to be a significant step.

Today, Meta published an overview of its new ‘Voicebox’ AI system, which will enable users to translate text to audio in a variety of styles and voices.

Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more.

More details on this work & examples ⬇️

— Meta AI (@MetaAI) June 16, 2023

As shown in this overview clip, the Voicebox system can take text inputs and translate them into audio, with different voice options, enabling more advanced text-to-audio translation with lower learning and processing requirements than other, similar offerings.

Though, on the surface at least, it’s not much different from the text-to-audio tools that we’re now accustomed to – whether we like them or not – on TikTok and other apps.

The Voicebox translations sound pretty similar – and I’m willing to bet Meta won’t let me use the voice of Rocket Raccoon or a Transformer in these new translations.

But the Voicebox system is also more than just a direct text-to-speech translation tool.

As explained by Meta:

“Voicebox can produce high quality audio clips and edit pre-recorded audio – like removing car horns or a dog barking – all while preserving the content and style of the audio. The model is also multilingual and can produce speech in six languages. In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”

As Meta notes, Voicebox also lets you use voice samples for translation, so you can use an audio clip of another person to make your text-to-speech translation sound like that person is speaking, from just seconds of audio input.

Which will undoubtedly lead to a new raft of deepfakes – though again, similar tools do exist already. They’re just not the same as, and Meta says not as good as, this new process.

The real benefit of Voicebox, in a broad-reaching sense, will be in translation, enabling simplified, native-sounding versions of your text inputs in different languages. That could open up new cross-market opportunities, while the system’s advanced modeling will also facilitate broader use cases and processes, which could provide other key advantages.

But Meta is also aware of the risks.

At this stage, Meta isn’t releasing the source code or app to the public, citing ‘the potential risks of misuse’. It’s hoping to find more practical, valuable use cases for the technology over time – so its announcement today is more of an FYI than a launch, as such.

You can read more about Meta’s Voicebox project here.
