Microsoft

VALL-E AI Tool Imitates Voices in Just Seconds

AI tools that create images and text seemingly out of nothing are the topic of the hour; the best-known examples are ChatGPT and DALL-E, and Microsoft has a hand in both. Now another AI joins them, and it is probably the scariest yet, because VALL-E imitates voices.

For a long time, artificial intelligence was often little more than an empty buzzword to describe relatively banal machine learning. The latter is still central, but the results are now so impressive that the word intelligence can actually apply. The OpenAI solutions ChatGPT and DALL-E show this only too well.

Microsoft, one of OpenAI's backers, also conducts its own AI research and has chosen a name for one of its projects that echoes DALL-E: VALL-E. It is an application capable of mimicking voices. What makes it special is that VALL-E needs a sample of just three seconds to believably imitate a specific person's voice.

Tone of voice and emotions, too

As AITopics reports via Windows Central, the tool was trained on 60,000 hours of English-language speech data. A special feature is that the AI voice can imitate a speaker's tone and emotions. In the accompanying paper, published on arXiv (the preprint server hosted by Cornell University), the researchers generated several voices and sentences, which can also be heard via GitHub.

The quality varies, however: some recordings sound convincing and natural, while others are rather tinny and artificial. Bear in mind, though, that the starting point was a three-second sample. The more audio you “feed” the AI, the better the result should be, since the AI has more to learn from.

VALL-E is not yet publicly available, so you can’t try out for yourself how convincingly the tool works – but maybe that’s a good thing, because it is easy to imagine the damage such a tool and the resulting fakes could do.