Meta has released AudioCraft, a new set of AI tools to generate what the tech giant claims is “high-quality, realistic audio and music from text” — for example, producing a music sequence based on the text phrase “electronic Jamaican reggae DJ set.”
“Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument,” Meta says in a blog post about AudioCraft. “Or a small business owner adding a soundtrack to their latest video ad on Instagram with ease.”
AudioCraft consists of three models: MusicGen (for music), AudioGen (for sound effects) and EnCodec (a neural audio codec that encodes and decodes the audio). MusicGen was trained on roughly 400,000 recordings along with text descriptions and metadata, amounting to 20,000 hours of music owned by Meta or licensed specifically for this purpose, according to the tech giant. “Music tracks are more complex than environmental sounds, and generating coherent samples on the long-term structure is especially important when creating novel musical pieces,” the company says.
“With even more controls, we think MusicGen can turn into a new type of instrument — just like synthesizers when they first appeared,” the company said in the blog post.
Meta also shared a clip of music generated by MusicGen.
Meanwhile, Meta said that AudioGen was trained on “public sound effects” and can generate environmental sounds and sound effects like a dog barking, cars honking or footsteps on a wooden floor. The company released what it said is an improved version of the EnCodec decoder, “which allows higher-quality music generation with fewer artifacts.”
The company is releasing the AudioCraft models as open-source code, explaining that the goal is to give “researchers and practitioners access so they can train their own models with their own datasets for the first time, and help advance the field of AI-generated audio and music.”
Meta acknowledged that the datasets used to train the AudioCraft models lack diversity — in particular, the music dataset used “contains a larger portion of Western-style music” and is limited to audio-text pairs with text and metadata written in English. “By sharing the code for AudioCraft, we hope other researchers can more easily test new approaches to limit or eliminate potential bias in and misuse of generative models,” the company said.