Nvidia's Groundbreaking Generative Audio Model: Fugatto

2024-11-26

Nvidia Corp. has unveiled a generative AI model designed to create "new" music and audio from natural-language prompts. The move positions Nvidia alongside Meta Platforms Inc., OpenAI, and Runway AI Inc. in generative media. According to Nvidia, the new model, Fugatto (short for Foundational Generative Audio Transformer Opus 1), has capabilities that set it apart from other models.

Unique Abilities of Fugatto

According to the chipmaker, Fugatto is uniquely able to modify human voices and create "novel sounds" that no other model can produce. It has the ability to absorb and modify existing sounds. For instance, it can listen to a musical segment played on a piano and transform it into notes sung by a human voice or an alternative instrument like a violin. It can also take a human voice recording and alter the accent and mood expressed in the singing. This shows its remarkable flexibility in manipulating and creating different audio elements.

Moreover, Fugatto comes with more fine-grained controls for users to edit the soundscapes they create. This gives users greater creative control and allows them to fine-tune the generated audio to their specific needs.

Comparison with Other Models

Nvidia isn't the first company to attempt generative AI music creation. Meta debuted a model called Movie Gen last month, which can create both video and soundscapes. Nvidia says Fugatto stands out in its ability to modify human voices and create truly novel sounds, and that it goes beyond basic text prompting by offering users finer-grained tools for editing audio.

Safety and Copyright Concerns

While Fugatto shows great potential, it also raises concerns. Nvidia has not yet publicly released the model, citing safety concerns. Any generative technology carries risks, since people might use it to generate undesirable content. There are also potential copyright issues. In June, record labels filed lawsuits against generative AI music startups for "widespread infringement" of copyrighted sound recordings. The relationship between AI and Hollywood is also tense, with actress Scarlett Johansson accusing OpenAI of cloning her voice. Nvidia says it is mindful of these issues and is still debating how to release the model safely.

Potential Impact on Music Production

Bryan Catanzaro, Nvidia's vice president of applied deep learning research, believes that generative AI has the potential to affect music production in the same way that electronic synthesizers did. Looking at synthetic audio over the past 50 years, he said, music sounds different now because of computers. In his view, generative AI is going to bring new capabilities to music, video games, and ordinary folks who want to create things. It opens up new possibilities for creative expression and allows for more diverse and unique audio creations.

In conclusion, Nvidia's Fugatto model represents a significant advancement in generative audio technology. While it faces challenges related to safety and copyright, its potential impact on the music industry and beyond is considerable. It offers users a powerful tool for creating new and innovative audio experiences.