TechCrunch AI • 69 days ago

Stability AI Unveils Music Generation Model That Produces Six-Minute Tracks

IMP

7/10

Key Summary

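Stability AI has announced 'Stability Audio 3.0', a family of four new audio models that can generate professional-grade music up to 6 minutes 20 seconds long. The small and medium models are released with open weights for anyone to use, while the large model is available only through the paid API and paid self-hosting services. The release matters to the industry for two reasons: the models were trained on fully licensed data, which reduces the risk of copyright disputes, and the company is developing new products for professional musicians while recruiting industry veterans.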

Translated Body

Stability AI, best known for Stable Diffusion, has unveiled a new family of audio models called 'Stability Audio 3.0'. According to the company, the top model can generate professional-grade music more than six minutes long.

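The Stability Audio 3.0 release consists of four models: small SFX (459 million parameters), small (459 million parameters), medium (1.4 billion parameters), and large (2.7 billion parameters). The two small models are suited to on-device sound and music generation of up to two minutes. The medium and large models can produce full compositions 6 minutes 20 seconds long while maintaining musical structure and melodic tone. That is more than double the length that Stable Audio 2.0, released in 2024, could generate.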

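Stability AI is releasing the small SFX, small, and medium models with open weights that anyone can use and modify. Compared with 'Stable Audio Open', released in 2024 and limited to music of up to 47 seconds, the new model family is a big step up from the previous open versions. The large model is available only through the paid API and paid self-hosting services. In addition, companies with more than $1 million (about 1.3 billion won) in annual revenue must obtain a separate enterprise license.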

Several companies, including Google and ElevenLabs, are competing by releasing models and tools focused on music generation. However, as the ongoing lawsuits involving Suno and Udio show, data licensing and partnerships with music labels are likely to be key to the long-term survival of these services. Last year, Stability AI signed deals with Warner Music Group and Universal Music Group to develop models and music creation tools. The company says its latest audio models are all built on fully licensed data.

The AI startup is developing a new suite of products for professional musicians but has not disclosed details of their features. Ethan Kaplan, former chief digital officer (CDO) at Universal Audio and Fender, is joining to lead Stability's professional music offering. A number of AI companies are trying to strengthen their expertise by hiring music industry executives. Earlier this year, Suno hired former Merlin CEO Jeremy Sirota as chief commercial officer (CCO). ElevenLabs has likewise hired Derek Cournoyer, formerly of indie music publisher Kobalt, as a strategy lead for its music business.

Original Text (English)

Stability AI, the company behind Stable Diffusion, is releasing a new family of audio models, called Stability Audio 3.0. The top model can generate professional-grade music more than six minutes long, the company claimed.

The company is releasing four new models under the Stable Audio 3.0 name: small SFX (459M parameters), small (459M parameters), medium (1.4B parameters), and large (2.7B parameters). The two small models are suitable for on-device sound and music generation of up to two minutes. Both the medium and large models can create full compositions 6 minutes 20 seconds long that maintain musical structure and melodic tone. This is more than double the length of what Stable Audio 2.0, released in 2024, was capable of generating.

Stability AI is making the small SFX, small, and medium models available with open weights for anyone to use and modify. In 2024, the company released Stable Audio Open, which allowed for music generation of up to 47 seconds. The new family of models is a big step up from the previous open versions. The large model is available only through the API and self-hosting paid services. In addition, companies with more than $1 million in revenue would need to get an enterprise license.

Many companies, including Google and ElevenLabs, are releasing models and tooling around music generation. However, as Suno and Udio's ongoing court battles have shown, licensing of data and partnerships with music labels could become a key part of the long-term survival of these services. Last year, Stability AI inked deals with Warner Music Group and Universal Music Group to develop models and music creation tools. The company said that its latest set of audio models is built on fully licensed data.

The AI startup is developing a new suite of products for professional musicians, but didn't give more details on its features.
Ethan Kaplan, former chief digital officer at Universal Audio and Fender, is joining the company to lead Stability's professional music offering. A number of AI companies are trying to bolster their credentials by hiring music execs. Earlier this year, Suno hired former Merlin CEO Jeremy Sirota as chief commercial officer. ElevenLabs has also hired Derek Cournoyer from indie music publisher Kobalt as a strategy lead for its music business.

Ivan Mehta

Ivan covers global consumer tech developments at TechCrunch. He is based out of India and has previously worked at publications including Huffington Post and The Next Web.