The Decoder • 69 days ago

Stability AI releases 'Stable Audio 3.0', an open-weight model family that generates tracks up to six minutes long

IMP

7/10

Key Summary

Stability AI has released Stable Audio 3.0, which generates music tracks up to six minutes long. To avoid copyright problems, the models were trained entirely on licensed data, and three of the four variants are published with open weights so practitioners can use them freely. Enterprise customers receive legal indemnification, and commercial use is free up to one million dollars in annual revenue.

Translated Text

Stability AI has unveiled Stable Audio 3.0, a new generation of audio models that generate music tracks up to six minutes long; three of the models ship with open weights. According to the company, the models were trained entirely on licensed data.

The model family consists of four variants. Stable Audio 3.0 Small SFX and Stable Audio 3.0 Small each have 459 million parameters and generate tracks up to two minutes long in 0.44 seconds of inference time on an H200 GPU. The former focuses on sound effects and is designed to run on smartphones and consumer laptops; the latter targets short music pieces. Stable Audio 3.0 Medium has 1.4 billion parameters and generates tracks up to 6 minutes 20 seconds long in 1.31 seconds. All three are available as open-weight models on Hugging Face.
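To put the reported speeds in perspective, the short Python sketch below converts them into a real-time factor (seconds of audio produced per second of inference). It uses only the figures quoted above and assumes they refer to generating a track of the maximum stated length on an H200 GPU; the function and variable names are illustrative and not part of any Stability AI tooling.

```python
# Figures as reported in the article (H200 GPU, maximum track length).
# Assumption: each inference time corresponds to a full-length track.
REPORTED = {
    "Small / Small SFX": {"audio_seconds": 2 * 60, "inference_seconds": 0.44},
    "Medium": {"audio_seconds": 6 * 60 + 20, "inference_seconds": 1.31},
}


def realtime_factor(audio_seconds: float, inference_seconds: float) -> float:
    """Return how many seconds of audio are generated per second of compute."""
    return audio_seconds / inference_seconds


for name, figures in REPORTED.items():
    print(f"{name}: about {realtime_factor(**figures):.0f}x real time")
# Small / Small SFX: about 273x real time
# Medium: about 290x real time
```

Under that assumption, both open-weight tiers would run at roughly 270 to 290 times real time on this hardware; the article gives no timing for consumer devices, so these numbers say nothing about on-device speed.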

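By contrast, the largest model, Stable Audio 3.0 Large with 2.7 billion parameters, is not available as open weights. It can be accessed only through the Stability AI API or partner fal.ai, or hosted on a company's own infrastructure under an enterprise license. Stability AI says this model delivers the highest musicality and is built for music platforms with high generation volume.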

New architecture enables longer, more flexible audio output

According to Stability AI, Stable Audio 3.0 uses a new architecture with a semantic-acoustic autoencoder that enables longer and more flexible audio output. Generation works at variable length with second-level control. The company says Stable Audio 3.0 Small is the only model that enables full music composition on-device, offline and without short sample limits. For context, the earlier Stable Audio Open Small topped out at 11 seconds, and Stable Audio Open at 47 seconds.

Stability AI is also releasing LoRA training documentation alongside the Small and Medium weights so that users can fine-tune the models on their own audio libraries. Enterprise customers receive guided fine-tuning support. The models also include inpainting features: users can edit individual segments of a track, modify multiple sections at once, or extend an existing track beyond its original endpoint (causal continuation).

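Commercial use is free up to one million dollars in annual revenue

Under the Stability AI Community License, users own the audio files they generate and can use them commercially. Organizations with more than one million dollars in annual revenue must contact Stability AI for an enterprise license, which adds commercial coverage and legal indemnification.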

Stability AI stresses that, to its knowledge, competing open music models either restrict commercial use or carry legal risk from training on unlicensed data. The company backs its licensing stance with partnerships with Universal Music Group and Warner Music Group.

From image pioneer to audio specialist

Stability AI once shaped open image generation with Stable Diffusion, but has shifted its focus toward audio since the departure of founder Emad Mostaque and amid ongoing financial struggles. The first Stable Audio launch in September 2023 relied on a partnership with stock music provider AudioSparx, which contributed about 800,000 songs, sound effects, and instrument snippets. Stable Audio 2.0 followed in April 2024 and was one of the first commercially viable AI music tools for full-length 44.1 kHz audio up to three minutes.


Original Text (English)

Stability AI launches Stable Audio 3.0 with up to six-minute tracks and open weights

Jonathan Kemper
May 20, 2026

Key Points

Stability AI's Stable Audio 3.0 generates music tracks up to six minutes long, trained entirely on licensed data. Three of the four model variants are freely available as open-weights models. The largest remains exclusive to API users and enterprise customers. With licensed training data and legal indemnification for enterprise customers, Stability AI is deliberately distancing itself from competitors currently facing copyright lawsuits.

Stability AI has unveiled Stable Audio 3.0, a new generation of audio models - three of which ship with open weights. The models generate music tracks up to six minutes long and were trained entirely on licensed data, according to the company.

The model family includes four variants. Stable Audio 3.0 Small SFX and Stable Audio 3.0 Small each pack 459 million parameters and produce tracks up to two minutes long in 0.44 seconds of inference time on an H200 GPU. The first focuses on sound effects and is designed for smartphones and consumer laptops. The second targets short music pieces. Stable Audio 3.0 Medium runs 1.4 billion parameters and generates tracks up to 6:20 minutes in 1.31 seconds. All three are available as open-weights models on Hugging Face.

The largest model, Stable Audio 3.0 Large with 2.7 billion parameters, isn't available as open weights. It's only accessible through the Stability AI API, through partner fal.ai, or can be hosted on a company's own infrastructure via enterprise licensing. Stability AI says it delivers the highest musicality and is built for music platforms with high generation volume.

New architecture enables longer, more flexible audio output

Stable Audio 3.0 runs on a new architecture with a semantic-acoustic autoencoder that allows longer and more flexible audio output, according to Stability AI. Generation works at variable length with second-level control.

Stable Audio 3.0 Small is the only model that enables full music composition on-device - offline and without short sample limits, the company says. For context: Stable Audio Open Small topped out at eleven seconds. Stable Audio Open managed 47 seconds.

Stability AI is also releasing LoRA training documentation alongside the Stable Audio 3.0 Small and Medium weights, letting users fine-tune models on their own audio libraries. Enterprise customers get guided fine-tuning support. The models also include inpainting features: users can edit individual segments of a track, modify multiple sections at once, or extend existing tracks beyond their original endpoint (causal continuation).

Commercial use is free up to a million dollars in revenue

Under the Stability AI Community License, users own the audio files they generate and can use them commercially. Organizations with more than one million dollars in annual revenue need to contact Stability AI for enterprise licensing, which adds commercial coverage and legal indemnification.

Stability AI points out that, to its knowledge, competing open music models either restrict commercial use or carry risks from training on unlicensed data. The company backs up its licensing stance with partnerships with Universal Music Group and Warner Music Group.

From image pioneer to audio specialist

Stability AI once shaped the open image generation space with Stable Diffusion, but has shifted its focus toward audio since founder Emad Mostaque's departure and ongoing financial struggles. The first Stable Audio launch in September 2023 relied on a partnership with stock music provider AudioSparx, which contributed about 800,000 songs, audio effects, and instrument snippets.

Stable Audio 2.0 followed in April 2024 and was one of the first commercially viable AI music tools for full-length 44.1 kHz audio up to three minutes. Stable Audio Open arrived in summer 2024 as an open-source variant for shorter samples. In May 2025, Stability AI teamed up with Arm to release Stable Audio Open Small, a compact text-to-audio model that runs on smartphones. Stable Audio 2.5 from September 2025 targeted professional sound production with multi-part compositions featuring intro, development, and outro sections. Stable Audio 3.0 now marks the shift to a unified architecture that Stability AI says will serve as the foundation for its next generation of licensed professional models.

Licensed training data gains weight amid copyright rulings

The company's repeated emphasis on licensed training data carries extra weight given recent court decisions. In November 2025, a Munich court found OpenAI liable for copyright infringement because ChatGPT reproduced protected song lyrics from the GEMA catalog in response to simple prompts. The court agreed that training data remains embedded in model weights and can be retrieved - a phenomenon GEMA calls memorization. OpenAI has appealed. The case is now before the Munich Higher Regional Court.

Stability AI's promise to work with fully licensed data and to indemnify enterprise customers positions the British company squarely against providers like Suno and Udio, which are facing similar legal battles. A separate GEMA lawsuit against Suno alleges the tool was trained on original recordings from GEMA's catalog and produces near-identical versions. In the US, Suno and Udio face comparable lawsuits from the music industry.

With fully licensed training data and legal protection for enterprise customers, Stability AI is deliberately staying clear of that fight.

Source: Stability AI (press release)