Google AI Blog • 104 days ago

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

IMP

8/10

Key summary

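Google has released Gemini 3.1 Flash TTS, a text-to-speech model that lets you control an AI voice's emotion, speaking pace, and delivery through text commands. The model supports more than 70 languages and offers Google's most natural speech quality to date, helping developers and enterprises build more sophisticated voice applications. All output carries a SynthID audio watermark, so AI-generated speech can be identified, which also strengthens safety.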

Translated text

Gemini 3.1 Flash TTS: the next generation of expressive AI speech
April 15, 2026

Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.

Vilobh Meshram, Senior Product Manager
Max Gubin, Principal Research Engineer
on behalf of the Gemini team

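AI-generated summary

[General summary] Gemini 3.1 Flash TTS is here, giving you improved AI speech quality and control. You can now use audio tags to adjust vocal style and pacing in over 70 languages. Test it out in Google AI Studio, Vertex AI, and Google Vids, and know that all audio is watermarked with SynthID to help prevent misinformation.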

Summaries were generated by Google AI. Generative AI is experimental.

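[Bullet points]

"Gemini 3.1 Flash TTS" is a new AI speech model with better control, expressiveness, and quality.
The model has improved speech quality, making it sound more natural than previous versions.
Audio tags let you control vocal style, pace, and delivery using natural language commands.
Developers can use Google AI Studio to fine-tune voices and export settings for consistent use.
Gemini 3.1 Flash TTS supports 70+ languages and uses SynthID watermarking to identify AI-generated audio.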

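[Basic explainer] Gemini 3.1 Flash TTS is a new AI that makes computer speech sound more real. It lets people change how the AI talks by using special commands in the text. This AI can speak in over 70 languages and adds a hidden watermark to the audio. This helps people know the audio is AI-generated and not a real person.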

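Today, we're introducing Gemini 3.1 Flash TTS, our latest text-to-speech (TTS) model. It delivers improved controllability, expressivity and quality, empowering developers, enterprises and everyday users to build the next generation of AI-speech applications.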

Starting today, 3.1 Flash TTS is rolling out:

For developers: in preview via the Gemini API and Google AI Studio
For enterprises: in preview on Vertex AI
For Workspace users: via Google Vids

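Improved speech quality and controllability

We've improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date. On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind human preferences, 3.1 Flash TTS achieved an Elo score of 1,211. Artificial Analysis has also positioned Gemini 3.1 Flash TTS within its "most attractive quadrant" for its blend of high-quality speech generation and low cost. The model stands out further with native multi-speaker dialogue, support for 70+ languages, and granular creative control via natural language.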
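
As a rough guide to reading an Elo score such as the 1,211 quoted above, the sketch below applies the standard Elo expected-score formula to a pair of ratings. It assumes the leaderboard follows the textbook Elo model, which the post does not confirm, and the rival rating of 1,111 is a made-up value used only for illustration.

```python
def elo_expected_preference(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


flash_tts = 1211.0           # Elo score quoted in the post
hypothetical_rival = 1111.0  # made-up rating, for illustration only

p = elo_expected_preference(flash_tts, hypothetical_rival)
print(f"Expected preference rate: {p:.3f}")  # roughly 0.640
```

Under this model a 100-point lead corresponds to being preferred in about 64% of blind pairwise comparisons, and equal ratings give exactly 50%.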

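New audio tags for more expressive speech generation

3.1 Flash TTS also introduces audio tags, an intuitive way to control vocal style, pace and delivery. By embedding natural language commands directly into the text input, you can steer AI-speech output with finer granularity. You can start experimenting with these audio tags, along with other updates to the developer experience, in Google AI Studio, where configurable controls place the developer in the "director's chair":

Scene direction: Set the stage by defining the environment and providing specific dialogue instructions. This world-building context helps characters remain "in-character" and react to one another naturally across multiple turns.
Speaker-level specificity: Cast characters using unique Audio Profiles, then specify Director's Notes to adjust pace, tone and accent.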
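
The controls described above can be pictured as plain text assembled before it is sent to the model. The following is a minimal sketch of that assembly step only. The `Speaker` class, the `build_tts_prompt` helper, the `Scene:` and `Director's notes:` labels, and the square-bracket tag syntax are all hypothetical; the post says only that natural language commands are embedded directly in the text input and does not specify a format. The API call itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    """Hypothetical container for one voice in a multi-speaker script."""
    name: str
    audio_profile: str    # who the character is
    directors_notes: str  # pace, tone, accent


def build_tts_prompt(scene: str, speakers: list[Speaker],
                     turns: list[tuple[str, str]]) -> str:
    """Join scene direction, speaker notes and dialogue into one text input."""
    lines = [f"Scene: {scene}", ""]
    for s in speakers:
        lines.append(
            f"{s.name}: {s.audio_profile}. Director's notes: {s.directors_notes}"
        )
    lines.append("")
    for name, text in turns:
        lines.append(f"{name}: {text}")
    return "\n".join(lines)


speakers = [
    Speaker("Ava", "warm, low-pitched narrator", "slow pace, calm tone"),
    Speaker("Ben", "energetic young reporter", "fast pace, excited tone"),
]
turns = [
    # Bracketed tags are an assumed syntax for mid-sentence direction.
    ("Ava", "Welcome back. [whispers] The results are in."),
    ("Ben", "[excited] And they are better than anyone expected!"),
]

prompt = build_tts_prompt("A late-night radio studio", speakers, turns)
print(prompt)
```

Keeping the scene, the per-speaker notes and the dialogue in separate structures makes it easy to reuse the same cast across scripts, which is the consistency goal the post describes.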

Original text (English)

Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Apr 15, 2026

Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.

Vilobh Meshram, Senior Product Manager
Max Gubin, Principal Research Engineer
on behalf of the Gemini team

AI-generated summary

[General summary] Gemini 3.1 Flash TTS is here, giving you improved AI speech quality and control. You can now use audio tags to adjust vocal style and pacing in over 70 languages. Test it out in Google AI Studio, Vertex AI, and Google Vids, and know that all audio is watermarked with SynthID to prevent misinformation.

Summaries were generated by Google AI. Generative AI is experimental.

[Bullet points]

"Gemini 3.1 Flash TTS" is a new AI speech model with better control, expressiveness, and quality.
This model has improved speech quality, making it sound more natural than previous versions.
Audio tags let you control vocal style, pace, and delivery using natural language commands.
Developers can use Google AI Studio to fine-tune voices and export settings for consistent use.
Gemini 3.1 Flash TTS supports 70+ languages and uses SynthID watermarking to identify AI-generated audio.

[Basic explainer] Gemini 3.1 Flash TTS is a new AI that makes computer speech sound more real. It lets people change how the AI talks by using special commands in the text. This AI can speak in over 70 languages and adds a hidden watermark to the audio. This helps people know it's AI-generated and not a real person.

Today, we're introducing Gemini 3.1 Flash TTS, the latest text-to-speech model that delivers improved controllability, expressivity and quality, empowering developers, enterprises and everyday users to build the next generation of AI-speech applications.

Starting today, 3.1 Flash TTS is rolling out:

For developers in preview via the Gemini API and Google AI Studio
For enterprises in preview on Vertex AI
For Workspace users via Google Vids

Improved speech quality and controllability

We've improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date. On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind human preferences, 3.1 Flash TTS achieved an impressive Elo score of 1,211. Artificial Analysis has also positioned Gemini 3.1 Flash TTS within its "most attractive quadrant" for its ideal blend of high-quality speech generation and low cost. The model stands out further with native multi-speaker dialogue, support for 70+ languages, and granular creative control via natural language.

New audio tags for more expressive speech generation

3.1 Flash TTS also introduces audio tags, an intuitive way to control vocal style, pace and delivery. By embedding natural language commands directly into the text input, you can steer AI-speech output with improved levels of granularity. You can start experimenting with these audio tags along with other updates to the developer experience in Google AI Studio with configurable controls that place the developer in the "director's chair":

Scene direction: Set the stage by defining the environment and providing specific dialogue instructions. This world-building context helps characters remain "in-character" and react to one another naturally across multiple turns.
Speaker-level specificity: Cast characters using unique Audio Profiles, then specify Director's Notes to toggle pace, tone and accent. Using inline tags, speakers can pivot from these high-level settings to change expression mid-sentence.
Seamless export: Once the performance is perfected, these exact parameters can be exported as Gemini API code to ensure consistent, recognizable voices across various projects and platforms.

With these new configurations, developers can enhance precision for specific scenarios, creating memorable characters and immersive audio experiences. Get started with high-fidelity speech generation in the Google AI Studio Playground.

Built for global scale

Gemini 3.1 Flash TTS delivers high-fidelity speech and more precise control across more than 70 languages. These core optimizations bring advanced style, pacing and accent control to major markets, helping developers create localized, expressive speech experiences for users at global scale.

Early developer and enterprise testers are already seeing the impact of 3.1 Flash TTS, highlighting its impressive controllability and expressivity. They've told us how audio tags provide a new level of creative precision, transforming simple text into a high-fidelity vocal performance.

Watermarked with SynthID

All audio generated by Gemini 3.1 Flash TTS is watermarked with SynthID. This imperceptible watermark is interwoven directly into the audio output, allowing the reliable detection of AI-generated content to help prevent misinformation.

Gemini TTS Voice AI Google API