The Decoder • 109 days ago

Google releases Gemma 4: on-device AI with no data leaving the device

IMP

9/10

Summary

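Google has released Gemma 4, an open-source model that processes text, images, and audio entirely on-device. Its agent capabilities let it use external tools such as Wikipedia and maps on its own, with the model itself running locally instead of in the cloud. The lightweight smartphone variant runs on devices with as little as 6 GB of RAM, setting a new bar for on-device AI.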

Article

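Google's new open-source model, Gemma 4, processes text, images, and audio entirely on-device. Using agent skills, the AI can independently tap into tools such as Wikipedia or interactive maps, with no cloud-hosted model involved. The Google AI Edge Gallery app needed to run the model is free on Android and iOS. Since Gemma 4's release, the app has climbed to fourth place among the most-downloaded free productivity apps in the iOS App Store, right behind Claude, Gemini, and ChatGPT.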

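Gemma 4 is built on the same research as Google's proprietary Gemini 3 model, but it ships under the commercially friendly Apache 2.0 license. According to Google, the Gemma family has passed 400 million downloads since the first generation launched. All models handle text, images, and audio in more than 140 languages.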

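Four model sizes, from smartphones to servers

The latest release comes in four variants. E2B and E4B are built specifically for smartphones. The "E" stands for "effective parameters," meaning the number of parameters actually active during inference. Quantized, E2B takes up about 1.3 GB on the device, while E4B needs roughly 2.5 GB. The larger 26B and 31B variants target servers and high-performance hardware. The 26B version uses a mixture-of-experts (MoE) architecture with 128 experts, so only 3.8 billion of its parameters are active at any given time. The dense 31B model offers a context window of up to 256,000 tokens.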
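The "only 3.8 billion active parameters" figure follows from how mixture-of-experts layers work. A small router scores every expert for each token, and only the top few experts actually run. The Python sketch below is a generic, minimal illustration of top-k routing under stated assumptions. It is not Gemma 4's real router. The article gives only the 128-expert count and the 3.8B-of-26B figure. The number of experts per token (`TOP_K = 8`), the toy hidden size, and the scaling "experts" are hypothetical stand-ins.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_forward(x: Vector, router: List[Vector], experts: List[Expert], k: int) -> Tuple[Vector, List[int]]:
    """Send one token to its top-k experts and blend their outputs."""
    scores = [dot(w, x) for w in router]  # one router score per expert
    # Only the k best-scoring experts run; all others stay idle for this token.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # mixing weights that sum to 1
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top


def make_expert(scale: float) -> Expert:
    # Toy stand-in for an expert feed-forward block: it only scales its input.
    return lambda v: [scale * t for t in v]


NUM_EXPERTS = 128  # expert count reported for Gemma 4 26B
TOP_K = 8          # hypothetical: the article does not say how many experts run per token
DIM = 16           # toy hidden size, also hypothetical

rng = random.Random(0)
router = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(1.0 + i / NUM_EXPERTS) for i in range(NUM_EXPERTS)]
token = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

out, used = moe_forward(token, router, experts, TOP_K)
print(f"{len(used)} of {NUM_EXPERTS} experts ran for this token")  # 8 of 128 experts ran for this token
print(f"Active share reported for 26B: {3.8 / 26:.1%}")          # Active share reported for 26B: 14.6%
```

Only `TOP_K` expert blocks execute per token, so compute per token tracks the active parameters rather than the total. All 128 experts generally still have to be held in memory, however. For that reason the total parameter count, not just the active share, drives the memory footprint.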

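Google also worked with Arm and Qualcomm to optimize the smartphone variants for current mobile chipsets. According to Google, Gemma 4 on Android runs up to four times faster than the previous generation while using up to 60 percent less battery. Arm's own benchmarks show even bigger gains. On devices with a newer Arm chip that supports SME2, an instruction-set extension that accelerates the matrix math of AI models directly in silicon, processing is on average 5.5 times faster.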
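Arm's Scalable Matrix Extension (SME), which SME2 extends, is built around outer-product-and-accumulate instructions that write into a dedicated matrix tile. The pure-Python sketch below shows the idea behind this. A matrix multiply can be rewritten as a running sum of rank-1 outer products, so each step updates a whole block of outputs at once instead of computing one dot product per output element. This is a conceptual illustration only. It does not call SME2 intrinsics, and it does not model real tile sizes, data types, or the kernels Arm benchmarked.

```python
from typing import List

Matrix = List[List[float]]


def matmul_dot(a: Matrix, b: Matrix) -> Matrix:
    """Reference version: one dot product per output element."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def matmul_outer(a: Matrix, b: Matrix) -> Matrix:
    """Same product, accumulated as a sum of rank-1 outer products.

    Step k multiplies column k of A by row k of B and adds the resulting
    n x p block into an accumulator. SME-style hardware performs this
    pattern with one outer-product-and-accumulate instruction per tile.
    """
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    acc = [[0.0] * p for _ in range(n)]  # stands in for the hardware matrix tile
    for k in range(m):
        col = [a[i][k] for i in range(n)]  # column k of A
        row = b[k]                         # row k of B
        for i in range(n):
            for j in range(p):
                acc[i][j] += col[i] * row[j]
    return acc


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_outer(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

Both functions compute the same result. The outer-product form matters because its inner step is a regular block update, which is the kind of operation dedicated matrix hardware can execute far more efficiently than scalar dot products.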

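Agent skills bring tool use to on-device AI

The app requires Android 12 or iOS 17 or later. The two smartphone variants differ in RAM requirements. Quantized, E2B uses about 1.3 GB and runs on devices with 6 GB of RAM, while E4B needs around 2.5 GB of model memory and at least 8 GB of RAM. Beyond basic chat, image recognition, and audio transcription, the app ships with what Google calls "agent skills": Wikipedia search, interactive maps, auto-generated summaries, and flashcards. Gemma 4 can also describe photos and turn spoken input into diagrams and visualizations. It can even team up with other local models for tasks such as text-to-speech or image generation. Google shows this off with a demo skill that describes and plays animal calls.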
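The article does not document how skills are wired up, so the sketch below illustrates only the general tool-use pattern and rests on assumptions. The model emits a structured call, and the host app validates it and runs a registered handler. The registry, the skill names `wikipedia_search` and `show_map`, and the JSON shape with `"skill"` and `"args"` keys are all hypothetical. They are not the Google AI Edge Gallery skill format. The handlers are offline stubs marking where a real skill would go online.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical skill registry: skill name -> (handler, required argument names).
Handler = Callable[..., str]
SKILLS: Dict[str, Tuple[Handler, Tuple[str, ...]]] = {}


def skill(name: str, *required: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a function as a named agent skill."""
    def register(fn: Handler) -> Handler:
        SKILLS[name] = (fn, required)
        return fn
    return register


@skill("wikipedia_search", "query")
def wikipedia_search(query: str) -> str:
    # Offline stub: a real skill would make a network request here.
    return f"[stub] Wikipedia summary for {query!r}"


@skill("show_map", "place")
def show_map(place: str) -> str:
    # Offline stub standing in for an interactive map view.
    return f"[stub] map centered on {place!r}"


def dispatch(model_output: str) -> str:
    """Run the skill named in a JSON tool call, or pass plain text through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # ordinary answer, no tool requested
    if not isinstance(call, dict) or "skill" not in call:
        return model_output
    name = call["skill"]
    args = call.get("args", {})
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    fn, required = SKILLS[name]
    missing = [r for r in required if r not in args]
    if missing:
        raise ValueError(f"{name} is missing arguments: {missing}")
    return fn(**args)


# The model decides to look something up instead of answering directly.
print(dispatch('{"skill": "wikipedia_search", "args": {"query": "Gemma"}}'))
# prints: [stub] Wikipedia summary for 'Gemma'
print(dispatch("Paris is the capital of France."))
# prints: Paris is the capital of France.
```

Validation lives in the host app rather than relying on the model's output being well-formed. As a result, an unknown skill or a missing argument fails loudly instead of silently triggering the wrong tool.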

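According to Google, image recognition also got a solid upgrade. OCR tasks (pulling text from images, diagrams, or handwriting) now deliver noticeably better results. The model also handles time-related information more reliably, which matters for calendars, reminders, and alarms.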

Taken individually, none of these features breaks new ground compared with what cloud providers already offer. What stands out, however, is...


Original article (English)

Google's Gemma 4 puts free agentic AI on your phone and no data ever leaves the device

Jonathan Kemper, THE DECODER, Apr 11, 2026

Key Points

Google's open-source model Gemma 4 can process text, images, and audio entirely on-device and autonomously use tools like Wikipedia, interactive maps, or QR code generators through built-in agent skills.

The smaller smartphone variants E2B and E4B run on devices with just 6 and 8 GB of RAM respectively, deliver up to four times the speed of the previous generation according to Google, and serve as the foundation for the upcoming Gemini Nano 4 on Android.

All models are released under the commercially friendly Apache 2.0 license, developers can create and share custom skills via GitHub, and the free "Google AI Edge Gallery" app is available for both Android and iOS.
Individually, none of these features breaks new ground compared to what cloud providers already offer. What stands out is that a demo app running a purely local model on a phone can now use these tools on its own. Developers can build custom skills through GitHub and share them with the community. The built-in tools do need an internet connection, but the model itself runs locally, and chats are never saved.

Gemma 4 sets the stage for the next Gemini Nano

According to Google, Gemma 4 E2B and E4B serve as the foundation for Gemini Nano 4, the next generation of Android's system-wide on-device model. Code written for Gemma 4 today will work with Gemini Nano 4 out of the box when it ships on new flagship devices later this year. Gemini Nano already runs on over 140 million Android devices, powering features like Smart Replies and audio summaries.

Back in December, Google previewed this direction with FunctionGemma, a tiny local model with just 270 million parameters that can route commands to other phone apps. It translates natural language into structured function calls: toggling the flashlight, creating contacts, sending emails, adding calendar entries, pulling up locations on a map, or opening Wi-Fi settings.

The strategic importance of on-device AI became clear earlier this year with the billion-dollar deal between Apple and Google. Since January, we've known that the next generation of Apple's Foundation Models will be built on Google's Gemini technology, powering a sweeping Siri upgrade over the course of 2026.

Source: Google

Tags: On-device AI, Google Gemma 4, Agentic AI, Open-source model, Mobile AI