r/LocalLLaMA • 114 days ago

Gemma 4 shakes up the open-source model leaderboard with standout cost-performance

IMP

8/10

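Key summary

Google's open-source model Gemma 4 (31B) made waves on FoodTruck Bench, a business simulation benchmark in which an agent runs a food truck, by outperforming pricier commercial models such as GPT-5.2 at a fraction of their cost. At just $0.20 per run it posted a +1,144% median ROI, which makes it a strong new candidate for agentic workflows.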

Translated post

Tested Gemma 4 (31B) on our benchmark. Genuinely did not expect this.

100% survival, 5 out of 5 runs profitable, +1,144% median ROI. At $0.20 per run.
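For anyone who wants to compute the same kind of headline numbers from their own runs, here is a minimal sketch. The post does not spell out its formulas, so the `RunResult` fields, the ROI definition (net change relative to starting cash), and the `survived` flag are all assumptions for illustration, not FoodTruck Bench's actual code.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunResult:
    # Hypothetical record of one simulation run; the field names are assumptions.
    starting_cash: float
    final_cash: float
    survived: bool  # True if the agent reached the last day without going bankrupt


def roi_percent(run: RunResult) -> float:
    """ROI as percentage change relative to starting cash (assumed definition)."""
    return (run.final_cash - run.starting_cash) / run.starting_cash * 100.0


def summarize(runs: list[RunResult]) -> dict:
    """Aggregate a non-empty list of runs into leaderboard-style metrics."""
    if not runs:
        raise ValueError("need at least one run")
    rois = [roi_percent(r) for r in runs]
    return {
        "survival_rate_pct": 100.0 * sum(r.survived for r in runs) / len(runs),
        "profitable_runs": sum(roi > 0 for roi in rois),
        "median_roi_pct": median(rois),
    }
```

Using the median rather than the mean keeps a single lucky or disastrous run from dominating the score, which matters when there are only five runs per model.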

It outperforms GPT-5.2 ($4.43/run), Gemini 3 Pro ($2.95/run), and Sonnet 4.6 ($7.90/run), and absolutely destroys every Chinese open-source model we've tested: Qwen 3.5 397B, Qwen 3.5 9B, DeepSeek V3.2, and GLM-5. None of them even survive consistently.

The only model that beats Gemma 4 is Opus 4.6 at $36 per run. That's 180× more expensive.
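The 180× figure is simply the ratio of per-run costs. A small sketch that checks it against the prices quoted in the post (the function name `cost_multiples` is made up for this example):

```python
# Per-run costs in USD, as quoted in the post.
COST_PER_RUN = {
    "Gemma 4 (31B)": 0.20,
    "GPT-5.2": 4.43,
    "Gemini 3 Pro": 2.95,
    "Sonnet 4.6": 7.90,
    "Opus 4.6": 36.00,
}


def cost_multiples(costs: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each model's per-run cost as a multiple of the baseline model's cost."""
    base = costs[baseline]
    return {name: round(cost / base, 2) for name, cost in costs.items()}


if __name__ == "__main__":
    for name, multiple in cost_multiples(COST_PER_RUN, "Gemma 4 (31B)").items():
        print(f"{name}: {multiple}x")
```

With Gemma 4 as the baseline, Opus 4.6 works out to 180 times the cost, while GPT-5.2, Gemini 3 Pro, and Sonnet 4.6 land at roughly 22, 15, and 40 times respectively.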

31 billion parameters. Twenty cents. We double-checked the config, the prompt, and the model ID, and everything is identical to every other model on the leaderboard. Same seed, same tools, same simulation. It's just this good.

Strongly recommend trying it for your agentic workflows. We've tested 22 models so far, and this is by far the best cost-to-performance ratio we've ever seen.

Full breakdown with charts and day-by-day analysis: foodtruckbench.com/blog/gemma-4-31b

FoodTruck Bench is an AI business simulation benchmark. The agent runs a food truck for 30 days, making decisions about location, menu, pricing, staff, and inventory. Leaderboard at foodtruckbench.com


Original post (English)

Tested Gemma 4 (31B) on our benchmark. Genuinely did not expect this.

100% survival, 5 out of 5 runs profitable, +1,144% median ROI. At $0.20 per run.

It outperforms GPT-5.2 ($4.43/run), Gemini 3 Pro ($2.95/run), Sonnet 4.6 ($7.90/run), and absolutely destroys every Chinese open-source model we've tested: Qwen 3.5 397B, Qwen 3.5 9B, DeepSeek V3.2, GLM-5. None of them even survive consistently.

The only model that beats Gemma 4 is Opus 4.6 at $36 per run. That's 180× more expensive.

31 billion parameters. Twenty cents. We double-checked the config, the prompt, the model ID; everything is identical to every other model on the leaderboard. Same seed, same tools, same simulation. It's just this good.

Strongly recommend trying it for your agentic workflows. We've tested 22 models so far and this is by far the best cost-to-performance ratio we've ever seen.

Full breakdown with charts and day-by-day analysis: [foodtruckbench.com/blog/gemma-4-31b](https://foodtruckbench.com/blog/gemma-4-31b)

*FoodTruck Bench is an AI business simulation benchmark: the agent runs a food truck for 30 days, making decisions about location, menu, pricing, staff, and inventory. Leaderboard at* [*foodtruckbench.com*](https://foodtruckbench.com)

open-source, Gemma 4, benchmark, agents, cost-performance