r/LocalLLaMA • 100 days ago

Gemma 4 26B-A4B GGUF Benchmark Analysis

Key Summary

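Unsloth published benchmarks comparing GGUF quantizations of Gemma 4 26B-A4B and Qwen3.6 across quant sizes. Using KL Divergence to measure how well each quant preserves the original model's output distribution, the Gemma 4 comparison across providers found Unsloth's GGUFs best in 21 of 22 sizes. Unsloth also made its existing Q6_K and MLX 4-bit quants slightly more accurate and added a new UD-IQ4_NL_XL quant sized to fit in 16GB of VRAM.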

Translated Post

Hi r/LocalLLaMA, to help you pick the best quant, we ran KL Divergence (KLD) benchmarks on Gemma 4 26B-A4B GGUFs from several providers.

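On mean KL Divergence, nearly all Unsloth GGUFs sit on the Pareto frontier.
KLD measures how closely a quantized model's output distribution matches that of the original BF16 model. Lower is better, and a low KLD indicates that accuracy has been retained.
By this measure, Unsloth's quants perform best in 21 of 22 sizes. The 99.9% KLD and other metrics show a similar trend.
We also updated our Q6_K quants to be more dynamic. They were already optimized; now they are slightly better. There is no need to re-download, but grab them if you want a slightly better version. The previous quants are perfectly fine, and the new ones are slightly larger. The same update was applied to Qwen3.6.
We're also introducing a new UD-IQ4_NL_XL quant that fits in 16GB of VRAM. At 14.6GB, UD-IQ4_NL_XL sits between UD-IQ4_XS (13.4GB) and UD-Q4_K_S (16.4GB). The same was done for Qwen3.6.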
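KLD, as described above, is computed per token position and then summarized. The sketch below shows one way to get a mean KLD and a 99.9% KLD, assuming you already have raw logits from the BF16 model and the quantized model for the same token positions, and assuming "99.9% KLD" means the 99.9th percentile of the per-token values. The function names are hypothetical and this is not Unsloth's benchmarking code; it uses only the Python standard library.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kld(ref_logits: Sequence[float], quant_logits: Sequence[float]) -> float:
    """KL(P_ref || P_quant) at one token position, in nats."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    # sum_i p_i * (log p_i - log q_i)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def percentile(values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile (same as NumPy's default method)."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def kld_stats(
    ref: Sequence[Sequence[float]], quant: Sequence[Sequence[float]]
) -> dict[str, float]:
    """Mean and 99.9th-percentile KLD over all token positions."""
    if len(ref) != len(quant) or not ref:
        raise ValueError("need the same, non-zero number of token positions")
    per_token = [token_kld(r, q) for r, q in zip(ref, quant)]
    return {
        "mean_kld": sum(per_token) / len(per_token),
        "p99_9_kld": percentile(per_token, 99.9),
    }


if __name__ == "__main__":
    # Toy logits over a 3-token vocabulary at 2 positions (illustrative only).
    bf16 = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
    quant = [[1.9, 1.1, 0.1], [0.5, 0.6, 2.8]]
    print(kld_stats(bf16, quant))
```

Reporting the 99.9th percentile alongside the mean is useful because a quant can have a low average KLD while still diverging sharply on a small fraction of tokens; the tail statistic exposes those outliers.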

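Reddit mobile compresses the images, so see the high-quality (HQ) versions of the graphs here: Gemma 4 Benchmarks (https://unsloth.ai/docs/models/gemma-4#unsloth-gguf-benchmarks) and Qwen3.6 Benchmarks (https://unsloth.ai/docs/models/qwen3.6#unsloth-gguf-benchmarks)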
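The claim that nearly all Unsloth GGUFs sit on the Pareto frontier, and the sizing of UD-IQ4_NL_XL for 16GB cards, both come down to trading file size against KLD. The sketch below assumes the two objectives are GGUF file size and mean KLD (lower is better for both): a quant is on the frontier if no other quant is at least as small and at least as accurate while being strictly better on one of the two. The VRAM check is a hypothetical rule of thumb that adds a fixed headroom (1 GB by default, an assumption) to the file size; real memory use depends on context length, KV cache, and the runtime. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Quant:
    name: str
    size_gb: float   # GGUF file size
    mean_kld: float  # lower is better


def dominates(a: Quant, b: Quant) -> bool:
    """a dominates b if it is no larger and no worse on KLD, and strictly better on one."""
    no_worse = a.size_gb <= b.size_gb and a.mean_kld <= b.mean_kld
    strictly_better = a.size_gb < b.size_gb or a.mean_kld < b.mean_kld
    return no_worse and strictly_better


def pareto_frontier(quants: Sequence[Quant]) -> list[Quant]:
    """Quants not dominated by any other quant, sorted by size."""
    frontier = [q for q in quants if not any(dominates(o, q) for o in quants)]
    return sorted(frontier, key=lambda q: q.size_gb)


def fits_in_vram(size_gb: float, vram_gb: float, headroom_gb: float = 1.0) -> bool:
    """Hypothetical rule of thumb: weights plus a fixed headroom must fit in VRAM."""
    return size_gb + headroom_gb <= vram_gb


def best_under_budget(
    quants: Sequence[Quant], vram_gb: float, headroom_gb: float = 1.0
) -> Optional[Quant]:
    """Lowest-KLD quant that fits the budget, or None if nothing fits."""
    candidates = [q for q in quants if fits_in_vram(q.size_gb, vram_gb, headroom_gb)]
    return min(candidates, key=lambda q: q.mean_kld, default=None)


if __name__ == "__main__":
    # File sizes from the post. Only the size check is shown, because the
    # per-quant KLD values live in the linked graphs, not in this text.
    for name, size in [("UD-IQ4_XS", 13.4), ("UD-IQ4_NL_XL", 14.6), ("UD-Q4_K_S", 16.4)]:
        print(name, size, "GB fits 16 GB:", fits_in_vram(size, 16.0))
```

Under this assumed 1 GB headroom, UD-IQ4_XS and UD-IQ4_NL_XL pass the 16 GB check and UD-Q4_K_S does not, which is consistent with the post's positioning of UD-IQ4_NL_XL as the largest of the three that fits.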

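We also updated our MLX quants to be more dynamic, with better layer selection (MLX itself imposes some limitations). For details, see: https://unsloth.ai/docs/models/qwen3.6#mlx-dynamic-quants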

MLX Metric	UD-4bit (Old)	UD-4bit (New)	MLX 4.4bit MSQ
Perplexity	4.772	4.766	4.864
Mean KLD	0.0177	0.0163	0.0878
99.9% KLD	0.8901	0.8398	2.9597
Disk Size	21.4 GB	21.6 GB	21.2 GB
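To read the MLX table in relative terms, the short script below computes, for each metric, the change from the old to the new UD-4bit quant and the ratio of MLX 4.4bit MSQ to the new UD-4bit quant. It uses only the numbers in the table; the names `MLX_TABLE`, `pct_change`, and `summarize` are illustrative. For perplexity and both KLD rows, lower is better.

```python
# Values from the MLX table: (UD-4bit old, UD-4bit new, MLX 4.4bit MSQ).
MLX_TABLE = {
    "Perplexity": (4.772, 4.766, 4.864),
    "Mean KLD": (0.0177, 0.0163, 0.0878),
    "99.9% KLD": (0.8901, 0.8398, 2.9597),
    "Disk size (GB)": (21.4, 21.6, 21.2),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100.0


def summarize(
    table: dict[str, tuple[float, float, float]]
) -> dict[str, tuple[float, float]]:
    """For each metric: (% change old->new UD-4bit, MSQ value / new UD-4bit value)."""
    return {
        metric: (pct_change(old, new), msq / new)
        for metric, (old, new, msq) in table.items()
    }


if __name__ == "__main__":
    for metric, (change, ratio) in summarize(MLX_TABLE).items():
        print(f"{metric}: UD-4bit old->new {change:+.2f}%, MSQ/new {ratio:.2f}x")
```

On these numbers, the new UD-4bit quant lowers mean KLD by about 7.9% and 99.9% KLD by about 5.7% relative to the old one, at about 0.9% more disk space, while MLX 4.4bit MSQ's mean KLD is roughly 5.4 times that of the new UD-4bit quant.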

Gemma 4 GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF

Qwen3.6 GGUF: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF

Original Post (English)

Hey r/LocalLLaMA, we conducted KL Divergence benchmarks for Gemma 4 26B-A4B GGUFs across providers to help you pick the best quant.

* Mean KL Divergence puts nearly all **Unsloth GGUFs on the Pareto frontier**
* KLD shows how well a quantized model matches the original BF16 output distribution, indicating retained accuracy.
* This makes Unsloth the **top performer in 21 of 22 sizes.** Similar trend for 99.9% KLD and others.
* We also updated our Q6\_K quants to be more dynamic. They were already optimized; now they're a bit better - no need to re-download though - it's up to you if you want a slightly better version. The previous quant was perfectly fine but this one is slightly bigger. The same was done for Qwen3.6.
* We're also introducing a new UD-IQ4\_NL\_XL quant that fits in 16GB VRAM. UD-IQ4\_NL\_XL (14.6GB) sits between UD-IQ4\_XS (13.4GB) and UD-Q4\_K\_S (16.4GB). The same was done for Qwen3.6.

Reddit mobile compresses the graphs, so for HQ versions see: [Gemma 4 Benchmarks](https://unsloth.ai/docs/models/gemma-4#unsloth-gguf-benchmarks) and [Qwen3.6 Benchmarks](https://unsloth.ai/docs/models/qwen3.6#unsloth-gguf-benchmarks)

We also updated our MLX quants to be more dynamic with better layer selection (there are limitations due to MLX): [See here](https://unsloth.ai/docs/models/qwen3.6#mlx-dynamic-quants)

|MLX Metrics|**UD-4bit (Old)**|**UD-4bit (New)**|**MLX 4.4bit MSQ**|
|:-|:-|:-|:-|
|Perplexity|4.772|**4.766**|4.864|
|Mean KLD|0.0177|**0.0163**|0.0878|
|99.9% KLD|0.8901|**0.8398**|2.9597|
|Disk Size|21.4 GB|21.6 GB|21.2 GB|

Gemma 4 GGUFs: [https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF](https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF)

Qwen3.6 GGUFs: [https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF](https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF)

Local-LLM Quantization Gemma-4 Qwen3.6 Benchmark