r/LocalLLaMA • 110 days ago

Gemma 4 MTP Reverse-Engineering Analysis

IMP

7/10

Key Summary

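This post shares how to extract the Multi-Token Prediction (MTP) draft model from Gemma 4 using Google's lightweight on-device library, LiteRT-LM. The author notes that working out the MTP architecture requires analyzing the C++ source code and the TFLite graph, and asks the community for help reverse engineering it.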

Post body

Gemma 4 E4B MTP Extraction Effort

This model was extracted with the litertlm_peek_main CLI from https://github.com/google-ai-edge/LiteRT-LM.

How to replicate: clone the Git repository, enter the directory, and fetch the tags.

    git clone https://github.com/google-ai-edge/LiteRT-LM.git
    cd LiteRT-LM/
    git fetch --tags

Install gcc and the other compilers:

    sudo apt update && sudo apt install -y clang build-essential

Note: Bazel must also be installed (https://bazel.build/install).
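Because the build depends on several external tools, a quick pre-flight check can save a failed Bazel run. This is a minimal sketch that only checks whether each executable is on PATH; the tool list is an assumption drawn from the steps above (for example, a Bazelisk-only install may not expose a `bazel` binary), and versions are not checked.

```python
import shutil

# Executables the replication steps rely on (assumed list): git for cloning,
# clang and gcc from the apt packages clang/build-essential, and Bazel.
REQUIRED_TOOLS = ("git", "clang", "gcc", "bazel")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build prerequisites:", ", ".join(missing))
    else:
        print("All build prerequisites found on PATH.")
```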

Run the extractor CLI:

    bazel run //schema/py:litertlm_peek_main -- \
      --litertlm_file=/path/to/gemma-4-E4B-it.litertlm \
      --dump_files_dir=/path/to/extracted_gemma4
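Once the extractor has finished, the MTP drafter still has to be picked out of the dump directory. A minimal sketch, assuming the dumped sections are written as `.tflite` files somewhere under `--dump_files_dir` and that the drafter's filename contains `mtp_drafter` (as `Section11_TFLiteModel_tf_lite_mtp_drafter.tflite` does); the helper names are hypothetical.

```python
from pathlib import Path


def find_tflite_sections(dump_dir):
    """List every .tflite file under dump_dir (recursively), sorted by path."""
    root = Path(dump_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.tflite"))


def find_mtp_drafter(dump_dir):
    """Return the first dumped .tflite whose name mentions the MTP drafter, or None."""
    for path in find_tflite_sections(dump_dir):
        if "mtp_drafter" in path.name:
            return path
    return None
```

For example, `find_mtp_drafter("/path/to/extracted_gemma4")` returns a `Path` to the drafter file, or `None` if no matching section was dumped.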

Clues: I'd appreciate it if someone smarter than me, with more C++ knowledge, could reverse engineer these files so we can figure out how MTP runs:

https://github.com/google-ai-edge/LiteRT-LM/blob/cdb7e4bc31bf01b000eba5d2599337ada5e4945c/runtime/executor/llm_litert_mtp_drafter.h
https://github.com/google-ai-edge/LiteRT-LM/blob/cdb7e4bc31bf01b000eba5d2599337ada5e4945c/runtime/executor/llm_litert_mtp_drafter.cc

It appears to be called here for end-to-end (e2e) drafting: https://github.com/google-ai-edge/LiteRT-LM/blob/cdb7e4bc31bf01b000eba5d2599337ada5e4945c/runtime/executor/llm_litert_compiled_model_executor.cc
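An MTP drafter is normally used for speculative decoding: the small draft model proposes several tokens and the full model verifies them in a single pass. Whether LiteRT-LM's executor follows exactly this greedy scheme is not confirmed here; the sketch below is the generic greedy draft-and-verify step, with `draft_next` and `target_greedy` as hypothetical stand-ins for the drafter and the main model.

```python
from typing import Callable, List, Sequence

Token = int


def speculative_step(
    context: Sequence[Token],
    draft_next: Callable[[Sequence[Token]], Token],
    target_greedy: Callable[[Sequence[Token], Sequence[Token]], List[Token]],
    num_draft: int,
) -> List[Token]:
    """One greedy draft-then-verify step; returns the tokens to append.

    draft_next(ctx): the drafter's next token for ctx (hypothetical interface).
    target_greedy(ctx, drafts): the target model's greedy token at each of the
    len(drafts) + 1 positions after ctx, from one forward pass (hypothetical).
    """
    # 1. Draft: run the small model autoregressively for num_draft tokens.
    drafts: List[Token] = []
    for _ in range(num_draft):
        drafts.append(draft_next(list(context) + drafts))

    # 2. Verify: a single target pass predicts every drafted position at once.
    target = target_greedy(list(context), drafts)
    if len(target) != len(drafts) + 1:
        raise ValueError("target_greedy must return len(drafts) + 1 tokens")

    # 3. Keep the longest prefix where draft and target agree, then append the
    #    target's own token: a correction, or a bonus token if all drafts match.
    accepted: List[Token] = []
    for drafted, verified in zip(drafts, target):
        if drafted != verified:
            break
        accepted.append(drafted)
    accepted.append(target[len(accepted)])
    return accepted
```

With a toy target that always predicts `last + 1` and a drafter that saturates at 5, drafting 4 tokens after `[1, 2, 3]` accepts `[4, 5]` and appends the correction `6`. Each step therefore emits at least one token, so a poor drafter costs speed but never changes the greedy output.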

There is also a file that tests it: https://github.com/google-ai-edge/LiteRT-LM/blob/cdb7e4bc31bf01b000eba5d2599337ada5e4945c/runtime/executor/llm_litert_compiled_model_executor_test.cc#L435

Another very good approach is to use the Google AI Edge Model Explorer: https://github.com/google-ai-edge/model-explorer

In Model Explorer you can visualize Section11_TFLiteModel_tf_lite_mtp_drafter.tflite, which renders as a huge graph. I have extracted the graph as JSON. Maybe someone could reverse engineer it with GPT, Claude, or a similar model and output a clean PyTorch file?
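Before handing the JSON to an LLM, an op-type histogram gives a quick overview of what the drafter graph contains. This sketch assumes the exported JSON follows Model Explorer's graph format, where each graph holds a `nodes` list and each node's `label` is the op name; because that layout is an assumption, the walker collects any `nodes` list at any depth. The function names are hypothetical.

```python
import json
from collections import Counter


def iter_nodes(obj):
    """Yield every dict found inside a "nodes" list, at any nesting depth."""
    if isinstance(obj, dict):
        nodes = obj.get("nodes")
        if isinstance(nodes, list):
            for node in nodes:
                if isinstance(node, dict):
                    yield node
        for value in obj.values():
            yield from iter_nodes(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_nodes(item)


def count_ops(data):
    """Count node labels (assumed to be op names) in already-parsed graph JSON."""
    return Counter(node.get("label", "<no label>") for node in iter_nodes(data))


def load_op_histogram(graph_json_path):
    """Read an exported graph JSON file and return its op-label histogram."""
    with open(graph_json_path, encoding="utf-8") as f:
        return count_ops(json.load(f))
```

`load_op_histogram("graph.json").most_common(20)` lists the twenty most frequent ops, which is a smaller and more reliable starting point for a PyTorch rewrite than pasting the raw graph.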

Baseline of what ChatGPT Pro 5.4 Extended thinking produced out of the box: https://chatgpt.com/share/69d8d08a-c458-838f-9b6d-e72d2956dede

Tags: gemma, mtp, reverse-engineering, open-source, model-analysis