r/LocalLLaMA • 72일 전

4B 소형 모델로 벤치마크 87% 달성한 코딩 에이전트 제작기

IMP

7/10

핵심 요약

GPT나 Claude 같은 대형 모델이 아닌, 로컬에서 구동되는 4B(40억) 파라미터 소형 모델에 최적화된 새로운 코딩 에이전트 'SmallCode'가 등장했습니다. 반복적인 코드 개선 루프, 복합 도구(Compound tools) 사용, 토큰 예산 관리 등의 소프트웨어적 기법을 활용해 모델 크기의 한계를 극복하고 높은 작업 성공률을 달성한 것이 핵심입니다. 실무 개발자들은 오프라인 환경이나 보안이 중요한 환경에서 가벼운 오픈소스 도구를 통해 효율적으로 AI 코딩 보조를 받을 수 있다는 점에 주목할 만합니다.

번역된 본문

최근 출시되는 모든 코딩 에이전트(OpenCode, Cursor, Claude Code 등)는 사용자가 GPT-5.4나 Claude Opus 같은 최고 사양 모델을 구동할 것이라고 전제하는 것 같아서 불만이 많았습니다. Gemma나 Qwen 같은 로컬 모델로 이들을 시도해보면 시스템이 제대로 작동하지 않습니다. 잦은 도구 호출 오류, 컨텍스트 초과(Context overflow), 다단계 작업의 붕괴 등의 문제가 발생하곤 했습니다.

그래서 저는 소형 로컬 모델의 사용을 처음부터 염두에 두고 설계된 'SmallCode'를 개발했습니다.

결과: 토큰당 4B(40억) 파라미터만 활성화하는 Gemma 4 모델을 사용하여 100개의 벤치마크 작업 중 87개를 통과했습니다. 14B 모델을 사용하는 OpenCode의 점수가 약 75%인 것과 비교하면, 모델의 크기보다는 에이전트를 통제하는 프레임워크(harness)가 핵심 역할을 한 것입니다.

작동 방식 (소형 모델을 안정적으로 만드는 기법들):

복합 도구 (Compound tools): 모델이 4번의 도구 호출(파일 찾기 → 파일 읽기 → 파일 수정 → 검증)을 연속으로 수행하게 하는 대신, SmallCode는 이 모든 과정을 한 번에 처리하는 단일 도구를 제공합니다. 소형 모델은 3번 이상의 순차적 호출 시 논리적 일관성을 잃기 쉬운데, 이 방식을 통해 오류 발생률을 절반으로 줄였습니다.
개선 루프 (Improvement loop): 모델이 코드를 작성할 때마다 SmallCode가 즉시 컴파일 및 린트(Lint) 검사를 수행합니다. 오류가 발생하면 해당 내용을 자동으로 모델에 다시 피드백합니다. 모델이 처음부터 완벽한 코드를 짤 필요 없이, 제시된 오류만 수정할 수 있으면 됩니다.
실패 시 작업 분해 (Decompose on failure): 모델이 동일한 작업에서 두 번 연속 실패하면, 무작정 재시도하는 대신 문제를 더 작은 단위로 나눕니다. "이 200줄짜리 파일 수정하기"가 "45번째 줄만 수정하기"로 세분화되는 식입니다.
에스컬레이션 (Escalation): 작업 분해에도 실패하고 Claude나 OpenAI의 API 키가 설정되어 있다면, 해당 특정 작업에 한해서만 자동으로 대형 클라우드 모델에 처리를 넘깁니다. 작업의 약 95%는 로컬에서, 나머지 5%만 클라우드에서 처리됩니다.
토큰 예산 관리 (Token budgeting): 소형 모델은 보통 32k~256k 크기의 컨텍스트를 가집니다. SmallCode는 파일 전체를 텍스트로 전부 밀어 넣지 않습니다. 내용을 요약, 잘라내기 하여 토큰을 관리하므로, 모델이 중요한 코드의 중간 부분에서 '...' 와 같은 잘림 현상을 겪지 않습니다.
코드 그래프 (Code graph): 단순히 grep으로 코드베이스를 검색하는 대신, 코드를 심볼 그래프(함수, 클래스, 호출 관계 등)로 인덱싱합니다. "인증은 어떻게 작동하나요?"라고 질문하면, 무작위 파일 조각 15개가 아닌 그래프를 따라 관련성 있는 코드만 정확히 반환합니다.

인터페이스 구성: OpenCode나 vim과 같은 전체 화면 터미널 UI, 스크롤 가능한 채팅 창, 단축키 /를 이용한 명령 팔레트, 플러그인 시스템, 세션 간의 지속적인 메모리 기능을 지원합니다.

미지원 기능:

(아직은) LSP(Language Server Protocol) 통합 미지원
(아직은) 멀티 세션 미지원
데스크톱 애플리케이션 미지원
최고 사양 프론티어 모델을 사용하는 Claude Code 등과의 직접적인 경쟁은 목표로 하지 않음

설치 방법:

npm install -g smallcode
cd your-project
smallcode

이후 LM Studio, Ollama 또는 OpenAI와 호환되는 API 엔드포인트를 가리키도록 설정하면 됩니다.

MIT 라이선스가 적용되었으며, 모든 소스코드는 GitHub에서 확인할 수 있습니다: https://github.com/Doorman11991/smallcode

아키텍처나 벤치마크 방법론에 대해 질문이 있으시다면 기꺼이 답변해 드리겠습니다.

원문 보기

원문 보기 (영어)

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse. So I built SmallCode. It's designed from the ground up for small local models. **The result:** 87/100 benchmark tasks pass with a Gemma 4 model that only activates 4B parameters per token. OpenCode scores \~75% with 14B models. The harness does the heavy lifting, not the model size. **How it works (the tricks that make small models reliable):** * **Compound tools:** Instead of making the model chain 4 tool calls (find file → read file → edit file → verify), SmallCode gives it one tool that does all 4. Small models lose coherence after 3+ sequential calls. This cuts failures in half. * **Improvement loop:** Every time the model writes code, SmallCode instantly compiles/lints it. If it fails, it feeds the errors back automatically. The model doesn't need to be smart enough to get it right first try — it just needs to fix errors when shown them. * **Decompose on failure:** If the model fails the same thing twice, SmallCode stops retrying and instead breaks the problem into smaller pieces. "Fix this 200-line file" becomes "fix line 45 only." * **Escalation:** If even decompose fails and you have a Claude/OpenAI key configured, it auto-escalates to the bigger model for just that one task. You stay local 95% of the time, cloud 5%. * **Token budgeting:** Small models have 32k-256k context. SmallCode never dumps a whole file in. It summarizes, truncates, and manages every token so the model never sees "..." truncation in the middle of important code. * **Code graph:** Instead of grep-searching your codebase, SmallCode indexes your code into a symbol graph (functions, classes, who-calls-what). When you ask "how does auth work," it walks the graph and returns just the relevant connected code — not 15 random file snippets. **What it looks like:** Full-screen terminal UI (like OpenCode/vim), scrollable chat, command palette with `/`, plugin system, persistent memory across sessions. **What it doesn't do:** * No LSP integration (yet) * No multi-session (yet) * No desktop app * Doesn't compete with Claude Code for frontier model users **Install:** npm install -g smallcode cd your-project smallcode Point it at LM Studio, Ollama, or any OpenAI-compatible endpoint. MIT licensed, everything's on GitHub: [https://github.com/Doorman11991/smallcode](https://github.com/Doorman11991/smallcode) Happy to answer questions about the architecture or benchmark methodology.

로컬 AI 코딩 에이전트 오픈소스 소형 언어 모델