Hacker News, 17 days ago

A Rust RAR implementation written mostly by LLMs

IMP
7/10
Key summary

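A developer used OpenAI Codex and Claude Opus to build a Rust RAR compressor that supports every version of the format, in about five weeks. The format is closed and had no real specification. The project got past that gap through reverse engineering with LLM assistance, so it shows both the potential and the limits of coding agents.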

Original article (English)
🦀 rars in Rust, bro

I've done a few different reverse-engineering projects with LLMs, and figured it's time to push the clankers to their limits. A RAR compressor for every version of RAR ought to have taken about 5 years, which is why nobody has ever bothered. Today, it takes 5 weeks of evenings and weekends, clanking OpenAI Codex 5.5 and Claude Opus 4.7, and costs roughly £40 in (heavily subsidised) tokens. Yes, it's 55k lines of slop; no, it's not that fast; and it almost earned me an OpenAI ban. But it works.

SPECIF~1.RAR

RAR was originally an LZSS compressor for DOS, which peaked in popularity as the warez scene's format of choice. Fighting with WinZip for feature parity and supremacy, WinRAR boasted multi-volume support, recovery records and even an internal VM, but its USP was always superior compression. It's a middle-aged format that never stopped growing up; it's as big as a house.

unrar comes with source code, but that code is not actually free, and somewhat ironically RAR's author Eugene Roshal isn't a big fan of piracy. So ideally I'd need to implement my version from a spec, which doesn't really exist.

The monstrous task of creating one involved pulling code from free decompressor sources in the wild (unar, libarchive, UNRARLIB), plus random web pages and folklore. I then set Claude to work documenting as much as it could. After each pass, I quizzed it on missing features and maintained an ongoing gaps doc containing the hard-to-know stuff. This doc persisted between context resets, which were needed to flow the tokens into the gaps. It took 2 weeks of cooking, going back and forth until we had most of the reader side documented. The writer side, however, remained a mix of confabulation and conjecture.

So next I grabbed the RAR binaries for DOS and Windows and set to work making test fixtures, hex-dumping, and doing passes in Ghidra and DOSBox-x to get some idea of how they were packed. Another week or two of work and the gaps started to close up.

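Hex-dumping fixtures quickly shows that RAR generations identify themselves with different magic bytes at the start of the file. The sketch below is not taken from rars. It assumes only the widely documented markers: "RE~^" for RAR 1.3/1.4, "Rar!\x1A\x07\x00" for RAR 1.5 to 4.x, and "Rar!\x1A\x07\x01\x00" for RAR 5. The names `RarFormat` and `detect_format` are hypothetical. The sketch only checks offset 0, so self-extracting archives, where the marker follows an executable stub, are out of scope.

```rust
/// Hypothetical enum: archive generations distinguishable by leading magic bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RarFormat {
    /// RAR 1.3/1.4 archives start with "RE~^".
    Rar14,
    /// RAR 1.5 through 4.x use the 7-byte marker "Rar!\x1A\x07\x00".
    Rar15,
    /// RAR 5.0+ uses the 8-byte marker "Rar!\x1A\x07\x01\x00".
    Rar50,
}

const SIG_14: &[u8] = b"RE~^";
const SIG_15: &[u8] = b"Rar!\x1A\x07\x00";
const SIG_50: &[u8] = b"Rar!\x1A\x07\x01\x00";

/// Hypothetical helper: identify the format from the first bytes of a file.
/// Only offset 0 is checked; SFX archives would need a scan for the marker.
fn detect_format(data: &[u8]) -> Option<RarFormat> {
    // SIG_15 and SIG_50 share the prefix "Rar!\x1A\x07"; their 7th byte
    // (0x00 vs 0x01) is what tells them apart.
    if data.starts_with(SIG_50) {
        Some(RarFormat::Rar50)
    } else if data.starts_with(SIG_15) {
        Some(RarFormat::Rar15)
    } else if data.starts_with(SIG_14) {
        Some(RarFormat::Rar14)
    } else {
        None
    }
}
```

A dispatcher like this is one plausible way to route each archive to a version-specific reader once the per-version specs exist. Truncated input simply fails to match and returns `None`.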
Now I had something that might be useful: spec docs for every version of the RAR file format. 📚 spec

Building something

Being confidently wrong enough to start, Codex, Claude and I set off building a (precariously) compatible Rust CLI. The workflow was shaped something like this:

Working from spec

Opus is great, but it tends to enthusiastically generate code while missing the bigger picture. Claude requires remedial passes, refactoring and a short leash, but is great for a chat about strategy or architecture.

Gippity 5.5 stays on target when left alone, but will rabbit hole you hard if you chat with it too much. I could give Codex the spec docs and basically tell it to just get on with it. Very refreshing.

While working from spec, Codex would randomly stop due to cyber violations, and I'd need to manually compact to continue. Eventually I had to get verified by OpenAI to stop it from happening. It turned out that at some point during spec investigation, Claude needed to understand authenticity verification, which is a paid feature. With a context full of reverse-engineering tools, it cracked WinRAR and bypassed product registration, then dutifully documented its crimes in the spec. The docs, when viewed, triggered OpenAI's alarms and stopped Codex dead in its tracks. I squashed this out of the git history and decided not to implement the feature at all.

One foot on the brake

You've gotta keep an eye on the bots and interrupt when things start to smell bad. If you don't, they'll special-case their way out of every problem and around every test, ugly patterns will propagate through your code, and you'll need an expensive refactor later. I should have done more, but I didn't, and I paid the price later on. The tokens were subsidised, but it was a waste of my time. For the last 15 months or so my hobby has been shouting at Claude, so I'm getting good at interventions. I enjoy it, even if it has damaged my personality.

I tend to swear at Codex far less, maybe because it's faster or less of a grinning idiot, but probably because it's bland and professional. This may be a good thing, but I'm not sure yet.

Tests, for science

Way too many tests. Fragile tests, coverage that doesn't matter, excessively_long_test_names_that_fill_your_screen: these are vital when working on something this size. They provide a statistical mass that warps text generation, pulling the bots back on track when they go off-piste or try to cut corners. So, reams of unit tests and as much coverage as possible, please. We can always remove them later, right? Right?…

So the tests keep it in shape, but actually running the code is what aligns it with reality. The real work is about fixtures, oracles, and updating the spec where it's wrong. In doing this, Codex cleared up autofill bullshit ("hallucinations") that had previously passed at least ten rounds of review. So it turns out that empirically grinding against reality is the best source of signal, and with enough time the spec was honed into something close to Truth. Science.

Cross-cutting context

Periodically, I had Claude generate a full review of the code. This helps nudge the codebase away from intolerable slop and towards tolerable slop, which is the best we can hope for in May 2026. The problem with review agents, though, is enthusiasm. They generate laser-focused nitpicks that quibble over things that don't matter, so you need a filter. My filter: I .gitignore a review.md and instruct Codex to group reviews into batches. I then add them to a plan.md by functional area, and the plan drives development tasks. Claude being aware of previous reviews invites compounding blind spots, and telling Codex which things I don't care about pollutes the context, as all text does. Selective context management, switching between sessions, rm'ing the review doc, and running on different machines provide variety that helps the work flow into all the gaps.

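The following makes the "fixtures and oracles" idea from "Tests, for science" concrete. It is a minimal round-trip check built on a toy LZSS coder, the technique RAR started from. It is an illustrative sketch, not RAR's bitstream or the rars code. `Token`, `compress`, `decompress`, `round_trip_ok` and the window/match constants are all hypothetical. In the real project the oracle would be an external reference, such as the original RAR/WinRAR binaries or an independent unpacker. Here a separate decoder plays that role. That catches encoder bugs, but not spec misreadings that both halves share.

```rust
/// Toy LZSS token: a literal byte or a back-reference (hypothetical, not RAR's format).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Literal(u8),
    /// Copy `len` bytes starting `dist` bytes back from the current output position.
    Match { dist: usize, len: usize },
}

const WINDOW: usize = 4096; // hypothetical sliding-window size
const MIN_MATCH: usize = 3; // LZSS emits a match only when it is long enough to pay off
const MAX_MATCH: usize = 18; // hypothetical cap on match length

/// Greedy brute-force encoder: take the longest match in the window at each position.
fn compress(input: &[u8]) -> Vec<Token> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let start = pos.saturating_sub(WINDOW);
        let mut best = (0, 0); // (dist, len)
        for cand in start..pos {
            let mut len = 0;
            // Matches may run past `pos` (overlap), which encodes runs cheaply.
            while len < MAX_MATCH
                && pos + len < input.len()
                && input[cand + len] == input[pos + len]
            {
                len += 1;
            }
            if len > best.1 {
                best = (pos - cand, len);
            }
        }
        if best.1 >= MIN_MATCH {
            out.push(Token::Match { dist: best.0, len: best.1 });
            pos += best.1;
        } else {
            out.push(Token::Literal(input[pos]));
            pos += 1;
        }
    }
    out
}

/// Independent decoder used as the oracle for the encoder.
fn decompress(tokens: &[Token]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new();
    for t in tokens {
        match *t {
            Token::Literal(b) => out.push(b),
            Token::Match { dist, len } => {
                let start = out.len() - dist;
                // Byte-by-byte copy so overlapping matches replicate correctly.
                for i in 0..len {
                    let b = out[start + i];
                    out.push(b);
                }
            }
        }
    }
    out
}

/// A fixture passes if decoding the encoder's output reproduces the input exactly.
fn round_trip_ok(data: &[u8]) -> bool {
    decompress(&compress(data)).as_slice() == data
}
```

The match search is deliberately naive, scanning the whole window at every position. That keeps the oracle easy to trust at the cost of speed. Each fixture file becomes one `round_trip_ok` assertion.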
After a while I had Claude act as a UAT tester, orchestrate compatibility test suites and test against archives in the wild too, producing fresh review.md files that fed into the plan by the usual route. It was a serial process, but not too much of a bottleneck.

A first release

By the time I reached RAR 2.9 I was getting a bit bored of reading machine-generated spew, so I switched to some other projects for a bit. Having a working CLI for versions 1.3 and 1.4 of RAR, which very few tools can even open, I regenerated it into a new dir and pushed it up to crates.io. I figured it'd be useful for archivists, even if I gave up. So here it is: 🦀 oldrar 🐱 source 🏠 home

Scoring a /goal

I picked it back up late last week when OpenAI released the /goal feature. This is essentially a Ralph loop that lets the bot grind on at a task indefinitely, compacting and picking up again after filling its meagre context limit. Running only one session means it doesn't even hit the 5-hour usage limits. So it ran several times for 6+ hours while transcribing the rest of the spec into code, and once for a solid 16 hours before I interrupted it and demanded a refactor. It smashed through the bulk of the work this way, flood-filling around 40,000 lines: recovery records, encryption, multi-volume support and tons of spec work that I'm still barely aware of.

While Codex worked, Claude and I found more RAR files and set up a compression benchmark and a compatibility regression test suite. Giving Codex the task of optimizing compression worked surprisingly well. It applied well-known techniques from other compressors to bring its LZSS output to within around 5-10% of WinRAR's size, and it beat RAR on some of our test data. WinRAR was optimized for decades by a skilled and obsessive Russian hacker; I used the median of the distribution with brute force and ignorance. Coming so close feels like a huge win given the effort involved. And since I don't want to read the code too much, it'll have to do.

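A compression benchmark like the one described above comes down to comparing compressed sizes against a reference archiver on the same inputs. This is a minimal sketch, not the project's actual harness. `Sample`, `overhead_pct`, `corpus_overhead` and `report` are hypothetical names, and the sizes would come from running both tools over the same corpus. A positive percentage means our output is larger than the reference; a negative one means we beat it.

```rust
/// Hypothetical benchmark row: compressed sizes for one input file.
struct Sample {
    name: &'static str,
    /// Bytes produced by the candidate compressor.
    ours: u64,
    /// Bytes produced by the reference archiver (e.g. WinRAR) on the same input.
    reference: u64,
}

/// Percent by which `ours` exceeds `reference`; negative means smaller output.
fn overhead_pct(ours: u64, reference: u64) -> f64 {
    assert!(reference > 0, "reference size must be non-zero");
    (ours as f64 - reference as f64) / reference as f64 * 100.0
}

/// Corpus-level overhead from total bytes, so large files dominate as they do on disk.
fn corpus_overhead(samples: &[Sample]) -> f64 {
    let ours: u64 = samples.iter().map(|s| s.ours).sum();
    let reference: u64 = samples.iter().map(|s| s.reference).sum();
    overhead_pct(ours, reference)
}

/// One human-readable line per file with an explicit sign, e.g. "name: +7.0%".
fn report(samples: &[Sample]) -> Vec<String> {
    samples
        .iter()
        .map(|s| format!("{}: {:+.1}%", s.name, overhead_pct(s.ours, s.reference)))
        .collect()
}
```

Summing bytes before dividing is one reasonable choice, because it tracks real on-disk savings. Averaging per-file percentages instead would give tiny files equal weight. The per-file `report` lines make it easy to spot the inputs where the candidate wins or loses.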
Performance was a different story. Codex is happy u