Hacker News • 69일 전

AI 코딩 루프를 위한 정형 검증 게이트

IMP

7/10

핵심 요약

AI가 코드를 대량 생성하는 시대에는 프롬프트로 모델의 행동을 통제하는 '행동적 게이트'만으로는 치명적인 보안 버그를 막기 어렵습니다. 이 글은 코드가 원하는 규칙을 위반하면 컴파일이나 테스트 자체가 실패하도록 구조를 짜는 '구조적 게이트(정형 검증)'를 제안하며, 이를 위해 Shen이라는 정적 타입 언어 기반의 'Shen-Backpressure' 도구를 소개합니다.

번역된 본문

가장 심각한 소프트웨어 버그 중 일부는 동시에 가장 지루한 것들이기도 합니다. 사용자는 다른 테넌트(고객)의 데이터를 읽을 수 없어야 합니다. 이에 반대하거나 앨리스가 밥의 기록을 읽는 것을 옹호하기 위해 디자인 리뷰에서 나서는 사람은 아무도 없습니다. 그럼에도 불구하고 깨진 접근 제어는 여전히 OWASP Top 10에서 1위를 차지하는 카테고리입니다. 이러한 버그가 배포되는 이유는 규칙이 시스템의 잘못된 부분에 배치되어 있기 때문입니다. 규칙이 프롬프트나 리뷰 체크리스트, 그리고 모든 미래의 엔지니어와 이제는 매번 실행될 모델调用(invocation)이 그 불변성(invariant)을 기억하고 올바르게 다시 적용할 것이라는 막연한 기대 속에 존재합니다. 이러한 가정은 이미 취약했으며, AI가 대부분의 코드를 생성하는 지금은 완전히 실패하고 맙니다. 눈에 보이는 모든 조치를 취할 수는 있습니다. 규칙을 CLAUDE.md에 넣고, 세심하게 시스템 프롬프트를 작성하고, 에이전트 지침에 "권한 부여는 매우 중요합니다"라고 추가하는 것이죠. 당신은 분명 그 모든 것을 해야 합니다. 하지만 모델이 16,000줄의 코드를 작성한 후에도 여전히 남는 진짜 질문이 있습니다. 코드가 원했던 대로 작동한다는 것을 어떻게 알 수 있을까요? 테스트는 도움이 되지만 경험적인 방법일 뿐입니다. 테스트는 당신과 모델이 작성해야 한다고 기억해낸 사례만 검사할 뿐, 누군가 다음 주에 추가할 핸들러에 대해서는 아무 말도 해주지 못합니다. 저는 다른 지렛대를 당기고 싶습니다. 제 주장을 명확히 말씀드리자면 이렇습니다. 광범위한 프로덕션 소프트웨어의 클래스에서 구조적 역압력(backpressure)은 에이전트 지능의 점진적인 향상을 압도할 것입니다. 기존 모델은 이미 여러분의 코드 중 거의 모든 것을 작성할 수 있습니다. 제한 요소는 모델이 여러분이 원하는 것을 정확히 해냈는지 '알 수 있는가'이며, 이러한 지식은 더 똑똑한 모델을 기다리는 것이 아니라 모델이 코드를 짜는 기반(substrate)에서 비롯됩니다. Shen-Backpressure는 이러한 주장을 탐구하기 위해 제가 구축한 도구이자 방법론입니다. 실행되는 데모를 통해 이 도구가 하는 일을 보여드린 다음, 동일한 루프를 여러분의 프로젝트에 어떻게 연결할 수 있는지 보여드리겠습니다.

행동적 게이트(Behavioral Gates)와 구조적 게이트(Structural Gates) 대부분의 프롬프트 수준 제약 조건은 행동적 게이트입니다. 우리는 모델에게 "권한 부여 건너뛰지 마", "입력값 검증해", "공유 헬퍼 사용해"라고 지시합니다. 모델은 쓸 만할 정도로 이 지시를 자주 따르지만, 동시에 전체 시스템을 불안정하게 만들 만큼 자주 실패하기도 합니다. 행동적 게이트는 모델이 규칙을 기억하고, 규칙이 적용되는 곳을 인식하며, 국소적 맥락의 중력을 이겨내는 데 의존합니다. 그리고 나서는 휴먼 리뷰어가 전체 코드베이스에 걸쳐 동일한 불변성을 유지해야만 합니다. 구조적 게이트는 다릅니다. 컴파일러, 타입 검사기, 테스트 러너, 린터(linter), 증명 검사기가 여기에 해당합니다. 이들은 각자 앞에 주어진 결과물에 대해 구체적인 답을 생성합니다. 그 답이 완벽하지는 않지만 분명히 실재하며, 자신의 범위 내에서 코드가 잘못되었을 때 단호하게 거부합니다. 이러한 '거부'가 바로 핵심입니다. 덕분에 우리는 작업을 모델의 지시 공간에서 모델이 구축 중인 기반으로 옮길 수 있습니다. 불변성을 기억해 달라고 모델에게 토큰을 소비하여 애원하는 대신, 우발적으로라도 불변성을 위반하기 어렵게 코드 구조를 배치하는 것입니다. 우리가 가장 중요하게 여기는 속성을 가져와서 기계가 검사할 수 있는 형태로 표현하고, 이를 구현에 투영(project)한 다음, 창발된 결과물이 이를 만족할 때까지 루프가 해당 검사에서 튕겨 나가도록(반복되도록) 만드는 것입니다. 바로 이 점이 Geoff Huntley의 Ralph와 '역압력을 낭비하지 마세요(Don't Waste Your Backpressure)'라는 에세이에서 사용하는 의미에서의 역압력을 강력하게 만드는 요소입니다. 이전 오류가 다음 반복으로 파이프될 때, 결정론적 게이트는 루프에 '분위기'보다 더 단단한 무언가를 밀어내는 역할을 합니다. 이러한 루프는 더 이상 틈새 아이디어가 아닙니다. Codex CLI는 이제 /goal을 탑재하고 출시됩니다. 이는 OpenAI만의 Ralph 루프 방식으로, 여러 턴에 걸쳐 목표를 유지하고 목표가 충족될 때까지 멈추기를 거부합니다.

기반(Substrate)의 전환 강제할 가치가 있는 불변성은 대개 정확하게 명시하기 쉽습니다. 사용자는 인증되었고, 테넌트의 멤버이며, 해당 리소스가 그 테넌트에 속한 경우에만 리소스에 접근할 수 있습니다. 이것은 완전하고 한정된 규칙입니다. 영어(자연어)는 이를 강제하기에 완전히 잘못된 매체일 뿐입니다. Shen-Backpressure는 기계가 기반(타겟 언어의 타입, 생성자, 게이트 등)에 투영할 수 있는 형태로 이러한 종류의 규칙을 작성하기 위해, 시퀀트 계산(sequent-calculus) 타입 시스템을 갖춘 작고 정적으로 타입이 지정된 리스프(Lisp)인 Shen을 사용합니다.

원문 보기

원문 보기 (영어)

Some of the most serious software bugs are also the most boring. A user should not be able to read another tenant’s data. Nobody disagrees with this, nobody stands up in a design review to defend Alice reading Bob’s records, and yet broken access control remains the #1 category on the OWASP Top 10 . These bugs ship because the rule has been placed in the wrong part of the system. It lives in a prompt, in a review checklist, in the shared expectation that every future engineer, and now every future model invocation, will remember the invariant and reapply it correctly. That assumption was already weak, and with AI generating most of the code, it fails outright. You can do all the obvious things: put rules in CLAUDE.md , write a careful system prompt, add “authorization IS VERY IMPORTANT” to the agent instructions, and you should do all of that. But after the model has written sixteen thousand lines, the real question remains: how do you know the code does what you wanted? Tests help, but tests are empirical. They check the cases you and the model remembered to write, and they cannot speak for the handler someone adds next week. I want to pull a different lever. My bet, stated plainly, is this: for a wide class of production software, structural backpressure beats incremental improvements in agent intelligence. Existing models can already write almost all of your code. The limiting factor is whether you can know that they did what you wanted, and that knowledge comes from the substrate they write against, not from waiting for a smarter model. Shen-Backpressure is the tool and methodology I built to explore that bet. I will show what it does through a running demo, and then show how to wire the same loop into your own project. Behavioral Gates And Structural Gates Most prompt-level constraints are behavioral gates . We tell the model “do not skip authorization,” “validate inputs,” “use the shared helper.” Models follow these instructions often enough to be useful and fail often enough to make the whole arrangement unstable. A behavioral gate depends on the model remembering the rule, recognizing where it applies, resisting the gravitational pull of local context, and then on a human reviewer maintaining the same invariant across the whole codebase. Structural gates are different. A compiler, a type checker, a test runner, a linter, a proof checker. Each produces a concrete answer about the artifact in front of it. The answer is not perfect, but it is real, and inside its scope it refuses when the code is wrong. That refusal is the point. It lets us move work out of the model’s instruction space and into the substrate the model is building on. Instead of spending tokens begging the model to remember an invariant, we arrange the code so the invariant is hard to violate by accident: take the property you care about most, express it in a form a machine can check, project it into the implementation, and let the loop bounce off that check until the emergent artifact satisfies it. This is what makes backpressure, in the sense Geoff Huntley’s Ralph and the essay Don’t Waste Your Backpressure use the term, powerful. When previous errors are piped into the next iteration, a deterministic gate gives the loop something firmer than vibes to push against. That loop is no longer a niche idea: Codex CLI now ships /goal , OpenAI’s own take on the Ralph loop, keeping a goal alive across turns and refusing to stop until it is met. The Substrate Move The invariants worth enforcing are usually easy to state precisely. A user may access a resource only if authenticated, a member of the tenant, and the resource belongs to that tenant. That is a complete, bounded rule. English is simply the wrong medium in which to enforce it. Shen-Backpressure uses Shen , a small, statically-typed Lisp with a sequent-calculus type system, to write that kind of rule in a form a machine can project into the substrate: the target-language types, constructors, and gate commands the model has to write against. You write the spec once; a code generator ( shengen ) lowers it into guard types in your target language. The model writing Go or TypeScript never needs to know Shen exists. It needs the code to compile and the gates to pass. A Proof Chain For Multi-Tenant Auth Here is the heart of the multi-tenant API demo , an excerpt from specs/core.shen : The horizontal line does the work. Premises above the line must be satisfied before the conclusion below it can be constructed. To get a resource-access you need a tenant-access and proof the resource is owned, and to get a tenant-access you need an authenticated principal and proof of membership. The full chain runs jwt-token → authenticated-user → tenant-access → resource-access . See the full spec for the intermediate rules. These types are witnesses. Constructing a value of one of them requires discharging the premises declared in its rule. From Spec To Guard Types shengen lowers each rule into a guard type in the target language. In Go, the fields are unexported, and the generated constructor is the only way to populate one: There is no exotic trick here, only ordinary Go visibility. Code outside the package cannot write TenantAccess{isMember: true} because the fields are lowercase. The constructor is the only path to a populated value, and it refuses isMember == false . Sum types like authenticated-principal get a sealed interface the same way. See the generated guards . Go is the example throughout this post, but the concepts and the tool are not Go-specific. Go and TypeScript are the production targets today, with reference emitters for Python and Rust. The target language is a real choice with real tradeoffs: the seal is only as strong as the encapsulation the language gives you, and agents are not language-neutral either . Smart constructors are old. Type wrappers are old. Codegen is old. The useful move is putting them in the loop as a single refusal surface, sourced from a spec shorter and more reviewable than the enforcement code it generates. Authorization Without The Hand-Written Check The normal way to write a multi-tenant handler is to put an if in every endpoint: That pattern is reasonable, and it is exactly the sort of reasonable thing that gets forgotten on the seventh handler or the third refactor. In the Shen-Backpressure version the membership check still exists. There is still a database query, but it is concentrated at the construction boundary for TenantAccess instead of scattered across handlers as a convention: The handler then operates on a value that represents the already-traversed chain . The proof travels with the value. In the running demo, Alice, a member of Acme, can list Acme’s resources and is refused Globex’s: If the agent tries to skip the chain and pass a raw value, the build fails before a binary exists: That short, mechanical “no” is the backpressure. I want more of those, and fewer paragraphs in the prompt. Try It The demo is built to be read end to end. Clone the repo and open examples/multi-tenant-api/ : it ships the spec, the generated guards, the Ralph loop that built it ( cmd/ralph/ ), and a curl transcript in demo.md . To wire it into your own project, install the sb CLI and run: sb init scaffolds a starter spec and the gate scripts, and with -config it also scaffolds the sb.toml manifest. Every iteration of the loop runs a fixed set of gates, declared in sb.toml : Gate Command Catches shengen sb gen Drift between the Shen spec and the generated guards test go test ./... Runtime invariant failures and ordinary regressions build go build ./... Type-signature mismatches, invalid proof-chain usage shen tc+ bin/shen-check.sh Internal inconsistency in the Shen spec tcb audit bin/shenguard-audit.sh Hand edits to generated guard code Those five are the default set. sb also has an optional, still-experimental sixth gate ( shen-derive , for spec-equivalence testing) that runs only when explicitly configured. Wh

AI 코딩 에이전트 소프트웨어 테스트 정형 검증 보안 취약점 코드 생성