Hacker News • 106 days ago

Multi-Agent Development Is a Distributed Systems Problem

IMP

8/10

Key Summary

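Multi-agent systems, in which several AI agents cooperate to develop software, have been attracting attention recently, but the post argues that this is at its core a distributed consensus problem. Some take a wait-and-see attitude, expecting the next generation of LLMs to make the coordination problem go away on its own, but no amount of intelligence lets a system escape the fundamental limits of distributed systems. The post therefore argues that efforts to manage agent interactions systematically, through new programming languages and formal modeling, matter a great deal.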

Translated Text

Multi-agentic software development is a distributed systems problem (AGI can't save you from it).

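Recently, I've been thinking a lot about scaffolding and languages for managing systems of LLMs that coordinate with each other. New programming languages might be the ideal solution in this area. We have a paper in the works that develops a fun choreographic language for describing multi-agent workflows. It turns out that choreographies, while too weak for implementing practical distributed protocols, are a concise and elegant formalism for describing the bespoke interactions that arise between agents, especially once game theory is incorporated. We'll be sharing it soon, so keep an eye out!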

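One annoying piece of feedback I keep hearing, even from other verification researchers who should in principle know better, is a kind of apathy toward the goal of developing formalisms and languages for managing agents. The refrain is best summarised by this quote: "The best solution to agentic coordination is to just wait a few months."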

The argument goes roughly as follows:

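Current multi-agent LLM systems cannot build large-scale software autonomously (agreed ✅).
This boils down to a problem of coordination (agreed ✅).
The next generation of models will be smarter (agreed ✅).
The next generation of models will not have coordination problems (⁇ HUH ⁇).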

The implication is that building languages and tooling to describe and manage these systems is a Sisyphean task: newer models will inevitably render them obsolete, and the entire effort will have been in vain.

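As a verification researcher, I find this capitulation somewhat premature and misguided. There is a rich distributed systems literature about exactly this problem that people are ignoring, along with a number of impossibility results that hold regardless of model capability. Even if the next models are AGI (lol), coordination is a fundamental problem, and smarter agents alone cannot escape it.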

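In this blog post, I want to flesh out this idea: break the problem of multi-agent software development down into a formal model and establish connections to standard distributed systems impossibility results. Distributed consensus is difficult, no matter how AGI your participants are.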

A Formal Model of Software Development

"Claude. Make me an app to track recipes. Make no mistakes."

We can model the problem of multi-agent synthesis formally as follows:

Given a prompt \(P :=\) "\(\textit{An app to track recipes}\)", we can define \(\Phi(P)\) as the set of software consistent with the prompt:

\[ \Phi(P) := \{ \phi \mid \phi ~\text{is a program} \land \phi ~\text{is consistent with the prompt}~P \} \]

The key point is that a natural language prompt is, by its very nature, underspecified: there may be many programs consistent with the prompt. When we use LLMs to build and write software systems, we are effectively asking the LLM to select one element from this set.

Conversely, when we do multi-agentic software development, i.e. we spin up several agents \(a_1, \cdots, a_n\) and ask them to build a piece of software, we are essentially asking each of them to produce a software component, \(\phi_1, \cdots, \phi_n\), such that all components refine one single consistent interpretation of the prompt:

\[ C(\phi_1, \cdots, \phi_n) := \exists \phi \in \Phi(P),\ \forall i,\ \phi_i ~\text{refines}~ \phi \]

In other words, this is nothing other than one big distributed consensus problem. To spell it out: the user's prompt \(P\) is first split, via a plan, into tasks for several agents \(a_1, \cdots, a_n\). These agents then carry out their respective coding tasks \(\phi_1, \cdots, \phi_n\) in parallel, and if the synthesis succeeds, we hope to end up with a software system \(\phi\) composed of the individual components.
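The consistency condition \(C\) can be made concrete with a toy model. The sketch below is an illustration, not the author's formalisation: it assumes \(\Phi(P)\) is a small finite set of design-decision assignments rather than a set of programs, and it models "refines" as "agrees on every decision the component fixes". The decision names (`storage`, `http_style`) and their options are hypothetical.

```python
from itertools import product

# Hypothetical design space: each key is a decision the prompt leaves open.
DESIGN_SPACE = {
    "storage": ("sqlite", "json_file"),
    "http_style": ("callback", "async_await"),
}


def interpretations(design_space):
    """Enumerate a toy Phi(P): every complete assignment of design decisions."""
    keys = sorted(design_space)
    options = (design_space[k] for k in keys)
    return [dict(zip(keys, values)) for values in product(*options)]


def refines(component, interpretation):
    """Assumed notion of refinement: the component agrees with the
    interpretation on every decision the component fixes."""
    return all(interpretation.get(k) == v for k, v in component.items())


def consistent(components, phi_p):
    """C(phi_1, ..., phi_n): one single interpretation is refined by all components."""
    return any(all(refines(c, phi) for c in components) for phi in phi_p)


phi_p = interpretations(DESIGN_SPACE)

# Components produced by two agents, each fixing only some decisions.
network = {"http_style": "callback"}
integration_ok = {"http_style": "callback", "storage": "sqlite"}
integration_bad = {"http_style": "async_await", "storage": "sqlite"}

print(len(phi_p))                                    # 4
print(consistent([network, integration_ok], phi_p))  # True
print(consistent([network, integration_bad], phi_p)) # False
```

Each component on its own refines some interpretation in the toy \(\Phi(P)\); the failure in the last line comes only from the pair disagreeing, which is the sense in which the condition is about agreement rather than individual correctness.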

Original text (English)

Multi-agentic Software Development is a Distributed Systems Problem (AGI can't save you from it)

Recently, I've been thinking a lot about scaffolding and languages for managing systems of LLMs coordinating with each other; new programming languages might be the ideal solution for this area. We have a rather fun paper in the works developing a choreographic language for describing multi-agent workflows. It turns out that choreographies, while being too weak for any practical distributed protocol, are actually quite a concise and elegant formalism for describing the bespoke interactions that arise between agents, especially so if we incorporate game theory. Keep an eye out for that, we'll be sharing it soon!

Now, one annoying piece of feedback that I keep on hearing, even from other verification researchers who should know better, is a sort of apathy about the state of affairs, and towards the goals of developing formalisms and languages to manage agents. The common refrain is best summarised as the quote: "The best solution to agentic coordination is to just wait a few months."

The argument roughly summarises to something like:

Current multi-agentic LLM systems are unable to build large-scale software autonomously (agreed ✅).
This boils down to an issue of coordination (agreed ✅).
The next generation of models will be smarter (agreed ✅).
The next generation of models will not have coordination problems (⁇ HUH ⁇).

The main implication is that building languages and tooling to describe and manage these systems is a Sisyphean task; newer models will inevitably render them obsolete, and the entire effort will be in vain.

As a verification researcher I find this capitulation a little premature and misguided: there's a rich distributed systems literature, literally about this very problem, that people are ignoring, and a number of impossibility results that are invariant to model capability.
Even if the next models are AGI (lol), the problem of coordination is a fundamental one, and smarter agents alone can't escape it.

In this blog post, I want to flesh out this idea, break down the problem of multi-agent software development into a formal model, and establish some connections to standard distributed systems impossibility results. Distributed consensus is difficult, no matter how AGI your participants are.

A Formal Model of Software Development

"Claude. Make me an app to track recipes. Make no mistakes."

We can model the problem of multi-agent synthesis formally as follows. Given a prompt \(P :=\) "\(\textit{An app to track recipes}\)", we can define \(\Phi(P)\) as the set of software consistent with the prompt:

\[ \Phi(P) := \{ \phi \mid \phi ~\text{is a program} \land \phi ~\text{is consistent with the prompt}~P \} \]

The key point here is that a natural language prompt, by its very nature, is underspecified, i.e. there may be multiple programs that are consistent with the prompt. When we use LLMs to build and write software systems, we're effectively asking the LLMs to select one element amongst many from this set.

Conversely, when we do multi-agentic software development, i.e. we spin up several agents \(a_1, \cdots, a_n\) and ask them to build a piece of software, we're essentially asking them each to produce software components \(\phi_1, \cdots, \phi_n\) such that they all refine one single consistent interpretation of the prompt:

\[ C(\phi_1, \cdots, \phi_n) := \exists \phi \in \Phi(P),\ \forall i,\ \phi_i ~\text{refines}~ \phi \]

This, in other words, is nothing other than one big distributed consensus problem. To spell it out, the user's prompt \(P\) first gets split, via a plan, into tasks for several agents \(a_1, \cdots, a_n\).
Then, these agents work in parallel to implement their respective coding tasks \(\phi_1, \cdots, \phi_n\), and by the end, if the synthesis was successful, we're hoping that the final generated software system \(\phi\), composed of the individual constructions as \(\phi := \phi_1 \parallel \cdots \parallel \phi_n\), satisfies the user's request.

This is inherently a consensus problem, as the agents \(a_1, \cdots, a_n\) must work concurrently to produce their software artefacts \(\phi_1, \cdots, \phi_n\), but communicate and agree enough that the final piece of software \(\phi_1 \parallel \cdots \parallel \phi_n\) is well formed and satisfies the request. Design decisions or choices in one \(\phi_i\) will result in constraints that affect and influence the possible choices of \(\phi_j\) for other agents. For example, if the agent in charge of implementing network connections, \(a_{\text{network}}\), chooses a library with a callback-style async API for requests, then whichever agent is responsible for the overall integration, \(a_{\text{integration}}\), must organise the infrastructure around that choice, and so on. Similar choices in other modules will influence the design spaces of other agents, and overall the process proceeds as a joint synthesis problem.

Is this really a Distributed Consensus Problem?

As we complete our formal model of agentic software development, I'd like to take a second to shoot down some obvious rebuttals.

Why is \(P\) underspecified (i.e. can't \(|\Phi(P)| = 1\))? This holds by the very nature of natural language as being ambiguous. As it turns out, we do have a way to give precise and unambiguous specifications for software. It's called a programming language. Anything else leaves room for the agent to make design decisions.
Another way of viewing this is that we could spend some time refining our initial prompt to make it less underspecified, but unless we go all the way to code (at which point, you're not using multiple agents, just one), there's going to be some degree of ambiguity to our prompt.

Is the problem inherently concurrent? The motivations for throwing multiple agents at a software system are somewhat outside the scope of this post, but as soon as we have made that decision and we have multiple agents working on our tasks, then the problem is inherently concurrent (irrespective of whether they're working in parallel, or interleaved on a single thread), and we have to solve problems of coordination.

Can't we have a single supervisor to dictate choices? Why not have agents make proposals in parallel and then have some kind of supervisor which manages merging PRs into the shared codebase? Nice try! A git repo is a pretty standard approach to trying to coordinate parallel software development, but you haven't solved the fundamental concurrency problem. At best you've hard-coded yourself into a single choice of concurrency, and not a particularly good one: when one design decision is chosen and merged into the codebase, what happens to work that must be rebased? What if it has conflicts? Some work has to be lost.

The key point that I'm trying to convey here is not that you can't sometimes do multi-agentic software development while ignoring the concurrency problem (evidently people have had some success at software development with agents without pulling out Paxos), but rather that this perspective helps us be prescient about how we are resolving these fundamental concurrency problems with our coordination workflows. Now, if you take me out for a few drinks, maybe I'd go even further to say that if we want multi-agentic software development to truly scale, then these questions have to be thought about and answered carefully.
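The supervisor rebuttal above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of any real merge tool: proposals are reduced to the design decisions they depend on, the supervisor merges in arrival order, and any proposal contradicting an already merged decision is rejected. `Proposal`, `supervise`, and the agent `a_storage` are hypothetical names introduced for illustration; `a_network` and `a_integration` follow the post.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    agent: str
    decisions: dict  # design decisions this piece of work depends on


def supervise(proposals):
    """Merge proposals in arrival order (assumed policy: first merged wins).

    A proposal that contradicts an already merged decision is rejected,
    standing in for work that would have to be rebased or redone.
    """
    merged = {}
    accepted, rejected = [], []
    for p in proposals:
        conflict = any(
            key in merged and merged[key] != value
            for key, value in p.decisions.items()
        )
        if conflict:
            rejected.append(p.agent)
        else:
            merged.update(p.decisions)
            accepted.append(p.agent)
    return merged, accepted, rejected


proposals = [
    Proposal("a_network", {"http_style": "callback"}),
    Proposal("a_integration", {"http_style": "async_await", "storage": "sqlite"}),
    Proposal("a_storage", {"storage": "sqlite"}),
]

merged, accepted, rejected = supervise(proposals)
print(accepted)  # ['a_network', 'a_storage']
print(rejected)  # ['a_integration']

# Same proposals, different arrival order: a different agent loses its work.
_, _, rejected_reversed = supervise(list(reversed(proposals)))
print(rejected_reversed)  # ['a_network']
```

In this toy model the supervisor always produces a consistent codebase, but which work is discarded depends purely on arrival order, so the concurrency question has been fixed to one particular policy rather than removed.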

Impossibility Results for Multi-agentic Software Development

Now that we have this formal model of agentic software development, it's time for the payoff of this blog post! Let's draw some connections to distributed systems and try to sketch out some impossibility results.

FLP for multi-agentic systems (Safety, Liveness, Fault Tolerance, pick two)

Of course, where else could we start but with FLP, oh my dear FLP. Fischer, Lynch and Paterson's seminal paper "Impossibility of Distributed Consensus with One Fa

Multi-agent, Distributed systems, AI coding, Formal verification, Consensus algorithms