Hacker News • 99 days ago

The Future of AI Agents Is Async

IMP

9/10

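Key Summary

LLM-based agents are moving beyond real-time chat (synchronous and HTTP-based) toward running asynchronously in the background. This exposes a mismatch between the long lifetime of an agent's work and the short lifetime of an HTTP connection, and it highlights the need for new transports and architectures.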

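Translated Article

All Your Agents Are Going Async (April 20, 2026 · 8 min read)

Agents used to be something you talked to synchronously. Now they run in the background while you work. That change breaks the transport.

For most of the time LLMs have existed, we have used them by opening a chat-style window and typing a prompt. The LLM streams its response back token by token. This is how ChatGPT, claude.ai, and Claude Code work. It is also how the demos for basically every AI SDK or AI library work.

It is easy to assume that LLM chatbots mark the limit of what AI can do today. They do not. Instead, all your agents are going async.

Agents are gaining cron jobs, webhook support, WhatsApp integrations, 'remote control' from a smartphone, scheduled tasks, and routines. They are becoming something that runs in the background, works while you work, and reports results asynchronously.

Agents are getting workflows in Temporal, Vercel WDK, Relay.app, and similar tools. A human sitting at a terminal or web chat is now just one mode, and increasingly not the interesting one. The interesting question is what agents can do without synchronous human supervision.

The problem is that chatbots are built primarily on HTTP: an HTTP request carries the prompt, and the HTTP response carries an SSE stream of LLM-generated tokens. This does not work when the agent runs asynchronously, because there is no HTTP connection left to stream the response over.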
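The synchronous pattern described above can be sketched in a few lines. This is a minimal illustration of how tokens might be framed as Server-Sent Events on an open HTTP response, not any particular vendor's API: the helper names `sse_frame` and `sse_frames` and the `done` end-of-stream event are assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional


def sse_frame(event_id: int, data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame (hypothetical helper)."""
    lines = [f"id: {event_id}"]
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines must be sent as several "data:" lines.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token, then an assumed end-of-stream marker."""
    last_id = 0
    for last_id, token in enumerate(tokens, start=1):
        yield sse_frame(last_id, token)
    # The "done" event is an illustrative convention, not part of SSE itself.
    yield sse_frame(last_id + 1, "[DONE]", event="done")


frames = list(sse_frames(["Hello", ",", " world"]))
```

The frames only reach the user while the response is still open; once the connection closes, a generator like this has nowhere to write, which is the limitation the article goes on to discuss.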

OpenClaw's Async Leap

OpenClaw took a big step toward async agents by showing that an agent could live inside a WhatsApp chat. The agent could travel with the user and do work in the background. OpenClaw showed that you do not have to be tied to a browser or terminal to have AI work for you.

Anthropic's direct response to the OpenClaw model is 'Channels', which is MCP-based and lets you push messages asynchronously from an external chat system into a Claude Code session. Anthropic also offers the /loop and /schedule slash commands and Routines, which let you schedule and run agents in the background. In addition, 'Remote Control' lets you continue a Claude Code session from a smartphone or another browser.

ChatGPT has 'Scheduled tasks', which trigger async agents that can contact the user when needed. Cursor has 'background agents' that run in the background in the cloud. All of these features break the coupling in which a person sits at a terminal or chat window and interacts with the agent turn by turn. They make interaction with agents continuous, remote, long-running, and asynchronous.

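The Transport Mismatch

All of these new async features share one property: the lifetime of the agent's work is decoupled from the lifetime of a single HTTP connection.

In chatbot demo apps, the agent only does work while the HTTP connection is open. The LLM runs inference in response to an HTTP request and streams tokens back on the HTTP response as an SSE stream.

I have said before that a chatbot's worst enemy is the page refresh, and that is entirely because of this transport mismatch. HTTP request-response cannot survive a page refresh, and it cannot serve async agents either.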
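One way to picture what surviving a page refresh would take is an event log that outlives any single connection, so a reconnecting client can ask for everything after the last event it saw. The sketch below is an in-memory illustration under that assumption; `SessionLog`, `append`, and `read_after` are hypothetical names, not an API from the article or any product it mentions. (SSE itself defines a `Last-Event-ID` request header for this kind of resume, but the server still needs a log like this behind it.)

```python
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """Hypothetical append-only event log for one agent session."""

    events: list[str] = field(default_factory=list)

    def append(self, payload: str) -> int:
        """Store an event and return its id (ids start at 1)."""
        self.events.append(payload)
        return len(self.events)

    def read_after(self, last_id: int = 0) -> list[tuple[int, str]]:
        """Return (id, payload) pairs for every event newer than last_id."""
        start = max(last_id, 0)
        return list(enumerate(self.events[start:], start=start + 1))


log = SessionLog()
for token in ["The", " agent", " is"]:
    log.append(token)  # the agent keeps writing whether or not anyone listens

seen = log.read_after(0)   # first connection reads everything so far
last_seen = seen[-1][0]    # the client remembers the last id it received

log.append(" done")        # more output arrives while the page reloads
resumed = log.read_after(last_seen)  # the new connection resumes from that id
```

The design choice here is that the agent writes to the log rather than to the connection, so the connection becomes a replaceable reader instead of the owner of the stream.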

There are four scenarios that the old HTTP-based transport cannot handle cleanly:

Agent outlives the caller: A routine is fired by cron, or the agent takes a long time to finish its work. Five minutes later the agent has a result, but no one is listening anymore. Where should the result go? Right now it is stored in a database and has to be polled for through a session-specific URL (which, frankly, is not a great approach).
Agent wants to push unprompted: The agent has finished a nightly backlog review and has three PRs (pull requests) for you to review, or an async workflow has reached a human-approval step and needs your consent before continuing. There is no channel back to the user. Right now...

(Note: the translation breaks off here because the source text was incomplete.)
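The first scenario above, where a finished result is stored and the caller polls for it later, can be sketched with the standard-library `sqlite3` module. The table name, the column names, and the functions `open_store`, `save_result`, and `poll_result` are assumptions made for illustration; the article does not specify a schema or API.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory result store (a real one would be persistent)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (session_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_result(conn: sqlite3.Connection, session_id: str, body: str) -> None:
    """Called by the agent when its work finishes, with no caller attached."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO results (session_id, body) VALUES (?, ?)",
            (session_id, body),
        )


def poll_result(conn: sqlite3.Connection, session_id: str) -> Optional[str]:
    """Called by the client; returns None until a result has been stored."""
    row = conn.execute(
        "SELECT body FROM results WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None


store = open_store()
before = poll_result(store, "session-1")  # nothing yet: the agent is still working
save_result(store, "session-1", "3 PRs ready for review")
after = poll_result(store, "session-1")   # a later poll finally sees the result
```

Nothing in this pattern tells the client when to poll again, which is why the article treats stored state alone as only part of a solution.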


Original text (English)

All your agents are going async

Apr 20, 2026 · 8 min read

Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.

For most of the time LLMs have been around, you use them by opening a chat-style window and typing a prompt. The LLM streams the response back token-by-token. It’s how ChatGPT, claude.ai, and Claude Code work. It’s also how the demos work for basically every AI SDK or AI Library.

It’s easy to think that LLM chatbots are the ‘art of the possible’ for AI right now. But that’s not the case. Instead, all your agents are going async.

Agents are getting crons, webhook support, WhatsApp integrations, ‘remote control’ from your phone, scheduled tasks and routines. Agents are becoming something that runs in the background, working while you work, and reporting back results async.

Agents are getting workflows in Temporal, Vercel WDK, Relay.app, etc. A human sitting at a terminal or webchat is just one mode now, and increasingly it’s not the interesting one. The interesting thing is what agents can do while not being synchronously supervised by a human.

The problem is that chatbots are primarily built on HTTP. An HTTP request with the prompt, and a SSE stream of LLM generated tokens back on the HTTP response. But this doesn’t work when the agent is running async. There’s no HTTP connection to stream the response back.

OpenClaw’s async step

OpenClaw took a big step towards async agents, by showing people that an agent could live in your WhatsApp chat. The agent could travel around with you, and could work on stuff in the background. OpenClaw showed that you didn’t have to be glued to your browser or terminal to get AI to do work for you.

Anthropic’s direct response to the OpenClaw model is Channels, which is MCP based and allows you to push messages async from an external chat system into a Claude Code session.
But they also have /loop and /schedule slash commands, as well as Routines, both allowing you to schedule and run agents in the background. Anthropic also has Remote Control, which lets you continue a Claude Code session from your phone or another browser.

ChatGPT has scheduled tasks which trigger agents async, that can reach out to you if needed. Cursor has background agents that run in the background in the cloud.

All of these features are about breaking the coupling between a human sitting at a terminal or chat window and interacting turn-by-turn with the agent. They make interactions with agents continuous, remote, long-running, and async.

The transport mismatch

All these new async features share the same property; the lifetime of an agent’s work is decoupled from the lifetime of a single HTTP connection.

In chatbot demo apps, the agent is only processing while the HTTP connection is open. The LLM is doing inference in response to an HTTP request and streaming the tokens back on the HTTP response as an SSE stream.

I’ve said before that a chatbot’s worst enemy is page refresh, and this is entirely because of the transport mismatch. HTTP request-response can’t survive a page refresh, and it can’t serve async agents.

There are four scenarios that the old transport based on HTTP can’t handle cleanly:

Agent outlives the caller: A routine fires from a cron, or the agent takes a long time to do its work. Five minutes later the agent has a result, but no one’s listening anymore. Where do the results go? Right now, they go in a database and you have to poll for them with some session URL (which, y’know, sucks).
Agent wants to push unprompted: The agent finished a nightly backlog review and has three PRs for you to review. Or your async workflow hits a human-approval step and needs you to say yes before it can keep going. There’s no connection back to you. Right now, they email you or send a Slack message.
Caller changes: You started a task at your desk, went to lunch, and want to check on it from your phone. Anthropic’s Remote Control handles this, but only by building custom backend session storage and management. It’s not a first-class feature of the HTTP transport.
Multiple humans in one session: You have a team of five people working on a task together, and an agent is helping you. The agent needs to be able to push updates to all five of you, and take input from any of you.

Part of the reason that folks found OpenClaw so awesome is that it handles all of these scenarios for you. OpenClaw’s model separates the lifetime of the agent’s work from the lifetime of the connection to the human. The agent can do work async, and then use WhatsApp, iMessage, Telegram, Discord or whatever async chat system to push the results back to you when it’s done. This just isn’t possible with HTTP request-response.

So how are folks solving this?

Looking across the industry, there are a bunch of different solutions to this.

Clearly there’s the OpenClaw model where all the interaction is through some external chat provider. This chat provider also provides the conversation history to the agent, so the agent can have context on the conversation even after restarts. But this is just an extension on the chat-based model. I don’t think it’s the most interesting solution.

Most folks are pulling more and more of the session state into a centralised and hosted environment. Anthropic is doing this with Routines and Remote Control. More of the session state, conversation history, and agent inference is running in the hosted Anthropic platform. They are consolidating more of the agent lifecycle and agent connection state into their own platform, rather than just being an LLM inference API.

Cloudflare are getting involved too with their own Agents platform built on their workers platform.
The Cloudflare Sessions API provides the session and conversation storage for agents, accessible over HTTP. And to fix the async notification problem, Cloudflare has launched their Email for Agents product.

These solutions solve only one half

The problem actually splits into two halves.

The first half is durable state. Where does the agent’s state live, how does the agent have access to that state on restart or when processing async tasks, and where does the agent store its output?

The second half is durable transport. How do the bytes of the response get between the agent and the humans or other agents, how does the connection survive disconnect, device switch, fan-out, server-initiated push, etc?

The Anthropic and Cloudflare solutions are really focused on the first half of the problem. They are building durable state storage and management for agents. Their solution to getting the bytes of the response is still polling, or HTTP requests. Cloudflare does have WebSocket support, but it doesn’t survive disconnections for streaming LLM responses.

Anthropic and Cloudflare’s solution is based on the idea that if they store all the data required, then clients can always HTTP GET that data later. It half works, but it’s not ‘art of the possible’.

Durable transport, durable state

Right now, the session and the transport are all wrapped up in a single HTTP request-response. Cloudflare and Anthropic’s hosted features go some way to making the session state durable, but they don’t fix the transport problem. You’re still stuck with HTTP GETs, or polling, in order to find out something new has happened.

Looking at the OpenClaw model, where the conversation history is in the chat channel and the agent process and LLM provider are both separated from that, you can’t build the same design on Cloudflare or Anthropic. There’s no ‘enterprise’ version of the OpenClaw channels model that you can run with your own i

Async agents, AI architecture, agent workflows, MCP, LLM