Hacker News • 68 days ago

Multi-Stream LLMs: Parallelizing Prompts, Reasoning, and I/O

IMP

8/10

Key Summary

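To remove the bottleneck created by the sequential message processing of conventional LLMs, the authors propose a multi-stream LLM architecture that splits reading, writing, and reasoning into independent parallel streams. This lets an AI agent keep reading new input while it acts or reasons at the same time, and the authors argue that the change improves execution efficiency, security, and monitorability. That makes it a notable attempt to move past a structural limitation of autonomous AI agents and coding assistants.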
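To make "reading while acting" concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes each stream advances by exactly one token per forward pass. The stream names (`user`, `tool`, `thought`, `action`) and the `toy_model` stub, which only echoes what it read at the previous step, are hypothetical stand-ins for a trained multi-stream model.

```python
from typing import Callable, Dict, List

Streams = Dict[str, List[str]]

def run_multistream(
    inputs: Streams,
    output_names: List[str],
    model: Callable[[Streams, int], Dict[str, str]],
    num_steps: int,
) -> Streams:
    """Toy multi-stream decoding loop: one iteration stands in for one forward pass."""
    history: Streams = {name: [] for name in list(inputs) + output_names}
    for t in range(num_steps):
        # Write: one token per output stream, conditioned only on steps < t
        # (every stream's history holds exactly t tokens at this point).
        new_tokens = model(history, t)
        # Read: this step's token from each input stream ("" means nothing new arrived).
        for name, toks in inputs.items():
            history[name].append(toks[t] if t < len(toks) else "")
        for name in output_names:
            history[name].append(new_tokens[name])
    return history

def toy_model(history: Streams, t: int) -> Dict[str, str]:
    """Hypothetical stand-in for a trained model: 'thought' notes the last user
    token and 'action' acknowledges the last tool token, both from step t - 1."""
    last_user = history["user"][-1] if t > 0 else ""
    last_tool = history["tool"][-1] if t > 0 else ""
    return {
        "thought": f"saw:{last_user}" if last_user else "",
        "action": f"ack:{last_tool}" if last_tool else "",
    }

out = run_multistream(
    inputs={"user": ["fix", "the", "test"], "tool": ["", "log1", "log2"]},
    output_names=["thought", "action"],
    model=toy_model,
    num_steps=4,
)
print(out["thought"])  # ['', 'saw:fix', 'saw:the', 'saw:test']
print(out["action"])   # ['', '', 'ack:log1', 'ack:log2']
```

At step 1, the `thought` stream writes a note about `fix` while the `user` stream is still delivering `the`. A single-stream chat format cannot do this.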

Translated Text

arXiv:2605.12460 (cs) [Submitted on 12 May 2026]

Title: Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
Authors: Guinan Su, Yanwu Yang, Xueyan Li, Jonas Geiping

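Abstract: The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer-use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents function on message exchange formats, successively exchanging messages with users, the system, themselves (i.e., chain-of-thought), and tools in a single stream of computation. This bottleneck to a single stream in chat models leads to a number of limitations: the agent cannot act (generate output) while reading, and, in reverse, cannot react to new information while writing. Similarly, the agent cannot act while thinking and cannot think while reading or acting on information.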
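The single-stream bottleneck can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a toy whitespace tokenizer, hypothetical role labels, and one position consumed per forward pass. It does not reproduce any particular chat template.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class Message:
    role: str  # hypothetical role labels: "system", "user", "thought", "tool", ...
    text: str

def serialize_single_stream(messages: List[Message]) -> List[Tuple[str, str]]:
    """Flatten a transcript into one sequence of (role, token) pairs.
    Each position is consumed by one forward pass, strictly in order."""
    seq: List[Tuple[str, str]] = []
    for msg in messages:
        for tok in msg.text.split():  # toy whitespace tokenizer (assumption)
            seq.append((msg.role, tok))
    return seq

def first_generated_position(
    seq: List[Tuple[str, str]],
    generated_roles: Sequence[str] = ("thought", "assistant"),
) -> Optional[int]:
    """Position of the first token the model writes itself. Every earlier
    position is spent reading, so no output can appear before it."""
    for i, (role, _) in enumerate(seq):
        if role in generated_roles:
            return i
    return None

transcript = [
    Message("system", "you are a coding agent"),
    Message("user", "fix the failing test"),
    Message("thought", "check the logs first"),
    Message("tool", "log output here"),
]
seq = serialize_single_stream(transcript)
print(len(seq), first_generated_position(seq))  # 16 9
```

With this transcript, the model spends nine positions reading before it writes its first thought token. The tool message cannot enter the sequence until the thought ends, so it starts at position 13.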

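In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple, parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies a number of the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.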
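The phrase "all of which causally depend on earlier timesteps" can be pictured as an attention mask over a grid of timesteps and streams. The Python sketch below assumes two things:

- Each stream contributes exactly one token per timestep.
- A token may attend to every stream at strictly earlier timesteps, plus itself.

The paper's actual masking and positional scheme may differ. `multistream_causal_mask` is a hypothetical helper.

```python
import numpy as np

def multistream_causal_mask(num_streams: int, num_steps: int) -> np.ndarray:
    """Boolean attention mask over a flattened (timestep, stream) grid.

    Token index i = t * num_streams + s; mask[q, k] is True when query q may
    attend to key k. Assumed rule: a token sees every stream at strictly
    earlier timesteps, plus itself, so tokens generated in the same forward
    pass do not depend on each other.
    """
    n = num_streams * num_steps
    step = np.arange(n) // num_streams       # timestep of each flattened token
    earlier = step[None, :] < step[:, None]  # key timestep < query timestep
    return earlier | np.eye(n, dtype=bool)

mask = multistream_causal_mask(num_streams=3, num_steps=2)
print(mask.astype(int))
```

With three streams and two timesteps, each first-step token sees only itself. Each second-step token sees all three first-step tokens plus itself. Under this assumption, tokens at the same timestep never see each other, so every stream can be filled in one forward pass. This is one plausible reading of the efficiency-through-parallelization argument.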

Comments: Preprint, 37 pages. Code at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as: arXiv:2605.12460 [cs.LG] (or arXiv:2605.12460v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2605.12460 (arXiv-issued DOI via DataCite, pending registration)

Submission history
From: Jonas Geiping
[v1] Tue, 12 May 2026 17:47:41 UTC (871 KB)

LLM Architecture, AI Agents, Parallel Processing, Machine Learning Papers, Instruction Tuning