Hacker News • 74 days ago

δ-Mem: Efficient Online Memory for Large Language Models

IMP

8/10

Key Summary

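δ-Mem (delta-mem) is a proposed lightweight memory mechanism that lets large language models (LLMs) reuse past information efficiently in long-term assistant and agent systems. Using only a fixed-size 8x8 online memory state matrix, it raises the average score to 1.10x that of the frozen backbone, and to as much as 1.31x on memory-heavy benchmarks. Its practical appeal is that it gets these gains by applying a low-rank correction to the attention computation, without full fine-tuning or replacing the model.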

Translated Text

Computer Science > Artificial Intelligence arXiv:2605.12357 (cs) [Submitted on 12 May 2026]

Title: $δ$-mem: Efficient Online Memory for Large Language Models

Authors: Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria

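Abstract: Large language models (LLMs) increasingly need to accumulate and reuse past information in long-term assistants and agent systems. Simply enlarging the context window is expensive and often does not guarantee that the context is used effectively. This paper proposes $δ$-mem, a lightweight memory mechanism that improves performance by adding a compact online associative-memory state to a frozen full-attention backbone. $δ$-mem compresses past information into a fixed-size state matrix that is updated by delta-rule learning, and during generation it reads from that state to produce a low-rank correction to the backbone's attention computation. With an online memory state of only 8x8, $δ$-mem raises the average score to 1.10x that of the frozen backbone and 1.15x that of the strongest non-$δ$-mem memory baseline. The gains are larger on memory-heavy benchmarks, reaching 1.31x on MemoryAgentBench and 1.20x on LoCoMo, while general capabilities are largely preserved. These results show that effective memory can be realized by coupling a compact online state directly to the attention computation, without full fine-tuning, backbone replacement, or explicit context extension.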
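The abstract names delta-rule learning over a fixed-size state matrix but does not spell out the update. As a rough illustration only, the sketch below assumes the classic delta rule for associative memory, S <- S + beta * (v - S k) k^T, with readout S k. The names `delta_update`, `read`, `one_hot`, and `BETA`, the one-hot keys, and the step size of 1.0 are all hypothetical choices for this example, not taken from the paper, and the sketch does not attempt to model how the readout is turned into a low-rank attention correction.

```python
# Illustrative delta-rule associative memory (an assumption, not the paper's code).
# D = 8 mirrors the 8x8 state size quoted in the abstract; BETA is an assumed step size.
D = 8
BETA = 1.0


def zeros_state(d=D):
    """Return a d x d state matrix filled with zeros."""
    return [[0.0] * d for _ in range(d)]


def read(state, key):
    """Readout: the matrix-vector product S k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, key)) for row in state]


def delta_update(state, key, value, beta=BETA):
    """Delta rule: S <- S + beta * (v - S k) k^T. The state shape never changes."""
    predicted = read(state, key)
    error = [v_i - p_i for v_i, p_i in zip(value, predicted)]
    return [
        [s_ij + beta * e_i * k_j for s_ij, k_j in zip(row, key)]
        for row, e_i in zip(state, error)
    ]


def one_hot(index, d=D):
    """Unit-norm key, so one update with beta = 1 stores the value exactly."""
    return [1.0 if j == index else 0.0 for j in range(d)]


state = zeros_state()
key_a, key_b = one_hot(0), one_hot(1)
value_a = [float(i) for i in range(D)]
value_b = [1.0] * D
value_a_new = [-1.0] * D

state = delta_update(state, key_a, value_a)      # store a first association
state = delta_update(state, key_b, value_b)      # store a second one
state = delta_update(state, key_a, value_a_new)  # overwrite the first

recalled_a = read(state, key_a)  # the overwritten value for key_a
recalled_b = read(state, key_b)  # key_b is left untouched by the overwrite
```

The point of the example is that the error term `v - S k` lets a later write replace an earlier association for the same key, while the state stays 8x8 no matter how many updates are applied.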

Original Text (English)

Computer Science > Artificial Intelligence arXiv:2605.12357 (cs) [Submitted on 12 May 2026]

Title: $\delta$-mem: Efficient Online Memory for Large Language Models

Authors: Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria

Abstract: Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose $\delta$-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. $\delta$-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an $8\times8$ online memory state, $\delta$-mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$\delta$-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2605.12357 [cs.AI] (or arXiv:2605.12357v1 [cs.AI] for this version)

DOI: https://doi.org/10.48550/arXiv.2605.12357 (arXiv-issued DOI via DataCite, pending registration)

Submission history: From: Jingdi Lei. [v1] Tue, 12 May 2026 16:31:44 UTC (609 KB)

Artificial Intelligence, Large Language Models, Memory, Attention, Research