The Decoder • 101 days ago

Always-on Ray-Ban Meta glasses make everyday tasks more efficient

IMP

8/10

Key summary

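Researchers have presented VisionClaw, a system that combines always-on smart glasses with an autonomous agent. An AI that continuously perceives the user's point of view carries out digital tasks in tools such as a browser or email on the user's behalf, and the study found that this substantially reduced both task completion time and cognitive load. The findings suggest a shift away from simple voice commands toward continuous, context-driven AI use that combines the physical environment with digital data.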

Translated body

Always-on Ray-Ban Meta glasses powered by OpenClaw shown to speed up everyday tasks

Tomislav Bezmalinović | April 19, 2026 | THE DECODER

A research team developed an OpenClaw agent for smart glasses to investigate how an AI that continuously perceives the user's surroundings changes the way people use agentic AI systems.

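Researchers from the University of Colorado, the Gwangju Institute of Science and Technology, and Google have introduced VisionClaw, an always-on agentic AI that pairs continuous first-person perception with the autonomous execution of digital tasks.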

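The team set out to narrow the gap between the digital and physical worlds. AI agents can run software and handle tasks on the web, but they cannot see into the physical world. Smart glasses, by contrast, capture their surroundings through cameras and microphones but have limited ability to act on their own. With VisionClaw, the researchers wanted to find out whether an always-on AI holds up in everyday life, and how real-world interaction changes when perception and action are integrated into a single system.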

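How VisionClaw works

VisionClaw connects a displayless pair of Ray-Ban Meta glasses to Gemini Live and OpenClaw through a custom smartphone app. The glasses continuously stream audio and individual frames from the user's surroundings to Gemini, which processes this multimodal input and either replies directly by voice or starts a task through OpenClaw. The agent uses tools such as a browser, email, calendar, or web search, then passes the results back to the language model. The setup ties continuous first-person perception to agentic execution of digital tasks.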
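The routing step described above can be pictured with a small sketch. This is an illustration only, not the authors' implementation: `Observation`, `Decision`, `toy_model`, `Agent`, `handle`, and `demo_agent` are hypothetical names, the "model" is a keyword matcher standing in for the multimodal model, and no real Gemini Live or OpenClaw API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Observation:
    """One slice of the glasses' stream (hypothetical shape)."""
    transcript: str      # what the user said
    frame_caption: str   # stand-in for what a camera frame shows


@dataclass
class Decision:
    """What the model chose to do with an observation."""
    spoken_reply: Optional[str] = None  # answer directly by voice
    tool: Optional[str] = None          # or delegate to an agent tool
    tool_input: str = ""


def toy_model(obs: Observation) -> Decision:
    """Assumed stand-in for the multimodal model: keyword routing only."""
    text = obs.transcript.lower()
    if "email" in text:
        return Decision(tool="email", tool_input=obs.frame_caption)
    if "search" in text or "look up" in text:
        return Decision(tool="web_search", tool_input=obs.frame_caption)
    return Decision(spoken_reply=f"I can see {obs.frame_caption}.")


@dataclass
class Agent:
    """Assumed stand-in for the agent layer: a registry of callable tools."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, tool: str, tool_input: str) -> str:
        if tool not in self.tools:
            return f"no tool named {tool}"
        return self.tools[tool](tool_input)


def handle(obs: Observation,
           model: Callable[[Observation], Decision],
           agent: Agent) -> str:
    """Route one observation: direct voice reply, or tool call plus report."""
    decision = model(obs)
    if decision.tool is None:
        return decision.spoken_reply or ""
    result = agent.run(decision.tool, decision.tool_input)
    # The tool result is handed back so it can be voiced to the user.
    return f"Done: {result}"


# Example wiring with two fake tools (purely illustrative).
demo_agent = Agent(tools={
    "email": lambda subject: f"drafted email about {subject}",
    "web_search": lambda query: f"searched the web for {query}",
})
```

Calling `handle(Observation("Email this to Sam", "a flyer"), toy_model, demo_agent)` takes the delegation path, while a plain question such as "What is this?" is answered directly. The point of the sketch is only the branch between a direct reply and a delegated action whose result flows back to the user.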

The researchers ran two studies to see how well VisionClaw works in practice and how people actually use a system like this.

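In the first study, with 12 participants, they compared VisionClaw against two systems with reduced capabilities. One was an always-on AI running on the Ray-Ban Meta that perceives the environment but cannot perform general agent actions. The other was a smartphone version of OpenClaw that handles agentic tasks but has no continuous awareness of the surroundings. Participants worked through four tasks involving real objects or physical documents, such as taking notes from paperwork, composing emails, researching products, or controlling devices.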

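Faster results with less effort

According to the paper, VisionClaw completed tasks 13 to 37 percent faster depending on the task, and users rated it 7 to 46 percent less demanding. Mental effort, time pressure, and frustration all dropped. Success rates were statistically similar overall, but VisionClaw's success rate fell to around 58 percent on the note-taking task because the glasses' camera could not reliably capture small or visually hard-to-read objects such as receipts.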

"Results show that integrating perception and execution enables faster task completion and reduces interaction overhead compared to non-always-on and non-agent baselines", the researchers write.

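In a second, autobiographical field study, the researchers looked at how VisionClaw performs in daily use. Four of the paper's authors used the system themselves over an extended period, logging 55 active participant days. During that time, they generated 555 voice-initiated interactions totaling 25.8 hours of use.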

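The researchers analyzed what people actually used VisionClaw for and identified six usage categories: information retrieval (30 percent), shopping (19 percent), saving content (16 percent), communication (14 percent), remembering (12 percent), and control (9 percent).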
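To put the field-study figures in proportion, the short calculation below derives a few rates from the reported totals. It is a back-of-the-envelope sketch: the variable names are illustrative, it assumes the 25.8 hours is the summed duration of the 555 interactions, and because the category shares are rounded percentages, the per-category counts are approximations rather than figures from the paper.

```python
# Totals as reported for the field study.
interactions = 555
hours_of_use = 25.8
active_participant_days = 55

# Reported usage shares, in percent.
category_share_pct = {
    "information retrieval": 30,
    "shopping": 19,
    "saving content": 16,
    "communication": 14,
    "remembering": 12,
    "control": 9,
}

total_minutes = hours_of_use * 60

# Derived rates (assumption: the hours cover exactly these interactions).
interactions_per_day = interactions / active_participant_days
minutes_per_interaction = total_minutes / interactions
minutes_per_day = total_minutes / active_participant_days

# Approximate interaction counts per category, from the rounded shares.
approx_counts = {
    name: round(interactions * pct / 100)
    for name, pct in category_share_pct.items()
}

print(f"{interactions_per_day:.1f} interactions per active participant day")
print(f"{minutes_per_interaction:.1f} minutes per interaction on average")
print(f"{minutes_per_day:.0f} minutes of use per active participant day")
for name, count in approx_counts.items():
    print(f"{name}: about {count} interactions")
```

Under those assumptions the system was used for roughly ten interactions and a little under half an hour per active participant day, which is modest usage to generalize from.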

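Beyond those categories, the field study surfaced four emergent interaction patterns: open-ended, multi-step conversations with the AI agent; spontaneous capture and later recall of information; screenless AI use that is less obtrusive but sometimes less reliable; and growing usefulness over time as the system accumulates personal data.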

Taken together, the paper argues, these results point to a shift from isolated voice commands toward continuous, context-driven use. "Beyond performance gains, deployment..."


Original text (English)

Always-on Ray-Ban Meta glasses powered by OpenClaw speed up everyday tasks in new study

Tomislav Bezmalinović | Apr 19, 2026

Nano Banana Pro prompted by THE DECODER

A research team developed an OpenClaw agent for smart glasses to find out how continuously perceiving AI changes the way people use agentic AI systems.

Researchers from the University of Colorado, the Gwangju Institute of Science and Technology, and Google have introduced VisionClaw, an always-on agentic AI that pairs continuous first-person perception with the autonomous execution of digital tasks.

The team set out to bridge the gap between digital and real life: AI agents can run software and handle tasks on the web, but they have no window into the physical world. Smart glasses, on the other hand, capture their surroundings through cameras and microphones but can barely act on their own. With VisionClaw, the researchers wanted to find out whether an always-on AI holds up in everyday life and how real-world interactions shift when perception and action live inside a single system.

How VisionClaw works

VisionClaw connects a displayless Ray-Ban Meta to Gemini Live and OpenClaw through a custom smartphone app. The glasses continuously stream audio and individual frames from the user's surroundings to Gemini, which processes the multimodal input and either replies directly by voice or kicks off tasks through OpenClaw. The agent taps into tools like a browser, email, calendar, or web search, then feeds the results back to the language model. The setup ties continuous first-person perception to agentic execution of digital tasks.

The researchers ran two studies to see how well VisionClaw holds up in practice and how people actually use a system like this.

In the first study, they compared VisionClaw against two stripped-down systems with 12 participants: an always-on AI running on the Ray-Ban Meta that perceives the environment but can't perform general agent actions, and a smartphone version of OpenClaw that handles agentic tasks but has no continuous awareness of the surroundings. Participants worked through four tasks involving real objects or physical documents, such as taking notes from paperwork, composing emails, researching products, or controlling devices.

Faster results with less effort

According to the paper, VisionClaw completed tasks 13 to 37 percent faster depending on the task, and users rated it 7 to 46 percent less demanding. Mental effort, time pressure, and frustration all dropped. Success rates were statistically similar overall, but VisionClaw fell to around 58 percent on the note-taking task because the glasses' camera couldn't reliably capture small or visually challenging objects like receipts.

"Results show that integrating perception and execution enables faster task completion and reduces interaction overhead compared to non-always-on and non-agent baselines", the researchers write.

In a second, autobiographical field study, the researchers looked at how VisionClaw performs in daily use. Four of the paper's authors used the system themselves over an extended period, logging 55 active participant days. During that time, they generated 555 voice-initiated interactions totaling 25.8 hours of use.

The researchers analyzed what people actually used VisionClaw for and identified six usage categories: information retrieval (30 percent), shopping (19 percent), saving content (16 percent), communication (14 percent), remembering (12 percent), and control (9 percent).

Beyond those categories, the field study surfaced four emergent interaction patterns: open-ended, multi-step conversations with the AI agent; spontaneous capture and later recall of information; more unobtrusive but sometimes less reliable screenless AI use; and growing usefulness over time as the system accumulated personal data.

Taken together, the paper argues, this points to a shift from isolated voice commands toward continuous, context-driven use. "Beyond performance gains, deployment findings reveal a shift in interaction: tasks are initiated opportunistically during ongoing activities, and execution is increasingly delegated rather than manually controlled. These results suggest a new paradigm for wearable AI agents, where perception and action are continuously coupled to support situated, hands-free interaction", the researchers write.

VisionClaw: Open source on GitHub

The authors argue that VisionClaw points beyond individual use cases toward a new kind of human-AI interaction. Rather than responding to one-off commands like a traditional voice assistant, an always-on system acts more like a continuous, context-aware companion, with perception, memory, and action all working in concert. They also flag open challenges: privacy risks from constant recording, the handling of large volumes of personal data, and the need to design systems that stay unobtrusive in the background.

On the technical side, it's worth noting that the researchers used a Ray-Ban Meta without a display, even though Meta already sells a version with a built-in display in the US. A display could meaningfully expand and simplify AI use by surfacing results directly in the user's field of view, making them easier to verify at a glance.

Methodologically, the small sample sizes limit what we can take away: the first study included only 12 participants, and the second just four. The bigger problem is that the field study was conducted entirely by four of the paper's authors: people who built the system and know exactly how it works. Google researchers were also involved, and Google has said it plans to launch AI glasses based on Android XR and Gemini later this year. With that in mind, the study shouldn't be read as a fully unbiased evaluation.

The paper "VisionClaw: Always-On AI Agents Through Smart Glasses" is freely available online, and VisionClaw itself is open source on GitHub.

Smart glasses, Agentic AI, Human-computer interaction, Multimodal AI, Meta