Hacker News • 77 days ago

Reinventing the mouse cursor for the AI era

IMP

8/10

Key summary

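This research presents an AI-powered pointer that removes the need to open a separate AI window and works directly from the on-screen cursor without disrupting the user's workflow. The core idea is that the system understands the visual context and the objects under the cursor, so users can give the AI natural commands by pointing at "this" and "that" instead of writing long, complex prompts. The technology is being integrated into Chrome and Google's new device experience, turning what the cursor points at from mere tracked pixels into actionable, interactive objects.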

Translated text

May 12, 2026 • Research

Reimagining the mouse pointer for the AI era

Adrien Baranes and Rob Marchant

We are developing more seamless, intuitive ways to collaborate with AI.

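The mouse pointer has been a constant companion on computer screens, across every website, document and workflow. Despite how much technology has changed, the pointer has barely evolved in more than half a century. We have been exploring new AI-powered capabilities that help the pointer understand not only what it is pointing at, but also why that matters to the user.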

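Our goal is to address a common frustration: because a typical AI tool lives in its own window, users have to drag their world (their working environment) into it. We want the opposite: intuitive AI that meets users across all the tools they use, without interrupting their flow.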

For example, imagine pointing to an image of a building and requesting "Show me directions". Nothing more is needed when the AI system already understands the context.

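Today, we are outlining the underlying principles guiding our thinking on future user interfaces, and sharing experimental demos of an AI-enabled pointer powered by Gemini. For example, you can visit Google AI Studio to edit an image or find places on the map, just by pointing and speaking.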

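Our interaction principles

We have developed four principles that together shift the hard work of conveying context and intent from the user to the computer, replacing text-heavy prompts with simpler, more intuitive interactions. Here are illustrations of our approach and principles.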

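Maintain the flow

AI capabilities should work across all apps, not force users into "AI detours" between them. Our prototype AI-enabled pointer is available wherever the user is working. For example, they could point at a PDF and request a bullet-point summary to paste directly into an email, hover over a table of statistics and request a pie chart version, or highlight a recipe and ask for all the ingredient quantities to be doubled.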

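Show and tell

Current AI models demand precise instructions. To get a good response, a user has to write a detailed prompt. An AI-enabled pointer would streamline this process by smoothly capturing the visual and semantic context around the pointer, letting the computer "see" and understand what is important to the user. In our experimental system, the user just points, and the AI knows exactly which word, paragraph, part of an image, or code block they need help with.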

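Embrace the power of "This" and "That"

In everyday interactions with each other, humans rarely speak in long, detailed paragraphs. We might say "Fix this", "Move that here", or "What does this mean?" while relying on physical gestures and shared context to fill in any gaps in understanding. An AI system that understands this combination of context, pointing and speech would allow users to make complex requests in natural shorthand, with no fiddly prompting required.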

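Turn pixels into actionable entities

For decades, computers have only tracked where we are pointing. AI can now also understand what the user is pointing at. This transforms pixels into structured entities, such as places, dates, and objects, that users can interact with instantly. A photo of a scribbled note becomes an interactive to-do list; a paused frame in a travel video becomes a booking link for that cool-looking restaurant.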
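The idea of resolving a pointer position to a structured entity can be pictured with a minimal Python sketch. This is only an illustration and not how the prototype or Gemini works: the `Entity` type, its fields, and the `entity_at` and `resolve_request` helpers are all hypothetical, and the entity list stands in for whatever a vision model might recognize on screen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A structured object recognized on screen (hypothetical shape)."""
    kind: str                        # e.g. "place", "date", "object"
    label: str                       # human-readable name
    box: tuple[int, int, int, int]   # left, top, right, bottom in pixels

    def contains(self, x: int, y: int) -> bool:
        left, top, right, bottom = self.box
        return left <= x < right and top <= y < bottom

    def area(self) -> int:
        left, top, right, bottom = self.box
        return (right - left) * (bottom - top)


def entity_at(entities: list[Entity], x: int, y: int) -> Optional[Entity]:
    """Return the smallest entity whose box contains the pointer, if any."""
    hits = [e for e in entities if e.contains(x, y)]
    # Prefer the most specific (smallest) region under the pointer.
    return min(hits, key=Entity.area) if hits else None


def resolve_request(utterance: str, entities: list[Entity], x: int, y: int) -> str:
    """Attach the pointed-at entity to a short spoken request (assumed format)."""
    target = entity_at(entities, x, y)
    if target is None:
        return utterance
    return f"{utterance} [target: {target.kind} '{target.label}']"


if __name__ == "__main__":
    screen = [
        Entity("object", "travel video frame", (0, 0, 800, 600)),
        Entity("place", "restaurant", (100, 100, 300, 200)),
    ]
    print(resolve_request("Book this", screen, 150, 150))
```

Choosing the smallest containing region is one simple way to decide what "this" refers to when recognized regions are nested; a real system would presumably weigh far richer visual and semantic context.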

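Building technology that adapts to human behavior, rather than forcing users to adapt to it, enables a future where collaborating with AI feels truly intuitive, fluid and seamless. We are excited that these human-first concepts are being woven into products we use every day.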

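Applying this work in our products

We are now integrating these principles to reimagine pointing in Chrome and our new Googlebook laptop experience. Starting today, instead of writing a complex prompt, you can use your pointer to ask Gemini in Chrome about the part of the webpage you care about. For example, you can select a few products on a page and ask to compare them, or point...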

Original text (English)

May 12, 2026 • Research

Reimagining the mouse pointer for the AI era

Adrien Baranes and Rob Marchant

We are developing more seamless, intuitive ways to collaborate with AI

The mouse pointer has been a constant companion on computer screens, across every website, document and workflow. Despite how technologies have changed, the pointer has barely evolved in more than half a century. We've been exploring new AI-powered capabilities to help the pointer not only understand what it's pointing at, but also why it matters to the user.

Our goal is to address a common frustration: because a typical AI tool lives in its own window, users need to drag their world into it. We want the opposite: intuitive AI that meets users across all the tools they use, without interrupting their flow.

For example, imagine pointing to an image of a building, and requesting "Show me directions". Nothing more is needed when the AI system already understands the context.

Today, we're outlining the underlying principles guiding our thinking on future user interfaces, and sharing experimental demos of an AI-enabled pointer, powered by Gemini. For example, you could visit Google AI Studio to edit an image or find places on the map, just by pointing and speaking.

Our interaction principles

We've developed four principles that together shift the hard work of conveying context and intent from the user to the computer, replacing text-heavy prompts with simpler, more intuitive interactions. Here are illustrations of our approach and principles.

Maintain the flow

AI capabilities should work across all apps, not force users into "AI detours" between them. Our prototype AI-enabled pointer is available wherever the user is working. For example, they could point at a PDF and request a bullet-point summary to paste directly into an email, hover over a table of statistics and request a pie chart version, or highlight a recipe and ask for all the ingredients doubled.

Show and tell

Current AI models demand precise instructions. To get a good response, a user has to write a detailed prompt. An AI-enabled pointer would streamline this process by smoothly capturing the visual and semantic context around the pointer, letting the computer "see" and understand what's important to the user. In our experimental system, just point, and the AI knows exactly which word, paragraph, part of an image, or code block the user needs help with.

Embrace the power of "This" and "That"

In everyday interactions with each other, humans rarely speak in long, detailed paragraphs. We might say, "Fix this", "Move that here", or "What does this mean?", while relying on physical gestures and our shared context to fill in any gaps in understanding. An AI system that understands this combination of context, pointing and speech would allow users to make complex requests in natural shorthand, no fiddly prompting required.

Turn pixels into actionable entities

For decades, computers have only tracked where we are pointing. AI can now also understand what the user is pointing at. This transforms pixels into structured entities, such as places, dates, and objects, that users can interact with instantly. A photo of a scribbled note becomes an interactive to-do list; a paused frame in a travel video becomes a booking link for that cool-looking restaurant.

Building technology that adapts to human behavior, rather than forcing users to adapt to it, enables a future where collaborating with AI feels truly intuitive, fluid and seamless. We're excited that these human-first concepts are being woven into products we use every day.

Applying this work in our products

We are now integrating these principles to reimagine pointing in Chrome and our new Googlebook laptop experience. Starting today, instead of writing a complex prompt, you can now use your pointer to ask Gemini in Chrome about the part of the webpage you care about. For example, you can select a few products on a page and ask to compare, or point to where you want to visualize a new couch in your living room. Similarly, we'll soon roll out Magic Pointer in Googlebook, allowing users to harness Gemini at their fingertips for a more intuitive experience. Because there are so many other potentially great applications, we'll continue to test future concepts across our platforms, including Google Labs' Disco.

Try the AI-enabled pointer in Google AI Studio: Edit an image • Find places on the map

User interface (UI) • Gemini • Human-computer interaction • Mouse pointer