Wired AI • 69 days ago

Giving an OpenClaw Agent a Robot Arm

IMP

8/10

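Key Summary

This is the result of an experiment that connected an open-source robot arm (LeRobot) to an AI agent (OpenClaw), which then recognized and grasped objects and even trained a model. Controlling and training robots used to demand deep expertise, but current coding agents handle setup, calibration, and script writing automatically, which lowers the barrier to entry considerably. Researchers regard the "code as policy" approach as a next-generation paradigm that could raise both the generality and the reliability of robotics.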

Translated Body

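I recently attached a real robot arm to my OpenClaw agent and experimented with it. The results were surprising enough to just about blow my own neural network. The AI agent configured the robot arm, recognized objects through the camera and slowly picked them up, and even trained another AI model to pick up and move specific objects. And people still say AGI (artificial general intelligence) is a few years away! (I'm joking; it probably is.) The results convinced me that we may be standing right at the edge of a robotics breakthrough. Training and controlling a robot used to take considerable skill. Today's AI models can make it almost easy.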

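"AI-powered coding is very exciting because it can bridge the gap between conventional engineering methods, which are reliable but do not generalize, and the latest vision-language-action models, which generalize well but are not yet reliable," said Ken Goldberg, a roboticist at UC Berkeley who is researching the approach.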

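I bought a prebuilt robot arm called the LeRobot 101. It is part of an open-source project from HuggingFace that makes it relatively cheap to get started and experiment with robotics. The LeRobot consists of two arms: a controller arm that a person operates with a handle and a trigger, and a follower arm with a camera attached that replicates those movements. By teleoperating the controller arm, you can have an AI model learn how to move the follower arm based on what the camera sees.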
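Learning from teleoperated demonstrations is often framed as behavior cloning: record what the camera saw together with what the operator did, then fit a policy that maps one to the other. The sketch below is a deliberately tiny illustration of that idea, not LeRobot's training code. It assumes a single scalar observation (a hypothetical horizontal offset of the target in the camera frame) and a single scalar action (a hypothetical joint command), and fits a straight line by least squares.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    observation: float  # hypothetical: target's horizontal offset in the camera frame
    action: float       # hypothetical: joint command the human operator gave


def fit_linear_policy(demos):
    """Least-squares fit of action = w * observation + b."""
    n = len(demos)
    mean_x = sum(d.observation for d in demos) / n
    mean_y = sum(d.action for d in demos) / n
    var_x = sum((d.observation - mean_x) ** 2 for d in demos)
    if var_x == 0:
        raise ValueError("need at least two distinct observations")
    cov_xy = sum((d.observation - mean_x) * (d.action - mean_y) for d in demos)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def mean_abs_error(policy, demos):
    """Average gap between the policy's action and the demonstrated action."""
    w, b = policy
    return sum(abs(w * d.observation + b - d.action) for d in demos) / len(demos)


# Made-up demonstrations for illustration only.
demos = [Demo(-1.0, -0.5), Demo(0.0, 0.0), Demo(1.0, 0.5), Demo(2.0, 1.0)]
policy = fit_linear_policy(demos)
error = mean_abs_error(policy, demos)
```

A real setup would replace the scalar observation with camera images and the line fit with a neural network, but the loop of collecting demonstrations, fitting, and checking the error stays the same.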

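Building With OpenClaw

Before using OpenClaw, I wasted several hours connecting and calibrating the robot, and at one point I applied the wrong settings, which overheated the motors and nearly broke them. Then, with help from OpenClaw and Codex, I was able to vibe code a simple program that closes the robot's gripper when it spots a red ball. In the terminal, Codex handled the tricky work of configuring the connection to the robot. Then, with my help, it calibrated the positions of the joints. It also wrote a Python script that used several libraries to recognize the ball and pick it up. Vibe coding is not perfect, of course, and hallucinations can introduce bugs, especially when working with different hardware, but the results were impressive. A neat result, though not exactly Terminator.

Next, I asked OpenClaw to help me train a model to control the arm. We experimented with a few different approaches, and OpenClaw was adept at guiding me through the process and checking the model's error rate after each training run.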
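The article does not show the script Codex wrote or name the libraries it used. As a rough sketch of the underlying idea only, the following uses Python's standard `colorsys` module to count red pixels in a frame and decide whether to close the gripper. The frame format (rows of RGB tuples), the thresholds, and the `"close"`/`"open"` command strings are all assumptions made for illustration.

```python
import colorsys


def is_red(pixel, min_saturation=0.5, min_value=0.3):
    """Return True if an (R, G, B) pixel with 0-255 channels looks red."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Red sits at both ends of the hue circle, so accept hues near 0 and near 1.
    near_red_hue = h < 0.05 or h > 0.95
    return near_red_hue and s >= min_saturation and v >= min_value


def red_fraction(frame):
    """Fraction of pixels in a frame (list of rows of RGB tuples) that are red."""
    pixels = [p for row in frame for p in row]
    if not pixels:
        return 0.0
    return sum(is_red(p) for p in pixels) / len(pixels)


def gripper_command(frame, threshold=0.2):
    """Pick a gripper action; the command strings are placeholders."""
    return "close" if red_fraction(frame) >= threshold else "open"


# Tiny synthetic frames standing in for camera images.
RED, GRAY = (220, 30, 30), (128, 128, 128)
frame_with_ball = [[GRAY, RED, RED], [GRAY, RED, RED], [GRAY, GRAY, GRAY]]
empty_frame = [[GRAY] * 3 for _ in range(3)]
```

On real hardware the frame would come from the follower arm's camera and the returned command would be passed to the motor driver; a vision library would typically replace the per-pixel loop for speed.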

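Code as Policy

The idea that AI-powered coding could offer a powerful new way to build robots was first highlighted in a 2022 research paper that named the approach "code as policy." Since then, AI's coding ability has advanced at a dizzying pace, and the code-as-policy method has gained support in many labs.

Goldberg's research group, together with researchers from Nvidia, Carnegie Mellon University, and Stanford, recently developed a new benchmark called CaP-X that measures the robotics capabilities of coding models. Interestingly, CaP-X shows that the best model for programming robots is not Claude or ChatGPT but Gemini, perhaps because Google DeepMind has focused on training its models to be multimodal and to understand the physical world. Along with the benchmark, the researchers built CaP-Gym, an environment that lets coding agents control both simulated and real robots. They also developed CaP-Agent0, an agentic framework that boosts the performance of coding models so much that, on some manipulation tasks, they beat models trained to control a robot's movements directly.

Goldberg's team is working with Nvidia to explore the potential of the code-as-policy approach. I spoke with Spencer Huang (none other than Jensen Huang's son), who has been involved in organizing hackathons inside the company so that people can try vibe coding robots themselves. Huang is currently working with Goldberg on a research project that should make the code-as-policy approach compatible with more robot software tools.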
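In a code-as-policy setup, the robot's behavior is a short program that a language model writes against a small set of perception and control primitives, rather than a network that outputs motor commands directly. The sketch below illustrates that structure under heavy assumptions: `FakeArm`, `detect`, and the hard-coded `GENERATED_POLICY` string are hypothetical stand-ins, not the API of CaP-X, CaP-Gym, CaP-Agent0, or the 2022 paper.

```python
class FakeArm:
    """Stand-in for a real robot driver; records commands instead of moving."""

    def __init__(self):
        self.log = []

    def move_to(self, x, y):
        self.log.append(("move_to", x, y))

    def close_gripper(self):
        self.log.append(("close_gripper",))

    def open_gripper(self):
        self.log.append(("open_gripper",))


def detect(name):
    """Hypothetical perception primitive returning an object's (x, y) position."""
    scene = {"red ball": (0.2, 0.1), "bowl": (0.5, 0.3)}
    return scene[name]


# A coding model would normally write this text from an instruction such as
# "put the red ball in the bowl". It is hard-coded here for illustration.
GENERATED_POLICY = """
ball = detect("red ball")
arm.move_to(*ball)
arm.close_gripper()
bowl = detect("bowl")
arm.move_to(*bowl)
arm.open_gripper()
"""


def run_policy(source, arm):
    """Run generated code that can see only the listed primitives.

    Emptying __builtins__ narrows what the code can reach, but it is not a
    real sandbox; untrusted code needs proper isolation.
    """
    namespace = {"__builtins__": {}, "arm": arm, "detect": detect}
    exec(source, namespace)
    return arm.log


arm = FakeArm()
log = run_policy(GENERATED_POLICY, arm)
```

Because the policy is ordinary code, it can be read, tested, and corrected like any other program, which is one way to understand the reliability argument Goldberg makes.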


Original Text (English)

I recently gave my OpenClaw a real robot arm to play with. The results just about blew my own neural network. The AI agent was able to configure the arm, use it to see and slowly grab things, and even train another AI model to pick up and place specific objects. And they say AGI is still a few years away! (I’m joking, it probably is). The results have me convinced that we may be on the brink of a robotics breakthrough. Training and controlling robots used to require considerable skill. Today’s AI models can make it almost easy.

“AI-powered coding is super exciting because it has the potential to bridge the gap between conventional engineering methods, which are reliable but don't generalize, and contemporary vision-language-action models, which generalize but are not yet reliable,” says Ken Goldberg, a roboticist at UC Berkeley who is exploring the approach.

I bought a prebuilt arm called a LeRobot 101. It’s part of an open-source project from HuggingFace that makes it relatively cheap to start building and experimenting with robotics. The LeRobot comes with two arms: a controller arm that a person operates using a handle and a trigger, and a follower arm with a camera that replicates those movements. You can train an AI model by teleoperating the controller arm and having the model learn how to move the follower in response to what it sees on the camera.

Building With OpenClaw

Before using OpenClaw, I spent several hours trying to connect and calibrate the robot, at one point nearly breaking the motors by applying the wrong settings, which caused them to overheat. Then, with help from OpenClaw and Codex, I was able to vibe code a simple program that closed the claw’s gripper when it spotted a red ball. In the terminal, Codex went through the tricky work of configuring the connections to the robot. Then, with my help, it calibrated the positions of its joints. It also wrote a Python script that used several libraries to identify and grip the ball in question. Vibe-coding isn't perfect of course, and hallucinations can introduce bugs especially when working with different hardware, but the results were impressive. A neat result, yes, but not exactly Terminator.

Next I tried having OpenClaw help me train a model to control the arm. We experimented with a few different approaches, and OpenClaw was adept at guiding me through the process and checking the error rate of the model after each training run.

Code as Policy

The idea that AI-powered coding could offer a powerful new way to build robots was first highlighted in a research paper from 2022 that dubbed the approach “code as policy.” Since then, AI’s coding skills have advanced at a dizzying pace, and the code-as-policy method has gained traction in many labs.

Goldberg’s research group, together with researchers from Nvidia, Carnegie Mellon University, and Stanford, recently developed a new benchmark called CaP-X to measure the robot capabilities of coding models. Interestingly, CaP-X shows that the best model for programming robots isn’t Claude or ChatGPT but Gemini, perhaps because Google DeepMind has focused on training its models to be multimodal and make sense of the physical world. Along with the benchmark, the researchers created CaP-Gym, an environment that lets coding agents control both simulated and real robots. They also developed CaP-Agent0, an agentic framework that boosts the performance of coding models so much that they beat models trained to control a robot’s movements directly on some manipulation tasks.

Goldberg’s team is working with Nvidia to explore the potential of the code-as-policy approach. I spoke to Spencer Huang (none other than Jensen Huang’s son), who has been involved in organizing hackathons inside the company to let people try their hand at vibe coding robots. Huang is currently working on a research project with Goldberg that should make the code-as-policy approach compatible with more robot software tools. “Nearly anyone can get into robotics, which is the true holy grail,” Huang tells me. Making it possible for people to control robots with spoken or typed commands, or by demonstrating an action, is the “critical unlock for robots in society,” he adds.

Robotics Agents Code Generation Open Source HuggingFace