Wired AI • 32 days ago
When Robots Have Their ChatGPT Moment
IMP 8/10
Key Summary
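Eka, a startup spun out of MIT, has unveiled a robot arm with natural, humanlike dexterity. The robot performs delicate tasks such as screwing in a light bulb and picking up keys, and the company aims to solve physical dexterity, one of the last hard problems in robotics, through reinforcement learning and simulation.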
View original (English)
A robot’s claw hurtles toward a light bulb on a table. I wince, waiting for the crunch. But suddenly the claw decelerates. It starts gingerly pawing around the table, as if searching for its glasses on the nightstand. It gently positions the bulb between its two pincers. The bulb rolls away. The claw goes chasing it across the table. After a few nips, the bulb is back in its grasp. The robot swiftly screws the bulb into a nearby socket, illuminating its work area.
In more than a decade of writing about robots, I have never seen one move so naturally. Most are ham-fisted klutzes, even when remotely controlled by a person. Of the few dozen robot arms on the market today, not one can screw in a light bulb.
I have come to visit Eka, a startup located in Kendall Square, Cambridge, Massachusetts, a short walk from MIT and a slightly longer bike ride from my home. The company’s office is a few floors above one of my favorite restaurants, called Shy Bird, a place I often come to work with my own pincers—typing out stories for WIRED. Eka’s office is small, and it’s packed with different robot arms, assorted grippers and hands, and tables covered with odd knickknacks of different shapes, sizes, and textures—gloves, small boxes of earplugs, hairbrushes, key rings, and so on.
I try putting a few things beneath the robot. First the earplugs box, then a hairbrush, and finally—in an attempt to trip it up—my own jumble of keys, which have a plush key ring. Each time, the robot swoops down and nips gently at the item a few times before grasping and lifting it up. When I try to take my keys back from Eka’s machine, the robot resists for just a moment, then lets go and instantly turns its attention back to the table, hunting for something else to pick up. Its dedication to picking is impressive. It is also kind of freaky.
Watching Eka’s robot in action reminds me of the first time I tried talking to ChatGPT. The robots are so fluid, so natural-seeming, that I can’t help but feel there’s something genuinely intelligent, if not quite human, behind them.
In a conference room not far from the robots, Eka’s cofounders, Pulkit Agrawal, a professor at MIT, and Tuomas Haarnoja, an ex-Google DeepMind robotics researcher, lay out their vision for the curious new machine. “A couple of years ago, we realized that dexterity can finally be cracked,” Agrawal says.
Eka’s robot demos suggest that the company’s approach should enable real robot dexterity with further training. If that’s true, it could revolutionize how robots are used—not only in factories and warehouses but also in shops, restaurants, even households. “Trillions of dollars flow through the human hand,” Agrawal says. “To me, this is the biggest problem in the world to be solved.” The two men believe they are halfway there. Solving dexterity, they say, is now just a question of scaling up the approach.
The fastest humans can solve a Rubik’s Cube in about three seconds. In those same three seconds, a computer with a virtual Rubik’s Cube could solve thousands of variations of the puzzle. As the Austrian computer scientist Hans Moravec famously noted in the late 1980s, the tasks that often seem hardest to us humans are child’s play for a machine; the things a child does without thinking are often a struggle for machines. Moravec suggested that the ability to interact with the physical realm evolved so long ago that for us it’s innate, more so than “higher-level” reasoning. The question has been: Can we impart that embodied intelligence to machines?
Back in October 2018, about four years before launching ChatGPT, OpenAI created Dactyl, a robotic hand that later used AI to solve a Rubik’s Cube.
The company took an off-the-shelf hand from Shadow Robot and created a detailed simulation of its joints, servos, motors, and more—a virtual hand holding a virtual cube. Using reinforcement learning, which combines experimentation with positive and negative feedback, OpenAI trained an artificial neural network to manipulate the digital cube over and over. After many thousands of repetitions of wiggling its virtual fingers, Dactyl had figured out how to move the facets of the real thing.
In a press release, OpenAI suggested that Dactyl had achieved “close to human-level dexterity.” In fact, the robot lacked elements of physical intelligence that we take for granted. If the cube began to slip from its grasp, it couldn’t recover. If its hands weren’t placed at a precise angle, it couldn’t manipulate the cube at all. Even under perfect conditions, the only object it could handle was a Rubik’s Cube. And that Rubik’s Cube wasn’t even a standard one—it had sensors that tracked the movement of the squares to feed back to Dactyl.
A few years later, OpenAI gave up on its robotics work to focus on large language models and chatbots. (The company has since restarted work on robotics.) Agrawal, who has remained in touch with a couple members of the Dactyl team, says the project’s simulation approach was considered a dead end because of the so-called sim-to-real gap. But both he and Haarnoja, working at separate labs, remained convinced that they could close that gap by making the sim closer to the real.
At Google DeepMind, Haarnoja was on a project that used virtual reinforcement learning to train small humanoid robots to play soccer. (If this sounds more complicated than training a robotic hand to screw in a light bulb, consider that the soccer field doesn’t roll around beneath the players’ feet.) At MIT, Agrawal was researching how to train a robotic hand to grasp objects from above, not just hold them in its palm.
Where Dactyl had simply moved its unfeeling pincers until the sensors in the Rubik’s Cube showed its squares shifting to the desired state, Agrawal’s system would need to know what its fingers were doing and how the cube was reacting at any given moment—while accounting for the pull of gravity. When he told someone who used to work on Dactyl about the project, he says, “I got a one-hour lecture from them saying, ‘This will never work.’” Agrawal persevered.
“Pulkit is a very creative thinker,” says Ken Goldberg, a professor at UC Berkeley who has known Agrawal since his student days and is currently an adviser to his company. “He’s always pushing in a direction that other people aren’t.” (I first met him in 2017 at a big AI conference in Long Beach, California. Then a graduate student, he had just published a paper outlining a new way for computers to learn to play video games.)
By late 2021, Agrawal had created a virtual hand capable of manipulating 2,000 objects upside down. Yet simulation was continuing to lose favor among roboticists, and ChatGPT fever was taking hold. If vast amounts of human-written text could yield a remarkably general linguistic intelligence, then perhaps showing robots enough examples of humans using their hands could give them physical intelligence, too.
A handful of well-funded startups are pursuing this vision, training what are called vision-language-action (VLA) models. To build one, you show the model videos of, say, humans folding T-shirts, or humans controlling T-shirt-folding robots. The hope is that with enough data, new robotic skills will emerge. Plenty of video is already available online, but a small industry has now emerged to generate more of this data. Companies pay people to spend hours doing routine tasks with their hands while wearing cameras and motion-capture gloves.
Agrawal and Haarnoja, who originally met as graduate students at UC Berkeley, teamed up to pursue a different approach with Eka.
Rather than having humans provide training data, the company wants robots to learn how to do things for themselves. They spend thousands of computer hours practicing movements inside simulated worlds and inventing their own solutions. In this sense, Eka’s bot is more like AlphaZero, the Google DeepMind p