Wired AI • 68 days ago

I Cloned Myself as an AI Avatar With Google Gemini, and It Is Eerily Identical

IMP

7/10

Key Summary

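Using the new avatar feature in Google's Gemini app, the writer generated AI deepfake videos that look and sound like him. Setting up the digital clone took about five minutes of face scanning, and the eerily lifelike result left him both impressed and unsettled. The feature resembles OpenAI's now-defunct Sora app, but Google restricts adult users to making videos with their own avatar.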

Translated Text

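It is a beautiful, balmy afternoon at Dolores Park in San Francisco, and I am singing a birthday song to a prehistoric dinosaur. As I finish my serenade, a cupcake with a pink candle magically appears in my empty hand. When I blow out the flame, a calm, contented look washes over the CGI-like creature.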

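The man in this AI video looks and sounds just like me, but the clip was actually generated with avatars, one of the new features in Google's Gemini app. These digital recreations resemble the core feature of OpenAI's now-defunct Sora app: a digital clone of you that can be inserted into AI videos. Avatars are powered by Google's new Omni video model, and the feature is available only to paying subscribers.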

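I pay $20 a month for Google's AI Pro plan, and I quickly hit Gemini's usage limits, which reset every five hours. After asking a few questions and generating two 10-second clips featuring my avatar, I was told to come back later. My first two glimpses of what Omni could do with my likeness showed me singing to a dinosaur in San Francisco and surfing under the Golden Gate Bridge. I was impressed and creeped out at the same time. The content was cringeworthy, with jumbled moments and nonsensical outfits, but the man in the video was unmistakably me. I used my fingers to zoom in on the face and watched the mouth move closely. The teeth were slightly off, but otherwise it was Reece, right down to the chin fat.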

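Unlike OpenAI, which previously let users decide whether others could generate AI videos using their likeness, Google lets adult users make videos only with their own avatar. Setting up my avatar through the Gemini app took about five minutes. The process involved sitting in a well-lit room with my phone's camera pointed at my face and reading a string of two-digit numbers. Then I slowly looked to the right and turned my head to the left, and setup was done. My digital clone, Reece 2.0, was born and ready to star in my deepfakes. (Pay attention to what you are wearing during this process, since your outfit will likely show up in the AI generations. More on that later.)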

To really unpack my feelings here, let's break down the birthday clip frame by frame.

Full prompt: Generate a video of me singing the happy birthday song to an aging dinosaur at the top of the hill at Dolores Park.

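The first second opens with a millennial pause; apparently even AI Reece has ingrained habits. What stands out first is the photorealistic setting. Rather than dropping my avatar onto an oversized hill in some anonymous park, Google's AI video renders a background remarkably similar to the actual location. From the palm-lined sidewalks to the Salesforce Tower looming in the distance, it is immediately clear which park this is, even if the output is not perfect. It makes sense that a company known for mapping the planet could pull this off.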

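As AI me began to sing, in a less pitchy baritone than I can actually manage, the first few bars sounded natural. I bounced my hands up and down on the beat like a mini conductor. Then I stutter on the word "to," and Gemini cuts to a wider shot as the real chaos begins. A vanilla cupcake appears out of nowhere, and I exhale a cloud of smoke to blow out the celebration candle. (Honestly, how rude of AI Reece. It's not your special day.)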

The other AI clip I generated with the avatar feature also mixed chaotic moments with lifelike shots of me talking to the camera.

Full prompt: Generate a video of me surfing beneath the Golden Gate Bridge.

Instead of a wetsuit, I was dressed head to toe in denim. At least I wasn't wearing shoes on the surfboard. This AI generation included shots that looked as if they had been captured by a GoPro attached to the board.

As more people use generative AI, especially models without strict guardrails, these tools are increasingly being used to target women with nonconsensual deepfakes.

Original Text (English)

It’s a beautiful, balmy afternoon at Dolores Park in San Francisco, and I’m singing a birthday song to a prehistoric dinosaur. A cupcake with a pink candle magically appears in my empty hand as I finish my serenade. When I blow out the flame, a calm look of contentment washes over the CGI-esque creature.

While the man in this AI video looks and sounds just like me, the clip was actually generated using one of the new features available in Google’s Gemini app: avatars. These digital recreations are similar to the core features of OpenAI’s now-defunct Sora app. It’s a digital clone of you that can be inserted into AI videos. Avatars are powered by the company’s new Omni video model, and the feature is only available to subscribers.

I pay $20 a month for Google’s AI Pro plan and quickly maxed out Gemini’s usage limits, which reset every 5 hours. I simply asked a few questions and generated two 10-second clips featuring my avatar, before I was told to wait until later. My first two glimpses of what Omni can do with my likeness were of me singing to a dino in San Francisco and surfing under the Golden Gate Bridge. I was simultaneously impressed and freaked out. The content was cringeworthy, with some jumbled moments and nonsensical outfits, but that man in the video was me. I used my fingers to zoom in on its face and really watch the mouth move. The teeth were a bit off, but otherwise that’s Reece, right on down to the chin fat.

Unlike OpenAI, which previously let users decide whether they wanted others to generate AI videos using their likeness, Google only lets adult users make videos with their own avatar. It took me about five minutes to set up my avatar through the Gemini app. The process involved sitting in a well-lit room with my phone’s camera pointed at my face and reading a string of two-digit numbers. Then I slowly looked to the right and swivelled my head to the left, and it was all over. Reece 2.0 was born and ready to be my deepfake star. (Be mindful of what you’re wearing during this process, since your fit will likely show up in the AI generations, but more on that later.)

Let’s break down the birthday clip frame by frame to really unpack my feelings here.

Full prompt: Generate a video of me singing the happy birthday song to an aging dinosaur at the top of the hill at Dolores Park.

The first second starts with a millennial pause, because even AI Reece has some ingrained habits. What’s most striking initially is the photorealistic setting. Rather than placing my avatar on some oversized hill at a random park, the background of Google’s AI video is remarkably similar to the actual location. From the palm tree-lined sidewalks to the looming Salesforce in the distance, it’s immediately evident which park is depicted here, even though the output isn’t perfect. It makes sense that a company known for mapping the planet could pull this off.

As AI me started to sing, with a less pitchy baritone than I can actually pull off, the first few bars seemed natural. I bounced my hands up and down on the beat, like a mini conductor. Then, I stutter on the word “to,” and Gemini cuts to a wider-angle shot as the real chaos begins. A vanilla cupcake appears randomly, and I exhale a cloud of smoke to blow out the celebration candle. (Honestly, how rude of AI Reece. It’s not your special day.)

The other AI clip I generated using the avatar feature also blended chaotic moments with lifelike shots of me talking to the camera.

Full prompt: Generate a video of me surfing beneath the Golden Gate Bridge.

Instead of putting me in a wetsuit, I was wearing head-to-toe denim. No shoes on the surfboard, at least, I guess. This AI generation included shots that looked as if they were captured on a GoPro attached to the surfboard.

As more people use generative AI, especially models without strict guardrails, these tools are being used increasingly to target women with nonconsensual deepfakes. Google claims it has safety at the forefront as it rolls out this new feature. “We try to prevent harm,” says Nicole Brichtova, who leads the product team working on Omni at Google DeepMind. “And, we try to do it in a way where we’re not blocking benign things.”

Despite the stuttering and other errors in the clips of AI Reece, these hyperrealized versions of myself felt more real than when I listen back to a voicemail or rewatch a clip of a fun weekend out. The avatar didn’t necessarily look like a hotter version of myself, no, it was something eerier. My digital clone was seamless Reece. Always ready to be anywhere, to do anything, to be me.

Gemini Google AI avatar deepfake video generation