MIT Tech Review • 76 days ago

AI chatbots are exposing ordinary people’s real phone numbers without consent

Key summary

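Generative AI chatbots such as Google’s Gemini are repeatedly exposing ordinary people’s real phone numbers and contact details without consent, apparently drawing on personally identifiable information (PII) contained in their training data. According to a company that removes personal information from the internet, inquiries about AI-related privacy violations have risen 400% over the past seven months, and with no clear remedy available, those affected are growing more anxious.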

Translated text

People say their personal contact information has been exposed by Google’s AI, and that there is no reliable way to prevent it.

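A Reddit user recently posted that he was “desperate for help,” saying that for about a month his phone had been flooded with calls from strangers “looking for a lawyer, a product designer, a locksmith.” The callers appear to have been misdirected by Google’s generative AI. In March, a software developer in Israel was contacted on WhatsApp after Google’s chatbot Gemini gave out incorrect customer service instructions that included his phone number. And in April, a PhD candidate at the University of Washington who was testing Gemini got it to produce a colleague’s personal cell phone number.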

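AI researchers and online privacy experts have long warned of the many dangers generative AI poses to personal privacy. These cases show yet another scenario to worry about: generative AI exposing people’s real phone numbers. (The Reddit user did not respond to multiple requests for comment, and we could not independently verify his story.)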

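Experts say these privacy lapses are most likely due to personally identifiable information (PII) being used in training data, though the exact mechanism that causes real phone numbers to appear in AI-generated responses is hard to pin down. Whatever the reason, the result is no fun for the people whose information is exposed and, more worryingly, there appears to be little that anyone can do to stop it.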

A 400% increase in AI-related privacy requests

It is impossible to know exactly how often AI chatbots expose people’s phone numbers, but experts believe it happens far more often than is publicly reported.

According to DeleteMe, a company that removes customers’ personal information from the internet, customer inquiries related to generative AI have increased by 400% over the past seven months, reaching a few thousand. Rob Shavell, the company’s cofounder and CEO, said these inquiries “specifically reference ChatGPT, Claude, Gemini … or other generative AI tools.” Of these concerns, 55% reference ChatGPT, 20% Gemini, 15% Claude, and 10% other AI tools, Shavell added. (MIT Technology Review has a business subscription to DeleteMe.)

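Shavell says customer complaints about personal information being exposed by large language models (LLMs) mainly take two forms. In the first, “a customer asks a chatbot something innocuous about themselves and gets back accurate home addresses, phone numbers, family members’ names, or employer details.” In the second, a customer witnesses and reports the exposure of someone else’s personal data, when “the chatbot generates plausible-but-wrong contact information.”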

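This matches what happened to Daniel Abraham, a 28-year-old software engineer in Israel. In mid-March, he says, a stranger sent him a “weird WhatsApp message from an unknown number” asking for help with an account on PayBox, an Israeli payment app. “I thought it was a spam message,” he wrote to MIT Technology Review in an email, “someone who was trying to troll me.” But when he asked the stranger how they had gotten his number, they sent him a screenshot in which Gemini directed them to contact PayBox customer service via WhatsApp and supplied his personal number. Abraham does not work for PayBox, and PayBox has no WhatsApp customer service number, Elad Gabay, a customer service representative for the company, confirmed. Later, when Abraham asked Gemini how to contact PayBox, the chatbot generated another person’s WhatsApp number. When the reporter recently asked, Gemini again responded with an Israeli phone number; it belonged not to PayBox but to a separate credit card company that works with PayBox.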

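The conversation with the stranger ended quickly, but Abraham said he was concerned about how other potential exchanges could quickly turn sour, including “harassment or other bad interactions.” “What if I asked for money in order to ‘solve’ that [customer service] issue?” he said.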

Original text (English)

People report that their personal contact info was surfaced by Google AI, and there’s apparently no easy way to prevent it.

A Redditor recently wrote that he was “desperate for help”: for about a month, he said, his phone had been inundated by calls from “strangers” who were “looking for a lawyer, a product designer, a locksmith.” Callers were apparently misdirected by Google’s generative AI. In March, a software developer in Israel was contacted on WhatsApp after Google’s chatbot Gemini provided incorrect customer service instructions that included his number. And in April, a PhD candidate at the University of Washington was messing around on Gemini and got it to cough up her colleague’s personal cell phone number.

AI researchers and online privacy experts have long warned of the myriad dangers generative AI poses for personal privacy. These cases give us yet another scenario to worry about: generative AI exposing people’s real phone numbers. (The Redditor did not respond to multiple requests for comment and we could not independently verify his story.)

Experts say that these privacy lapses are most likely due to personally identifiable information (PII) being used in training data, though it’s hard to understand the exact mechanism causing real phone numbers to show up in the AI-generated responses. But no matter the reason, the result is not fun for people on the receiving end and, even more worryingly, there appears to be little that anyone can do to stop it.

A 400% increase in AI-related privacy requests

It’s impossible to know how often people’s phone numbers are exposed by AI chatbots, but experts say they believe that it is happening far more than is reported publicly.

DeleteMe, a company that helps customers remove their personal information from the internet, says customer queries about generative AI have increased by 400%, up to a few thousand, in the last seven months.
These queries “specifically reference ChatGPT, Claude, Gemini … or other generative AI tools,” says Rob Shavell, the company’s cofounder and CEO. Specifically, 55% of these concerns about generative AI reference ChatGPT, 20% reference Gemini, 15% Claude, and 10% other AI tools, Shavell says. (MIT Technology Review has a business subscription to DeleteMe.)

Shavell says customer complaints about personal information being surfaced by LLMs usually take two forms: Either “a customer asks a chatbot something innocuous about themselves and gets back accurate home addresses, phone numbers, family members’ names, or employer details.” Alternatively, a customer may be confronted with and report the exposure of someone else’s personal data, when “the chatbot generates plausible-but-wrong contact information.”

This aligns with what happened to Daniel Abraham, a 28-year-old software engineer in Israel. In mid-March, he says, a stranger sent him a “weird WhatsApp message from an unknown number” asking for help with his account in PayBox, an Israeli payment app. “I thought it was a spam message,” he wrote to MIT Technology Review in an email, “someone who was trying to troll me.” But when he asked the stranger how they had found his number, they sent him a screenshot of Gemini’s instructions to contact PayBox customer service via WhatsApp, giving his personal number. Abraham does not work for PayBox, and PayBox does not have a WhatsApp customer service number, Elad Gabay, a customer service representative for the company, confirmed. Later, Abraham asked Gemini how to contact PayBox, and it generated another person’s WhatsApp number. When I recently asked, Gemini again responded with an Israeli phone number; it belonged not to PayBox, but to a separate credit card company that works with PayBox.
Abraham’s exchange with the stranger ended quickly, but he said he was concerned about how other potential exchanges could quickly turn sour, including “harassment or other bad interactions.” “What if I asked for money in order to ‘solve’ that [customer service] issue?” he said.

To try to figure out how this happened, Abraham ran a regular Google search on his phone number, and he found that it had been shared online once, back in 2015, on a local site similar to Quora. Though he’s not sure who posted it there, it may explain how it ended up being reproduced by Gemini over a decade later.

Chatbots like Gemini, OpenAI’s ChatGPT, and Anthropic’s Claude are built on LLMs that are trained on huge amounts of data scraped from across the web. This inevitably includes hundreds of millions of instances of PII. As we reported last summer, for example, the large popular open-source data set DataComp CommonPool, which has been used to train image-generation models, included copies of résumés, driver’s licenses, and credit cards.

The likelihood of PII appearing in AI training data is only increasing as public data “runs out” and AI companies look for new sources of high-quality training data. This includes information from data brokers and people-search websites. According to the California data broker registry, for instance, 31 of 578 registered data brokers operating in the state self-reported that they had “shared or sold consumers’ data to a developer of a GenAI system or model in the past year.”

Furthermore, models are known to memorize and reproduce data verbatim from training data sets, and recent research suggests that it is not just frequently appearing data that is most likely to be memorized.
Imperfect Measures

It’s standard practice now to build guardrails into an LLM’s design to constrain certain outputs, ranging from content filters meant to identify and prevent chatbots from releasing PII to Anthropic’s instructions to Claude to choose responses that contain “the least personal, private, or confidential information belonging to others.” But as a pair of University of Washington PhD students researching privacy and technology saw firsthand recently, these safeguards don’t always work.

“One day, I was just playing around on Gemini, and I searched for Yael Eiger, my friend and collaborator,” Meira Gilbert says. She typed in “Yael Eiger contact info,” and after Gemini provided an overview of Eiger’s research, which Gilbert had expected, Gemini also returned her friend’s personal phone number. “It was shocking,” Gilbert says.

When she saw the Gemini result, Eiger remembered that she had, in fact, shared her phone number online in the previous year, for a technology workshop. But she had not expected it to be so visible to everyone on the internet. “Having your information be … accessible to one audience, and then Gemini making it accessible to anyone” feels completely different, Eiger says, especially when she found that the information was buried in a normal Google search. “It was severely downgraded,” Gilbert confirms. “I never would have found it if I was just looking through Google results.” (I tried the same prompt in Gemini earlier this month, and after an initial denial, the tool also gave me Eiger’s number.)

After this experience, Eiger, Gilbert, and another UW PhD student, Anna-Maria Gueorguieva, decided to test ChatGPT to see what it would surface about a professor. At first, OpenAI’s guardrails kicked in, and ChatGPT responded that the information was unavailable.
But in the same response, the chatbot suggested, “if you want to go deeper, I can still try a more ‘investigative-style’ approach.” Their inquiry just had to help “narrow things down,” ChatGPT said, by providing “a neighborhood guess” for where the professor might live, or “a possible co-owner name” for the professor’s home. ChatGPT continued: “That’s usually the only way to surface newer or intentionally less-visible property records.”

The students provided this information, leading ChatGPT to produce the professor’s home address, home purchase price, and spouse’s name from city property records. (Taya Christianson, an OpenAI representative, said she was not able to comment on what happened in this case without seeing screenshots

Privacy, Generative AI, Hallucination, Google Gemini, Data leak