TechCrunch AI • 73 days ago

AI writing your paper? arXiv to impose one-year bans


Key Summary

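arXiv, the global preprint repository, has announced a tough new sanction: researchers who submit output generated by large language models (LLMs) without verifying it face a one-year ban from the site. The measure is meant to counter the loss of trust in academic research caused by AI-generated content, including problems such as fabricated citations. Crucially, it does not ban AI use outright. Instead, it requires researchers to verify their work and bear final responsibility for it.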

Translated Text

ArXiv, a widely used open repository for research preprints, is taking further steps to crack down on the careless use of large language models (LLMs) in scientific papers.

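Although papers are posted to the site before they are peer-reviewed, arXiv (pronounced "archive") has become one of the main channels through which research circulates in fields such as computer science and mathematics, and the site itself has become a source of data on trends in scientific research. ArXiv has already taken steps to combat a growing number of low-quality, AI-generated papers; for example, it requires first-time posters to obtain an endorsement from an established author. In addition, after more than 20 years of being hosted by Cornell, the organization is becoming an independent nonprofit, which should allow it to raise more money to tackle problems such as AI slop.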

In its latest move, Thomas Dietterich, chair of arXiv's computer science section, posted on Thursday: "If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper." Such incontrovertible evidence could include "hallucinated references" and comments to or from the LLM, Dietterich added.

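If such evidence is found, the paper's authors will face "a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed venue." Note that this is not an outright ban on using LLMs. Rather, as Dietterich put it, authors must take "full responsibility" for the content, "irrespective of how the contents are generated." So if researchers copy and paste "inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content" directly from an LLM, they are still responsible for it.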

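Dietterich told 404 Media that this will be a "one-strike" rule, but moderators must flag the issue and section chairs must confirm the evidence before the penalty is imposed, and authors will be able to appeal the decision. Recent peer-reviewed research has found that fabricated citations are on the rise in biomedical research, likely due to LLMs. To be fair, though, scientists aren't the only ones getting caught using citations made up by AI.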


Original Text (English)

ArXiv, a widely used open repository for preprint research, is doing more to crack down on the careless use of large language models in scientific papers.

Although papers are posted to the site before they are peer-reviewed, arXiv (pronounced “archive”) has become one of the main ways that research circulates in fields like computer science and math, and the site itself has become a source of data on trends in scientific research. ArXiv has already taken steps to combat a growing number of low-quality, AI-generated papers, for example by requiring first-time posters to get an endorsement from an established author. And after being hosted by Cornell for more than 20 years, the organization is becoming an independent nonprofit, which should allow it to raise more money to address issues like AI slop.

In its latest move, Thomas Dietterich — the chair of arXiv’s computer science section — posted Thursday that “if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.” That incontrovertible evidence could include things like “hallucinated references” and comments to or from the LLM, Dietterich said.

If such evidence is found, a paper’s authors will face “a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed venue.” Note that this isn’t an outright prohibition on using LLMs, but rather an insistence that, as Dietterich put it, authors take “full responsibility” for the content, “irrespective of how the contents are generated.” So if researchers copy-paste “inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content” directly from an LLM, then they’re still responsible for it.
Dietterich told 404 Media that this will be a “one-strike” rule, but moderators must flag the issue and section chairs must confirm the evidence before imposing the penalty. Authors will also be able to appeal the decision. Recent peer-reviewed research has found that fabricated citations are on the rise in biomedical research, likely due to LLMs — though to be fair, scientists aren’t the only ones getting caught using citations that were made up by AI.

Anthony Ha is TechCrunch's weekend editor. Previously, he worked as a tech reporter at Adweek, a senior editor at VentureBeat, a local government reporter at the Hollister Free Lance, and vice president of content at a VC firm. He lives in New York City. You can contact or verify outreach from Anthony by emailing anthony.ha@techcrunch.com.

arXiv, AI regulation, academic research, large language models