404 Media • 74 days ago

One-Year Ban for Researchers Who Submit AI Slop Papers

IMP

8/10

Key Summary

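arXiv has announced a strict penalty: researchers caught submitting papers that contain AI slop, meaning unchecked generative AI output such as fabricated references, will be banned from arXiv for one year. With papers carrying AI-fabricated citations and false data rising sharply, the move is seen as a strong step to protect the credibility of academic peer review and publishing.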

Translated Text

arXiv to Ban Researchers for One Year for Submitting AI-Generated Slop

arXiv, the open-access repository of preprint academic research, said it will ban the authors of a paper for one year if they submit work that is obviously AI-generated.

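Late Thursday evening, Thomas Dietterich, chair of arXiv's computer science section, wrote on X (formerly Twitter): "If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s). We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper."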

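Examples of incontrovertible evidence, he wrote, include "hallucinated references, meta-comments from the LLM ('here is a 200 word summary; would you like me to make any changes?'; 'the data in this table is illustrative, fill it in with the real numbers from your experiments')."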

"The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue," Dietterich added.

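In an email interview on Friday morning, Dietterich said this is a "one-strike rule": authors caught even once including AI slop in a submission will be banned, although the decision can be appealed. "I want to emphasize that we only apply this to cases of incontrovertible evidence," he said. "I should also add that our internal process requires first a moderator to document the problem and then for the Section Chair to confirm before imposing the penalty."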

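In November 2025, arXiv announced it would no longer accept computer science review articles and position papers because it was being "flooded" with AI slop. In a press release about the change at the time, arXiv wrote: "Generative AI/large language models have added to this flood by making papers, especially papers not introducing new research results, fast and easy to write. While categories across arXiv have all seen a major increase in submissions, it's particularly pronounced in arXiv's CS category."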

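And in January of this year, in response to a rise in fraudulent submissions, it changed its policy so that first-time submitters need an endorsement from an established author before they can submit.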

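AI-generated, fabricated citations are already a serious problem in research. A recent study by Columbia University researchers examined 2.5 million biomedical papers across three years and found that one in 277 papers (about 0.36%) published in the first seven weeks of 2026 contained fabricated references. The rate was one in 2,828 in 2023 and one in 458 in 2025, so it is climbing steeply.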
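The "one in N" figures above are easier to compare as percentages and ratios. The following is a minimal sketch of that arithmetic only; the dictionary and function names are illustrative and do not come from the study, and the 2026 figure covers just the first seven weeks of the year.

```python
# "One in N papers contained fabricated references", as reported in the article.
# Names here are hypothetical; the 2026 figure covers only the first seven weeks.
ONE_IN_N = {2023: 2828, 2025: 458, 2026: 277}


def rate_percent(one_in_n: int) -> float:
    """Convert a 'one in N' figure to a percentage."""
    return 100 / one_in_n


def fold_increase(earlier_n: int, later_n: int) -> float:
    """How many times more frequent the later rate is than the earlier one."""
    return earlier_n / later_n


if __name__ == "__main__":
    for year, n in ONE_IN_N.items():
        print(f"{year}: 1 in {n} = {rate_percent(n):.3f}%")
    print(f"2023 -> 2026: {fold_increase(ONE_IN_N[2023], ONE_IN_N[2026]):.1f}x")
    print(f"2025 -> 2026: {fold_increase(ONE_IN_N[2025], ONE_IN_N[2026]):.2f}x")
```

On these numbers, the 2026 rate works out to about 0.36%, roughly ten times the 2023 rate.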

AI-generated citations and papers are already putting a heavy strain on the peer-review process, and more and more papers are being published with the LLM meta-comments and hallucinated data described above still intact.

arXiv, currently run by Cornell Tech, is set to become an independent nonprofit this July. Greg Morrisett, dean and vice provost of Cornell Tech, told Science.org that the change will help arXiv raise money from a wider range of donors, which he said is needed to deal with the surge of "AI slop."


Original Text (English)

ArXiv, the open-access repository of preprint academic research, will ban authors of papers for a year if they submit obviously AI-generated work.

Late Thursday evening, Thomas Dietterich, chair of the computer science section of ArXiv, wrote on X: “If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s). We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper.”

Examples of incontrovertible evidence, he wrote, include “hallucinated references, meta-comments from the LLM (‘here is a 200 word summary; would you like me to make any changes?’; ‘the data in this table is illustrative, fill it in with the real numbers from your experiments’).”

“The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue,” Dietterich wrote.

Dietterich told me in an email on Friday morning that this is a one-strike rule, meaning authors caught just once including AI slop in submissions will be banned, but that decisions will be open to appeal. “I want to emphasize that we only apply this to cases of incontrovertible evidence,” he said. “I should also add that our internal process requires first a moderator to document the problem and then for the Section Chair to confirm before imposing the penalty.”

In November 2025, arXiv announced it would no longer accept computer science review articles and position papers because it was being “flooded” with AI slop. “Generative AI/large language models have added to this flood by making papers, especially papers not introducing new research results, fast and easy to write. While categories across arXiv have all seen a major increase in submissions, it’s particularly pronounced in arXiv’s CS category,” arXiv wrote in a press release about the change at the time. And in January, it announced first-time submitters would need an endorsement from an established author due to a rise in fraudulent submissions.

AI-generated, fabricated citations are a huge problem in research. A recent study by Columbia University researchers examined 2.5 million biomedical papers across three years, and found that one in 277 papers published in the first seven weeks of 2026 contained fabricated references; in 2023, it was one in 2,828, and in 2025, one in 458.

AI-generated citations and papers are already straining the peer-review process, and more and more papers are making it through the pipeline with those meta-comments and hallucinated data intact.

ArXiv is managed by Cornell Tech, but this July, it will become an independent nonprofit corporation. Greg Morrisett, dean and vice provost of Cornell Tech, told Science.org that this change will help arXiv raise more money from a wider range of donors, which Morrisett said is needed to deal with the emergence of “AI slop.”

About the author: Sam Cole is writing from the far reaches of the internet, about sexuality, the adult industry, online culture, and AI. She's the author of How Sex Changed the Internet and the Internet Changed Sex.
