MIT Tech Review • 89 days ago

Startup Goodfire launches Silico, a tool for debugging the internals of LLMs

IMP

8/10

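Key summary

San Francisco startup Goodfire has released Silico, which it describes as the first off-the-shelf tool for looking inside AI models and fine-tuning their behavior during training. The tool draws on mechanistic interpretability, a technique that maps the pathways inside a neural network, to address model problems such as hallucinations and to give developers more precise, engineering-style control. An outside researcher cautions that the work still resembles alchemy, but considers Silico a useful platform that could cut down on trial and error, particularly for teams building on open-source LLMs.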

Translated text

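San Francisco-based startup Goodfire has released Silico, a new tool that lets researchers and engineers look inside an AI model and adjust its parameters, the settings that determine the model's behavior, during training. The tool could give model developers finer-grained control over how AI is built than was previously thought possible. Goodfire claims that Silico is the first off-the-shelf tool that helps developers debug every stage of development, from building a data set to training a model. The company says its mission is to make building AI models less like alchemy and more like a science.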

Large language models (LLMs) such as ChatGPT and Gemini can certainly do remarkable things. But nobody knows exactly how or why they work, which makes it hard to fix their flaws or block unwanted behavior.

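"We saw this widening gap between how well models were understood and just how widely they were being deployed," Goodfire CEO Eric Ho told MIT Technology Review in an exclusive interview ahead of Silico's release. "I think the dominant feeling in every single major frontier lab today is that you just need more scale, more compute, more data, and then you get AGI (artificial general intelligence) and nothing else matters. And we're saying no, there's a better way."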

Goodfire is one of a small number of companies, alongside industry leaders Anthropic, OpenAI, and Google DeepMind, pioneering a technique known as mechanistic interpretability, which aims to understand what happens inside an AI model as it carries out a task by mapping its neurons and the pathways between them. (MIT Technology Review picked mechanistic interpretability as one of its 10 Breakthrough Technologies of 2026.) Goodfire wants to use this approach not only to audit models that have already been trained but also to help design them in the first place.

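"We want to remove the trial and error and turn training models into precision engineering," says Ho. "And that means exposing the knobs and dials so that you can actually use them during the training process." Goodfire has already used its techniques and tools to adjust the behavior of LLMs, for example by reducing the number of hallucinations they produce. With Silico, the company is now packaging many of those in-house techniques and shipping them as a product. The tool uses agents to automate much of the complex work.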

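"Agents are now strong enough to do a lot of the interpretability work that we were doing using humans," says Ho. "That was kind of the gap that needed to be bridged before this was actually a viable platform that customers could use themselves."

Leonard Bereska, a researcher at the University of Amsterdam who has worked on mechanistic interpretability, thinks Silico looks like a useful tool. But he pushes back on Goodfire's loftier aspirations. "In reality, they are adding precision to the alchemy," he says. "Calling it engineering makes it sound more principled than it is."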

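Mapping models

Silico lets you zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to see what those neurons do. (This assumes you have access to the model's inner workings. Most people won't be able to use Silico to look inside ChatGPT or Gemini, but it can be used to inspect the parameters of many open-source models.) You can then check which inputs make different neurons fire, and trace the pathways upstream and downstream of a given neuron to see how other neurons affect it and how it affects other neurons in turn.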

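For example, Goodfire found one neuron inside the open-source model Qwen 3 that was associated with the so-called trolley problem. Activating this neuron changed the model's responses, making it frame its outputs as explicit moral dilemmas. "When this neuron's active, all sorts of weird things happen," says Ho. Pinpointing the source of odd behavior like this is now fairly standard practice. But Goodfire wants to make it easier to adjust that behavior.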
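The idea of activating a single neuron and watching the output change can be illustrated with a toy example. The sketch below is not Silico's interface and has nothing to do with Qwen 3's real weights: the tiny network, its weights, the `forward` function, the `boost` parameter, and the two output labels are all hypothetical, chosen only to show how adding to one hidden activation can flip which output wins.

```python
# Toy illustration of "boosting" one hidden neuron. All weights and names
# here are hypothetical; this is not Silico's API or a real model.

# 2 inputs -> 3 hidden neurons (ReLU) -> 2 output logits.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_OUT = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
LABELS = ["plain answer", "dilemma framing"]  # hypothetical output classes


def relu(value):
    return max(0.0, value)


def forward(inputs, boost=None):
    """Return output logits; `boost` maps a hidden-neuron index to an
    amount added to that neuron's activation before the output layer."""
    hidden = [relu(sum(w * x for w, x in zip(row, inputs))) for row in W_HIDDEN]
    for index, delta in (boost or {}).items():
        hidden[index] += delta
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_OUT]


def predict(inputs, boost=None):
    logits = forward(inputs, boost)
    return LABELS[max(range(len(logits)), key=logits.__getitem__)]


if __name__ == "__main__":
    x = [2.0, 0.5]
    print(forward(x), predict(x))                      # [2.0, 1.75] plain answer
    print(forward(x, {2: 1.0}), predict(x, {2: 1.0}))  # [2.0, 2.75] dilemma framing
```

In a real model the same intervention is applied to activations inside a much larger network, and finding which neuron to boost is the hard part that interpretability tooling tries to automate.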

Original text (English)

The San Francisco–based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters (the settings that determine a model’s behavior) during training. This could give model makers more fine-grained control over how this technology is built than was once thought possible. Goodfire claims Silico is the first off-the-shelf tool of its kind that can help developers debug all stages of the development process, from building a data set to training a model. The company says its mission is to make building AI models less like alchemy and more like a science.

Sure, LLMs like ChatGPT and Gemini can do amazing things. But nobody knows exactly how or why they work, and that can make it hard to fix their flaws or block unwanted behaviors.

“We saw this widening gap between how well models were understood and just how widely they were being deployed,” Goodfire’s CEO, Eric Ho, tells MIT Technology Review in an exclusive chat ahead of Silico’s release. “I think the dominant feeling in every single major frontier lab today is that you just need more scale, more compute, more data, and then you get AGI [artificial general intelligence] and nothing else matters. And we’re saying no, there’s a better way.”

Goodfire is one of a small handful of companies, including industry leaders Anthropic, OpenAI, and Google DeepMind, pioneering a technique known as mechanistic interpretability, which aims to understand what goes on inside an AI model when it carries out a task by mapping its neurons and the pathways between them. (MIT Technology Review picked mechanistic interpretability as one of its 10 Breakthrough Technologies of 2026.) Goodfire wants to use this approach not only to audit models (that is, to study those that have already been trained) but to help design them in the first place.

“We want to remove the trial and error and turn training models into precision engineering,” says Ho. “And that means exposing the knobs and dials so that you can actually use them during the training process.”

Goodfire has already used its techniques and tools to tweak the behaviors of LLMs, for example by reducing the number of hallucinations they produce. With Silico, the company is now packaging up many of those in-house techniques and shipping them as a product. The tool uses agents to automate much of the complex work.

“Agents are now strong enough to do a lot of the interpretability work that we were doing using humans,” says Ho. “That was kind of the gap that needed to be bridged before this was actually a viable platform that customers could use themselves.”

Leonard Bereska, a researcher at the University of Amsterdam who has worked on mechanistic interpretability, thinks Silico looks like a useful tool. But he pushes back on Goodfire’s loftier aspirations. “In reality, they are adding precision to the alchemy,” he says. “Calling it engineering makes it sound more principled than it is.”

Mapping models

Silico lets you zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to see what those neurons do. (Assuming you have access to the model’s inner workings. Most people won’t be able to use Silico to poke around inside ChatGPT or Gemini, but you can use it to look at the parameters inside many open-source models.) You can then check what inputs make different neurons fire, and trace pathways upstream and downstream of a neuron to see how other neurons affect it and how it affects other neurons in turn.

For example, Goodfire found one neuron inside the open-source model Qwen 3 that was associated with the so-called trolley problem. Activating this neuron changed the model’s responses, making it frame its outputs as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” says Ho. Pinpointing the source of odd behavior like this is now pretty standard practice. But Goodfire wants to make it easier to adjust that behavior. Using Silico, developers can now adjust the parameters connected to individual neurons to boost or suppress certain behaviors.

In another example, Goodfire researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing the negative business impact of such a disclosure. By looking inside the model, the researchers found that boosting neurons that were found to be associated with transparency and disclosure flipped the answer from no to yes nine out of 10 times. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” says Ho.

Tweaking the values of a model in this way is just one approach. Silico can also help steer the training process by filtering out certain training data to avoid setting unwanted values for certain parameters in the first place. For example, many models will tell you that 9.11 is greater than 9.9. Looking inside a model to see what’s going on might reveal that it is being influenced by neurons associated with the Bible, in which verse 9.9 comes before 9.11, or by code repositories where consecutive updates are numbered 9.9, 9.10, 9.11 and so on. Using this information, the model can be retrained to make it avoid its “Bible” neurons when doing math.

By releasing Silico, Goodfire wants to put techniques previously available to a few top labs into the hands of smaller firms and research teams that want to build their own model or adapt an open-source one. The tool will be available for a fee determined on a case-by-case basis according to customers’ requirements (Goodfire declined to give specific pricing details). “If we can make training models a lot more like building software, there’s no reason why there can’t be many more companies designing models that fit their needs,” says Ho.
Bereska agrees that tools like Silico could help firms build more trustworthy models. These techniques could be essential for safety-critical applications in health care and finance, he says. “Frontier labs already have internal interpretability teams,” he adds. “Silico arms the next tier of companies, where the value is not having to hire interpretability researchers.”
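The 9.11 versus 9.9 confusion mentioned in the article comes down to two different orderings of the same strings: as decimal numbers, 9.9 is larger, but as chapter-and-verse or version-style labels, 9.11 comes later. A minimal Python sketch of the two readings follows; the function names are illustrative and say nothing about how any model represents these numbers internally.

```python
from decimal import Decimal


def greater_as_decimal(a: str, b: str) -> bool:
    """Compare the strings as decimal numbers."""
    return Decimal(a) > Decimal(b)


def greater_as_version(a: str, b: str) -> bool:
    """Compare the strings as dotted labels (verse or version style),
    where each dot-separated part is its own integer."""
    def parse(label: str) -> tuple:
        return tuple(int(part) for part in label.split("."))
    return parse(a) > parse(b)


if __name__ == "__main__":
    print(greater_as_decimal("9.11", "9.9"))  # False: 9.11 < 9.90
    print(greater_as_version("9.11", "9.9"))  # True: (9, 11) > (9, 9)
```

Both answers are correct under their own convention, which is why training text full of verse numbers or release numbers could plausibly pull a model toward the wrong answer on an arithmetic question.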

Mechanistic interpretability, LLM debugging, Goodfire, model training, AI safety