TechCrunch AI • 61 days ago

The AI compute wars: has the next Cerebras been found?

IMP

7/10

Key summary

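General Compute, a cloud startup specializing in AI inference, has raised a $15 million seed round on the strength of its plan to deploy SambaNova's new specialized chips. Because the chips are air-cooled and draw less power, the company can readily reuse existing data centers and bitcoin mining facilities. The deal is a notable sign that the AI ecosystem is shifting beyond model training toward "inference clouds," where speed and cost efficiency matter most.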

Translated text

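Explosive demand for computers to run AI models keeps accelerating, but anyone entering the business has to clear two major hurdles: securing the right chips, and installing them in data centers where they can generate revenue. General Compute, a new inference cloud company that rents out AI processing power specialized for inference (the phase in which a trained model actually runs and answers users' questions, rather than the phase in which it is trained), has answers to these questions that show where the AI ecosystem is heading. Those answers helped it raise a $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures.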
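As a quick check on the round's terms, the arithmetic can be sketched in Python. This is a simplified illustration, not a statement of the actual cap table: it assumes the full $15 million was new money priced at the $60 million post-money valuation, and it ignores option pools, convertible notes, and any other terms the article does not describe. The function name `seed_round_terms` is hypothetical.

```python
def seed_round_terms(investment: int, post_money: int) -> tuple[int, float]:
    """Return (pre-money valuation, new investors' combined stake).

    Assumes a plain priced round: pre-money = post-money - new investment,
    and the new investors' stake is investment / post-money.
    """
    if not 0 < investment <= post_money:
        raise ValueError("investment must be positive and no larger than post_money")
    pre_money = post_money - investment
    investor_stake = investment / post_money
    return pre_money, investor_stake


# Figures reported in the article
pre_money, stake = seed_round_terms(investment=15_000_000, post_money=60_000_000)
print(f"Implied pre-money valuation: ${pre_money:,}")
print(f"New investors' combined stake: {stake:.0%}")
```

Under those assumptions, the implied pre-money valuation is $45 million and the new investors together hold 25% of the company.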

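First, what is the right chip? Demand for GPUs has soared, but it is increasingly accepted across the industry that GPUs are not the best chips for running AI models once they have been trained. Inference, the phase in which a model actually generates responses, has different computational requirements from training, and a new class of chips designed specifically for it is emerging. Nvidia's $20 billion Groq transaction in December and Cerebras' $57 billion IPO last week point in that direction.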

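With capacity at both companies strained, General Compute's co-founders, CEO Finn Puklowski and CTO Jason Goodison, looked for another option. They turned to specialized chips made by SambaNova, an Intel-backed chipmaker focused on inference that has drifted somewhat out of the Silicon Valley conversation. That may change when SambaNova releases its new chips this year. The architecture is more flexible and uses more memory to store context during inference calculations. SambaNova claims its chips outperform not only GPUs but also other specialized chips from the likes of Groq and Cerebras. Puklowski says the new chips will generate 600 to 700 tokens per second, compared with about 250 tokens per second for GPUs.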
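To put the quoted throughput figures in perspective, here is a minimal Python sketch that converts tokens per second into generation time. The rates are the ones Puklowski cites; the 100,000-token workload and the function name `generation_seconds` are hypothetical, and the model considers only steady-state output rate, ignoring prompt processing, batching, and network latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time in seconds to generate num_tokens at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Rates quoted in the article (tokens per second)
GPU_TPS = 250
NEW_CHIP_TPS_LOW, NEW_CHIP_TPS_HIGH = 600, 700

# Hypothetical workload size, chosen only for illustration
WORKLOAD_TOKENS = 100_000

gpu_time = generation_seconds(WORKLOAD_TOKENS, GPU_TPS)
slow_case = generation_seconds(WORKLOAD_TOKENS, NEW_CHIP_TPS_LOW)
fast_case = generation_seconds(WORKLOAD_TOKENS, NEW_CHIP_TPS_HIGH)

# Speedup from raw token rate alone
speedup_low = NEW_CHIP_TPS_LOW / GPU_TPS
speedup_high = NEW_CHIP_TPS_HIGH / GPU_TPS

print(f"GPU: {gpu_time:.0f} s")
print(f"New chips: {fast_case:.0f} to {slow_case:.0f} s")
print(f"Speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

At these rates the hypothetical workload takes 400 seconds on GPUs and roughly 143 to 167 seconds on the new chips, a speedup of about 2.4x to 2.8x from raw token rate alone.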

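General Compute has $300 million worth of SambaNova's SN50 chips on order and says it will be the first cloud company to deploy them. The chips also help with General Compute's second big problem: where to put them. Because they are air-cooled rather than water-cooled and consume less power, they can be installed in existing data center facilities without new infrastructure investment. Puklowski is pursuing colocation deals, arrangements in which General Compute installs its hardware in someone else's facility. The prospective partners include not only data center providers but also crypto miners looking for new uses for their infrastructure now that bitcoin mining has become less profitable.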

General Compute launched its cloud service last week, claiming it is already the fastest at running MiniMax 2.7, a powerful open-source large language model (LLM).

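Venture investor Joe Hassleman got into the inference boom early by investing in Groq in 2021. This year he launched Evercrest Partners, a new fund focused on AI, and made General Compute its first investment. Hassleman sees parallels between SambaNova's partnership with General Compute and Coreweave's relationship with Nvidia, as well as the earlier pairing of Groq's chip-making with its cloud service. "They do need a healthy mix of customers that are going to put their chips in environments that are going to have high growth to them," Hassleman said. "As much as General Compute is making a bet on SambaNova, SambaNova is making a bet on General Compute."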

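The question is which computer architecture will capture the most value in AI's future. Inference clouds are implicit bets on a world in which many models and agents coexist, where no single provider dominates and the speed and cost of inference become the key competitive variables. Consider the $113 million Series B that OpenRouter raised this week, which reflects the company's ability to give customers access to multiple models so they can optimize their token spend. Speed matters in that calculation, both for price and for capability. Puklowski wants to turn hour-long workloads for coding agents into five- or ten-minute tasks, and to make audio agents for customer service, which need faster inference to converse effectively, more economical.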

Original text (English)

The raging demand for computers to run AI models has only accelerated, but there are two major obstacles that anyone in the business needs to overcome: getting the right chips, and getting them into data centers where they can start generating revenue. General Compute, a new inference neocloud (a company that rents out AI processing power, specializing in the phase when models are running and responding to users rather than being trained), has answers to those questions that illuminate where the AI ecosystem is headed. Those answers helped it raise a $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures.

First, what is the right chip? The demand for GPUs has gone through the roof, but it's becoming conventional wisdom that they aren't the best-suited chips for running AI models once they have been trained. The phase of AI where a model is actively generating responses has different computational requirements than training, and a new class of chips is being designed specifically for it. Nvidia's $20 billion Groq transaction in December and Cerebras' $57 billion IPO last week point the way.

With capacity strained at both those companies, the co-founders of General Compute, CEO Finn Puklowski and CTO Jason Goodison, found another option. They're turning to specialized chips built by SambaNova, an Intel-backed chipmaker focused on inference that has fallen a bit out of the Silicon Valley conversation. That may change when SambaNova releases its new chips this year. The architecture is more flexible and uses more memory to store context during inference calculations, and SambaNova claims that it outperforms not just GPUs but also other specialized chips built by the likes of Groq or Cerebras. Puklowski says the new chips will generate 600 to 700 tokens per second, versus about 250 tokens per second for GPUs.

General Compute has $300 million of the company's SN50 chips on order and says it will be the first neocloud deploying them. These chips also help solve the second big problem, where to put them, for General Compute: They are air-cooled, not water-cooled, and consume less power, so they can be installed in existing data center facilities without new infrastructure investments. Puklowski is pursuing colocation deals (arrangements where General Compute installs its hardware in someone else's facility) not just with data center providers, but also with crypto miners looking to repurpose their infrastructure as the cost of producing a bitcoin has often exceeded its price.

General Compute launched its cloud offering last week, claiming it is already the fastest at running MiniMax 2.7, a powerful open-source LLM.

Joe Hassleman is a venture investor who got in on the ground floor of the inference boom when he invested in Groq in 2021. This year, he launched a new fund, Evercrest Partners, focused on the AI space, and made General Compute his first investment. Hassleman sees in SambaNova's partnership with General Compute parallels to Coreweave's relationship with Nvidia, and to the pairing of Groq's chip-making with its former cloud offering. "They do need a healthy mix of customers that are going to put their chips in environments that are going to have high growth to them," Hassleman said. "As much as General Compute is making a bet on SambaNova, SambaNova is making a bet on General Compute."

The question is what kind of computer architecture will capture the most value in the AI future. Inference clouds are implicit bets on a world of multiple models and agents, one where no single provider dominates and speed and cost of inference become the key competitive variables. Consider the $113 million Series B raised for OpenRouter this week, reflecting the company's ability to offer customers access to multiple models in order to optimize their token spend.
Speed matters in that calculation, for price, and for capability. Puklowski wants to turn hour-long workloads for coding agents into five- or ten-minute tasks, and make audio agents for customer service, which require faster inference to converse effectively, more economical. "If you use ChatGPT and it gives you 50 tokens per second, that's still a heck of a lot faster than we can read," Puklowski told TechCrunch. "Now that things have moved to agent-to-agent, where agents are out there reading on our behalf or pinging databases, they need to go faster."

Tim Fernholz, Senior Reporter

AI infrastructure, Inference cloud, SambaNova, AI semiconductors, Startup investment