Hacker News, 39 days ago

Google unveils its 8th-generation TPUs: two chips for the agentic era

IMP
8/10
Key summary

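Google has announced its eighth-generation TPUs, split into two purpose-built chips: TPU 8t for large-scale AI model training and TPU 8i for high-speed inference. They are designed to meet the infrastructure demands of the agentic era, in which models carry out complex reasoning and multi-step workflows, and Google claims significant gains in power efficiency and performance over the previous generation. The chips are scheduled for general availability later this year and are expected to help AI practitioners scale large workloads.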

Translated text

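Today at Google Cloud Next, we are introducing the eighth generation of Google's custom Tensor Processing Unit (TPU), with two distinct architectures designed specifically for training and inference: TPU 8t and TPU 8i. The two chips are built to power our custom-designed supercomputers, driving everything from cutting-edge model training and agent development to large-scale inference workloads.

For years, TPUs have powered leading foundation models, including Gemini. These eighth-generation TPUs will deliver scale, efficiency, and capabilities across training, serving, and agentic workloads. In this era of agentic AI, models must reason through problems, execute multi-step workflows, and learn from their own actions in continuous loops. This places new demands on infrastructure, and TPU 8t and TPU 8i were designed in partnership with Google DeepMind to handle the most demanding AI workloads and to adapt to evolving model architectures at scale.

TPUs have set the standard for a number of ML supercomputing components, including custom numerics, liquid cooling, and custom interconnects, and our eighth-generation TPUs are the culmination of more than a decade of development. The key insight behind the original TPU design still holds today: by customizing and co-designing silicon with hardware, networking, and software, including model architecture and application requirements, we can deliver dramatically better power efficiency and absolute performance.

We are thrilled to see how a decade of innovation translates into real-world breakthroughs. Today, pioneering organizations such as Citadel Securities are pushing the boundaries of what is possible and choosing TPUs to power their cutting-edge AI workloads.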

Original text (English)
Our eighth generation TPUs: two chips for the agentic era

The culmination of a decade of development, TPU 8t and TPU 8i are custom-engineered to power the next generation of supercomputing with efficiency and scale.

Amin Vahdat, SVP and Chief Technologist, AI and Infrastructure

AI-generated summaries

Summaries were generated by Google AI. Generative AI is experimental.

General summary: Google is launching its eighth-generation Tensor Processing Units, featuring two specialized chips: the TPU 8t for massive model training and the TPU 8i for high-speed inference. These chips are purpose-built to handle the complex, iterative demands of AI agents while delivering significant gains in power efficiency and performance. You can request more information now to prepare for their general availability later this year.

Bullet points:
Google's new eighth generation TPUs, TPU 8t and 8i, power the next era of AI.
The TPU 8t is a training powerhouse built to speed up complex model development.
The TPU 8i specializes in low-latency inference to support fast, collaborative AI agents.
Both chips use custom hardware to deliver better performance and energy efficiency than before.
These new systems will be available later this year to help scale your AI workloads.

Basic explainer: Google just announced its eighth generation of custom AI chips, the TPU 8t and TPU 8i. These chips are built to handle the heavy lifting required for training massive AI models and running complex AI agents. By specializing each chip for either training or inference, Google makes AI faster and more energy-efficient. This new hardware helps developers build smarter tools that can reason and solve problems more effectively.

Today at Google Cloud Next, we are introducing the eighth generation of Google's custom Tensor Processing Unit (TPU), coming soon with two distinct, purpose-built architectures for training and inference: TPU 8t and TPU 8i. These two chips are designed to power our custom-built supercomputers, to drive everything from cutting-edge model training and agent development to massive inference workloads.

TPUs have been powering leading foundation models, including Gemini, for years. These 8th generation TPUs together will deliver scale, efficiency and capabilities across training, serving and agentic workloads. In this age of AI agents, models must reason through problems, execute multi-step workflows and learn from their own actions in continuous loops. This places a new set of demands on infrastructure, and TPU 8t and TPU 8i were designed in partnership with Google DeepMind to take on the most demanding AI workloads and adapt to evolving model architectures at scale.

TPUs set the standard for a number of ML supercomputing components, including custom numerics, liquid cooling, custom interconnects and more, and our eighth generation TPUs are the culmination of more than a decade of development. The key insight behind the original TPU design continues to hold today: by customizing and co-designing silicon with hardware, networking and software, including model architecture and application requirements, we can deliver dramatically more power efficiency and absolute performance.

We are thrilled to see how a decade of innovation translates into real-world breakthroughs.
Today, pioneering organizations like Citadel Securities are pushing the boundaries of what's possible, choosing TPUs to power their cutting-edge AI workloads.

Two chips to meet the moment

Hardware development cycles are much longer than software cycles. With each generation of TPUs, we need to consider what technologies and demands will exist by the time they are brought to market. Several years ago, we anticipated rising demand for inference from customers as frontier AI models are deployed in production and at scale. And with the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving.

TPU 8t, designed with larger compute throughput and more scale-up bandwidth, shines at massive, compute-intensive training workloads. TPU 8i is designed with more memory bandwidth to serve the most latency-sensitive inference workloads, which is critical because interactions between agents at scale magnify even small inefficiencies. Importantly, both chips can run various workloads, but specialization unlocks significant efficiencies and gains.

TPU 8t: The training powerhouse

TPU 8t is built to reduce the frontier model development cycle from months to weeks. By balancing the highest possible compute throughput, shared memory and interchip bandwidth with the best possible power efficiency and productive compute time, we have crafted a system that delivers nearly 3x the compute performance per pod over the previous generation, enabling faster innovation to ensure our customers continue to set the pace for the industry.

Massive scale: A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.
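
As a rough sanity check on the superpod figures, the sketch below divides the quoted totals by the chip count to get implied per-chip averages. The article does not state per-chip numbers, so these are derived values that rest on assumptions: "two petabytes" is read as a decimal 2 × 10^15 bytes, the quoted totals are treated as exact rather than rounded, and the numeric precision behind "121 ExaFlops" is unspecified. The constant and function names are illustrative only.

```python
# Totals quoted in the article for one TPU 8t superpod.
CHIPS_PER_SUPERPOD = 9_600
SHARED_HBM_BYTES = 2 * 10**15   # "two petabytes"; ASSUMED decimal petabytes
SUPERPOD_FLOPS = 121 * 10**18   # "121 ExaFlops"; numeric precision not stated


def per_chip(total: float, chips: int = CHIPS_PER_SUPERPOD) -> float:
    """Return the average share of a superpod-wide total per chip."""
    return total / chips


# Implied averages, not published specifications.
hbm_gb_per_chip = per_chip(SHARED_HBM_BYTES) / 10**9
pflops_per_chip = per_chip(SUPERPOD_FLOPS) / 10**15

print(f"Implied HBM per chip:     {hbm_gb_per_chip:.1f} GB")
print(f"Implied compute per chip: {pflops_per_chip:.1f} PFLOPS")
```

Under these assumptions the implied averages come to roughly 208 GB of HBM and 12.6 PFLOPS per chip; the true per-chip specifications may differ if the headline totals are rounded.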

Maximum utilization: By also integrating 10x faster storage access, combined with TPUDirect to pull data directly into the TPU, TPU 8t helps ensure maximum utilization of the end-to-end system.

Near-linear scaling: Our new Virgo Network, combined with JAX and our Pathways software, means TPU 8t can provide near-linear scaling for up to a million chips in a single logical cluster.

In addition to raw performance, TPU 8t is engineered to target over 97% “goodput” (a measure of useful, productive compute time) through a comprehensive set of Reliability, Availability and Serviceability (RAS) capabilities. These include real-time telemetry across tens of thousands of chips, automatic detection and rerouting around faulty ICI links without interrupting a job, and Optical Circuit Switching (OCS) that reconfigures hardware around failures with no human intervention. Every hardware failure, network stall or checkpoint restart is time the cluster is not training, and at frontier training scale, every percentage point can translate into days of active training time.

TPU 8i: The reasoning engine

In the agentic era, users expect to be able to ask questions, delegate tasks and get outcomes. TPU 8i is designed to handle the intricate, collaborative, iterative work of many specialized agents, often “swarming” together in complex flows to deliver solutions and insights for the most challenging tasks. We redesigned the stack to eliminate the “waiting room” effect through four key innovations:

Breaking the “memory wall”: To stop processors from sitting idle, TPU 8i pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM (3x more than the previous generation), keeping a model's active working set entirely on-chip.

Axion-powered efficiency: We doubled the physical CPU hosts per server, moving to our custom Axion Arm-based CPUs. By using a non-uniform memory architecture (NUMA) for isolation, we have optimized the full system for superior performance.

Scaling MoE models: For modern Mixture of Experts (MoE) models, we doubled the Interconnect (ICI) bandwidth to 19.2 Tb/s. Our new Boardfly architecture reduces the maximum network diameter by more than 50%, ensuring the system works as one cohesive, low-latency unit.

Eliminating lag: Our new on-chip Collectives Acce