Hacker News • 70 days ago

CopyFail Vulnerability: Privilege Escalation from Pod to Host

IMP: 9/10

Key Summary

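CopyFail (CVE-2026-31431), a new Linux kernel vulnerability, lets an attacker cross the container boundary and take root on the host system. Without injecting any code, it manipulates the kernel's page cache to enable cross-container poisoning and container escape. Because the files on disk are never modified, detection is very difficult, and the flaw poses a critical threat to Kubernetes environments, so practitioners need to understand it and respond immediately.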


Original Text

Vulnerability Research / AI for Security / Open Source Projects

Copy Fail: From Pod to Host

A walkthrough of Copy Fail (CVE-2026-31431) as a container escape primitive: from a 4-byte page cache write to host root on Kubernetes.

Juno Im, May 19, 2026

Contents
Why the Page Cache Crosses Container Boundaries
Scenario 1: Cross-Container Poisoning
1-1: Compromised pod sharing a base layer
1-2: Pod creation rights
Scenario 2: Container Escape
Detection and Mitigation
Community PoCs

Two weeks ago, we disclosed Copy Fail, a new and exceptionally dangerous Linux local-privilege escalation vulnerability. Copy Fail exploits a kernel memory corruption flaw without injecting code into a running kernel, which makes it small and unusually portable. Copy Fail gives attackers a repeatable, controlled 4-byte write into the Linux page cache backing any readable file; in other words, it allows attackers to rewrite the cached contents of files on a Linux filesystem.

To help operators determine their susceptibility to Copy Fail, we published a proof-of-concept exploit and a model attack path. Our model attack targets the su binary present on most Linux systems. Because su is setuid root, an attacker who can rewrite it and then execute it can escalate to root. Instead of having it ask for and check a root password, the rewritten su skips the paperwork and drops the caller straight into a root shell.

Our proof-of-concept led some to believe that rewriting setuid binaries like su was the extent of the attack. Not so! The capability that Copy Fail and related page cache writing exploits extend to attackers is powerful and versatile. As an example, let's walk through how to use it to break out of a namespaced container.

To understand this new exploit pattern, you have to understand a little bit about what's happening under the hood in Copy Fail. Copy Fail works by confusing the kernel code that handles IPSec ESP Extended Sequence Numbers (authencesn).
This code is exposed to unprivileged users via AF_ALG sockets, which are userland's interface to Linux's kernel cryptography subsystem.

Specifically, Copy Fail sets the authencesn code up to think it's looking at disposable scratch memory when it's really handling a mutable reference to the page cache. It tells the kernel's cryptography code to decrypt a ciphertext blob, using bytes supplied by a zero-length copy from a pipe using splice(2). Because the wire format for IPSec ESNs isn't the implicit format the crypto code operates on, the authencesn code shuffles sequence numbers around. But the code isn't handling a disposable buffer from a packet; Copy Fail has tricked it into operating on a reference to a cached file.

Cross-container kernel attacks usually corrupt kernel memory: race windows, UAFs, version-bound payloads. These primitives are powerful, as they can allow code execution at the kernel level. But they're fragile. Copy Fail is deterministic. It's a more reliable primitive for cross-pod compromise or runtime poisoning, without relying on kernel code execution.

There are two primary attack scenarios:

Scenario 1: cross-container poisoning. From a compromised pod, or from a freshly launched attacker pod (only pod-creation rights are required), potentially backdoor co-located pods that access the same vulnerable lower-layer file through the same underlying address_space. Image references can differ; only a layer hash needs to match. The compromise lives only in the kernel page cache, so on-disk bytes are unchanged and it is invisible to agent-less disk scanners.

Scenario 2: container escape. From inside an unprivileged container, or from a compromised DaemonSet with host-filesystem mounts, get a root shell on the host.

Why the Page Cache Crosses Container Boundaries

The page cache is shared across containers.
No matter what namespace you're in, every struct file the kernel handles carries an f_mapping pointer, which usually comes from the underlying inode's i_mapping. That means that any two file descriptors sharing an f_mapping share the same cache data.

The kernel's representation of contiguous pages of memory is called a "folio". For ordinary buffered I/O on regular files, a write through one fd updates the cached folios. Subsequent reads, on every related fd, see the updated data (subject to normal concurrency and ordering rules).

Copy Fail mutates the same folios via the AF_ALG/splice() path described in Part 1, bypassing the regular write accounting. The visibility property is unchanged: any fd whose f_mapping points at the affected address_space reads the modified bytes on its next page cache hit.

All of this is independent of containers. Container isolation lives in mount, network, PID, user, and IPC namespaces. None of them creates a per-container address_space or page cache. Containers share cached folios when their file accesses reach the same underlying address_space.

A Kubernetes container's root filesystem is commonly an overlayfs mount stitched together from a writable upper layer (usually per-container scratch) and one or more read-only lower layers (image layers). Container runtimes (containerd, CRI-O, others) deduplicate layers by content hash: if two containers on the same node use the same unpacked layer/snapshot, the corresponding lower-layer files can be backed by the same host inode/address_space, regardless of what the images are named. This reuse lowers the storage requirements for images by sharing common layers.

python:3.12-slim and xint-flask-app:v1 (built FROM python:3.12-slim) share the Python layer. Both share debian:bookworm-slim underneath. A redis:7-bookworm pod on the same node shares the Debian layer with both.
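The visibility property described earlier in this section can be observed with ordinary, legitimate buffered I/O. The following is a minimal sketch, not part of the original write-up: it opens one temporary file twice, writes four bytes through one descriptor, and reads them back through the other without any fsync or reopen. The function name shared_cache_demo and the payload bytes are illustrative assumptions. It shows only the normal write path; it does not exercise Copy Fail or the AF_ALG/splice() path.

```python
import os
import tempfile


def shared_cache_demo() -> tuple[bytes, bytes, bool]:
    """Write through one fd and read through a second fd on the same file.

    Hypothetical helper for illustration only. It uses the ordinary
    buffered write path, not the vulnerability described in the article.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"original")
        path = tmp.name
    try:
        # Two independent opens: two file descriptions, one underlying inode.
        writer = os.open(path, os.O_WRONLY)
        reader = os.open(path, os.O_RDONLY)
        try:
            before = os.pread(reader, 8, 0)
            # Ordinary buffered 4-byte write at offset 0.
            os.pwrite(writer, b"POIS", 0)
            # No fsync and no reopen: the reader already sees the new bytes.
            after = os.pread(reader, 8, 0)
            w_stat, r_stat = os.fstat(writer), os.fstat(reader)
            same_inode = (w_stat.st_dev, w_stat.st_ino) == (r_stat.st_dev, r_stat.st_ino)
        finally:
            os.close(writer)
            os.close(reader)
    finally:
        os.unlink(path)
    return before, after, same_inode


if __name__ == "__main__":
    before, after, same_inode = shared_cache_demo()
    print("before:", before)
    print("after: ", after)
    print("same inode:", same_inode)
```

Both descriptors resolve to the same (st_dev, st_ino) pair, which is the user-visible hint that they are backed by the same inode. Namespaces play no part in this lookup, which is why the same sharing can apply to lower-layer files reached from different containers.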
In normal operation on an overlayfs mount, opening a lower-layer file for write access or truncation triggers overlayfs copy-up before writes proceed, allocating a new inode in the pod's upper layer so the change is private. By storing only this small set of differences, containers can reuse their lower layers efficiently while still allowing a writable copy to be presented to applications.

However, Copy Fail skips the standard write path entirely. The folios it mutates belong to the lower-layer address_space itself, shared host-wide, rather than the upper layers that were meant to store write deltas.

The pods' overlayfs mounts each present what looks like a private /usr/local/lib/python3.12/site-packages/foo.py (or /lib/x86_64-linux-gnu/libc.so.6), but overlayfs delegates file I/O to the real lower backing file. If those backing files are the same lower inode/address_space, the cached folios are shared: Copy Fail's 4-byte write goes into that one underlying entry. Anything that subsequently reads the same lower-layer file through the same underlying address_space can read the poisoned bytes, until the page is evicted or the layer is dropped.

The on-disk inode is unchanged, so image-registry scanners, file-integrity monitors examining the disk hash, and offline, snapshot, or block-level scanners that bypass the affected running kernel's page cache all see the original content.

Scenario 1: Cross-Container Poisoning

Threat model. Unprivileged attacker, no privileged capabilities, no node access, no admission rights to mutate other workloads. Two ways to start: code execution in a pod the attacker already controls (1-1), or just pod-creation rights (1-2).

Target. Pick a file in a layer widely shared on the node: a Python site-packages/ module if the node hosts Python-derived workloads, a shared object such as glibc for broader reach (subject to executable mapping, patch alignment, and crash-safety constraints), or anything inside a Debian/Ubuntu/Alpine base layer.
We will use a Python source file for this demo. Pick a module imported during interpreter startup or during a common framework's init, so target pods load it early.

The write. Python files are a good target for a demo because they are easier to read and are more portable than shellcode. Any changes to Python files

Security Vulnerability, Container Escape, Kubernetes, Linux Kernel, Zero-Day