Hacker News • 89 days ago

Should You Still Use Plain Docker Compose in Production in 2026?

IMP

7/10

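Key Summary

Even in 2026, plain Docker Compose remains a solid choice for certain production environments. However, because it has no scheduler or state-recovery mechanism of its own, engineers must close the operational gaps themselves, such as cleaning up orphan containers and managing disk usage, before it can run reliably in production.

Translated Text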

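I am Philip, an engineer at Distr, which helps software and AI companies deploy their applications to self-managed environments. Our open source software distribution platform is available on GitHub ( github.com/distr-sh/distr ) and orchestrates Docker Compose and Docker Swarm deployments on customer hosts every day.

Most of the production incidents I have seen on Docker Compose hosts trace back to the same handful of recurring quirks: an old container that should have been removed, a disk that filled up overnight, a health check that detected a problem and then did nothing about it, a :latest tag that started pointing somewhere new, or a socket mount nobody thought twice about. None of these are bugs in Docker. They are deliberate design trade-offs in a tool that started as internal tooling at dotCloud, a PaaS company that wrapped LXC to solve the "it works on my machine" problem, and that now runs the back end of many real businesses. This post collects those recurring problems, along with the commands and the operational fix for each.

The short answer: yes, plain Docker Compose can still run real production workloads in 2026, but only if you close the operational gaps yourself.

Where Plain Docker Compose Fits in Production

Before the list of problems, a quick word on the audience. Docker Compose is a declarative way to wire up a multi-container application. A single YAML file describes the services, the networks between them, the volumes they share, the environment variables they need, and, through the patterns for overwriting or patching service configuration, the on-disk configuration each application expects. docker compose up reconciles the host's state to that file.

The use case where it fits best in production is the single-node deployment: a vendor pushing a multi-container application into a customer environment, an internal team running a small long-tail service that does not justify a Kubernetes cluster, or an edge box in a retail store. The footprint is small, the operational overhead is low, and a competent operator can understand the whole stack from a single docker-compose.yaml file.

Compose itself, however, has no control plane behind it. There is no scheduler watching the host, no reconciler reapplying state, and no operator pushing updates from somewhere else. docker compose up runs once and exits. That architectural simplicity is exactly why the problems above bite. Compose assumes that you, or whoever runs the host, will do the operational work nothing else is doing. If you ship Compose files to customers, the safer assumption is that the customer will not do that work.

The rest of this post is about closing the gap between what Compose does and what a production host actually needs, whether by hand or with an agent that does it for you. If you have concluded that the gap is too wide and want to compare against the next step up, read our Docker Compose vs Kubernetes breakdown.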

Docker Compose Orphan Containers and --remove-orphans

Remove a service from docker-compose.yaml, run docker compose up -d, and the container for the removed service keeps running. It is detached from the project but still bound to the same networks and ports. docker compose ps will not show it, because Compose only lists what is in the current file. docker ps --filter label=com.docker.compose.project=<name> will, because Docker still has the project label on the container.
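As a minimal sketch of that label-based lookup, the function below lists every container Docker still associates with a project, orphans included. The function name and the `myapp` project name are hypothetical; the `com.docker.compose.service` label is assumed to be present because Compose sets it on the containers it creates.

```shell
# List all containers (running or stopped) that carry the Compose project
# label, whether or not their service still exists in the Compose file.
# $1 is the Compose project name (hypothetical example: myapp).
list_project_containers() {
  docker ps --all \
    --filter "label=com.docker.compose.project=$1" \
    --format '{{.Names}}\t{{.Label "com.docker.compose.service"}}\t{{.Status}}'
}
```

Running `list_project_containers myapp` and comparing the service column against the services in the current file shows which containers are orphans.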

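This is how you discover, six months later, that an old worker service has been quietly consuming RAM since the last refactor. The fix is a single flag, --remove-orphans, which tells Compose to remove every container that was once part of this project but no longer appears in the file. Networks that Compose created for the project are reconciled the same way on each up, so orphan networks go away too.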
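A minimal sketch of a deploy step that always passes the flag is shown here. The wrapper name `deploy_stack` and the file argument are illustrative assumptions, not something the original post prescribes.

```shell
# Reconcile the host to the given Compose file and remove containers that
# belong to the project but whose services are no longer defined in it.
# $1 is the path to the Compose file.
deploy_stack() {
  docker compose -f "$1" up -d --remove-orphans
}
```

Calling `deploy_stack docker-compose.yaml` from the project directory is equivalent to typing the command by hand. The point of the wrapper is only that the flag can no longer be forgotten.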

Volumes are the exception. Compose preserves named volumes by default to protect data, and there is no per-service flag that drops the volumes a removed service used. To reclaim that space you have to do it manually: list the candidates with docker volume ls --filter dangling=true, check the names, and remove each volume yourself with docker volume rm.
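The manual volume cleanup above can be sketched as two small helpers. The function names are hypothetical. Note that dangling=true matches any volume no container currently references, so the list can include data you still want; that is why removal takes explicit names rather than consuming the list automatically.

```shell
# Print the names of volumes that no container references.
# These are deletion candidates only; review them before removing anything.
list_dangling_volumes() {
  docker volume ls --quiet --filter dangling=true
}

# Remove volumes by explicit name. Refuses to run without arguments so an
# empty variable cannot turn into an accidental no-op or a surprise.
remove_volumes() {
  [ "$#" -gt 0 ] || { echo "usage: remove_volumes NAME..." >&2; return 1; }
  docker volume rm "$@"
}
```

A typical session would run `list_dangling_volumes`, inspect the output, and then call `remove_volumes` with only the names that are safe to delete.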


Original Text (English)

I am Philip, an engineer working at Distr, which helps software and AI companies distribute their applications to self-managed environments. Our Open Source Software Distribution platform is available on GitHub ( github.com/distr-sh/distr ) and orchestrates both Docker Compose and Docker Swarm deployments on customer hosts every day.

Most of the production incidents I have seen on Docker Compose hosts come from the same handful of quirks: an old container that should have been removed, a disk that filled up overnight, a health check that detected a problem and then did nothing about it, a :latest tag that pointed somewhere new, or a socket mount nobody thought twice about. None of these are bugs in Docker. They are deliberate trade-offs in a tool that started as internal tooling at dotCloud, a PaaS company that wrapped LXC to fix “it works on my machine,” and is now running the back end of a lot of real businesses. This post collects the recurring ones, with the commands and the operational answer for each.

Short answer: yes, plain Docker Compose can still run real production workloads in 2026, but only if you handle the operational gaps it leaves yourself.

Where Plain Docker Compose Fits in Production

Before the list of quirks, a quick word on the audience. Docker Compose is a declarative way to wire up a multi-container application: one YAML file describes the services, the networks between them, the volumes they share, the environment they need, and (through the patterns for overwriting or patching service configuration) the on-disk configuration each application expects. docker compose up reconciles the host to that file.

The sweet spot in production is the single-node deployment built around exactly that: a vendor pushing a multi-container application into a customer environment, an internal team running a long-tail service that does not justify a Kubernetes cluster, an edge box in a retail location.
The footprint is small, the operational overhead is low, and a competent operator can reason about the whole stack from one docker-compose.yaml.

There is no control plane behind Compose itself: no scheduler watching the host, no reconciler reapplying state, no operator pushing updates from somewhere else. docker compose up runs once and exits. That architectural simplicity is exactly why the quirks bite. Compose assumes you, or whoever runs the host, will do the operational work nothing else is doing, and if you ship Compose files to customers the safe assumption is that the customer will not.

The rest of this post is about closing the gap between what Compose does and what a production host actually needs, either by hand or with an agent that does it for you. If you have already concluded that the gap is too wide and want to compare with the next step up, read our Docker Compose vs Kubernetes breakdown.

Docker Compose Orphan Containers and --remove-orphans

Remove a service from docker-compose.yaml, run docker compose up -d, and the container you removed keeps running. It is detached from the project but still bound to the same networks and ports. docker compose ps will not show it, because Compose only lists what is in the current file. docker ps --filter label=com.docker.compose.project=<name> will, because Docker still has the label on the container. This is how you discover, six months in, that an old worker service has been quietly consuming RAM since the last refactor.

The fix is one flag, --remove-orphans. The flag tells Compose: any container that was once part of this project but is no longer in the file should be removed. Networks Compose created for the project are reconciled the same way on each up, so orphan networks go away too. Volumes are the exception: Compose preserves named volumes by default to protect data, and there is no per-service flag to drop the ones a removed service used.
To reclaim that space you have to do it manually: list candidates with docker volume ls --filter dangling=true and docker volume rm by name, or use docker compose down -v if you intend to wipe the project’s volumes wholesale. To audit before deleting, list everything Docker still associates with the project name:

Distr’s Docker agent passes RemoveOrphans: true on every Compose Up call, so customer hosts never accumulate orphans across deployment updates. That single flag has eliminated a recurring class of “the old version is still answering on port 8080” support tickets.

Pruning Docker Images and Capping Container Logs

Every docker compose pull keeps the previous image on disk. Every container with the default json-file log driver writes unbounded JSON to /var/lib/docker/containers/<id>/<id>-json.log. On a busy host this is one of the most common reasons for an outage: the disk fills and Docker stops being able to write anything (logs, metadata, image layers), at which point containers start failing in confusing ways.

The first thing to learn is the audit command:

-v breaks the totals down per image, container, volume, and build cache, which is usually enough to spot the offender. From there, the targeted prune commands:

docker volume prune -f exists too, and it is genuinely useful, but read the next aside before you run it.

The other half of the disk story is logs. Cap them at the daemon level, once, in /etc/docker/daemon.json:

After systemctl restart docker, every new container will rotate its logs at 10 MB and keep at most three rotated files: a 30 MB ceiling per container, instead of “until the disk is gone.” Existing containers need to be recreated to pick up the new defaults.

This is one of the topics worth getting right before you ship. In Distr’s Docker agent the cleanup is built in: each deployment target has an opt-out container image cleanup setting that removes the previous version’s images automatically after a successful update, with retries on failure.
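The audit, prune, and log-cap steps in this section can be sketched as follows. The extracted text has lost the original command listings, so everything here is an assumption reconstructed from the prose: `docker system df -v` is the standard Docker disk-usage command whose -v output matches the breakdown described, `docker image prune -f` is one plausible "targeted prune", and the JSON uses the json-file driver's max-size and max-file options to express the 10 MB and three-file limits. The function names are hypothetical.

```shell
# Show disk usage broken down per image, container, volume, and build cache.
audit_disk() {
  docker system df -v
}

# Remove dangling (untagged) images without prompting. An image left behind
# by a pull that moved its tag becomes dangling and is removed by this.
prune_dangling_images() {
  docker image prune -f
}

# Write a daemon.json that caps json-file logs at 10 MB per file and keeps
# at most three files per container. $1 is the output path, so the file can
# be reviewed before it is installed as /etc/docker/daemon.json.
# This overwrites $1; merge by hand if a daemon.json with other settings
# already exists.
write_log_cap_config() {
  cat > "$1" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
}
```

Writing to a scratch path first, for example `write_log_cap_config ./daemon.json`, keeps the change reviewable. The new limits only apply after the file is installed, the daemon is restarted, and the containers are recreated.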
The image cleanup only fires on success, so the previous image stays on disk if something goes wrong and you need to roll back.

Docker Health Checks Don’t Restart Unhealthy Containers

This is the one that surprises people the most. You add a HEALTHCHECK to your Dockerfile or a healthcheck: block to the service in Compose, you watch the container go from healthy to unhealthy, and then… nothing happens. The Docker Engine reports the status. It does not act on it. restart: unless-stopped is triggered by the container exiting, not by it being marked unhealthy. You can confirm what Docker actually thinks:

You will see the status, the streak of failures, and the last few probe outputs: useful information that is silently ignored by the engine. There are three answers to this:

Run an autoheal sidecar. The community standard is willfarrell/docker-autoheal: a tiny container that mounts the Docker socket, watches for unhealthy events, and restarts the offending container. You opt containers in by labeling them autoheal=true (or set AUTOHEAL_CONTAINER_LABEL=all to monitor everything).

Run on Docker Swarm. Swarm restarts unhealthy tasks by default. If you are already considering Swarm, this is one of the better reasons.

Use Distr. Every Distr Docker agent deploys an adapted autoheal service alongside it. The “Enable autoheal for all containers” toggle is on by default at deployment-target creation, so customer-side restarts of unhealthy containers happen without anyone configuring it.

Whichever path you pick, the takeaway is the same: a HEALTHCHECK without something acting on it is a status light, not a self-healing system.

Pinning Docker Images by Digest Instead of :latest

Docker tags are mutable references. myapp:1.4 today is whatever the registry currently has under that tag; tomorrow it can point at a different layer set after a re-push.
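Because a tag can move, one way to see what it resolves to right now is to read the digest Docker recorded when the image was pulled. This is a sketch under assumptions: the function name is hypothetical, `nginx:1.27` is only an example reference, and the image must already have been pulled from a registry, since RepoDigests is empty for images that were only built locally.

```shell
# Print the digest-qualified reference (name@sha256:...) recorded for a
# locally present image. $1 is an image reference such as nginx:1.27.
pinned_reference() {
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}
```

The printed name@sha256 reference can be used as the image: value in a Compose file, where it always resolves to the same content regardless of later re-pushes to the tag.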
:latest is the worst offender because everyone treats it as a synonym for “stable” when in practice it often means “whatever was pushed most recently.” It is also the silent default: an unqualified image: nginx in a Compose file is treated as image: nginx:latest , so even Compose files that nev

Docker Infrastructure Operations Deployment Containers DevOps