Hacker News • 62 days ago

PostHog to train its own AI models on customer data (opted in by default)

IMP

6/10

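Key summary

Product analytics platform PostHog has announced that it will train its own AI models on customer data. The aim is to support features such as PostHog Code, now in beta, and session replay analysis at scale. Users on the US cloud instance are opted in by default, while EU cloud users and customers with agreements that prevent training are opted out by default. Data is anonymized before training, and an organization admin can opt out at any time in the org settings, so teams concerned about privacy or data sovereignty should check their setting before training starts on June 29.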

Translated text

Training our own AI models
James Hawkins • May 27, 2026 • CEO diaries

Contents
What we want to build
How this will work
Why this is opt out, not opt in

I really think we are on the verge of some of our best work over the next six months. Over the past year, we have started building more AI-powered features into PostHog, such as our AI installation wizard, PostHog AI, and our MCP. They are all wildly popular, but they are only the start.

PostHog's next chapter is about building more proactive, self-driving products: products that surface answers and solutions for you, act on them, and improve over time. This is the vision for PostHog Code, which is now in beta.

To enable this, and to build more products like it, we want to try something new: training models on data in PostHog.

What we want to build

We have two goals here:

Make our existing products smarter, more proactive, and more useful to you
Build entirely new products, like PostHog Code, that help teams build better products, faster

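The first area we are interested in is session replay analysis. PostHog AI can already detect issues in replays, but it is expensive and does not scale well. We want replays to be as powerful at scale as they are for diagnosing the problems of individual users, and we think a model trained on the underlying data that powers replays will help us achieve this.

Another idea I am especially excited about is synthetic user testing, that is, using our knowledge of user behavior to identify where users might get confused, or which flows might break, before you ship to production. As coding models improve, many people are seeing their test and code review workload increase hugely. We want to automate this so you can focus on your product.

And if we get better at predicting user behavior, we should also be able to suggest changes that improve conversion and reduce user frustration for features you have already shipped. If we can automate this work for you, you will spend less time on manual analysis and burn fewer tokens in the process.

Our ideas here are still experimental. It will take iteration to figure out how to train models effectively and which data is actually useful. But so far, every time we have added AI in a way that makes the product simpler or more powerful, it has worked well, so we think it is worth trying.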

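How this will work

We have spent a lot of time thinking about this from a user perspective, especially the tradeoffs.

The upside is the kind of product improvement described above. Most tools focus on providing you with the best code; we want to focus our energy on making your product the best it can be. This is why we describe PostHog Code as a "product editor".

The downside is that this involves using data in PostHog to train models. Most companies would bury this change in a deceptively boring T&Cs update, but we value transparency, so here is what you need to know in an internet-friendly numbered list: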

Users on our EU cloud instance are opted out by default.
Users with agreements that prevent training (e.g. BAA, MSA, or similar) are also opted out.
All other users on our US cloud instance are opted in by default.
We will anonymize all data before it is used for training.
We will only use data that already exists in your PostHog instance.
We will do all the model training ourselves, which means:
- We won't sell or send your data to third-party model providers.
You can opt out at any time via your org settings in PostHog (admin access required).
Training won't start until June 29, so there is plenty of time to decide.
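
As an illustration only, the defaults listed above can be read as a small decision rule. The sketch below is not PostHog's code: the `Org` type, its field names, the region labels, and the function names are all hypothetical. The year of the start date is assumed from the post's 2026 byline. The rule that an agreement outranks a manual opt-in follows the original post's remark that opting in is possible only if legal agreements don't exclude it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: the post only says "June 29"; the year is inferred from its byline.
TRAINING_START = date(2026, 6, 29)


@dataclass
class Org:
    region: str                          # hypothetical labels: "eu" or "us"
    has_no_training_agreement: bool      # e.g. a BAA or MSA that prevents training
    admin_choice: Optional[bool] = None  # None means no admin has changed the setting


def default_opted_in(org: Org) -> bool:
    """Default state before any admin action, following the list above."""
    if org.region == "eu" or org.has_no_training_agreement:
        return False
    # Assumption: any region label other than "us" is treated as opted out.
    return org.region == "us"


def may_train_on(org: Org, today: date) -> bool:
    """Whether the org's anonymized data could be used for training on `today`."""
    if today < TRAINING_START:
        return False  # training does not start before June 29
    if org.has_no_training_agreement:
        return False  # an agreement that prevents training outranks a manual opt-in
    if org.admin_choice is not None:
        return org.admin_choice  # explicit admin setting overrides the default
    return default_opted_in(org)
```

For example, `may_train_on(Org("us", False), date(2026, 7, 1))` is true under these assumptions, and the same org with `admin_choice=False` is excluded.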

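In terms of comms, we are:

Emailing all our customers and making it very obvious what the email is about
Notifying all our users through in-app notifications (in case you don't read emails)
Communicating our plans very publicly (like in this post)

I want to stress that our goal here is to improve PostHog as a product for our customers, not to expose or sell models trained on your data, or to monetize your data.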


Original text (English)

Training our own AI models
James Hawkins • May 27, 2026 • CEO diaries

Contents
What we want to build
How this will work
Why this is opt out, not opt in

I really think we're on the verge of some of our best work through the next six months. Over the past year, we've started building more AI-powered features into PostHog, like our AI installation wizard, PostHog AI, and our MCP. They're all wildly popular, but they're only the start.

PostHog's next chapter is about building more proactive, self-driving products. Products that surface answers and solutions for you, act on them, and improve over time. This is the vision for PostHog Code, which is now in beta.

To enable this and more products like it, we want to try something new. We want to train models on data in PostHog.

What we want to build

We have two goals here:

Make our existing products smarter, more proactive, and useful to you
Build entirely new products, like PostHog Code, that help teams build better products, faster

The first area we're interested in is session replay analysis. PostHog AI can already detect issues in replays, but it's expensive and doesn't scale well. We want replays to be as powerful at scale as they are for diagnosing the problems of individual users, and we think a model trained on the underlying data that powers replays will help us achieve this.

Another idea I'm especially excited about is synthetic user testing – i.e. using our knowledge of user behavior to identify when users might get confused, or what flows might break, before you ship to production. As coding models improve, many people are seeing test and review workload increase hugely. We want to automate this, so you can focus on your product.

And, if we can get better at predicting user behavior, we should be able to suggest changes that will improve conversion, and reduce user frustration, for features you've already shipped as well.
If we can automate this work for you, you'll spend less time on manual analysis and burn fewer tokens in the process.

Our ideas here are experimental. It will take iteration to figure out how to train models effectively, and what data is actually useful. But, so far, every time we've added AI in a way that makes the product simpler or more powerful, it's worked well, so we think it's worth trying.

How this will work

We've spent a lot of time thinking about this from a user perspective, especially the tradeoffs.

The upside is the kinds of improvements described above. Most tools are focused on providing you with the best code; we want to focus our energy into making your product the best it can be. This is why we describe PostHog Code as a product editor.

The downside is that this involves using data in PostHog to train models. Most companies would bury this change in a deceptively boring T&Cs update, but we value transparency, so here's what you need to know in an internet-friendly numbered list:

Users on our EU cloud instance are opted out by default
So too users with agreements that prevent training (e.g. BAA, MSA, or similar)
All other users on our US cloud instance are opted in by default
We will anonymize all data before it's used for training
We will only use data that already exists in your PostHog instance
We will do all the model training ourselves, which means...
- We won't sell or send your data to third-party model providers
You can opt out at any time via your org settings in PostHog (admin access required)
Training won't start until June 29, so there's plenty of time to decide

In terms of comms, we are:

Emailing all our customers and making it super obvious what the email is about
Notifying all our users through in-app notifications (in case you don't read emails)
Communicating our plans very publicly (like in this post)

I want to stress that our goal here is to improve PostHog as a product for our customers, not to expose or sell models trained on your data, or monetize your data.

Why this is opt out, not opt in

Put simply, because otherwise we will not have enough data to train a model that's actually useful.

If you choose to opt out, the new features that we're building with these models won't be available to you, as they'll depend on this data.

If you're opted out by default (e.g. because you're on our EU cloud instance), you can choose to opt in manually provided any legal agreements you have with us don't exclude this option.

We're choosing to be upfront about this rather than quietly rolling something out, because we think that's the right way to do it.

If you want to talk about this, I'm james at you can guess it. We're also hiring AI researchers, so get in touch if you want to work on this with us.

PostHog is an all-in-one developer platform for building successful products. We provide product analytics, web analytics, session replay, error tracking, feature flags, experiments, surveys, AI Observability, logs, workflows, endpoints, data warehouse, CDP, and an AI product assistant to help debug your code, ship features faster, and keep all your usage and customer data in one stack.

Data privacy • AI model training • User consent • PostHog • Product analytics