Hacker News • 40 days ago
I prompted ChatGPT, Claude, Perplexity, and Gemini and watched my Nginx logs
Last month I wanted a straight answer to a question most AI visibility write-ups dodge. When someone asks ChatGPT, Claude, Perplexity, Gemini, or Google AI Mode about a site I own, does that AI product actually fetch the page, or does it answer from an index it built earlier?

The way to get a straight answer was the unfashionable one. Read the nginx access log.

This post walks through what the logs captured across the five AI products I tested, what they did not capture, and what that difference lets a product safely track. Every claim in the sections that follow is either something the server logged or a structural fact documented by the vendor.

Two signals, not one

A marketer saying “my site got traffic from AI” could mean two different things, and the logs prove they are different things.

Provider-side fetch. The AI provider itself hits my origin. The request usually arrives with a dedicated user-agent token, usually with no referrer, and usually inside a short burst while the model is deciding which page to cite.

Real clickthrough visit. A human reads the AI answer, clicks a citation link, and arrives as a normal browser. Chrome-shaped user-agent, normal cookies, the AI product as the referrer.

Collapsing these into one AI-traffic number papers over the most useful distinction in the data. One is the model reaching out to read you. The other is a human reading you because the model pointed. Different lever, different measurement, different copy.

How I instrumented the experiment

Nothing exotic. A custom nginx log format that captures the bits the default combined format compresses out, plus a tail -F next to a browser tab in each AI product.

    log_format ai_probe '$time_iso8601 $remote_addr "$request" $status '
                        '"$http_user_agent" "$http_referer"';

I prompted each AI product with questions engineered to likely produce a citation or require a page fetch against a domain I control.
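For readers who want to process such a log rather than just tail it, here is a minimal Python sketch that reads one line of the ai_probe format back into fields. The names `LINE_RE`, `Hit`, and `parse_line` are hypothetical and not from the original experiment. The sketch assumes nginx's default escaping, which writes a literal double quote inside a variable as `\x22`, so quoted fields never contain a bare `"`.

```python
import re
from typing import NamedTuple, Optional

# Mirrors the ai_probe format:
# $time_iso8601 $remote_addr "$request" $status "$http_user_agent" "$http_referer"
LINE_RE = re.compile(
    r'^(?P<time>\S+) (?P<addr>\S+) "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'"(?P<user_agent>[^"]*)" "(?P<referer>[^"]*)"$'
)


class Hit(NamedTuple):
    """One parsed access-log line (hypothetical helper type)."""
    time: str
    addr: str
    request: str
    status: int
    user_agent: str
    referer: Optional[str]  # None when nginx logged "-" (no referrer sent)


def parse_line(line: str) -> Optional[Hit]:
    """Return a Hit for a well-formed ai_probe line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    referer = m["referer"]
    return Hit(
        time=m["time"],
        addr=m["addr"],
        request=m["request"],
        status=int(m["status"]),
        user_agent=m["user_agent"],
        referer=None if referer == "-" else referer,
    )
```

Returning `None` for malformed lines, rather than raising, keeps a long-running tail loop alive when an unrelated log format shares the file.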
I reran the same prompts across sessions and IPs so a transient cache hit would not hide the retrieval path.

What ChatGPT did

Captured, reproducibly, across multiple runs:

- User-agent contained ChatGPT-User/1.0.
- No referrer on the captured requests.
- Multiple candidate pages fetched in tight bursts while ChatGPT was composing an answer.
- More than one source IP observed inside the same burst.

    2026-03-18T14:23:41+00:00 203.0.113.42 "GET /a HTTP/1.1" 200 "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot" "-"
    2026-03-18T14:23:41+00:00 203.0.113.58 "GET /b HTTP/1.1" 200 "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot" "-"
    2026-03-18T14:23:42+00:00 203.0.113.42 "GET /c HTTP/1.1" 200 "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot" "-"

This is enough to state the finding plainly. ChatGPT performs provider-side origin retrieval through ChatGPT-User. The burst pattern across multiple IPs matches OpenAI’s own description of the agent in its bots documentation.

What Claude did

Captured:

- User-agent contained Claude-User/1.0.
- No referrer on the captured requests.
- /robots.txt requested first.
- Redirects followed normally. A /plugins request turning into /plugins/ was handled as expected.

    2026-03-19T09:14:08+00:00 198.51.100.11 "GET /robots.txt HTTP/1.1" 200 "Mozilla/5.0 (compatible; Claude-User/1.0; +claudebot@anthropic.com)" "-"
    2026-03-19T09:14:09+00:00 198.51.100.11 "GET /plugins HTTP/1.1" 301 "Mozilla/5.0 (compatible; Claude-User/1.0; +claudebot@anthropic.com)" "-"
    2026-03-19T09:14:09+00:00 198.51.100.11 "GET /plugins/ HTTP/1.1" 200 "Mozilla/5.0 (compatible; Claude-User/1.0; +claudebot@anthropic.com)" "-"

Same kind of finding. Claude performs provider-side origin retrieval through Claude-User. The robots precheck matches Anthropic’s documented behavior in its crawler docs.
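The burst observation for ChatGPT-User (several pages within a second or two, more than one source IP) can be checked mechanically. The following Python sketch groups hits into bursts and reports the distinct IPs in each. The names `Burst` and `find_bursts` are hypothetical, and the two-second gap is an assumed threshold, not something the logs or the vendor define. The caller is expected to pass in only the hits for a single user-agent token.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Set, Tuple


class Burst(NamedTuple):
    """A run of requests with no gap longer than max_gap (hypothetical type)."""
    start: datetime
    end: datetime
    paths: List[str]
    ips: Set[str]


def find_bursts(
    hits: Iterable[Tuple[datetime, str, str]],
    max_gap: timedelta = timedelta(seconds=2),  # assumed threshold
) -> List[Burst]:
    """Group (time, ip, path) hits into bursts.

    A new burst starts whenever the gap since the previous hit exceeds max_gap.
    """
    bursts: List[Burst] = []
    for when, ip, path in sorted(hits):
        if bursts and when - bursts[-1].end <= max_gap:
            last = bursts[-1]
            bursts[-1] = Burst(last.start, when, last.paths + [path], last.ips | {ip})
        else:
            bursts.append(Burst(when, when, [path], {ip}))
    return bursts


if __name__ == "__main__":
    ts = datetime.fromisoformat
    # The three ChatGPT-User requests from the log excerpt above.
    sample = [
        (ts("2026-03-18T14:23:41+00:00"), "203.0.113.42", "/a"),
        (ts("2026-03-18T14:23:41+00:00"), "203.0.113.58", "/b"),
        (ts("2026-03-18T14:23:42+00:00"), "203.0.113.42", "/c"),
    ]
    for burst in find_bursts(sample):
        print(len(burst.paths), "requests from", len(burst.ips), "IPs")
```

A burst with more than one entry in `ips` is the multi-IP pattern described for ChatGPT; a single-IP burst that starts with /robots.txt looks more like the Claude capture.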
What Perplexity did

Captured:

- User-agent contained Perplexity-User/1.0.
- A direct fetch observed on a specific product page.
- No referrer on that request.

    2026-03-20T17:02:33+00:00 192.0.2.73 "GET /plugins/product-builder-for-woocommerce/ HTTP/1.1" 200 "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user" "-"

One thing I will not generalize. Perplexity fetched the origin in the runs I captured, but I only captured a few Perplexity runs, and Perplexity is architecturally capable of answering from its own index without hitting the origin. The safe wording is that Perplexity can perform direct origin retrieval; whether it always does is not something one log file can prove. See Perplexity’s bots documentation for their own framing.

What Google and Gemini did not prove

I captured real clickthrough visits from https://gemini.google.com/ and https://www.google.com/. Normal browser user-agents, tester IP, Google-product referrer. Those are real people arriving after reading an AI answer.

I did not capture a distinct provider-side fetch for Gemini or Google AI Mode. The structural reason matters more than the gap. Google does not publish a distinct retrieval user-agent for Gemini. Per Google’s own crawler documentation, AI Overviews and AI Mode answer from the same Search index that Googlebot populates. A Gemini-User token that would show up in an access log does not exist, because Google does not emit one.

Three practical consequences worth stating out loud.

- A Googlebot hit on your origin cannot be attributed to Gemini versus regular Search from the request alone.
- Blocking Google-Extended does not stop Googlebot. It only controls whether Googlebot-crawled content may be used for Gemini training and grounding.
- “Google did not fetch my page during my test” is not structurally observable the way it is for ChatGPT or Claude. Silence from Google is not evidence of no fetch.
This asymmetry is the single most misreported finding in AI crawler write-ups. It should not be.

What a product can safely track

Putting only the proven layers together, there are two tracking classes a product can offer without overclaiming.

Provider fetch

Vendor-documented retrieval user-agents hitting your origin.

- ChatGPT-User (OpenAI)
- Claude-User (Anthropic)
- Perplexity-User (Perplexity)
- Meta-ExternalFetcher (Meta; a documented retrieval bot I did not observe in this run, but it belongs in the same class)

Real visit

Normal browser user-agent with an AI product as the referrer.

- chatgpt.com
- claude.ai
- perplexity.ai
- gemini.google.com
- google.com as a broader Google-origin bucket, with no way to isolate AI Mode from classic Search using HTTP alone

Search-indexing bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot, Bingbot) should not be folded into the provider-fetch bucket. They are not a live retrieval signal for any specific user question. Mixing them in turns the metric into noise. Training bots (GPTBot, ClaudeBot, CCBot) are a separate signal again, and they have no business inside a retrieval count.

Why careful wording matters

Any product whose metric says “ChatGPT fetched your page” when the log actually shows PerplexityBot will be right about trends and wrong about individual rows. The first time a user looks at a row and knows it is wrong, the whole dashboard loses credibility. Careful wording is boring and it is the only wording that survives a smart customer checking one row.

Appendix: vendor-documented bot taxonomy

The experiment proved three retrieval bots in action: ChatGPT-User, Claude-User, Perplexity-User. The table below is the full vendor-documented set across the major labs, classified by purpose.

- retrieval: user-initiated fetch, typically when a human pastes a link into the AI product or an agent follows a link on the user’s behalf.
- search_indexing: crawls pages so the AI product can cite them from its index at answer time.
- training: collects pages to train future models; not a live retrieval signal.

Five points to carry out of this table. There is no de
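To make the separation between the tracking classes concrete, here is a minimal Python sketch that labels a request from its user-agent and referrer, using only the tokens and referrer hosts named in this post. The function `classify` and the label strings are hypothetical, not part of any vendor API. Matching is deliberately case-sensitive and checks retrieval tokens first, since the Claude-User string logged above also contains the lowercase address claudebot@anthropic.com. The sketch trusts the user-agent header as sent and makes no attempt to verify source IPs.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Token lists copied from the post; order matters (retrieval is checked first).
BOT_CLASSES = (
    ("provider_fetch", ("ChatGPT-User", "Claude-User", "Perplexity-User",
                        "Meta-ExternalFetcher")),
    ("search_indexing", ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
                         "Googlebot", "Bingbot")),
    ("training", ("GPTBot", "ClaudeBot", "CCBot")),
)

# Specific AI hosts first; google.com last as the broader Google-origin bucket.
REFERRER_HOSTS = ("chatgpt.com", "claude.ai", "perplexity.ai",
                  "gemini.google.com", "google.com")


def classify(user_agent: str, referer: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (label, matched token or referrer host) for one request."""
    for label, tokens in BOT_CLASSES:
        for token in tokens:
            if token in user_agent:  # case-sensitive on purpose
                return label, token
    # No bot token: treat as a browser and look at the referrer host.
    host = (urlsplit(referer).hostname or "") if referer else ""
    for known in REFERRER_HOSTS:
        if host == known or host.endswith("." + known):
            return "real_visit", known
    return "other", None
```

A visit labeled ("real_visit", "google.com") stays a broad bucket by design: as noted above, HTTP alone cannot separate AI Mode from classic Search.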