MarkTechPost, 23 days ago

CloakBrowser Automation: A Hands-On Guide

Key Summary

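This tutorial builds browser automation with CloakBrowser in a stealth Chromium environment designed to evade detection. It resolves the errors that occur in environments that already run an asyncio event loop, such as Google Colab, by moving the browser work to a separate thread, and it walks through core hands-on steps such as saving session state and inspecting browser-visible signals. It is a useful reference for web scraping and automation practitioners who care about account protection and reliable task execution.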

Translated Text

Editors Pick | Agentic AI | AI Agents | Tutorials

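In this tutorial, we explore CloakBrowser, a Python-friendly browser automation tool that uses Playwright-style APIs within a stealth Chromium environment. We begin by setting up CloakBrowser and preparing the required browser binary, then resolve the asyncio loop conflict that commonly occurs in Colab by running the synchronous browser workflow in a separate worker thread.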
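The loop conflict can be reproduced with the standard library alone. The sketch below is illustrative and not part of the original tutorial: `loop_is_running_here`, `run_in_worker_thread`, and `simulate_notebook_cell` are hypothetical names, and no browser is launched. It only shows that code submitted to a worker thread does not see the event loop running in the calling thread, which is the property the thread workaround relies on.

```python
import asyncio
import concurrent.futures


def loop_is_running_here() -> bool:
    """Return True if an asyncio event loop is running in the current thread."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


def run_in_worker_thread(fn, *args, **kwargs):
    """Run a blocking callable in a single worker thread and wait for its result."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(fn, *args, **kwargs).result()


async def simulate_notebook_cell():
    # Stand-in for a Colab/Jupyter cell, which executes inside a running loop.
    in_cell = loop_is_running_here()
    # The same check, executed in a worker thread that has no loop of its own.
    in_worker = run_in_worker_thread(loop_is_running_here)
    return in_cell, in_worker


in_cell, in_worker = asyncio.run(simulate_notebook_cell())
print(in_cell, in_worker)
```

This is written to run as a plain script. In a notebook, where a loop is already running, `await simulate_notebook_cell()` would take the place of the `asyncio.run(...)` call.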

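We then move through practical automation steps, including launching a browser, creating customized browser contexts, inspecting browser-visible signals, interacting with a local test page, saving session state, restoring localStorage, using persistent browser profiles, capturing screenshots, and extracting rendered page content for parsing.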
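Saving session state in Playwright-style tools typically produces a JSON file with `cookies` and `origins` entries. As a rough sketch of what restoring localStorage draws on, the helper below reads such a file and returns the localStorage values per origin. It assumes Playwright's storage-state layout; whether CloakBrowser writes exactly this format is an assumption, and `local_storage_by_origin` and the sample data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def local_storage_by_origin(state_path):
    """Map each origin in a storage-state file to its localStorage key/value pairs."""
    state = json.loads(Path(state_path).read_text(encoding="utf-8"))
    return {
        entry["origin"]: {
            item["name"]: item["value"] for item in entry.get("localStorage", [])
        }
        for entry in state.get("origins", [])
    }


# Hypothetical sample in the assumed layout (not output from a real run).
sample_state = {
    "cookies": [],
    "origins": [
        {
            "origin": "https://example.com",
            "localStorage": [
                {"name": "tutorial_name", "value": "Ada"},
                {"name": "tutorial_message", "value": "hello"},
            ],
        }
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "storage_state.json"
    path.write_text(json.dumps(sample_state), encoding="utf-8")
    restored = local_storage_by_origin(path)

print(restored)
```

Inspecting the file this way is a quick check that the expected keys were captured before a new context is created from the saved state.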


import os
import sys
import json
import time
import shutil
import base64
import subprocess
import concurrent.futures
from pathlib import Path
from datetime import datetime
from textwrap import dedent


def run_cmd(cmd, check=True, capture=False):
    # Echo the command, run it, and optionally print the captured output.
    print(f"\n$ {' '.join(cmd)}")
    result = subprocess.run(
        cmd,
        check=check,
        text=True,
        stdout=subprocess.PIPE if capture else None,
        stderr=subprocess.STDOUT if capture else None,
    )
    if capture and result.stdout:
        print(result.stdout[:4000])
    return result


print("Installing CloakBrowser and helper packages...")
run_cmd([
    sys.executable, "-m", "pip", "install", "-q", "-U",
    "cloakbrowser",
    "playwright",
    "pandas",
    "beautifulsoup4"
])

print("\nInstalling Chromium runtime dependencies for Colab...")
try:
    run_cmd([sys.executable, "-m", "playwright", "install-deps", "chromium"], check=False)
except Exception as e:
    print("Dependency installer warning:", repr(e))

from cloakbrowser import (
    launch,
    launch_context,
    launch_persistent_context,
    ensure_binary,
    binary_info,
)
import pandas as pd
from bs4 import BeautifulSoup
from IPython.display import display, Image

# Working directory and output paths used throughout the tutorial.
WORKDIR = Path("/content/cloakbrowser_advanced_tutorial")
WORKDIR.mkdir(parents=True, exist_ok=True)

SCREENSHOT_PATH = WORKDIR / "cloakbrowser_result.png"
STORAGE_STATE_PATH = WORKDIR / "storage_state.json"
PROFILE_DIR = WORKDIR / "persistent_profile"

print("\nPreparing CloakBrowser binary...")
try:
    ensure_binary()
except Exception as e:
    print("Binary setup warning:", repr(e))

print("\nCloakBrowser binary info:")
try:
    info = binary_info()
    print(json.dumps(info, indent=2, default=str))
except Exception as e:
    print("Could not read binary info:", repr(e))

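We start by installing CloakBrowser, Playwright, pandas, and BeautifulSoup so the Colab environment has everything needed for browser automation and result analysis. We also install the Chromium runtime dependencies, import the main CloakBrowser launch utilities, and define the working paths for screenshots, storage state, and persistent profiles. We then prepare the CloakBrowser binary and print its details to confirm that the browser engine is installed correctly before running any automation.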
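Because the install step runs pip quietly, a failed install can go unnoticed until an import breaks. One way to check the result, sketched here with the standard library only, is to look up the installed versions of the four packages. `installed_versions` is a hypothetical helper and not part of the original tutorial.

```python
from importlib import metadata


def installed_versions(package_names):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["cloakbrowser", "playwright", "pandas", "beautifulsoup4"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `None` entry means the distribution is missing from the current environment, so the pip step should be rerun without `-q` to see the error.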


def make_data_url(html: str) -> str:
    # Encode the HTML as a base64 data URL so the page loads without a server.
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"data:text/html;base64,{encoded}"


def print_section(title):
    print("\n" + "=" * 80)
    print(title)
    print("=" * 80)


def safe_close(obj, label="object"):
    # Close a browser object without letting cleanup errors stop the run.
    try:
        if obj:
            obj.close()
    except Exception as e:
        print(f"Warning while closing {label}: {e}")


def run_sync_browser_job_in_thread(fn, *args, **kwargs):
    """
    Google Colab and Jupyter already run an asyncio event loop.
    CloakBrowser currently exposes Playwright-style sync helpers such as:
    - launch()
    - launch_context()
    - launch_persistent_context()

    Playwright's sync API cannot run inside an already-running event loop.
    Therefore, we run the entire browser automation job inside a separate thread.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fn, *args, **kwargs)
        return future.result()

test_page_html = dedent("""
<!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <title>CloakBrowser Local Automation Lab</title>
  <style>
    body {
      font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
      max-width: 900px;
      margin: 40px auto;
      padding: 24px;
      line-height: 1.5;
      background: #f7f7f7;
      color: #222;
    }
    .card {
      background: white;
      border-radius: 18px;
      padding: 24px;
      box-shadow: 0 8px 30px rgba(0,0,0,0.08);
      margin-bottom: 18px;
    }
    label {
      display: block;
      margin-top: 12px;
      font-weight: 600;
    }
    input, textarea, button {
      width: 100%;
      box-sizing: border-box;
      padding: 12px;
      margin-top: 8px;
      border: 1px solid #ccc;
      border-radius: 12px;
      font-size: 15px;
    }
    button {
      cursor: pointer;
      background: #111;
      color: white;
      font-weight: 700;
    }
    pre {
      background: #111;
      color: #00ff99;
      padding: 16px;
      overflow-x: auto;
      border-radius: 12px;
    }
  </style>
</head>
<body>
  <div class="card">
    <h1>CloakBrowser Local Automation Lab</h1>
    <p>
      This page runs locally from a data URL. We use it to inspect browser-visible properties
      and demonstrate Playwright-style interaction safely.
    </p>
  </div>

  <div class="card">
    <h2>Interaction Form</h2>
    <label>Name</label>
    <input id="name" placeholder="Type your name here">
    <label>Message</label>
    <textarea id="message" rows="4" placeholder="Type a short message"></textarea>
    <button id="submit">Submit Local Form</button>
    <p id="status">Waiting for interaction...</p>
  </div>

  <div class="card">
    <h2>Browser Signals</h2>
    <pre id="signals"></pre>
  </div>

  <script>
    async function collectSignals() {
      const canvas = document.createElement("canvas");
      const gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");
      let webglVendor = null;
      let webglRenderer = null;
      if (gl) {
        const debugInfo = gl.getExtension("WEBGL_debug_renderer_info");
        if (debugInfo) {
          webglVendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
          webglRenderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
        }
      }
      const signals = {
        title: document.title,
        userAgent: navigator.userAgent,
        webdriver: navigator.webdriver,
        platform: navigator.platform,
        languages: navigator.languages,
        language: navigator.language,
        hardwareConcurrency: navigator.hardwareConcurrency,
        deviceMemory: navigator.deviceMemory || null,
        pluginsLength: navigator.plugins ? navigator.plugins.length : null,
        chromeObjectPresent: typeof window.chrome === "object",
        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
        screen: {
          width: screen.width,
          height: screen.height,
          colorDepth: screen.colorDepth,
          pixelDepth: screen.pixelDepth
        },
        viewport: {
          innerWidth: window.innerWidth,
          innerHeight: window.innerHeight,
          devicePixelRatio: window.devicePixelRatio
        },
        webglVendor,
        webglRenderer,
        localStorageWorks: (() => {
          try {
            localStorage.setItem("cloakbrowser_test", "ok");
            return localStorage.getItem("cloakbrowser_test") === "ok";
          } catch (e) {
            return false;
          }
        })()
      };
      document.getElementById("signals").textContent = JSON.stringify(signals, null, 2);
      return signals;
    }

    document.getElementById("submit").addEventListener("click", () => {
      const name = document.getElementById("name").value;
      const message = document.getElementById("message").value;
      localStorage.setItem("tutorial_name", name);
      localStorage.setItem("tutorial_message", message);
      document.getElementById("status").textContent = `Saved locally for ${name}: ${message}`;
    });

    collectSignals();
  </script>
</body>
</html>
""").strip()

TEST_PAGE_URL = make_data_url(test_page_html)

We define helper functions for creating data URLs, printing section headers, safely closing browser objects, and running synchronous browser jobs inside a separate thread. We use the thread wrapper because Google Colab already runs an asyncio loop, and this prevents Playwright’s sync API from failing. We also create a safe local HTML test page that collects browser-visible signals, supports form interaction, and stores test values in localStorage.

def cloakbrowser_tutorial_job():
    results = {
        "basic_launch": None,
        "advanced_context": None,
        "storage_restore": None,
        "persistent_profile": None,
        "rendered_extraction": None,
        "static_parsing": None,
        "e