Web Scraping

5.1k installs, 237 stars

Summary

A proper scraping cascade: it tries trafilatura first, falls back to requests with rotated user agents, then escalates to Playwright with stealth mode if the site needs JavaScript rendering or runs basic anti-bot checks. The code is clean and tracks which method succeeded. The anti-bot landscape section is honest about what playwright-stealth actually handles (navigator.webdriver patches, fingerprint evasion) versus what it doesn't (TLS fingerprinting, Cloudflare Turnstile). The async Playwright variant for Jupyter notebooks is a nice touch, since sync Playwright breaks inside a notebook's already-running event loop. This won't beat DataDome or other sophisticated bot management, but it covers the 80% case where you just need content extraction with reasonable resilience.
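To make the cascade idea concrete, here is a minimal sketch of the pattern, not the skill's actual code (which is not reproduced on this page). The names `scrape_with_cascade`, `ScrapeResult`, `USER_AGENTS`, and the `min_chars` threshold are assumptions made for illustration. The three tier functions use lazy imports so the cascade logic can be read and tested without the third-party packages installed, and the stealth step is left as a comment because the playwright-stealth API differs between releases.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical user-agent pool for the plain-HTTP tier (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

Fetcher = Callable[[str], Optional[str]]


@dataclass
class ScrapeResult:
    url: str
    text: Optional[str]
    method: Optional[str]  # name of the tier that succeeded, None if all failed
    errors: Dict[str, str] = field(default_factory=dict)


def scrape_with_cascade(
    url: str,
    tiers: Sequence[Tuple[str, Fetcher]],
    min_chars: int = 200,  # assumed threshold for "real" content
) -> ScrapeResult:
    """Try each tier in order; return the first one that yields enough text."""
    errors: Dict[str, str] = {}
    for name, fetch in tiers:
        try:
            text = fetch(url)
        except Exception as exc:  # any failure just escalates to the next tier
            errors[name] = repr(exc)
            continue
        if text and len(text.strip()) >= min_chars:
            return ScrapeResult(url, text, name, errors)
        errors[name] = "empty or too-short content"
    return ScrapeResult(url, None, None, errors)


def fetch_trafilatura(url: str) -> Optional[str]:
    import trafilatura  # third-party; imported lazily

    downloaded = trafilatura.fetch_url(url)
    return trafilatura.extract(downloaded) if downloaded else None


def fetch_requests(url: str) -> Optional[str]:
    import requests  # third-party
    import trafilatura

    headers = {"User-Agent": random.choice(USER_AGENTS)}
    resp = requests.get(url, headers=headers, timeout=15)
    resp.raise_for_status()
    return trafilatura.extract(resp.text)


def fetch_playwright(url: str) -> Optional[str]:
    import trafilatura
    from playwright.sync_api import sync_playwright  # third-party

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Stealth patches (e.g. from playwright-stealth) would be applied to
        # `page` here; the exact call depends on the installed version.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return trafilatura.extract(html)


DEFAULT_TIERS = [
    ("trafilatura", fetch_trafilatura),
    ("requests", fetch_requests),
    ("playwright", fetch_playwright),
]
```

Calling `scrape_with_cascade(url, DEFAULT_TIERS)` returns a result whose `method` field records which tier produced the text, and whose `errors` field records why earlier tiers were skipped. Passing the tiers as a list keeps the escalation order explicit and makes it easy to swap in stubs for testing.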

Install to Claude Code

npx -y skills add jamditis/claude-skills-journalism --skill web-scraping --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md
