This is a full web scraping toolkit that converts websites to markdown, extracts structured data, and handles JavaScript-heavy pages. The standout feature is schema-based CSS extraction: you generate a JSON schema once using an LLM, then reuse it to scrape repeatedly with no further LLM API costs. It has both a CLI for quick jobs and a Python SDK for complex workflows. The content filtering is solid, using BM25 to strip irrelevant sections before you process the text. If you're building documentation scrapers, price monitors, or any pipeline that needs clean structured data from messy web pages, this covers everything from simple markdown conversion to authenticated multi-page crawls with proxy rotation.
npx skills add https://github.com/brettdavies/crawl4ai-skill --skill crawl4ai
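
To make the "generate the schema once, reuse it for free" idea concrete, here is a minimal sketch of schema-driven extraction. It is not the toolkit's own API or schema format: the keys `base`, `path`, and `attribute`, the `extract` helper, and the sample page are all hypothetical. It also uses the standard library's ElementTree path syntax in place of real CSS selectors, so it only handles well-formed markup. The point is only that the schema is plain data, so applying it needs no LLM call.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema. In the workflow described above, an LLM would draft
# something like this once; every later run reuses it with no model call.
SCHEMA = {
    "name": "products",
    "base": ".//div[@class='product']",  # one match per record
    "fields": [
        {"name": "title", "path": "./h2", "type": "text"},
        {"name": "price", "path": "./span[@class='price']", "type": "text"},
        {"name": "url", "path": "./a", "type": "attribute", "attribute": "href"},
    ],
}


def extract(markup, schema):
    """Apply a declarative schema to well-formed markup; return a list of dicts."""
    root = ET.fromstring(markup)
    rows = []
    for node in root.findall(schema["base"]):
        row = {}
        for field in schema["fields"]:
            match = node.find(field["path"])
            if match is None:
                # Missing element: keep the key so every row has the same shape.
                row[field["name"]] = None
            elif field["type"] == "attribute":
                row[field["name"]] = match.get(field["attribute"])
            else:
                row[field["name"]] = "".join(match.itertext()).strip()
        rows.append(row)
    return rows


# Made-up sample page, used only to exercise the sketch.
PAGE = """<html><body>
<div class="product"><h2>Widget</h2><span class="price">$9.99</span><a href="/widget">Details</a></div>
<div class="product"><h2>Gadget</h2><span class="price">$19.50</span><a href="/gadget">Details</a></div>
</body></html>"""

rows = extract(PAGE, SCHEMA)
for row in rows:
    print(row)
```

Because the schema is just a dictionary, it can be saved as JSON next to the scraper and regenerated only when the target site's layout changes.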