The news-extractor skill pulls article content from 12 news platforms (WeChat, Toutiao, BBC, CNN, Twitter, Quora, and six others) and writes it out as both JSON and Markdown files. It auto-detects the platform from the URL, scrapes with curl_cffi for browser simulation, and includes retry logic via tenacity. Twitter support works for public posts without authentication; protected tweets need cookies. The skill is self-contained, with no dependencies beyond its Python packages, so you can drop it into other projects. The main use case is getting structured article data or clean Markdown without handling each site's layout yourself. Run it with uv, point it at a URL, and it handles the extraction.
npx skills add https://github.com/nanmicoder/newscrawler --skill news-extractor
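
To give a rough idea of what detecting the platform from the URL can look like, here is a minimal sketch using only the standard library. It is not the skill's actual code: the `detect_platform` function, the `PLATFORM_DOMAINS` table, and the platform labels are all hypothetical, and the table covers only the six platforms named above.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical domain table for illustration only; the skill's real
# mapping (and its six other platforms) may look different.
PLATFORM_DOMAINS = {
    "mp.weixin.qq.com": "wechat",
    "toutiao.com": "toutiao",
    "bbc.com": "bbc",
    "bbc.co.uk": "bbc",
    "cnn.com": "cnn",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "quora.com": "quora",
}


def detect_platform(url: str) -> Optional[str]:
    """Return a platform label for the URL, or None if the host is unknown."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in PLATFORM_DOMAINS.items():
        # Match the domain itself or any subdomain (www., edition., ...).
        # The leading dot keeps "netflix.com" from matching "x.com".
        if host == domain or host.endswith("." + domain):
            return platform
    return None


if __name__ == "__main__":
    print(detect_platform("https://www.bbc.co.uk/news/some-article"))
```

Matching on the hostname suffix rather than a substring of the full URL avoids false positives from paths or query strings that happen to contain a platform name.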