This is vision-driven browser automation that works from screenshots instead of the DOM, powered by Midscene.js. It runs in headless Puppeteer by default, but can also connect to your existing Chrome via CDP or a browser extension to preserve login sessions. The skill handles clicks, form fills, scrolling, and multi-step workflows by analyzing what's actually visible on screen. One thing to know: it requires a vision model (Gemini, Qwen, Doubao) configured via environment variables, and commands run one at a time since each step involves AI inference on screenshots. Use this when you need to scrape data, test UI, or automate web tasks without wrestling with selectors or accessibility trees.
npx skills add https://github.com/web-infra-dev/midscene-skills --skill browser-automation
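Since the skill is driven by a vision model configured through environment variables, a minimal setup sketch looks like the following. The variable names here follow Midscene's usual OpenAI-compatible configuration, but they are assumptions; check the Midscene.js docs for the exact names your provider (Gemini, Qwen, Doubao) requires.

```shell
# Sketch of a typical configuration (names are assumptions, verify against Midscene docs).
# Midscene talks to an OpenAI-compatible endpoint serving a vision model.
export OPENAI_API_KEY="sk-..."              # API key from your model provider
export OPENAI_BASE_URL="https://..."        # OpenAI-compatible endpoint for that provider
export MIDSCENE_MODEL_NAME="qwen-vl-max"    # example vision model name; any supported VL model
```

Set these in the shell before invoking the skill; without a reachable vision model every step will fail, since each command is grounded by AI inference on a screenshot.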