Lets you automate desktop apps through screenshots and natural language instead of hunting for DOM elements or accessibility IDs. It works across macOS, Windows, and Linux by looking at what is actually on screen, then clicking, typing, and navigating, driven by vision models such as Gemini or Qwen. The workflow is strictly synchronous: take a screenshot, let the model decide what to do, execute one action, repeat. It takes over your real mouse and keyboard, so it is meant for native apps, Electron UIs, or anything that can't run headless; for web pages, use the browser automation skill instead. Commands can be slow, since each one involves AI inference, and you need to configure vision model credentials up front.
npx skills add https://github.com/web-infra-dev/midscene-skills --skill desktop-computer-automation
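The perception-act loop the description outlines can be pictured as a small TypeScript sketch. This is only an illustration of the one-screenshot, one-inference, one-action cadence: `captureScreen`, `planNextAction`, and `executeAction` are hypothetical placeholders, not part of Midscene's API, and the skill itself handles all of this wiring for you.

```ts
// Hypothetical sketch of the screenshot -> inference -> single-action loop.
// The three helpers are placeholders, NOT Midscene APIs.

interface UiAction {
  kind: "click" | "type" | "scroll" | "done";
  target?: { x: number; y: number }; // pixel coordinates chosen by the vision model
  text?: string;                     // text to type when kind === "type"
}

// Declared stubs so the sketch type-checks on its own.
declare function captureScreen(): Promise<Buffer>;                             // grab the current display
declare function planNextAction(goal: string, img: Buffer): Promise<UiAction>; // one vision-model call
declare function executeAction(action: UiAction): Promise<void>;               // drive the real mouse / keyboard

async function runTask(goal: string, maxSteps = 20): Promise<void> {
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await captureScreen();               // 1. look at the screen
    const action = await planNextAction(goal, screenshot);  // 2. one inference call (the slow part)
    if (action.kind === "done") return;                     //    model reports the goal is reached
    await executeAction(action);                            // 3. exactly one click or keystroke
  }
  throw new Error(`Gave up after ${maxSteps} steps: ${goal}`);
}
```

Because every iteration waits on a model round trip, long tasks are dominated by inference latency, which is why commands can feel slow compared with DOM-based automation.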