This gives Claude the ability to control HarmonyOS NEXT devices by running Midscene CLI commands that capture screenshots and execute touch actions based on what it sees. The workflow is deliberate: run one command, read the screenshot, decide the next move. No background execution, no chaining. Each action (tap, scroll, type, drag) takes about a minute because there's AI inference happening behind the scenes. You need API keys for vision models, and two-finger gestures won't work since HarmonyOS automation doesn't expose multi-touch. If you're testing apps on Huawei's new OS or need reproducible device automation without manually scripting every coordinate, this is the toolchain. Part of a larger Midscene suite that covers browsers, desktops, Android, and iOS.
npx skills add https://github.com/web-infra-dev/midscene-skills --skill harmonyos-device-automation
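The one-action-at-a-time workflow described above might be expressed as a Midscene-style YAML automation script. This is a hypothetical sketch: the `harmonyos:` target key and the `deviceId` value are assumptions, and the `tasks`/`flow` steps with `aiTap`/`aiAssert` mirror the pattern Midscene uses for its other platforms rather than a confirmed HarmonyOS schema.

```yaml
# Hypothetical sketch of a Midscene automation script for HarmonyOS NEXT.
# Assumptions: the `harmonyos:` target key, the `deviceId` placeholder,
# and the flow-step names, which follow Midscene's YAML conventions on
# its other supported platforms (web, Android).
harmonyos:
  deviceId: "placeholder-device-id"   # assumed field; replace with a real device
tasks:
  - name: open-settings
    flow:
      - aiTap: the Settings icon on the home screen   # one action per step, ~1 min with model inference
      - aiAssert: the Settings page is visible        # read the result before deciding the next move
```

Each flow step corresponds to one screenshot-inference-action cycle, which is why a script of even a few steps runs for several minutes.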