If you need to automate Android UI testing but can't access the DOM or accessibility tree, this skill falls back on computer vision. It works directly from screenshots, which means you can test apps where traditional automation tools can't hook in, or where the UI hierarchy is a mess. The approach is straightforward: you describe what to do in natural language, it looks at the screen, locates the elements visually, and interacts with them. It comes from the Midscene project (223 stars, 1.6K installs), so there's some community validation. The tradeoff is obvious: vision-based automation is inherently less precise than DOM-based automation, but when you literally can't access the structure, this gives you an option that actually works.
npx skills add https://github.com/web-infra-dev/midscene-skills --skill android-device-automation
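To give a sense of the underlying flow, here's a minimal sketch using Midscene's Android SDK directly. Treat the package name `@midscene/android` and the helper names (`getConnectedDevices`, `agentFromAdbDevice`) as assumptions to verify against the repo's current docs; the skill itself drives this kind of flow for you from natural-language instructions.

```typescript
// Minimal sketch of the vision-driven flow this skill wraps.
// Assumes @midscene/android and its getConnectedDevices/agentFromAdbDevice
// helpers as documented upstream; also assumes a vision model is configured
// via environment variables (e.g. an OpenAI-compatible API key).
import { agentFromAdbDevice, getConnectedDevices } from '@midscene/android';

async function main() {
  // Pick the first ADB-connected device or emulator.
  const devices = await getConnectedDevices();
  const agent = await agentFromAdbDevice(devices[0].udid);

  // Launch the target app, then drive it with natural-language steps.
  // Each step is resolved visually from a screenshot, not from the
  // accessibility tree or UI hierarchy.
  await agent.launch('com.android.settings/.Settings');
  await agent.aiAction('scroll down and tap "About phone"');

  // Assertions are screenshot-based too.
  await agent.aiAssert('the About phone screen is visible');
}

main().catch(console.error);
```

The design choice worth noting: because every step goes through a screenshot, the same script works whether the screen is a native view, a WebView, or a game canvas, which is exactly the situation where tree-based tools fail.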