This is a Playwright-based agent loop that wires Gemini 2.5's Computer Use model up to drive a real browser. You give it a goal and a starting URL; it takes a screenshot, asks the model for the next action, executes that action, and repeats until the task is done or the turn limit is reached. The standout piece is the safety confirmation flow: risky actions are flagged and require human approval before they execute. It defaults to Chromium, but you can point it at Chrome, Edge, or Brave. The operational advice is solid: run it in a sandbox, use the exclude flag to block actions you don't trust, and keep the viewport at 1440x900. It's a clean reference implementation if you want to build browser automation on top of Gemini's multimodal function calling.
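The screenshot, propose, execute, repeat cycle with a confirmation gate can be sketched roughly like this. It is a minimal sketch, not the skill's actual code: the model call, the browser actions, and the human prompt are injected as plain callables so the control flow runs without Playwright or the Gemini API. The names `Action`, `run_agent`, `propose`, `risky`, and action names like `click_at` are hypothetical. In a real setup, the proposed actions would come from the model's function calls and `execute` would drive a Playwright page.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical structure for illustration; not the skill's actual API.
@dataclass
class Action:
    name: str                                  # hypothetical action names, e.g. "click_at"
    args: dict = field(default_factory=dict)   # action parameters (coordinates, text, ...)
    risky: bool = False                        # True if the step was flagged for human confirmation


def run_agent(
    goal: str,
    start_url: str,
    propose: Callable[[str, list, bytes], list],  # model call: (goal, history, screenshot) -> actions
    execute: Callable[[Action], None],            # performs one action in the browser
    screenshot: Callable[[], bytes],              # captures the current viewport
    confirm: Callable[[Action], bool],            # asks a human to approve a risky action
    excluded: frozenset = frozenset(),            # action names that are never executed
    max_turns: int = 20,
) -> tuple[str, list]:
    history: list = []
    execute(Action("navigate", {"url": start_url}))
    for _ in range(max_turns):
        actions = propose(goal, history, screenshot())
        if not actions:  # no further actions: the model considers the goal reached
            return "done", history
        for action in actions:
            if action.name in excluded:  # blocked by the exclude list; skip it
                history.append(("blocked", action.name))
                continue
            if action.risky and not confirm(action):  # human said no: stop the run
                history.append(("denied", action.name))
                return "stopped", history
            execute(action)
            history.append(("ok", action.name))
    return "turn_limit", history


# Usage with stand-ins instead of Playwright and Gemini.
steps = iter([
    [Action("click_at", {"x": 10, "y": 20})],
    [Action("type_text_at", {"text": "hi"}, risky=True)],
    [],
])
executed = []
status, history = run_agent(
    "Say hi", "https://example.com",
    propose=lambda goal, hist, shot: next(steps),
    execute=executed.append,
    screenshot=lambda: b"png",
    confirm=lambda action: True,  # auto-approve for the demo only
)
print(status, history)  # done [('ok', 'click_at'), ('ok', 'type_text_at')]
```

Injecting the callables keeps the turn limit, the exclude list, and the confirmation gate testable with stand-ins; the browser-specific details live entirely in `execute` and `screenshot`.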
npx skills add https://github.com/am-will/codex-skills --skill gemini-computer-use