A Playwright reporter and MCP server that accumulates test run history in SQLite and exposes it through eight analysis tools. You get flakiness rankings per test-and-browser combination, the exact commit SHA at which a test went from stable to flaky (read from GITHUB_SHA and similar CI variables), error clustering that normalizes UUIDs and other dynamic values and then groups similar messages by Levenshtein distance, and trend analysis over arbitrary day ranges. The reporter writes every test result to the database automatically. Useful when you need to distinguish historical flakes from new regressions in CI, or when you want an AI agent to answer whether a failing test has been unreliable for weeks. Pairs well with playwright-trace-decoder-mcp for combining "what failed this time" with "has this always been flaky."
claude mcp add --transport stdio vola-trebla-flakiness-knowledge-graph-mcp uvx flakiness-knowledge-graph-mcp
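
The error clustering described above (normalize dynamic values, then match by Levenshtein distance) can be illustrated with a minimal sketch. This is not the project's code: the function names, the placeholder tokens, the 20% distance threshold, and the greedy single-pass grouping are all assumptions made for illustration.

```python
import re

# Assumed patterns: the real tool's normalization rules are not documented here.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
NUM_RE = re.compile(r"\d+")


def normalize(message: str) -> str:
    """Replace UUIDs, then remaining digit runs, with fixed placeholders."""
    message = UUID_RE.sub("<uuid>", message)  # UUIDs first, before digits are touched
    return NUM_RE.sub("<n>", message)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def cluster_errors(messages, max_ratio=0.2):
    """Greedily group messages whose normalized forms are within max_ratio edits.

    Returns a list of (representative, [original messages]) tuples.
    The threshold is a hypothetical default, not the tool's setting.
    """
    clusters = []
    for msg in messages:
        norm = normalize(msg)
        for rep, members in clusters:
            limit = max_ratio * max(len(rep), len(norm))
            if levenshtein(rep, norm) <= limit:
                members.append(msg)
                break
        else:
            # No existing cluster was close enough; start a new one.
            clusters.append((norm, [msg]))
    return clusters


if __name__ == "__main__":
    errors = [
        "Timeout 30000ms exceeded waiting for order 3f2b8c1e-9d4a-4c6b-8e2f-1a2b3c4d5e6f",
        "Timeout 5000ms exceeded waiting for order 7a1c9e2d-0b3f-4d5a-9c8e-6f5e4d3c2b1a",
        "expect(locator).toBeVisible() failed: element not found",
    ]
    for representative, members in cluster_errors(errors):
        print(len(members), representative)
```

Normalizing before measuring distance means two timeouts that differ only in their IDs and durations compare as identical strings, so the Levenshtein threshold only has to absorb genuine wording differences.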