This is the orchestrator for ToolUniverse's self-improvement loop. It chains specialized devtu skills to discover APIs, generate tools, test them with researcher persona agents, fix verified bugs, optimize skill quality, and ship via git. The testing phase runs biologist personas to surface real issues, but about 50% of their reports are false positives caused by MCP interface confusion, so always verify each report via the CLI before fixing anything. Testing also covers usefulness, not just "does the tool work", to catch skills that dump data without interpretation. Use this skill when you want to run a full development cycle or coordinate multiple improvement phases. The repo includes benchmarks and detailed anti-patterns for both code and skill design.
npx skills add https://github.com/mims-harvard/tooluniverse --skill devtu-self-evolve
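The verify-before-fix rule in the description can be pictured as a small triage step. The sketch below is illustrative only and is not taken from the skill itself: `Report`, `triage`, and the `reproduces` callback are hypothetical names, and the callback stands in for whatever CLI re-run you use to confirm a persona-reported issue.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Report:
    """A bug report filed by a persona agent (hypothetical structure)."""
    tool: str
    summary: str


def triage(
    reports: Iterable[Report],
    reproduces: Callable[[Report], bool],
) -> Tuple[List[Report], List[Report]]:
    """Split reports into (confirmed, rejected).

    `reproduces` is an assumed hook: it should re-run the failing call
    outside the MCP interface (for example via the CLI) and return True
    only if the problem still occurs.
    """
    confirmed: List[Report] = []
    rejected: List[Report] = []
    for report in reports:
        if reproduces(report):
            confirmed.append(report)
        else:
            rejected.append(report)
    return confirmed, rejected


if __name__ == "__main__":
    # Made-up reports and a made-up verifier, purely for illustration.
    reports = [
        Report("example_tool_a", "returns empty result for valid gene symbol"),
        Report("example_tool_b", "parameter rejected by MCP wrapper"),
    ]
    still_fails_on_cli = {"example_tool_a"}
    confirmed, rejected = triage(reports, lambda r: r.tool in still_fails_on_cli)
    print("fix:", [r.tool for r in confirmed])
    print("skip:", [r.tool for r in rejected])
```

Passing the verifier in as a callback keeps the triage logic independent of how verification is actually performed, so only confirmed reports move on to the fix phase.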