This is a structured experimentation framework for optimizing anything measurable through parallel trials and automatic convergence. You define a spec with a primary metric (either hard numbers like build time or LLM-as-judge scores for quality), set degenerate gates to filter out regressions, and it runs batches of experiments while keeping a crash-safe append-only log on disk. The discipline here is serious: every result gets written and verified immediately, not cached in conversation context, because these runs take hours and sessions die. Works well for tuning prompts, search relevance, clustering parameters, or build performance. Start conservative with serial mode and low iteration caps until your measurement harness is solid.
npx skills add https://github.com/everyinc/compound-engineering-plugin --skill ce-optimize