Before you ship a skill that's supposed to enforce discipline, run it through scenarios where an agent actually wants to break the rules. The approach applies TDD's red-green-refactor cycle to process documentation: run a baseline test without the skill to watch agents fail and rationalize, write the skill to address those specific failures, then pressure-test it with combined stressors such as time constraints, sunk costs, and authority pressure all at once. The rationalization table is the clever part: you capture the exact excuses agents use ("keep as reference while writing tests first") and plug each loophole explicitly. Worth the setup cost for any skill that imposes a compliance cost, like testing requirements or code review checklists.
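To make the two moving parts above concrete, here is a minimal Python sketch. It is not taken from the skill itself. The names `STRESSORS`, `RATIONALIZATIONS`, `pressure_scenarios`, and `unplugged_excuses` are hypothetical, and the counter text in the table is an illustrative assumption. It enumerates combined-stressor scenarios and flags excuses from a test run that the skill does not yet counter.

```python
from itertools import combinations

# Stressors named in the description above; the list itself is illustrative.
STRESSORS = ["time constraint", "sunk cost", "authority pressure"]

# Hypothetical rationalization table: an excuse observed during baseline
# testing mapped to the explicit counter written into the skill.
# The counter wording is an assumption, not quoted from the skill.
RATIONALIZATIONS = {
    "keep as reference while writing tests first":
        "Delete the untested code; do not keep it as a reference.",
}


def pressure_scenarios(stressors, min_combined=2):
    """Return every combination of at least `min_combined` stressors."""
    scenarios = []
    for size in range(min_combined, len(stressors) + 1):
        scenarios.extend(combinations(stressors, size))
    return scenarios


def unplugged_excuses(observed, table):
    """Return observed excuses that the table does not counter yet."""
    return [excuse for excuse in observed if excuse not in table]


if __name__ == "__main__":
    for scenario in pressure_scenarios(STRESSORS):
        print(" + ".join(scenario))

    observed = [
        "keep as reference while writing tests first",
        "I'll add tests later",  # hypothetical new excuse from a pressure run
    ]
    print(unplugged_excuses(observed, RATIONALIZATIONS))
```

Any excuse returned by `unplugged_excuses` is a loophole still open: it would get its own row in the table before the next pressure run.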
npx skills add https://github.com/neolabhq/context-engineering-kit --skill test-skill