This skill implements NOWAIT, a technique from a 2025 paper that suppresses self-reflection tokens during inference and cuts reasoning token usage by 27-51% without hurting accuracy. It blocks words like "wait," "hmm," and "alternatively" during generation, which forces models like QwQ and DeepSeek-R1 onto more direct reasoning paths. It works well on RL-based models but can degrade performance on distilled ones, since they rely heavily on the chain-of-thought (CoT) structure from training. The implementation is straightforward: a logit processor you drop into your generation call. If you're running reasoning models in production and token costs or latency matter, this is a clean win with minimal integration work.
npx skills add https://github.com/davila7/claude-code-templates --skill nowait-reasoning-optimizer
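
To make the logit-processor idea concrete, here is a minimal, dependency-free sketch of the suppression step. It is not the skill's actual code. The toy vocabulary, the function names (`find_banned_token_ids`, `suppress`, `greedy_pick`), and the exact-match rule for picking banned tokens are all assumptions made for illustration, and the keyword tuple only contains the three words named above, so the paper's real list may differ.

```python
import math
from typing import Dict, Iterable, List, Set

# Only the three keywords named in the description; the full list is an assumption.
REFLECTION_KEYWORDS = ("wait", "hmm", "alternatively")


def find_banned_token_ids(
    vocab: Dict[str, int], keywords: Iterable[str] = REFLECTION_KEYWORDS
) -> Set[int]:
    """Collect ids of vocab tokens that equal a keyword once normalized."""
    wanted = {k.lower() for k in keywords}
    banned: Set[int] = set()
    for token, token_id in vocab.items():
        # Drop subword space markers ("Ġ" in byte-level BPE, "▁" in SentencePiece),
        # then surrounding whitespace and trailing punctuation, then lowercase.
        text = token.replace("Ġ", "").replace("▁", "").strip().rstrip(",.!?:").lower()
        if text in wanted:
            banned.add(token_id)
    return banned


def suppress(logits: List[float], banned_ids: Set[int]) -> List[float]:
    """Return a copy of logits with banned ids set to -inf so they are never chosen."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]


def greedy_pick(logits: List[float]) -> int:
    """Index of the largest logit (greedy decoding for a single step)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical toy vocabulary: token text -> token id.
TOY_VOCAB = {
    "Wait": 0, "Ġwait": 1, "Hmm": 2, "The": 3,
    "Ġanswer": 4, "ĠAlternatively": 5, "await": 6,
}
banned = find_banned_token_ids(TOY_VOCAB)  # {0, 1, 2, 5}; "await" is left alone

# One decoding step where the model would otherwise prefer "Wait".
logits = [5.0, 1.0, 4.0, 3.0, 2.0, 0.5, 0.1]
print(greedy_pick(logits))                    # 0 -> "Wait"
print(greedy_pick(suppress(logits, banned)))  # 3 -> "The"
```

With Hugging Face `transformers`, the same masking would typically live in a `LogitsProcessor` subclass whose `__call__(input_ids, scores)` sets the banned columns of `scores` to negative infinity, passed to `generate` through its `logits_processor` argument. Exact matching is used here so that unrelated tokens such as "await" stay available; a looser substring rule would block more variants at the cost of false positives.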