This is a template for implementing speculative decoding, a family of techniques that speed up LLM inference by a reported 1.5x to 3.6x without degrading output quality. It covers the main approaches: draft-model speculation, Medusa's multiple decoding heads, and Lookahead Decoding, which is based on Jacobi iteration. The skill references recent papers from 2024 and gives you a starting point if you're trying to reduce latency in chatbots or code generation tools, or squeeze better throughput from limited hardware. Worth noting: it's ported from ovachiever/droid-tings and sits in a template collection with 27.7K stars, so there are likely other inference optimization patterns in there too. The practical speedup range makes this relevant if you're actually serving models at scale.
npx skills add https://github.com/davila7/claude-code-templates --skill speculative-decoding
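
If the draft-model approach mentioned above is new to you, the rough idea can be sketched in a few lines. This is a minimal illustration, not the skill's own code: `target_model` and `draft_model` are hypothetical toy functions standing in for a large and a small LLM, and the sketch only covers greedy verification (the sampling-based acceptance rules used in the papers are more involved). A cheap draft proposes `k` tokens, the target checks them, and the longest matching prefix is kept along with one token from the target itself.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical "large" model: a fixed arithmetic rule standing in for an LLM.
    return (prefix[-1] * 3 + 1) % 7


def draft_model(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except after token 5.
    if prefix[-1] == 5:
        return 0
    return (prefix[-1] * 3 + 1) % 7


def greedy_decode(prompt: List[int], model: Model, max_new_tokens: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    prompt: List[int],
    target: Model,
    draft: Model,
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    target_passes = 0  # counts verification rounds, not individual calls

    while len(tokens) < goal:
        # 1. The draft proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target verifies every proposed position. A real system
        #    scores all k+1 positions in one batched forward pass; the
        #    loop below only simulates that, so it counts as one pass.
        target_passes += 1
        accepted: List[int] = []
        for i in range(k + 1):
            expected = target(tokens + accepted)
            accepted.append(expected)
            # Stop at the first mismatch (expected is the correction), or
            # after the bonus token once all k proposals matched.
            if i == k or proposal[i] != expected:
                break

        tokens.extend(accepted)

    return tokens[:goal], target_passes


if __name__ == "__main__":
    out, passes = speculative_decode([1], target_model, draft_model, max_new_tokens=12)
    print(out == greedy_decode([1], target_model, 12), passes)
```

Because every kept token is one the target would have chosen anyway, the output matches plain greedy decoding with the target alone; the saving comes from needing fewer target passes when the draft guesses well.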