This tackles LLM caching at three levels: Anthropic's native prompt caching for repeated prefixes, full response caching for identical queries, and CAG (Cache Augmented Generation) where you pre-cache documents in the prompt instead of doing RAG retrieval. The claim is 90% cost reduction, which tracks if you're hitting the same prompt structures repeatedly. Watch out for cache misses adding latency overhead and stale cached responses becoming wrong over time. The key insight here is that LLM caching isn't like HTTP caching because you need to think about prompt prefix structure, temperature settings, and semantic similarity rather than exact matches.
npx skills add https://github.com/davila7/claude-code-templates --skill prompt-caching