This covers three caching layers for LLM applications: Anthropic's native prompt caching for stable prefixes (roughly a 90% reduction in input-token cost on cache hits), response caching with Redis for repeated queries, and Cache Augmented Generation (CAG), where you pre-load an entire document set into a cached prompt instead of doing RAG retrieval. CAG makes sense when your corpus is under 100K tokens and rarely changes, because you skip the retrieval step entirely. The sharp-edges section is honest about when caching hurts more than it helps, for example when your cache hit rate falls below 50% and the overhead of cache checks slows things down. Includes a decision matrix comparing CAG and RAG by corpus size and update frequency.
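A minimal sketch of how the first two layers can stack, assuming a local Redis instance, the official `anthropic` and `redis` Python packages, and a hypothetical corpus file `docs/handbook.md` small enough to inline CAG-style; the model id and TTL are placeholders, not values prescribed by the skill.

```python
import hashlib

import anthropic
import redis

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Hypothetical stand-in for a small, rarely changing corpus (< 100K tokens),
# the case where pre-loading it into the prompt (CAG) beats per-query RAG.
CORPUS = open("docs/handbook.md").read()

def ask(question: str, ttl_seconds: int = 3600) -> str:
    # Layer 1: response cache -- return the stored answer for a repeated query.
    key = "llm:" + hashlib.sha256(question.encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached

    # Layer 2: Anthropic prompt caching -- mark the stable prefix (instructions
    # plus the whole corpus) with cache_control so repeat calls hit the
    # provider-side prompt cache and are billed at the reduced read rate.
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": "Answer strictly from the documents below.\n\n" + CORPUS,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    answer = response.content[0].text
    cache.set(key, answer, ex=ttl_seconds)
    return answer
```

The layering order matters: the Redis check is cheap and runs first, so the model (and its prompt cache) is only touched on a response-cache miss; whether that check pays off depends on the hit rate, as the sharp-edges notes point out.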
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill prompt-caching