This is a practical skill for running quantized LLMs when you're hitting memory limits. It covers GPTQ, a post-training quantization method that compresses model weights to 4-bit precision with minimal accuracy loss. The guidance is concrete: it tells you when to use GPTQ versus AWQ and gives specific hardware examples, such as running 70B+ models on consumer GPUs like the RTX 4090 (a 70B model at 4 bits still needs roughly 35 GB for weights alone, so on 24 GB cards expect to split across GPUs or offload). The skill cites about 4× memory reduction and 3-4× inference speedup compared to FP16. It has solid traction with 282 installs and comes from a well-starred template repository. Good reference if you're optimizing deployment costs or working with limited hardware.
npx skills add https://github.com/davila7/claude-code-templates --skill gptq
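
To make the "about 4×" memory figure concrete, here is a rough back-of-the-envelope sketch. It counts weight storage only and assumes exactly 4 bits per weight; real GPTQ checkpoints also store per-group scales and zero points, and inference needs extra room for activations and the KV cache, so treat the numbers as lower bounds. The function name `weight_memory_gib` is made up for this illustration and is not part of the skill.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB (hypothetical helper)."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


# Assumption: a 70B-parameter model, counting weights only.
n_params = 70e9
fp16_gib = weight_memory_gib(n_params, 16)  # FP16 baseline
int4_gib = weight_memory_gib(n_params, 4)   # idealized 4-bit weights

print(f"FP16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
print(f"Reduction:     {fp16_gib / int4_gib:.1f}x")
```

The 4-bit figure comes out above the 24 GB on a single RTX 4090, which is why 70B-class models usually need more than one such card or some CPU offloading even after quantization.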