HQQ (Half-Quadratic Quantization) quantizes models to 4-, 3-, or 2-bit precision without calibration data, so you can compress a model in minutes with no sample dataset. The main advantages are speed and flexibility: GPTQ and AWQ need calibration data and can take hours on large models, whereas HQQ works from the weights alone and supports multiple inference backends, including Marlin, TorchAO, and BitBLAS, for different hardware. It integrates natively with Hugging Face Transformers and vLLM, and you can fine-tune the quantized models with LoRA if needed. The tradeoff is that calibration-based methods like AWQ will generally give you better accuracy for production serving. If you're experimenting with extreme quantization or need to compress models quickly without datasets, HQQ is the pragmatic choice.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill hqq-quantization
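
To make "calibration-free" concrete, here is a minimal pure-Python sketch of group-wise min/max quantization, which derives its scale and zero-point from the weights alone. It is not the HQQ algorithm or the HQQ library API: HQQ itself goes further by optimizing the quantization parameters with a half-quadratic solver, which this sketch does not attempt. The function names (`quantize_group`, `dequantize_group`, `quantize`, `dequantize`) and the sample weights are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: plain round-to-nearest group quantization.
# All names here are hypothetical; this is not the HQQ library API.

def quantize_group(weights, nbits=4):
    """Map one group of floats to integer codes in [0, 2**nbits - 1]."""
    qmax = 2 ** nbits - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when the group is constant (zero range).
    scale = (w_max - w_min) / qmax or 1.0
    zero = -w_min / scale
    codes = [min(qmax, max(0, round(w / scale + zero))) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Reconstruct approximate floats from integer codes."""
    return [(q - zero) * scale for q in codes]


def quantize(weights, nbits=4, group_size=4):
    """Quantize a flat list of weights, one (codes, scale, zero) per group."""
    return [
        quantize_group(weights[i:i + group_size], nbits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Concatenate the reconstructed values of every group."""
    restored = []
    for codes, scale, zero in groups:
        restored.extend(dequantize_group(codes, scale, zero))
    return restored


# Usage: only the weights are needed, no sample inputs or activations.
weights = [0.12, -0.53, 0.98, 0.07, -1.40, 0.33, 0.81, -0.26]
groups = quantize(weights, nbits=4, group_size=4)
restored = dequantize(groups)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("max reconstruction error:", max_err)
```

Each group stores its own scale and zero-point, so the reconstruction error of any weight is bounded by half of that group's scale. Smaller groups tighten the bound at the cost of storing more metadata, which is the same tradeoff a group-size setting controls in real low-bit quantizers.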