Meta's 7-8B parameter model built specifically for moderating LLM inputs and outputs across six safety categories: violence, sexual content, weapons, substances, self-harm, and criminal planning. You run prompts or conversations through it and get back "safe" or "unsafe" plus a category code. Ships with transformers but you'll want vLLM for production since it cuts latency from 500ms to 50ms. The 94-95% accuracy is solid but you'll hit false positives, so consider adding probability thresholds. Needs GPU and HuggingFace login. If you're building a production LLM app and need more than OpenAI's basic moderation endpoint, this gives you granular control at the cost of hosting your own inference.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill llamaguard