A safety middleware that sits between your users and any LLM, scanning prompts for self-harm ideation and criminal intent before they reach the model. It exposes three MCP tools: check_message_safety for real-time scanning, get_session_risk for per-user risk scoring, and list_recent_escalations for audit trails. It runs as a standalone reverse proxy, a Python library, or a stdio MCP server. Scanning uses a three-stage cascade: Stage 1 applies regex and keyword heuristics, Stage 2 escalates to semantic embeddings (sentence-transformers), and Stage 3 calls LlamaGuard or OpenAI's moderation API when the earlier stages are ambiguous. Flagged messages are blocked, and the user receives empathetic crisis resources instead of a model response. Bearer token auth is supported for exposing the HTTP MCP endpoint beyond localhost. Install the ml extra to enable Stage 2 embeddings, and set GROQ_API_KEY to enable the reasoning layer.
claude mcp add --transport stdio vishisht16-humane-proxy uvx humane-proxy
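
The three-stage cascade described above can be illustrated with a minimal sketch. This is not humane-proxy's actual code or API: the names check_message, Verdict, embed_score, and classify, the two regex patterns, and the 0.3 / 0.8 thresholds are all assumptions chosen for illustration. The Stage 2 and Stage 3 callables stand in for an embedding similarity scorer and a LlamaGuard-style classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical Stage 1 patterns; the project's real rule set is not shown here.
HIGH_RISK = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bhurt myself\b", r"\bend my life\b")
]


@dataclass
class Verdict:
    flagged: bool   # True if the message should be blocked
    stage: int      # which stage produced the decision (1, 2, or 3)
    score: float    # score from the last scoring stage that ran


def stage1_score(text: str) -> float:
    """Cheap heuristic: 1.0 on any pattern hit, otherwise 0.0."""
    return 1.0 if any(p.search(text) for p in HIGH_RISK) else 0.0


def check_message(
    text: str,
    embed_score: Optional[Callable[[str], float]] = None,  # assumed Stage 2 scorer
    classify: Optional[Callable[[str], bool]] = None,      # assumed Stage 3 classifier
    low: float = 0.3,    # assumed threshold: below this, Stage 2 clears the message
    high: float = 0.8,   # assumed threshold: at or above this, the message is flagged
) -> Verdict:
    # Stage 1: regex and keyword heuristics decide the obvious cases.
    s1 = stage1_score(text)
    if s1 >= high:
        return Verdict(True, 1, s1)

    # Stage 2: semantic score, only if a scorer is configured.
    if embed_score is None:
        return Verdict(False, 1, s1)
    s2 = embed_score(text)
    if s2 >= high:
        return Verdict(True, 2, s2)
    if s2 < low:
        return Verdict(False, 2, s2)

    # Stage 3: the score is in the ambiguous band, so defer to the classifier.
    # With no classifier configured, this sketch lets the message through.
    if classify is None:
        return Verdict(False, 2, s2)
    return Verdict(classify(text), 3, s2)
```

Calling check_message("hello", embed_score=lambda t: 0.5, classify=lambda t: True) takes the ambiguous path and returns a Stage 3 verdict. Each later stage runs only when the cheaper stage before it cannot decide, which is the point of a cascade.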