Humane Proxy

Summary

A safety middleware that sits between your users and any LLM, scanning prompts for self-harm ideation and criminal intent before they reach the model. Exposes three MCP tools: check_message_safety for real-time scanning, get_session_risk for per-user scoring, and list_recent_escalations for audit trails. Works as a standalone reverse proxy, Python library, or stdio MCP server. Uses a three-stage cascade that starts with regex and keyword heuristics, escalates to semantic embeddings (sentence-transformers), and finally falls back to LlamaGuard or OpenAI's moderation API when a message is still ambiguous. Blocks flagged messages and returns empathetic crisis resources instead of letting them through. Supports bearer token auth when exposing the HTTP MCP endpoint beyond localhost. Install the ml extra for Stage 2 embeddings or provide GROQ_API_KEY for the reasoning layer.

Lightweight, plug-and-play AI safety middleware that protects humans.

HumaneProxy sits between your users and any LLM. When someone expresses self-harm ideation or criminal intent, it intercepts the message, alerts you through your preferred channels, and responds with care — before the LLM ever sees it.

What it does

User message → HumaneProxy → (safe?) → Upstream LLM → Response
                    ↓
              (self_harm or criminal_intent?)
                    ↓
              Empathetic care response  +  Operator alert

Self-harm detected → Blocked with international crisis resources. Operator notified.
Criminal intent detected → Blocked or flagged. Operator notified.
Safe → Forwarded to your LLM transparently.

Jailbreaks and prompt injections are deliberately not the concern of this tool — we focus exclusively on protecting human lives.

Quick Start

pip install humane-proxy

# Scaffold config in your project directory
humane-proxy init

# Start the reverse proxy server (point it at your upstream LLM)
export LLM_API_KEY=sk-...
export LLM_API_URL=https://api.your-llm.com/v1/chat/completions
humane-proxy start

As a Python library

from humane_proxy import HumaneProxy

proxy = HumaneProxy()

result = proxy.check("I want to end my life", session_id="user-42")
# → {"safe": False, "category": "self_harm", "score": 1.0, "triggers": [...]}
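One way to use the result shape shown above is to gate the upstream call on it. The sketch below is an illustration only: `guarded_reply`, `call_llm`, `fake_check`, and `CARE_MESSAGE` are hypothetical names, and the checker is passed in as a plain callable so the snippet runs without the package installed. With the library, `proxy.check` would presumably fill that role.

```python
from typing import Any, Callable, Dict

# Hypothetical placeholder text; the real care response ships with the package.
CARE_MESSAGE = "You are not alone. Please reach out to a local crisis line."


def guarded_reply(
    check: Callable[..., Dict[str, Any]],
    call_llm: Callable[[str], str],
    message: str,
    session_id: str,
) -> str:
    """Forward `message` to the LLM only if the checker marks it safe."""
    result = check(message, session_id=session_id)
    # Fail closed: a result without a "safe" key is treated as unsafe.
    if result.get("safe", False):
        return call_llm(message)
    # Flagged: the LLM never sees the message.
    return CARE_MESSAGE


def fake_check(message: str, session_id: str) -> Dict[str, Any]:
    """Stand-in checker that mimics only the documented result keys."""
    flagged = "end my life" in message.lower()
    return {
        "safe": not flagged,
        "category": "self_harm" if flagged else None,
        "score": 1.0 if flagged else 0.0,
        "triggers": [],
    }
```

Treating a missing `safe` key as unsafe is a design choice of this sketch: a malformed result blocks the message rather than letting it through.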

As an MCP server (Claude Desktop, Cursor, any agent)

{
  "mcpServers": {
    "humane-proxy": {
      "command": "uvx",
      "args": ["--from", "humane-proxy[mcp]", "humane-proxy", "mcp-serve"]
    }
  }
}

This exposes 3 tools to your AI agent: check_message_safety, get_session_risk, and list_recent_escalations.

How it works

Every message runs through up to 3 cascading stages — each catches what the previous one can't, and clear-cut cases exit early:

Stage	Method	Latency	Requires
1 — Heuristics	Keywords + intent patterns with span-aware false-positive reducers	< 1 ms	Nothing (always on)
2 — Semantic embeddings	Cosine similarity vs. curated anchor sentences, ambiguity dampening	~5-100 ms	`[onnx]` or `[ml]` extra
3 — Reasoning LLM	OpenAI Moderation / LlamaGuard / any chat model	~1-3 s	An API key
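The early-exit behaviour of the cascade can be sketched as a loop over stages that stops at the first decisive result. This is a minimal illustration, not the project's implementation: `StageResult`, `run_cascade`, the `decisive` flag, and the three toy stages are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StageResult:
    score: float    # risk score in [0, 1]
    decisive: bool  # True when the stage considers the case clear-cut


# A stage returns None when it is unavailable (missing extra or API key).
Stage = Callable[[str], Optional[StageResult]]


def run_cascade(message: str, stages: List[Stage]) -> Tuple[float, int]:
    """Run stages in order and return (score, number_of_stages_run)."""
    score, ran = 0.0, 0
    for stage in stages:
        result = stage(message)
        if result is None:
            continue  # skip stages that are not installed or configured
        ran += 1
        score = result.score  # a later stage supersedes an earlier one
        if result.decisive:
            break  # clear-cut case: skip the slower stages
    return score, ran


# Toy stages (hypothetical logic, for demonstration only).
def heuristics(message: str) -> Optional[StageResult]:
    text = message.lower()
    if "kill a process" in text:
        return StageResult(0.0, decisive=True)  # technical usage, not harm
    if "end my life" in text:
        return StageResult(1.0, decisive=True)
    return StageResult(0.0, decisive=False)  # no signal: let later stages look


def embeddings_unavailable(message: str) -> Optional[StageResult]:
    return None  # simulates running without the optional extra


def reasoning(message: str) -> Optional[StageResult]:
    risky = "disappeared" in message.lower()
    return StageResult(0.9 if risky else 0.1, decisive=True)


STAGES = [heuristics, embeddings_unavailable, reasoning]
```

With these toy stages, a technical question exits after the first stage, while an indirect phrase falls through to the last stage because the heuristics found nothing decisive.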

Stage 2 catches what keywords miss ("Nobody would notice if I disappeared"); Stage 1's reducers keep "how do I kill a process in Linux" from ever being flagged. On top of the per-message pipeline, a per-session risk trajectory with exponential time-decay detects escalation across a conversation and boosts scores on sudden spikes.
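A per-session trajectory with exponential time-decay can be sketched as follows. The half-life, the spike threshold, the boost size, and the function names are all hypothetical values chosen for the example; the project's actual calibration is described in its pipeline documentation.

```python
import math
from typing import List, Tuple

HALF_LIFE_S = 600.0  # hypothetical: an event's weight halves every 10 minutes
SPIKE_DELTA = 0.5    # hypothetical: rise over decayed risk that counts as a spike
SPIKE_BOOST = 0.1    # hypothetical: extra score added on a spike


def decayed_risk(events: List[Tuple[float, float]], now: float) -> float:
    """Largest past score after weighting each by exp(-lambda * age).

    `events` holds (timestamp_seconds, score) pairs for one session.
    """
    lam = math.log(2) / HALF_LIFE_S
    return max((s * math.exp(-lam * (now - t)) for t, s in events), default=0.0)


def score_with_trajectory(
    events: List[Tuple[float, float]], now: float, new_score: float
) -> Tuple[float, float]:
    """Return (message_score, session_risk) for a newly scored message."""
    prior = decayed_risk(events, now)
    message_score = new_score
    if new_score - prior >= SPIKE_DELTA:
        # Sudden jump relative to the session's recent history.
        message_score = min(1.0, new_score + SPIKE_BOOST)
    return message_score, max(prior, message_score)
```

Under these assumed constants, a score of 0.8 recorded ten minutes ago contributes 0.4 to the current session risk, so old events fade out instead of keeping a session flagged indefinitely.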

Full details: Pipeline documentation.

Benchmarks

Evaluated on two public datasets — SimpleSafetyTests (100 clearly unsafe prompts) for recall, and XSTest (250 safe-but-alarming prompts like "how do I kill a Python process?") for false positives:

Pipeline	Harm detected (SimpleSafetyTests)	False positives (XSTest)
Stage 1 (heuristics)	17%	0.4%
Stage 1 + 2 (+ embeddings)	21%	1.2%
Stage 1 + 2 + 3 (full cascade)	92%	1.2%

Turning on the free reasoning stage lifts recall to 92% at no cost to the false-positive rate. Fully reproducible with the shipped tooling — methodology, machine specs, and per-stage latency in BENCHMARKS.md.
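To make the percentages concrete, the snippet below converts them back into prompt counts. The counts are an assumption: they are back-calculated from the published percentages and the dataset sizes stated above, not read from the benchmark output.

```python
from typing import Dict, Tuple

SIMPLE_SAFETY_TESTS_SIZE = 100  # clearly unsafe prompts
XSTEST_SAFE_SIZE = 250          # safe-but-alarming prompts

# (harmful prompts detected, safe prompts wrongly flagged) as raw counts.
# Assumption: derived from the table's percentages, e.g. 0.4% of 250 = 1.
COUNTS: Dict[str, Tuple[int, int]] = {
    "Stage 1": (17, 1),
    "Stage 1 + 2": (21, 3),
    "Stage 1 + 2 + 3": (92, 3),
}


def percent(count: int, total: int) -> float:
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def summarize(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[float, float]]:
    """Map each pipeline to (recall %, false-positive %)."""
    return {
        name: (
            percent(detected, SIMPLE_SAFETY_TESTS_SIZE),
            percent(false_pos, XSTEST_SAFE_SIZE),
        )
        for name, (detected, false_pos) in counts.items()
    }
```

Read this way, a 1.2% false-positive rate corresponds to 3 of the 250 safe prompts being flagged, and adding Stage 3 does not change that number.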

When something is flagged

Self-harm → the user receives an empathetic response with crisis helplines for 10+ countries (US 988, India iCall/Vandrevala, UK Samaritans, and more) — or your LLM answers with an injected care-context system prompt; your choice.
Operators are alerted via Slack, Discord, PagerDuty, Teams, or SMTP email — rate-limited per session so a crisis doesn't become alert spam, while every event is still persisted to the audit log.
Privacy by default — raw message text is never stored, only SHA-256 hashes; DELETE /admin/sessions/{id} implements the right to erasure end-to-end.
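The combination of hash-only storage, per-session alert rate limiting, and erasure can be sketched in a few lines. This is an illustration under stated assumptions: `EscalationLog`, its methods, the record fields, and the five-minute cooldown are hypothetical and do not describe the package's storage layer.

```python
import hashlib
import time
from typing import Callable, Dict, List

ALERT_COOLDOWN_S = 300.0  # hypothetical: one alert per session per 5 minutes


class EscalationLog:
    """Persists every event as a hash; alerts at most once per cooldown."""

    def __init__(
        self,
        send_alert: Callable[[str, str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send_alert = send_alert
        self._clock = clock
        self._last_alert: Dict[str, float] = {}
        self.records: List[dict] = []

    def record(self, session_id: str, message: str, category: str) -> bool:
        """Store the event; return True if an operator alert was sent."""
        now = self._clock()
        # Only the SHA-256 digest is kept, never the raw text.
        digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
        self.records.append(
            {"session_id": session_id, "sha256": digest,
             "category": category, "at": now}
        )
        last = self._last_alert.get(session_id)
        if last is not None and now - last < ALERT_COOLDOWN_S:
            return False  # event is logged, alert is suppressed
        self._last_alert[session_id] = now
        self._send_alert(session_id, category)
        return True

    def erase_session(self, session_id: str) -> int:
        """Delete all records for a session; return how many were removed."""
        before = len(self.records)
        self.records = [r for r in self.records if r["session_id"] != session_id]
        self._last_alert.pop(session_id, None)
        return before - len(self.records)
```

The injectable `clock` is there so the cooldown can be tested without sleeping.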

Available On

Platform	Link	Status
PyPI	humane-proxy
Glama MCP Registry	Humane-Proxy	AAA Rating
MCP Marketplace	humane-proxy	Low Risk 10.0

Installation Extras

Extra	What it adds
(none)	Stage 1 heuristics + SQLite storage — zero dependencies beyond FastAPI
`onnx`	Stage 2 embeddings via ONNX Runtime — no PyTorch, ~2 GB lighter
`ml`	Stage 2 embeddings via sentence-transformers (PyTorch)
`mcp`	MCP server for AI agents
`redis` / `postgres`	Alternative storage backends
`llamaindex` / `crewai` / `autogen` / `langchain`	Native agent-framework tools
`telemetry`	OpenTelemetry distributed tracing
`perf`	orjson fast-path JSON serialization
`all`	Everything above (may cause conflicting dependencies)

pip install "humane-proxy[onnx,mcp]"   # a solid production baseline (quoted so shells like zsh don't expand the brackets)

Documentation

Guide	Covers
Pipeline	3-stage cascade, score calibration, care response modes, risk trajectory & time-decay, multi-worker Redis
Benchmarks	SimpleSafetyTests & XSTest results, methodology, latency, machine specs
Configuration	Full YAML/env reference, webhooks, storage backends, privacy
Integrations	MCP server, LlamaIndex, CrewAI, AutoGen, LangChain, Node.js/TypeScript
Deployment	CLI reference, admin API, GitHub Action safety gate, OpenTelemetry
Compliance	HIPAA, GDPR, and SOC 2 readiness assessment
Security policy	Supported versions, vulnerability disclosure

License

Apache 2.0. See LICENSE.

See NOTICE for full attribution information.

Built for a safer world.

Configuration

OPENAI_API_KEY (secret): OpenAI API key for Stage 3 reasoning (optional)

GROQ_API_KEY (secret): Groq API key for LlamaGuard Stage 3 (optional)
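Because both keys are optional, an application may want to confirm at startup which ones are present. The helper below is a hypothetical application-side check, not part of the package; it reports only whether each variable is set and never returns the secret values.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the configuration list above.
STAGE3_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY")


def stage3_keys_present(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report which Stage 3 keys are set to a non-empty value."""
    return {name: bool(env.get(name, "").strip()) for name in STAGE3_KEYS}


def stage3_possible(env: Mapping[str, str] = os.environ) -> bool:
    """True if at least one Stage 3 key is available."""
    return any(stage3_keys_present(env).values())
```

Passing the environment as a parameter keeps the check easy to test and avoids reading real secrets in unit tests.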
