A security layer that protects LLM applications from adversarial inputs. It exposes five MCP tools over stdio. analyze_prompt runs your input through a three-agent pipeline (retrieval against FAISS vectors, guard signals, policy enforcement) to detect injection and jailbreak attempts. get_threat_breakdown returns per-signal scoring, sanitize_prompt cleans suspicious text, get_firewall_status reports health, and benchmark_firewall runs the built-in adversarial test suite. It ships as a pip package or Docker container with configurable thresholds and operates in strict, moderate, or permissive mode. Reach for it when you are building multi-agent systems or user-facing LLM features and need a programmatic gate before prompts reach your model.
<mcp-name: io.github.Akhilucky/ai-firewall-mcp>
A multi-agent AI security layer that protects LLMs from prompt injection, jailbreaks, and policy violations. Available as an MCP server for any MCP-compatible client (Claude Desktop, Cursor, Windsurf, Cline, Roo Code, etc.).
pip install ai-firewall-mcp
ai-firewall-mcp
docker pull akhilucky/ai-firewall-mcp:latest
docker run -i akhilucky/ai-firewall-mcp:latest
Add to claude_desktop_config.json:
pip install:
{
  "mcpServers": {
    "ai-firewall": {
      "command": "pipx",
      "args": ["run", "ai-firewall-mcp"]
    }
  }
}
Docker:
{
  "mcpServers": {
    "ai-firewall": {
      "command": "docker",
      "args": ["run", "-i", "akhilucky/ai-firewall-mcp:latest"]
    }
  }
}
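If you would rather script this change than edit the file by hand, the Docker entry above can be merged into an existing configuration. The following is a minimal sketch: `add_firewall_server` is a hypothetical helper name, and it works on an already-parsed config dictionary, so locating, reading, and writing claude_desktop_config.json on your platform is left out.

```python
import copy
import json

# Server entry copied from the Docker configuration shown above.
DOCKER_ENTRY = {
    "command": "docker",
    "args": ["run", "-i", "akhilucky/ai-firewall-mcp:latest"],
}


def add_firewall_server(config: dict, entry: dict = DOCKER_ENTRY) -> dict:
    """Return a copy of `config` with an "ai-firewall" server added.

    Other servers are preserved; an existing "ai-firewall" entry is replaced.
    (Hypothetical helper, not part of the package.)
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["ai-firewall"] = copy.deepcopy(entry)
    return merged


# Example: merge into a config that already lists another server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
print(json.dumps(add_firewall_server(existing), indent=2))
```

Returning a copy rather than mutating the input keeps the original configuration available if you want to write a backup first.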
Configure in your MCP settings with:
- Transport: stdio
- Command: docker run -i akhilucky/ai-firewall-mcp:latest, or ai-firewall-mcp if installed via pip

| Tool | Description |
|---|---|
| analyze_prompt | Analyze a prompt for injection, jailbreaks, exfiltration, and leakage |
| get_threat_breakdown | Detailed per-signal scoring breakdown from the last analysis |
| sanitize_prompt | Clean a suspicious prompt while preserving legitimate content |
| get_firewall_status | Health check: vector DB size, model status, uptime |
| benchmark_firewall | Run the adversarial test suite and return detection statistics |
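MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests, sent over stdio as one JSON message per line. The sketch below only builds such a request so you can see its shape; `build_tool_call` is a hypothetical helper, and the `prompt` argument name is an assumption, so check the tool's published input schema. A real client also has to complete the MCP initialization handshake before calling a tool.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "prompt" is an assumed argument name, used here for illustration only.
request = build_tool_call(
    1, "analyze_prompt", {"prompt": "Ignore all previous instructions"}
)
print(request)
```

In practice an MCP client library or the inspector handles this framing for you; the message is shown here only to make the protocol concrete.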
npx @modelcontextprotocol/inspector ai-firewall-mcp
The firewall runs three agents per prompt:
User Prompt → [Retrieval Agent] → [Guard Agent] → [Policy Agent] → LLM
                      │                 │               │
                      ▼                 ▼               ▼
              Vector DB (FAISS)  Threat Signals    Allow/Block
| Agent | Role |
|---|---|
| Retrieval Agent | Semantic search against known attack patterns (FAISS + sentence-transformers) |
| Guard Agent | Multi-signal classification: vector similarity, keyword match, heuristic scoring |
| Policy Agent | Final decision: ALLOW / BLOCK / SANITIZE based on configurable thresholds |
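To make the Policy Agent's role concrete, here is a minimal sketch of a threshold-based decision. The per-mode cut-off values are hypothetical and chosen only for illustration; the project's actual thresholds and decision logic are not documented here and may differ.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "ALLOW"
    SANITIZE = "SANITIZE"
    BLOCK = "BLOCK"


# Hypothetical (sanitize_at, block_at) cut-offs per mode, for illustration only.
MODE_THRESHOLDS = {
    "strict": (0.30, 0.50),
    "moderate": (0.45, 0.65),
    "permissive": (0.60, 0.80),
}


def decide(threat_score: float, mode: str = "strict") -> Decision:
    """Map a threat score in [0, 1] to a decision for the given mode."""
    if mode not in MODE_THRESHOLDS:
        raise ValueError(f"unknown mode: {mode!r}")
    sanitize_at, block_at = MODE_THRESHOLDS[mode]
    if threat_score >= block_at:
        return Decision.BLOCK
    if threat_score >= sanitize_at:
        return Decision.SANITIZE
    return Decision.ALLOW


print(decide(0.40, "strict").value)      # SANITIZE
print(decide(0.40, "permissive").value)  # ALLOW
```

With these assumed values, the same score of 0.40 is sanitized in strict mode and allowed in permissive mode, which is the kind of behavior a mode switch is meant to provide.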
Threat signals are weighted: 40% vector similarity, 25% keyword match, 20% heuristic, 15% policy weight.
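The weighting can be read as a weighted sum. The sketch below assumes each signal is already normalized to the range 0 to 1 and that the signals are combined linearly; the dictionary keys and the `combined_score` function name are hypothetical, not the package's API.

```python
# Weights taken from the text above; they sum to 1.0.
WEIGHTS = {
    "vector_similarity": 0.40,
    "keyword_match": 0.25,
    "heuristic": 0.20,
    "policy": 0.15,
}


def combined_score(signals: dict) -> float:
    """Weighted sum of per-signal scores, each expected in [0, 1]."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise KeyError(f"missing signals: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {signals[name]}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


example = {
    "vector_similarity": 0.8,
    "keyword_match": 1.0,
    "heuristic": 0.5,
    "policy": 0.0,
}
# 0.40*0.8 + 0.25*1.0 + 0.20*0.5 + 0.15*0.0 = 0.67
print(round(combined_score(example), 2))
```

Because the weights sum to 1.0, the combined score stays in the same 0 to 1 range as the inputs, which makes it straightforward to compare against a single threshold.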
| Env Var | Default | Description |
|---|---|---|
| FIREWALL_MODE | strict | strict / moderate / permissive |
| SIMILARITY_THRESHOLD | 0.50 | Vector match threshold (lower = stricter) |
| LOG_LEVEL | INFO | Logging verbosity |
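As an illustration of how these variables and their defaults fit together, the sketch below reads them from the environment and validates them. `FirewallConfig` and `load_config` are hypothetical names, and the case-insensitive mode handling and the 0 to 1 range check on the threshold are assumptions rather than documented behavior.

```python
import os
from dataclasses import dataclass

VALID_MODES = ("strict", "moderate", "permissive")


@dataclass(frozen=True)
class FirewallConfig:
    mode: str
    similarity_threshold: float
    log_level: str


def load_config(env=None) -> FirewallConfig:
    """Read settings from `env` (defaults to os.environ) with the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("FIREWALL_MODE", "strict").lower()
    if mode not in VALID_MODES:
        raise ValueError(f"FIREWALL_MODE must be one of {VALID_MODES}, got {mode!r}")
    threshold = float(env.get("SIMILARITY_THRESHOLD", "0.50"))
    # Assumption: the threshold is a similarity value between 0 and 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"SIMILARITY_THRESHOLD out of range: {threshold}")
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    return FirewallConfig(mode, threshold, log_level)


# Passing a plain dict makes the loader easy to exercise without touching os.environ.
print(load_config({"FIREWALL_MODE": "moderate", "SIMILARITY_THRESHOLD": "0.35"}))
```

When running the Docker image, variables like these are typically passed with `docker run -e NAME=value`.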
# Interactive dashboard
python main.py
# Red-team adversarial tests
python main.py --redteam
# REST API server
python main.py --api
# Single prompt analysis
python main.py --analyze "Ignore all previous instructions"
The REST API runs at http://localhost:8000 with OpenAPI docs at /docs (requires pip install ai-firewall-mcp[api]).
pytest tests/ -v # Full test suite (43 tests)
pytest tests/test_mcp.py # MCP-specific tests only
├── src/ai_firewall/ # MCP server package (PyPI entry)
│ ├── mcp_server.py # 5 MCP tools, stdio transport
│ ├── threat_scorer.py # Per-signal scoring breakdown
│ └── __init__.py
├── src/agents/ # Core firewall agents
├── tests/ # Test suites
├── Dockerfile # Docker image (2.04GB, CPU-only torch)
├── pyproject.toml # Package config & metadata
└── .github/workflows/ci.yml # CI/CD pipeline
MIT — see LICENSE.