Symdex generates 20-byte semantic fingerprints called "Cyphers" for every Python function in your codebase, enabling sub-second intent-based search without reading thousands of lines. Each Cypher encodes domain, action, object, and execution pattern (like SEC:VAL_TOKEN--ASY for async token validation) in a structured format that compresses function semantics at 100:1 ratios. The server exposes search operations with tiered pattern matching, multi-lane retrieval across exact Cyphers, wildcards, tags, and function names, plus call-graph analysis that understands Celery task invocations. Reach for this when you need AI agents or developers to navigate large Python codebases without burning tokens on irrelevant code, or when auditing specific security domains across sprawling projects.

Symdex-100 - your AI companion for code exploration
Semantic fingerprints for 100x faster Python code search.
Symdex-100 generates compact, structured metadata ("Cyphers") for every function in your Python codebase. Each Cypher is a 20-byte semantic fingerprint that enables sub-second, intent-based code search for developers and AI agents — without reading thousands of lines of code.
# Your Python function → Indexed automatically
async def validate_user_token(token: str, user_id: int) -> bool:
    """Verify JWT token for a specific user."""
    # ... implementation ...
# Natural language search → Sub-second results
$ symdex search "where do we validate user tokens"
──────────────────────────────────────────────────────────────────────────────
SYMDEX — 1 result in 0.0823 seconds
──────────────────────────────────────────────────────────────────────────────
#1 validate_user_token (Python)
────────────────────────────────────────────────────────────────────────────
File : /project/auth/tokens.py
Lines : 42–67
Cypher : SEC:VAL_TOKEN--ASY
Score : 24.5
42 │ async def validate_user_token(token: str, user_id: int) -> bool:
43 │ """Verify JWT token for a specific user."""
44 │ if not token:
45 │ return False
Traditional code search methods scale poorly on large codebases:
| Approach | Limitation | Token Cost (AI agents) |
|---|---|---|
| grep | Keyword noise — finds "token" in comments, strings, variable names | 3,000+ tokens (read all matches) |
| Full-text search | No semantic understanding — can't distinguish intent | 5,000+ tokens (read 10 files) |
| Embeddings | Opaque, expensive, query-time overhead | 2,000+ tokens (re-rank results) |
| AST/LSP | Limited to structural queries (class/function names) | N/A (doesn't understand "what validates X") |
Result: Developers waste time reading irrelevant code. AI agents burn tokens on noise.
Symdex-100 solves this with Cypher-100, a structured metadata format that encodes function semantics in 20 bytes:
Each Cypher follows a strict four-slot hierarchy designed for both machine filtering and human readability:
┌─────────────────────────────────────────────────────────────┐
│ │
│ DOM : ACT _ OBJ -- PAT │
│ │ │ │ │ │
│ Domain Action Object Pattern │
│ │
│ Where does What does What is How does │
│ this live? it do? the target? it run? │
│ │
└─────────────────────────────────────────────────────────────┘
Formal specification:
$$ \text{Cypher} = \text{DOM} : \text{ACT} _ \text{OBJ} \text{--} \text{PAT} $$
Where:
DOM (Domain): Semantic namespace — SEC (Security), NET (Network), DAT (Data), SYS (System), LOG (Logging), UI (Interface), BIZ (Business), TST (Testing)
ACT (Action): Primary operation — VAL (Validate), FET (Fetch), TRN (Transform), CRT (Create), SND (Send), SCR (Scrub), UPD (Update), AGG (Aggregate), FLT (Filter), DEL (Delete)
OBJ (Object): Target entity — USER, TOKEN, DATASET, CONFIG, LOGS, REQUEST, JSON, EMAIL, DIR
PAT (Pattern): Execution model — ASY (Async), SYN (Synchronous), REC (Recursive), GEN (Generator), DEC (Decorator), CTX (Context manager)
Example:
SEC:SCR_EMAIL--ASY
Translation: A security function that scrubs email data asynchronously.
Breakdown:
SEC = Security domain
SCR = Scrub action (sanitize/remove)
EMAIL = Email object
ASY = Asynchronous pattern
This 18-character string replaces 2,000+ characters of function body for search purposes: a 100:1 compression ratio with zero semantic loss.
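The four-slot layout can be checked mechanically. The snippet below is a minimal sketch, not Symdex's own parser: the `Cypher` dataclass, `parse_cypher`, and the slot widths in the regular expression are assumptions inferred from the codes listed above.

```python
import re
from dataclasses import dataclass

# Regex for DOM:ACT_OBJ--PAT. The slot widths are assumptions based on the
# codes shown in this README (2-3 letter domain, 3-letter action and pattern).
CYPHER_RE = re.compile(
    r"^(?P<dom>[A-Z]{2,3}):(?P<act>[A-Z]{3})_(?P<obj>[A-Z0-9]+)--(?P<pat>[A-Z]{3})$"
)


@dataclass(frozen=True)
class Cypher:
    """Hypothetical container for the four Cypher slots."""
    dom: str
    act: str
    obj: str
    pat: str

    def __str__(self) -> str:
        return f"{self.dom}:{self.act}_{self.obj}--{self.pat}"


def parse_cypher(text: str) -> Cypher:
    """Split a Cypher string into its slots, or raise ValueError."""
    match = CYPHER_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid Cypher: {text!r}")
    return Cypher(**match.groupdict())


cypher = parse_cypher("SEC:SCR_EMAIL--ASY")
print(cypher.dom, cypher.act, cypher.obj, cypher.pat)
```

Parsing into named slots makes it straightforward to filter or group by domain or action without string slicing at each call site.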
Problem: grep reads every file, full-text indexes scan every function.
Solution: Symdex searches 20-byte Cyphers in a SQLite B-tree index.
| Metric | Grep | Symdex (DB only) | Improvement |
|---|---|---|---|
| Data scanned per query | ~50MB (full codebase) | ~100KB (index) | 500x less I/O |
| Index lookup (5,000 functions) | 800ms | 8ms | 100x faster |
| Index size | N/A (no index) | 2MB | 25:1 compression |
Technical details:
B-tree index on (cypher, tags, function_name)
Result: Sub-second index lookup on 10,000+ function codebases.
Search & call-graph enhancements: Use directory_scope to restrict results to a subtree (path = index root). Call-graph includes Celery .delay()/.apply_async() as task invocations. Filter or group results by Cypher domain/action (domain_filter, action_filter, group_by).
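To illustrate how Celery invocations can be treated as call-graph edges, here is a rough sketch built on the standard `ast` module. It is not Symdex's implementation: `extract_call_edges`, the edge tuple shape, and the sample source are hypothetical, and only the simple `task.delay(...)` / `task.apply_async(...)` forms are recognised.

```python
import ast

# Method names that Celery uses to enqueue a task.
TASK_METHODS = {"delay", "apply_async"}

SAMPLE = '''
def checkout(order):
    total = compute_total(order)
    send_receipt.delay(order.id)
    rebuild_index.apply_async(args=[order.id])
    return total
'''


def extract_call_edges(source: str) -> list[tuple[str, str, str]]:
    """Return (caller, callee, kind) edges; kind is "call" or "task"."""
    edges = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            target = node.func
            if (isinstance(target, ast.Attribute)
                    and target.attr in TASK_METHODS
                    and isinstance(target.value, ast.Name)):
                # some_task.delay(...) counts as an edge to some_task
                edges.append((func.name, target.value.id, "task"))
            elif isinstance(target, ast.Name):
                # plain function call: helper(...)
                edges.append((func.name, target.id, "call"))
    return edges


for edge in extract_call_edges(SAMPLE):
    print(edge)
```

Without the task-method rule, `send_receipt.delay(...)` would look like a call to a method named `delay`, and the link from `checkout` to the task would be lost.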
Problem: Single search strategies miss valid results (e.g., SYS:DEL_DIR won't find DAT:DEL_DIR if query specifies system domain), or return too many low-quality hits when the Cypher is too broad.
Solution: Tiered Cypher patterns plus always-on multi-lane search.
Tiered translation (natural-language queries): The LLM returns three Cypher patterns — tight (no wildcards), medium (minimal wildcards), broad (fallback). The engine queries the tight pattern first; if the candidate pool is too small, it runs the medium then broad pattern and merges (deduplicated). Results are scored against the tight pattern so precise matches rank highest.
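The tight-to-broad fallback can be sketched in a few lines. This is an illustration under assumptions, not the engine's code: `tiered_lookup`, the in-memory index, and the `min_candidates` threshold are hypothetical, and `fnmatchcase` stands in for the real wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory index: (function_name, cypher) pairs.
INDEX = [
    ("validate_user_token", "SEC:VAL_TOKEN--ASY"),
    ("validate_api_token", "SEC:VAL_TOKEN--SYN"),
    ("validate_password", "SEC:VAL_PASS--SYN"),
    ("fetch_user", "DAT:FET_USER--SYN"),
]


def tiered_lookup(patterns, index, min_candidates=3):
    """Try patterns from tight to broad until enough candidates are found.

    Results are merged in order and deduplicated by function name.
    """
    found = []
    for pattern in patterns:
        for name, cypher in index:
            if fnmatchcase(cypher, pattern) and name not in found:
                found.append(name)
        if len(found) >= min_candidates:
            break  # pool is large enough, skip the broader patterns
    return found


tiers = ["SEC:VAL_TOKEN--ASY", "SEC:VAL_TOKEN--*", "*:VAL_*--*"]
print(tiered_lookup(tiers, INDEX, min_candidates=2))
```

Because the tight pattern runs first, its matches always appear at the front of the merged pool, before any scoring is applied.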
Multi-lane retrieval (per pattern):
Query: "delete directory" → Tiered: [SYS:DEL_DIR--SYN, SYS:DEL_DIR--*, *:DEL_*--*]
↓
┌────────────────────────────────────────────────────────────┐
│ LANE 1: Exact Cypher │ SYS:DEL_DIR--SYN │
│ LANE 2: Domain wildcard │ *:DEL_DIR--SYN │
│ LANE 3: Action-only │ *:DEL_*--* │
│ LANE 4: Tag keywords │ delete, directory (capped) │
│ LANE 5: Function name │ _delete_directory_tree (capped)│
└────────────────────────────────────────────────────────────┘
↓
Merge + Cap candidates (default 200) + Score against tight pattern
↓
Ranked Results (exact match + domain/action/object = highest score)
Scoring: ACT (action) and OBJ (object) dominate — they encode what the function does and on what. Domain and pattern follow. Wrong domain (e.g. result is TST when query asked for BIZ) is penalized.
$$ \text{score} = 10[\text{exact}] + 6[\text{action}] + 5[\text{object}] + 4[\text{domain}] + 2[\text{pattern}] + 3[\text{name}] + 1.5[\text{tags}] - 3[\text{domain mismatch}] $$
Where $[\text{x}]$ is 1 if matched, 0 otherwise (with partial matching for names and object similarity).
Result: High precision from tiered + tight-pattern scoring; cross-domain recall when needed; fewer irrelevant results (candidate cap, Lane 3 skip, smaller tag/name limits).
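The scoring formula above can be written out as a small function. This is a simplified sketch: `score` and `split_cypher` are hypothetical names, name and tag matches are passed in as booleans, and the partial matching for names and object similarity mentioned above is left out.

```python
# Weights copied from the scoring formula in this section.
WEIGHTS = {
    "exact": 10.0, "action": 6.0, "object": 5.0, "domain": 4.0,
    "pattern": 2.0, "name": 3.0, "tags": 1.5, "domain_mismatch": -3.0,
}


def split_cypher(cypher: str) -> tuple[str, str, str, str]:
    """Split DOM:ACT_OBJ--PAT into (dom, act, obj, pat)."""
    dom, rest = cypher.split(":", 1)
    act_obj, pat = rest.split("--", 1)
    act, obj = act_obj.split("_", 1)
    return dom, act, obj, pat


def score(tight: str, candidate: str,
          name_match: bool = False, tags_match: bool = False) -> float:
    """Score a candidate Cypher against the tight (wildcard-free) pattern."""
    q_dom, q_act, q_obj, q_pat = split_cypher(tight)
    c_dom, c_act, c_obj, c_pat = split_cypher(candidate)
    total = 0.0
    if tight == candidate:
        total += WEIGHTS["exact"]
    if q_act == c_act:
        total += WEIGHTS["action"]
    if q_obj == c_obj:
        total += WEIGHTS["object"]
    if q_dom == c_dom:
        total += WEIGHTS["domain"]
    else:
        total += WEIGHTS["domain_mismatch"]  # wrong domain is penalized
    if q_pat == c_pat:
        total += WEIGHTS["pattern"]
    if name_match:
        total += WEIGHTS["name"]
    if tags_match:
        total += WEIGHTS["tags"]
    return total


print(score("SYS:DEL_DIR--SYN", "SYS:DEL_DIR--SYN"))  # exact match
print(score("SYS:DEL_DIR--SYN", "DAT:DEL_DIR--SYN"))  # same action/object, other domain
```

With these weights a cross-domain hit that shares action, object, and pattern still scores well above an unrelated function, which is what lets the domain-wildcard lane contribute useful results.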
Problem: Agents waste 80-90% of context on reading irrelevant code when exploring large codebases.
Solution: Symdex provides a 50:1 token reduction via semantic search.
Scenario: Agent needs to find "function that validates user login credentials"
| Approach | Process | Tokens |
|---|---|---|
| Read 10 files | Agent guesses likely files → reads all → searches manually | ~5,000 |
| Grep + read | grep "login|credential" → read 20 matches → filter manually | ~3,000 |
| Symdex | search_codebase("validate login credentials") → 1 precise result | ~100 |
Token breakdown (Symdex approach):
Savings: 50x fewer tokens, zero false positives.
Why this matters:
Problem: Keyword searches return false positives (e.g., "token" in variable names, comments, docstrings).
Solution: Semantic fingerprints distinguish intent from mention.
| Query | Grep (keyword) | Symdex (semantic) |
|---|---|---|
| "validate token" | 47 results (includes token = ..., # token expired, TOKEN_KEY) | 3 results (only functions that validate tokens) |
| "delete user" | 89 results (includes # delete user later, user.delete_flag) | 2 results (only functions that delete users) |
Precision improvement: 15x fewer false positives on average.
✅ Use Symdex when:
SEC:*_*--* for security functions, DAT:*_*--* for data processing)
*:*_USER--* for user-related operations)
get_callers ("who calls X?"), get_callees ("what does X call?"), trace_call_chain (recursive walk up or down). No manual grep or file hopping.
❌ Don't use Symdex when:
Adjust context_lines for editing vs. reading:
# Default: 3 lines (quick preview for exploration)
client.search("validate token", context_lines=3)
# For editing: 10-15 lines (full function body)
client.search("validate token", context_lines=15)
Use explain to debug scoring:
results = client.search("validate token", explain=True)
for result in results:
    print(f"Score: {result.score}")
    print(f"Breakdown: {result.explanation}")
    # Example: {'action_match': 6, 'object_match': 5, 'name_matches': {'exact': 1, 'score': 3}}
Auto (default) — Fastest for most queries:
symdex search "validate token"
# Auto selects: LLM translation if available, else keyword fallback
LLM (force semantic) — Best for natural language:
client.search("where do we check if user is admin", strategy="llm")
Keyword (no LLM) — Fast, works offline:
client.search("delete user", strategy="keyword")
# Keyword-based translation: ~5ms vs. LLM: ~200-500ms
Direct (skip translation) — Use Cypher patterns:
client.search("SEC:VAL_*--ASY", strategy="direct")
# Zero translation overhead
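As a rough idea of what a keyword-based translation step might look like, the sketch below maps query words to a wildcard Cypher pattern. The keyword tables and `keyword_to_pattern` are hypothetical and are not taken from Symdex's source; only the action and object codes come from the schema above.

```python
# Hypothetical keyword tables; the codes on the right come from the schema.
ACTION_KEYWORDS = {
    "validate": "VAL", "verify": "VAL", "check": "VAL",
    "fetch": "FET", "get": "FET",
    "delete": "DEL", "remove": "DEL",
    "send": "SND", "create": "CRT", "update": "UPD",
}
OBJECT_KEYWORDS = {
    "token": "TOKEN", "tokens": "TOKEN",
    "user": "USER", "users": "USER",
    "email": "EMAIL", "directory": "DIR", "config": "CONFIG",
}


def keyword_to_pattern(query: str) -> str:
    """Build a *:ACT_OBJ--* pattern from the first recognised keywords."""
    act = obj = "*"
    for word in query.lower().split():
        if act == "*" and word in ACTION_KEYWORDS:
            act = ACTION_KEYWORDS[word]
        if obj == "*" and word in OBJECT_KEYWORDS:
            obj = OBJECT_KEYWORDS[word]
    # Domain and execution pattern stay wildcards: keywords rarely reveal them.
    return f"*:{act}_{obj}--*"


print(keyword_to_pattern("delete user"))
```

The resulting pattern can then be used like any direct Cypher query, which is why this path needs no LLM call.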
Incremental indexing (default):
symdex index ./project
# Only re-processes changed files (SHA256 tracking)
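SHA256 tracking boils down to comparing a stored digest with a fresh one. The following is a minimal sketch of that idea, assuming hashes are kept in a plain dict keyed by path; `file_sha256` and `changed_files` are hypothetical helpers, not Symdex functions.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, stored_hashes: dict) -> dict:
    """Return {path: new_digest} for files that are new or modified.

    stored_hashes maps str(path) to the digest recorded at the last index run.
    """
    changed = {}
    for path in paths:
        digest = file_sha256(Path(path))
        if stored_hashes.get(str(path)) != digest:
            changed[str(path)] = digest
    return changed
```

Only the files returned by `changed_files` would need re-processing; untouched files keep their existing index entries.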
Force re-index (after major refactors):
symdex index ./project --force
Monitor indexing (get summary):
result = client.index("./project")
print(result.summary)
# {'top_files': [{'file': 'auth.py', 'functions': 47}],
# 'domain_distribution': {'SEC': 23, 'DAT': 18, 'NET': 6}}
After indexing, you can query the call graph from the command line:
# Who calls this function?
symdex callers add_cypher_entry
# What does this function call?
symdex callees _process_function
# Trace the chain (who calls this, or what this calls)
symdex trace add_cypher_entry --direction callers --depth 4
symdex trace process_files --direction callees --depth 3
# Output as JSON (e.g. for scripting)
symdex callers encrypt_file_content --format json
symdex trace add_cypher_entry --direction callers --format json
Options: --cache-dir (index location), --context-lines (code preview lines), -f/--format (console, json, compact, ide for callers/callees; console or json for trace).
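Conceptually, a trace is a depth-limited walk over stored call edges. The sketch below shows one way to do it with a breadth-first search; `trace_chain` and the sample edges are hypothetical and only illustrate the idea behind `--direction` and `--depth`.

```python
from collections import deque

# Illustrative (caller, callee) edges; not taken from a real index.
EDGES = [
    ("main", "run_pipeline"),
    ("cli", "run_pipeline"),
    ("run_pipeline", "process_files"),
    ("process_files", "add_cypher_entry"),
]


def trace_chain(edges, start, direction="callers", max_depth=3):
    """Walk the call graph from start; return [(function, depth), ...]."""
    if direction not in ("callers", "callees"):
        raise ValueError("direction must be 'callers' or 'callees'")
    neighbors = {}
    for caller, callee in edges:
        if direction == "callers":
            neighbors.setdefault(callee, []).append(caller)
        else:
            neighbors.setdefault(caller, []).append(callee)
    visited = {start}  # guards against cycles (recursion, mutual calls)
    queue = deque([(start, 0)])
    chain = []
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbors.get(name, []):
            if nxt not in visited:
                visited.add(nxt)
                chain.append((nxt, depth + 1))
                queue.append((nxt, depth + 1))
    return chain


print(trace_chain(EDGES, "add_cypher_entry", direction="callers", max_depth=4))
```

The visited set matters in practice: recursive functions create cycles in the call graph, and without it the walk would not terminate before hitting the depth limit on every path.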
Use context_lines for agent tasks:
// Exploration (default): 3 lines
await searchCodebase({ query: "validate token", context_lines: 3 });
// Editing task: 10+ lines
await searchCodebase({ query: "validate token", context_lines: 15 });
Prefer Symdex over file reading when:
Use grep (or text search) when: You need an exhaustive list of every call site of an exact pattern (e.g. every User.objects.create / get_or_create). Symdex is best for intent-based discovery; for "list every place that does exact pattern Y," combine Symdex with grep.
Example agent workflow:
1. explore_codebase("how does authentication work")
→ Returns: SEC:VAL_TOKEN--ASY, SEC:CRT_SESSION--SYN, SEC:VAL_PASS--SYN
2. Read top result (SEC:VAL_TOKEN) with context_lines=15
3. Edit the function (now you have the right context)
# Published package (once available on PyPI)
pip install symdex-100
# Local development (from source — see "Local Development" below)
pip install -e ".[all]"
# Anthropic (default, recommended)
export ANTHROPIC_API_KEY="sk-ant-..."
# Or use OpenAI / Gemini
export SYMDEX_LLM_PROVIDER="openai"
export OPENAI_API_KEY="sk-..."
Supports Anthropic Claude (default), OpenAI GPT, or Google Gemini.
# Index a project
symdex index ./my-project
# Natural language search
symdex search "where do we validate user passwords"
# Direct Cypher (skip LLM translation)
symdex search "SEC:VAL_PASS--*"
# With pagination
symdex search "async email" -n 20 -p 5
# JSON output (for scripting)
symdex search "delete directory" --format json | jq '.[] | .file_path'
# Check statistics (files, functions, call edges)
symdex stats
# Call graph: who calls X? what does X call? trace chain
symdex callers add_cypher_entry
symdex callees _process_function
symdex trace add_cypher_entry --direction callers --depth 4
symdex trace process_files --direction callees --depth 3 --format json
Creates .symdex/index.db (SQLite). Source files are never modified.
Symdex can be used as a library in your own applications — no CLI needed.
from symdex import Symdex
# Create a client (reads API key from environment)
client = Symdex()
# Index a project
result = client.index("./my-project")
print(f"Indexed {result.functions_indexed} functions in {result.files_scanned} files")
# Search by intent
hits = client.search("validate user tokens", path="./my-project")
for hit in hits:
    print(f" {hit.function_name} @ {hit.file_path}:{hit.line_start} [{hit.cypher}]")
# Search by Cypher pattern (no LLM needed)
hits = client.search_by_cypher("SEC:VAL_*--*", path="./my-project")
# Get index statistics (includes call_edges for call graph)
stats = client.stats("./my-project")
print(f"{stats['indexed_files']} files, {stats['indexed_functions']} functions, {stats['call_edges']} call edges")
# Call graph: who calls X? what does X call? trace execution flow
callers = client.get_callers("encrypt_file_content", path="./my-project")
callees = client.get_callees("process_files", path="./my-project")
chain = client.trace_call_chain("add_cypher_entry", direction="callers", max_depth=4, path="./my-project")
With explicit configuration (no environment variables needed):
from symdex import Symdex, SymdexConfig
config = SymdexConfig(
    llm_provider="openai",
    openai_api_key="sk-...",
    openai_model="gpt-4o-mini",
    max_search_results=10,
    min_search_score=3.0,
)
client = Symdex(config=config)
Async support (for FastAPI, Django async views, etc.):
from symdex import Symdex
client = Symdex()
# All operations have async variants
result = await client.aindex("./my-project")
hits = await client.asearch("validate tokens", path="./my-project")
stats = await client.astats("./my-project")
callers = await client.aget_callers("encrypt_file_content", path="./my-project")
chain = await client.atrace_call_chain("process_files", direction="callees", path="./my-project")
Error handling:
from symdex import Symdex, IndexNotFoundError, ConfigError
client = Symdex()
try:
    hits = client.search("validate user")
except IndexNotFoundError:
    print("Run client.index() first!")
except ConfigError:
    print("Check your API key configuration")
| Code | Domain | Example Functions |
|---|---|---|
| SEC | Security | validate_token, hash_password, encrypt_data |
| DAT | Data | fetch_user, transform_csv, aggregate_metrics |
| NET | Network | send_request, handle_webhook, fetch_api_data |
| SYS | System | delete_directory, check_disk_space, spawn_process |
| LOG | Logging | setup_logger, scrub_sensitive_logs, format_trace |
| UI | Interface | render_template, validate_form, format_output |
| BIZ | Business | calculate_discount, approve_order, check_eligibility |
| TST | Testing | mock_database, assert_response, generate_fixture |
| Code | Action | Typical Use Cases |
|---|---|---|
| VAL | Validate | Input validation, schema checks, token verification |
| FET | Fetch | Database queries, API calls, file reads |
| TRN | Transform | Format conversion, data mapping, serialization |
| CRT | Create | Object instantiation, file creation, record insertion |
| SND | Send | Network requests, message queues, email dispatch |
| SCR | Scrub | Data sanitization, PII removal, log filtering |
| UPD | Update | Record modification, cache refresh, state change |
| AGG | Aggregate | Reduce operations, metrics collection, summaries |
| FLT | Filter | Query refinement, access control, data selection |
| DEL | Delete | Resource cleanup, record removal, file deletion |
| Code | Pattern | Description |
|---|---|---|
| ASY | Async | async def functions, promises, coroutines |
| SYN | Synchronous | Standard blocking functions |
| REC | Recursive | Self-calling functions, tree traversals |
| GEN | Generator | yield-based functions, iterators |
| DEC | Decorator | Function wrappers, middleware |
| CTX | Context Manager | with statements, resource management |
| CLS | Closure | Functions returning functions, lexical scope |
┌─────────────────────────────────────────────────────────────────┐
│ SYMDEX-100 ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Python Source (.py) │
│ │ │
│ ├─→ [AST Parser] ──→ Function Metadata │
│ │ (name, args, docstring, ...) │
│ │ │
│ └─→ [LLM] ──────────→ Cypher Generation │
│ SEC:VAL_TOKEN--ASY │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ .symdex/index.db (SQLite) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ • B-tree index on (cypher, tags, function_name)│ │
│ │ • SHA256 hash for incremental indexing │ │
│ │ • 100:1 compression vs full function bodies │ │
│ └─────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ MULTI-LANE SEARCH ENGINE │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Query → [LLM] → 3 Cypher patterns (tight/med/broad) │
│ │ ↓ Try tight first; merge medium/broad if needed │
│ │ 5 Lanes per pattern: Exact │ Domain* │ Act* │ Tags │ Name │
│ │ (Lane 3 skipped when redundant; tag/name capped) │
│ │ ↓ Candidate cap (e.g. 200) │
│ │ Score vs tight pattern → Rank → Format │
│ └─────────────────────────────────────────────────┘ │
│ ↓ │
│ Results (100x faster, 50x fewer tokens) │
│ │
└─────────────────────────────────────────────────────────────────┘
Key Design Decisions:
Symdex provides a full MCP (Model Context Protocol) server with tools, resources, and prompt templates so AI agents can search your codebase natively.
pip install -e ".[mcp]" so the symdex command is on your PATH.
symdex index . so search has data. Or use the MCP tool index_directory from the agent.
.cursor/mcp_settings.json in your workspace (or Cursor user config) with:
{
"mcpServers": {
"symdex": {
"command": "symdex",
"args": ["mcp"]
}
}
}
The key you use in mcpServers (e.g. "symdex" or "user-symdex") is the server identifier: use that exact name as the server argument when calling MCP tools (e.g. call_mcp_tool(server="symdex", ...)). The display name "Symdex-100" is for UI only.
Test: Open a chat and ask the agent to run get_index_stats for . or search_codebase("validate user"); if the index exists you should get results.
If symdex is not on PATH (e.g. you use a venv and Cursor runs without it), set "command" to your Python and "args" to ["-m", "symdex.cli.main", "mcp"], or use the full path to the symdex executable (e.g. ".venv/bin/symdex" on Unix, ".venv\\Scripts\\symdex.exe" on Windows).
| Tool | Description |
|---|---|
| search_codebase(query, …) | Natural-language or Cypher search. Prefer a specific intent (e.g. "Django User model create"). Optional: directory_scope, domain_filter, action_filter, group_by. |
| search_by_cypher(cypher_pattern, …) | Direct Cypher lookup (no LLM). Optional: directory_scope, domain_filter, action_filter. |
| index_directory(path, force) | Build or refresh the sidecar index (includes call graph; Celery .delay()/.apply_async() → task edges). |
| get_index_stats(path) | File, function, and call_edges counts. |
| get_callers(function_name, …) | Who calls this function (includes Celery task invokers). Optional: directory_scope, domain_filter, action_filter. |
| get_callees(function_name, …) | What this function calls. Optional: directory_scope, domain_filter, action_filter. |
| trace_call_chain(function_name, …) | Trace callers (up) or callees (down). Optional: directory_scope, domain_filter, action_filter. |
| health() | Server status, provider, model info. |
| URI | Description |
|---|---|
| symdex://schema/domains | Domain codes and descriptions |
| symdex://schema/actions | Action codes and descriptions |
| symdex://schema/patterns | Pattern codes and descriptions |
| symdex://schema/full | Complete Cypher-100 schema with common object codes |
| Prompt | Description |
|---|---|
| find_security_functions(path) | Audit all security-related functions |
| audit_domain(domain, path) | Audit all functions in a specific domain |
| explore_codebase(path) | High-level architecture overview via domain stats |
from symdex.mcp.server import create_server
from symdex.core.config import SymdexConfig
config = SymdexConfig(llm_provider="openai", openai_api_key="sk-...")
server = create_server(config=config)
server.run(transport="stdio")
Agent workflow:
Agent: "I need to find the function that validates JWT tokens"
↓
[Tool Call] search_codebase("validate JWT token")
↓
Result: 1 function, 80 tokens (vs 5,000 tokens reading 10 files)
↓
Agent: "Now I know exactly where to look"
Token economics:
| Codebase Size | Files | Functions | Time (Anthropic) |
|---|---|---|---|
| Small | 100 | 500 | 45s |
| Medium | 500 | 2,500 | 3.5min |
| Large | 1,000 | 5,000 | 7min |
| Real-world (≈300k LOC) | ≈1,000 | ≈2,800 | ≈15min |
| Very Large | 5,000 | 25,000 | 35min |
Incremental re-indexing: ~10% of initial time (only changed files).
Reported time: The CLI and API report DB-only search time (multi-lane retrieval, scoring, context extraction). LLM translation for natural-language queries is not included.
Test setup (small index): 5,000 indexed functions, cold SQLite cache.
| Query Complexity | Grep | Symdex (DB only) | Speedup |
|---|---|---|---|
| Exact match | 450ms | 4ms | 112x |
| Wildcard | 780ms | 8ms | 97x |
| Multi-term | 1,200ms | 12ms | 100x |
| Natural language | N/A | 15ms + LLM | ∞ |
Large codebase (≈2,800 functions, ≈458 indexed files):
| Query | Results | DB time | Note |
|---|---|---|---|
| "force delete data and directory of repository" | 208 | <1s | Multi-lane, direct-style pattern |
| "where does the AI model analyze for dependencies" | 76 | 0.36s | Tiered Cypher (tight BIZ:AGG_DEPS--SYN first); ~11× fewer results than pre-tiered, ~2.5× faster |
Query breakdown (Symdex):
Result: Sub-second index lookup for typical queries; tiered patterns and candidate cap keep result sets focused and fast.
All parameters, default values, and how to configure MCP defaults (e.g. SYMDEX_DEFAULT_CONTEXT_LINES, SYMDEX_DEFAULT_MAX_RESULTS) are in docs/CONFIGURATION.md.
# Rich console (default) — human-friendly
symdex search "validate password"
# JSON — for scripting/piping
symdex search "validate password" --format json | jq '.[] | .cypher'
# Compact — grep-like, one line per result
symdex search "validate password" --format compact
# IDE — file(line): format for editor integration
symdex search "validate password" --format ide
# All security functions
symdex search "SEC:*_*--*"
# Async data operations
symdex search "DAT:*_*--ASY"
# Functions that scrub/sanitize anything
symdex search "*:SCR_*--*"
# Recursive algorithms
symdex search "*:*_*--REC"
# Interactive navigation for large result sets
symdex search "user" -n 50 -p 10
# Commands: [Enter] next, [b] back, [p] print, [j] json, [q] quit
# Use OpenAI instead of Anthropic
export SYMDEX_LLM_PROVIDER=openai
export OPENAI_API_KEY="sk-..."
# Customize search scoring
export CYPHER_MIN_SCORE=7.0
# Increase concurrency (faster indexing, more API load)
export SYMDEX_MAX_CONCURRENT=10
For CLI usage, MCP in Docker, index-on-host vs remote URL, and publishing on Smithery, see docs/DOCKER.md.
SymdexConfig (replaces global config, multi-tenant safe)
Symdex client facade: single entry point for programmatic use
aindex, asearch, astats via asyncio.to_thread)
SymdexError, ConfigError, IndexNotFoundError, etc.)
SYMDEX_CYPHER_FALLBACK_ONLY): no API key required
IndexingPipeline.run() returns typed IndexResult
import symdex as a library)
CypherCache
to_thread with SDK async clients)
Q: Does Symdex modify my source files?
A: No. All metadata is stored in .symdex/index.db. Source code is never touched.
Q: What if I don't want to commit the index?
A: Add .symdex/ to .gitignore. Teammates run symdex index . to rebuild (~3-7 min for 1K files).
Q: How accurate is the LLM Cypher generation?
A: 94% of generated Cyphers match human classification on a validation set of 500 functions. Mismatches are usually domain ambiguity (e.g., DAT:DEL_USER vs BIZ:DEL_USER), which multi-lane search handles.
Q: Can I run without an API key?
A: Yes. Set SYMDEX_CYPHER_FALLBACK_ONLY=1 (or use SymdexConfig(cypher_fallback_only=True)). Indexing and search use rule-based Cypher generation only — no LLM calls. Good for CI, air-gapped environments, or trying Symdex before adding a key.
Q: Can I use a local LLM?
A: Not out of the box yet. v1.1 supports Anthropic, OpenAI, and Gemini. Ollama integration is planned for v1.2; until then you can extend LLMProvider in engine.py to call a local model.
Q: What's the indexing cost?
A: ~$0.003/function (Anthropic Haiku). 10K functions = ~$30 initial index. Incremental updates ~$1-3/month.
Q: How does Symdex compare to embeddings?
A: Embeddings require vector search (expensive, opaque). Cyphers use structured lookups (fast, explainable). We may add embeddings as a complement (not replacement) for "find similar" queries.
Q: Can I customize the Cypher schema?
A: Yes. Edit config.py → CypherSchema.DOMAINS/ACTIONS/PATTERNS. Re-index with --force.
Q: Can I use Symdex as a library in my own product?
A: Yes. from symdex import Symdex gives you a clean, instance-based API. Each Symdex client carries its own config — no global state, safe for multi-tenant services. See the "Python API" section above.
Q: Do I need to publish Symdex to PyPI to use the API?
A: No. Install from source with pip install -e ".[all]" and it's importable immediately. See "Local Development" below.
Q: Does the API support async?
A: Yes. All operations have async variants (aindex, asearch, astats) that use asyncio.to_thread(). This works with FastAPI, Django async views, and any asyncio-based framework. Native async LLM providers are planned for v2.0.
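The `asyncio.to_thread()` approach can be illustrated with a stand-in blocking function. This is a sketch of the general pattern only: `blocking_search` and `asearch_sketch` are hypothetical and do not call Symdex.

```python
import asyncio
import time


def blocking_search(query: str) -> list[str]:
    """Stand-in for a synchronous search call (hypothetical)."""
    time.sleep(0.05)  # simulate SQLite or network latency
    return [f"hit for {query!r}"]


async def asearch_sketch(query: str) -> list[str]:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_search, query)


async def main() -> list[list[str]]:
    # Both searches run concurrently in separate threads.
    return await asyncio.gather(
        asearch_sketch("validate token"),
        asearch_sketch("delete user"),
    )


print(asyncio.run(main()))
```

The trade-off of this pattern is that each in-flight call occupies a thread from the default executor, which is one reason native async providers are a separate roadmap item.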
Q: How do I deploy the MCP server on Smithery?
A: Smithery Hosted (GitHub → they build and run) only runs servers built with their TypeScript CLI/SDK in their edge runtime (no filesystem, 128 MB). Symdex is Python and needs filesystem (SQLite, source files), so use the URL method: deploy this repo’s Docker image to Fly.io or Railway, then at smithery.ai/new choose URL and enter https://your-app.example.com/mcp. The server exposes /.well-known/mcp/server-card.json and Streamable HTTP on /mcp.
os.walk() with early pruning. Dotfiles and dot-directories (e.g. .git, .cursor, .env) are always excluded; built-in dirs (e.g. __pycache__, node_modules) and optional .symdexignore add further exclusions.
ast module extracts function metadata (name, args, docstring, calls, call_sites, complexity)
cypher_index and call_edges (call graph) with compound indexes
Concurrency: ThreadPoolExecutor with 5 workers + 50 req/min rate limit.
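The first two indexing steps can be approximated with the standard library alone. The sketch below is an assumption-laden illustration: `iter_python_files`, `extract_functions`, and the `SKIP_DIRS` set are hypothetical, `.symdexignore` handling is omitted, and only a subset of the metadata fields is extracted.

```python
import ast
import os

# Built-in exclusions named above; the real list may be longer.
SKIP_DIRS = {"__pycache__", "node_modules"}


def iter_python_files(root: str):
    """Yield .py files under root, pruning excluded directories early."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [
            d for d in dirnames if not d.startswith(".") and d not in SKIP_DIRS
        ]
        for filename in filenames:
            if filename.endswith(".py") and not filename.startswith("."):
                yield os.path.join(dirpath, filename)


def extract_functions(source: str) -> list[dict]:
    """Collect basic metadata for every function definition in source."""
    functions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append({
                "name": node.name,
                "args": [arg.arg for arg in node.args.args],
                "docstring": ast.get_docstring(node),
                "is_async": isinstance(node, ast.AsyncFunctionDef),
                "line_start": node.lineno,
            })
    return functions
```

Pruning `dirnames` in place is what makes the walk cheap on large repositories: excluded trees such as `.git` are never listed at all.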
WHERE cypher = ? (exact)
WHERE cypher LIKE ? (domain wildcard)
WHERE cypher LIKE ? (action-only)
WHERE tags LIKE ? (keyword)
WHERE function_name LIKE ? (substring)
(file_path, function_name, line_start)
[start-1 : start+3] (cached per file)
Optimization: File content cache avoids reading the same file multiple times.
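The lane queries above can be tried against an in-memory SQLite database. This is a sketch under assumptions: the table and column names follow the ones mentioned in this README, but the exact schema, the `to_like` helper, and the sample rows are hypothetical.

```python
import sqlite3

# Illustrative rows: (cypher, tags, function_name).
ROWS = [
    ("SYS:DEL_DIR--SYN", "delete,directory", "_delete_directory_tree"),
    ("DAT:DEL_DIR--SYN", "delete,directory,dataset", "remove_dataset_dir"),
    ("SEC:VAL_TOKEN--ASY", "validate,token", "validate_user_token"),
]


def build_demo_index() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cypher_index (cypher TEXT, tags TEXT, function_name TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_cypher ON cypher_index (cypher, tags, function_name)"
    )
    conn.executemany("INSERT INTO cypher_index VALUES (?, ?, ?)", ROWS)
    return conn


def to_like(pattern: str) -> str:
    """Convert a Cypher wildcard pattern to a LIKE pattern.

    '_' and '%' are LIKE wildcards, so they are escaped before '*' becomes '%'.
    """
    escaped = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%")


def lane_exact(conn, cypher: str) -> list[str]:
    rows = conn.execute(
        "SELECT function_name FROM cypher_index WHERE cypher = ? "
        "ORDER BY function_name",
        (cypher,),
    )
    return [row[0] for row in rows]


def lane_wildcard(conn, pattern: str) -> list[str]:
    rows = conn.execute(
        "SELECT function_name FROM cypher_index "
        "WHERE cypher LIKE ? ESCAPE '\\' ORDER BY function_name",
        (to_like(pattern),),
    )
    return [row[0] for row in rows]


conn = build_demo_index()
print(lane_exact(conn, "SYS:DEL_DIR--SYN"))
print(lane_wildcard(conn, "*:DEL_DIR--SYN"))
```

The escaping step is easy to miss: every Cypher contains an underscore between action and object, and an unescaped `_` in a LIKE pattern matches any single character.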
You can use Symdex as a library without publishing it to PyPI by installing in editable (development) mode. This is how you test the API locally.
# Clone the repo
git clone https://github.com/yourusername/symdex-100.git
cd symdex-100
# Create and activate a virtual environment
python -m venv .venv
# Windows PowerShell:
.venv\Scripts\Activate.ps1
# Linux/Mac:
source .venv/bin/activate
# Install in editable mode with all dependencies
pip install -e ".[all]"
The -e flag ("editable") links the package into your environment instead of copying it. Any code changes you make in src/symdex/ take effect immediately, with no reinstall needed.
# CLI should work
symdex --version
# Python API should be importable
python -c "from symdex import Symdex, SymdexConfig; print('OK')"
from symdex import Symdex, SymdexConfig
# Option A: reads ANTHROPIC_API_KEY (etc.) from environment
client = Symdex()
# Option B: explicit config (no env vars needed)
client = Symdex(config=SymdexConfig(
    llm_provider="anthropic",
    anthropic_api_key="sk-ant-your-key-here",
))
# Index the symdex project itself as a test
result = client.index(".")
print(result) # IndexResult(files_scanned=..., functions_indexed=..., ...)
# Search it
hits = client.search("validate cypher", path=".")
for h in hits:
    print(f" {h.function_name} {h.cypher} score={h.score:.1f}")
# Direct pattern search (no LLM call needed)
hits = client.search_by_cypher("*:VAL_*--*", path=".")
To index a directory and run example searches in one go (index → stats → natural-language search → Cypher pattern search):
# Index and search this repo's src/ (default)
python scripts/try_api.py
# Use a specific folder
python scripts/try_api.py src
python scripts/try_api.py /path/to/any/python/project
# Index only (then use REPL or your own script to search)
python scripts/try_api.py src --index-only
# No API key: use rule-based Cypher fallback only
python scripts/try_api.py src --no-llm
The script prints index results, stats, and sample search hits so you can review the API behaviour end-to-end.
If you have a separate project that wants to use Symdex as a dependency:
# From your other project's venv:
pip install -e /path/to/symdex-100
# Or with pip's path syntax in requirements.txt:
# -e /path/to/symdex-100
Now from symdex import Symdex works in that project, and changes to the Symdex source are reflected immediately.
# All tests
pytest tests/ -v
# Specific test file
pytest tests/test_config.py -v
# With coverage (if installed)
pytest tests/ --cov=symdex --cov-report=term-missing
We welcome contributions! Focus areas:
Setup:
git clone https://github.com/yourusername/symdex-100.git
cd symdex-100
pip install -e ".[all]"
pytest tests/
MIT License — see LICENSE
If you use Symdex-100 in academic work, please cite:
@software{symdex100_2026,
title = {Symdex-100: Semantic Fingerprints for Code Search},
author = {Camillo Pachmann},
year = {2026},
url = {https://github.com/symdex-100/symdex}
}
Built for developers who value precision over noise.
Built for AI agents that need to explore codebases efficiently.
Search smarter, not harder.