A read-only governance layer for AI coding agents that indexes your codebase locally and serves evidence-backed context packets through MCP. You get tools like change_plan to snapshot file hashes and symbol baselines before edits, post_edit_review to diff the dirty tree against that plan, and a verification ledger that parses reported commands against a POSIX shell subset to detect exit masking. Deep support for TypeScript, JavaScript, and Python with shallow coverage for Rust, Go, and Java. Works with any MCP host including Codex CLI and Claude Code through stdio transport. The core profile exposes six primary-loop tools to minimize schema token cost, while the full profile gives you all twenty. No model calls, no API keys, everything runs locally with deterministic output.
CODEXA_MCP_STRUCTURED_BUDGET_BYTES: Optional structured-result byte budget override for hosts with small MCP result limits.
Codexa is an edit-lifecycle governance layer for AI coding agents — plan conformance, drift review, and verification crediting — built on a local, deterministic codebase map.
In plain English: it reads a repository, builds a compact index of the files, symbols, imports, tests, risks, and workflows it can prove, then gives Codex, Claude Code, or another MCP client small evidence-backed packets before and after edits. It is meant to help an agent answer questions like:
It is not an autonomous coding agent. It does not edit your source files through MCP. It is a context compiler, query server, and verification guide.
Three capabilities are deliberately hard to find elsewhere:
change_plan snapshots per-file hashes plus symbol and risk baselines before
editing; post_edit_review diffs the real dirty tree against that plan
afterwards, rename-aware. When no plan was saved, the pre-edit hook saves an
implicit baseline automatically, so the review always has a pre-edit
reference; an explicit change_plan upgrades it with planned scope and tests.
Blocking is opt-in: only reviews against an explicit plan can surface a
blocking verdict to the host; implicit baselines keep the loop informational.

The verification ledger parses reported commands against a POSIX shell
subset: npm test || true earns nothing, tsc --help is vetoed as non-compiling,
and sh -c wrappers are unwrapped with ambiguity failing closed. Scope stated
plainly: this detects structural exit-masking in reported commands; it cannot
detect a wholesale fabricated report. The opt-in AutoVerify lane exists for
execution-backed evidence.

The eval harness compares Codexa packets against raw rg/git baselines and
fails a scenario outright if the raw baseline does the job better. The
archived v0.2.0 release run passed 20/20 scenarios with packets averaging
0.66x the raw baseline output size, and the harness ships in this repo, so
you can re-run it yourself. See Public Proof.

Limits, stated up front: TypeScript/JavaScript and Python are the deep lanes (Rust/Go/Java are shallow; other languages get light file facts). Impact expansion caps at graph depth 3. The tested envelope is repos at roughly the 50K-LOC scale of Codexa itself; expect slower cold indexing and shallower ranking on large monorepos. Everything runs locally: zero API keys and zero network calls in the core paths.
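To make the exit-masking idea concrete, here is a deliberately simplified sketch. It is not Codexa's parser: the function name `classifyReportedCommand` and the verdict labels are hypothetical, it uses plain pattern checks rather than a real POSIX shell grammar, and it does not unwrap sh -c wrappers. It only illustrates the fail-closed posture, where anything the checker cannot interpret is treated as ambiguous rather than credited.

```typescript
// Hypothetical verdict labels; Codexa's real ledger vocabulary may differ.
type Verdict = "credit" | "masked" | "non-executing" | "ambiguous";

// Syntax this toy checker refuses to interpret (quotes, expansions, redirects).
const UNSUPPORTED = /[`$(){}<>\\'"]/;

function classifyReportedCommand(command: string): Verdict {
  const trimmed = command.trim();
  if (trimmed.length === 0 || UNSUPPORTED.test(trimmed)) return "ambiguous";

  // `cmd || true` and `cmd; true` always exit 0, so a failing cmd is hidden.
  if (/(\|\||;)\s*(true|:)\s*$/.test(trimmed)) return "masked";

  // Any other pipe, `||`, `;`, or background `&` could replace the exit
  // status in ways this sketch does not model, so it fails closed.
  if (/[|;&]/.test(trimmed.replace(/&&/g, " "))) return "ambiguous";

  // `--help` and `--version` print text instead of running the check.
  const words = trimmed.split(/\s+/);
  if (words.includes("--help") || words.includes("--version")) return "non-executing";

  return "credit";
}
```

Under these assumptions, `npm test || true` is classified as masked, `tsc --help` as non-executing, and a quoted `sh -c '...'` wrapper as ambiguous because the sketch makes no attempt to unwrap it.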
Codexa is maintained by one person, in spare time, with a deliberately narrow scope. That shapes how this repo works:
Codexa requires Node.js 22 or newer.
Install from npm:
npm install -g @mirnoorata/codexa
Or work from a checkout:
git clone https://github.com/mirnoorata/codexa.git
cd codexa
npm install
npm run build
npm link
Wire Codexa into another repository:
codexa init /path/to/project # Codex CLI: .codex/config.toml + hooks
codexa init /path/to/project --claude # also writes a repo-root .mcp.json for Claude Code
codexa session-start /path/to/project
After codexa init, the target repository gets a repo-local .codex/config.toml
entry that lets Codex discover the Codexa MCP server automatically, and with
--claude a repo-root .mcp.json so Claude Code discovers the same server
(only the codexa entry is managed; other servers in an existing .mcp.json
are preserved, and malformed JSON aborts the write). When init runs from an
evictable npx cache, generated configs pin npx -y @mirnoorata/codexa@<version>
instead of the cache path so they keep working after a cache prune.
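The .mcp.json handling described above (manage only the codexa entry, preserve other servers, abort on malformed JSON) can be sketched roughly as follows. This is an illustration, not the code in src/init.ts: `mergeMcpConfig` is a hypothetical name, the `mcpServers` top-level key is assumed to be the shape Claude Code reads, and the command and args in the usage note are illustrative.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the new file text, or throws without producing anything to write.
function mergeMcpConfig(existingText: string | null, codexa: McpServerEntry): string {
  let config: Record<string, unknown> = {};
  if (existingText !== null) {
    // JSON.parse throws on malformed input, which aborts the write.
    const parsed: unknown = JSON.parse(existingText);
    if (!isPlainObject(parsed)) throw new Error(".mcp.json must contain a JSON object");
    config = parsed;
  }

  const current = config["mcpServers"];
  if (current !== undefined && !isPlainObject(current)) {
    throw new Error("mcpServers must be an object; refusing to overwrite it");
  }

  // Only the `codexa` key is replaced; every other server entry is kept as-is.
  const servers = { ...(current ?? {}), codexa };
  return JSON.stringify({ ...config, mcpServers: servers }, null, 2) + "\n";
}
```

For example, merging `{ command: "npx", args: ["-y", "@mirnoorata/codexa", "serve", ".", "--tools", "core"] }` into a file that already lists another server leaves that server untouched and adds or replaces only the codexa entry.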
Useful flags: the default tool profile for fresh installs is core — only the
primary-loop tools (plus impact/freshness) are exposed, which cuts per-turn
schema token cost; --tools full exposes all 20 tools, and re-running plain
codexa init preserves whichever profile the repo already uses. On the Codex
side the core profile relies on Codex CLI honoring enabled_tools (older
versions ignore the key and simply expose every tool); the Claude Code
.mcp.json path filters server-side via serve --tools core and needs no
client support. --agents-md (opt-in) writes a managed
Codexa workflow block into the repo's AGENTS.md for Codex, and --claude-md
(opt-in) writes the same managed block into CLAUDE.md for Claude Code. The
region between the <!-- >>> codexa managed --> / <!-- <<< codexa managed -->
markers is reserved: Codexa replaces it in place on every re-run (so the block
stays current) and never edits anything outside it. Unbalanced or malformed
markers abort the write instead of silently truncating the file.
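As a rough illustration of the managed-block contract above, the sketch below replaces only the region between the two markers and refuses to write when the markers are unbalanced. The function name `upsertManagedBlock` is hypothetical, and appending a new block when no markers exist yet is an assumption about first-run behavior that the text does not spell out.

```typescript
const OPEN = "<!-- >>> codexa managed -->";
const CLOSE = "<!-- <<< codexa managed -->";

function countOccurrences(text: string, needle: string): number {
  return text.split(needle).length - 1;
}

function upsertManagedBlock(fileText: string, block: string): string {
  const opens = countOccurrences(fileText, OPEN);
  const closes = countOccurrences(fileText, CLOSE);
  const managed = `${OPEN}\n${block.trim()}\n${CLOSE}`;

  if (opens === 0 && closes === 0) {
    // Assumption: with no managed region yet, append one at the end of the file.
    const prefix = fileText.length === 0 || fileText.endsWith("\n") ? fileText : fileText + "\n";
    return prefix + managed + "\n";
  }

  const start = fileText.indexOf(OPEN);
  const end = fileText.indexOf(CLOSE);
  if (opens !== 1 || closes !== 1 || end < start) {
    // Abort instead of guessing where the region ends and truncating the file.
    throw new Error("unbalanced or malformed codexa managed markers; refusing to write");
  }

  // Everything before the open marker and after the close marker is untouched.
  return fileText.slice(0, start) + managed + fileText.slice(end + CLOSE.length);
}
```

Because the text outside the markers is copied through verbatim, re-running the function with the same block is idempotent.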
The installed command is codexa, and the server can also run ad hoc:
npx -y @mirnoorata/codexa serve /path/to/project --auto-refresh
Codexa is also listed in the official MCP registry as
io.github.mirnoorata/codexa for MCP clients that discover servers there.
Codexa is deterministic and model-agnostic: its core indexing, ranking, and
query paths call no model and need no API keys, so it serves the same
evidence-backed context to any agent host that speaks MCP:
The OpenAI Codex CLI, through the repo-local .codex/config.toml.
Claude Code, through either codexa init --claude (which writes a repo-root .mcp.json) or the bundled plugin under integrations/claude-code/, which ships its own MCP server entry, hooks that auto-save the pre-edit baseline and surface blocking drift verdicts to the model, and slash commands. --claude-md adds workflow steering. Pick the plugin or init --claude for MCP wiring, not both, or Claude Code will register the codexa server twice.
Any client that discovers it through the MCP registry.
There is no per-model integration to do: the model lives in the host, and
Codexa is the host's context server. (The one exception is the opt-in,
off-by-default semantic lane, which can call a configured embedding provider
such as OpenAI; see Optional Lanes.)
Token discipline is built in: every tool description states its typical output
cost, structured results are budget-compacted with truncation records naming
dropped fields, hosts with small MCP result limits can set
CODEXA_MCP_STRUCTURED_BUDGET_BYTES, and the big retrieval tools accept
responseFormat: "concise" for a summary-tier packet that compacts both the
structured payload and the text block. The tools/list surface is budgeted
too: the per-tool output schema defaults to a compact top-level contract
(measured on this repo: 123KB -> 54KB for the full 20-tool surface, 21KB with
the core profile; CODEXA_MCP_OUTPUT_SCHEMA=full restores the deep schema),
and codexa serve --tools core registers only the primary-loop tools for
hosts without a client-side allowlist. Because the budget caps tokens rather
than dollars, the savings scale with the host model's price — they matter most
on frontier-tier models.
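One way to picture budget compaction with truncation records is the sketch below. It is an assumption-laden illustration rather than Codexa's compactor: the names `compactToBudget` and `TruncationRecord`, the caller-supplied drop order, and the use of JSON string length as an approximate byte count are all hypothetical.

```typescript
// Hypothetical record shape; the text only says truncation records name dropped fields.
interface TruncationRecord {
  field: string;
  approxBytes: number;
}

interface CompactedResult {
  payload: Record<string, unknown>;
  truncations: TruncationRecord[];
  withinBudget: boolean;
}

// Approximates size by JSON string length, which equals the byte count
// only for ASCII-only payloads.
function approxSize(value: unknown): number {
  return (JSON.stringify(value) ?? "").length;
}

function compactToBudget(
  result: Record<string, unknown>,
  budgetBytes: number,
  dropOrder: string[], // least important field first
): CompactedResult {
  const payload: Record<string, unknown> = { ...result };
  const truncations: TruncationRecord[] = [];

  for (const field of dropOrder) {
    if (approxSize(payload) <= budgetBytes) break;
    if (!(field in payload)) continue;
    // Record what was dropped so the host can tell the packet is partial.
    truncations.push({ field, approxBytes: approxSize(payload[field]) });
    delete payload[field];
  }

  return { payload, truncations, withinBudget: approxSize(payload) <= budgetBytes };
}
```

The point of returning the truncation list alongside the payload is that an agent can see which fields were cut and ask for them explicitly, instead of mistaking a compacted packet for a complete one.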
Codexa's stdio transport is for a host running on the same machine as the
repository (Codex CLI, Claude Code). Its HTTP transport is loopback-only by
design — non-loopback bind addresses and non-loopback Origin headers are
rejected — so a hosted agent whose container runs in someone else's cloud (for
example a Claude Managed Agents session) cannot reach a local Codexa server over
the public network.
The supported way to give a managed cloud agent Codexa context is a
self-hosted sandbox: run the agent's tool-execution container in your own
infrastructure, alongside a Codexa server, and point the agent's MCP config at
Codexa on 127.0.0.1. The agent loop stays on the provider's orchestration
layer; tool execution — and the Codexa connection — stay inside your trust
boundary, where loopback HTTP is safe. An authenticated remote HTTP mode that
would let a provider-hosted container dial into Codexa directly is intentionally
not shipped: exposing a codebase context server to the network needs an
auth/origin policy Codexa does not yet have, so it is deferred rather than
shipped insecure.
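The loopback-only rule described above can be sketched as two small checks, one for the bind address and one for the Origin header. This is a minimal illustration under stated assumptions, not the logic in src/mcp.ts: the function names are hypothetical, treating `localhost` as loopback is an assumption, and so is accepting requests that carry no Origin header at all.

```typescript
const LOOPBACK_NAMES = new Set(["localhost", "::1", "[::1]"]);

// True for 127.0.0.0/8, ::1, and (by assumption) the name "localhost".
function isLoopbackHost(host: string): boolean {
  const normalized = host.toLowerCase();
  if (LOOPBACK_NAMES.has(normalized)) return true;
  const match = /^127\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(normalized);
  return match !== null && match.slice(1).every((octet) => Number(octet) <= 255);
}

// Assumption: non-browser MCP clients may omit Origin, so a missing header passes.
function isAllowedOrigin(origin: string | undefined): boolean {
  if (origin === undefined) return true;
  try {
    return isLoopbackHost(new URL(origin).hostname);
  } catch {
    return false; // an unparseable Origin is rejected
  }
}
```

A server following this shape would refuse to start when `isLoopbackHost(bindAddress)` is false and would reject any request whose Origin fails `isAllowedOrigin`.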
Use Codexa as a guardrail around code changes:
Start with session_context or codexa session-start.
This tells the agent whether the index is fresh and what loop to use.
Search when the target is unclear.
search combines bounded raw search, exact/symbol evidence, Codexa ranking,
optional semantic retrieval, likely tests, and known gaps.
Ask for a task brief before editing.
task_brief / brief returns read-first files, impact expansion, risks,
snippets, test recommendations, freshness, and next tool guidance.
Save a change plan before non-trivial edits.
change_plan with saveSnapshot=true, or CLI
change-plan --save-snapshot, records the intended scope and test plan.
If you skip this step, the pre-edit hooks save an implicit baseline of the
dirty tree on the first edit — the review still gets changed-since-baseline
and head-drift accuracy, but only an explicit plan enables unplanned-scope
drift detection.
Review after editing.
post_edit_review / post-edit-review compares the actual dirty tree with
the saved snapshot, reports drift, and tells you whether to continue, run
tests, inspect, or replan.
Finish with a test plan if verification is unclear.
test_plan recommends targeted commands and shows what they would cover.
Primary MCP loop:
session_context -> search(if target unclear) -> task_brief ->
change_plan(saveSnapshot) -> post_edit_review -> test_plan
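The change_plan and post_edit_review steps of this loop amount to comparing a saved baseline with the current tree. The sketch below shows the core of that comparison under simplifying assumptions: all type and function names are hypothetical, content hashes are passed in rather than computed, and there is no rename detection or symbol baseline, both of which the real review has.

```typescript
type Hashes = Record<string, string>; // path -> content hash

interface PlanSnapshot {
  plannedFiles: string[]; // the intended edit scope
  baseline: Hashes; // hashes captured before editing
}

interface DriftReport {
  changedPlanned: string[]; // edits inside the planned scope
  unplannedChanges: string[]; // edits outside it (scope drift)
  untouchedPlanned: string[]; // planned files that were never edited
}

function reviewAgainstPlan(plan: PlanSnapshot, current: Hashes): DriftReport {
  const allPaths = Array.from(
    new Set(Object.keys(plan.baseline).concat(Object.keys(current))),
  );
  // A path counts as changed when it was added, deleted, or its hash differs.
  const changed = allPaths.filter((p) => plan.baseline[p] !== current[p]).sort();
  const planned = new Set(plan.plannedFiles);

  return {
    changedPlanned: changed.filter((p) => planned.has(p)),
    unplannedChanges: changed.filter((p) => !planned.has(p)),
    untouchedPlanned: plan.plannedFiles.filter((p) => changed.indexOf(p) === -1).sort(),
  };
}
```

With only an implicit baseline there is no `plannedFiles` list, which is why unplanned-scope drift needs an explicit plan: the changed-since-baseline set can still be computed, but nothing says which of those changes were intended.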
Running codexa index /path/to/project writes generated files under the target
repo's .codex/codebase/ directory:
.codex/codebase/README.md
.codex/codebase/codex-contract.md
.codex/codebase/repo-map.md
.codex/codebase/risk-map.md
.codex/codebase/placeholder-map.md
.codex/codebase/test-map.md
.codex/codebase/conventions.md
.codex/codebase/workflows.md
.codex/codebase/freshness.json
.codex/codebase/index.json
.codex/codebase/facts.ndjson
.codex/codebase/modules/
.codex/codebase/playbooks/
For lay readers, these are the maps and checklists Codex reads. For engineers,
the durable machine-readable index is index.json plus facts.ndjson; the
Markdown files are compact human/agent-facing projections of the same facts.
Generated cache and working state live under .codex/cache/. Codexa-owned cache
writes are allowed; source-file mutation is not exposed through MCP tools.
| Command | Use it for |
|---|---|
| codexa init <repo> | Write repo-local Codex MCP config/hooks and index the repo (--claude for a repo-root Claude Code .mcp.json, --tools full to expose every tool, --agents-md for an AGENTS.md workflow block). |
| codexa session-start <repo> | Print cheap startup status and the automatic-use loop. |
| codexa index <repo> | Build .codex/codebase/ artifacts once. |
| codexa watch <repo> | Keep artifacts fresh during active edit sessions. |
| codexa status <repo> | Check freshness and parser errors without refreshing. |
| codexa doctor <repo> | Diagnose wiring, freshness, hooks, artifacts, and MCP readiness. |
| codexa repo-map <repo> | Show ranked modules/files. |
| codexa search <repo> --query "..." | Discover a target from natural language, identifiers, or broad prompts. |
| codexa find-context <repo> --query "..." | Find matching files, symbols, and usage sites. |
| codexa explain <repo> --file path | Explain a file. |
| codexa explain <repo> --symbol name | Explain a symbol neighborhood. |
| codexa impact <repo> --file path | Estimate blast radius for a file or symbol. |
| codexa diff-impact <repo> | Summarize current dirty worktree impact. |
| codexa test-plan <repo> --diff | Recommend targeted tests for current changes. |
| codexa brief <repo> --task "..." | Get the default read-first packet before editing. |
| codexa context-pack <repo> --task "..." | Get a larger task-shaped context packet. |
| codexa focus-brief <repo> --task "..." | Orient around a broad project question. |
| codexa callers <repo> --symbol name | Find who calls or references a symbol/file. |
| codexa callees <repo> --file path | Find what a symbol/file calls or references. |
| codexa dependency-path <repo> ... | Find a bounded graph path between two files/symbols. |
| codexa workflow-path <repo> --query "..." | Trace route, job, manifest, or workflow paths. |
| codexa change-plan <repo> --task "..." --save-snapshot | Save a pre-edit plan and dirty baseline. |
| codexa post-edit-review <repo> --task-id ... | Review the final dirty tree against the saved plan. |
| codexa semantic-index <repo> --provider ... | Build optional semantic retrieval cache. |
| codexa static-analysis <repo> ... | Import or optionally run external scanner reports. |
| codexa eval <repo> | Run structured retrieval/verification benchmark scenarios. |
| codexa github-sync-check <repo> | Diagnose GitHub source sync readiness. |
| codexa github-release <repo> | Create release notes, tags, and GitHub Release entries. |
| codexa serve <repo> | Start the MCP context server over stdio (--tools core registers only the primary-loop tools). |
| codexa serve <repo> --transport http --host 127.0.0.1 --port 8729 | Start loopback-only HTTP MCP. |
Most context commands auto-refresh stale or missing Codexa artifacts before
answering. Use --no-auto-refresh when you intentionally want to inspect only
the stored index.
Codexa indexes git-visible files and skips common generated or dependency directories. The source reader is intentionally small and deterministic.
Native parser lanes:
Shallow deterministic lanes:
Lightweight file lanes:
Facts carry explicit confidence:
authoritative: syntax or git facts Codexa directly read.
derived: deterministic links, static assists, report-backed relationships, and likely test relationships.
heuristic: framework hints, string references, dynamic behavior guesses, or risk hints.
fallback: low-confidence context used only when nothing better is available.
Codexa should never make heuristic-heavy output look stronger than it is.
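One way to honor that rule when a claim rests on several facts is to label the claim with the weakest confidence among them. The sketch below assumes exactly that policy, which the text does not confirm as Codexa's own; `weakestConfidence` is a hypothetical helper, and the ranking simply follows the order of the four levels listed above.

```typescript
type Confidence = "authoritative" | "derived" | "heuristic" | "fallback";

// Higher number means stronger evidence; order taken from the list above.
const CONFIDENCE_RANK: Record<Confidence, number> = {
  authoritative: 3,
  derived: 2,
  heuristic: 1,
  fallback: 0,
};

// A combined claim is only as strong as its weakest supporting fact.
function weakestConfidence(evidence: Confidence[]): Confidence {
  if (evidence.length === 0) return "fallback"; // no evidence: assume the weakest level
  return evidence.reduce((weakest, next) =>
    CONFIDENCE_RANK[next] < CONFIDENCE_RANK[weakest] ? next : weakest,
  );
}
```

Under this policy a test recommendation backed by one authoritative import and one heuristic string reference would be reported as heuristic, not authoritative.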
Codexa is a TypeScript package with five main layers.
Entry point: src/indexer.ts.
Pipeline:
The indexer uses a cross-process cache lock so parallel Codexa commands do not stampede artifact writes.
Core types live in src/types.ts.
Important fact types:
RepoSnapshot
File
Symbol
UsageSite
ImportEdge
TestEdge
GraphEdge
WorkflowTrace
ModuleCluster
RiskSignal
ParserError
SessionMemoryEntry
Important graph edge kinds:
DEFINES
IMPORTS
CALLS
REFERENCES
TESTS
ROUTE
JOB
RISK
ROUTE_HANDLES
ROUTE_CALLS_STORE
STORE_DISPATCHES_ADAPTER
ADAPTER_REFERENCED_BY_MANIFEST
UI_CALLS_ENDPOINT
TEST_COVERS_WORKFLOW
IMPLEMENTS
EXTENDS
EXPORTS
TYPE_EXPORTS
Relationship claims can include EdgeEvidenceV1, which carries edge kind,
source, confidence, reason, path/symbol endpoints, optional range, and
stale/degraded flags.
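To show how typed edges and the depth-3 impact cap mentioned in the limits fit together, here is a small breadth-first sketch. It is an illustration only: the `Edge` shape is a hypothetical reduction of the real edge types (EdgeEvidenceV1 carries more fields), only a few edge kinds are modeled, and walking edges in reverse to find dependents is an assumption about how impact is computed.

```typescript
type EdgeKind = "IMPORTS" | "CALLS" | "REFERENCES" | "TESTS";

// Hypothetical minimal edge: `from` depends on `to`.
interface Edge {
  kind: EdgeKind;
  from: string;
  to: string;
}

// Returns every node reachable by walking dependents, with its distance
// from the changed node, stopping at maxDepth.
function expandImpact(edges: Edge[], changed: string, maxDepth = 3): Map<string, number> {
  const dependents = new Map<string, string[]>();
  for (const edge of edges) {
    const list = dependents.get(edge.to) ?? [];
    list.push(edge.from);
    dependents.set(edge.to, list);
  }

  const depthOf = new Map<string, number>([[changed, 0]]);
  let frontier = [changed];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const dependent of dependents.get(node) ?? []) {
        if (depthOf.has(dependent)) continue; // already reached by a shorter path
        depthOf.set(dependent, depth);
        next.push(dependent);
      }
    }
    frontier = next;
  }
  return depthOf;
}
```

In a chain where e imports d, d imports c, c imports b, and b imports a, expanding from a reaches b, c, and d at depths 1 to 3 and leaves e out, which is the practical meaning of a depth cap.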
Public query exports live in src/queries.ts, intentionally kept as a thin
barrel. Implementations live under src/query/.
Key query modules:
search.ts: repo maps, raw/BM25/exact/symbol/semantic search, and target discovery.
context.ts: context_pack, task_brief, focus_brief, and session_context.
impact.ts: file/symbol blast-radius expansion and verification recipes.
graph-traversal.ts: callers, callees, and dependency paths.
workflow.ts: route/job/manifest workflow traces.
change-plan.ts: pre-edit plans and saved snapshots.
post-edit.ts: dirty-tree review against saved snapshots.
test-plan.ts and tests.ts: test recommendations and provenance.
verification.ts: command coverage, command envelopes, and verification ledger entries.
session-memory.ts: cache-only working memory queries.
Query sessions (src/query/session.ts) carry the repo root, loaded index,
freshness, git state, command budget, warnings, provenance, changed files, and
changed symbols. Worktree inspection is allowed to degrade; an empty changed-file
set with degradation warnings means "unknown", not "clean".
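The "unknown, not clean" rule is easy to get wrong in calling code, so here is a minimal sketch of it. The `WorktreeInspection` shape and the `worktreeState` helper are hypothetical names, not exports from src/query/session.ts.

```typescript
interface WorktreeInspection {
  changedFiles: string[];
  degradationWarnings: string[]; // e.g. git was unavailable or timed out
}

type WorktreeState = "dirty" | "clean" | "unknown";

function worktreeState(inspection: WorktreeInspection): WorktreeState {
  if (inspection.changedFiles.length > 0) return "dirty";
  // An empty list only proves cleanliness if the inspection itself was reliable.
  return inspection.degradationWarnings.length > 0 ? "unknown" : "clean";
}
```

A caller that checks only `changedFiles.length === 0` would treat a failed inspection as a clean tree, which is exactly the mistake this rule is meant to prevent.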
Entry point: src/mcp.ts.
Codexa registers a query-only MCP server. Stdio is the default transport for local Codex use. Streamable HTTP is available only on loopback addresses unless future auth/origin policy is added.
MCP tools:
freshness
repo_map
find_context
search
placeholder_report
symbol_context
impact
diff_impact
test_plan
task_brief
context_pack
focus_brief
session_context
callers
callees
dependency_path
workflow_path
change_plan
post_edit_review
session_memory
MCP resources expose generated .codex/codebase/ artifacts. MCP prompts expose
small workflow prompts for impact-before-edit, dirty-diff review, snapshot edit
loops, and targeted test planning.
MCP tools may update Codexa-generated artifacts or cache state when auto-refresh, snapshots, or session memory are enabled. They do not expose a source-editing tool.
Adapters:
src/cli.ts: Commander-based CLI.
src/init.ts: repo-local MCP config and hook setup.
integrations/claude-code/: Claude Code plugin, hooks, and slash commands.
plugins/codexa/: Codex plugin bundle with manifest, skill, and MCP wrapper.
Operational tools:
src/doctor.ts: local readiness checks.
src/github-sync.ts: git/GitHub sync diagnostics.
src/github-release.ts: release notes, tags, and GitHub Release flow.
scripts/*.mjs and scripts/*.sh: source hygiene, privacy, package smoke, public snapshot, benchmark, and publish gates.
Semantic retrieval is opt-in and cache-based.
Build the cache:
codexa semantic-index /path/to/project --provider openai
codexa semantic-index /path/to/project --provider local-command --command ./embed-jsonl
After the cache exists, query commands can use it automatically when the snapshot
and provider settings match. --semantic forces diagnostics, and
--no-semantic disables the lane for one call.
OpenAI uses OPENAI_API_KEY and defaults to text-embedding-3-small.
local-command receives JSONL on stdin and returns embedding records. Codexa
does not ship a vector database and does not call embedding providers unless the
semantic cache/provider path is configured or explicitly forced.
LSP assist is read-only and bounded. Enable it with --lsp or
CODEXA_LSP=1 on supported query commands.
Codexa can query:
typescript-language-server --stdio
basedpyright-langserver --stdio
pyright-langserver --stdio
LSP failures are warnings in the packet, not hard failures. LSP never edits source files.
Codexa does not vendor Semgrep, CodeQL, ShellCheck, or other scanner engines. The default safe shape is report ingestion:
codexa static-analysis /path/to/project \
--semgrep-report /tmp/semgrep.json \
--codeql-report /tmp/codeql.sarif \
--symbol-report /tmp/codexa-symbols.json
Codexa also accepts a bounded CodexaSymbolReportV1 JSON document so external
language tools can feed symbols and relationships into Codexa with derived
confidence.
Scanner execution flags such as --run-semgrep, --run-codeql, and
--run-shellcheck are explicit opt-ins. They run installed local tools under
scrubbed environments and write reports under .codex/static-analysis/.
codexa init writes advisory hooks when Codex hooks are available:
hook-pre-edit saves an implicit pre-edit baseline when no change-plan snapshot exists (and reminds the agent that an explicit change_plan upgrades it with planned scope and tests).
hook-post-edit runs a bounded post-edit review after edits.
AutoVerify command execution is disabled unless user-owned autonomy is
full-access or the environment sets CODEXA_AUTOVERIFY=1 /
CODEXA_AUTOVERIFY=true. Even then, AutoVerify is hook-only. MCP
post_edit_review never spawns commands.
AutoVerify is not a sandbox. Test code still runs locally with the user's file permissions. Codexa records whether verification mutated source/test/provenance state and treats such reports as non-covering evidence.
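The AutoVerify gate described above reduces to a small predicate. The sketch below restates it under assumptions: `isAutoVerifyEnabled` and the `caller` parameter are hypothetical, and how the autonomy setting is actually read is not covered here; only the `full-access` value and the two accepted CODEXA_AUTOVERIFY values come from the text.

```typescript
type Caller = "hook" | "mcp";

function isAutoVerifyEnabled(
  caller: Caller,
  autonomy: string | undefined,
  env: Record<string, string | undefined>,
): boolean {
  // MCP post_edit_review never spawns commands, whatever the settings say.
  if (caller !== "hook") return false;

  const flag = env["CODEXA_AUTOVERIFY"];
  return autonomy === "full-access" || flag === "1" || flag === "true";
}
```

Checking the caller first keeps the hook-only guarantee independent of configuration: no environment variable can turn an MCP request into command execution.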
| Path | Purpose |
|---|---|
| src/cli.ts | CLI command registration and option parsing. |
| src/indexer.ts | Main index pipeline orchestration. |
| src/indexer/ | Discovery, parsing, graph stage, ranking, freshness, and artifact writing helpers. |
| src/parser/ | Tree-sitter and shallow language extraction. |
| src/resolver.ts | Import, alias, usage, and symbol relationship resolution. |
| src/graph.ts | Typed graph and workflow trace construction. |
| src/query/ | Query packets, edit planning, post-edit review, test planning, and verification logic. |
| src/mcp.ts | MCP server creation and transport setup. |
| src/mcp/ | MCP tool/resource/prompt registration, runtime refresh, result compaction, and session-memory adapter code. |
| src/session-memory/ | Cache-only structured working memory store. |
| src/semantic-retrieval.ts | Optional embedding cache build/query lane. |
| src/static-analysis.ts | Static-analysis report import and optional scanner execution. |
| src/autoverify.ts | Hook-only targeted verification runner. |
| src/github-sync.ts | GitHub source-sync diagnostics. |
| src/github-release.ts | GitHub Release and restore-note generation. |
| scripts/ | Hygiene, privacy, package, benchmark, and publish checks. |
| tests/ | Vitest coverage for indexing, MCP, CLI hooks, session memory, static analysis, packaging, and release helpers. |
| docs/architecture/ | Design notes for the context server and session memory. |
| integrations/claude-code/ | Claude Code plugin adapter and smoke tests. |
| plugins/codexa/ | Codex plugin package. |
Codexa is deliberately constrained:
Context commands can refresh generated .codex/codebase/ artifacts. Snapshot
and session-memory tools can write under .codex/cache/. Those are Codexa-owned
state paths, not source edits.
Common development commands:
npm run typecheck
npm run lint
npm run privacy
npm test
npm run check
npm run check runs typecheck, source hygiene, release-path hygiene, privacy,
Claude Code smoke tests, and the Vitest suite.
Release-oriented checks:
npm run smoke:package
npm run benchmark:ci
npm run public:snapshot-check
npm run package:hygiene
npm run security:check
security:check runs the development gate, dependency audit, clean-tree public
snapshot verification, package hygiene, and installed-package smoke test. The
public snapshot check intentionally refuses a dirty tree so the verified archive
matches HEAD.
Codexa has a structured eval harness:
node dist/cli.js eval /path/to/project --suite all --seed codexa-v1-benchmark
The eval scores structured query data, not prose. It compares Codexa packets
against raw rg/git status baselines, tracks recall/precision/test
recommendations/context size, and can run ranking experiments without changing
production ranking. The claim is deliberately falsifiable: a scenario fails
outright if the raw-grep baseline does the job better, and the harness runs in
CI — so "beats grep on its scenarios" is a gate, not a one-off benchmark.
Measured results for v0.2.0 (seed codexa-v020-release, full suite, archived
in reports/benchmarks/v0.2.0-eval.json):
| Metric | Result |
|---|---|
| Scenarios passed | 20/20 (2 project, 12 synthetic anti-cheat, 6 historical fixture) |
| File recall (mean) | 1.00 |
| Precision@k (mean) | 1.00 |
| Test recall (mean) | 1.00 |
| Scenarios where raw rg/git beat Codexa | 0 |
| Packet size vs. raw baseline output (mean) | 0.66x |
| Over-budget packets | 0 |
Do not update public benchmark claims without rerunning the eval on the current checkout and current target.
Use GitHub Releases as the visible source timeline for the current project.
Source sync diagnostic:
codexa github-sync-check /path/to/codexa-checkout
codexa github-sync-check /path/to/codexa-checkout --no-network
GitHub Release dry run and real release:
npm run release:github:dry-run -- --tag v0.2.0
npm run release:github -- --tag v0.2.0
The release command generates a changelog-style summary, changed-area summary,
restore commands, branch/worktree continuation commands, and forward-only PR rollback commands.
Official releases should come from a clean main after the normal GitHub flow
has landed.
Release Please runs after pushes to main. It reads conventional commits,
opens or updates a release PR with the package version and changelog changes,
and creates the GitHub Release after that release PR is merged.
This does not publish npm on every main merge. Normal feature and fix PRs land
on main first, Release Please batches releasable changes into its release PR,
and npm publishing stays downstream of the GitHub Release event.
Configure a RELEASE_PLEASE_TOKEN GitHub repository secret with a personal
access token that can create pull requests, tags, and releases. Do not use the
default GITHUB_TOKEN for Release Please if npm publishing should happen
automatically, because releases created by GITHUB_TOKEN do not trigger the
separate release: published npm workflow.
The npm package is published by GitHub Actions after the GitHub Release lane
publishes a release. The trigger is release: published; pushed tags alone do
not publish to npm. The workflow checks the released tag, package identity,
repository URL, version availability, and npm run security:check, then runs:
npm publish --registry https://registry.npmjs.org --access public --tag latest --provenance --ignore-scripts
For the first public npm release, configure an NPM_TOKEN GitHub repository
secret with publish access. After the package exists and npm trusted publishing
is configured, the workflow can remove token-based publishing while keeping the
same release gate and --ignore-scripts protection.
Read CONTRIBUTING.md before opening a PR.
What usually fits:
What usually does not fit:
Run this before proposing code changes:
npm run check
MIT. See LICENSE.