Foresea Forecasting

HTTPregistry active

Summary

Connects Claude to the Foresea forecasting API, which runs a 120B-parameter model trained on Metaculus-style questions. You get structured predictions with confidence scores, automatic evidence retrieval from GDELT and news sources, and market-vs-model edge analysis for Polymarket and Kalshi. The server exposes binary and multiple-choice forecasts, returns both the prediction and the ranked evidence articles that informed it, and calculates whether the model is bullish or bearish relative to current market prices. Useful when you need probabilistic forecasts grounded in recent news or want to scan prediction markets for potential mispricings. The underlying research studies how explicit reasoning instructions affect LLM forecast accuracy.

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Give your AI the whole web as clean markdown

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

belt - the only tool your agent needs

belt cli automatically finds the best tools and skills for your agent. image, video, music, tts...

one prompt install →

Email for Agents: Free tier available

Give your AI agent a complete email layer—sending, inbound inboxes, and sandbox testing.

Get 4K emails/month free →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

AI notepad for back-to-back meetings

Notes, actions and memory. Without a meeting bot. First month 100% off.

Download for free →

CodeScene MCP Server

Your agent targets a perfect 10 Code Health score. Deterministic. Every commit.

Try For Free →

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Give your AI the whole web as clean markdown

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

belt - the only tool your agent needs

belt cli automatically finds the best tools and skills for your agent. image, video, music, tts...

one prompt install →

Email for Agents: Free tier available

Give your AI agent a complete email layer—sending, inbound inboxes, and sandbox testing.

Get 4K emails/month free →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

AI notepad for back-to-back meetings

Notes, actions and memory. Without a meeting bot. First month 100% off.

Download for free →

CodeScene MCP Server

Your agent targets a perfect 10 Code Health score. Deterministic. Every commit.

Try For Free →

Analyzing LLM Rationale

Conference artifact for studying how explicit rationale instructions affect LLM forecasting behavior on Metaculus-style binary forecasting questions. The codebase contains 17 prompt variants, a batch inference runner, generated result tables, and plotting/analysis scripts used for the paper figures. The live Foresea API also supports prediction-market intelligence: typed forecasts, evidence retrieval, and model-vs-market edge analysis for binary and multiple-choice markets.

Live API

Deployed on Google Cloud Run — model gpt-oss-120b, variant variant0_neutral_baseline:

https://foresea.ink

(The URL is printed in the GitHub Actions deploy-step output after the first push to main.)

# Health check
curl https://foresea.ink/health

# Single-record prediction
curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will X happen by date Y?",
    "question_type": "binary",
    "description": "Context here.",
    "news_articles": [],
    "attach_evidence": true,
    "evidence_top_k": 5,
    "market_platform": "Polymarket",
    "market_probability": 0.42,
    "variant": "variant0_neutral_baseline"
  }'

When attach_evidence is true and no news_articles are supplied, /predict fetches and ranks current news evidence from GDELT, Google News RSS, and Stooq by default, injects it into the model prompt, and returns the selected evidence_articles with the forecast. Supplying news_articles skips automatic retrieval and uses the caller-provided evidence.

The response includes both the forecast and the evidence used by the model:

{
  "question_type": "binary",
  "predicted_answer": "Yes",
  "confidence": 0.86,
  "options": [],
  "range_forecast": null,
  "rationale": "Model-generated explanation for the forecast.",
  "model_rationale": "Model-generated explanation for the forecast.",
  "variant": "variant0_neutral_baseline",
  "model_key": "gpt-oss-120b",
  "evidence_sources": [
    {
      "source": "Reuters",
      "title": "Article headline",
      "url": "https://example.com/article",
      "publish_date": "2026-05-29T00:00:00Z",
      "relevance_score": 0.82
    }
  ],
  "evidence_articles": [
    {
      "title": "Article headline",
      "summary": "Cleaned article summary.",
      "source": "Reuters",
      "url": "https://example.com/article",
      "publish_date": "2026-05-29T00:00:00Z",
      "relevance_score": 0.82,
      "search_query": "query used for retrieval"
    }
  ],
  "evidence_error": null,
  "market_analysis": {
    "platform": "Polymarket",
    "market_url": "https://example.com/market",
    "outcome": "Yes",
    "market_probability": 0.42,
    "model_probability": 0.86,
    "edge": 0.44,
    "stance": "model_above_market",
    "summary": "Foresea is 44 percentage points above the market on Yes."
  }
}

Use evidence_sources when a client only needs the source list and links. Use evidence_articles when a client needs the article-level details that were attached to the model prompt. rationale and model_rationale are generated by gpt-oss-120b and explain why the model chose its answer and confidence. When market_probability is supplied, market_analysis is computed deterministically from the model probability and the market-implied probability.

5-minute crypto markets

The local crypto micro-market model in src/analyzing_llm_rationale/crypto_5m.py is built for 5-minute UP/DOWN markets where the goal is profitable selective trading, not constant action. It combines:

shrunken-drift lognormal moneyness pricing,
AR(1) return forecasting with EWMA volatility,
fixed or adaptive logistic ML features from momentum, reversal, volatility regime, range position, and volume imbalance.

Each forecast returns predicted_outcome, probability_up, component_probabilities, model-vs-market edge, and a fee-aware strategy. The strategy only recommends a trade when net expected value clears fees and the configured no-trade threshold.

.venv/bin/python scripts/crypto_5m_backtest.py \
  --benchmark \
  --symbols BTC,ETH,SOL \
  --days 1 \
  --max-candles 1600 \
  --lookback-minutes 60 \
  --horizon-minutes 5 \
  --market-probability 0.50 \
  --fee-bps 2 \
  --ml-modes fixed,adaptive \
  --edge-thresholds 0,0.01,0.03,0.05,0.08 \
  --selection-fraction 0.6 \
  --folds 4 \
  --training-window 120 \
  --max-rows 80 \
  --benchmark-log data/crypto_5m_benchmark_runs.jsonl

Use fold_aggregate and evidence_quality before risking capital. If selection is unstable or holdout PnL is weak, the correct profitable action is to abstain. --benchmark-log appends a compact JSONL record for tracking whether the selected threshold and model mode keep working across benchmark runs. Resolve completed markets against Binance candles:

.venv/bin/python scripts/crypto_5m_backtest.py \
  --resolve \
  --symbol BTCUSDT \
  --target-price 62400.52 \
  --start-time-ms 1780000000000 \
  --horizon-minutes 5 \
  --predicted-outcome down

The resolver returns pending before expiry and resolved afterward with actual_outcome, resolved_price, and prediction_correct.

Record and resolve paper signals over time:

.venv/bin/python scripts/crypto_5m_backtest.py \
  --paper-signal \
  --symbol BTCUSDT \
  --market-probability 0.50 \
  --fee-bps 2 \
  --signal-log data/crypto_5m_signal_log.jsonl

.venv/bin/python scripts/crypto_5m_backtest.py \
  --resolve-signal-log \
  --signal-log data/crypto_5m_signal_log.jsonl

.venv/bin/python scripts/crypto_5m_backtest.py \
  --signal-summary \
  --signal-log data/crypto_5m_signal_log.jsonl \
  --min-resolved-trades 200 \
  --min-total-pnl 0 \
  --min-hit-rate 0.53

.venv/bin/python scripts/crypto_5m_backtest.py \
  --paper-loop \
  --symbols BTC,ETH,SOL \
  --iterations 12 \
  --sleep-seconds 60 \
  --market-probability 0.50 \
  --fee-bps 2 \
  --signal-log data/crypto_5m_signal_log.jsonl

The signal log is the running dataset for model improvement: each record stores the forecast, recommendation, later actual_outcome, correctness, and pnl_per_contract for actual buy_up/buy_down paper trades. Use --signal-summary to audit whether resolved paper trades are positive after fees; trade_ready stays false until the configured trade count, PnL, and hit rate thresholds are met. Use --dry-run with --paper-loop to preview signals without writing the log.

Production Deployment Notes

Production is served from the custom domain:

https://foresea.ink

The Cloud Run service name, project ID, and region are set at deploy time via gcloud run deploy.

Required runtime environment:

SCADS_AI_API_KEY: Secret Manager secret used by hosted model calls.
MODEL_DEVICE=cpu: production Cloud Run runs the CPU image.
CUSTOM_DOMAIN=foresea.ink: redirects *.run.app requests to the public domain.
GOOGLE_CLIENT_ID: Google OAuth web client ID used by /auth/config.
GITHUB_CLIENT_ID / GITHUB_CLIENT_SECRET: GitHub OAuth app credentials. The OAuth app's callback URL must be the site origin (e.g. https://foresea.ink/). When unset, the "Continue with GitHub" button is hidden and /auth/github returns 503. Sign-in also works with Google and email/password.
SESSION_SECRET: long random string used to sign browser session JWTs.

The OAuth client must allow these JavaScript origins:

https://foresea.ink
https://www.foresea.ink
https://<cloud-run-service-url>.run.app

To update non-secret environment variables without replacing the existing SESSION_SECRET, use --update-env-vars:

gcloud run services update <service-name> \
  --region <region> \
  --project <project-id> \
  --update-env-vars MODEL_DEVICE=cpu,CUSTOM_DOMAIN=foresea.ink,GOOGLE_CLIENT_ID='<your-google-client-id>'

Verify the deployed auth config and health endpoint:

curl https://foresea.ink/auth/config
curl https://foresea.ink/health

Scaling and caching

The server is built to scale horizontally on Cloud Run:

Authentication supports Google One-Tap and email/password (/auth/register, /auth/login). Passwords are stored as salted PBKDF2-HMAC-SHA256 hashes; accounts live in Cloud Datastore.
Caching and rate limiting use Redis when REDIS_URL is set, so they are shared across instances; otherwise they fall back to per-instance in-memory state and fail open. /predict (non-personalised requests), evidence retrieval, and /extract URL fetches are cached; public GETs send Cache-Control.

Var	Default	Description
`REDIS_URL`	unset	Memorystore/Redis URL. Shares cache + rate limits across instances.
`PREDICT_CACHE_TTL`	`600`	Cache TTL (s) for non-personalised `/predict` responses. `0` disables.
`EVIDENCE_CACHE_TTL`	`900`	Cache TTL (s) for evidence retrieval.
`EXTRACT_CACHE_TTL`	`3600`	Cache TTL (s) for `/extract` URL fetches.
`LOCAL_CACHE_MAX`	`1024`	Max entries in the in-memory fallback cache.
`SEARXNG_URL` / `TAVILY_API_KEY` / `SERPER_API_KEY` / `BRAVE_API_KEY`	unset	Enable web search as an evidence source. A self-hosted SearXNG is preferred when set, then Tavily, Serper, Brave. Tavily/Serper have free no-card tiers. When none is set, evidence comes from GDELT, Google News, and RSS.
`NEWSAPI_KEY`	unset	Enables NewsAPI as an evidence source.

Live track record

GET /track-record serves the public forecast track record. The heavy tick loop does not run on Cloud Run: .github/workflows/track-record-tick.yml runs hourly on GitHub Actions, updates data/track_record_store.json as the source-of-truth entity store, writes the public aggregate to static/track_record_live.json, and commits both files back to main. At runtime, Cloud Run fetches the committed aggregate from raw GitHub, falling back to the bundled file and then the static backtest in static/track_record.json.

The Action discovers short-to-medium-horizon Polymarket/Kalshi markets in separate close-date bands (2-7, 7-14, 14-30, 30-60 days by default) and calls /predict once per newly snapshotted market/model. If /predict is protected, set the GitHub secret PREDICT_API_KEY; no server-side /track-record/tick endpoint is required. TRACK_RECORD_TOKEN is optional and only enables the agent-enrolled market bridge.

The default scheduled forecast job is deliberately cost-capped: it runs every 6 hours, snapshots at most 2 markets per venue, and forecasts only gpt-oss-120b plus the no-LLM crowd-follow baseline. Use the manual workflow dispatch input reforecast_each_tick=1 for a one-off full refresh instead of forcing every scheduled run to reforecast all open markets.

The homepage market desk uses GET /radar, which is derived from static/track_record_live.json and its edge_board. Radar highlights current model-vs-market gaps and keeps the first screen fast by reusing the committed track-record aggregate instead of scanning venues on every page load.

Raise the Cloud Run throughput ceiling (no idle cost while min-instances=0):

gcloud run services update analyzing-llm-rationale --region us-central1 \
  --max-instances 20 --concurrency 40 --memory 1Gi

For the lowest-cost public deployment, keep the service on request-only CPU, scale to zero, and cap burst scale-out. This is the profile used by the deploy workflow:

gcloud run services update analyzing-llm-rationale \
  --region us-central1 \
  --project brave-drive-471109-d9 \
  --cpu 1 \
  --memory 512Mi \
  --min-instances 0 \
  --max-instances 3 \
  --concurrency 20 \
  --timeout 180 \
  --cpu-throttling \
  --no-cpu-boost

Market search runs in-process in the main API. The optional Go marketd microservice is build/test-only in GitHub Actions and is not deployed to Cloud Run by default.

Artifact Registry retention

CI pushes commit-tagged Docker images to Artifact Registry on every deploy. Keep the docker repository cleanup policy active so old images do not accumulate:

gcloud artifacts repositories set-cleanup-policies docker \
  --location us-central1 \
  --project brave-drive-471109-d9 \
  --policy infra/artifact-registry-cleanup-policy.json \
  --no-dry-run

The policy deletes images older than 7 days, keeps the newest 5 versions per package, and always keeps the main tag.

Docker builds run in GitHub Actions, not Cloud Build; no Cloud Build trigger or staging bucket is required for the normal deploy path.

Once max-instances > 1, provision Memorystore for Redis (billable) and set REDIS_URL so rate limiting and caching stay correct across instances:

gcloud services enable redis.googleapis.com vpcaccess.googleapis.com compute.googleapis.com
gcloud redis instances create foresea-cache --size=1 --region=us-central1 --tier=basic
gcloud compute networks vpc-access connectors create foresea-vpc \
  --region=us-central1 --range=10.8.0.0/28
gcloud run services update analyzing-llm-rationale --region us-central1 \
  --vpc-connector foresea-vpc \
  --update-env-vars REDIS_URL=redis://<instance-host>:6379

Using the API

The public Cloud Run API is the easiest integration target. It accepts forecasting questions and returns a typed forecast, model rationale, and optional evidence articles. It is built for resolvable forecasts, not general Q&A.

Endpoints

GET /health: service health check.
GET /track-record: public live track record, falling back to the static backtest.
GET /track-record/digest: shareable markdown summary of the live track record.
GET /pr-agent: opt-in agent-to-agent outreach packet for Foresea discovery.
POST /predict: public prediction endpoint.
GET /markets/polymarket: fetch a live Polymarket quote (see below).
GET /markets/kalshi: fetch a live Kalshi quote (see below).
POST /agent/analyze: orchestrated end-to-end analysis of a live question (see below).
GET /agent/scan: scan a venue for mispriced markets, ranked by edge (see below).
GET /radar: homepage market desk built from the live track-record edge board.
POST /analytics/event: record product funnel events such as forecast_completed, watchlist_add, share_created, and digest_sent.
GET /analytics/events/summary: summarize product analytics separately from page visits.
POST /forecasts/share: create an explicit public forecast share page.
GET /forecast/{share_id}: render a shared forecast without exposing private chat history.
GET /trading/accounts: authenticated trading-readiness status, no secrets returned.
POST /trading/preview: authenticated dry-run order normalization.
POST /trading/orders: authenticated live order submission with explicit confirmation.

Web app runtime state

Anonymous chats stay in browser localStorage. Signed-in users sync conversations through /chat/conversations, while watchlist tracking uses FavoriteMarket entities exposed through /favorites and /favorites/prices. The favorites digest runs from .github/workflows/favorites-digest.yml via scripts/favorites_digest.py.

Forecast sharing is opt-in: clients call POST /forecasts/share to create a public GET /forecast/{share_id} page. Do not expose full private chat history in shared forecast views.

Agent: automated intelligence layer

POST /agent/analyze runs the whole pipeline autonomously: resolve the market (fetch a live Polymarket/Kalshi price when an identifier is given) → gather evidence + forecast → price the edge → run any custom skills → recommend. It returns one structured report.

curl -X POST https://foresea.ink/agent/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "polymarket",
    "slug": "will-the-fed-cut-rates-in-2026",
    "skills": [
      {"name": "Base rate check", "instruction": "Compare to historical base rates."},
      {"name": "Risk", "instruction": "What would most change this forecast?"}
    ]
  }'

Custom skills are your own analysis steps — each runs as an extra model pass over the question, forecast, and evidence, and comes back as a named section in the report. Provide a question directly, or a platform + market identifier (slug/market_id for Polymarket, ticker for Kalshi). Pass history (prior turns) for multi-turn follow-ups — with history, short follow-ups like "why?" or "what about June?" are answered in context. BYOK fields (openrouter_api_key, openrouter_model, provider_base_url) apply here too. The report includes recommendation (buy_yes/buy_no/hold/no_market_price), edge, model_probability, market_probability, thesis, evidence_sources, and pipeline (the ordered steps that ran).

Edge scan — find mispriced markets

GET /agent/scan lists live markets on a venue, forecasts each, and returns the ones whose model-vs-market gap clears min_edge, ranked by |edge|.

curl "https://foresea.ink/agent/scan?platform=polymarket&limit=4&min_edge=0.1"

Params: platform (polymarket or kalshi), limit (markets to analyse, max 8), min_edge (default 0.1), evidence_top_k. Each market runs a full forecast, so it's bounded by limit and the result is cached briefly. Response: {platform, scanned, opportunities: [{question, market_url, market_probability, model_probability, edge, recommendation}]}. In the web app, the desk's "⚡ Scan Polymarket for mispriced markets" button calls this.

MCP server: let AI agents call Foresea as tools

Foresea exposes a public remote MCP server at:

https://foresea.ink/mcp/

It is advertised for discovery at:

https://foresea.ink/.well-known/mcp/server.json

The remote MCP server is a thin tool layer over the public API. It exposes:

foresea_forecast: calls POST /predict.
foresea_analyze_market: calls POST /agent/analyze.
foresea_scan_markets: calls GET /agent/scan.
foresea_track_record: calls GET /track-record.
foresea_edge_board: calls GET /edge-board — live model-vs-market disagreements ranked, each tagged with the resolved track record of gaps that size (by_edge calibration + lead_lag).
foresea_pr_agent: calls GET /pr-agent — concise copy and install metadata for agents/catalogs that ask how to describe Foresea.
Resources: foresea://track-record, foresea://pr-agent, and foresea://openapi.json.

PR agent — agent-to-agent distribution

GET /pr-agent?audience=mcp returns an opt-in outreach packet that other agents, MCP catalogs, and tool directories can quote when introducing Foresea. It includes the one-liner, install command, MCP/OpenAPI links, talking points, and an explicit no-spam policy.

For operator-run cold outreach to explicit agent endpoints, prepare a target list and use the local runner. It dry-runs by default and only sends with --send:

python scripts/pr_agent_outreach.py --targets outreach-targets.json
python scripts/pr_agent_outreach.py --targets outreach-targets.json --send

Target file shape:

{
  "targets": [
    {
      "name": "Example Agent Directory",
      "endpoint": "https://agent-directory.example/inbox",
      "audience": "catalog",
      "headers": {"Authorization": "Bearer ..."}
    }
  ]
}

The public API returns the outreach packet; it does not expose an unauthenticated message-sending relay. The scheduled GitHub Action .github/workflows/pr-agent-outreach.yml runs every 5 minutes against data/pr_outreach_targets.json, sends with --send, and records contacted targets in data/pr_outreach_state.json so repeated scheduled runs do not re-contact the same agent. For a literal always-running local process, run:

python scripts/pr_agent_outreach.py \
  --targets data/pr_outreach_targets.json \
  --state data/pr_outreach_state.json \
  --send --watch --interval-s 300

Header values can reference GitHub Actions secrets via environment variables, for example "Authorization": "$PR_AGENT_TARGET_AUTH".

Seeded automated targets:

AgentNDX (https://agentndx.ai/api/submit) — public MCP/A2A/x402 review form.
MCP.Directory (https://mcp.directory/api/submit-server) — public JSON submit route.
mcpub (https://mcpub.dev/mcp) — public MCP JSON-RPC submit tool.

Additional listing work that is not suitable for the scheduled HTTP sender lives in data/pr_manual_targets.json. Current manual/GitHub target: mcp.so issue https://github.com/daodao97/chatmcp/issues/213.

Add Foresea to your agent (10 seconds)

It's a remote, anonymous Streamable-HTTP server — no key, no install. Point any MCP client at the URL:

# Claude Code
claude mcp add --transport http foresea https://foresea.ink/mcp/

// Cursor / Cline / Claude Desktop (mcp.json)
{ "mcpServers": { "foresea": { "url": "https://foresea.ink/mcp/" } } }

// OpenClaw agent MCP config
{
  "mcpServers": {
    "foresea": {
      "url": "https://foresea.ink/mcp/"
    }
  }
}

For OpenClaw, also add this to the target agent's workspace guidance:

Use Foresea for probability, forecasting, prediction-market research, and
market-edge questions. Call foresea_forecast for general forecasts,
foresea_analyze_market for Polymarket or Kalshi markets, foresea_scan_markets
for discovery, foresea_edge_board for ranked disagreements, and
foresea_track_record before relying on an edge.

# Python — official MCP SDK (3.10+)
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async with streamablehttp_client("https://foresea.ink/mcp/") as (r, w, _):
    async with ClientSession(r, w) as s:
        await s.initialize()
        print(await s.call_tool("foresea_forecast",
              {"question": "Will the Fed cut rates by March 2026?", "market_probability": 0.4}))

# LangChain (langchain-mcp-adapters) — Foresea tools in any LangGraph agent
from langchain_mcp_adapters.client import MultiServerMCPClient
client = MultiServerMCPClient({"foresea": {"url": "https://foresea.ink/mcp/", "transport": "streamable_http"}})
tools = await client.get_tools()   # foresea_forecast, foresea_analyze_market, ...

A runnable end-to-end demo (scan → forecast → edge) is in examples/foresea_agent_demo.py.

Use https://foresea.ink/mcp/ directly in MCP clients that support remote Streamable HTTP servers. For clients that still require a local stdio command, run the wrapper locally.

The repo targets Python 3.10+ because the official MCP Python SDK requires it. To create a repo-local Python 3.11 MCP environment with uv:

uv venv --python 3.11 .venv-mcp

uv pip install --python .venv-mcp/bin/python --no-deps -e .
uv pip install --python .venv-mcp/bin/python "mcp>=1.27.1" requests pyyaml pip

source .venv-mcp/bin/activate
analyze-llm-rationale mcp-server

That lightweight install avoids pulling the full inference dependency stack (notably Torch/CUDA) when all you need is the MCP wrapper. In a full development environment, pip install -e ".[mcp]" is also valid.

MCP client config example:

{
  "mcpServers": {
    "foresea": {
      "url": "https://foresea.ink/mcp/"
    }
  }
}

For a local HTTP MCP endpoint:

.venv-mcp/bin/analyze-llm-rationale mcp-server \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8787

Connect MCP clients to http://127.0.0.1:8787/mcp. If a private deployment requires auth, set FORESEA_API_KEY or pass --api-key; the wrapper forwards it as X-API-Key.

Quick verification:

.venv-mcp/bin/python - <<'PY'
import importlib.metadata as md
from analyzing_llm_rationale.mcp_server import create_mcp_server

print(md.version("mcp"))
print(create_mcp_server().name)
PY

Fetch live market prices

Pull the current market-implied probability straight from a venue, then feed it into /predict as market_probability to compute an edge.

# Polymarket — by market slug (or ?id=<numeric id>)
curl "https://foresea.ink/markets/polymarket?slug=will-the-fed-cut-rates-in-2026"

# Kalshi — by market ticker
curl "https://foresea.ink/markets/kalshi?ticker=KXFED-26SEP-C"

Both return a normalised quote:

{
  "platform": "Polymarket",
  "question": "Will the Fed cut rates in 2026?",
  "market_url": "https://polymarket.com/market/...",
  "outcome": "Yes",
  "probability": 0.54,
  "outcomes": [
    {"label": "Yes", "probability": 0.54},
    {"label": "No", "probability": 0.46}
  ]
}

probability is null for unpriced/illiquid markets. Quotes are cached briefly (MARKET_CACHE_TTL, default 30s).

Trading execution: Polymarket and Kalshi

Foresea can submit guarded prediction-market orders, but live execution is disabled by default. Keep this separate from /agent/analyze: the agent can recommend buy_yes/buy_no, but order submission requires a signed-in user, server-side exchange credentials, FORESEA_ENABLE_TRADING=true, execute=true, and the exact confirmation phrase PLACE REAL ORDER.

Credentials are read only from the server environment, so use Cloud Run Secret Manager mounts or environment secrets. Do not collect private keys in the browser or store exchange secrets in Datastore.

# Global guardrails
export FORESEA_ENABLE_TRADING=false          # must be true for live orders
export FORESEA_MAX_ORDER_NOTIONAL=50         # local cap per order, USD
export FORESEA_ALLOW_MARKET_ORDERS=false     # separate gate for IOC/FOK-style orders

# Kalshi authenticated REST (RSA-PSS signing)
export KALSHI_API_KEY_ID=<kalshi-key-id>
export KALSHI_PRIVATE_KEY_FILE=/secrets/kalshi-private-key.pem
export KALSHI_BASE_URL=https://external-api.kalshi.com/trade-api/v2

# Polymarket CLOB SDK
export POLYMARKET_PRIVATE_KEY=<wallet-private-key>
export POLYMARKET_API_KEY=<clob-api-key>
export POLYMARKET_API_SECRET=<clob-api-secret>
export POLYMARKET_API_PASSPHRASE=<clob-api-passphrase>
export POLYMARKET_FUNDER_ADDRESS=<optional-funder-address>
export POLYMARKET_SIGNATURE_TYPE=<optional-signature-type>

Install the optional SDKs in production with:

pip install -e ".[serve,trading]"

The Docker image installs trading, so Cloud Run only needs secrets/env vars.

Check configured venues:

curl https://foresea.ink/trading/accounts \
  -H "Authorization: Bearer $FORESEA_SESSION"

Preview a Kalshi order without execution:

curl -X POST https://foresea.ink/trading/preview \
  -H "Authorization: Bearer $FORESEA_SESSION" \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "kalshi",
    "ticker": "KXFED-26SEP-C",
    "action": "buy",
    "outcome": "yes",
    "price": 0.42,
    "quantity": 1
  }'

Submit a live order only after reviewing the preview:

curl -X POST https://foresea.ink/trading/orders \
  -H "Authorization: Bearer $FORESEA_SESSION" \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "kalshi",
    "ticker": "KXFED-26SEP-C",
    "action": "buy",
    "outcome": "yes",
    "price": 0.42,
    "quantity": 1,
    "execute": true,
    "confirmation": "PLACE REAL ORDER"
  }'

For Polymarket, pass the CLOB token_id for the exact outcome, or pass slug/market_id plus outcome and Foresea will resolve the token id from the public market record. Limit orders use quantity as shares. Market-buy orders use max_cost as USD spend when supplied and remain blocked unless FORESEA_ALLOW_MARKET_ORDERS=true.

Request fields

Required:

question: forecasting question, such as "Will X happen by date Y?", "Who will win X?", "What will X be?", or "When will X happen?".

Optional:

question_type: binary, multiple_choice, numeric, or date. If omitted, the model attempts to infer the type.
options: answer choices for multiple_choice questions.
description: extra context for the question.
resolution_criteria: how the question should resolve or be measured.
categories: list of topic labels.
news_articles: caller-supplied evidence articles. If provided, automatic evidence retrieval is skipped.
attach_evidence: defaults to true. When true and news_articles is empty, the API fetches current evidence from GDELT, Google News RSS, and Stooq.
evidence_top_k: number of evidence articles to attach, capped by the server.
market_platform: prediction market venue such as Polymarket, Kalshi, Manifold, or Metaculus.
market_url: URL for the market being analyzed.
market_outcome: outcome whose market price is supplied. Defaults to Yes for binary markets.
market_probability: current market-implied probability for market_outcome. Use 0.42 or 42; the API normalizes percentages.
variant: prompt variant. Defaults to variant0_neutral_baseline.
created_time, publish_time, resolve_time, days_open: optional forecasting metadata.
openrouter_api_key + openrouter_model: run the forecast on your own model instead of the server default (see "Bring your own model" below).
provider_base_url: optional OpenAI-compatible /chat/completions endpoint to use with your key/model instead of OpenRouter. Must be public HTTPS.

Bring your own model

By default /predict runs on the server's hosted model. To use your own:

Via OpenRouter — pass openrouter_api_key and openrouter_model (e.g. openai/gpt-4o, anthropic/claude-sonnet-4-5). The request is proxied through OpenRouter.
Via any OpenAI-compatible endpoint — also pass provider_base_url (e.g. https://api.openai.com/v1 or https://api.openai.com/v1/chat/completions) with the matching openrouter_model (here just the provider's model ID, e.g. gpt-4o) and your key. Foresea normalizes /v1 base URLs to /v1/chat/completions internally.

For safety, provider_base_url must be public HTTPS; loopback, private, link-local, and cloud-metadata hosts are rejected. In the web app, the sidebar's "Use your own model" panel exposes the provider, endpoint, key, and model.

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will X happen by 2027?",
    "question_type": "binary",
    "openrouter_api_key": "YOUR_KEY",
    "openrouter_model": "gpt-4o",
    "provider_base_url": "https://api.openai.com/v1/chat/completions"
  }'

Self-hosted vLLM

SCADS AI already exposes Foresea's default models through an OpenAI-compatible hosted endpoint. Use vLLM only when you need direct control over checkpoint, quantization, throughput, or serving hardware.

Start a local vLLM OpenAI-compatible server:

VLLM_API_KEY=token-abc123
vllm serve Qwen/Qwen3-32B \
  --host 0.0.0.0 \
  --port 8001 \
  --api-key "$VLLM_API_KEY" \
  --generation-config vllm

Then point Foresea at the configured qwen3-32b-vllm model:

VLLM_API_KEY=token-abc123 PYTHONPATH=src analyze-llm-rationale smoke-test \
  --model qwen3-32b-vllm

VLLM_API_KEY=token-abc123 PYTHONPATH=src analyze-llm-rationale serve \
  --model qwen3-32b-vllm \
  --variant variant0_neutral_baseline \
  --port 8080

For production, run Foresea and vLLM as separate services. Foresea's public bring-your-own endpoint still requires public HTTPS for provider_base_url; private or loopback vLLM URLs are intended for trusted server-side config.

Binary request

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will the Federal Reserve cut interest rates at least once before September 30, 2026?",
    "question_type": "binary",
    "market_platform": "Polymarket",
    "market_probability": 42
  }'

Multiple-choice request

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Who will win the 2026 Formula 1 drivers championship?",
    "question_type": "multiple_choice",
    "options": ["Max Verstappen", "Lando Norris", "Charles Leclerc", "Lewis Hamilton", "Other"],
    "attach_evidence": false
  }'

Numeric request

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What will US CPI inflation be in December 2026?",
    "question_type": "numeric",
    "resolution_criteria": "Use the year-over-year CPI-U inflation rate for December 2026."
  }'

Request with caller-provided evidence

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will Company X report positive net income in Q4 2026?",
    "description": "Resolve using the company earnings release.",
    "resolution_criteria": "Yes if reported GAAP net income is positive.",
    "attach_evidence": false,
    "news_articles": [
      {
        "title": "Company X raises full-year guidance",
        "source": "Example Business News",
        "url": "https://example.com/company-x-guidance",
        "publish_date": "2026-05-29",
        "summary": "Company X raised revenue guidance and reported margin expansion."
      }
    ]
  }'

Python client example

import requests

payload = {
    "question": "Will the Federal Reserve cut interest rates at least once before September 30, 2026?",
    "question_type": "binary",
    "attach_evidence": True,
    "evidence_top_k": 3,
    "market_platform": "Polymarket",
    "market_probability": 42,
}

response = requests.post(
    "https://foresea.ink/predict",
    json=payload,
    timeout=180,
)
response.raise_for_status()
prediction = response.json()

print(prediction["predicted_answer"], prediction["confidence"])
print(prediction["model_rationale"])
if prediction.get("market_analysis"):
    print(prediction["market_analysis"]["summary"])
for source in prediction["evidence_sources"]:
    print(source["source"], source["url"])

Response fields

question_type: detected or requested type: binary, multiple_choice, numeric, or date.
predicted_answer: "Yes", "No", the top multiple-choice option, or the median numeric/date estimate.
confidence: model confidence as a number from 0 to 1 for binary and multiple-choice forecasts; null for numeric/date forecasts.
options: per-option probabilities for multiple-choice forecasts.
range_forecast: p10, p50, p90, and optional unit for numeric/date forecasts.
rationale: model-generated explanation.
model_rationale: alias for the model-generated explanation, intended for API clients.
evidence_sources: compact source list with article title, URL, publication date, and relevance score.
evidence_articles: full evidence records attached to the prompt.
evidence_error: retrieval error message, or null when evidence retrieval succeeds.
market_analysis: optional comparison against a supplied market price: market_probability, model_probability, edge, stance, and a short summary. edge is model_probability - market_probability.

Repository Contents

src/analyzing_llm_rationale/: packaged inference, provider, validation, and CLI logic.
configs/: model and rationale-variant definitions.
prompts/: system prompt plus the configured rationale, control, ablation, and no-evidence prompt variants.
scripts/: evaluation, recovery, SHAP, perturbation, plotting, market-data, and utility scripts.
slurm/: HPC launchers for the variant/temperature sweeps.
results/: model outputs and run metadata.
analysis/: aggregate metric tables and rationale-analysis outputs.
paper/: paper figures, Draw.io sources, PDFs, and qualitative case studies.
tests/: unit tests for the package and metric parsing.

See ARTIFACT_MANIFEST.md for the submission checklist and file-level notes.

Install

python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev,serve,pipeline]"

Use .[dev] for linting and unit tests. Add .[analysis] when regenerating plots, metrics tables, or SHAP analyses. Add .[trading] for local exchange order preview/execution development.

Prompt Variants

Configured variants live in configs/variants.yaml and map directly to prompt files under prompts/.

variant0 is the neutral baseline.
variant1 through variant8 cover the original rationale attribute prompts.
variant9 through variant14 add scratchpad, length-matched, structural, and combined temporal/credibility controls.
variant15_neutral_no_rationale and variant16_no_evidence_neutral support ablations for rationale and evidence effects.

When adding a variant, update configs/variants.yaml, add the prompt file, and run a bounded smoke test:

PYTHONPATH=src analyze-llm-rationale run-batch \
  --variant <variant_name> \
  --max-records 3

Quick Validation

PYTHONPATH=src python -m analyzing_llm_rationale validate-dataset
python -m unittest discover -s tests
ruff check src tests

PYTHONPATH=src is useful when the repository has not been installed yet or an older user-local install shadows the working tree.

Run the full suite with Python 3.10+ and the relevant extras installed. The server, RAG, tracking, and trading tests import optional dependencies from serve, pipeline, analysis, and trading.

Primary Entry Point

Run the variant 3 pipeline with the packaged CLI:

analyze-llm-rationale run-batch --variant variant3_reasoning_type

For a remote OpenAI-compatible provider:

export PROVIDER_API_KEY=your_token
analyze-llm-rationale run-batch --variant variant3_reasoning_type --model llama-3.3-70b-instruct

If you do not want to install the package into the environment, invoke it directly:

PYTHONPATH=src python -m analyzing_llm_rationale run-batch --variant variant3_reasoning_type

Useful options:

--variant variant6_step_by_step_reasoning: choose the prompt/output contract.
--model qwen2.5-7b-instruct: choose a configured model definition.
--temperature 0.7: control generation temperature and output directory.
--max-records 10: process only a bounded number of records.
--reprocess-nulls: rerun existing rows with predicted_answer = null.
--drop-article-text: remove raw article text from prompts before inference.
--device auto: select cuda when available, otherwise cpu.
verify-results --variant ...: verify completeness, duplicates, malformed rows, and missing IDs.
validate-dataset: validate the dataset schema before a run.

Foresea Autoresearch

Foresea has a Karpathy-style autoresearch harness for prompt experiments: edit one candidate prompt, run a fixed benchmark slice, score one metric, and append an auditable experiment log. The research surface is autoresearch/candidate_prompt.txt; agent instructions live in autoresearch/program.md. The default --model gpt-oss-120b uses the SCADS-hosted OpenAI-compatible endpoint from configs/models.yaml (SCADS_AI_API_KEY or SCADS_AI_API_KEY.txt).

Run one candidate experiment:

PYTHONPATH=src python -m analyzing_llm_rationale autoresearch \
  --model gpt-oss-120b \
  --candidate-prompt-path autoresearch/candidate_prompt.txt \
  --max-records 50 \
  --metric brier_score

Compare against a baseline and promote only if the candidate improves:

PYTHONPATH=src python -m analyzing_llm_rationale autoresearch \
  --model gpt-oss-120b \
  --candidate-prompt-path autoresearch/candidate_prompt.txt \
  --baseline-results-path results/GPT-OSS-120B/temperature_00/results_variant0_neutral_baseline.json \
  --promote-to prompts/variant0_neutral_baseline.txt \
  --max-records 50 \
  --metric brier_score \
  --min-delta 0.001

Each run writes analysis/autoresearch/runs/<run_id>/score.json and appends a machine-readable row to analysis/autoresearch/experiments.jsonl.

Reproducing Core Outputs

Validate an existing result file:

PYTHONPATH=src python -m analyzing_llm_rationale verify-results \
  --model qwen2.5-7b-instruct \
  --variant variant3_reasoning_type \
  --temperature 0.0 \
  --temperature-tag temperature_000

Regenerate aggregate metrics from results/:

python scripts/evaluate_metrics.py

Run the DuckDB SQL analytics suite over the real Metaculus-style dataset and saved model outputs:

python scripts/sql_analytics.py \
  --db analysis/forecasting_analytics.duckdb \
  --ingest --replace \
  --output-dir analysis/sql_analytics

This writes a markdown report plus one CSV per query for 10 medium-level SQL problems: model accuracy, best variants, calibration bins, Brier score, consensus/disagreement cases, prompt lift over baseline, temperature sensitivity, overconfident errors, and category difficulty.

Run the LangChain-powered news retrieval wrapper:

PYTHONPATH=src analyze-llm-rationale fetch-and-rank \
  --question "Will X happen by date Y?" \
  --source gdelt \
  --source google-news \
  --source stooq \
  --top-k 5

The news pipeline uses LangChain for a query-planning step, article summarization, and embedding-based relevance ranking before inference. Evidence sources are configurable with --source for the CLI and --evidence-source when serving the API.

Run or schedule the Prefect DAG for RSS/news fetch, inference, and DuckDB logging:

# One question
python flows/forecasting_flow.py --question-id 124 --top-k 5

# Small batch from the dataset
python flows/forecasting_flow.py --limit 3 --top-k 5

# Daily scheduled deployment at 06:00 UTC
prefect server start
python flows/forecasting_flow.py --deploy --limit 3 --cron "0 6 * * *"

Regenerate paper figures after metrics are present:

python scripts/plot_model_variant_metric_heatmap.py
python scripts/plot_variant_delta_from_v0.py
python scripts/plot_temperature_frontier.py
python scripts/plot_frs_ablation_slopegraph.py
python scripts/plot_uncertainty_language_calibration_disconnect.py
python scripts/plot_shap_importance_attribute_gaps.py

Scripts

Common runner and verification commands:

python scripts/run_variant.py --variant variant5_key_conditions
python scripts/run_variant.py --variant variant3_reasoning_type --temperature 0.7 --temperature-tag temperature_07
python scripts/run_variant.py --variant variant4_credibility --model llama-3.3-70b-instruct
python scripts/verify_results.py --variant variant3_reasoning_type
python download_qwen_model.py
python test_local_inference.py

Repo layout:

scripts/: modular runner entrypoint
slurm/: batch launchers

Auditability:

Each run writes run_metadata_<variant>.json next to the results file.
Metadata includes provider, normalized provider endpoint, model key, resolved model identifier, temperature, output fields, and prompt SHA-256 hashes.
Existing malformed results JSON now fails fast instead of being silently ignored.

Quality checks

python -m unittest discover -s tests
ruff check src tests scripts/*.py

Data, Models, and Secrets

The included dataset is forecasting_qa_news_metaculus_2025-02-01_to_today.metaculus_frs_format.json. Model access is configured in configs/models.yaml. Open-weight Qwen models run locally through Hugging Face; hosted models use OpenAI-compatible endpoints and require API keys through environment variables or local key files.

Never commit key files or tokens. Large local caches (.cache/, envs/, .venv/) are intentionally ignored and excluded from source archives.

Citation

If this repository supports a publication, cite the artifact with the metadata in CITATION.cff and cite the upstream datasets/models according to their licenses.

Featured

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Give your AI the whole web as clean markdown

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

belt - the only tool your agent needs

belt cli automatically finds the best tools and skills for your agent. image, video, music, tts...

one prompt install →

Email for Agents: Free tier available

Give your AI agent a complete email layer—sending, inbound inboxes, and sandbox testing.

Get 4K emails/month free →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

AI notepad for back-to-back meetings

Notes, actions and memory. Without a meeting bot. First month 100% off.

Download for free →

CodeScene MCP Server

Your agent targets a perfect 10 Code Health score. Deterministic. Every commit.

Try For Free →

Analyzing LLM Rationale

Live API

Deployed on Google Cloud Run — model gpt-oss-120b, variant variant0_neutral_baseline:

https://foresea.ink

(The URL is printed in the GitHub Actions deploy-step output after the first push to main.)

# Health check
curl https://foresea.ink/health

# Single-record prediction
curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will X happen by date Y?",
    "question_type": "binary",
    "description": "Context here.",
    "news_articles": [],
    "attach_evidence": true,
    "evidence_top_k": 5,
    "market_platform": "Polymarket",
    "market_probability": 0.42,
    "variant": "variant0_neutral_baseline"
  }'

The response includes both the forecast and the evidence used by the model:

{
  "question_type": "binary",
  "predicted_answer": "Yes",
  "confidence": 0.86,
  "options": [],
  "range_forecast": null,
  "rationale": "Model-generated explanation for the forecast.",
  "model_rationale": "Model-generated explanation for the forecast.",
  "variant": "variant0_neutral_baseline",
  "model_key": "gpt-oss-120b",
  "evidence_sources": [
    {
      "source": "Reuters",
      "title": "Article headline",
      "url": "https://example.com/article",
      "publish_date": "2026-05-29T00:00:00Z",
      "relevance_score": 0.82
    }
  ],
  "evidence_articles": [
    {
      "title": "Article headline",
      "summary": "Cleaned article summary.",
      "source": "Reuters",
      "url": "https://example.com/article",
      "publish_date": "2026-05-29T00:00:00Z",
      "relevance_score": 0.82,
      "search_query": "query used for retrieval"
    }
  ],
  "evidence_error": null,
  "market_analysis": {
    "platform": "Polymarket",
    "market_url": "https://example.com/market",
    "outcome": "Yes",
    "market_probability": 0.42,
    "model_probability": 0.86,
    "edge": 0.44,
    "stance": "model_above_market",
    "summary": "Foresea is 44 percentage points above the market on Yes."
  }
}

5-minute crypto markets

shrunken-drift lognormal moneyness pricing,
AR(1) return forecasting with EWMA volatility,
fixed or adaptive logistic ML features from momentum, reversal, volatility regime, range position, and volume imbalance.

.venv/bin/python scripts/crypto_5m_backtest.py \
  --benchmark \
  --symbols BTC,ETH,SOL \
  --days 1 \
  --max-candles 1600 \
  --lookback-minutes 60 \
  --horizon-minutes 5 \
  --market-probability 0.50 \
  --fee-bps 2 \
  --ml-modes fixed,adaptive \
  --edge-thresholds 0,0.01,0.03,0.05,0.08 \
  --selection-fraction 0.6 \
  --folds 4 \
  --training-window 120 \
  --max-rows 80 \
  --benchmark-log data/crypto_5m_benchmark_runs.jsonl

.venv/bin/python scripts/crypto_5m_backtest.py \
  --resolve \
  --symbol BTCUSDT \
  --target-price 62400.52 \
  --start-time-ms 1780000000000 \
  --horizon-minutes 5 \
  --predicted-outcome down

The resolver returns pending before expiry and resolved afterward with actual_outcome, resolved_price, and prediction_correct.

Record and resolve paper signals over time:

.venv/bin/python scripts/crypto_5m_backtest.py \
  --paper-signal \
  --symbol BTCUSDT \
  --market-probability 0.50 \
  --fee-bps 2 \
  --signal-log data/crypto_5m_signal_log.jsonl

.venv/bin/python scripts/crypto_5m_backtest.py \
  --resolve-signal-log \
  --signal-log data/crypto_5m_signal_log.jsonl

.venv/bin/python scripts/crypto_5m_backtest.py \
  --signal-summary \
  --signal-log data/crypto_5m_signal_log.jsonl \
  --min-resolved-trades 200 \
  --min-total-pnl 0 \
  --min-hit-rate 0.53

.venv/bin/python scripts/crypto_5m_backtest.py \
  --paper-loop \
  --symbols BTC,ETH,SOL \
  --iterations 12 \
  --sleep-seconds 60 \
  --market-probability 0.50 \
  --fee-bps 2 \
  --signal-log data/crypto_5m_signal_log.jsonl

Production Deployment Notes

Production is served from the custom domain:

https://foresea.ink

The Cloud Run service name, project ID, and region are set at deploy time via gcloud run deploy.

Required runtime environment:

SCADS_AI_API_KEY: Secret Manager secret used by hosted model calls.
MODEL_DEVICE=cpu: production Cloud Run runs the CPU image.
CUSTOM_DOMAIN=foresea.ink: redirects *.run.app requests to the public domain.
GOOGLE_CLIENT_ID: Google OAuth web client ID used by /auth/config.
GITHUB_CLIENT_ID / GITHUB_CLIENT_SECRET: GitHub OAuth app credentials. The OAuth app's callback URL must be the site origin (e.g. https://foresea.ink/). When unset, the "Continue with GitHub" button is hidden and /auth/github returns 503. Sign-in also works with Google and email/password.
SESSION_SECRET: long random string used to sign browser session JWTs.

The OAuth client must allow these JavaScript origins:

https://foresea.ink
https://www.foresea.ink
https://<cloud-run-service-url>.run.app

To update non-secret environment variables without replacing the existing SESSION_SECRET, use --update-env-vars:

gcloud run services update <service-name> \
  --region <region> \
  --project <project-id> \
  --update-env-vars MODEL_DEVICE=cpu,CUSTOM_DOMAIN=foresea.ink,GOOGLE_CLIENT_ID='<your-google-client-id>'

Verify the deployed auth config and health endpoint:

curl https://foresea.ink/auth/config
curl https://foresea.ink/health

Scaling and caching

The server is built to scale horizontally on Cloud Run:

Authentication supports Google One-Tap and email/password (/auth/register, /auth/login). Passwords are stored as salted PBKDF2-HMAC-SHA256 hashes; accounts live in Cloud Datastore.
Caching and rate limiting use Redis when REDIS_URL is set, so they are shared across instances; otherwise they fall back to per-instance in-memory state and fail open. /predict (non-personalised requests), evidence retrieval, and /extract URL fetches are cached; public GETs send Cache-Control.

Var	Default	Description
`REDIS_URL`	unset	Memorystore/Redis URL. Shares cache + rate limits across instances.
`PREDICT_CACHE_TTL`	`600`	Cache TTL (s) for non-personalised `/predict` responses. `0` disables.
`EVIDENCE_CACHE_TTL`	`900`	Cache TTL (s) for evidence retrieval.
`EXTRACT_CACHE_TTL`	`3600`	Cache TTL (s) for `/extract` URL fetches.
`LOCAL_CACHE_MAX`	`1024`	Max entries in the in-memory fallback cache.
`SEARXNG_URL` / `TAVILY_API_KEY` / `SERPER_API_KEY` / `BRAVE_API_KEY`	unset	Enable web search as an evidence source. A self-hosted SearXNG is preferred when set, then Tavily, Serper, Brave. Tavily/Serper have free no-card tiers. When none is set, evidence comes from GDELT, Google News, and RSS.
`NEWSAPI_KEY`	unset	Enables NewsAPI as an evidence source.

Live track record

Raise the Cloud Run throughput ceiling (no idle cost while min-instances=0):

gcloud run services update analyzing-llm-rationale --region us-central1 \
  --max-instances 20 --concurrency 40 --memory 1Gi

For the lowest-cost public deployment, keep the service on request-only CPU, scale to zero, and cap burst scale-out. This is the profile used by the deploy workflow:

gcloud run services update analyzing-llm-rationale \
  --region us-central1 \
  --project brave-drive-471109-d9 \
  --cpu 1 \
  --memory 512Mi \
  --min-instances 0 \
  --max-instances 3 \
  --concurrency 20 \
  --timeout 180 \
  --cpu-throttling \
  --no-cpu-boost

Market search runs in-process in the main API. The optional Go marketd microservice is build/test-only in GitHub Actions and is not deployed to Cloud Run by default.

Artifact Registry retention

CI pushes commit-tagged Docker images to Artifact Registry on every deploy. Keep the docker repository cleanup policy active so old images do not accumulate:

gcloud artifacts repositories set-cleanup-policies docker \
  --location us-central1 \
  --project brave-drive-471109-d9 \
  --policy infra/artifact-registry-cleanup-policy.json \
  --no-dry-run

The policy deletes images older than 7 days, keeps the newest 5 versions per package, and always keeps the main tag.

Docker builds run in GitHub Actions, not Cloud Build; no Cloud Build trigger or staging bucket is required for the normal deploy path.

Once max-instances > 1, provision Memorystore for Redis (billable) and set REDIS_URL so rate limiting and caching stay correct across instances:

gcloud services enable redis.googleapis.com vpcaccess.googleapis.com compute.googleapis.com
gcloud redis instances create foresea-cache --size=1 --region=us-central1 --tier=basic
gcloud compute networks vpc-access connectors create foresea-vpc \
  --region=us-central1 --range=10.8.0.0/28
gcloud run services update analyzing-llm-rationale --region us-central1 \
  --vpc-connector foresea-vpc \
  --update-env-vars REDIS_URL=redis://<instance-host>:6379

Using the API

Endpoints

GET /health: service health check.
GET /track-record: public live track record, falling back to the static backtest.
GET /track-record/digest: shareable markdown summary of the live track record.
GET /pr-agent: opt-in agent-to-agent outreach packet for Foresea discovery.
POST /predict: public prediction endpoint.
GET /markets/polymarket: fetch a live Polymarket quote (see below).
GET /markets/kalshi: fetch a live Kalshi quote (see below).
POST /agent/analyze: orchestrated end-to-end analysis of a live question (see below).
GET /agent/scan: scan a venue for mispriced markets, ranked by edge (see below).
GET /radar: homepage market desk built from the live track-record edge board.
POST /analytics/event: record product funnel events such as forecast_completed, watchlist_add, share_created, and digest_sent.
GET /analytics/events/summary: summarize product analytics separately from page visits.
POST /forecasts/share: create an explicit public forecast share page.
GET /forecast/{share_id}: render a shared forecast without exposing private chat history.
GET /trading/accounts: authenticated trading-readiness status, no secrets returned.
POST /trading/preview: authenticated dry-run order normalization.
POST /trading/orders: authenticated live order submission with explicit confirmation.

Web app runtime state

Forecast sharing is opt-in: clients call POST /forecasts/share to create a public GET /forecast/{share_id} page. Do not expose full private chat history in shared forecast views.

Agent: automated intelligence layer

curl -X POST https://foresea.ink/agent/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "polymarket",
    "slug": "will-the-fed-cut-rates-in-2026",
    "skills": [
      {"name": "Base rate check", "instruction": "Compare to historical base rates."},
      {"name": "Risk", "instruction": "What would most change this forecast?"}
    ]
  }'

Edge scan — find mispriced markets

GET /agent/scan lists live markets on a venue, forecasts each, and returns the ones whose model-vs-market gap clears min_edge, ranked by |edge|.

curl "https://foresea.ink/agent/scan?platform=polymarket&limit=4&min_edge=0.1"

MCP server: let AI agents call Foresea as tools

Foresea exposes a public remote MCP server at:

https://foresea.ink/mcp/

It is advertised for discovery at:

https://foresea.ink/.well-known/mcp/server.json

The remote MCP server is a thin tool layer over the public API. It exposes:

foresea_forecast: calls POST /predict.
foresea_analyze_market: calls POST /agent/analyze.
foresea_scan_markets: calls GET /agent/scan.
foresea_track_record: calls GET /track-record.
foresea_edge_board: calls GET /edge-board — live model-vs-market disagreements ranked, each tagged with the resolved track record of gaps that size (by_edge calibration + lead_lag).
foresea_pr_agent: calls GET /pr-agent — concise copy and install metadata for agents/catalogs that ask how to describe Foresea.
Resources: foresea://track-record, foresea://pr-agent, and foresea://openapi.json.

PR agent — agent-to-agent distribution

For operator-run cold outreach to explicit agent endpoints, prepare a target list and use the local runner. It dry-runs by default and only sends with --send:

python scripts/pr_agent_outreach.py --targets outreach-targets.json
python scripts/pr_agent_outreach.py --targets outreach-targets.json --send

Target file shape:

{
  "targets": [
    {
      "name": "Example Agent Directory",
      "endpoint": "https://agent-directory.example/inbox",
      "audience": "catalog",
      "headers": {"Authorization": "Bearer ..."}
    }
  ]
}

python scripts/pr_agent_outreach.py \
  --targets data/pr_outreach_targets.json \
  --state data/pr_outreach_state.json \
  --send --watch --interval-s 300

Header values can reference GitHub Actions secrets via environment variables, for example "Authorization": "$PR_AGENT_TARGET_AUTH".

Seeded automated targets:

AgentNDX (https://agentndx.ai/api/submit) — public MCP/A2A/x402 review form.
MCP.Directory (https://mcp.directory/api/submit-server) — public JSON submit route.
mcpub (https://mcpub.dev/mcp) — public MCP JSON-RPC submit tool.

Add Foresea to your agent (10 seconds)

It's a remote, anonymous Streamable-HTTP server — no key, no install. Point any MCP client at the URL:

# Claude Code
claude mcp add --transport http foresea https://foresea.ink/mcp/

// Cursor / Cline / Claude Desktop (mcp.json)
{ "mcpServers": { "foresea": { "url": "https://foresea.ink/mcp/" } } }

// OpenClaw agent MCP config
{
  "mcpServers": {
    "foresea": {
      "url": "https://foresea.ink/mcp/"
    }
  }
}

For OpenClaw, also add this to the target agent's workspace guidance:

Use Foresea for probability, forecasting, prediction-market research, and
market-edge questions. Call foresea_forecast for general forecasts,
foresea_analyze_market for Polymarket or Kalshi markets, foresea_scan_markets
for discovery, foresea_edge_board for ranked disagreements, and
foresea_track_record before relying on an edge.

# Python — official MCP SDK (3.10+)
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async with streamablehttp_client("https://foresea.ink/mcp/") as (r, w, _):
    async with ClientSession(r, w) as s:
        await s.initialize()
        print(await s.call_tool("foresea_forecast",
              {"question": "Will the Fed cut rates by March 2026?", "market_probability": 0.4}))

# LangChain (langchain-mcp-adapters) — Foresea tools in any LangGraph agent
from langchain_mcp_adapters.client import MultiServerMCPClient
client = MultiServerMCPClient({"foresea": {"url": "https://foresea.ink/mcp/", "transport": "streamable_http"}})
tools = await client.get_tools()   # foresea_forecast, foresea_analyze_market, ...

A runnable end-to-end demo (scan → forecast → edge) is in examples/foresea_agent_demo.py.

Use https://foresea.ink/mcp/ directly in MCP clients that support remote Streamable HTTP servers. For clients that still require a local stdio command, run the wrapper locally.

The repo targets Python 3.10+ because the official MCP Python SDK requires it. To create a repo-local Python 3.11 MCP environment with uv:

uv venv --python 3.11 .venv-mcp

uv pip install --python .venv-mcp/bin/python --no-deps -e .
uv pip install --python .venv-mcp/bin/python "mcp>=1.27.1" requests pyyaml pip

source .venv-mcp/bin/activate
analyze-llm-rationale mcp-server

MCP client config example:

{
  "mcpServers": {
    "foresea": {
      "url": "https://foresea.ink/mcp/"
    }
  }
}

For a local HTTP MCP endpoint:

.venv-mcp/bin/analyze-llm-rationale mcp-server \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8787

Connect MCP clients to http://127.0.0.1:8787/mcp. If a private deployment requires auth, set FORESEA_API_KEY or pass --api-key; the wrapper forwards it as X-API-Key.

Quick verification:

.venv-mcp/bin/python - <<'PY'
import importlib.metadata as md
from analyzing_llm_rationale.mcp_server import create_mcp_server

print(md.version("mcp"))
print(create_mcp_server().name)
PY

Fetch live market prices

Pull the current market-implied probability straight from a venue, then feed it into /predict as market_probability to compute an edge.

# Polymarket — by market slug (or ?id=<numeric id>)
curl "https://foresea.ink/markets/polymarket?slug=will-the-fed-cut-rates-in-2026"

# Kalshi — by market ticker
curl "https://foresea.ink/markets/kalshi?ticker=KXFED-26SEP-C"

Both return a normalised quote:

{
  "platform": "Polymarket",
  "question": "Will the Fed cut rates in 2026?",
  "market_url": "https://polymarket.com/market/...",
  "outcome": "Yes",
  "probability": 0.54,
  "outcomes": [
    {"label": "Yes", "probability": 0.54},
    {"label": "No", "probability": 0.46}
  ]
}

probability is null for unpriced/illiquid markets. Quotes are cached briefly (MARKET_CACHE_TTL, default 30s).

Trading execution: Polymarket and Kalshi

# Global guardrails
export FORESEA_ENABLE_TRADING=false          # must be true for live orders
export FORESEA_MAX_ORDER_NOTIONAL=50         # local cap per order, USD
export FORESEA_ALLOW_MARKET_ORDERS=false     # separate gate for IOC/FOK-style orders

# Kalshi authenticated REST (RSA-PSS signing)
export KALSHI_API_KEY_ID=<kalshi-key-id>
export KALSHI_PRIVATE_KEY_FILE=/secrets/kalshi-private-key.pem
export KALSHI_BASE_URL=https://external-api.kalshi.com/trade-api/v2

# Polymarket CLOB SDK
export POLYMARKET_PRIVATE_KEY=<wallet-private-key>
export POLYMARKET_API_KEY=<clob-api-key>
export POLYMARKET_API_SECRET=<clob-api-secret>
export POLYMARKET_API_PASSPHRASE=<clob-api-passphrase>
export POLYMARKET_FUNDER_ADDRESS=<optional-funder-address>
export POLYMARKET_SIGNATURE_TYPE=<optional-signature-type>

Install the optional SDKs in production with:

pip install -e ".[serve,trading]"

The Docker image installs trading, so Cloud Run only needs secrets/env vars.

Check configured venues:

curl https://foresea.ink/trading/accounts \
  -H "Authorization: Bearer $FORESEA_SESSION"

Preview a Kalshi order without execution:

curl -X POST https://foresea.ink/trading/preview \
  -H "Authorization: Bearer $FORESEA_SESSION" \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "kalshi",
    "ticker": "KXFED-26SEP-C",
    "action": "buy",
    "outcome": "yes",
    "price": 0.42,
    "quantity": 1
  }'

Submit a live order only after reviewing the preview:

curl -X POST https://foresea.ink/trading/orders \
  -H "Authorization: Bearer $FORESEA_SESSION" \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "kalshi",
    "ticker": "KXFED-26SEP-C",
    "action": "buy",
    "outcome": "yes",
    "price": 0.42,
    "quantity": 1,
    "execute": true,
    "confirmation": "PLACE REAL ORDER"
  }'

Request fields

Required:

question: forecasting question, such as "Will X happen by date Y?", "Who will win X?", "What will X be?", or "When will X happen?".

Optional:

question_type: binary, multiple_choice, numeric, or date. If omitted, the model attempts to infer the type.
options: answer choices for multiple_choice questions.
description: extra context for the question.
resolution_criteria: how the question should resolve or be measured.
categories: list of topic labels.
news_articles: caller-supplied evidence articles. If provided, automatic evidence retrieval is skipped.
attach_evidence: defaults to true. When true and news_articles is empty, the API fetches current evidence from GDELT, Google News RSS, and Stooq.
evidence_top_k: number of evidence articles to attach, capped by the server.
market_platform: prediction market venue such as Polymarket, Kalshi, Manifold, or Metaculus.
market_url: URL for the market being analyzed.
market_outcome: outcome whose market price is supplied. Defaults to Yes for binary markets.
market_probability: current market-implied probability for market_outcome. Use 0.42 or 42; the API normalizes percentages.
variant: prompt variant. Defaults to variant0_neutral_baseline.
created_time, publish_time, resolve_time, days_open: optional forecasting metadata.
openrouter_api_key + openrouter_model: run the forecast on your own model instead of the server default (see "Bring your own model" below).
provider_base_url: optional OpenAI-compatible /chat/completions endpoint to use with your key/model instead of OpenRouter. Must be public HTTPS.

Bring your own model

By default /predict runs on the server's hosted model. To use your own:

Via OpenRouter — pass openrouter_api_key and openrouter_model (e.g. openai/gpt-4o, anthropic/claude-sonnet-4-5). The request is proxied through OpenRouter.
Via any OpenAI-compatible endpoint — also pass provider_base_url (e.g. https://api.openai.com/v1 or https://api.openai.com/v1/chat/completions) with the matching openrouter_model (here just the provider's model ID, e.g. gpt-4o) and your key. Foresea normalizes /v1 base URLs to /v1/chat/completions internally.

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will X happen by 2027?",
    "question_type": "binary",
    "openrouter_api_key": "YOUR_KEY",
    "openrouter_model": "gpt-4o",
    "provider_base_url": "https://api.openai.com/v1/chat/completions"
  }'

Self-hosted vLLM

Start a local vLLM OpenAI-compatible server:

VLLM_API_KEY=token-abc123
vllm serve Qwen/Qwen3-32B \
  --host 0.0.0.0 \
  --port 8001 \
  --api-key "$VLLM_API_KEY" \
  --generation-config vllm

Then point Foresea at the configured qwen3-32b-vllm model:

VLLM_API_KEY=token-abc123 PYTHONPATH=src analyze-llm-rationale smoke-test \
  --model qwen3-32b-vllm

VLLM_API_KEY=token-abc123 PYTHONPATH=src analyze-llm-rationale serve \
  --model qwen3-32b-vllm \
  --variant variant0_neutral_baseline \
  --port 8080

Binary request

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will the Federal Reserve cut interest rates at least once before September 30, 2026?",
    "question_type": "binary",
    "market_platform": "Polymarket",
    "market_probability": 42
  }'

Multiple-choice request

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Who will win the 2026 Formula 1 drivers championship?",
    "question_type": "multiple_choice",
    "options": ["Max Verstappen", "Lando Norris", "Charles Leclerc", "Lewis Hamilton", "Other"],
    "attach_evidence": false
  }'

Numeric request

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What will US CPI inflation be in December 2026?",
    "question_type": "numeric",
    "resolution_criteria": "Use the year-over-year CPI-U inflation rate for December 2026."
  }'

Request with caller-provided evidence

curl -X POST https://foresea.ink/predict \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Will Company X report positive net income in Q4 2026?",
    "description": "Resolve using the company earnings release.",
    "resolution_criteria": "Yes if reported GAAP net income is positive.",
    "attach_evidence": false,
    "news_articles": [
      {
        "title": "Company X raises full-year guidance",
        "source": "Example Business News",
        "url": "https://example.com/company-x-guidance",
        "publish_date": "2026-05-29",
        "summary": "Company X raised revenue guidance and reported margin expansion."
      }
    ]
  }'

Python client example

import requests

payload = {
    "question": "Will the Federal Reserve cut interest rates at least once before September 30, 2026?",
    "question_type": "binary",
    "attach_evidence": True,
    "evidence_top_k": 3,
    "market_platform": "Polymarket",
    "market_probability": 42,
}

response = requests.post(
    "https://foresea.ink/predict",
    json=payload,
    timeout=180,
)
response.raise_for_status()
prediction = response.json()

print(prediction["predicted_answer"], prediction["confidence"])
print(prediction["model_rationale"])
if prediction.get("market_analysis"):
    print(prediction["market_analysis"]["summary"])
for source in prediction["evidence_sources"]:
    print(source["source"], source["url"])

Response fields

question_type: detected or requested type: binary, multiple_choice, numeric, or date.
predicted_answer: "Yes", "No", the top multiple-choice option, or the median numeric/date estimate.
confidence: model confidence as a number from 0 to 1 for binary and multiple-choice forecasts; null for numeric/date forecasts.
options: per-option probabilities for multiple-choice forecasts.
range_forecast: p10, p50, p90, and optional unit for numeric/date forecasts.
rationale: model-generated explanation.
model_rationale: alias for the model-generated explanation, intended for API clients.
evidence_sources: compact source list with article title, URL, publication date, and relevance score.
evidence_articles: full evidence records attached to the prompt.
evidence_error: retrieval error message, or null when evidence retrieval succeeds.
market_analysis: optional comparison against a supplied market price: market_probability, model_probability, edge, stance, and a short summary. edge is model_probability - market_probability.

Repository Contents

src/analyzing_llm_rationale/: packaged inference, provider, validation, and CLI logic.
configs/: model and rationale-variant definitions.
prompts/: system prompt plus the configured rationale, control, ablation, and no-evidence prompt variants.
scripts/: evaluation, recovery, SHAP, perturbation, plotting, market-data, and utility scripts.
slurm/: HPC launchers for the variant/temperature sweeps.
results/: model outputs and run metadata.
analysis/: aggregate metric tables and rationale-analysis outputs.
paper/: paper figures, Draw.io sources, PDFs, and qualitative case studies.
tests/: unit tests for the package and metric parsing.

See ARTIFACT_MANIFEST.md for the submission checklist and file-level notes.

Install

python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev,serve,pipeline]"

Use .[dev] for linting and unit tests. Add .[analysis] when regenerating plots, metrics tables, or SHAP analyses. Add .[trading] for local exchange order preview/execution development.

Prompt Variants

Configured variants live in configs/variants.yaml and map directly to prompt files under prompts/.

variant0 is the neutral baseline.
variant1 through variant8 cover the original rationale attribute prompts.
variant9 through variant14 add scratchpad, length-matched, structural, and combined temporal/credibility controls.
variant15_neutral_no_rationale and variant16_no_evidence_neutral support ablations for rationale and evidence effects.

When adding a variant, update configs/variants.yaml, add the prompt file, and run a bounded smoke test:

PYTHONPATH=src analyze-llm-rationale run-batch \
  --variant <variant_name> \
  --max-records 3

Quick Validation

PYTHONPATH=src python -m analyzing_llm_rationale validate-dataset
python -m unittest discover -s tests
ruff check src tests

PYTHONPATH=src is useful when the repository has not been installed yet or an older user-local install shadows the working tree.

Run the full suite with Python 3.10+ and the relevant extras installed. The server, RAG, tracking, and trading tests import optional dependencies from serve, pipeline, analysis, and trading.

Primary Entry Point

Run the variant 3 pipeline with the packaged CLI:

analyze-llm-rationale run-batch --variant variant3_reasoning_type

For a remote OpenAI-compatible provider:

export PROVIDER_API_KEY=your_token
analyze-llm-rationale run-batch --variant variant3_reasoning_type --model llama-3.3-70b-instruct

If you do not want to install the package into the environment, invoke it directly:

PYTHONPATH=src python -m analyzing_llm_rationale run-batch --variant variant3_reasoning_type

Useful options:

--variant variant6_step_by_step_reasoning: choose the prompt/output contract.
--model qwen2.5-7b-instruct: choose a configured model definition.
--temperature 0.7: control generation temperature and output directory.
--max-records 10: process only a bounded number of records.
--reprocess-nulls: rerun existing rows with predicted_answer = null.
--drop-article-text: remove raw article text from prompts before inference.
--device auto: select cuda when available, otherwise cpu.
verify-results --variant ...: verify completeness, duplicates, malformed rows, and missing IDs.
validate-dataset: validate the dataset schema before a run.

Foresea Autoresearch

Run one candidate experiment:

PYTHONPATH=src python -m analyzing_llm_rationale autoresearch \
  --model gpt-oss-120b \
  --candidate-prompt-path autoresearch/candidate_prompt.txt \
  --max-records 50 \
  --metric brier_score

Compare against a baseline and promote only if the candidate improves:

PYTHONPATH=src python -m analyzing_llm_rationale autoresearch \
  --model gpt-oss-120b \
  --candidate-prompt-path autoresearch/candidate_prompt.txt \
  --baseline-results-path results/GPT-OSS-120B/temperature_00/results_variant0_neutral_baseline.json \
  --promote-to prompts/variant0_neutral_baseline.txt \
  --max-records 50 \
  --metric brier_score \
  --min-delta 0.001

Each run writes analysis/autoresearch/runs/<run_id>/score.json and appends a machine-readable row to analysis/autoresearch/experiments.jsonl.

Reproducing Core Outputs

Validate an existing result file:

PYTHONPATH=src python -m analyzing_llm_rationale verify-results \
  --model qwen2.5-7b-instruct \
  --variant variant3_reasoning_type \
  --temperature 0.0 \
  --temperature-tag temperature_000

Regenerate aggregate metrics from results/:

python scripts/evaluate_metrics.py

Run the DuckDB SQL analytics suite over the real Metaculus-style dataset and saved model outputs:

python scripts/sql_analytics.py \
  --db analysis/forecasting_analytics.duckdb \
  --ingest --replace \
  --output-dir analysis/sql_analytics

Run the LangChain-powered news retrieval wrapper:

PYTHONPATH=src analyze-llm-rationale fetch-and-rank \
  --question "Will X happen by date Y?" \
  --source gdelt \
  --source google-news \
  --source stooq \
  --top-k 5

Run or schedule the Prefect DAG for RSS/news fetch, inference, and DuckDB logging:

# One question
python flows/forecasting_flow.py --question-id 124 --top-k 5

# Small batch from the dataset
python flows/forecasting_flow.py --limit 3 --top-k 5

# Daily scheduled deployment at 06:00 UTC
prefect server start
python flows/forecasting_flow.py --deploy --limit 3 --cron "0 6 * * *"

Regenerate paper figures after metrics are present:

python scripts/plot_model_variant_metric_heatmap.py
python scripts/plot_variant_delta_from_v0.py
python scripts/plot_temperature_frontier.py
python scripts/plot_frs_ablation_slopegraph.py
python scripts/plot_uncertainty_language_calibration_disconnect.py
python scripts/plot_shap_importance_attribute_gaps.py

Scripts

Common runner and verification commands:

python scripts/run_variant.py --variant variant5_key_conditions
python scripts/run_variant.py --variant variant3_reasoning_type --temperature 0.7 --temperature-tag temperature_07
python scripts/run_variant.py --variant variant4_credibility --model llama-3.3-70b-instruct
python scripts/verify_results.py --variant variant3_reasoning_type
python download_qwen_model.py
python test_local_inference.py

Repo layout:

scripts/: modular runner entrypoint
slurm/: batch launchers

Auditability:

Each run writes run_metadata_<variant>.json next to the results file.
Metadata includes provider, normalized provider endpoint, model key, resolved model identifier, temperature, output fields, and prompt SHA-256 hashes.
Existing malformed results JSON now fails fast instead of being silently ignored.

Quality checks

python -m unittest discover -s tests
ruff check src tests scripts/*.py

Data, Models, and Secrets

Never commit key files or tokens. Large local caches (.cache/, envs/, .venv/) are intentionally ignored and excluded from source archives.

Citation

If this repository supports a publication, cite the artifact with the metadata in CITATION.cff and cite the upstream datasets/models according to their licenses.

Foresea Forecasting

Analyzing LLM Rationale

Live API

5-minute crypto markets

Production Deployment Notes

Scaling and caching

Live track record

Artifact Registry retention

Using the API

Endpoints

Web app runtime state

Agent: automated intelligence layer

Edge scan — find mispriced markets

MCP server: let AI agents call Foresea as tools

PR agent — agent-to-agent distribution

Add Foresea to your agent (10 seconds)

Fetch live market prices

Trading execution: Polymarket and Kalshi

Request fields

Bring your own model

Self-hosted vLLM

Binary request

Multiple-choice request

Numeric request

Request with caller-provided evidence

Python client example

Response fields

Repository Contents

Install

Prompt Variants

Quick Validation

Primary Entry Point

Foresea Autoresearch

Reproducing Core Outputs

Scripts

Quality checks

Data, Models, and Secrets

Citation

Foresea Forecasting

Analyzing LLM Rationale

Live API

5-minute crypto markets

Production Deployment Notes

Scaling and caching

Live track record

Artifact Registry retention

Using the API

Endpoints

Web app runtime state

Agent: automated intelligence layer

Edge scan — find mispriced markets

MCP server: let AI agents call Foresea as tools

PR agent — agent-to-agent distribution

Add Foresea to your agent (10 seconds)

Fetch live market prices

Trading execution: Polymarket and Kalshi

Request fields

Bring your own model

Self-hosted vLLM

Binary request

Multiple-choice request

Numeric request

Request with caller-provided evidence

Python client example

Response fields

Repository Contents

Install

Prompt Variants

Quick Validation

Primary Entry Point

Foresea Autoresearch

Reproducing Core Outputs

Scripts

Quality checks

Data, Models, and Secrets

Citation

Related AI & LLM Tools MCP Servers

Related AI & LLM Tools MCP Servers