Pulls aggregated metrics from MLflow tracking servers so you can analyze token usage, latency, and quality scores without writing custom queries. You can bucket results by time or by dimensions such as trace name and status, and request percentiles to see how values are distributed. The examples cover real use cases such as hourly token trends over 24 hours and P95 latency grouped by trace. It's a thin wrapper around MLflow's metrics API, so you don't have to work with the raw endpoints yourself. Most useful when you're running LLM experiments in MLflow and need quick cost or performance insights without building dashboards.
npx skills add https://github.com/mlflow/skills --skill querying-mlflow-metrics
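
To make the two aggregations mentioned above concrete, here is a minimal local sketch of hourly token bucketing and P95 latency grouped by trace name. It does not call MLflow or the skill itself: the record fields (`name`, `ts`, `tokens`, `latency_ms`), the sample data, and the helper names are all hypothetical, and the nearest-rank percentile is one common definition that may differ from what the server computes.

```python
import math
from collections import defaultdict
from datetime import datetime

# Hypothetical trace records; field names are illustrative, not MLflow's schema.
TRACES = [
    {"name": "chat", "ts": "2024-05-01T10:05:00+00:00", "tokens": 120, "latency_ms": 300},
    {"name": "chat", "ts": "2024-05-01T10:40:00+00:00", "tokens": 80, "latency_ms": 900},
    {"name": "search", "ts": "2024-05-01T11:15:00+00:00", "tokens": 50, "latency_ms": 150},
    {"name": "chat", "ts": "2024-05-01T11:30:00+00:00", "tokens": 200, "latency_ms": 450},
]


def hourly_token_totals(traces):
    """Sum token counts per hour bucket (time-based bucketing)."""
    totals = defaultdict(int)
    for trace in traces:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(trace["ts"]).replace(
            minute=0, second=0, microsecond=0
        )
        totals[hour.isoformat()] += trace["tokens"]
    return dict(totals)


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def p95_latency_by_name(traces):
    """Group latencies by trace name, then take P95 of each group (dimension bucketing)."""
    groups = defaultdict(list)
    for trace in traces:
        groups[trace["name"]].append(trace["latency_ms"])
    return {name: percentile(vals, 95) for name, vals in groups.items()}


if __name__ == "__main__":
    print(hourly_token_totals(TRACES))
    print(p95_latency_by_name(TRACES))
```

With only a handful of samples per group, P95 collapses to the maximum value, which is why percentile views are more informative over larger windows.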