LangSmith gives you proper observability for LLM applications, combining tracing, evaluation, and monitoring in one platform. Decorate your functions with @traceable (or wrap your OpenAI or Anthropic client with the SDK's wrappers) and you get full execution traces showing inputs, outputs, latency, and costs. The evaluation system lets you build test datasets from production traces and run systematic comparisons with built-in or custom evaluators, which is genuinely useful for catching regressions before deployment. It integrates with LangChain if you're using it, but the wrappers and client API work fine standalone. The tracing context and sampling controls let you decide which code paths get traced and how much production traffic is recorded. If you're debugging why your agent is failing or need to prove your prompt changes actually improved accuracy, this is the tool built for that job.
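The tracing and evaluation workflow described above can be sketched roughly as follows. This is a minimal illustration, not the skill's own code. It assumes the `langsmith` Python SDK's `traceable` decorator. Everything else is hypothetical: `call_model` is a stand-in for a real OpenAI or Anthropic call, `exact_match` is an example custom evaluator, and the three-item dataset is made up. The sketch also includes a no-op fallback decorator so it still runs where the SDK is not installed.

```python
# Sketch: trace a function with LangSmith's @traceable and score it with a
# custom evaluator. Assumes the `langsmith` Python SDK; traces are only sent
# to LangSmith when tracing is enabled (e.g. LANGSMITH_TRACING=true plus an API key).
try:
    from langsmith import traceable
except ImportError:
    # Assumption for offline runs: without the SDK, fall back to a no-op
    # decorator so the rest of the sketch still executes.
    def traceable(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


def call_model(question: str) -> str:
    """Hypothetical stand-in for an OpenAI or Anthropic call."""
    canned = {"What is 2 + 2?": "4", "What is the capital of France?": "Paris"}
    return canned.get(question, "I don't know")


@traceable(name="answer_question")
def answer_question(question: str) -> dict:
    # When tracing is enabled, this call's inputs, outputs and latency are recorded.
    return {"answer": call_model(question)}


def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Custom evaluator: 1.0 if the answer matches the reference, ignoring case and whitespace."""
    predicted = outputs["answer"].strip().lower()
    expected = reference_outputs["answer"].strip().lower()
    return {"key": "exact_match", "score": float(predicted == expected)}


# Hypothetical mini-dataset of (inputs, reference outputs) pairs.
dataset = [
    ({"question": "What is 2 + 2?"}, {"answer": "4"}),
    ({"question": "What is the capital of France?"}, {"answer": "paris"}),
    ({"question": "Who wrote Hamlet?"}, {"answer": "Shakespeare"}),
]

# Run the traced function over the dataset and score each output locally.
scores = [
    exact_match(answer_question(**inputs), reference)["score"]
    for inputs, reference in dataset
]
accuracy = sum(scores) / len(scores)
print(f"exact_match accuracy: {accuracy:.2f}")  # prints "exact_match accuracy: 0.67"
```

The local loop only stands in for LangSmith's own evaluation runs. In recent SDK versions, an evaluator with `outputs` and `reference_outputs` parameters can, as far as we know, be passed to something like `evaluate(target, data="my-dataset", evaluators=[exact_match])`, where `target` receives each example's inputs dict. Treat that call and the dataset name as assumptions and check the SDK documentation for your version.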
npx skills add https://github.com/orchestra-research/ai-research-skills --skill langsmith-observability