This is your toolkit for building evaluation datasets in LangSmith without writing a bunch of SDK boilerplate. It wraps the langsmith CLI so you can export traces, shape them into datasets (final response, single step, trajectory, or RAG formats), and upload them for testing. The workflow is straightforward: pull traces from a project, transform the exported JSON into examples with inputs and outputs, then push those examples back up as a dataset. Handy if you're iterating on agent behavior and need regression tests or comparative evals. The CLI prompts for confirmation before destructive operations, which is nice when you're moving fast and don't want to accidentally nuke a dataset.
npx skills add https://github.com/langchain-ai/langsmith-skills --skill langsmith-dataset
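
To make the middle step of that workflow concrete, here is a minimal Python sketch of turning exported traces into final-response examples. It is only an illustration under stated assumptions: the trace shape (top-level `inputs` and `outputs` keys) and the helper names `trace_to_example` and `build_dataset` are hypothetical, not taken from the skill or the langsmith CLI, so check the real export format before relying on it.

```python
import json


def trace_to_example(trace: dict) -> dict:
    """Map one exported trace to a final-response example.

    Assumes (hypothetically) that the trace carries its root-run
    inputs and outputs under the "inputs" and "outputs" keys.
    """
    return {
        "inputs": trace.get("inputs") or {},
        "outputs": trace.get("outputs") or {},
    }


def build_dataset(traces: list) -> list:
    """Keep only traces that recorded outputs and convert each one."""
    examples = []
    for trace in traces:
        # A trace with no outputs (e.g. an errored run) has no
        # reference answer, so it is skipped here.
        if not trace.get("outputs"):
            continue
        examples.append(trace_to_example(trace))
    return examples


# Made-up traces standing in for an exported JSON file.
sample_traces = [
    {"inputs": {"question": "What is LangSmith?"},
     "outputs": {"answer": "A platform for tracing and evaluating LLM apps."}},
    {"inputs": {"question": "Unanswered"}, "outputs": None},
]

examples = build_dataset(sample_traces)
print(json.dumps(examples, indent=2))
```

The resulting list of `inputs`/`outputs` pairs is the shape the description calls "examples". Single-step, trajectory, or RAG formats would need different fields pulled from each trace.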