Weights & Biases (wandb) is a widely used toolkit for tracking machine learning experiments and keeping your sanity when you're running dozens of training runs. It logs the metrics and hyperparameters you pass it, records system stats automatically, and sends everything to a web dashboard where you can compare runs side by side. The sweep functionality is useful for hyperparameter optimization, supporting grid, random, and Bayesian search strategies. You can also version datasets and models as artifacts with full lineage tracking. The integration code is minimal: usually a wandb.init() call plus wandb.log() calls in your training loop. It integrates with PyTorch, TensorFlow, and HuggingFace. If you've ever lost track of which learning rate produced your best model, this solves that problem.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill weights-and-biases
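
To give a rough idea of what the "wandb.init() plus wandb.log()" integration described above can look like, here is a minimal sketch. The toy training loop, the CONFIG and SWEEP_CONFIG values, and the project name "demo-project" are all placeholders invented for illustration; they do not come from the skill itself. The wandb import is kept inside the functions that need it so the toy loop can be tried without wandb installed.

```python
from typing import Callable, Dict

# Hypothetical hyperparameters, for illustration only.
CONFIG = {"learning_rate": 0.1, "epochs": 5}

# Hypothetical sweep definition: Bayesian search over the learning rate.
# "method" may also be "grid" or "random".
SWEEP_CONFIG = {
    "method": "bayes",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {"learning_rate": {"min": 0.01, "max": 0.3}},
}


def train(config: Dict[str, float], log: Callable[[Dict[str, float]], None]) -> float:
    """Toy loop: minimise (w - 3)^2 by gradient descent, logging once per epoch."""
    w = 0.0
    loss = (w - 3.0) ** 2
    for epoch in range(int(config["epochs"])):
        grad = 2.0 * (w - 3.0)
        w -= config["learning_rate"] * grad
        loss = (w - 3.0) ** 2
        log({"epoch": epoch, "loss": loss})
    return loss


def run_with_wandb() -> None:
    """Single tracked run. Offline mode stores the run locally, no login needed."""
    import wandb  # imported lazily; requires `pip install wandb`

    run = wandb.init(project="demo-project", config=CONFIG, mode="offline")
    train(CONFIG, wandb.log)
    run.finish()


def run_sweep(count: int = 5) -> None:
    """Sweep sketch. Needs a logged-in wandb account, since sweeps are server-driven."""
    import wandb

    def trial() -> None:
        run = wandb.init()  # the sweep agent supplies run.config
        config = {**CONFIG, "learning_rate": run.config["learning_rate"]}
        train(config, wandb.log)
        run.finish()

    sweep_id = wandb.sweep(sweep=SWEEP_CONFIG, project="demo-project")
    wandb.agent(sweep_id, function=trial, count=count)
```

Calling run_with_wandb() records one run, and run_sweep() asks the sweep controller for a handful of learning rates to try. Passing the logger into train() as a plain callable is just a design choice for this sketch: it keeps the loop testable with something like a list's append method in place of wandb.log.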