These are the libraries you reach for when you have tabular data and need accurate predictions. XGBoost and LightGBM dominate Kaggle competitions and production ML systems because they handle the messy realities of real data (missing values, imbalanced classes, categorical features) and give you interpretable feature importances out of the box. The main trade-off between them is training speed as the dataset grows. As a rule of thumb, XGBoost is slower but often slightly more accurate on smaller datasets (under roughly 100k rows), while LightGBM's histogram-based approach makes it significantly faster on millions of rows. Both include built-in regularization and early stopping, which you should always use. If you're doing serious work with CSVs, databases, or any other structured data, you need at least one of these in your toolkit.
npx skills add https://github.com/tondevrel/scientific-agent-skills --skill xgboost-lightgbm