A solid reference for writing cleaner pandas code. It covers the practical stuff you actually need: proper indexing with loc and iloc instead of chained indexing, when to use vectorization over apply, memory optimization with categorical types, and method chaining patterns that stay readable. The groupby and aggregation sections are especially useful if you're tired of googling named aggregation syntax. It won't teach you pandas from scratch, but if you already know the basics and want to stop writing slow, fragile DataFrame code, this gives you specific patterns to follow. Think of it as the style guide your data team should have written six months ago.
npx -y skills add mindrally/skills --skill pandas-best-practices --agent claude-code
Installs into .claude/skills of the current project.
Expert guidelines for Pandas development, focusing on data manipulation, analysis, and efficient DataFrame operations.
- Use pd.read_csv(), pd.read_excel(), pd.read_json() with appropriate parameters
- Specify the dtype parameter to ensure correct data types on load
- Use parse_dates for automatic datetime parsing
- Set index_col when the data has a natural index column
- Use chunksize for reading large files incrementally

- Use .loc[] for label-based indexing
- Use .iloc[] for integer position-based indexing
- Avoid chained indexing (df['col'][0]) - use .loc or .iloc instead
- Use boolean indexing: df[df['col'] > value]
- Use the .query() method for complex filtering conditions

- Use .pipe() for applying custom functions in a chain
- Chain methods such as .assign(), .query(), .groupby(), .agg()

- Check for missing data with .isna() and .info()
- Handle missing values with .fillna(), .dropna(), or imputation
- Use pd.NA for nullable integer and boolean types

- Check .dtypes and convert as needed
- Check for duplicates with .duplicated() and handle appropriately
- Use .describe() for quick statistical overview

- Use .astype() for explicit type conversion
- Use pd.to_datetime() for date parsing
- Use pd.to_numeric() with errors='coerce' for safe numeric conversion

- Use .groupby() for efficient aggregation operations
- Use .agg() for multiple operations
- Use .transform() for broadcasting results back to original shape

- Use .pivot_table() for multi-dimensional aggregation
- Use .melt() to convert wide to long format
- Use .pivot() to convert long to wide format
- Use .stack() and .unstack() for hierarchical index manipulation

- Use pd.eval() and .eval() for large expression evaluation
- Avoid .apply() with row-wise functions
- Use .to_numpy() (or the older .values) for NumPy operations when faster
- Avoid .iterrows() - use vectorized operations
- Use .copy() when you need an independent copy rather than a possible view of the original data

- Use DatetimeIndex for time series data
- Use .resample() for time-based aggregation
- Use .shift() and .diff() for lag operations
- Use .rolling() and .expanding() for window calculations

- Use .merge() for SQL-style joins
- Specify the how parameter: 'inner', 'outer', 'left', 'right'
- Use the validate parameter to check join cardinality
- Use pd.concat() for stacking DataFrames

- Import pandas as: import pandas as pd
- Use snake_case for column names when possible
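The loading and conversion guidelines can be sketched as follows. This is a minimal illustration, not a prescribed recipe: the CSV text, the column names (order_id, order_date, region, amount) and the chunk size are all made up for the example, and an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """order_id,order_date,region,amount
1,2024-01-05,north,10.5
2,2024-01-06,south,not_available
3,2024-01-07,north,7.25
"""

# Declare types up front instead of fixing them after the load.
orders = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"region": "category"},   # low-cardinality text as categorical
    parse_dates=["order_date"],     # parse dates while reading
    index_col="order_id",           # natural key becomes the index
)

# The amount column contains a non-numeric entry; coerce it to NaN
# rather than letting the conversion raise.
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")

# For large files, process the data in chunks instead of all at once.
total = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += pd.to_numeric(chunk["amount"], errors="coerce").sum()
```

With errors='coerce', bad values become missing values, so it is usually worth checking .isna() on the converted column afterwards.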
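For the indexing and filtering guidelines, a small sketch may help. The DataFrame and its columns (city, temp_c, humid) are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp_c": [4, 22, 31], "humid": [80, 70, 40]},
    index=["a", "b", "c"],
)

# Label-based access: row label "b", column label "temp_c".
by_label = df.loc["b", "temp_c"]

# Position-based access: first row, second column.
by_position = df.iloc[0, 1]

# Assign through a single .loc call rather than chained indexing
# such as df["temp_c"]["a"] = 5, which may write to a temporary copy.
df.loc["a", "temp_c"] = 5

# Boolean indexing for a simple condition.
warm = df[df["temp_c"] > 20]

# .query() keeps compound conditions readable.
warm_dry = df.query("temp_c > 20 and humid < 50")
```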
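Method chaining with .pipe() and .assign() might look like the sketch below. The helper add_revenue and the sales data are hypothetical; the point is only that each step returns a new DataFrame, so the input is left unchanged.

```python
import pandas as pd


def add_revenue(frame: pd.DataFrame, price_col: str, qty_col: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with a revenue column."""
    return frame.assign(revenue=frame[price_col] * frame[qty_col])


sales = pd.DataFrame(
    {
        "store": ["x", "x", "y", "y"],
        "price": [2.0, 3.0, 4.0, 5.0],
        "qty": [10, 0, 5, 2],
    }
)

summary = (
    sales
    .query("qty > 0")                                      # drop empty orders
    .pipe(add_revenue, price_col="price", qty_col="qty")   # custom step in the chain
    .groupby("store", as_index=False)
    .agg(total_revenue=("revenue", "sum"), orders=("qty", "size"))
)
```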
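The missing-data and data-quality checks could be exercised like this. The readings table is made up, and filling missing counts with 0 is an assumption chosen for the example, not a general recommendation.

```python
import pandas as pd

readings = pd.DataFrame(
    {
        "sensor": ["a", "a", "b", "b"],
        # Nullable extension dtypes keep integers and booleans intact
        # and represent gaps as pd.NA instead of forcing floats.
        "count": pd.array([3, None, 5, 5], dtype="Int64"),
        "ok": pd.array([True, None, False, False], dtype="boolean"),
    }
)

# How many values are missing in each column?
missing_per_column = readings.isna().sum()

# Fill one column with an explicit default, or drop incomplete rows.
filled = readings.fillna({"count": 0})
complete_rows = readings.dropna()

# Flag exact duplicate rows, then remove them.
duplicate_mask = readings.duplicated()
deduplicated = readings.drop_duplicates()
```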
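For grouping, named aggregation with .agg() and broadcasting with .transform() can be sketched as follows; the scores data and the output column names are illustrative.

```python
import pandas as pd

scores = pd.DataFrame(
    {"team": ["red", "red", "blue", "blue"], "points": [10, 20, 30, 50]}
)

# Named aggregation: output_column=(input_column, function).
per_team = scores.groupby("team").agg(
    mean_points=("points", "mean"),
    max_points=("points", "max"),
)

# .transform() returns one value per original row, so the result
# can be assigned straight back onto the source DataFrame.
scores["team_mean"] = scores.groupby("team")["points"].transform("mean")
scores["diff_from_team_mean"] = scores["points"] - scores["team_mean"]
```

.agg() collapses each group to one row, while .transform() keeps the original shape, which is what makes per-row comparisons against a group statistic possible.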
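The reshaping methods are easiest to see on a tiny table. In this sketch the wide frame and its column names (id, q1, q2) are assumptions for illustration.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "q1": [5, 7], "q2": [6, 8]})

# Wide to long: one row per (id, quarter) pair.
long = wide.melt(id_vars="id", var_name="quarter", value_name="value")

# Long back to wide: quarters become columns again.
back = long.pivot(index="id", columns="quarter", values="value")

# pivot_table aggregates, so it also works when keys repeat.
totals = long.pivot_table(index="id", values="value", aggfunc="sum")

# stack moves the column level into the row index; unstack reverses it.
stacked = back.stack()
unstacked = stacked.unstack()
```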
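To make the performance advice concrete, the sketch below contrasts a row-wise .apply() with the equivalent vectorized expression and measures the memory effect of a categorical column. The data is synthetic, and actual savings depend on the pandas version and the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [1.0, 2.0, 3.0], "qty": [2, 3, 4]})

# Row-wise apply calls a Python function once per row.
slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# The vectorized form computes the same result on whole columns.
fast = df["price"] * df["qty"]

# Hand the underlying array to NumPy when a NumPy function is needed.
log_price = np.log(df["price"].to_numpy())

# A repetitive text column usually takes less memory as a categorical.
labels = pd.Series(["low", "high", "low", "high"] * 1000)
as_category = labels.astype("category")
text_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
```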
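The time series guidelines can be illustrated on a short daily series. The dates and values are invented, and the window sizes are arbitrary choices for the example.

```python
import pandas as pd

# A DatetimeIndex enables resampling and time-aware operations.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Aggregate into two-day bins.
two_day_sum = ts.resample("2D").sum()

# Lag operations: previous value and change since the previous value.
previous_day = ts.shift(1)
daily_change = ts.diff()

# Window calculations: fixed three-day window and a growing window.
rolling_mean = ts.rolling(window=3).mean()
running_max = ts.expanding().max()
```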
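Finally, a sketch of merging with an explicit how and a validate check, plus pd.concat() for stacking. The customers and orders tables are hypothetical; the duplicated customer row is added on purpose to show the check failing.

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "cust_id": [1, 1, 3]})

# Many orders per customer, one row per customer: many_to_one.
joined = orders.merge(customers, on="cust_id", how="left", validate="many_to_one")

# Stack DataFrames row-wise; here this deliberately duplicates customer 1.
dup_customers = pd.concat([customers, customers.iloc[[0]]], ignore_index=True)

# With a duplicated key on the right side, validate raises MergeError
# instead of silently multiplying rows.
try:
    orders.merge(dup_customers, on="cust_id", how="left", validate="many_to_one")
    cardinality_ok = True
except pd.errors.MergeError:
    cardinality_ok = False
```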