Pandas Best Practices

503 installs · 128 stars

Summary

A solid reference for writing cleaner pandas code. It covers the practical stuff you actually need: proper indexing with loc and iloc instead of chained operations, when to use vectorization over apply, memory optimization with categorical types, and method chaining patterns that stay readable. The groupby and aggregation sections are especially useful if you're tired of googling named aggregation syntax. It won't teach you pandas from scratch, but if you already know the basics and want to stop writing slow, fragile DataFrame code, this gives you specific patterns to follow. Think of it as the style guide your data team should have written six months ago.

Install to Claude Code

npx -y skills add mindrally/skills --skill pandas-best-practices --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md

Pandas Best Practices

Expert guidelines for Pandas development, focusing on data manipulation, analysis, and efficient DataFrame operations.

Code Style and Structure

Write concise, technical responses with accurate Python examples
Prioritize reproducibility in data analysis workflows
Use functional programming; avoid unnecessary classes
Prefer vectorized operations over explicit loops
Use descriptive variable names reflecting data content
Follow PEP 8 style guidelines

DataFrame Creation and I/O

Use pd.read_csv(), pd.read_excel(), pd.read_json() with appropriate parameters
Specify the dtype parameter to ensure correct data types on load
Use parse_dates for automatic datetime parsing
Set index_col when the data has a natural index column
Use chunksize for reading large files incrementally
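
The loading options above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the CSV text, the column names (order_id, order_date, region, amount), and the chunk size are all hypothetical, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """order_id,order_date,region,amount
1001,2024-01-05,north,19.99
1002,2024-01-06,south,5.00
1003,2024-01-06,north,42.50
"""

# Declare types, parse dates, and set the natural index in one call.
orders = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"region": "category", "amount": "float64"},
    parse_dates=["order_date"],
    index_col="order_id",
)

# With chunksize, read_csv yields DataFrames of at most that many rows,
# so an aggregate can be accumulated without loading the whole file.
total_amount = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_amount += chunk["amount"].sum()
```

In real use the chunk size would be far larger than 2; the small value here only makes the chunking visible on three rows.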

Data Selection

Use .loc[] for label-based indexing
Use .iloc[] for integer position-based indexing
Avoid chained indexing (e.g., df['col'][0]); use a single .loc or .iloc call instead (e.g., df.loc[0, 'col'])
Use boolean indexing for conditional selection: df[df['col'] > value]
Use .query() method for complex filtering conditions
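
A short sketch of these selection patterns, assuming a small hypothetical scores table (the column names, index labels, and threshold are illustrative only):

```python
import pandas as pd

# Hypothetical data with string index labels.
scores = pd.DataFrame(
    {
        "name": ["ana", "ben", "cy"],
        "score": [82, 67, 91],
        "passed": [True, True, True],
    },
    index=["a", "b", "c"],
)

ben_score = scores.loc["b", "score"]  # label-based: row "b", column "score"
first_name = scores.iloc[0, 0]        # position-based: first row, first column

# One .loc call selects rows and column together, so the assignment
# is applied to the original frame rather than to a temporary object.
scores.loc[scores["score"] < 70, "passed"] = False

# Boolean indexing and the equivalent .query() expression.
high = scores[scores["score"] > 80]
threshold = 80
high_q = scores.query("score > @threshold and name != 'ben'")
```

The @ prefix inside .query() refers to a local Python variable, which keeps longer filter expressions readable.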

Method Chaining

Prefer method chaining for data transformations when possible
Use .pipe() for applying custom functions in a chain
Chain operations like .assign(), .query(), .groupby(), .agg()
Keep chains readable by breaking across multiple lines
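
As a rough illustration of a readable chain, the sketch below filters, derives a column through .pipe(), and aggregates. The helper add_revenue, the tax_rate parameter, and the sales data are hypothetical names chosen for this example.

```python
import pandas as pd


def add_revenue(df: pd.DataFrame, tax_rate: float) -> pd.DataFrame:
    """Hypothetical helper: return a copy with a tax-inclusive revenue column."""
    return df.assign(revenue=df["price"] * df["quantity"] * (1 + tax_rate))


sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "price": [10.0, 20.0, 30.0, 40.0],
        "quantity": [1, 2, 3, 0],
    }
)

# One operation per line; the parentheses allow the chain to span lines.
summary = (
    sales
    .query("quantity > 0")
    .pipe(add_revenue, tax_rate=0.25)
    .groupby("region", as_index=False)
    .agg(total_revenue=("revenue", "sum"))
)
```

Each step returns a new DataFrame, so the original sales frame is left unchanged and any step can be commented out while debugging.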

Data Cleaning and Validation

Missing Data

Check for missing data with .isna() and .info()
Handle missing data appropriately: .fillna(), .dropna(), or imputation
Use nullable dtypes such as Int64 and boolean, which represent missing values as pd.NA
Document decisions about missing data handling
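
One possible way to apply these points is sketched below. The readings table and the specific choices (median fill, dropping rows without an event count) are assumptions made for illustration, not a recommendation for any particular dataset.

```python
import pandas as pd

# Hypothetical sensor data; n_events uses the nullable Int64 dtype,
# so its missing entry is stored as pd.NA instead of forcing floats.
readings = pd.DataFrame(
    {
        "sensor": ["a", "b", "c", "d"],
        "temp": [20.5, None, 22.0, None],
        "n_events": pd.array([1, None, 3, 4], dtype="Int64"),
    }
)

# Count missing values per column before deciding how to handle them.
missing_per_column = readings.isna().sum()

# Documented decision: fill temp with the column median, and drop rows
# that have no event count because they cannot be analysed further.
cleaned = (
    readings
    .assign(temp=lambda d: d["temp"].fillna(d["temp"].median()))
    .dropna(subset=["n_events"])
)
```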

Data Quality Checks

Implement data quality checks at the beginning of analysis
Validate data types with .dtypes and convert as needed
Check for duplicates with .duplicated() and handle appropriately
Use .describe() for quick statistical overview
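
These checks could be bundled into a small report run at the start of an analysis. The function check_quality and the customers data below are hypothetical; treat this as a sketch of the idea rather than a complete validation layer.

```python
import pandas as pd


def check_quality(df: pd.DataFrame, key: str) -> dict:
    """Hypothetical helper: summarise basic quality indicators for df."""
    return {
        "n_rows": len(df),
        "n_duplicate_keys": int(df.duplicated(subset=[key]).sum()),
        "n_missing": int(df.isna().sum().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


customers = pd.DataFrame(
    {"customer_id": [1, 2, 2, 3], "age": [34, 51, 51, None]}
)

report = check_quality(customers, key="customer_id")
overview = customers.describe()  # count, mean, std, min, quartiles, max

# One way to handle duplicates: keep the first row seen for each key.
deduplicated = customers.drop_duplicates(subset=["customer_id"], keep="first")
```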

Type Conversion

Use .astype() for explicit type conversion
Use pd.to_datetime() for date parsing
Use pd.to_numeric() with errors='coerce' for safe numeric conversion
Utilize categorical data types for low-cardinality string columns
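
The conversions listed above might look like this on a hypothetical raw table in which every value arrived as text or a default integer (all column names are illustrative):

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "price": ["1.50", "2.25", "n/a"],
        "signup": ["2024-01-01", "2024-02-15", "2024-03-31"],
        "plan": ["free", "pro", "free"],
        "seats": [1, 2, 3],
    }
)

typed = raw.assign(
    # Unparseable strings such as "n/a" become NaN instead of raising.
    price=pd.to_numeric(raw["price"], errors="coerce"),
    # An explicit format avoids ambiguous day/month guessing.
    signup=pd.to_datetime(raw["signup"], format="%Y-%m-%d"),
    # Few distinct values, so a categorical dtype fits.
    plan=raw["plan"].astype("category"),
    # Explicit conversion to a smaller integer type.
    seats=raw["seats"].astype("int32"),
)
```

Note that errors='coerce' hides bad input, so it is worth counting the resulting NaN values afterwards.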

Grouping and Aggregation

GroupBy Operations

Use .groupby() for efficient aggregation operations
Specify aggregation functions with .agg() for multiple operations
Use named aggregation for clearer output column names
Consider .transform() for broadcasting results back to original shape
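
A minimal sketch of named aggregation and .transform(), using a hypothetical orders table. The output column names (n_orders, total, largest, share) are choices made for this example.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "customer": ["a", "a", "b", "b", "b"],
        "amount": [10.0, 30.0, 5.0, 15.0, 40.0],
    }
)

# Named aggregation: output_name=(input_column, aggregation function).
per_customer = orders.groupby("customer").agg(
    n_orders=("amount", "count"),
    total=("amount", "sum"),
    largest=("amount", "max"),
)

# transform returns one value per original row, aligned to the input,
# so the group total can be used directly in a row-level calculation.
orders_with_share = orders.assign(
    share=orders["amount"]
    / orders.groupby("customer")["amount"].transform("sum")
)
```

Here .agg() reduces each group to one row, while .transform() keeps the original five rows.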

Pivot Tables and Reshaping

Use .pivot_table() for multi-dimensional aggregation
Use .melt() to convert wide to long format
Use .pivot() to convert long to wide format
Use .stack() and .unstack() for hierarchical index manipulation
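
The reshaping methods can be seen as round trips between wide and long layouts. The sketch below assumes a hypothetical table of monthly temperatures per city.

```python
import pandas as pd

wide = pd.DataFrame(
    {"city": ["oslo", "rome"], "jan": [-3.0, 8.0], "feb": [-2.0, 9.0]}
)

# Wide to long: one row per (city, month) pair.
long_df = wide.melt(id_vars="city", var_name="month", value_name="temp")

# Long to wide: requires each (city, month) pair to be unique.
back_to_wide = long_df.pivot(index="city", columns="month", values="temp")

# pivot_table aggregates, so it also tolerates repeated pairs.
mean_by_city = long_df.pivot_table(index="city", values="temp", aggfunc="mean")

# stack moves the column level into the row index (a MultiIndex);
# unstack moves it back out to columns.
stacked = back_to_wide.stack()
unstacked = stacked.unstack()
```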

Performance Optimization

Memory Efficiency

Use categorical data types for low-cardinality strings
Downcast numeric types when appropriate
Use pd.eval() and .eval() for large expression evaluation
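
To make the memory effect concrete, the sketch below measures a hypothetical events table before and after conversion. The data, column names, and row count are invented, and actual savings depend on the pandas version and the string storage in use.

```python
import numpy as np
import pandas as pd

n = 10_000
events = pd.DataFrame(
    {
        "country": np.tile(["no", "it", "de", "fr"], n // 4),
        "clicks": np.arange(n, dtype="int64") % 100,
        "a": np.arange(n, dtype="float64"),
        "b": np.ones(n),
    }
)

# deep=True includes the memory held by the string values themselves.
before = events.memory_usage(deep=True).sum()

compact = events.assign(
    # Four distinct strings are stored once and referenced by small codes.
    country=events["country"].astype("category"),
    # Values 0..99 fit in int8, so int64 can be downcast.
    clicks=pd.to_numeric(events["clicks"], downcast="integer"),
)
after = compact.memory_usage(deep=True).sum()

# DataFrame.eval evaluates the expression as a string and returns a new
# frame with column c added.
compact = compact.eval("c = a + 2 * b")
```

The benefit of .eval() tends to show only on large frames; on small ones plain arithmetic is usually just as fast.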

Computation Speed

Use vectorized operations instead of .apply() with row-wise functions
Prefer built-in aggregation functions over custom ones
Use .to_numpy() (preferred over .values) to hand data to NumPy when that is faster
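
The difference between row-wise .apply() and a vectorized expression is easiest to see side by side. The trips data and the 25 km/h threshold below are hypothetical.

```python
import numpy as np
import pandas as pd

trips = pd.DataFrame(
    {"distance_km": [5.0, 12.0, 30.0], "minutes": [15.0, 24.0, 30.0]}
)

# Row-wise apply: calls a Python function once per row (slow at scale).
speed_apply = trips.apply(
    lambda row: row["distance_km"] / (row["minutes"] / 60), axis=1
)

# Vectorized: the same arithmetic expressed on whole columns.
speed_vectorized = trips["distance_km"] / (trips["minutes"] / 60)

# Conditional logic without apply, using NumPy's vectorized where.
label = np.where(speed_vectorized > 25, "fast", "slow")

# Plain NumPy array for code that expects ndarray input.
distance_array = trips["distance_km"].to_numpy()
```

Both approaches give the same numbers here; the vectorized form avoids the per-row Python overhead.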

Avoiding Common Pitfalls

Avoid iterating with .iterrows() - use vectorized operations
Don't modify DataFrames while iterating
Be aware of SettingWithCopyWarning - take an explicit .copy() of a selection before modifying it
Avoid growing DataFrames row by row - collect in list and create once
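
A sketch of the alternatives to these pitfalls, with hypothetical step/value records standing in for results produced inside a loop:

```python
import pandas as pd

# Collect plain records first, then build the DataFrame once,
# instead of appending to a DataFrame on every iteration.
records = []
for i in range(5):
    records.append({"step": i, "value": i * i})
results = pd.DataFrame.from_records(records)

# Take an explicit copy of a filtered selection before adding columns,
# so it is unambiguous that the original frame is not modified.
subset = results[results["value"] > 3].copy()
subset["value_doubled"] = subset["value"] * 2

# Column-wise expression in place of an .iterrows() loop.
results["is_even"] = results["step"] % 2 == 0
```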

Time Series Operations

Use DatetimeIndex for time series data
Leverage .resample() for time-based aggregation
Use .shift() and .diff() for lag operations
Use .rolling() and .expanding() for window calculations
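
The time series methods above, applied to a hypothetical six-day price series (the dates, values, and window sizes are illustrative):

```python
import pandas as pd

# date_range produces a DatetimeIndex with daily frequency.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.Series([10.0, 11.0, 13.0, 12.0, 15.0, 18.0], index=idx, name="price")

two_day_mean = prices.resample("2D").mean()     # one value per 2-day bin
previous = prices.shift(1)                      # value from the prior day
daily_change = prices.diff()                    # same as prices - previous
rolling_mean = prices.rolling(window=3).mean()  # NaN until 3 values exist
running_max = prices.expanding().max()          # max over all data so far
```

The first entries of previous, daily_change, and rolling_mean are NaN because there is no earlier data to compare with.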

Merging and Joining

Use .merge() for SQL-style joins
Specify the how parameter explicitly: 'inner', 'outer', 'left', or 'right'
Use the validate parameter (e.g., 'one_to_one', 'many_to_one') to check join cardinality
Use pd.concat() for stacking DataFrames
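
A small sketch of a validated join, assuming hypothetical customers and orders tables. It also shows what happens when the expected cardinality is violated.

```python
import pandas as pd

customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["ana", "ben", "cy"]}
)
orders = pd.DataFrame(
    {"order_id": [10, 11, 12], "customer_id": [1, 1, 3], "amount": [5.0, 7.0, 9.0]}
)

# Left join: keep every order; validate asserts that each customer_id
# appears at most once on the right-hand side.
enriched = orders.merge(
    customers, on="customer_id", how="left", validate="many_to_one"
)

# Stack frames vertically with pd.concat; this one duplicates customer 1.
duplicated_customers = pd.concat(
    [customers, customers.iloc[[0]]], ignore_index=True
)

# The same merge now raises MergeError instead of silently adding rows.
try:
    orders.merge(
        duplicated_customers, on="customer_id", how="left", validate="many_to_one"
    )
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True
```

Without validate, the second merge would quietly return extra rows for customer 1, which is the kind of error that is hard to spot later.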

Key Conventions

Import pandas under the standard alias: import pandas as pd
Use snake_case for column names when possible
Document data sources and transformations
Keep notebooks reproducible with clear cell execution order
