If your Spark jobs are crawling or crashing, this skill gives you concrete patterns to fix them. It covers the critical pieces: calculating optimal partition sizes, choosing between broadcast joins and sort-merge joins, setting up caching with appropriate storage levels, and tuning executor memory configuration. The salting technique for handling data skew is especially useful when hot keys are killing performance. It includes actual configuration values and memory breakdowns, not just theory, and is most helpful for production workloads that need to scale beyond toy datasets.
npx skills add https://github.com/wshobson/agents --skill spark-optimization
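As a rough illustration of the partition-size arithmetic the skill covers: a commonly cited rule of thumb is to target around 128 MB per partition, so the partition count is just a ceiling division. This is a plain-Python sketch with hypothetical numbers, not the skill's actual implementation:

```python
# Rule-of-thumb partition sizing for Spark (illustrative only).
# Common guidance: aim for roughly 128 MB of data per partition.

TARGET_PARTITION_BYTES = 128 * 1024 * 1024  # 128 MB

def optimal_partitions(dataset_bytes: int,
                       target: int = TARGET_PARTITION_BYTES) -> int:
    """Return a partition count that keeps each partition near the target size."""
    return max(1, -(-dataset_bytes // target))  # ceiling division

# Hypothetical 50 GB dataset:
print(optimal_partitions(50 * 1024 ** 3))  # -> 400
```

The result would then feed into a `repartition(n)` call or the `spark.sql.shuffle.partitions` setting.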
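The salting idea itself is independent of Spark: append a random suffix to each hot key so its rows hash across multiple partitions instead of one, then include the salt in the join or group key (the broadcast/small side must be replicated once per salt value). The key names and bucket count below are hypothetical, and this is a plain-Python sketch of the concept rather than the skill's code:

```python
import random

SALT_BUCKETS = 8  # how many sub-keys to split each hot key into

def salt_key(key: str, hot_keys: set) -> str:
    """Append a random salt to hot keys so their rows spread across partitions."""
    if key in hot_keys:
        return f"{key}_{random.randrange(SALT_BUCKETS)}"
    return key

hot = {"user_123"}                             # a key that dominates the data
rows = ["user_123"] * 1000 + ["user_456"] * 10
salted = [salt_key(k, hot) for k in rows]

# The hot key is now spread over up to SALT_BUCKETS distinct keys,
# so no single partition receives all 1000 of its rows.
print(len({k for k in salted if k.startswith("user_123")}))
```

In Spark proper, the same move is typically done with a `rand()`-derived salt column before the skewed join or aggregation.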
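Executor memory and join settings of the kind the skill discusses are typically passed at submit time. The flags below use real Spark configuration keys, but the values and the `your_job.py` script name are placeholders, not recommendations:

```shell
# Illustrative spark-submit flags; tune values per cluster and workload.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.executor.cores=4 \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.sql.autoBroadcastJoinThreshold=64m \
  your_job.py
```

`spark.sql.autoBroadcastJoinThreshold` is what flips a join from sort-merge to broadcast: tables smaller than the threshold get shipped to every executor instead of shuffled.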