This is a solid reference for tuning Spark jobs that run slowly or use too much memory. You get concrete patterns for partitioning strategies, join optimization (broadcast, bucketed, skew handling), caching decisions, and memory configuration, with code examples in PySpark. It also covers the execution model, shows you how to calculate partition sizes, and includes adaptive query execution (AQE) settings. The implementation playbook reportedly has more detailed examples if you need them. It's community-sourced, so treat it as a starting point rather than gospel, but the patterns address the bottlenecks that commonly slow down production Spark pipelines: shuffle overhead, data skew, and GC pressure.
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill spark-optimization
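
To give a feel for the partition-sizing arithmetic the description mentions, here is a minimal sketch in plain Python. It is not taken from the skill itself: the helper name `estimate_partitions` and the 128 MiB target are assumptions for illustration (a commonly cited rule of thumb, not a value the skill is known to prescribe). The configuration keys in `AQE_CONF` are standard Spark SQL settings, but the values shown are only an example starting point.

```python
# Assumption: ~128 MiB per partition is a common rule of thumb,
# not a figure confirmed by the skill.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def estimate_partitions(total_bytes, target_bytes=TARGET_PARTITION_BYTES,
                        min_partitions=1):
    """Hypothetical helper: partition count so each holds ~target_bytes."""
    if total_bytes < 0 or target_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and target_bytes > 0")
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-total_bytes // target_bytes)
    return max(min_partitions, needed)


# Adaptive query execution settings as plain key/value pairs.
# The keys are real Spark SQL options; the values are illustrative only.
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}

if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    print(estimate_partitions(ten_gib))
```

In a PySpark job, the resulting count could be passed to `df.repartition(n)` or used to set `spark.sql.shuffle.partitions`, and each `AQE_CONF` pair applied with `spark.conf.set(key, value)`. Treat the numbers as a first guess to refine against the Spark UI rather than a fixed rule.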