This is a PySpark migration and optimization toolkit built on SQLGlot. It converts SQL between dialects (PostgreSQL, Oracle, Redshift, MySQL, Snowflake) and generates PySpark DataFrame API code from SQL queries. The AWS Glue integration generates complete job templates, handles DynamicFrame conversions, and analyzes S3 partitioning strategies. It also includes code review tools that scan existing PySpark code for performance issues, suggest join strategies, and detect duplicated code across hundreds of files using concurrent batch processing. Reach for it when migrating legacy SQL workloads to Spark or when you need Glue jobs without writing the boilerplate by hand. It does not convert recursive CTEs natively, but it provides Spark SQL equivalents and guidance for edge cases.
claude mcp add --transport stdio annasmazhar-pyspark_mcp uvx pyspark-mcp
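
To give a feel for the duplication scan with concurrent batch processing mentioned above, here is a minimal sketch using only the Python standard library. It is not the toolkit's implementation: the function names `block_hashes` and `find_duplicates`, the 4-line window, and the choice of SHA-1 over whitespace-stripped lines are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def block_hashes(path, window=4):
    """Hash every run of `window` consecutive non-blank lines in one file.

    Hypothetical helper for illustration, not part of the toolkit.
    """
    # Keep the original line number next to each stripped, non-blank line.
    lines = [
        (number, text.strip())
        for number, text in enumerate(
            Path(path).read_text(encoding="utf-8").splitlines(), start=1
        )
        if text.strip()
    ]
    hashes = []
    for start in range(len(lines) - window + 1):
        chunk = "\n".join(text for _, text in lines[start:start + window])
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        hashes.append((digest, str(path), lines[start][0]))
    return hashes


def find_duplicates(paths, window=4, max_workers=8):
    """Return blocks that occur in more than one file, keyed by digest."""
    locations = defaultdict(list)
    # Threads are a reasonable fit here because the work is mostly file reads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for hashes in pool.map(lambda p: block_hashes(p, window), paths):
            for digest, name, line_number in hashes:
                locations[digest].append((name, line_number))
    # Keep only digests seen in at least two distinct files.
    return {
        digest: found
        for digest, found in locations.items()
        if len({name for name, _ in found}) > 1
    }
```

Calling `find_duplicates(["jobs/a.py", "jobs/b.py"])` (hypothetical paths) returns a dictionary mapping each shared block's digest to the `(file, starting line)` pairs where it appears. A real scanner would likely normalize identifiers or compare syntax trees rather than raw lines, so treat the line-window approach as the simplest possible stand-in.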