This skill gives you programmatic access to 61+ million standardized single cells from the CELLxGENE Census, the largest curated single-cell atlas available. You can query expression data by cell type, tissue, or disease across thousands of datasets without downloading everything. The API covers two scales of work: get_anndata() for small queries that fit in memory, and iterative batching for out-of-core processing of queries that do not. Results integrate cleanly with scanpy for standard analysis workflows, and PyTorch dataloaders are included for training models on cell data. Always filter on is_primary_data == True, because the same cell can appear in more than one dataset and would otherwise be counted twice. If you are analyzing your own data rather than querying reference atlases, use scanpy or scvi-tools instead.
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill cellxgene-census
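
The small in-memory query described above can be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: `build_obs_filter` and `fetch_cells` are hypothetical helper names, the example cell type and tissue are arbitrary, and the sketch assumes the `cellxgene-census` Python package is installed and that the `obs` columns `cell_type`, `tissue_general`, `disease`, and `is_primary_data` are the ones you want to filter on.

```python
def build_obs_filter(cell_type=None, tissue_general=None, disease=None):
    """Build an obs value filter that always keeps only primary data.

    Hypothetical helper: values are quoted naively with single quotes,
    so labels that themselves contain a single quote are not handled.
    """
    clauses = ["is_primary_data == True"]
    for column, value in (
        ("cell_type", cell_type),
        ("tissue_general", tissue_general),
        ("disease", disease),
    ):
        if value is not None:
            clauses.append(f"{column} == '{value}'")
    return " and ".join(clauses)


def fetch_cells(obs_filter, organism="Homo sapiens"):
    """Run a small query and return the matching cells as an AnnData object.

    The import is done lazily so that the filter helper above can be used
    without the package installed. Only suitable for results that fit in memory.
    """
    import cellxgene_census

    with cellxgene_census.open_soma(census_version="stable") as census:
        return cellxgene_census.get_anndata(
            census,
            organism=organism,
            obs_value_filter=obs_filter,
            obs_column_names=["cell_type", "tissue_general", "disease"],
        )


if __name__ == "__main__":
    # Example values are assumptions chosen for illustration only.
    obs_filter = build_obs_filter(cell_type="B cell", tissue_general="lung")
    print(obs_filter)
    # Uncomment to run the query (needs network access and the package):
    # adata = fetch_cells(obs_filter)
    # print(adata)
```

Keeping the is_primary_data clause inside the helper means it cannot be forgotten when filters are composed. The returned AnnData object can then be passed to scanpy in the usual way.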