If you're working with genomic intervals and machine learning, geniml handles the messy middle layer between BED files and actual models. It gives you Region2Vec for unsupervised region embeddings, BEDspace for joint region and metadata embeddings, and scEmbed for single-cell ATAC-seq data. The universe-building tools are arguably the most useful part, since every region set is tokenized against the universe and tokenization quality makes or breaks everything downstream. The approach applies word2vec concepts to genomic regions, treating regions like words and region sets like documents, which sounds odd but works well in practice. It also includes CLI tools and evaluation metrics. For universe building, it ships with multiple consensus peak methods (coverage cutoff, HMM, maximum likelihood) that give you statistical rigor when combining datasets.
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill geniml
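
To make the universe and tokenization idea concrete, here is a minimal pure-Python sketch. It is not geniml's API: the function names `coverage_cutoff_universe` and `tokenize`, the toy intervals, and the cutoff value are all made up for illustration, and a real workflow would go through the package's own tools. It builds a universe by keeping positions covered by at least `cutoff` input sets (the coverage cutoff idea), then maps a new region set onto universe tokens by overlap.

```python
from collections import defaultdict

# Hypothetical helper, not part of geniml: coverage-cutoff consensus.
# Assumes half-open (chrom, start, end) intervals and that regions
# within a single set do not overlap each other.
def coverage_cutoff_universe(region_sets, cutoff):
    """Return intervals covered by at least `cutoff` of the region sets."""
    events = defaultdict(list)
    for regions in region_sets:
        for chrom, start, end in regions:
            events[chrom].append((start, 1))   # coverage goes up
            events[chrom].append((end, -1))    # coverage goes down

    universe = []
    for chrom in sorted(events):
        depth = 0
        open_start = None
        # At equal positions, process starts before ends so that
        # touching intervals are treated as continuous coverage.
        for pos, delta in sorted(events[chrom], key=lambda e: (e[0], -e[1])):
            previous = depth
            depth += delta
            if previous < cutoff <= depth:
                open_start = pos
            elif depth < cutoff <= previous:
                if pos > open_start:           # skip zero-length intervals
                    universe.append((chrom, open_start, pos))
                open_start = None
    return universe


# Hypothetical helper, not part of geniml: overlap-based tokenization.
def tokenize(regions, universe):
    """Map each region to the universe intervals it overlaps."""
    tokens = []
    for chrom, start, end in regions:
        for u_chrom, u_start, u_end in universe:
            if chrom == u_chrom and start < u_end and u_start < end:
                tokens.append(f"{u_chrom}:{u_start}-{u_end}")
    return tokens


# Toy data standing in for three BED files.
set_a = [("chr1", 100, 200), ("chr1", 500, 600)]
set_b = [("chr1", 150, 250), ("chr1", 520, 580)]
set_c = [("chr1", 180, 300), ("chr2", 10, 50)]

universe = coverage_cutoff_universe([set_a, set_b, set_c], cutoff=2)

# A new sample: the middle region overlaps nothing and yields no token.
sample = [("chr1", 240, 260), ("chr1", 400, 450), ("chr1", 510, 530)]
tokens = tokenize(sample, universe)

print(universe)
print(tokens)
```

The token list plays the role of a sentence: a word2vec-style model would be trained on many such lists, one per region set. Note how the region at chr1:400-450 is silently dropped because the universe has nothing there, which is why universe quality matters so much downstream.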