single-cell preprocess documentation
TLDR
sc-preprocess is a Snakemake pipeline for single-cell preprocessing. It covers Cell Ranger (GEX, ATAC, ARC), per-capture object creation (AnnData/MuData), demultiplexing, doublet detection, and cell type annotation, all driven by a single config file.
Description
Reproducibility and scalability are essential to contemporary FAIR (Findable, Accessible, Interoperable, and Reusable) single-cell omics data analysis, yet preprocessing steps lack the workflow infrastructure needed to standardize large-scale, collaborative studies. 10x Genomics’ Cell Ranger is critical software for preprocessing raw single-cell omics data, but running it reproducibly across hundreds or thousands of samples remains cumbersome, error-prone, and computationally inefficient. We present sc-preprocess, a Snakemake workflow wrapper that automates, scales, and standardizes Cell Ranger preprocessing for gene expression (GEX), chromatin accessibility (ATAC), and multiome (ARC) data. The workflow supports flexible input specifications, integrated logging, and portable configuration files, making it straightforward to deploy in high-performance computing or cloud environments. By combining Snakemake’s reproducible workflow management with Cell Ranger, sc-preprocess improves reproducibility, reduces user error, and accelerates downstream single-cell omics analysis.
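To illustrate the kind of per-capture dispatch a wrapper like this automates, here is a minimal Python sketch that maps a modality to the matching Cell Ranger command line. It is not sc-preprocess's actual implementation: the `Capture` class, its field names, and `build_command` are hypothetical, and only the basic `count` options of `cellranger`, `cellranger-atac`, and `cellranger-arc` are shown. Depending on the Cell Ranger version, further options may be required (for example, recent `cellranger count` releases expect `--create-bam`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Capture:
    """One capture to preprocess. All field names are hypothetical."""
    capture_id: str
    modality: str   # "GEX", "ATAC", or "ARC"
    reference: str  # path to the Cell Ranger reference for this modality
    inputs: str     # FASTQ directory (GEX/ATAC) or libraries CSV (ARC)


def build_command(capture: Capture, cores: int = 8) -> List[str]:
    """Return the Cell Ranger invocation for one capture as an argument list."""
    if capture.modality == "GEX":
        cmd = [
            "cellranger", "count",
            f"--id={capture.capture_id}",
            f"--transcriptome={capture.reference}",
            f"--fastqs={capture.inputs}",
        ]
    elif capture.modality == "ATAC":
        cmd = [
            "cellranger-atac", "count",
            f"--id={capture.capture_id}",
            f"--reference={capture.reference}",
            f"--fastqs={capture.inputs}",
        ]
    elif capture.modality == "ARC":
        # Multiome runs take a libraries CSV instead of a FASTQ directory.
        cmd = [
            "cellranger-arc", "count",
            f"--id={capture.capture_id}",
            f"--reference={capture.reference}",
            f"--libraries={capture.inputs}",
        ]
    else:
        raise ValueError(f"Unsupported modality: {capture.modality!r}")
    cmd.append(f"--localcores={cores}")
    return cmd


if __name__ == "__main__":
    # Example captures with placeholder paths.
    captures = [
        Capture("pbmc_gex", "GEX", "/refs/gex", "/data/pbmc_gex/fastq"),
        Capture("pbmc_arc", "ARC", "/refs/arc", "/data/pbmc_arc/libraries.csv"),
    ]
    for capture in captures:
        print(" ".join(build_command(capture)))
```

In a Snakemake workflow, logic of this kind would typically live in a rule that runs once per capture, so that each Cell Ranger run is tracked, logged, and re-executed only when its inputs change.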