PIPSCOUT (PIPseeker-based Single-Cell Output & UMAP Typing) is a Python-based tool (notebook) designed to extract and reformat outputs from the proprietary PIPseeker pipeline for flexible downstream analysis. PIPseeker efficiently processes single-cell RNA-seq data from raw FASTQ files through alignment, quantification, clustering, and cell-type annotation, but its outputs can be difficult to interpret and reuse outside the pipeline.
PIPSCOUT bridges this gap by:
- Extracting raw expression matrices, barcodes, clustering results, and annotations.
- Converting them into open formats compatible with tools like Scanpy and Seurat.
- Enabling custom dimensionality reduction (e.g., UMAP), re-clustering, and visualization.
PIPSCOUT is designed not as an endpoint but as an entry point to more complex analytical workflows, including integration with broader toolkits for regulatory sequence inference and transcriptional program discovery.
Requirements:
- Python 3.x
- Python packages: `numpy`, `pandas`, `matplotlib`, `umap-learn`
- PIPseeker v2.1.4 installed at `/bioinformatics/pipseeker-v2.1.4-linux/`
- A genome annotation file such as `human-pbmc-v4.csv` in the PIPseeker directory
Edit the configuration block in the script to match your dataset setup:
```python
sample_name = 'sample2'    # Name of your sample (FASTQ prefix)
sensitivity = '3'          # Sensitivity level (1–5)
genome = 'human-pbmc-v4'   # Genome annotation file name (no .csv)
rerun = True               # Set False if PIPseeker outputs already exist
```

The first step runs PIPseeker to align and quantify the single-cell RNA-seq dataset, then decompresses the outputs needed for downstream analysis:
- Runs `pipseeker fullrun` from the command line via `os.system`, passing the paths to the FASTQ files and the reference index.
- Unzips `matrix.mtx.gz`, `features.tsv.gz`, and `barcodes.tsv.gz`.
Output folder: `../results/<sample_name>_results/filtered_matrix/sensitivity_<sensitivity>/`
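The decompression step can also be done with the Python standard library instead of a shell `gunzip`. This is a minimal sketch, not the notebook's own code: it assumes the three `.gz` files sit directly in the output folder above, and `decompress_outputs` and its `results_root` parameter are hypothetical names.

```python
import gzip
import shutil
from pathlib import Path


def decompress_outputs(sample_name, sensitivity, results_root="../results"):
    """Decompress the three PIPseeker matrix files next to their .gz originals.

    Hypothetical helper; assumes the .gz files live directly in
    <results_root>/<sample_name>_results/filtered_matrix/sensitivity_<sensitivity>/.
    """
    out_dir = (Path(results_root) / f"{sample_name}_results"
               / "filtered_matrix" / f"sensitivity_{sensitivity}")
    written = []
    for name in ("matrix.mtx", "features.tsv", "barcodes.tsv"):
        src = out_dir / f"{name}.gz"
        dst = out_dir / name
        # Stream-decompress so a large matrix is never held fully in memory
        with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        written.append(dst)
    return written
```

For example, `decompress_outputs('sample2', '3')` writes `matrix.mtx`, `features.tsv`, and `barcodes.tsv` into the sensitivity-3 folder. Unlike `gunzip`, it keeps the `.gz` originals, so the step can be repeated safely when `rerun = False`.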
Loading genes and barcodes:
- Parses `features.tsv` to extract gene names.
- Parses `barcodes.tsv` to extract cell barcodes.
- Stores the results in the Python lists `genes` and `barcodes`.

These lists map matrix indices to names and supply the matrix headers.
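A minimal sketch of this parsing step, assuming 10x-style headerless TSV files (gene symbol in the second column of `features.tsv`, one barcode per line in `barcodes.tsv`); the PIPseeker column layout and the helper name `load_genes_and_barcodes` are assumptions, not taken from the notebook.

```python
import pandas as pd


def load_genes_and_barcodes(features_path, barcodes_path):
    """Return (genes, barcodes) as plain Python lists.

    Hypothetical helper; assumes headerless, tab-separated files in the
    10x-style layout described in the lead-in.
    """
    features = pd.read_csv(features_path, sep="\t", header=None)
    # Use the symbol column if present, otherwise fall back to the first column
    name_col = 1 if features.shape[1] > 1 else 0
    genes = features[name_col].astype(str).tolist()
    barcodes = pd.read_csv(barcodes_path, sep="\t", header=None)[0].astype(str).tolist()
    return genes, barcodes
```

List position `i` corresponds to index `i + 1` in `matrix.mtx`, because Matrix Market indices are 1-based.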
Building the expression matrix:
- Loads the `matrix.mtx` file (Matrix Market format) using `pandas.read_csv`.
- Initializes a dense NumPy matrix `M` of shape `(num_genes, num_cells)`.
- Populates `M` using the row, column, and value triplets in `matrix.mtx`.

Result: a full gene × cell expression matrix (`M`) ready for downstream analysis.
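The loading procedure above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: it expects a coordinate-format file whose first non-comment line gives `rows cols nnz`, and `load_mtx_dense` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def load_mtx_dense(path):
    """Read a Matrix Market coordinate file into a dense gene x cell array.

    Hypothetical helper; assumes one 'row col value' triplet per line after
    the size line.
    """
    # comment='%' skips the '%%MatrixMarket' banner and any '%' comment lines
    df = pd.read_csv(path, sep=r"\s+", comment="%", header=None)
    num_genes, num_cells, _nnz = df.iloc[0].astype(int)  # size line
    triplets = df.iloc[1:]
    rows = triplets[0].astype(int).to_numpy() - 1  # Matrix Market is 1-based
    cols = triplets[1].astype(int).to_numpy() - 1
    vals = triplets[2].to_numpy(dtype=np.float32)
    # float32 halves memory compared with float64 for large dense matrices
    M = np.zeros((num_genes, num_cells), dtype=np.float32)
    M[rows, cols] = vals
    return M
```

A dense matrix grows as genes × cells, so large runs can exhaust memory. `scipy.io.mmread` returns a sparse matrix instead, at the cost of an extra dependency not listed in the requirements.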
UMAP embedding:
- Transposes `M` so that each row represents a single cell's expression profile.
- Uses `umap.UMAP(n_components=12)` to embed the cells into a 12-dimensional latent space.
- Produces `embedding`: a 2D array of shape `(num_cells, 12)` holding each cell's UMAP coordinates.

UMAP enables visualization and clustering of cells based on gene expression similarity.
Cell-type annotation and plotting:
- Loads cluster assignments from `clusters.csv` (one cluster label per cell).
- Loads cell-type labels from `graph_clusters.csv`, which maps each cluster ID to a known cell type.
- Plots a 2D UMAP projection with `matplotlib`, coloring points by cell type.
- Adds a legend identifying each cell type by color.

Final plot: a UMAP scatterplot with labeled cell clusters.
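The plotting step can be sketched as below. Because the column layouts of `clusters.csv` and `graph_clusters.csv` are not described here, this sketch takes already-loaded data: a per-cell list of cluster IDs and a `cluster_to_type` dict. Both names, the helper `plot_umap_by_cell_type`, and the default choice of the first two embedding dimensions are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_umap_by_cell_type(embedding, cluster_labels, cluster_to_type, dims=(0, 1)):
    """Scatter two UMAP dimensions, one color (and legend entry) per cell type."""
    cell_types = np.array([str(cluster_to_type[c]) for c in cluster_labels])
    fig, ax = plt.subplots(figsize=(7, 6))
    for ct in sorted(set(cell_types)):
        mask = cell_types == ct
        ax.scatter(embedding[mask, dims[0]], embedding[mask, dims[1]], s=4, label=str(ct))
    ax.set_xlabel(f"UMAP {dims[0] + 1}")
    ax.set_ylabel(f"UMAP {dims[1] + 1}")
    # markerscale enlarges the tiny scatter markers inside the legend
    ax.legend(markerscale=3, fontsize="small", loc="best")
    return fig, ax
```

Mapping clusters to cell types before plotting merges clusters that share a label into a single color, which keeps the legend to one entry per cell type.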
Exporting the annotated matrix:
- Combines each barcode with its corresponding cell type (e.g., `AACTT..._CD4_T_Cell`).
- Writes a new CSV file, `sample2_gene_matrix.csv`:
  - First row: annotated barcodes as column headers
  - Each subsequent row: a gene and its expression across cells

The output file is formatted for downstream analysis and carries both gene names and cell-type information.
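A sketch of the export step with pandas. The helper `write_annotated_matrix`, the `gene` label in the top-left header cell, and replacing spaces in cell-type names with underscores are all assumptions about the header style, not confirmed details of PIPSCOUT's output.

```python
import pandas as pd


def write_annotated_matrix(M, genes, barcodes, cell_types, out_path):
    """Write a gene x cell CSV whose column headers are '<barcode>_<cell type>'.

    cell_types holds one label per barcode, in matrix-column order.
    """
    if len(barcodes) != len(cell_types):
        raise ValueError("need exactly one cell-type label per barcode")
    # e.g. 'CD4 T Cell' -> 'CD4_T_Cell' (assumed header style)
    headers = [f"{bc}_{str(ct).replace(' ', '_')}" for bc, ct in zip(barcodes, cell_types)]
    df = pd.DataFrame(M, index=genes, columns=headers)
    # index_label names the top-left cell so the first column reads as 'gene'
    df.to_csv(out_path, index_label="gene")
    return df
```

Reading the file back with `pd.read_csv(out_path, index_col=0)` restores genes as the row index and annotated barcodes as the columns.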
Outputs:
- `matrix.mtx`, `features.tsv`, `barcodes.tsv`: raw PIPseeker outputs (decompressed)
- `clusters.csv`, `graph_clusters.csv`: clustering and cell-type mappings
- `sample2_gene_matrix.csv`: final expression matrix with cell-type-annotated barcodes
- UMAP plot: visual summary of cell populations
PIPSCOUT is ideal for:
- Visualizing and interpreting single-cell RNA-seq data
- Linking clusters to known cell types
- Exporting labeled expression matrices for ML or statistical modeling
PIPSCOUT outputs annotated expression matrices, which makes gene-level visualizations such as violin plots straightforward.
The example notebook `PIPSCOUT_violin.ipynb` shows how to:
- Identify top-expressed genes in major cell types
- Visualize their distributions across clusters
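A minimal sketch of both tasks, operating on the annotated CSV loaded with `pd.read_csv(path, index_col=0)`. It is not the contents of `PIPSCOUT_violin.ipynb`: the helper names are hypothetical, "top-expressed" is taken to mean highest mean expression, and the sketch assumes headers of the form `<barcode>_<cell type>`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def cell_type_of(column):
    # Barcodes contain only A/C/G/T, so the first '_' separates barcode from cell type
    return column.split("_", 1)[1]


def top_genes(df, cell_type, n=10):
    """Genes with the highest mean expression across cells of one type."""
    cols = [c for c in df.columns if cell_type_of(c) == cell_type]
    return df[cols].mean(axis=1).nlargest(n)


def violin_by_cell_type(df, gene):
    """Violin plot of one gene's expression, one violin per cell type."""
    types = sorted({cell_type_of(c) for c in df.columns})
    data = [
        df.loc[gene, [c for c in df.columns if cell_type_of(c) == t]].to_numpy(dtype=float)
        for t in types
    ]
    fig, ax = plt.subplots(figsize=(max(4, len(types)), 4))
    ax.violinplot(data, showmedians=True)
    # violinplot places datasets at x = 1..len(types)
    ax.set_xticks(range(1, len(types) + 1))
    ax.set_xticklabels(types, rotation=45, ha="right")
    ax.set_ylabel(f"{gene} expression")
    return fig, ax
```

A typical loop takes `top_genes(df, "CD4_T_Cell").index` and passes each gene to `violin_by_cell_type`, comparing that gene's distribution across all cell types.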
Citation:

```bibtex
@misc{talbert2025pipscout,
  author       = {Talbert, E. and Saisan, P.},
  title        = {PIPSCOUT: PIPseeker-based Single-Cell Output \& UMAP Typing},
  year         = {2025},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/psaisan/PIPSCOUT}},
  version      = {0.1},
}
```