parallel differential expression for single-cell perturbation sequencing
Add to your pyproject.toml file with uv
uv add pdexThis is a python package for performing parallel differential expression between multiple groups and a control.
It is optimized for very large datasets and very large numbers of perturbations.
It makes use of shared memory to parallelize the computation to a high number of threads and minimizes the IPC between processes to reduce overhead.
It supports the following metrics:
- Wilcoxon Rank Sum
- Anderson-Darling
- T-Test
pdex adapts its execution strategy based on how the AnnData object is stored:
- In-memory AnnData (e.g., loaded via
sc.read_h5ad(path)withoutbacked="r"): pdex uses a shared-memory multiprocessing workflow. Each worker process gets access to the full expression matrix through shared memory, which minimizes serialization overhead. Parallelism is configured via thenum_workersparameter (process count).num_threadsis ignored in this mode because numba kernels operate on per-target slices entirely in memory. - Backed AnnData (
adata.Xis an on-disk HDF5 dataset): pdex automatically switches to the low-memory chunked implementation. Gene chunks are streamed from disk, the reference group is computed once per chunk, and targets are processed in parallel vianum_workers(thread pool). Within each target, Wilcoxon metrics can additionally use numba parallelization controlled bynum_threads. This mode avoids loading the entire matrix into RAM while still enabling both target-level and gene-level parallelism.
If a backed dataset is supplied without enabling low-memory mode, pdex raises a
helpful error explaining that chunked processing is required. Conversely, you can
force the chunked path for large in-memory matrices by passing low_memory=True.
parallel_differential_expression exposes two orthogonal knobs for controlling
parallel execution:
num_workerscontrols the number of Python threads that process targets within each gene chunk.None(default in low-memory mode) enables an auto-detected worker count based on available CPUs, while1disables thread-level parallelism.num_threadscontrols the numba thread pool used by the Wilcoxon kernel.Nonelets numba auto-detect the optimal size, whereas1turns numba parallelization off. This setting is only used in low-memory mode and only whenmetric="wilcoxon". When pdex detects non-integer expression values in a gene chunk (for example, after log-normalization), it automatically disables numba for that chunk, logs a warning, and falls back to the SciPy implementation to preserve correct rank ordering.
These strategies can be combined: for example, num_workers=2, num_threads=8
runs two target threads that share an eight-thread numba pool. When the metric
does not support numba acceleration, pdex automatically logs a warning and
falls back to thread-only execution.
import anndata as ad
import numpy as np
import pandas as pd
from pdex import parallel_differential_expression
PERT_COL = "perturbation"
CONTROL_VAR = "control"
N_CELLS = 1000
N_GENES = 100
N_PERTS = 10
MAX_UMI = 1e6
def build_random_anndata(
n_cells: int = N_CELLS,
n_genes: int = N_GENES,
n_perts: int = N_PERTS,
pert_col: str = PERT_COL,
control_var: str = CONTROL_VAR,
) -> ad.AnnData:
"""Sample a random AnnData object."""
return ad.AnnData(
X=np.random.randint(0, MAX_UMI, size=(n_cells, n_genes)),
obs=pd.DataFrame(
{
pert_col: np.random.choice(
[f"pert_{i}" for i in range(n_perts)] + [control_var],
size=n_cells,
replace=True,
),
}
),
)
def main():
adata = build_random_anndata()
# Run pdex with default metric (wilcoxon)
results = parallel_differential_expression(
adata,
reference=CONTROL_VAR,
groupby_key=PERT_COL,
)
assert results.shape[0] == N_GENES * N_PERTS
# Run pdex with alt metric (anderson)
results = parallel_differential_expression(
adata,
reference=CONTROL_VAR,
groupby_key=PERT_COL,
metric="anderson"
)
assert results.shape[0] == N_GENES * N_PERTS