Publications

You can also find my articles on my Google Scholar profile.

Journal Articles


Single-cell RNA-seq data have prevalent blood contamination but can be rescued by Originator, a computational tool separating single-cell RNA-seq by genetic and contextual information

Published in Genome Biology, 2025

Single-cell RNA sequencing (scRNA-seq) data from complex human tissues are frequently contaminated with blood cells introduced during sample preparation. They may also comprise cells of different genetic makeups. We propose a new computational framework, Originator, which deciphers single cells by genetic origin and separates immune cells arising from blood contamination from the expected tissue-resident immune cells. We demonstrate the accuracy of Originator at separating immune cells from the blood and tissue, as well as cells of different genetic origins, using a variety of artificially mixed and real datasets, including pancreatic cancer and placentas as examples.

Download Paper

Preprints


[Preprint] STDrug enables spatially informed personalized drug repurposing from spatial transcriptomics

Published in bioRxiv, 2026

Drug repurposing offers a scalable route to accelerate therapeutic discovery, yet existing approaches based on single-cell RNA sequencing (scRNA-seq) often overlook spatial tissue context, limiting their ability to capture microenvironment-dependent drug responses. Here we present STDrug, a spatially informed computational framework that integrates spatial transcriptomics, graph-based modeling, and multimodal learning to enable patient-specific therapeutic prioritization. STDrug identifies and aligns disease and control spatial domains using graph convolutional networks and coherent point drift, and prioritizes candidate drugs through an integrative scoring scheme combining tumor-reversible gene signatures, perturbation-based reversal scores, and knowledge-guided gene weighting within a machine learning framework. By modeling spatial domain interactions alongside predicted drug efficacy and toxicity, STDrug generates robust patient-level drug scores. Across hepatocellular carcinoma and prostate cancer datasets, STDrug outperforms existing single-cell and spatial transcriptomics-based drug repurposing methods, achieving significantly improved predictive accuracy (AUCs = 0.81-0.82) across patients. Validation using large-scale electronic health records and in vitro assays further supports the translational relevance of top-ranked candidates. Taken together, STDrug establishes a generalizable framework for incorporating spatial omics into therapeutic discovery, advancing spatially informed and personalized drug repurposing.

Download Paper

[Preprint] Linking Spatial Omics to Patient Phenotypes at Population Scale Using BSNMani

Published in medRxiv, 2025

Spatial omics enables the integration of gene expression with clinical outcome, yet incorporating spatial single-cell data into predictive statistical models at the population scale remains a significant challenge. Here, we adapt BSNMani, a Bayesian scalar-on-network regression model with manifold learning, to incorporate spatial co-expression networks for disease outcome modeling. Using the Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) MERFISH dataset (n = 26), we found that, among the four benchmarked methods (WGCNA, Smoothie, SpaceX, and hdWGCNA), Smoothie is the preferred method for constructing spatially informed sample-specific co-expression matrices within the BSNMani framework. BSNMani reached an AUC of 0.76 for Alzheimer’s Disease (AD) prediction, while revealing four distinct gene-gene co-expression subnetworks among the patients. We also applied the Smoothie + BSNMani workflow to predict patient survival from a breast cancer spatial proteomics dataset obtained with Imaging Mass Cytometry (IMC) technology. The workflow showed robust predictive accuracy for patient survival and revealed biologically meaningful subnetworks associated with tumor progression, immune regulation, hormone signaling, and metabolic reprogramming. BSNMani is a powerful tool that integrates high-dimensional spatial omics data for clinical outcome prediction across diverse disease settings, while yielding biological insights and remaining easy to interpret.

Download Paper