ML-Based Biomarker Discovery for Suicide Risk in Depression
RNA-seq analysis for psychiatric biomarker identification
Overview
A course-based research project at the NUS Yong Loo Lin School of Medicine, focused on identifying psychiatric biomarkers of suicide risk in depression using machine learning on RNA-seq data.
Problem Statement
Depression is a complex psychiatric disorder with heterogeneous presentations. Suicide risk stratification requires:
- Robust, clinically relevant biomarkers
- Cross-cohort validation to ensure generalization
- Integration of statistical and biological significance
- Reproducible ML workflows for clinical applicability
Key Achievements
- Models built on RNA-seq count-level data
- External, cross-cohort validation demonstrating model robustness
- 4-class to binary mapping for clinical decision-making
- KEGG pathway enrichment linking biomarkers to biological mechanisms
- Model screening criterion (CV < 0.5) to retain only models that generalize robustly
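A 4-class to binary mapping can be as simple as a lookup table applied to the labels before training. In this sketch the four class names and their grouping into low and high risk are hypothetical, since the project does not list its classes. Rejecting unmapped labels guards against silent mislabeling when cohorts use different annotations.

```python
# Hypothetical label names and grouping; the project does not specify its four classes.
RISK_MAP = {
    "control": 0,                # low risk
    "depressed_no_ideation": 0,  # low risk
    "ideation": 1,               # high risk
    "attempt": 1,                # high risk
}


def to_binary(labels):
    """Collapse 4-class labels into 0 (low risk) / 1 (high risk)."""
    unknown = sorted(set(labels) - set(RISK_MAP))
    if unknown:
        # Fail loudly instead of dropping or guessing labels from another cohort.
        raise ValueError(f"unmapped labels: {unknown}")
    return [RISK_MAP[label] for label in labels]


labels = ["control", "attempt", "ideation", "depressed_no_ideation", "attempt"]
print(to_binary(labels))  # [0, 1, 1, 0, 1]
```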
Methodology
Data Analysis
- GEO cohort integration and preprocessing
- Imbalanced learning (SMOTE-Tomek) for class balance
- Feature selection and KNN imputation
- External validation workflows
Biomarker Discovery
- RNA-seq count data modeling
- Pathway enrichment analysis
- Clinical outcome mapping
- Longitudinal validation
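KEGG enrichment of a biomarker list is commonly run as an over-representation test. This is a minimal sketch of a one-sided hypergeometric test with Benjamini-Hochberg FDR correction, using only the standard library; the pathway names and gene IDs are hypothetical placeholders rather than real KEGG gene sets, and the project's actual enrichment tool is not specified.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n hits drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def pathway_enrichment(hits, pathways, universe):
    """Return (name, overlap, pathway_size, p, q) per pathway, sorted by p."""
    hits = set(hits) & universe
    N, n = len(universe), len(hits)
    results = []
    for name, genes in pathways.items():
        genes = set(genes) & universe            # restrict gene sets to the background
        k = len(hits & genes)
        results.append((name, k, len(genes), hypergeom_sf(k, N, len(genes), n)))
    results.sort(key=lambda r: r[3])
    # Benjamini-Hochberg: q_i = min over j >= i of p_j * m / j, capped at 1.
    m = len(results)
    qvals, running_min = [0.0] * m, 1.0
    for i in range(m - 1, -1, -1):
        running_min = min(running_min, results[i][3] * m / (i + 1))
        qvals[i] = running_min
    return [(name, k, size, p, q) for (name, k, size, p), q in zip(results, qvals)]


# Hypothetical background, gene sets and hit list (placeholder IDs, not real genes).
universe = {f"G{i}" for i in range(1000)}
pathways = {
    "pathway_A": [f"G{i}" for i in range(20)],
    "pathway_B": [f"G{i}" for i in range(500, 540)],
}
hits = [f"G{i}" for i in range(10)] + [f"G{i}" for i in range(900, 930)]

for row in pathway_enrichment(hits, pathways, universe):
    print(row)
```

The background matters: it should be the genes actually measured (or passing expression filters) in the cohort, not the whole genome, since an oversized background tends to overstate enrichment.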
Results
- Identified key biomarkers: NR3C1a, HSPA1B
- Cross-cohort AUC > 0.8
- KEGG pathways: Glucocorticoid signaling, Heat shock response
Supervisor
A/Prof. Caroline Lee - Vice Dean & Programme Director, NUS Graduate School, Yong Loo Lin School of Medicine
Links
- Code: GitHub
- Supervisor: A/Prof. Caroline Lee
Timeline
- Start: August 2024
- End: December 2024
- Duration: 5 months