ML-Based Biomarker Discovery for Suicide Risk in Depression

RNA-seq analysis for psychiatric biomarker identification

Overview

This course-based research project, conducted at the NUS Yong Loo Lin School of Medicine, applies machine learning to RNA-seq data to identify biomarkers of suicide risk in depression.

Problem Statement

Depression is a complex psychiatric disorder with heterogeneous presentations. Suicide risk stratification requires:

  • Robust, clinically relevant biomarkers
  • Cross-cohort validation to ensure generalization
  • Integration of statistical and biological significance
  • Reproducible ML workflows for clinical applicability

Key Achievements

  • RNA-seq count-level models validated on an external cohort, demonstrating cross-cohort robustness
  • Mapping of the 4-class labels to a binary outcome to support clinical decision-making
  • KEGG pathway enrichment linking biomarkers to biological mechanisms
  • Model screening criterion (CV < 0.5) to retain only models that generalize consistently
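
The label mapping and the screening rule listed above can be sketched as follows. This is only an illustration under stated assumptions: the four class names are hypothetical (the project summary does not name them), and "CV" is assumed here to mean the coefficient of variation of per-fold scores, which the summary does not confirm.

```python
from statistics import mean, stdev

# Hypothetical 4-class labels; the actual class names are not given in the source.
FOUR_TO_BINARY = {
    "control": 0,
    "depression_no_ideation": 0,
    "suicidal_ideation": 1,
    "suicide_attempt": 1,
}


def to_binary(labels):
    """Collapse 4-class labels into lower-risk (0) and higher-risk (1)."""
    return [FOUR_TO_BINARY[label] for label in labels]


def coefficient_of_variation(scores):
    """Sample standard deviation of the scores divided by their mean."""
    return stdev(scores) / mean(scores)


def passes_screening(fold_scores, threshold=0.5):
    """Assumed rule: keep a model only if fold-to-fold variability is below threshold."""
    return coefficient_of_variation(fold_scores) < threshold
```

A model whose per-fold scores are tightly clustered passes the screen, while one whose scores swing widely between folds is rejected even if its mean score looks acceptable.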

Methodology

Data Analysis

  • GEO cohort integration and preprocessing
  • SMOTE-Tomek resampling to address class imbalance
  • Feature selection and KNN imputation
  • External validation workflows
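
To make the imputation step concrete, here is a minimal pure-Python sketch of KNN imputation. It is not the project's implementation: the function name knn_impute, the choice of k, and the distance definition (root mean squared difference over features observed in both samples) are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest samples.

    Distance between two samples is the root mean squared difference over
    the features observed in both, so differing missingness stays comparable.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other samples in which feature j was observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

In practice a library implementation such as scikit-learn's KNNImputer would normally be used, with imbalanced-learn's SMOTETomek for the resampling step. Both should be fitted on the training cohort only, so that no information from the external cohort leaks into preprocessing.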

Biomarker Discovery

  • RNA-seq count data modeling
  • Pathway enrichment analysis
  • Clinical outcome mapping
  • Longitudinal validation
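
Pathway enrichment is commonly assessed with a one-sided hypergeometric (over-representation) test. The sketch below shows that test in general form; whether the project used this exact test is an assumption, and the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_background: genes in the background set
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the selected (biomarker) list
    n_overlap:    selected genes that fall in the pathway
    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, x) * comb(n_background - n_pathway, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total
```

As a toy check, with a background of 10 genes, 4 of them in the pathway, and 3 genes selected of which 2 are in the pathway, the p-value is (36 + 4) / 120 = 1/3. When many pathways are tested, the resulting p-values need multiple-testing correction before interpretation.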

Results

  • Identified key biomarkers: NR3C1a, HSPA1B
  • Cross-cohort AUC > 0.8
  • KEGG pathways: Glucocorticoid signaling, Heat shock response
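
For reference, the AUC metric reported above can be computed from first principles as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The sketch below is a generic implementation, not the project's evaluation code; in a cross-cohort setting the scores would come from a model fitted on the training cohort and applied unchanged to the external cohort.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

For example, roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) ranks three of the four positive-negative pairs correctly and returns 0.75. These numbers are illustrative only and are not project data.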

Supervisor

A/Prof. Caroline Lee - Vice Dean & Programme Director, NUS Graduate School, Yong Loo Lin School of Medicine

Timeline

  • Start: August 2024
  • End: December 2024
  • Duration: 5 months