Machine Learning Biomarkers for Suicide Risk Assessment in Depression

Identifying NR3C1a and HSPA1B as genetic biomarkers for suicide risk in MDD using machine learning and gene expression analysis

Overview

This project applies machine learning to identify genetic biomarkers for suicide risk assessment in major depressive disorder (MDD). RNA expression data from 197 MDD patients and 151 controls were analyzed with 10 ML models, which identified NR3C1a and HSPA1B as consistent biomarkers across multiple experimental configurations.

Problem Statement

Suicide claims over 700,000 lives globally each year. Current diagnostic methods rely on subjective self-reporting (BSRS-5, C-SSRS), leading to false positives/negatives and inconsistent risk assessment. Despite extensive biomarker research (DNA methylation, RNA expression), clinical implementation remains limited.

Methodology

Dataset & Preprocessing

Training Data: 197 MDD patients + 151 controls, 32 inflammation-related genes (qPCR RNA expression)
Test Data: Independent suicide ideation dataset for external validation

Data Processing:

  • KNN imputation for missing values, Min-Max normalization
  • Binary classification with two risk stratifications tested
  • 5-fold cross-validation across 10 ML models (Random Forest, SVM, XGBoost, etc.)
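
The two preprocessing steps can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the function names `knn_impute` and `min_max_normalize`, the value of `k`, the distance definition, and the toy matrix are all assumptions made for illustration. In practice a library implementation would normally be used instead.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is a root-mean-square difference over the features observed in
    both rows (an assumed simplification of what library imputers do).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # stays None if no usable neighbour exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy expression matrix (illustrative values): 4 samples x 2 genes, one missing.
expression = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [9.0, 90.0],
]
imputed = knn_impute(expression, k=2)
scaled = min_max_normalize(imputed)
```

When these steps are combined with cross-validation, a common precaution is to compute the imputation neighbours and the min/max values on the training folds only and then apply them to the held-out fold, so that no information leaks from the validation data.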

Biomarker Identification

Feature Selection: Top 3 genes per model based on decision impact
Consensus Approach: Most frequent genes across robust models (CV accuracy > 0.5)
Validation: Differential gene expression analysis with Bonferroni correction, KEGG pathway analysis
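
The consensus step described above can be sketched as a simple vote count. This is an assumed reading of the procedure, not the project's code: the function name `consensus_genes`, the record layout, and every model name, accuracy, and gene label in the example are hypothetical placeholders.

```python
from collections import Counter


def consensus_genes(model_results, min_cv_accuracy=0.5, top_n=2):
    """Count gene appearances in the top-3 lists of models above the accuracy threshold."""
    counts = Counter()
    for result in model_results:
        # Only models whose CV accuracy exceeds the threshold get a vote.
        if result["cv_accuracy"] > min_cv_accuracy:
            counts.update(result["top_genes"])
    return counts.most_common(top_n)


# Hypothetical per-model outputs (placeholders, not the study's results).
model_results = [
    {"model": "model_1", "cv_accuracy": 0.70, "top_genes": ["GENE_A", "GENE_B", "GENE_C"]},
    {"model": "model_2", "cv_accuracy": 0.65, "top_genes": ["GENE_A", "GENE_B", "GENE_D"]},
    {"model": "model_3", "cv_accuracy": 0.45, "top_genes": ["GENE_D", "GENE_E", "GENE_F"]},
    {"model": "model_4", "cv_accuracy": 0.60, "top_genes": ["GENE_A", "GENE_E", "GENE_F"]},
]
ranked = consensus_genes(model_results)
```

Here `model_3` is excluded because its accuracy is below the threshold, so its top genes do not contribute to the counts.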

Results

Training Performance: Best configuration achieved 0.77 CV accuracy (Set 1)
Identified Biomarkers:

  • NR3C1a (log-fold change 0.374): Glucocorticoid receptor signaling, stress hormone regulation
  • HSPA1B (log-fold change 0.427): Heat shock protein, cellular stress response
  • ABL1 (log-fold change 0.519): Signal transduction in stress pathways
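
For readers unfamiliar with the two quantities used in the differential expression check, the sketch below shows one common way to compute a log-fold change and a Bonferroni adjustment. The source does not state the logarithm base, how expression was quantified, or the significance level, so the base-2 logarithm, the ratio-of-means definition, `alpha=0.05`, and all example values are assumptions.

```python
import math


def log2_fold_change(case_values, control_values):
    """log2 of the ratio of mean expression, case over control (assumed definition)."""
    case_mean = sum(case_values) / len(case_values)
    control_mean = sum(control_values) / len(control_values)
    return math.log2(case_mean / control_mean)


def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, is_significant) per test, with adjusted p = min(1, p * m)."""
    m = len(p_values)
    return [(min(1.0, p * m), p * m <= alpha) for p in p_values]


# Illustrative numbers only, not values from the study.
lfc = log2_fold_change([4.0, 4.0], [2.0, 2.0])  # case mean is twice the control mean
adjusted = bonferroni([0.001, 0.02, 0.04])
```

A positive log-fold change indicates higher mean expression in the case group. Under these assumptions, testing all 32 genes at `alpha=0.05` would correspond to a per-gene threshold of 0.05 / 32, roughly 0.0016.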

Test Set Performance: External validation accuracy ranged from 0.44 to 0.61, indicating limited generalizability, likely due to small sample size and dataset heterogeneity

Applications

  • Objective suicide risk assessment in clinical psychiatry
  • Biomarker-guided treatment stratification for MDD patients
  • Early intervention identification for high-risk individuals
  • Complement to existing subjective screening tools

Limitations & Future Work

Current Limitations: Limited test set generalizability (0.44-0.61), small sample size, dataset-specific biases

Future Directions:

  • Expand datasets with diverse demographics
  • Multi-omics integration (protein, metabolic markers)
  • Longitudinal biomarker tracking for risk trajectory analysis

Achievements & Recognition

Key Metrics

  • 10 ML models evaluated across 8 experimental combinations
  • Identified 2 primary biomarkers (NR3C1a, HSPA1B) with consistent upregulation
  • 0.77 best cross-validation accuracy on the training data, differential expression validation with Bonferroni correction

Team & Collaboration

Supervisors: Ashley Lim & Prof. Caroline Lee
Institution: Department of Biochemistry, National University of Singapore

Timeline

Duration: August 2024 - December 2024 (5 months)