CV

Dive into the career highlights of Yanuo Zhou, a passionate researcher and engineer driving innovation in health and technology.

Basics

Name: ZHOU, YANUO (Arnold)
Label: Research Scientist & ML Engineer
Email: yanuo.zhou@outlook.com
URL: https://arnold117.github.io/
Summary: MSc in Precision Health and Medicine (NUS) with expertise in machine learning, biomarker discovery, wearable signal processing, and reproducible ML pipelines. Focused on building audit-ready workflows for biomedical applications.

Work

  • 2024.08 - 2025.10
    Research Assistant
    TUM-CREATE (NRF Singapore research centre, collaborating with TUM, NUS and NTU)
    Developed reproducible ML pipelines for wearable spectral data analysis, with a focus on circadian health digital phenotyping.
    • Multi-sensor fusion and feature engineering for wearable spectral data
    • Circadian health digital phenotyping and biomarker discovery
    • Reproducible ML pipelines with audit-ready workflows
    • Supervisor: Prof. Dr. Manuel Spitschan (Technical University of Munich)
  • 2019.10 - 2023.06
    Research Assistant
    Key Laboratory of Non-Destructive Testing, Ministry of Education, China
    Diverse research projects spanning wearable sensors, signal processing, and medical imaging.
    • Wearable optical sweat analytics with microfluidics and colorimetry
    • ECG & EEG real-time signal detection systems
    • U-Net segmentation for dermatology imaging
    • Intelligent tunnel multi-parameter monitoring systems
    • Signal processing, computer vision, and embedded systems development

Education

  • 2024.01 - 2026.01
    Singapore
    Master of Science
    National University of Singapore (NUS)
    Precision Health and Medicine
    • AI & Machine Learning
    • Applied Statistics
    • High-Performance Computing
    • Human Genomics
    • Proteomics and Metabolomics
    • Precision Biomarker
    • Precision Diagnosis
  • 2019.09 - 2023.07
    Nanchang, China
    Bachelor of Engineering
    Nanchang Hangkong University (NCHU)
    Biomedical Engineering
    • Biomedical Digital Signal Processing
    • Medical Ultrasound
    • Medical Electronics
    • Foundation of Medical Software
    • Medical Imaging Technology
    • Principle of Medical Instrumentation Design

Awards

Publications

Skills

  • Machine Learning & Data Science
    • Python (pandas, scikit-learn, PyTorch, PyTorch Geometric)
    • R, MATLAB
    • Classical ML: SVM, XGBoost, Logistic Regression, Random Forest
    • Deep learning: CNN (U-Net), Large-scale Autoencoders (545M params), GNN (RGCN, bipartite GNN), VAE, Transformers
    • Knowledge graphs, drug-disease link prediction
    • Participant-wise 5-fold CV, stratified CV, Bayesian optimization
    • Imbalanced learning: SMOTE-Tomek, balanced class weights
    • Evaluation: ROC-AUC, Hits@K, MRR, F1, precision/recall, Spearman correlation
    • Model interpretability, feature importance, embedding analysis (t-SNE, UMAP)
    • GPU acceleration: NVIDIA CUDA, Apple Silicon MPS
  • Bioinformatics & Omics
    • Multi-omics integration: transcriptomics + metabolomics (132K genes × 7K metabolites)
    • Ultra-high-dimensional data: 6,619:1 feature-to-sample ratio handling
    • Large-scale autoencoders: dual-modality architecture (545M + 30.8M params)
    • Nonlinear dimensionality reduction & latent space analysis
    • Gene-metabolite association discovery (Spearman + FDR correction)
    • RNA-seq analysis (GEO cohorts, differential expression)
    • Cross-cohort validation for biomarker discovery
    • KEGG pathway enrichment & biological validation
    • Knowledge graphs (PrimeKG): 30K+ nodes, 849K+ edges
    • Drug repurposing & therapeutic indication prediction
    • Privacy-preserving collaborative research & data anonymization
  • Data Quality & Responsible AI
    • Data leakage detection & correction (PHQ-9 label contamination)
    • Ground truth validation & label quality assessment
    • Transparent limitation reporting (clinical readiness)
    • Negative evidence documentation (PCA, SMOTE failures)
    • Model bias detection & fairness evaluation
    • Reproducible validation protocols
    • Ethical AI practices & scientific integrity
    • Study protocol design & data collection SOPs
  • Feature Engineering & Wearable Pipelines
    • α-opic/SPD spectral conversion
    • L2/log normalization, hour-medoid aggregation
    • Cyclic time features (circadian analysis)
    • Behavioral features: GPS, app usage, communication patterns
    • Digital phenotyping from passive smartphone sensors
    • Config-driven pipelines with parameter sweeps
    • Audit-ready workflows with environment hashes
    • Multi-sensor fusion & signal synchronization
  • Reproducibility & Tooling
    • Git version control & collaborative workflows
    • Jupyter notebooks for literate programming
    • Docker containers for environment reproducibility
    • Config-driven experiments (YAML, JSON)
    • Seeded reproducible runs with deterministic pipelines
    • Performance optimization: embedding caching, vectorization
    • HPC & Nextflow workflows (extending to cloud)
    • AWS/Cloud computing (learning)
  • LLM & Multi-Agent Systems
    • LangGraph orchestration & state management
    • FastMCP 2.0 (Model Context Protocol)
    • Multi-agent architectures & agent debate
    • Hybrid local/cloud LLM: Qwen 3 (MLX) + Claude
    • Apple Silicon MLX optimization
    • Prompt engineering & chain-of-thought
    • Hallucination mitigation & citation traceability
    • RAG (Retrieval-Augmented Generation) pipelines
    • Vector search integration (semantic retrieval)
  • Biomedical Signal Processing & Embedded Systems
    • Wavelet/FFT filtering for ECG/EEG signals
    • Real-time signal processing (<50ms latency)
    • U-Net segmentation for medical images
    • STM32 microcontrollers & embedded firmware
    • Android development (Kotlin) for health apps
    • Sensor integration: ECG/EEG/SpO2/temperature acquisition
    • Anomaly detection & signal quality assessment
    • Data augmentation & preprocessing pipelines
    • IoT applications & wireless sensor networks
  • Microfluidics & Prototyping
    • Microfluidic device design (Tesla valve, Dean vortex)
    • RGB colorimetry for biomarker detection
    • Sweat collection & concentration gradient chips
    • PCB layout, validation & rapid prototyping
    • Optical sensing & fluorescence-based detection

Languages

  • English
    Fluent
  • Chinese (Mandarin)
    Native speaker

Interests

  • AI for Health (AI4Health)
    • Digital phenotyping
    • Precision medicine
    • Healthcare ML applications
    • Biomarker discovery
    • Wearable health monitoring
    • Behavioral pattern recognition
  • AI-Driven Drug Discovery (AIDD)
    • Graph neural networks
    • Drug-disease link prediction
    • Drug repurposing
    • Biomedical knowledge graphs
    • Molecular interactions
    • Computational drug design
  • Digital Phenotyping & Wearable Sensing
    • Continuous health monitoring
    • Circadian rhythm analysis
    • Wearable biosensors
    • Multi-sensor fusion
    • Real-time physiological signals
    • Light exposure classification
  • Bioinformatics & Omics Analysis
    • RNA-seq biomarker discovery
    • Pathway enrichment
    • Cross-cohort validation
    • NGS pipelines
    • Precision health genomics
    • Clinical relevance analysis
  • Biomedical Signal Processing
    • ECG/EEG analysis
    • Embedded systems
    • Sensor fusion
    • Real-time processing
    • Physiological monitoring
    • Time-series analysis
  • Wearable Non-Invasive Diagnosis & Therapeutics
    • Microfluidic devices
    • Point-of-care diagnostics
    • Non-invasive biosensors
    • Colorimetric detection
    • Real-time biomarker analysis
    • Wearable health tech

References

  • Prof. Dr. Manuel Spitschan
    Former TUM-CREATE supervisor. Guided wearable spectral data analysis and circadian health research.
  • A/Prof Caroline Lee
    NUS course supervisor. Supervised RNA-seq biomarker discovery for psychiatric applications.
  • A/Prof Shi Huanhuan
    Undergraduate research supervisor. Co-authored Talanta Q1 paper on sweat analytics; supervised National Innovation Project.
  • Prof. Jiang Shaofeng
    Undergraduate mentor. Observed academic growth over 4 years; endorsed for PhD in biomedical engineering.
  • Prof. Zhang Weiwei
    Undergraduate research supervisor. Led Intelligent Tunnel Monitoring and Wearable Cortisol Detection projects.

Projects

  • 2026.01 - present
    LitScribe - Autonomous Academic Literature Synthesis Engine
    LangGraph multi-agent system with GraphRAG for deep cross-paper literature synthesis. In a benchmark, the 7-agent architecture produced a 24-paper review with 100% citation grounding in 15 min at a cost of $0.098.
    • 7-agent architecture: Planning, Discovery, Critical Reading, GraphRAG, Synthesis, Self-Review, Refinement
    • GraphRAG knowledge synthesis: entity extraction, Leiden community detection, multi-level summarization
    • Benchmark: 24-paper CHO review, 7,679 words, 100% citation grounding, 15 min, $0.098
    • Multi-source search: arXiv, PubMed, Semantic Scholar with domain-aware filtering and Zotero integration
    • 272 tests across 20 files; scales to 50-500 papers via tiered review system
    • Multi-format export (Word/PDF/HTML/LaTeX) with 5 citation styles; multi-language generation with CJK support
  • 2025.12 - present
    Multi-Omics Integration with Autoencoders and Graph Neural Networks (ONGOING)
    Multi-evidence framework for biosynthetic pathway gene discovery. Triangulates AE feature importance, GAT validation, and sequence homology into a unified gene ranking across 132K genes × 7K metabolites.
    • 5-stage pipeline: Data Prep → AE Training → Visualization → Feature Importance → Multi-Evidence Ranking
    • Dual autoencoders: Gene AE (545M params, 132K→64) + Metabolite AE (30.8M params, 7K→64)
    • GAT validates biological signal: classification accuracy confirms AE importance scores are meaningful
    • Multi-evidence ranking: 0.4 correlation + 0.3 AE importance + 0.2 GNN importance + 0.1 BLAST bonus
    • Gene family batch processing with BLAST/HMM integration and configurable YAML weights
    • Hardware abstraction: MPS (Apple Silicon) / CUDA / CPU with checkpoint system
  • 2025.11 - present
    Multimodal Depression Detection via Smartphone Sensing (ONGOING)
    Ongoing ML research for depression biomarker discovery from passive smartphone sensing. Core baseline completed; currently adding new features and models.
    • Core finding: classical ML (Logistic Regression, AUC-ROC 0.762) outperformed deep learning
    • Critical data quality work: Detected and corrected PHQ-9 label contamination
    • 7 models compared: Logistic Regression, Random Forest, XGBoost, VAE, GNN, Contrastive Learning, Transformer
    • 50 behavioral features across GPS, app usage, communication, physical activity
    • Top biomarker: app usage diversity variability (behavioral activation/anhedonia proxy)
    • Responsible AI: Transparent limitation reporting (NOT clinically ready)
  • 2025.10 - present
    PrimeKG GNN Drug-Disease Link Prediction
    Multi-architecture GNN benchmark (6 models) for drug repurposing on PrimeKG. V2.0 with data leakage fix (71.5%→0%) and strict hard-negative evaluation. GAT best at 0.9866 AUC-ROC.
    • 6-model comparison: GAT (best), RGCN, GIN, GraphSAGE, GCN, MLP on 30,926-node PrimeKG graph
    • GAT: 0.9866 AUC-ROC, 0.7955 AP, 0.9831 Hits@10, 0.9031 MRR (strict eval, 50 hard negatives)
    • V2.0: Data leakage fix via undirected-edge-aware splitting (71.5% → 0% leakage)
    • Key finding: Attention mechanisms outperform explicit relation-type modeling (GAT > RGCN)
    • Advanced analyses: path-based explanations, case studies, embedding viz, biological validation
    • Modular encoder registry with CUDA/MPS/CPU auto-detection
  • 2024.08 - 2025.10
    AI-Driven Light Exposure Classification
    ML framework for circadian health phenotyping from wearable sensors. MSc Capstone project achieving 88.1% accuracy (AUC 0.938) in distinguishing natural from artificial light.
    • 88.1% accuracy (AUC 0.938) for natural vs. artificial light classification
    • 288-configuration grid search with participant-wise generalization
    • Spectral shape prioritization over absolute intensity
    • L2 normalization with hour-medoid aggregation
    • Reproducible pipeline with fixed seeds and environment hashes
    • Transparent negative evidence reporting (PCA, SMOTE-Tomek limitations)
  • 2024.08 - 2024.12
    Machine Learning Biomarkers for Suicide Risk Assessment in Depression
    Course-based research identifying psychiatric biomarkers (NR3C1a, HSPA1B) from RNA-seq with cross-cohort validation and imbalanced learning.
    • NR3C1a, HSPA1B as top biomarkers for suicide risk in depression
    • Cross-cohort validation (primary + 3 external cohorts)
    • Imbalanced learning with SMOTE-Tomek
    • KEGG pathway enrichment analysis
    • 4-class to binary mapping for clinical relevance
    • Reproducible ML workflow with audit-ready artifacts
  • 2020.09 - 2022.07
    Wearable Optical Sweat Analytics
    Microfluidics and RGB colorimetry for biomarker detection. National innovation project with Talanta publication and competition awards.
    • 3rd Prize (10th National Student Optoelectronic Design Competition, 2022)
    • Talanta Q1 journal publication (2022, co-author)
    • Tesla valve-based sweat collection optimization
    • RGB colorimetry signal processing
    • NCHU 'Three Small Projects' award
    • Funded by Outstanding Young Scientist Project (20192BCB23011)
  • 2022.09 - 2023.06
    Android Sleep Quality Monitoring System
    BEng thesis achieving 80% accuracy in sleep quality classification using only phone sensors, validated on a volunteer cohort. No external devices required.
    • 80% sleep quality classification accuracy
    • Kotlin-based Android implementation
    • 3-axis accelerometer, illumination, microphone sensor fusion
    • Volunteer cohort validation
    • Minimal resource consumption
    • Comparable performance to commercial wearables
  • 2022.02 - 2022.07
    Multi-Parameter Physiological Monitoring System
    Real-time monitoring of ECG, SpO2, respiration, temperature, and blood pressure with Qt/QML desktop and web interfaces.
    • 5-parameter monitoring (ECG, SpO2, respiration, temperature, BP)
    • STM32 hardware acquisition
    • Real-time signal conditioning and filtering
    • Qt/QML + HTML/JavaScript dashboards
    • Sub-50ms latency
    • Cross-platform deployment
  • 2021.02 - 2021.06
    U-Net for Skin Lesion Segmentation
    Complete implementation with LabelMe annotation, model training, validation, and PyQt5 GUI for dermatological image analysis.
    • U-Net architecture implementation
    • LabelMe manual annotation workflow
    • Class-specific IoU breakdown
    • Data augmentation for limited datasets
    • PyQt5 GUI for inference
    • Medical imaging preprocessing pipeline
  • 2019.10 - 2020.09
    Fluorescence-Based Fire Safety Monitoring System
    Smart monitoring system using fluorescence sensing for temperature, humidity, and fire detection in buildings and tunnels.
    • Silver Award (12th 'Challenge Cup' Jiangxi Student Entrepreneurship, 2020)
    • 2nd Prize (8th National Student Optoelectronic Design Competition, 2020)
    • 6 patents filed and published
    • Distributed optical fiber network integration
    • Multi-parameter sensing (temperature, humidity, trend analysis)
    • Electromagnetic interference immunity
  • 2019.12 - 2021.12
    Microfluidic Concentration Gradient Chip for Drug Susceptibility
    Dean vortex secondary flow mixing for high-throughput biochemical applications. 2 patents and journal publication.
    • 1 invention patent + 1 utility model patent
    • Published in Chinese Journal of Medical Physics (2021)
    • Dean vortex secondary flow optimization
    • Gradient-based drug susceptibility testing
    • High-throughput biochemical analysis
    • Funded by PhD Research Startup Fund (EA202008205)
  • 2021.02 - 2021.06
    Thermal Cycling Device for Fluorescence Quantitative PCR
    Research project on thermal cycling system design for qPCR instrumentation with precision temperature control.
    • Successful Participant (2021 National Undergraduate BME Innovation Design Competition)
    • Thermal control optimization
    • PCR instrumentation design
    • Precision temperature management
    • System integration and testing
  • 2022.09 - 2022.12
    Quantitative Investment Strategies in Cryptocurrency Markets
    Research on the profitability of different investment strategies in cryptocurrency trading, with backtesting and risk analysis.
    • 4 quantitative strategies implemented and compared
    • Backtesting framework development
    • Risk-return optimization analysis
    • Analysis of market dynamics
    • Demonstrated the importance of strategy adaptation
    • Portfolio performance evaluation