cv

Dive into the career highlights of Yanuo Zhou, a passionate researcher and engineer driving innovation in health and technology.

Basics

Name ZHOU, YANUO (Arnold)
Label Research Scientist & ML Engineer
Email yanuo.zhou@outlook.com
Url https://arnold117.github.io/
Summary MSc in Precision Health and Medicine (NUS) with expertise in machine learning, biomarker discovery, wearable signal processing, and reproducible ML pipelines. Focused on building audit-ready workflows for biomedical applications.

Work

  • 2024.08 - 2025.10
    Research Assistant
    TUM-CREATE (NRF Singapore research centre, collaborating with TUM, NUS and NTU)
    Developed reproducible ML pipelines for wearable spectral data analysis, with a focus on circadian health digital phenotyping.
    • Multi-sensor fusion and feature engineering for wearable spectral data
    • Circadian health digital phenotyping and biomarker discovery
    • Reproducible ML pipelines with audit-ready workflows
    • Supervisor: Prof. Dr. Manuel Spitschan (Technical University of Munich)
  • 2019.10 - 2023.06
    Research Assistant
    Key Laboratory of Non-Destructive Testing, Ministry of Education, China
    Diverse research projects spanning wearable sensors, signal processing, and medical imaging.
    • Wearable optical sweat analytics with microfluidics and colorimetry
    • ECG & EEG real-time signal detection systems
    • U-Net segmentation for dermatology imaging
    • Intelligent tunnel multi-parameter monitoring systems
    • Signal processing, computer vision, and embedded systems development

Education

  • 2024.01 - 2026.01

    Singapore

    Master of Science
    National University of Singapore (NUS)
    Precision Health and Medicine
    • AI & Machine Learning
    • Applied Statistics
    • High-Performance Computing
    • Human Genomics
    • Proteomics and Metabolomics
    • Precision Biomarker
    • Precision Diagnosis
  • 2019.09 - 2023.07

    Nanchang, China

    Bachelor of Engineering
    Nanchang Hangkong University (NCHU)
    Biomedical Engineering
    • Biomedical Digital Signal Processing
    • Medical Ultrasound
    • Medical Electronics
    • Foundation of Medical Software
    • Medical Imaging Technology
    • Principle of Medical Instrumentation Design

Awards

Publications

Skills

Machine Learning & Data Science
  • Python (pandas, scikit-learn, PyTorch, PyTorch Geometric)
  • R, MATLAB
  • Classical ML: SVM, XGBoost, Logistic Regression, Random Forest
  • Deep learning: CNN (U-Net), Large-scale Autoencoders (545M params), GNN (RGCN, bipartite GNN), VAE, Transformers
  • Knowledge graphs, drug-disease link prediction
  • Participant-wise 5-fold CV, stratified CV, Bayesian optimization
  • Imbalanced learning: SMOTE-Tomek, balanced class weights
  • Evaluation: ROC-AUC, Hits@K, MRR, F1, precision/recall, Spearman correlation
  • Model interpretability, feature importance, embedding analysis (t-SNE, UMAP)
  • GPU acceleration: NVIDIA CUDA, Apple Silicon MPS

Bioinformatics & Omics
  • Multi-omics integration: transcriptomics + metabolomics (132K genes × 7K metabolites)
  • Ultra-high-dimensional data: 6,619:1 feature-to-sample ratio handling
  • Large-scale autoencoders: dual-modality architecture (545M + 30.8M params)
  • Nonlinear dimensionality reduction & latent space analysis
  • Gene-metabolite association discovery (Spearman + FDR correction)
  • RNA-seq analysis (GEO cohorts, differential expression)
  • Cross-cohort validation for biomarker discovery
  • KEGG pathway enrichment & biological validation
  • Knowledge graphs (PrimeKG): 30K+ nodes, 849K+ edges
  • Drug repurposing & therapeutic indication prediction
  • Privacy-preserving collaborative research & data anonymization

Data Quality & Responsible AI
  • Label leakage detection & correction in behavioral ML datasets
  • Ground truth validation & label quality assessment
  • Transparent limitation reporting (clinical readiness)
  • Negative evidence documentation (PCA, SMOTE failures)
  • Model bias detection & fairness evaluation
  • Reproducible validation protocols
  • Ethical AI practices & scientific integrity
  • Study protocol design & data collection SOPs

Feature Engineering & Wearable Pipelines
  • α-opic/SPD spectral conversion
  • L2/log normalization, hour-medoid aggregation
  • Cyclic time features (circadian analysis)
  • Behavioral features: GPS, app usage, communication patterns
  • Digital phenotyping from passive smartphone sensors
  • Config-driven pipelines with parameter sweeps
  • Audit-ready workflows with environment hashes
  • Multi-sensor fusion & signal synchronization

Reproducibility & Tooling
  • Git version control & collaborative workflows
  • Jupyter notebooks for literate programming
  • Docker containers for environment reproducibility
  • Config-driven experiments (YAML, JSON)
  • Seeded reproducible runs with deterministic pipelines
  • Performance optimization: embedding caching, vectorization
  • HPC & Nextflow workflows (extending to cloud)
  • AWS/Cloud computing (learning)

LLM & Multi-Agent Systems
  • LangGraph & DeepAgents orchestration, state management
  • FastMCP 2.0 (Model Context Protocol)
  • Multi-agent architectures & agent debate
  • Hybrid local/cloud LLM: Qwen 3 (MLX) + Claude
  • Apple Silicon MLX optimization
  • Prompt engineering & chain-of-thought
  • Hallucination mitigation & citation traceability
  • RAG (Retrieval-Augmented Generation) pipelines
  • Vector search integration (semantic retrieval)

Biomedical Signal Processing & Embedded Systems
  • Wavelet/FFT filtering for ECG/EEG signals
  • Real-time signal processing (<50ms latency)
  • U-Net segmentation for medical images
  • STM32 microcontrollers & embedded firmware
  • Android development (Kotlin) for health apps
  • Sensor integration: ECG/EEG/SpO2/temperature acquisition
  • Anomaly detection & signal quality assessment
  • Data augmentation & preprocessing pipelines
  • IoT applications & wireless sensor networks

Microfluidics & Prototyping
  • Microfluidic device design (Tesla valve, Dean vortex)
  • RGB colorimetry for biomarker detection
  • Sweat collection & concentration gradient chips
  • PCB layout, validation & rapid prototyping
  • Optical sensing & fluorescence-based detection

Languages

English
Fluent
Chinese (Mandarin)
Native speaker

Interests

AI for Health (AI4Health)
  • Digital phenotyping
  • Precision medicine
  • Healthcare ML applications
  • Biomarker discovery
  • Wearable health monitoring
  • Behavioral pattern recognition

AI-Driven Drug Discovery (AIDD)
  • Graph neural networks
  • Drug-disease link prediction
  • Drug repurposing
  • Biomedical knowledge graphs
  • Molecular interactions
  • Computational drug design

Digital Phenotyping & Wearable Sensing
  • Continuous health monitoring
  • Circadian rhythm analysis
  • Wearable biosensors
  • Multi-sensor fusion
  • Real-time physiological signals
  • Light exposure classification

Bioinformatics & Omics Analysis
  • RNA-seq biomarker discovery
  • Pathway enrichment
  • Cross-cohort validation
  • NGS pipelines
  • Precision health genomics
  • Clinical relevance analysis

Biomedical Signal Processing
  • ECG/EEG analysis
  • Embedded systems
  • Sensor fusion
  • Real-time processing
  • Physiological monitoring
  • Time-series analysis

Wearable Non-Invasive Diagnosis & Therapeutics
  • Microfluidic devices
  • Point-of-care diagnostics
  • Non-invasive biosensors
  • Colorimetric detection
  • Real-time biomarker analysis
  • Wearable health tech

References

Prof Dr Manuel Spitschan
Former TUM-CREATE supervisor. Guided wearable spectral data analysis and circadian health research.
A/Prof Caroline Lee
NUS course supervisor. Supervised RNA-seq biomarker discovery for psychiatric applications.
Asst Prof Cyrus Ho Su Hui
NUS Department of Psychological Medicine (Yong Loo Lin School of Medicine); Assistant Dean (Student Life and Wellbeing), NUS Graduate School; Senior Consultant Psychiatrist at NUH; Clarivate Highly Cited Researcher (2021-2023). Supervising the three-study head-to-head comparison of passive sensing vs. personality questionnaires for mental health prediction (manuscript under review at IEEE JBHI).
A/Prof Shi Huanhuan
Undergraduate research supervisor. Co-authored Talanta Q1 paper on sweat analytics; supervised National Innovation Project.
Prof Jiang Shaofeng
Undergraduate mentor. Observed academic growth over 4 years; endorsed for PhD in biomedical engineering.
Prof Zhang Weiwei
Undergraduate research supervisor. Led Intelligent Tunnel Monitoring and Wearable Cortisol Detection projects.

Projects

  • 2026.01 - present
    LitScribe - AI Literature Review with Citation Grounding & Contradiction Detection
    DeepAgents supervisor + 11-step deterministic pipeline producing verified, contradiction-aware literature reviews in ~2 minutes. Citation grounding 83-100%; arXiv preprint drafted.
    • DeepAgents supervisor (7 tools) routing natural-language requests to an 11-step deterministic pipeline
    • Three novel contributions: contradiction-aware synthesis, metacognitive quality loop, search-augmented refinement
    • Multi-agent debate (reviewer ↔ synthesizer, 2 rounds) plus per-claim citation grounding against source PDFs
    • 6-database parallel search (OpenAlex, Semantic Scholar, CrossRef, Europe PMC, PubMed, arXiv) + Unpaywall OA enrichment
    • Benchmark across 5 domains: scores 0.55-0.82, 83-100% grounding, ~2 min/review
    • CLI (12 commands), FastAPI Web UI with 16 SSE-streaming endpoints, cross-lingual (EN/ZH), local paper modes (draft/outline/augment)
  • 2025.12 - present
    Multi-Omics Integration with Autoencoders and Graph Neural Networks (ONGOING)
    Multi-evidence framework for biosynthetic pathway gene discovery. Triangulates AE feature importance, GAT validation, and sequence homology into unified gene ranking across 132K genes × 7K metabolites.
    • 5-stage pipeline: Data Prep → AE Training → Visualization → Feature Importance → Multi-Evidence Ranking
    • Dual autoencoders: Gene AE (545M params, 132K→64) + Metabolite AE (30.8M params, 7K→64)
    • GAT validates biological signal: classification accuracy confirms AE importance scores are meaningful
    • Multi-evidence ranking: 0.4 correlation + 0.3 AE importance + 0.2 GNN importance + 0.1 BLAST bonus
    • Gene family batch processing with BLAST/HMM integration and configurable YAML weights
    • Hardware abstraction: MPS (Apple Silicon) / CUDA / CPU with checkpoint system
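
The weighted multi-evidence ranking listed in this entry can be illustrated with a minimal Python sketch. Only the four weights come from the entry. The function names, the evidence keys, the example genes and their scores, and the assumption that every evidence score is already scaled to [0, 1] are hypothetical and not taken from the project code.

```python
# Weights as quoted in the entry; the entry describes them as configurable via YAML.
WEIGHTS = {
    "correlation": 0.4,
    "ae_importance": 0.3,
    "gnn_importance": 0.2,
    "blast_bonus": 0.1,
}


def combined_score(evidence, weights=WEIGHTS):
    """Weighted sum of evidence scores (assumed pre-scaled to [0, 1]).

    Missing evidence types contribute 0.
    """
    return sum(w * evidence.get(name, 0.0) for name, w in weights.items())


def rank_genes(gene_evidence, weights=WEIGHTS):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scored = [(gene, combined_score(ev, weights)) for gene, ev in gene_evidence.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical evidence table for two made-up genes.
example = {
    "gene_a": {"correlation": 0.9, "ae_importance": 0.5, "gnn_importance": 0.4, "blast_bonus": 1.0},
    "gene_b": {"correlation": 0.6, "ae_importance": 0.8, "gnn_importance": 0.7, "blast_bonus": 0.0},
}
ranking = rank_genes(example)
```

With these made-up inputs, gene_a scores 0.69 and gene_b scores 0.62, so gene_a ranks first mainly because of its higher correlation and its homology bonus.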
  • 2025.11 - present
    Passive Sensing vs. Personality Questionnaires for Mental Health Prediction (UNDER REVIEW)
    Three-study head-to-head comparison (N=1,559, 3 universities, 15 outcomes) showing personality questionnaires win 14/15 outcomes vs. weeks of continuous passive sensing. Manuscript under review at IEEE JBHI.
    • 3 datasets unified: StudentLife (Dartmouth, N=28), NetHealth (Notre Dame, N=722), GLOBEM (UW, N=809)
    • Headline result: 2 BFI items (10 sec) outperform 28 sensing features (weeks); personality R²=0.126 vs. sensing R²=-0.153
    • Deep learning baselines indicate the limitation lies in the signal, not the model: 1D-CNN R²=-0.03 to -0.10; MOMENT foundation model R²=-1.0 to -1.7
    • Reframed sensing as idiographic (17% of individuals R²>0.3) rather than nomothetic; identified 8 conditions where sensing adds value
    • 44 robustness analyses (FDR-corrected): reliability ICC=0.73-0.98, dose-response, prospective change, residualized prediction, cross-study transfer
    • Manuscript submitted to IEEE Journal of Biomedical and Health Informatics (JBHI); currently under review
    • Supervisor: Asst Prof Cyrus Ho Su Hui (NUS Dept of Psychological Medicine; NUH Senior Consultant Psychiatrist)
  • 2025.10 - present
    PrimeKG GNN Drug-Disease Link Prediction
    Multi-architecture GNN benchmark (6 models) for drug repurposing on PrimeKG. V2.0 with data leakage fix (71.5%→0%) and strict hard-negative evaluation. GAT best at 0.9866 AUC-ROC.
    • 6-model comparison: GAT (best), RGCN, GIN, GraphSAGE, GCN, MLP on 30,926-node PrimeKG graph
    • GAT: 0.9866 AUC-ROC, 0.7955 AP, 0.9831 Hits@10, 0.9031 MRR (strict eval, 50 hard negatives)
    • V2.0: Data leakage fix via undirected-edge-aware splitting (71.5% → 0% leakage)
    • Key finding: Attention mechanisms outperform explicit relation-type modeling (GAT > RGCN)
    • Advanced analyses: path-based explanations, case studies, embedding viz, biological validation
    • Modular encoder registry with CUDA/MPS/CPU auto-detection
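
The idea behind undirected-edge-aware splitting can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, and "leakage" is assumed here to mean the fraction of test edges whose node pair, in either direction, also appears in the training set.

```python
import random


def undirected_key(u, v):
    """Canonical key so that (u, v) and (v, u) map to the same node pair."""
    return (u, v) if u <= v else (v, u)


def split_edges(edges, test_fraction=0.2, seed=0):
    """Split directed edge records so both directions of a pair share one split."""
    groups = {}
    for u, v in edges:
        groups.setdefault(undirected_key(u, v), []).append((u, v))
    keys = sorted(groups)                 # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_test = int(round(len(keys) * test_fraction))
    test = [e for k in keys[:n_test] for e in groups[k]]
    train = [e for k in keys[n_test:] for e in groups[k]]
    return train, test


def leakage_rate(train, test):
    """Fraction of test edges whose undirected pair also occurs in train."""
    if not test:
        return 0.0
    train_keys = {undirected_key(u, v) for u, v in train}
    leaked = sum(undirected_key(u, v) in train_keys for u, v in test)
    return leaked / len(test)
```

Splitting on the canonical pair rather than on individual directed records is what keeps a reversed copy of a test edge out of the training set. A naive split over the raw records can place (u, v) in training and (v, u) in test.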
  • 2024.08 - 2025.10
    AI-Driven Light Exposure Classification
    ML framework for circadian health phenotyping from wearable sensors. MSc Capstone project achieving 88.1% accuracy (AUC 0.938) in distinguishing natural from artificial light.
    • 88.1% accuracy (AUC 0.938) for natural vs. artificial light classification
    • 288-configuration grid search with participant-wise generalization
    • Spectral shape prioritization over absolute intensity
    • L2 normalization with hour-medoid aggregation
    • Reproducible pipeline with fixed seeds and environment hashes
    • Transparent negative evidence reporting (PCA, SMOTE-Tomek limitations)
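
Participant-wise generalization means that no participant contributes samples to both the training and the test side of a fold. A minimal pure-Python sketch of that constraint is below. The function name and the round-robin fold assignment are assumptions for illustration and not the project's code; scikit-learn's `GroupKFold` offers comparable grouped splitting.

```python
def participant_folds(participant_ids, n_folds=5):
    """Yield (train_idx, test_idx) with each participant confined to one test fold.

    participant_ids: one participant label per sample, in sample order.
    """
    unique = sorted(set(participant_ids))
    if n_folds > len(unique):
        raise ValueError("n_folds cannot exceed the number of participants")
    # Round-robin assignment of participants (not samples) to folds.
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique)}
    for fold in range(n_folds):
        test_idx = [i for i, pid in enumerate(participant_ids) if fold_of[pid] == fold]
        train_idx = [i for i, pid in enumerate(participant_ids) if fold_of[pid] != fold]
        yield train_idx, test_idx
```

Because the fold is chosen per participant, a score computed on the test indices reflects performance on people the model has not seen, which is the property the grid search bullet refers to.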
  • 2024.08 - 2024.12
    Machine Learning Biomarkers for Suicide Risk Assessment in Depression
    Course-based research identifying psychiatric biomarkers (NR3C1a, HSPA1B) from RNA-seq with cross-cohort validation and imbalanced learning.
    • NR3C1a, HSPA1B as top biomarkers for suicide risk in depression
    • Cross-cohort validation (primary + 3 external cohorts)
    • Imbalanced learning with SMOTE-Tomek
    • KEGG pathway enrichment analysis
    • 4-class to binary mapping for clinical relevance
    • Reproducible ML workflow with audit-ready artifacts
  • 2020.09 - 2022.07
    Wearable Optical Sweat Analytics
    Microfluidics and RGB colorimetry for biomarker detection. National innovation project with Talanta publication and competition awards.
    • 3rd Prize (10th National Student Optoelectronic Design Competition, 2022)
    • Talanta Q1 journal publication (2022, co-author)
    • Tesla valve-based sweat collection optimization
    • RGB colorimetry signal processing
    • NCHU 'Three Small Projects' award
    • Funded by Outstanding Young Scientist Project (20192BCB23011)
  • 2022.09 - 2023.06
    Android Sleep Quality Monitoring System
    BEng thesis achieving 80% sleep quality classification accuracy using only phone sensors (no external devices required), validated with volunteers.
    • 80% sleep quality classification accuracy
    • Kotlin-based Android implementation
    • 3-axis accelerometer, illumination, microphone sensor fusion
    • Volunteer cohort validation
    • Minimal resource consumption
    • Comparable performance to commercial wearables
  • 2022.02 - 2022.07
    Multi-Parameter Physiological Monitoring System
    Real-time monitoring of ECG, SpO2, respiration, temperature, and blood pressure with Qt/QML desktop and web interfaces.
    • 5-parameter monitoring (ECG, SpO2, respiration, temperature, BP)
    • STM32 hardware acquisition
    • Real-time signal conditioning and filtering
    • Qt/QML + HTML/JavaScript dashboards
    • Sub-50ms latency
    • Cross-platform deployment
  • 2021.02 - 2021.06
    U-Net for Skin Lesion Segmentation
    Complete implementation with LabelMe annotation, model training, validation, and PyQt5 GUI for dermatological image analysis.
    • U-Net architecture implementation
    • LabelMe manual annotation workflow
    • Class-specific IoU breakdown
    • Data augmentation for limited datasets
    • PyQt5 GUI for inference
    • Medical imaging preprocessing pipeline
  • 2019.10 - 2020.09
    Fluorescence-Based Fire Safety Monitoring System
    Smart monitoring system using fluorescence sensing for temperature, humidity, and fire detection in buildings and tunnels.
    • Silver Award (12th 'Challenge Cup' Jiangxi Student Entrepreneurship, 2020)
    • 2nd Prize (8th National Student Optoelectronic Design Competition, 2020)
    • 6 patents filed and published
    • Distributed optical fiber network integration
    • Multi-parameter sensing (temperature, humidity, trend analysis)
    • Electromagnetic interference immunity
  • 2019.12 - 2021.12
    Microfluidic Concentration Gradient Chip for Drug Susceptibility
    Dean vortex secondary flow mixing for high-throughput biochemical applications. 2 patents and journal publication.
    • 1 invention patent + 1 utility model patent
    • Published in Chinese Journal of Medical Physics (2021)
    • Dean vortex secondary flow optimization
    • Gradient-based drug susceptibility testing
    • High-throughput biochemical analysis
    • Funded by PhD Research Startup Fund (EA202008205)
  • 2021.02 - 2021.06
    Thermal Cycling Device for Fluorescence Quantitative PCR
    Research project on thermal cycling system design for qPCR instrumentation with precision temperature control.
    • Successful Participant (2021 National Undergraduate BME Innovation Design Competition)
    • Thermal control optimization
    • PCR instrumentation design
    • Precision temperature management
    • System integration and testing
  • 2022.09 - 2022.12
    Quantitative Investment Strategies in Cryptocurrency Markets
    Research on profitability of different investment strategies in cryptocurrency trading with backtesting and risk analysis.
    • 4 quantitative strategies implemented and compared
    • Backtesting framework development
    • Risk-return optimization analysis
    • Analysis of market dynamics
    • Demonstrated the importance of strategy adaptation
    • Portfolio performance evaluation