CV

Career highlights of Yanuo Zhou, a researcher and engineer working at the intersection of health and technology.

Basics

Name ZHOU, YANUO (Arnold)
Label Research Scientist & ML Engineer
Email yanuo.zhou@outlook.com
Url https://arnold117.github.io/
Summary MSc in Precision Health and Medicine (NUS) with expertise in machine learning, biomarker discovery, wearable signal processing, and reproducible ML pipelines. Focused on building audit-ready workflows for biomedical applications.

Work

  • 2024.08 - 2025.10
    Research Assistant
    TUM-CREATE (NRF Singapore research centre, collaborating with TUM, NUS and NTU)
    Developed reproducible ML pipelines for wearable spectral data analysis, with a focus on circadian health digital phenotyping.
    • Multi-sensor fusion and feature engineering for wearable spectral data
    • Circadian health digital phenotyping and biomarker discovery
    • Reproducible ML pipelines with audit-ready workflows
    • Supervisor: Prof. Dr. Manuel Spitschan (Technical University of Munich)
  • 2023.02 - 2023.03

    China

    Front-End Engineer (Intern)
    SEECEN TECHNOLOGY CO., LTD.
    Front-end web development with a focus on cross-browser compatibility.
    • Front-end web development and cross-browser compatibility
  • 2019.10 - 2023.06
    Research Assistant
    Key Laboratory of Non-Destructive Testing, Ministry of Education, China
    Contributed to research projects spanning wearable sensors, signal processing, and medical imaging.
    • Wearable optical sweat analytics with microfluidics and colorimetry
    • ECG & EEG real-time signal detection systems
    • U-Net segmentation for dermatology imaging
    • Intelligent tunnel multi-parameter monitoring systems
    • Signal processing, computer vision, and embedded systems development

Education

  • 2024.01 - 2025.12

    Singapore

    Master of Science
    National University of Singapore (NUS)
    Precision Health and Medicine
    • AI & Machine Learning
    • Applied Statistics
    • High-Performance Computing
    • Human Genomics
    • Proteomics and Metabolomics
    • Precision Biomarker
    • Precision Diagnosis
  • 2019.09 - 2023.07

    Nanchang, China

    Bachelor of Engineering
    Nanchang Hangkong University (NCHU)
    Biomedical Engineering
    • Biomedical Digital Signal Processing
    • Medical Ultrasound
    • Medical Electronics
    • Foundation of Medical Software
    • Medical Imaging Technology
    • Principle of Medical Instrumentation Design

Skills

Machine Learning & Data Science
Python (pandas, scikit-learn, PyTorch, PyTorch Geometric)
R, MATLAB
Classical ML: SVM, XGBoost, Logistic Regression, Random Forest
Deep learning: CNN (U-Net), Large-scale Autoencoders (545M params), GNN (RGCN, bipartite GNN), VAE, Transformers
Knowledge graphs, drug-disease link prediction
Participant-wise 5-fold CV, stratified CV, Bayesian optimization
Imbalanced learning: SMOTE-Tomek, balanced class weights
Evaluation: ROC-AUC, Hits@K, MRR, F1, precision/recall, Spearman correlation
Model interpretability, feature importance, embedding analysis (t-SNE, UMAP)
GPU acceleration: NVIDIA CUDA, Apple Silicon MPS
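Participant-wise CV, listed above, keeps all records from one participant inside a single fold, so test scores reflect performance on people the model has never seen. The following is a minimal stdlib sketch under assumed inputs (one participant ID per record); `participant_folds` and `split` are hypothetical helpers, not code from any of the projects, and scikit-learn's `GroupKFold` serves the same purpose in practice.

```python
import random


def participant_folds(participant_ids, n_folds=5, seed=42):
    """Map each record to a fold so that no participant spans two folds."""
    unique = sorted(set(participant_ids))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(unique)
    # Round-robin assignment of shuffled participants to folds.
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique)}
    return [fold_of[pid] for pid in participant_ids]


def split(fold_assignment, test_fold):
    """Return (train, test) record indices for one CV iteration."""
    train = [i for i, f in enumerate(fold_assignment) if f != test_fold]
    test = [i for i, f in enumerate(fold_assignment) if f == test_fold]
    return train, test
```

Assigning whole participants (rather than individual records) to folds is what prevents the same person's data from leaking between training and test sets.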
Bioinformatics & Omics
Multi-omics integration: transcriptomics + metabolomics (132K genes × 7K metabolites)
Ultra-high-dimensional data handling (6,619:1 feature-to-sample ratio)
Large-scale autoencoders: dual-modality architecture (545M + 30.8M params)
Nonlinear dimensionality reduction & latent space analysis
Gene-metabolite association discovery (Spearman + FDR correction)
RNA-seq analysis (GEO cohorts, differential expression)
Cross-cohort validation for biomarker discovery
KEGG pathway enrichment & biological validation
Knowledge graphs (PrimeKG): 30K+ nodes, 849K+ edges
Drug repurposing & therapeutic indication prediction
Privacy-preserving collaborative research & data anonymization
Data Quality & Responsible AI
Data leakage detection & correction (PHQ-9 label contamination)
Ground truth validation & label quality assessment
Transparent reporting of limitations (e.g., clinical readiness)
Negative evidence documentation (PCA, SMOTE failures)
Model bias detection & fairness evaluation
Reproducible validation protocols
Ethical AI practices & scientific integrity
Study protocol design & data collection SOPs
Feature Engineering & Wearable Pipelines
α-opic/SPD spectral conversion
L2/log normalization, hour-medoid aggregation
Cyclic time features (circadian analysis)
Behavioral features: GPS, app usage, communication patterns
Digital phenotyping from passive smartphone sensors
Config-driven pipelines with parameter sweeps
Audit-ready workflows with environment hashes
Multi-sensor fusion & signal synchronization
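Cyclic time features place the time of day on a circle so that, for example, 23:30 and 00:30 end up as close together as 11:30 and 12:30. A minimal sketch of the common sine/cosine encoding, assuming a 24-hour period and fractional hours as input; `cyclic_hour_features` is a hypothetical name, and the exact features used in the projects may differ.

```python
import math


def cyclic_hour_features(hour, period=24.0):
    """Encode a time of day (in hours) as (sin, cos) on the unit circle."""
    # Wrap into [0, period) so that 24:00 maps to the same point as 00:00.
    angle = 2.0 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)
```

For instance, `math.dist(cyclic_hour_features(23.5), cyclic_hour_features(0.5))` equals the distance between 11.5 and 12.5, whereas a raw hour feature would put 23.5 and 0.5 far apart.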
Reproducibility & Tooling
Git version control & collaborative workflows
Jupyter notebooks for literate programming
Docker containers for environment reproducibility
Config-driven experiments (YAML, JSON)
Seeded reproducible runs with deterministic pipelines
Performance optimization: embedding caching, vectorization
HPC & Nextflow workflows (extending to cloud)
AWS/Cloud computing (learning)
LLM & Multi-Agent Systems
LangGraph orchestration & state management
FastMCP 2.0 (Model Context Protocol)
Multi-agent architectures & agent debate
Hybrid local/cloud LLM: Qwen 3 (MLX) + Claude
Apple Silicon MLX optimization
Prompt engineering & chain-of-thought
Hallucination mitigation & citation traceability
RAG (Retrieval-Augmented Generation) pipelines
Vector search integration (semantic retrieval)
Biomedical Signal Processing & Embedded Systems
Wavelet/FFT filtering for ECG/EEG signals
Real-time signal processing (<50ms latency)
U-Net segmentation for medical images
STM32 microcontrollers & embedded firmware
Android development (Kotlin) for health apps
Sensor integration: ECG/EEG/SpO2/temperature acquisition
Anomaly detection & signal quality assessment
Data augmentation & preprocessing pipelines
IoT applications & wireless sensor networks
Microfluidics & Prototyping
Microfluidic device design (Tesla valve, Dean vortex)
RGB colorimetry for biomarker detection
Sweat collection & concentration gradient chips
PCB layout, validation & rapid prototyping
Optical sensing & fluorescence-based detection

Languages

English
Fluent
Chinese (Mandarin)
Native speaker

Interests

AI for Health (AI4Health)
Digital phenotyping
Precision medicine
Healthcare ML applications
Biomarker discovery
Wearable health monitoring
Behavioral pattern recognition
AI-Driven Drug Discovery (AIDD)
Graph neural networks
Drug-disease link prediction
Drug repurposing
Biomedical knowledge graphs
Molecular interactions
Computational drug design
Digital Phenotyping & Wearable Sensing
Continuous health monitoring
Circadian rhythm analysis
Wearable biosensors
Multi-sensor fusion
Real-time physiological signals
Light exposure classification
Bioinformatics & Omics Analysis
RNA-seq biomarker discovery
Pathway enrichment
Cross-cohort validation
NGS pipelines
Precision health genomics
Clinical relevance analysis
Biomedical Signal Processing
ECG/EEG analysis
Embedded systems
Sensor fusion
Real-time processing
Physiological monitoring
Time-series analysis
Wearable Non-Invasive Diagnosis & Therapeutics
Microfluidic devices
Point-of-care diagnostics
Non-invasive biosensors
Colorimetric detection
Real-time biomarker analysis
Wearable health tech

References

Prof Dr Manuel Spitschan
TUM-CREATE supervisor (2024.08 - 2025.10). Guided wearable spectral data analysis and circadian health research.
A/Prof Caroline Lee
NUS course supervisor. Supervised RNA-seq biomarker discovery for psychiatric applications.
Prof Zhang Weiwei
Undergraduate research supervisor. Led Intelligent Tunnel Monitoring and Wearable Cortisol Detection projects.
A/Prof Shi Huanhuan
Undergraduate research supervisor. Co-authored Talanta Q1 paper on sweat analytics; supervised National Innovation Project.
Prof Jiang Shaofeng
Undergraduate mentor. Observed academic growth over 4 years; endorses candidacy for a PhD in biomedical engineering.

Projects

  • 2024.08 - 2025.10
    AI-Driven Light Exposure Classification
    ML framework for circadian health phenotyping from wearable sensors. MSc Capstone project achieving 88.1% accuracy (AUC 0.938) in distinguishing natural from artificial light.
    • 88.1% accuracy (AUC 0.938) for natural vs. artificial light classification
    • 288-configuration grid search with participant-wise generalization
    • Spectral shape prioritization over absolute intensity
    • L2 normalization with hour-medoid aggregation
    • Reproducible pipeline with fixed seeds and environment hashes
    • Transparent negative evidence reporting (PCA, SMOTE-Tomek limitations)
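A minimal sketch of the L2-normalization and hour-medoid steps listed above, assuming each sample is a spectral vector tagged with its hour of day, and reading "hour-medoid" as the observed spectrum in each hour bin with the smallest summed Euclidean distance to the other spectra in that bin. The function names and data layout are hypothetical; this is not the project's actual code.

```python
import math
from collections import defaultdict


def l2_normalize(spectrum):
    """Scale a spectral vector to unit L2 norm so only its shape remains."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0.0:
        return list(spectrum)  # leave all-zero (dark) readings unchanged
    return [v / norm for v in spectrum]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hour_medoids(samples):
    """samples: iterable of (hour, spectrum). Returns {hour: medoid spectrum}.

    The medoid is the member of each hour bin with the smallest summed
    distance to all other members, so it is always a real observation.
    """
    bins = defaultdict(list)
    for hour, spectrum in samples:
        bins[hour].append(l2_normalize(spectrum))
    medoids = {}
    for hour, members in bins.items():
        medoids[hour] = min(
            members, key=lambda m: sum(euclidean(m, other) for other in members)
        )
    return medoids
```

Normalizing first removes absolute intensity, which matches the stated priority on spectral shape; a medoid is less sensitive to outlying readings than a per-hour mean and never produces a spectrum that was not actually measured.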
  • 2024.08 - 2024.12
    Machine Learning Biomarkers for Suicide Risk Assessment in Depression
    Course-based research identifying psychiatric biomarkers (NR3C1a, HSPA1B) from RNA-seq with cross-cohort validation and imbalanced learning.
    • NR3C1a, HSPA1B as top biomarkers for suicide risk in depression
    • Cross-cohort validation (primary + 3 external cohorts)
    • Imbalanced learning with SMOTE-Tomek
    • KEGG pathway enrichment analysis
    • 4-class to binary mapping for clinical relevance
    • Reproducible ML workflow with audit-ready artifacts
  • 2024.11 - present
    Multimodal Depression Detection via Smartphone Sensing (ONGOING)
    Ongoing ML research for depression biomarker discovery from passive smartphone sensing. Core baseline completed; currently adding new features and models.
    • Core finding: classical ML (Logistic Regression, 76.2% AUC-ROC) outperformed deep learning
    • Critical data quality work: Detected and corrected PHQ-9 label contamination
    • 7 models compared: Logistic Regression, Random Forest, XGBoost, VAE, GNN, Contrastive Learning, Transformer
    • 50 behavioral features across GPS, app usage, communication, physical activity
    • Top biomarker: app usage diversity variability (behavioral activation/anhedonia proxy)
    • Responsible AI: Transparent limitation reporting (NOT clinically ready)
  • 2025.10 - present
    PrimeKG-RGCN Drug-Disease Link Prediction (Phase 1 Optimized)
    Graph neural networks for computational drug discovery. Phase 1 optimization improved classification and ranking metrics and cut evaluation time through architectural changes (LayerNorm, skip connections) and efficiency work (embedding caching, vectorized ranking).
    • Phase 1 Results: AUC-ROC 0.9985 (+2.98%), F1 0.9877 (+2.90%)
    • Ranking: MRR 0.2707 (+19.73%), Hits@10 61.0% (+11.89%), mean rank 58.75 (88.10% reduction from 493.53)
    • Computational: 75× evaluation speedup (300s → 4s)
    • Architecture optimizations: LayerNorm, skip connections, embedding caching, vectorized ranking
    • Phase 2 planning: Deeper RGCN layers, RotatE embeddings, attention mechanisms
    • RGCN on PrimeKG: 30,926 nodes (6,282 drugs, 5,593 diseases), 849,456 edges across 20 databases
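The ranking metrics in this entry follow standard link-prediction definitions: each held-out drug-disease pair is ranked among scored candidates, and MRR, mean rank, and Hits@K summarize those 1-based ranks. A minimal stdlib sketch, assuming the model's scores are already available; the helper names are hypothetical, ties are resolved optimistically, and the "filtered" setting often used in knowledge-graph evaluation (removing other known true links from the candidates) is left out.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate: 1 + count of strictly higher scores."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR, mean rank and Hits@K from 1-based ranks of true links."""
    if not ranks or min(ranks) < 1:
        raise ValueError("ranks must be a non-empty list of 1-based ranks")
    n = len(ranks)
    metrics = {
        "mrr": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
        "mean_rank": sum(ranks) / n,  # lower is better
    }
    for k in ks:
        # Fraction of true links ranked within the top k candidates.
        metrics[f"hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics
```

For ranks `[1, 2, 5, 20]` this gives MRR 0.4375, mean rank 7.0, and Hits@10 of 0.75. Mean rank is dominated by a few badly ranked links, which is why it can fall sharply while MRR changes more modestly.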
  • 2026.01 - present
    LitScribe - Autonomous Academic Literature Synthesis Engine (ONGOING)
    Multi-agent system for automated literature review with hybrid local/cloud LLM architecture. MVP in development using LangGraph orchestration and FastMCP 2.0.
    • Multi-agent architecture: LangGraph orchestration + FastMCP 2.0 (Model Context Protocol)
    • Hybrid models: Qwen 3 (MLX/Apple Silicon local) + Claude Opus/Sonnet (cloud)
    • Multi-source search: arXiv, PubMed, Google Scholar, Zotero integration
    • Pipeline: Deduplication → PDF-to-Markdown → Vector search → Synthesis
    • Planned: Multi-agent debate, hallucination controls, citation traceability
    • Target: Responsible AI for literature synthesis with transparent limitations
  • 2025.12 - present
    Multi-Omics Data Integration Framework with Autoencoders and Graph Neural Networks (ONGOING)
    Open-source framework for multi-omics integration using large-scale dual autoencoders (545M + 30.8M params) and a bipartite GNN. Handles ultra-high-dimensional data with extreme feature-to-sample ratios and is designed for privacy-preserving collaborative research.
    • Ultra-high-dimensional integration: 132K genes × 7K metabolites (6,619:1 feature-to-sample ratio)
    • Large-scale dual autoencoders: Gene AE (545M params, 132K→64), Metabolite AE (30.8M params, 7K→64)
    • Bipartite GNN (79.9K params) for gene-metabolite association refinement
    • Overfitting control for the small-sample regime (n=21): dropout, early stopping, cross-validation
    • Latent space correlation: Spearman + FDR correction for association discovery
    • Privacy-preserving: Full biological data anonymization for collaborative research
    • Open-source contribution: GitHub repository with a generalizable framework design
    • Biological validation: Known enzyme family associations confirmed
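The association step above pairs Spearman correlation with FDR correction. A minimal stdlib sketch, assuming the Benjamini-Hochberg procedure (the entry does not name the correction method) and p-values obtained elsewhere, e.g. from `scipy.stats.spearmanr`. Spearman's rho is computed here as the Pearson correlation of average ranks, which handles ties; all function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps q-values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Walking from the largest p-value down and keeping a running minimum is what makes the adjusted values monotone in the raw p-values; gene-metabolite pairs would then be kept when their q-value falls below the chosen FDR threshold.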
  • 2020.09 - 2022.07
    Wearable Optical Sweat Analytics
    Microfluidics and RGB colorimetry for biomarker detection. National innovation project with Talanta publication and competition awards.
    • 3rd Prize (10th National Student Optoelectronic Design Competition, 2022)
    • Talanta Q1 journal publication (2022, co-author)
    • Tesla valve-based sweat collection optimization
    • RGB colorimetry signal processing
    • NCHU 'Three Small Projects' award
    • Funded by Outstanding Young Scientist Project (20192BCB23011)
  • 2022.09 - 2023.06
    Android Sleep Quality Monitoring System
    BEng thesis on sleep quality monitoring from built-in phone sensors, reaching 80% classification accuracy in volunteer validation. No external devices required.
    • 80% sleep quality classification accuracy
    • Kotlin-based Android implementation
    • 3-axis accelerometer, illumination, microphone sensor fusion
    • Volunteer cohort validation
    • Minimal resource consumption
    • Performance comparable to commercial wearables
  • 2022.02 - 2022.07
    Multi-Parameter Physiological Monitoring System
    Real-time monitoring of ECG, SpO2, respiration, temperature, and blood pressure with Qt/QML desktop and web interfaces.
    • 5-parameter monitoring (ECG, SpO2, respiration, temperature, BP)
    • STM32 hardware acquisition
    • Real-time signal conditioning and filtering
    • Qt/QML + HTML/JavaScript dashboards
    • Sub-50ms latency
    • Cross-platform deployment
  • 2021.02 - 2021.06
    U-Net for Skin Lesion Segmentation
    Complete implementation with LabelMe annotation, model training, validation, and PyQt5 GUI for dermatological image analysis.
    • U-Net architecture implementation
    • LabelMe manual annotation workflow
    • Class-specific IoU breakdown
    • Data augmentation for limited datasets
    • PyQt5 GUI for inference
    • Medical imaging preprocessing pipeline
  • 2019.10 - 2020.09
    Fluorescence-Based Fire Safety Monitoring System
    Smart monitoring system using fluorescence sensing for temperature, humidity, and fire detection in buildings and tunnels.
    • Silver Award (12th 'Challenge Cup' Jiangxi Student Entrepreneurship, 2020)
    • 2nd Prize (8th National Student Optoelectronic Design Competition, 2020)
    • 6 patents filed and published
    • Distributed optical fiber network integration
    • Multi-parameter sensing (temperature, humidity) with trend analysis
    • Electromagnetic interference immunity
  • 2019.12 - 2021.12
    Microfluidic Concentration Gradient Chip for Drug Susceptibility
    Dean vortex secondary flow mixing for high-throughput biochemical applications. Two patents and a journal publication.
    • 1 invention patent + 1 utility model patent
    • Published in Chinese Journal of Medical Physics (2021)
    • Dean vortex secondary flow optimization
    • Gradient-based drug susceptibility testing
    • High-throughput biochemical analysis
    • Funded by PhD Research Startup Fund (EA202008205)
  • 2021.02 - 2021.06
    Thermal Cycling Device for Fluorescence Quantitative PCR
    Research project on thermal cycling system design for qPCR instrumentation with precision temperature control.
    • Successful Participant (2021 National Undergraduate BME Innovation Design Competition)
    • Thermal control optimization
    • PCR instrumentation design
    • Precision temperature management
    • System integration and testing
  • 2022.09 - 2022.12
    Quantitative Investment Strategies in Cryptocurrency Markets
    Research on profitability of different investment strategies in cryptocurrency trading with backtesting and risk analysis.
    • 4 quantitative strategies implemented and compared
    • Backtesting framework development
    • Risk-return optimization analysis
    • Analysis of market dynamics
    • Demonstrated the importance of adapting strategies to market conditions
    • Portfolio performance evaluation