cv
Dive into the career highlights of Yanuo Zhou, a passionate researcher and engineer driving innovation in health and technology.
Basics
| Name | ZHOU, YANUO (Arnold) |
| Label | Research Scientist & ML Engineer |
| Email | yanuo.zhou@outlook.com |
| Url | https://arnold117.github.io/ |
| Summary | MSc in Precision Health and Medicine (NUS) with expertise in machine learning, biomarker discovery, wearable signal processing, and reproducible ML pipelines. Focused on building audit-ready workflows for biomedical applications. |
Work
-
2024.08 - 2025.10 Research Assistant
TUM-CREATE (NRF Singapore research centre, collaborating with TUM, NUS and NTU)
Developing reproducible ML pipelines for wearable spectral data analysis, with a focus on circadian health digital phenotyping.
- Multi-sensor fusion and feature engineering for wearable spectral data
- Circadian health digital phenotyping and biomarker discovery
- Reproducible ML pipelines with audit-ready workflows
- Supervisor: Prof. Dr. Manuel Spitschan (Technical University of Munich)
-
2023.02 - 2023.03 China
Front-End Engineer (Intern)
SEECEN TECHNOLOGY CO., LTD.
Front-end web development with a focus on cross-browser compatibility.
- Front-end web development and cross-browser compatibility
-
2019.10 - 2023.06 Research Assistant
Key Laboratory of Non-Destructive Testing, Ministry of Education, China
Diverse research projects spanning wearable sensors, signal processing, and medical imaging.
- Wearable optical sweat analytics with microfluidics and colorimetry
- ECG & EEG real-time signal detection systems
- U-Net segmentation for dermatology imaging
- Intelligent tunnel multi-parameter monitoring systems
- Signal processing, computer vision, and embedded systems development
Education
-
2024.01 - 2025.12 Singapore
Master of Science
National University of Singapore (NUS)
Precision Health and Medicine
- AI & Machine Learning
- Applied Statistics
- High-Performance Computing
- Human Genomics
- Proteomics and Metabolomics
- Precision Biomarker
- Precision Diagnosis
-
2019.09 - 2023.07 Nanchang, China
Bachelor of Engineering
Nanchang Hangkong University (NCHU)
Biomedical Engineering
- Biomedical Digital Signal Processing
- Medical Ultrasound
- Medical Electronics
- Foundation of Medical Software
- Medical Imaging Technology
- Principle of Medical Instrumentation Design
Awards
- 2023.10.01
Precision Health & Medicine Scholarship (PRECISE, Singapore)
National University of Singapore
Competitive scholarship for MSc program in Precision Health and Medicine
- 2022.08.01
Third Prize, 10th National Student Optoelectronic Design Competition
China National Student Optoelectronic Design Competition
Awarded for Wearable Optical Sweat Analytics project
- 2020.08.01
Second Prize, 8th National Student Optoelectronic Design Competition
China National Student Optoelectronic Design Competition
Awarded for Intelligent Tunnel Multi-Parameter Monitoring System project
- 2020.09.01
Silver Award, 12th "Challenge Cup" Jiangxi Student Entrepreneurship Competition
Jiangxi Provincial Education Department
Awarded for Tunnel Monitoring project
Publications
-
2022.01.01 Wearable tesla valve-based sweat collection device for sweat colorimetric analysis
Talanta
Huanhuan Shi, Yu Cao, Yining Zeng, Yanuo Zhou, et al. Peer-reviewed, high-impact journal publication on wearable sweat analytics technology.
-
2021.01.01 Development and Application of Microfluidic Concentration Gradient Chip
Chinese Journal of Medical Physics
Shi, H., Cao, Y., Zhou, Y., et al. Publication on microfluidic device development for drug susceptibility analysis.
Skills
| Machine Learning & Data Science | |
| Python (pandas, scikit-learn, PyTorch, PyTorch Geometric) | |
| R, MATLAB | |
| Classical ML: SVM, XGBoost, Logistic Regression, Random Forest | |
| Deep learning: CNN (U-Net), Large-scale Autoencoders (545M params), GNN (RGCN, bipartite GNN), VAE, Transformers | |
| Knowledge graphs, drug-disease link prediction | |
| Participant-wise 5-fold CV, stratified CV, Bayesian optimization | |
| Imbalanced learning: SMOTE-Tomek, balanced class weights | |
| Evaluation: ROC-AUC, Hits@K, MRR, F1, precision/recall, Spearman correlation | |
| Model interpretability, feature importance, embedding analysis (t-SNE, UMAP) | |
| GPU acceleration: NVIDIA CUDA, Apple Silicon MPS |
| Bioinformatics & Omics | |
| Multi-omics integration: transcriptomics + metabolomics (132K genes × 7K metabolites) | |
| Ultra-high-dimensional data: 6,619:1 feature-to-sample ratio handling | |
| Large-scale autoencoders: dual-modality architecture (545M + 30.8M params) | |
| Nonlinear dimensionality reduction & latent space analysis | |
| Gene-metabolite association discovery (Spearman + FDR correction) | |
| RNA-seq analysis (GEO cohorts, differential expression) | |
| Cross-cohort validation for biomarker discovery | |
| KEGG pathway enrichment & biological validation | |
| Knowledge graphs (PrimeKG): 30K+ nodes, 849K+ edges | |
| Drug repurposing & therapeutic indication prediction | |
| Privacy-preserving collaborative research & data anonymization |
| Data Quality & Responsible AI | |
| Data leakage detection & correction (PHQ-9 label contamination) | |
| Ground truth validation & label quality assessment | |
| Transparent limitation reporting (clinical readiness) | |
| Negative evidence documentation (PCA, SMOTE failures) | |
| Model bias detection & fairness evaluation | |
| Reproducible validation protocols | |
| Ethical AI practices & scientific integrity | |
| Study protocol design & data collection SOPs |
| Feature Engineering & Wearable Pipelines | |
| α-opic/SPD spectral conversion | |
| L2/log normalization, hour-medoid aggregation | |
| Cyclic time features (circadian analysis) | |
| Behavioral features: GPS, app usage, communication patterns | |
| Digital phenotyping from passive smartphone sensors | |
| Config-driven pipelines with parameter sweeps | |
| Audit-ready workflows with environment hashes | |
| Multi-sensor fusion & signal synchronization |
| Reproducibility & Tooling | |
| Git version control & collaborative workflows | |
| Jupyter notebooks for literate programming | |
| Docker containers for environment reproducibility | |
| Config-driven experiments (YAML, JSON) | |
| Seeded reproducible runs with deterministic pipelines | |
| Performance optimization: embedding caching, vectorization | |
| HPC & Nextflow workflows (extending to cloud) | |
| AWS/Cloud computing (learning) |
| LLM & Multi-Agent Systems | |
| LangGraph orchestration & state management | |
| FastMCP 2.0 (Model Context Protocol) | |
| Multi-agent architectures & agent debate | |
| Hybrid local/cloud LLM: Qwen 3 (MLX) + Claude | |
| Apple Silicon MLX optimization | |
| Prompt engineering & chain-of-thought | |
| Hallucination mitigation & citation traceability | |
| RAG (Retrieval-Augmented Generation) pipelines | |
| Vector search integration (semantic retrieval) |
| Biomedical Signal Processing & Embedded Systems | |
| Wavelet/FFT filtering for ECG/EEG signals | |
| Real-time signal processing (<50ms latency) | |
| U-Net segmentation for medical images | |
| STM32 microcontrollers & embedded firmware | |
| Android development (Kotlin) for health apps | |
| Sensor integration: ECG/EEG/SpO2/temperature acquisition | |
| Anomaly detection & signal quality assessment | |
| Data augmentation & preprocessing pipelines | |
| IoT applications & wireless sensor networks |
| Microfluidics & Prototyping | |
| Microfluidic device design (Tesla valve, Dean vortex) | |
| RGB colorimetry for biomarker detection | |
| Sweat collection & concentration gradient chips | |
| PCB layout, validation & rapid prototyping | |
| Optical sensing & fluorescence-based detection |
Languages
| English | |
| Fluent |
| Chinese (Mandarin) | |
| Native speaker |
Interests
| AI for Health (AI4Health) | |
| Digital phenotyping | |
| Precision medicine | |
| Healthcare ML applications | |
| Biomarker discovery | |
| Wearable health monitoring | |
| Behavioral pattern recognition |
| AI-Driven Drug Discovery (AIDD) | |
| Graph neural networks | |
| Drug-disease link prediction | |
| Drug repurposing | |
| Biomedical knowledge graphs | |
| Molecular interactions | |
| Computational drug design |
| Digital Phenotyping & Wearable Sensing | |
| Continuous health monitoring | |
| Circadian rhythm analysis | |
| Wearable biosensors | |
| Multi-sensor fusion | |
| Real-time physiological signals | |
| Light exposure classification |
| Bioinformatics & Omics Analysis | |
| RNA-seq biomarker discovery | |
| Pathway enrichment | |
| Cross-cohort validation | |
| NGS pipelines | |
| Precision health genomics | |
| Clinical relevance analysis |
| Biomedical Signal Processing | |
| ECG/EEG analysis | |
| Embedded systems | |
| Sensor fusion | |
| Real-time processing | |
| Physiological monitoring | |
| Time-series analysis |
| Wearable Non-Invasive Diagnosis & Therapeutics | |
| Microfluidic devices | |
| Point-of-care diagnostics | |
| Non-invasive biosensors | |
| Colorimetric detection | |
| Real-time biomarker analysis | |
| Wearable health tech |
References
| Prof Dr Manuel Spitschan | |
| Current TUM-CREATE supervisor. Guided wearable spectral data analysis and circadian health research. |
| A/Prof Caroline Lee | |
| NUS course supervisor. Supervised RNA-seq biomarker discovery for psychiatric applications. |
| Prof Zhang Weiwei | |
| Undergraduate research supervisor. Led Intelligent Tunnel Monitoring and Wearable Cortisol Detection projects. |
| A/Prof Shi Huanhuan | |
| Undergraduate research supervisor. Co-authored Talanta Q1 paper on sweat analytics; supervised National Innovation Project. |
| Prof Jiang Shaofeng | |
| Undergraduate mentor. Observed academic growth over 4 years; endorsed for PhD in biomedical engineering. |
Projects
- 2024.08 - 2025.10
AI-Driven Light Exposure Classification
ML framework for circadian health phenotyping from wearable sensors. MSc Capstone project achieving 88.1% accuracy (AUC 0.938) in distinguishing natural from artificial light.
- 88.1% accuracy (AUC 0.938) for natural vs. artificial light classification
- 288-configuration grid search with participant-wise generalization
- Spectral shape prioritization over absolute intensity
- L2 normalization with hour-medoid aggregation
- Reproducible pipeline with fixed seeds and environment hashes
- Transparent negative evidence reporting (PCA, SMOTE-Tomek limitations)
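The preprocessing named above (L2 normalization followed by hour-medoid aggregation) can be illustrated with a minimal sketch. It assumes that "hour-medoid" means selecting, for each hour, the normalized spectrum with the smallest total Euclidean distance to the other spectra in that hour. The function names and sample values are hypothetical and are not taken from the project code.

```python
import math

def l2_normalize(spectrum):
    """Scale a spectrum to unit L2 norm so only its shape remains."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0:
        return list(spectrum)  # leave an all-zero spectrum unchanged
    return [v / norm for v in spectrum]

def medoid(vectors):
    """Return the vector with the smallest summed distance to all others."""
    return min(vectors, key=lambda v: sum(math.dist(v, w) for w in vectors))

def hour_medoids(samples):
    """Group (hour, spectrum) pairs by hour and keep one medoid per hour."""
    by_hour = {}
    for hour, spectrum in samples:
        by_hour.setdefault(hour, []).append(l2_normalize(spectrum))
    return {hour: medoid(vectors) for hour, vectors in by_hour.items()}

# Hypothetical three-channel readings; real spectral data has more channels.
samples = [
    (9, [1.0, 2.0, 2.0]),
    (9, [10.0, 20.0, 20.0]),  # same shape as above, ten times brighter
    (9, [3.0, 0.0, 4.0]),
    (10, [0.0, 5.0, 0.0]),
]
result = hour_medoids(samples)
```

After normalization the first two readings coincide, which is one way to read "spectral shape prioritization over absolute intensity": a brighter source with the same spectral shape maps to the same feature vector.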
- 2024.08 - 2024.12
Machine Learning Biomarkers for Suicide Risk Assessment in Depression
Course-based research identifying psychiatric biomarkers (NR3C1a, HSPA1B) from RNA-seq with cross-cohort validation and imbalanced learning.
- NR3C1a, HSPA1B as top biomarkers for suicide risk in depression
- Cross-cohort validation (primary + 3 external cohorts)
- Imbalanced learning with SMOTE-Tomek
- KEGG pathway enrichment analysis
- 4-class to binary mapping for clinical relevance
- Reproducible ML workflow with audit-ready artifacts
- 2024.11 - present
Multimodal Depression Detection via Smartphone Sensing (ONGOING)
Ongoing ML research for depression biomarker discovery from passive smartphone sensing. Core baseline completed; currently adding new features and models.
- Core finding: Classical ML (Logistic Regression, 76.2% AUC-ROC) outperformed deep learning
- Critical data quality work: Detected and corrected PHQ-9 label contamination
- 7 models compared: Logistic Regression, Random Forest, XGBoost, VAE, GNN, Contrastive Learning, Transformer
- 50 behavioral features across GPS, app usage, communication, physical activity
- Top biomarker: app usage diversity variability (behavioral activation/anhedonia proxy)
- Responsible AI: Transparent limitation reporting (NOT clinically ready)
- 2025.10 - present
PrimeKG-RGCN Drug-Disease Link Prediction (Phase 1 Optimized)
Graph neural networks for computational drug discovery. Phase 1 optimization achieved substantial gains through architectural improvements and computational efficiency enhancements.
- Phase 1 Results: AUC-ROC 0.9985 (+2.98%), F1 0.9877 (+2.90%)
- Ranking: MRR 0.2707 (+19.73%), Hits@10 61.0% (+11.89%), mean rank 58.75 (88.10% reduction from 493.53)
- Computational: 75× evaluation speedup (300 s → 4 s)
- Architecture optimizations: LayerNorm, skip connections, embedding caching, vectorized ranking
- Phase 2 planning: Deeper RGCN layers, RotatE embeddings, attention mechanisms
- RGCN on PrimeKG: 30,926 nodes (6,282 drugs, 5,593 diseases), 849,456 edges across 20 databases
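For readers unfamiliar with the ranking metrics reported above, the following sketch shows how MRR, mean rank, and Hits@K are conventionally computed from the 1-based rank of the true target among scored candidates. The function name and example ranks are hypothetical, and this is not the project's evaluation code.

```python
from statistics import mean

def ranking_metrics(ranks, k=10):
    """Summarize 1-based ranks of the true target for each test query."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return {
        "mrr": mean(1.0 / r for r in ranks),       # mean reciprocal rank
        "mean_rank": mean(ranks),                  # arithmetic mean of ranks
        f"hits@{k}": sum(r <= k for r in ranks) / len(ranks),
    }

# Hypothetical ranks for four test queries.
example_ranks = [1, 2, 4, 50]
metrics = ranking_metrics(example_ranks, k=10)
```

MRR is dominated by how often the target lands near the top, while mean rank is sensitive to a few badly ranked targets, so the two are usually reported together.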
- 2026.01 - present
LitScribe - Autonomous Academic Literature Synthesis Engine (ONGOING)
Multi-agent system for automated literature review with hybrid local/cloud LLM architecture. MVP in development using LangGraph orchestration and FastMCP 2.0.
- Multi-agent architecture: LangGraph orchestration + FastMCP 2.0 (Model Context Protocol)
- Hybrid models: Qwen 3 (MLX/Apple Silicon local) + Claude Opus/Sonnet (cloud)
- Multi-source search: arXiv, PubMed, Google Scholar, Zotero integration
- Pipeline: Deduplication → PDF-to-Markdown → Vector search → Synthesis
- Planned: Multi-agent debate, hallucination controls, citation traceability
- Target: Responsible AI for literature synthesis with transparent limitations
- 2025.12 - present
Multi-Omics Data Integration Framework with Autoencoders and Graph Neural Networks (ONGOING)
Open-source framework for multi-omics integration using large-scale dual autoencoders (545M params) and GNN. Handles ultra-high-dimensional data with extreme feature-to-sample ratios. Privacy-preserving collaborative research.
- Ultra-high-dimensional integration: 132K genes × 7K metabolites (6,619:1 feature-to-sample ratio)
- Large-scale dual autoencoders: Gene AE (545M params, 132K→64), Metabolite AE (30.8M params, 7K→64)
- Bipartite GNN (79.9K params) for gene-metabolite association refinement
- Extensive regularization: Dropout, early stopping, cross-validation for small-sample regime (n=21)
- Latent space correlation: Spearman + FDR correction for association discovery
- Privacy-preserving: Full biological data anonymization for collaborative research
- Open-source contribution: GitHub repository with framework generalizability
- Biological validation: Known enzyme family associations confirmed
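The "Spearman + FDR correction" step above involves testing many gene-metabolite pairs at once, so the p-values need a multiple-testing adjustment. The sketch below assumes the Benjamini-Hochberg procedure, which is a common choice but is not confirmed by the project description. The p-values are hypothetical stand-ins for those from per-pair Spearman tests.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four correlation tests.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
```

With these inputs every pair passes a 0.05 threshold. With 132K × 7K candidate pairs, far fewer would be expected to survive.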
- 2020.09 - 2022.07
Wearable Optical Sweat Analytics
Microfluidics and RGB colorimetry for biomarker detection. National innovation project with Talanta publication and competition awards.
- 3rd Prize (10th National Student Optoelectronic Design Competition, 2022)
- Talanta Q1 journal publication (2022, co-author)
- Tesla valve-based sweat collection optimization
- RGB colorimetry signal processing
- NCHU 'Three Small Projects' award
- Funded by Outstanding Young Scientist Project (20192BCB23011)
- 2022.09 - 2023.06
Android Sleep Quality Monitoring System
BEng thesis on sleep quality monitoring using only phone sensors, reaching 80% classification accuracy in volunteer validation. No external devices required.
- 80% sleep quality classification accuracy
- Kotlin-based Android implementation
- 3-axis accelerometer, illumination, microphone sensor fusion
- Volunteer cohort validation
- Minimal resource consumption
- Comparable performance to commercial wearables
- 2022.02 - 2022.07
Multi-Parameter Physiological Monitoring System
Real-time monitoring of ECG, SpO2, respiration, temperature, and blood pressure with Qt/QML desktop and web interfaces.
- 5-parameter monitoring (ECG, SpO2, respiration, temperature, BP)
- STM32 hardware acquisition
- Real-time signal conditioning and filtering
- Qt/QML + HTML/JavaScript dashboards
- Sub-50 ms latency
- Cross-platform deployment
- 2021.02 - 2021.06
U-Net for Skin Lesion Segmentation
Complete implementation with LabelMe annotation, model training, validation, and PyQt5 GUI for dermatological image analysis.
- U-Net architecture implementation
- LabelMe manual annotation workflow
- Class-specific IoU breakdown
- Data augmentation for limited datasets
- PyQt5 GUI for inference
- Medical imaging preprocessing pipeline
- 2019.10 - 2020.09
Fluorescence-Based Fire Safety Monitoring System
Smart monitoring system using fluorescence sensing for temperature, humidity, and fire detection in buildings and tunnels.
- Silver Award (12th 'Challenge Cup' Jiangxi Student Entrepreneurship, 2020)
- 2nd Prize (8th National Student Optoelectronic Design Competition, 2020)
- 6 patents filed and published
- Distributed optical fiber network integration
- Multi-parameter sensing (temperature, humidity, trend analysis)
- Electromagnetic interference immunity
- 2019.12 - 2021.12
Microfluidic Concentration Gradient Chip for Drug Susceptibility
Dean vortex secondary flow mixing for high-throughput biochemical applications. 2 patents and journal publication.
- 1 invention patent + 1 utility model patent
- Published in Chinese Journal of Medical Physics (2021)
- Dean vortex secondary flow optimization
- Gradient-based drug susceptibility testing
- High-throughput biochemical analysis
- Funded by PhD Research Startup Fund (EA202008205)
- 2021.02 - 2021.06
Thermal Cycling Device for Fluorescence Quantitative PCR
Research project on thermal cycling system design for qPCR instrumentation with precision temperature control.
- Successful Participant (2021 National Undergraduate BME Innovation Design Competition)
- Thermal control optimization
- PCR instrumentation design
- Precision temperature management
- System integration and testing
- 2022.09 - 2022.12
Quantitative Investment Strategies in Cryptocurrency Markets
Research on profitability of different investment strategies in cryptocurrency trading with backtesting and risk analysis.
- 4 quantitative strategies implemented and compared
- Backtesting framework development
- Risk-return optimization analysis
- Analysis of market dynamics
- Demonstrated the importance of strategy adaptation
- Portfolio performance evaluation