AI-Driven Light Exposure Classification
ML framework for circadian health phenotyping from wearable sensors
Overview
An MSc Capstone project developing a reproducible deep learning pipeline for classifying biologically relevant light exposure contexts from wearable spectral data. Current circadian health research relies on simplistic intensity thresholds that fail to capture meaningful differences in exposure contexts (natural vs. artificial light; indoor vs. outdoor). This work introduces a transparent, participant-wise generalized framework that prioritizes spectral shape over absolute intensity, enabling robust classification of light exposure contexts as potential digital biomarkers for circadian health monitoring. The system achieved 88.1% accuracy (AUC 0.938) in distinguishing natural from artificial light using ActLumus wearable spectroradiometer data from 26 participants.
Problem Statement
Research Gaps: Threshold-based light exposure methods fail systematically—cloudy daylight (~200 lux) gets classified as “dim” despite broad-spectrum natural light, while bright indoor LED (~500 lux) gets misclassified as “outdoor” based on intensity alone. Iso-illuminant conditions (same lux, different spectra) produce divergent biological effects that current methods cannot distinguish.
Three Critical Challenges:
- Missing contextual differentiation: Most studies emphasize intensity, neglecting spectral composition and temporal dynamics crucial for circadian regulation
- Data processing complexity: Wearable devices produce noisy, imbalanced time-series with non-wear periods requiring robust preprocessing
- Lack of standardized workflows: Open-source tools differ in implementation, yielding inconsistent results across studies
Why This Matters: Light regulates circadian physiology via intrinsically photosensitive retinal ganglion cells (ipRGCs) affecting melatonin secretion, hormone rhythms, metabolic processes, and cognitive performance. Natural daylight provides broad spectral composition (~400-800nm) with dynamic diurnal variation, while artificial sources exhibit narrower, temporally inconsistent spectra—driving distinct circadian responses even at identical lux levels.
Methodology
Dataset: cyepi study (Tübingen, Germany, Aug-Nov 2023) with 26 participants wearing ActLumus spectroradiometers for 7 days, providing 10-second resolution across 10 spectral channels (F1-F8, IR, CLEAR). Ground truth from daily mHLEA questionnaires describing exposure context. Public dataset from Guidolin et al. (2024)—no new data collected.
Pipeline Design Principles:
- Participant-wise generalization: All train/validation/test splits preserve subject boundaries to prevent data leakage
- Physical interpretability: Feature engineering grounded in circadian photobiology
- Reproducibility: Configuration-driven with fixed random seeds and environment hashes
Processing Sequence:
- Domain Selection: Raw sensor (10 channels) / α-opic projections (5 photoreceptor-weighted) / SPD reconstructions (12 bands)
- Logarithmic Transformation: log₁₀(max(x, 10⁻⁴)) to stabilize dynamic range
- L2 Normalization: Per-channel normalization prioritizing spectral shape over absolute intensity
- Hour-Medoid Aggregation: Robust temporal summarization resistant to outliers and non-wear periods
- Cyclic Hour Encoding: Sin/cos features to capture circadian priors without midnight discontinuities
- Model Training: MLP with Bayesian hyperparameter optimization (288 configurations tested)
Key Innovation: L2 normalization with absolute intensity excluded prioritizes spectral shape, outperforming magnitude-based approaches. Raw sensor domain preserves broadband/NIR information, matching or exceeding α-opic projections.
Results
Primary Task (Natural vs. Artificial Light):
- Test Performance: AUC 0.938, Accuracy 88.1%, Precision 91.4%, Recall 88.7%
- Generalization: All Top-20 CV-selected configurations achieved test AUC > 0.93 across held-out participants
- 288-Configuration Grid Search: Systematic evaluation showing robustness to preprocessing variations
Feature Importance:
- Top predictors: IR.LIGHT, CLEAR (broadband), cyclic hour features
- Spectral patterns: Mid-bands (550-630nm) differentiate artificial sources; short/long wavelengths elevate for natural light
- Temporal priors: Time-of-day does not act alone; spectral channels provide complementary discriminative power
Exploratory Task (Indoor vs. Outdoor): Test AUC ~0.663, demonstrating that spectral data alone provide insufficient evidence without contextual sensors (GPS, accelerometry). Indoor daylight (window-filtered) mimics outdoor spectra, causing classification challenges.
Negative Evidence (transparent reporting): PCA aids interpretation but not required for optimal performance; SMOTE-Tomek underperforms under real-world priors compared to participant-wise evaluation without aggressive resampling.
Applications
Digital Biomarker Development: Framework enables derivation of natural light dose profiles, light regularity indices (phase stability/variability), and exposure timing metrics as potential correlates for sleep quality, mood, and metabolic health outcomes.
Clinical & mHealth: Real-time exposure classification enables personalized light intervention adherence monitoring, dose quantification for circadian therapies, and integration into LightSPAN-style mobile health applications.
Evidence-Based Policy: Informs urban planning (outdoor space design), workplace lighting standards, and educational environment optimization based on quantified natural vs. artificial light exposure patterns.
Sensor Optimization: Identifies critical spectral channels (broadband, NIR) for cost-effective wearable device design, reducing hardware complexity while maintaining classification performance.
Achievements & Recognition
Academic Contribution: First systematic application of deep learning to ActLumus wearable spectra for contextual light exposure classification, challenging intensity-centric assumptions in circadian research.
Methodological Innovation:
- Reproducible pipeline with transparent preprocessing and participant-wise generalization preventing data leakage
- Systematic comparison of raw sensor vs. α-opic vs. SPD domains, L2 normalization variants, and dimensionality reduction approaches
- Negative evidence reporting (PCA, SMOTE-Tomek limitations) to guide future research
Publications:
- MSc Thesis (expected submission Nov 2025): Zhou, Y. “AI-Driven Light Exposure Classification: A Reproducible Deep Learning Pipeline for Circadian Health Phenotyping from Wearable Spectral Data.” MSc Precision Health and Medicine, National University of Singapore, 2025
- Keywords: Circadian Physiology, Light Exposure Classification, Wearable Sensors, Deep Learning, Digital Biomarkers, Reproducible Research
Key Metrics:
- 88.1% classification accuracy with 0.938 AUC
- 26 participants, 7-day recordings per subject
- 288 hyperparameter configurations evaluated
- 10-second temporal resolution spectral data
Team & Collaboration
Principal Supervisor: Prof. Dr. Manuel Spitschan (Technical University of Munich, Head of LightSPAN Group, TUM-CREATE Principal Investigator)
Institution: TUM-CREATE (National Research Foundation Singapore, collaborating with Technical University of Munich, National University of Singapore, Nanyang Technological University)
Data Source: Guidolin et al. (2024) cyepi study public dataset (protocol approved by local ethics committee, data de-identified prior to release)
Degree Program: MSc Precision Health and Medicine, National University of Singapore
Timeline
Duration: August 2024 - October 2025 (14 months)
Thesis Defense: October 2025