This project explores the use of electrohysterogram (EHG) signal analysis combined with machine learning to predict and interpret different types of preterm birth: induced, cesarean, or spontaneous. It aims to support clinical decision-making by improving signal interpretation using feature engineering and model evaluation.
- Project Aim
- Objectives
- Background
- Methodology
- Tools & Libraries
- Data Recording Protocol
- Results
- Future Work
- Data Source
To improve the prediction and interpretability of preterm physiological EHG signals for classifying birth types using machine learning.
- Analyze preterm EHG signals from PhysioNet to identify underlying patterns.
- Extract and engineer features relevant to birth type classification.
- Train and evaluate ML models for predictive and interpretable outputs.
Preterm birth, defined as delivery before 37 weeks of gestation, is a major global health challenge and the leading cause of under-five child mortality.
Malawi ranks among the highest globally in preterm birth rates, exacerbating neonatal deaths. Recent studies from the University Medical Center Ljubljana underscore the value of surface electrohysterogram (EHG) signals as a non-invasive technique for early prediction.
This project utilizes the Ljubljana dataset to extract diagnostic features from abdominal EHG signals, addressing the clinical interpretability gap through signal decomposition, feature engineering, and predictive modeling.
- Signal Preprocessing
  - Wavelet decomposition for noise reduction and pattern isolation.
- Frequency Analysis
  - Fast Fourier Transform (FFT) for dominant component identification.
  - Welch’s method for Power Spectral Density estimation.
- Feature Extraction
  - Time and frequency domain features using wavelet scattering transform.
- Feature Engineering
  - Normalization and feature selection.
- Machine Learning
  - Trained initial model using Random Forest classifier.
  - Evaluation through confusion matrices and performance metrics.
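
The frequency-analysis step listed above can be illustrated with a minimal sketch. It runs on a synthetic 20 Hz test signal rather than a real EHG record, and the helper names (`dominant_frequency_fft`, `welch_psd`) and the segment length `nperseg=1024` are assumptions chosen for illustration, not the project's actual code or settings.

```python
import numpy as np
from scipy.signal import welch

FS = 20.0  # sampling rate in Hz, taken from the recording protocol


def dominant_frequency_fft(x, fs=FS):
    """Return the frequency (Hz) of the largest non-DC FFT magnitude."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset before taking the FFT
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Skip bin 0 (DC) when searching for the peak.
    return freqs[np.argmax(spectrum[1:]) + 1]


def welch_psd(x, fs=FS, nperseg=1024):
    """Estimate the power spectral density with Welch's method."""
    # nperseg is an illustrative choice; cap it at the signal length.
    freqs, psd = welch(x, fs=fs, nperseg=min(nperseg, len(x)))
    return freqs, psd


# Synthetic stand-in for one EHG channel: a 0.5 Hz tone plus noise.
rng = np.random.default_rng(0)
t = np.arange(2400) / FS  # 2 minutes of samples at 20 Hz
signal = np.sin(2 * np.pi * 0.5 * t) + 0.2 * rng.standard_normal(t.size)

f_dom = dominant_frequency_fft(signal)
freqs, psd = welch_psd(signal)
f_peak = freqs[np.argmax(psd)]
print(f"FFT dominant frequency: {f_dom:.3f} Hz")
print(f"Welch PSD peak:         {f_peak:.3f} Hz")
```

Welch's method averages periodograms over overlapping segments, so it trades frequency resolution for a lower-variance estimate than a single FFT. That is why the two peaks may differ slightly.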
- Python
- Jupyter Notebook
- WFDB: Reading physiological signals
- Wavelet Scattering: Feature transformation
- Welch Method: Spectral density
- FFT: Frequency analysis
Data was collected using four Ag/AgCl electrodes placed symmetrically above and below the navel, spaced 7 cm apart:
- Signal Length: ~30 minutes
- Channels: 3 bipolar signals
- Sampling Rate: 20 Hz
- Filter: Analog low-pass Butterworth, cutoff = 5 Hz
- Resolution: 16-bit, ±2.5 mV range (1.0 mV = 13,107 A/D units)
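
The protocol figures above can be checked with a little arithmetic. The sketch below converts raw A/D units to millivolts and derives the expected sample count per record. It assumes a zero baseline offset, and the function name `adc_to_millivolts` is hypothetical.

```python
import numpy as np

ADC_UNITS_PER_MV = 13107  # from the protocol: 1.0 mV = 13,107 A/D units
FS_HZ = 20                # sampling rate
RECORD_MINUTES = 30       # approximate record length
CUTOFF_HZ = 5             # analog low-pass cutoff


def adc_to_millivolts(raw):
    """Convert raw A/D units to millivolts (assumes a zero baseline)."""
    return np.asarray(raw, dtype=float) / ADC_UNITS_PER_MV


# A 30-minute record at 20 Hz holds about 36,000 samples per channel.
expected_samples = RECORD_MINUTES * 60 * FS_HZ

# The largest positive 16-bit value maps to roughly +2.5 mV.
full_scale_mv = adc_to_millivolts(32767)

# The 5 Hz cutoff sits below the 10 Hz Nyquist frequency.
nyquist_hz = FS_HZ / 2

print(expected_samples, round(float(full_scale_mv), 3), nyquist_hz)
```

A library such as WFDB normally applies the gain stored in the record header, so a manual conversion like this is mainly useful as a sanity check.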
Model Tested: Random Forest Classifier
- Input: Extracted features from decomposed EHG signals
- Output: Birth type classification
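
As a rough illustration of this setup, the sketch below trains a Random Forest on synthetic feature vectors and prints a confusion matrix. The data, feature count, class separation, and hyperparameters are placeholders made up for the example. They are not the project's features or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

LABELS = ["induced", "cesarean", "spontaneous"]

# Synthetic stand-in for extracted EHG features (NOT real data):
# three classes, 60 records each, 8 features per record.
rng = np.random.default_rng(42)
n_per_class, n_features = 60, 8
X = np.vstack([
    rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
    for i in range(len(LABELS))
])
y = np.repeat(np.arange(len(LABELS)), n_per_class)

# Stratified split keeps the class balance in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters here are illustrative defaults, not tuned values.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
print(classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=LABELS
))
```

With real records the classes are unlikely to be balanced, so per-class precision and recall from the report are more informative than overall accuracy.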
- Apply PCA (Principal Component Analysis) to enhance feature selection and reduce noise.
- Experiment with additional models including logistic regression, SVMs, and ensemble techniques.
- Optimize hyperparameters and test generalization on new test sets.
Jager, F. (2023). Induced Cesarean EHG DataSet (ICEHG DS): An open dataset with electrohysterogram records of pregnancies ending in induced and cesarean section delivery. PhysioNet.
- Amoss Robert
- Master of Data Science Candidate | Researcher