Case Study — Healthcare ML
Malaria Prediction
Using Satellite Data & ML
A machine learning system that predicts malaria outbreak risk by combining satellite-derived environmental indicators (NDVI, rainfall, land surface temperature, nightlight intensity, and proximity to water bodies) with epidemiological data to produce district-level outbreak risk forecasts.
ML Pipeline
Overview
Predicting Outbreaks Before They Happen
Malaria remains one of the leading causes of mortality in sub-Saharan Africa and South Asia, yet outbreak response is still largely reactive. By the time case counts rise, the window for preventive intervention has already closed. This project set out to build a proactive system that uses satellite-derived environmental signals to identify high-risk conditions weeks before outbreaks materialise.
The Problem
Reactive, Not Predictive
- Disease surveillance systems report cases after infection — too late for targeted prevention
- Manual field surveys are expensive and have poor geographic coverage
- Environmental risk factors (standing water, humidity, temperature) can be measured remotely but are underused
The Solution
Satellite-Driven Risk Scoring
- Pull NDVI, LST, rainfall, nightlight, and water proximity from Google Earth Engine
- Engineer temporal and spatial lag features to capture transmission dynamics
- Train and compare multiple ML models; export risk scores as QGIS-ready raster layers
Technical Highlights
How the System Works
Remote Sensing Inputs
Five satellite-derived inputs pulled via the Google Earth Engine API: NDVI (vegetation), LST (land surface temperature), CHIRPS rainfall, DMSP nightlight, and the JRC water occurrence layer.
Feature Engineering
Lag features (1–4 week lookbacks), rolling averages, and spatial aggregations at district level. Features are normalized, and missing values caused by cloud-cover gaps in the satellite imagery are imputed.
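As a rough illustration of the lag and rolling-average step, the sketch below works on a single weekly series for one district. It is not the project's actual code: the function names (`fill_gaps`, `lag_features`), the choice of linear interpolation for cloud-cover gaps, the 4-week rolling window, and the sample rainfall values are all assumptions made for the example.

```python
from statistics import mean


def fill_gaps(series):
    """Linearly interpolate None gaps; gaps at the edges copy the nearest observation."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def lag_features(series, lags=(1, 2, 3, 4), window=4):
    """Build one feature row per week, using only values from earlier weeks."""
    rows = []
    start = max(max(lags), window)
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        # Rolling mean over the `window` weeks before week t (week t itself excluded).
        row[f"roll_mean_{window}"] = mean(series[t - window:t])
        rows.append(row)
    return rows


# Hypothetical weekly rainfall (mm) for one district; None marks a cloud-cover gap.
rainfall = [10.0, None, 30.0, 20.0, None, None, 50.0, 40.0]
filled = fill_gaps(rainfall)
features = lag_features(filled)
```

Restricting every feature to weeks before the target week keeps information from the week being predicted out of the inputs, which matters when the goal is a forecast made weeks ahead.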
Model Comparison
Three architectures evaluated: Random Forest, Gradient Boosting (XGBoost), and a shallow LSTM for temporal sequences. Final model selected on AUC-ROC against held-out test regions.
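The selection step can be sketched in a few lines, assuming each candidate is reduced to a function that maps a feature row to a risk score. Everything named here is hypothetical: the row fields (`region`, `rain_lag_2`, `lst`, `outbreak`), the toy data, and the two scoring functions, which stand in for the trained models rather than reproduce them. AUC-ROC is computed from its pairwise definition: the share of (positive, negative) pairs in which the positive case receives the higher score.

```python
def split_by_region(rows, test_regions):
    """Hold out whole regions so no test region contributes rows to training."""
    train = [r for r in rows if r["region"] not in test_regions]
    test = [r for r in rows if r["region"] in test_regions]
    return train, test


def auc_roc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_model(candidates, test_rows):
    """Score every candidate on the held-out rows and return the best name plus all AUCs."""
    labels = [r["outbreak"] for r in test_rows]
    aucs = {
        name: auc_roc(labels, [score(r) for r in test_rows])
        for name, score in candidates.items()
    }
    return max(aucs, key=aucs.get), aucs


# Toy district-week rows; all values are made up for illustration.
rows = [
    {"region": "A", "rain_lag_2": 12.0, "lst": 31.0, "outbreak": 0},
    {"region": "A", "rain_lag_2": 55.0, "lst": 27.0, "outbreak": 1},
    {"region": "B", "rain_lag_2": 48.0, "lst": 26.0, "outbreak": 1},
    {"region": "B", "rain_lag_2": 9.0, "lst": 33.0, "outbreak": 0},
    {"region": "C", "rain_lag_2": 40.0, "lst": 28.0, "outbreak": 1},
    {"region": "C", "rain_lag_2": 15.0, "lst": 30.0, "outbreak": 0},
    {"region": "C", "rain_lag_2": 35.0, "lst": 25.0, "outbreak": 0},
]
train_rows, test_rows = split_by_region(rows, test_regions={"B", "C"})

# Stand-ins for trained models: each maps a row to a risk score.
candidates = {
    "rain_only": lambda r: r["rain_lag_2"],
    "temp_only": lambda r: r["lst"],
}
best_name, aucs = select_model(candidates, test_rows)
```

Splitting by region rather than by random row is what makes the score a test of generalisation to unseen areas; a random split would let neighbouring weeks from the same district appear on both sides.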
Geospatial Outputs
Risk scores exported as GeoTIFF raster layers compatible with QGIS. District-level choropleth maps generated for at-a-glance risk visualization by public health teams.
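For the district-level maps, one plausible step is to average per-pixel risk scores within each district and assign a class for the map legend. The sketch below covers only that step and assumes scores fall between 0 and 1; the class breaks, labels, district names, and the use of a plain mean are illustrative assumptions, and writing the GeoTIFF itself would need a raster library that is not shown here.

```python
from bisect import bisect_right

# Hypothetical class breaks and legend labels; real breaks would be agreed with the map's users.
BREAKS = (0.25, 0.5, 0.75)
LABELS = ("low", "moderate", "high", "very high")


def district_risk_classes(pixel_scores):
    """Average the per-pixel scores of each district and attach a legend class."""
    result = {}
    for district, scores in pixel_scores.items():
        if not scores:
            raise ValueError(f"no pixel scores for {district}")
        mean_score = sum(scores) / len(scores)
        # bisect_right counts how many breaks the mean has reached or passed.
        result[district] = (mean_score, LABELS[bisect_right(BREAKS, mean_score)])
    return result


# Made-up pixel scores for three placeholder districts.
classes = district_risk_classes({
    "District 1": [0.10, 0.20, 0.15],
    "District 2": [0.60, 0.70],
    "District 3": [0.90, 0.80, 0.85],
})
```

The resulting mapping from district to class can be joined to district boundary polygons in QGIS to style the choropleth.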
Stack
Technologies & Libraries Used
Built By
Development Team
Mohsin Sabir
Developer
Amber Razzaq
Developer
Fatima Abbas
Developer
Need a predictive model for your domain?
We build ML pipelines that turn raw environmental or operational data into actionable forecasts — from data collection through to deployment.


