A testament to the power of African-led scientific collaboration
Apr 22

This report presents a comprehensive account of a six-week intensive research residency conducted under the FAR-LeaF Fellowship at the University of Pretoria. The primary objective of the residency was to design, implement, and validate a scalable machine learning framework for adaptive malaria forecasting in the Upper West Region of Ghana, a setting characterised by climatic variability and complex epidemiological dynamics. The research moved beyond conventional predictive modelling by introducing a competitive “Model Tournament” framework in which multiple algorithms, including ensemble tree-based methods and deep learning architectures, were systematically evaluated under rigorous time-series validation regimes. This ensured that model selection was empirically grounded in district-specific performance rather than generalised assumptions.

A central methodological contribution of this work is the introduction of a Case Velocity Detrending Approach, in which malaria incidence data were transformed into first-order differences. This reframing shifted the modelling objective from predicting absolute case counts to predicting the rate of change in cases, thereby addressing non-stationarity and long-term structural declines in transmission. The transformation significantly improved predictive performance and stability across districts.

In parallel, the integration of SHAP (SHapley Additive exPlanations) into the modelling pipeline enabled transparent interpretation of model outputs, allowing for the identification of localised environmental drivers of malaria transmission. Through collaboration with initiatives such as Lancet Countdown Africa, the research was further positioned within a broader continental context of climate-health resilience. The residency ultimately produced a robust, interpretable Early Warning System framework ready for operational deployment in Ghana.
1. INTRODUCTION
The increasing volatility of climate patterns has direct implications for the transmission dynamics of infectious diseases, necessitating a shift toward localised, data-driven interventions. While raw climate and health data are becoming more accessible, a significant gap remains in translating this information into actionable predictive intelligence. To address this gap, a six-week research residency was undertaken, beginning in March 2026, at the Data Science for Social Impact (DSFSI) Research Group at the University of Pretoria, led by Prof. Vukosi Marivate.
This visit was a strategic component of the FAR-LeaF II Fellowship, designed to enhance the technical and leadership dimensions of the project: “Harnessing Community Resilience and Machine Learning for Adaptive Malaria Control amid Climate Change in the Upper West Region of Ghana.”
1.1 Rationale and Research Motivation
The primary motivation for this residency was to leverage Prof. Marivate's international expertise in developing Machine Learning (ML) and Artificial Intelligence (AI) applications tailored to African societal challenges. The success of a malaria early warning system (EWS) depends heavily on developing, validating, and deploying models that perform reliably in low-data environments, a core speciality of the DSFSI lab.
By embedding within this multidisciplinary team of African data scientists, the residency focused on:
Technical Rigour: Building ensemble ML models (such as Random Forest and XGBoost) capable of predicting malaria outbreaks using integrated climate and epidemiological datasets.
Ethical AI: Adopting best practices in responsible AI to ensure that model outputs are fair, transparent, and community-sensitive.
Explainability: Moving beyond “black-box” models by utilising interpretability tools to identify the leading indicators of outbreaks, thereby making the findings useful for public health officials.
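Ensemble models such as Random Forest and XGBoost do not read time order on their own, so integrated climate and epidemiological data are usually turned into lagged feature columns first. The sketch below is a minimal, dependency-free illustration of that step, assuming aligned monthly series for one district; the variable names (`rainfall_mm`, `cases`), the lag choices, and the sample values are hypothetical and are not the project's actual feature set.

```python
from typing import Dict, List, Optional, Sequence


def add_lagged_features(
    series: Dict[str, Sequence[float]],
    lags: Dict[str, List[int]],
) -> List[Dict[str, Optional[float]]]:
    """Build one feature row per time step from aligned, equally spaced series.

    series: variable name -> values in time order (all the same length).
    lags:   variable name -> lags (in time steps) to create for that variable.
    Lags that reach back before the start of the record are set to None.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("all series must have the same (non-zero) length")
    n = lengths.pop()
    rows: List[Dict[str, Optional[float]]] = []
    for t in range(n):
        row: Dict[str, Optional[float]] = {}
        for name, ks in lags.items():
            for k in ks:
                # Requiring k >= 1 means only values observed before time t are used.
                if k < 1:
                    raise ValueError("lags must be >= 1 to avoid same-period leakage")
                row[f"{name}_lag{k}"] = series[name][t - k] if t >= k else None
        rows.append(row)
    return rows


# Hypothetical monthly values for a single district.
district_data = {
    "rainfall_mm": [10.0, 55.0, 120.0, 80.0],
    "cases": [30.0, 32.0, 45.0, 60.0],
}
features = add_lagged_features(district_data, {"rainfall_mm": [1, 2], "cases": [1]})
print(features[2])  # {'rainfall_mm_lag1': 55.0, 'rainfall_mm_lag2': 10.0, 'cases_lag1': 32.0}
```

Rows containing `None` (the first months of the record) would normally be dropped before training; in pandas the same idea is usually expressed with `DataFrame.shift`.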
1.2 Objectives and Core Purpose
The residency was structured to move the project from conceptualisation to a functional technical foundation. The core purposes were:
Technical Advancement: To refine ML pipelines specifically optimised for the unique constraints of the African data landscape.
Model Interpretability: To employ eXplainable AI (XAI) tools, ensuring model outputs are transparent and trustworthy for non-technical health practitioners.
Research Leadership: To cultivate the competencies required to lead transdisciplinary teams across the “One Health”, climate science, and data science sectors.
Strategic Alignment: To fulfil the FAR-LeaF vision of nurturing African-led science that drives sustainable development and social transformation.
1.3 Summary of Activity Plan and Realised Outcomes
The six-week period followed a systematic progression from data preprocessing to model refinement and strategic planning.
| Phase | Focus Area | Key Activities |
|---|---|---|
| Phase 1 | Data Engineering | Defined modelling objectives and preprocessed climate-malaria datasets for the Upper West Region. |
| Phase 2 | Model Development | Developed baseline models and carried out feature engineering and model tuning. |
| Phase 3 | Evaluation & XAI | Validated models against historical data and applied interpretability tools for documentation. |
| Phase 4 | Leadership & Transfer | Finalised implementation strategies for Ghana's health systems and outlined future joint publications. |
Table 1: Phases, focus areas and key activities of the residency
This report details the technical milestones achieved, the refined predictive models developed, and the leadership insights gained during the residency, which now serve as the technical bedrock for the malaria Early Warning System (EWS) in Ghana.
2. DETAILED WEEKLY ACTIVITY SUMMARY
2.1 Week 1 (2 March – 6 March): Orientation, Strategic Alignment, and Framework Design
The first week of the residency was dedicated to orientation, strategic alignment, and establishing the computational framework. Initial meetings with Prof. Vukosi Marivate focused on refining the research objectives and outlining a modular modelling architecture capable of handling district-level heterogeneity. This phase involved deep integration into the Data Science for Social Impact (DSFSI) Lab and engagement with a multidisciplinary team working at the intersection of climate and health. A key technical outcome was the design of a cloud-based computational pipeline using Google Cloud infrastructure, specifically tailored to support the high-intensity modelling workload anticipated for the malaria Early Warning System (EWS).
2.2 Week 2 (9 March – 13 March): Infrastructure Deployment and Preliminary Validation
During the second week, attention shifted to infrastructure deployment and preliminary experimentation. A Google Cloud Virtual Machine (VM) was configured to enable scalable computation, and the necessary machine learning libraries were installed and optimised for performance. Given initial technical challenges associated with remote infrastructure latency, I adopted a hybrid approach: a lightweight version of the modelling pipeline was executed locally to validate data integrity and ensure consistency across the eleven districts of the Upper West Region. This dual approach of local and cloud execution proved essential for maintaining continuity in the research workflow and ensuring that the datasets were ready for the full-scale “Tournament” phase.
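The local validation run described above checked data integrity and consistency across districts. One way such a check could look is sketched below, assuming records of the form (district, month, case count); the record layout, the specific checks (missing values, negative counts, duplicated months, months absent in one district but present in others), and the placeholder district names are illustrative assumptions, not the project's actual validation rules.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (district, period as "YYYY-MM", monthly case count or None if missing)
Record = Tuple[str, str, Optional[float]]


def check_integrity(records: List[Record]) -> Dict[str, List[str]]:
    """Return {district: [problems]} for every district with at least one problem."""
    periods_by_district: Dict[str, List[str]] = {}
    problems: Dict[str, List[str]] = {}
    for district, period, cases in records:
        periods_by_district.setdefault(district, []).append(period)
        issues = problems.setdefault(district, [])
        if cases is None:
            issues.append(f"{period}: missing value")
        elif cases < 0:
            issues.append(f"{period}: negative count {cases}")

    # Every district should cover the union of periods seen across all districts.
    all_periods = set()
    for periods in periods_by_district.values():
        all_periods.update(periods)

    for district, periods in periods_by_district.items():
        for period, count in sorted(Counter(periods).items()):
            if count > 1:
                problems[district].append(f"{period}: duplicated {count} times")
        for period in sorted(all_periods - set(periods)):
            problems[district].append(f"{period}: period absent")

    return {d: issues for d, issues in problems.items() if issues}


# Hypothetical records for two placeholder districts.
sample = [
    ("District A", "2020-01", 5.0),
    ("District A", "2020-02", 7.0),
    ("District B", "2020-01", None),
    ("District B", "2020-01", 3.0),
]
print(check_integrity(sample))
```

Because the check is pure Python, the same function can run unchanged on a laptop and on a cloud VM, which suits the hybrid local/cloud workflow.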
2.3 Week 3 (16 March – 20 March): The Model Tournament and Transdisciplinary Integration
The third week marked the start of full-scale modelling experiments under the “Model Tournament” framework. Multiple algorithms, including Random Forest (RF), Gradient Boosting, XGBoost, Long Short-Term Memory (LSTM) networks, and Facebook Prophet, were evaluated under controlled conditions. This was a pivotal experimental period in which various feature engineering techniques, ML architectures, and cross-validation strategies were tested. Models were first evaluated with a standard 80/20 train-test split using lagged features, under which RF performed robustly. However, conventional (non-temporal) cross-validation yielded poor results because of data leakage and the non-stationarity of the time series. To overcome this, I transitioned to predicting the rate of change (case velocity) rather than raw case counts and employed expanding-window cross-validation. These adjustments, which respect the temporal ordering of the data, significantly improved model stability.
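Expanding-window cross-validation avoids the leakage described above by always training on an initial stretch of the series and testing on the block that immediately follows it, with the training window growing fold by fold. The sketch below generates such splits; the function name and the fold sizes in the example are illustrative. It is intended to mirror scikit-learn's `TimeSeriesSplit` when that class is given an explicit `test_size`, but the project's actual fold configuration is not stated in this report.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_splits: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for expanding-window cross-validation.

    Each test block starts right after its training window, and the training
    window grows by `test_size` observations from one fold to the next, so no
    fold ever trains on data from the future of its test period.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested number of folds")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Example: 10 monthly observations, 3 folds, 2-month test blocks.
for train_idx, test_idx in expanding_window_splits(10, 3, 2):
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx}")
```

By contrast, a shuffled split lets a model see months on both sides of a test month, which inflates scores on strongly autocorrelated case series.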

Concurrently, my participation in the launch of the Lancet Countdown Africa provided valuable insights into the policy relevance of climate-informed health forecasting.
Pictured: (1) Prof. Wanda Markotter; (2) Dr Ashley Burke and co-fellow Dr Dina Coertzen; (3) a visit to the UP ISMC.
A highlight of the week was visiting the One Health Building at the University of Pretoria with Dr Dina Coertzen. Meeting with Professor Tiaan de Jager and Dr Ashley Burke at the UP Institute for Sustainable Malaria Control (ISMC) allowed for a critical exchange of ideas. Our discussions focused on integrating ML and climate data into entomological and clinical frameworks, ensuring the predictive models are grounded in the biological realities of malaria transmission.
2.4 Week 4 (23 March – 27 March): Refinement, Interpretability, and Spatial Heterogeneity
In the fourth week, the focus shifted toward model refinement and explainability. SHAP (SHapley Additive exPlanations) was integrated into the modelling pipeline, enabling a detailed analysis of feature contributions at both global and local levels. This interpretability layer revealed that the drivers of malaria transmission varied significantly across districts: some areas demonstrated high sensitivity to lagged rainfall, while others were more influenced by humidity or by vegetation indices such as the Normalised Difference Vegetation Index (NDVI). The inclusion of Facebook Prophet as a statistical baseline provided an additional benchmark and underscored the importance of localised modelling approaches for capturing the spatial heterogeneity of the Upper West Region. This phase turned the “black box” of AI into a transparent tool for identifying specific environmental triggers.
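SHAP attributions are Shapley values of a model's prediction: each feature receives its average marginal contribution over all subsets of the other features, and the attributions sum to the difference between the prediction and a reference value. The brute-force sketch below computes these values exactly for a tiny model, assuming that "absent" features are replaced by a single fixed baseline. The toy model, feature names and numbers are hypothetical. The `shap` library's `TreeExplainer` computes SHAP values efficiently for tree ensembles such as Random Forest and XGBoost, but its treatment of absent features can differ from the fixed baseline used here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet

Model = Callable[[Dict[str, float]], float]


def exact_shapley(model: Model, x: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values of model(x), where absent features take baseline values.

    Cost grows as 2**n_features, so this is only practical for a handful of features.
    """
    features = list(x)
    n = len(features)

    def value(present: FrozenSet[str]) -> float:
        point = {f: (x[f] if f in present else baseline[f]) for f in features}
        return model(point)

    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy model: additive effects plus a rainfall x humidity interaction; NDVI is unused.
def toy_model(p: Dict[str, float]) -> float:
    return 0.5 * p["rain_lag2"] + 2.0 * p["humidity"] + 0.1 * p["rain_lag2"] * p["humidity"]


x = {"rain_lag2": 100.0, "humidity": 80.0, "ndvi": 0.6}
baseline = {"rain_lag2": 50.0, "humidity": 60.0, "ndvi": 0.4}
phi = exact_shapley(toy_model, x, baseline)
print(phi)
```

The unused feature (`ndvi`) receives zero attribution, and the interaction effect is split between rainfall and humidity. Comparing such per-feature attributions across districts is what reveals locally dominant drivers.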
2.5 Week 5 (30 March – 2 April): Statistical De-trending and Model Optimisation
A critical turning point occurred in the fifth week during the rigorous cross-validation phase. Initial evaluation results revealed poor generalisation performance, primarily due to strong non-stationarity and a downward trend in the historical malaria incidence data. To address this, I implemented the Case Velocity transformation, redefining the target variable as the change in cases between consecutive time steps. This transformation effectively neutralised long-term trends, allowing the models to focus on the short-term fluctuations that indicate emerging outbreaks. The impact of this technical pivot was substantial, resulting in marked improvements in R² scores and overall model reliability across all validation folds. This week solidified the technical bedrock of the predictive system.
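The Case Velocity transformation is a first-order difference, v_t = y_t − y_(t−1), where y_t is the case count at time step t. A predicted velocity can be turned back into a case count by adding it to the previous count. For one-step-ahead forecasts the previous count is the observed one; for several steps ahead, the predicted changes are accumulated from the last observed count. The minimal sketch below shows both directions. The function names and the sample series are illustrative only.

```python
from typing import List, Sequence


def to_velocity(cases: Sequence[float]) -> List[float]:
    """First-order difference: v[t] = cases[t] - cases[t-1]; output is one element shorter."""
    return [cases[t] - cases[t - 1] for t in range(1, len(cases))]


def one_step_counts(previous_counts: Sequence[float], predicted_velocity: Sequence[float]) -> List[float]:
    """One-step-ahead reconstruction: count_hat[t] = observed count[t-1] + v_hat[t]."""
    return [prev + v for prev, v in zip(previous_counts, predicted_velocity)]


def multi_step_counts(last_observed: float, predicted_velocity: Sequence[float]) -> List[float]:
    """Multi-step reconstruction: cumulatively add predicted changes to the last observed count."""
    counts: List[float] = []
    level = last_observed
    for v in predicted_velocity:
        level += v
        counts.append(level)
    return counts


# Hypothetical monthly cases with a downward drift and one short-term surge.
cases = [120.0, 110.0, 104.0, 130.0, 125.0]
velocity = to_velocity(cases)                 # [-10.0, -6.0, 26.0, -5.0]
print(multi_step_counts(cases[0], velocity))  # [110.0, 104.0, 130.0, 125.0]
```

In velocity space the surge at the fourth month stands out as a large positive value (+26), whereas in raw counts it is partly masked by the long-term decline.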
2.6 Week 6 (7 April – 10 April): Synthesis, Dissemination, and Knowledge Transfer
The final week was devoted to synthesis, dissemination, and knowledge transfer. The research findings were presented in a public webinar titled “Comparative Performance of Ensemble Learning and Deep Sequential Networks for Malaria Incidence Forecasting in the Upper West Region of Ghana,” sparking dialogue between data scientists and health practitioners.
Pictured: hosting the webinar.
Efforts were concentrated on finalising the research manuscript, titled “Predicting Malaria Transmission in Ghana’s Upper West Region: A Machine Learning Tournament with Explainable AI,” which is currently being reviewed by Prof. Marivate. This final phase ensured that the technical advancements achieved during the residency were translated into formats accessible to both academic and policy audiences, facilitating a clear path from research to practical implementation in Ghana’s health systems.
3. MODEL DEVELOPMENT
This section summarises the experimental procedures that led to the establishment of the Model Tournament.
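The core logic of a per-district tournament can be sketched as follows. Every candidate forecaster is scored on each district's series with expanding-window validation, and the candidate with the lowest mean absolute error (MAE) wins that district. To keep the example self-contained, three simple baseline forecasters (last value, training mean, linear drift) stand in for the RF, Gradient Boosting, XGBoost, LSTM and Prophet models actually compared. The forecaster interface, the choice of MAE, the fold sizes and the sample data are all illustrative assumptions.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A forecaster maps a training history and a horizon to that many predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_last(train: Sequence[float], horizon: int) -> List[float]:
    return [train[-1]] * horizon


def train_mean(train: Sequence[float], horizon: int) -> List[float]:
    return [mean(train)] * horizon


def drift(train: Sequence[float], horizon: int) -> List[float]:
    # Extend the straight line from the first to the last training value.
    slope = (train[-1] - train[0]) / (len(train) - 1) if len(train) > 1 else 0.0
    return [train[-1] + slope * (h + 1) for h in range(horizon)]


def expanding_window_mae(series: Sequence[float], model: Forecaster, n_splits: int, test_size: int) -> float:
    """Mean of per-fold MAEs, training on series[:cut] and testing on the next block."""
    start = len(series) - n_splits * test_size
    if start < 2:
        raise ValueError("series too short for the requested folds")
    fold_errors = []
    for i in range(n_splits):
        cut = start + i * test_size
        preds = model(series[:cut], test_size)
        actual = series[cut:cut + test_size]
        fold_errors.append(mean(abs(p - a) for p, a in zip(preds, actual)))
    return mean(fold_errors)


def run_tournament(
    series_by_district: Dict[str, Sequence[float]],
    candidates: Dict[str, Forecaster],
    n_splits: int = 3,
    test_size: int = 2,
) -> Dict[str, Tuple[str, Dict[str, float]]]:
    """Return {district: (winning model name, {model name: mean MAE})}."""
    results = {}
    for district, series in series_by_district.items():
        scores = {name: expanding_window_mae(series, f, n_splits, test_size) for name, f in candidates.items()}
        results[district] = (min(scores, key=scores.get), scores)
    return results


candidates = {"naive_last": naive_last, "train_mean": train_mean, "drift": drift}
# Hypothetical district series with a steady linear trend.
data = {"trend_district": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0]}
results = run_tournament(data, candidates)
print(results["trend_district"][0])  # drift
```

Because the winner is chosen separately for each district, different districts can end up with different model families. This is the behaviour the tournament is designed to allow.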
4. CONCLUSION
The six-week mentorship residency at the University of Pretoria represents a significant advancement in both the technical and applied dimensions of this research. By integrating competitive model evaluation, velocity-based forecasting, rigorous validation, and explainable AI, the project has produced a robust, interpretable framework for malaria early warning. Beyond its technical contributions, the residency has fostered meaningful collaboration between institutions and established a foundation for sustained innovation in climate-health analytics across Africa. The resulting system holds strong potential to transform malaria control strategies by enabling proactive, data-driven decision-making in vulnerable regions.
ACKNOWLEDGEMENT
I would like to express my profound gratitude to the Future Africa Research Leadership Fellowship (FAR-LeaF II) and the Future Africa platform at the University of Pretoria for providing the funding and the transdisciplinary environment that made this research stay possible. This work was supported by the Data Science for Social Impact (DSFSI) Research Group, and I am deeply indebted to the entire lab for their warm welcome and technical synergy. A very special thank you goes to my mentor, Professor Vukosi Marivate. His exceptional leadership, technical brilliance, and commitment to “AI for Social Good” have been instrumental in refining the machine learning architecture of this project. I am grateful for his patience in guiding me through the complexities of model interpretability and for his investment in my development as a research leader. I am also thankful to the technical team at the DSFSI lab, particularly Thapelo Sindane, for his invaluable support in navigating Google Cloud infrastructure and optimising the Virtual Machine environments required for our model tournament. My appreciation extends to Professor Wanda Markotter and the UP Institute for Sustainable Malaria Control, as well as Professor Rebecca Garland and Professor Nerhene Davis, for their insightful discussions on the intersections of One Health, climate science, and disease modelling. Their expertise helped ground my data-driven findings in biological and environmental reality. Finally, I wish to thank Professor Mike Osei Atweneboana, the Director of CSIR-Water Research Institute (CSIR-WRI), Ghana, my colleagues in the Surface Water and Climate Change Division (CSIR-WRI), and my family, whose unwavering support allowed me to fully immerse myself in this six-week intensive stay. This work is a testament to the power of African-led scientific collaboration.