How data engineering strengthens public health initiatives

December 16, 2024 by Tara Newton

With clear, actionable data, stakeholders can make better-informed decisions, optimize interventions, and improve health outcomes.

A lab technician in Wolaita zone in Ethiopia analyzes samples collected as part of the Geshiyaro project. Photo: London Centre for Neglected and Tropical Disease Research.

In public health, the strength of decisions lies in the data behind them. Even with years of experience in administering and implementing health interventions, many organizations and health officials still face significant hurdles when it comes to efficiently collecting, managing, and interpreting data to inform decision-making.

This was the challenge faced by stakeholders in Ethiopia on the Geshiyaro project, funded by the Children’s Investment Fund Foundation (CIFF). Geshiyaro means “hygienic” in the Wolaita language. The project, implemented in the Wolaita zone, aimed to optimize programs for eliminating soil-transmitted helminths (parasitic worms in soil) and schistosomiasis (parasites in freshwater) by improving sanitation and hygiene. It did so through interventions such as mass drug administration (MDA) and improved water access.

PATH played a crucial role in transforming how the project team (CIFF, Federal Ministry of Health in Ethiopia, Ethiopian Public Health Institute, and London Centre for Neglected and Tropical Disease Research) approached data management and reporting.

Through meticulous coordination and the development of a robust data dashboard system, PATH helped ensure that the project’s stakeholders—from government officials to international donors—had the clear, actionable data they needed to evaluate the progress of the program.

Organizing the data

PATH’s Senior Data Engineer, Doug Morris, and his team were responsible for wrangling the siloed, raw data into a structured, actionable, and compelling format that could be understood by all stakeholders.

The data sources were diverse, including disease prevalence data (e.g., stool egg counts), MDA logs (capturing drug administration via biometrics), and water, sanitation, and hygiene (WASH) intervention data from public health surveys at the household level. These datasets were collected by different teams, each using its own data management system, which made integration and standardization difficult.
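
To make the integration problem concrete, here is a minimal Python sketch of mapping records from two sources onto one shared schema before combining them. It is an illustration only, not the project's actual pipeline: the source names, the field names (hh_id, HouseholdID, household_id, and so on), and the idea of joining on a household identifier are all assumptions.

```python
# Hypothetical per-source field mappings onto one shared schema.
# None of these field names come from the Geshiyaro datasets.
FIELD_MAPS = {
    "mda_log": {"hh_id": "household_id", "drug_given": "treated"},
    "wash_survey": {"HouseholdID": "household_id", "HasLatrine": "has_latrine"},
}


def standardize(record: dict, source: str) -> dict:
    """Rename a record's source-specific fields to the shared schema names."""
    mapping = FIELD_MAPS[source]
    unknown = set(record) - set(mapping)
    if unknown:
        # Fail loudly rather than silently dropping unmapped fields.
        raise ValueError(f"Unmapped fields from {source}: {sorted(unknown)}")
    return {mapping[key]: value for key, value in record.items()}


def merge_by_household(records: list) -> dict:
    """Combine standardized records that share a household_id."""
    merged = {}
    for record in records:
        merged.setdefault(record["household_id"], {}).update(record)
    return merged
```

Raising an error on unmapped fields is a deliberate choice in this sketch: it forces a conversation about what a field means before it reaches a dashboard.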

Doug and his team designed interactive dashboards in Tableau that visualized key metrics mapped back to the various data sources. However, developing these dashboards required more than just making data visually appealing—it involved understanding how data points fed into key indicators and how those indicators would inform decision-making.

Key steps in the process included:

  • Collaborating with stakeholders: The team worked closely with government officials, donors, and implementers to ensure the dashboards met their needs. Clear communication was essential to align on how to define key metrics (e.g., what constitutes the "eligible population" for MDA) and how to calculate them.
  • Cleaning and standardizing data: Since the data came from multiple sources, the team had to address inconsistencies in frameworks, metrics, and calculations. Resolving disagreements required careful attention to terminology and definitions so that every team calculated each metric the same way.
  • Transitioning to local teams: A significant challenge was transitioning the dashboards to the Ethiopian team in just one month. This required not only transferring the technical know-how but also training local staff to use the system effectively.
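
The point about agreeing on definitions can be illustrated with a short Python sketch of an MDA coverage calculation in which the "eligible population" rule is written down in one place. This is a hypothetical example: the Person fields and the age cutoff are assumptions chosen for illustration, not the definition the Geshiyaro stakeholders agreed on.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical eligibility rule. The real cutoff would be whatever the
# stakeholders agree on; 5 years is only a placeholder.
MIN_ELIGIBLE_AGE = 5


@dataclass(frozen=True)
class Person:
    person_id: str
    age_years: int
    treated: bool  # True if an MDA log records a dose for this person


def is_eligible(person: Person) -> bool:
    """Single shared definition of the eligible population."""
    return person.age_years >= MIN_ELIGIBLE_AGE


def mda_coverage(people: list) -> Optional[float]:
    """Share of eligible people who were treated, or None if nobody is eligible."""
    eligible = [p for p in people if is_eligible(p)]
    if not eligible:
        return None
    treated = sum(1 for p in eligible if p.treated)
    return treated / len(eligible)
```

Because every indicator calls the same is_eligible function, changing the agreed definition changes the denominator everywhere at once, which is the kind of consistency the dashboards depended on.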

Despite these challenges, the final dashboards had a significant impact and were positively received. They provided government officials and donors with timely, actionable insights into MDA coverage, disease prevalence, and the effectiveness of WASH interventions. This transparency was intended to foster greater trust and collaboration, enabling stakeholders to make data-driven decisions and adjust strategies as needed.

The value of data engineering skills

Reflecting on the experience, PATH's team emphasized the critical role of data engineering—a skill often overlooked in discussions about data science. In this project, data engineering involved not only building dashboards but also ensuring that data was clean, structured, and accessible to all stakeholders.

“Data engineers manage the messiness of working with real-world data and translate it into well-organized, shared datasets. This requires technical expertise, patience, and collaboration with stakeholders to ensure everyone is aligned.”
— Doug Morris, Senior Data Engineer, PATH

In general, there is a lack of investment in data engineering. While there is growing interest in adopting flashy analytics or AI-driven solutions, the foundational work required to prepare data for these tools is often overlooked. Without strong data management, even the most sophisticated models and dashboards are ineffective.

The Geshiyaro project is a prime example of how strong data engineering can strengthen public health initiatives. By providing clear, actionable data, PATH enabled the Ethiopian government and its partners to make better-informed decisions, optimize interventions, and improve health outcomes.