Bringing Machine Learning and Artificial Intelligence to Population Health

Low-income and middle-income countries experience nearly 90% of the global burden of disease, according to the British Medical Journal. In Pakistan, with its population of 217 million, tuberculosis (TB) is a major public health concern. Under-diagnosis and under-reporting of cases are considered key barriers to ending TB. Against this backdrop, the Pakistan National Tuberculosis Control Programme (NTP) partnered with epidemiologists at the Royal Tropical Institute (KIT), Amsterdam, to launch a virtual hackathon on estimating TB burden at the subnational level. The results were reported in the journal Tropical Medicine and Infectious Disease.

Five teams, including a team from EPCON, were tasked with developing their own models for district-wise TB estimates among people aged 15 and over. There is strong demand for accurate TB estimates at the subnational level to help with planning and resource optimisation. EPCON applied its Bayesian network approach and was the only team whose model used macroeconomic and development data, such as gross national income.

[Figure: EPCON Pakistan model]

Beyond the traditional methodology

All models were assessed using measurable indicators of model performance, such as completeness, pseudo accuracy, precision and cross validity; credibility was scored, in an anonymised way, based on expert opinion. While most other models used established statistical approaches based on Bayesian inference, EPCON used a novel machine learning approach with Bayesian neural networks, which is still relatively uncommon in the population health domain. Machine learning approaches can help overcome challenges such as discrepancies in the data and the number of covariates, while relying less on predefined assumptions for model fitting.

The predominant method for determining subnational TB data relies on case notifications, i.e. TB diagnoses recorded in the national surveillance system, but many cases are missed. In fact, even with good health coverage, notification data may not reflect the true disease burden. The outputs of this hackathon were of practical use for Pakistan’s National Tuberculosis Control Programme. The predictions helped identify several at-risk districts, i.e. districts with low notification-to-prevalence ratios.
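As an illustration of the notification-to-prevalence ratio mentioned above, the following minimal Python sketch divides notified cases by model-estimated prevalent cases for each district and flags districts that fall below a cut-off. The district names, the figures and the 0.5 threshold are all hypothetical, chosen only for illustration; this is not the method used by any of the hackathon teams.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class District:
    name: str                          # hypothetical district label
    notified_cases: int                # cases recorded in the surveillance system
    estimated_prevalent_cases: float   # prevalence estimate from a predictive model


def notification_to_prevalence_ratio(district: District) -> float:
    """Return notified cases divided by estimated prevalent cases."""
    if district.estimated_prevalent_cases <= 0:
        raise ValueError("estimated prevalent cases must be positive")
    return district.notified_cases / district.estimated_prevalent_cases


def flag_low_ratio_districts(
    districts: List[District], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (name, ratio) pairs below the threshold, lowest ratio first.

    The default threshold of 0.5 is an arbitrary, illustrative cut-off.
    """
    ratios = [(d.name, notification_to_prevalence_ratio(d)) for d in districts]
    flagged = [(name, ratio) for name, ratio in ratios if ratio < threshold]
    return sorted(flagged, key=lambda item: item[1])


# Made-up example data, not real surveillance figures.
example_districts = [
    District("District A", notified_cases=40, estimated_prevalent_cases=100.0),
    District("District B", notified_cases=90, estimated_prevalent_cases=100.0),
    District("District C", notified_cases=10, estimated_prevalent_cases=50.0),
]

for name, ratio in flag_low_ratio_districts(example_districts):
    print(f"{name}: notification-to-prevalence ratio {ratio:.2f}")
```

A low ratio suggests that many estimated cases are not reaching the surveillance system, which is why such districts are natural candidates for closer attention.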

Practical implications of predictive modelling

While differences in approaches produced heterogeneous estimates, EPCON’s artificial neural network approach was ranked as highly credible according to local experts’ opinion and knowledge of TB distribution in the country. This led to a successful collaboration between EPCON and NTP Pakistan, in which real-time, digitally captured community screening data is used to train a predictive model for Pakistan. The outputs are used to select and plan locations for active case-finding events using mobile vans equipped with chest X-ray.

The hackathon also provided a unique opportunity to compare different subnational TB prediction models in Pakistan, including novel modelling techniques that had not previously been applied in this domain.

There were, of course, challenges. These ranged from data issues, such as sparse or missing data, to more practical issues: for example, modellers had a very short turnaround time and limited human resources. The authors, Alba S et al., rightly concluded, ‘the fact that the NTP Pakistan could use the outputs showed that, limitations notwithstanding, they are valued by decision makers and planning.’