When data is sparse and medical knowledge of a disease is limited, combining modelling approaches can lead to more accurate predictions of clinical outcomes.
Big data in healthcare is nothing new. Hospital records, medical records, results of medical examinations and biomedical research generate vast quantities of information that need to be handled carefully and accurately. The more information clinicians have, the better they can care for their patients, but the information available is sometimes not sufficient.
Sometimes, clinicians and researchers don’t understand how a disease progresses or the biochemical mechanisms behind a disease. Other times the data available is sparse because it depends on when the patient physically attends a clinic or appointment.
In the first case, the accuracy with which any mathematical model can predict clinical outcomes is limited because the underlying biological mechanisms are only partly understood. In the second, machine learning techniques do not require knowledge of the underlying interactions in biomedical problems, but sparse data restricts their use, preventing any algorithm from accurately inferring the corresponding disease dynamics.
Combining the two approaches could be the solution. Dr. Haralampos Hatzikirou, Associate Professor of Mathematics at Khalifa University, proposed a method to improve individualized predictions in cancer patients based on the Bayesian coupling of mathematical modelling and machine learning. This approach was tested on a simulated dataset for brain tumor patients and on two real cohorts of patients with leukemia and ovarian cancer. The results were closely aligned with the actual clinical data for individual patients, suggesting its potential use in enabling accurate personalized clinical predictions in healthcare.
Dr. Hatzikirou worked with Pietro Mascheroni, Michael Meyer-Hermann and Juan Carlos Lopez Alfonso from the Braunschweig Integrated Center of Systems Biology and the Helmholtz Centre for Infection Research, Germany, and Dr. Symeon Savvopoulos, Mathematics Post-doctoral Fellow at Khalifa University.
The results were published in Communications Medicine, a Nature Portfolio journal.
In clinical practice, a plethora of medical examinations are conducted to assess the state of a patient’s pathology, producing a variety of clinical data. In oncology, this clinical data is the cornerstone of personalized healthcare, but putting it to use is challenging.
“Although mathematical models can be extremely powerful in proposing biological hypotheses, they require adequate knowledge of the underlying biological mechanisms of the analyzed system,” Dr. Hatzikirou explained. “Typically, this knowledge is not complete and we only know a limited amount of mechanistic interactions, such as molecular pathways, seen in cancer. Therefore, even though mathematical models provide a good description of an idealized version of what’s going on in cancer dynamics, they can’t always allow for accurate and quantitative predictions.
“On the other hand, machine learning techniques can handle the inherent complexity of biomedical problems. While mathematical models rely on causality, statistical learning methods identify correlations among data so they can systematically process large amounts of data and infer hidden data patterns. However, the data for each patient is limited to being collected whenever a patient is in the hospital or clinic. This doesn’t provide the algorithm with enough data to make meaningful individualized inferences.”
Dr. Hatzikirou and the research team proposed the first Bayesian method that combines the two techniques.
Bayes’ Theorem describes how to update probabilities in light of new evidence. A prior captures what is known beforehand, and as more data becomes available, the prior is updated. It is the law of probability governing the strength of evidence, stating how much to revise our probabilities when we observe new evidence. For example, if all you know is that your patient has a positive cancer test result, you can look at how many people with a positive test result actually have cancer and use that proportion to estimate the probability that your patient has cancer.
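The positive-test example can be worked through in a few lines of Python. This is only an illustrative sketch: the prevalence, sensitivity and false-positive rate below are hypothetical numbers chosen for the arithmetic, not figures from the study, and the function name is an assumption made for this example.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' Theorem."""
    # Probability of a positive result from a patient who has the disease.
    true_positives = sensitivity * prior
    # Probability of a positive result from a patient who does not.
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical values, for illustration only.
prior = 0.01                # 1% of the tested population has the disease
sensitivity = 0.90          # the test detects 90% of true cases
false_positive_rate = 0.05  # 5% of healthy people still test positive

posterior = posterior_probability(prior, sensitivity, false_positive_rate)
print(f"Probability of cancer given a positive test: {posterior:.3f}")  # 0.154
```

With these assumed numbers, a positive result raises the probability from 1 percent to about 15 percent: the evidence revises the prior without overriding it.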
The team first tested their approach on a simulated dataset of 500 virtual patients with brain cancer. Their model accounted for oxygen consumption by tumor cells, the formation of new blood vessels as the cancer spreads, and the compression of existing blood vessels by growing tumors. For this demonstration, the team considered the tumor cell density to be the ‘modelable’ variable and the other variables as ‘unmodelable’ to represent the unknown mechanisms of disease progression. In the simulation, each patient attended appointments over a three-year period, and these visits served as the sparse data collection opportunities.
The team first simulated a mathematical model of brain tumor growth, then a machine learning model, and finally the combined approach, so that it could be compared against the two individual models. They found that the combined approach performs particularly well for prediction times longer than six months.
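The general idea of letting a data-driven estimate correct a model-based prediction can be sketched with a deliberately simple stand-in. The snippet below is not the team's method: it assumes both the mathematical model and the data-driven estimate can be summarized as Gaussian distributions over tumor size, and it combines them by multiplying the two densities, which gives a precision-weighted average. All names and numbers are hypothetical.

```python
def combine_gaussians(model_mean, model_sd, data_mean, data_sd):
    """Combine two Gaussian estimates by multiplying their densities.

    The result is a Gaussian whose mean is the precision-weighted
    average of the two inputs (precision = 1 / variance).
    """
    model_precision = 1.0 / model_sd ** 2
    data_precision = 1.0 / data_sd ** 2
    combined_variance = 1.0 / (model_precision + data_precision)
    combined_mean = combined_variance * (
        model_precision * model_mean + data_precision * data_mean
    )
    return combined_mean, combined_variance ** 0.5


# Hypothetical tumor-size predictions in arbitrary units.
model_mean, model_sd = 10.0, 2.0  # assumed mathematical-model forecast
data_mean, data_sd = 14.0, 2.0    # assumed data-driven estimate

combined_mean, combined_sd = combine_gaussians(
    model_mean, model_sd, data_mean, data_sd
)
print(f"Combined prediction: {combined_mean:.1f} +/- {combined_sd:.2f}")
```

In this toy setting, the combined estimate sits between the two inputs and is less uncertain than either one. Whichever source is more confident pulls the result closer to its own prediction.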
“Our method was able to correct the mathematical model predictions for most of the patients, particularly at later times,” Dr. Savvopoulos said. “We then tested our method on two cohorts of real-life patients, using their data to more carefully test the effects of ‘unmodelable’ variables, or those unknowns. We used clinical datasets from patients with leukemia and patients with ovarian cancer.”
“Our proposed method aspires to solve a dire problem in personalized medicine that is related to the limited time-points of patient data collection and limited knowledge in cancer biology,” Dr. Hatzikirou said. “In all our tests, we found our model had excellent predictive capacity, but we did recognize some limitations that should be addressed when applying the methodology to real cases.”
Most importantly, this new combined approach is not restricted to oncology. Most applications of clinical prediction involve data that is heterogeneous and sparse, and there are always unknowns in our knowledge of disease mechanisms. A wide variety of medical problems could be addressed using this approach to provide accurate predictions for healthcare across the board.
24 October 2021