Research News

Accounting for the Randomness in Predictive Modeling

August 31, 2022

Weather forecasting is a complex and challenging science. Accurate forecasting will only become more vital in the coming years, with severe weather events expected to be more frequent and more intense thanks to climate change and variability.


It’s this variability that Prof. Emilio Porcu, Professor of Mathematics at Khalifa University, sought to consider when forecasting all sorts of things, not just weather patterns.


Predictive modeling is a commonly used statistical technique to predict behavior. It analyzes historical and current data and generates a model to help predict outcomes. In all models, there’s an element of randomness that is often unaccounted for, leading to less accurate predictive outcomes.


Prof. Porcu developed a model with Dr. Philip White, Brigham Young University, to better account for the randomness that occurs in factors traditionally described as discrete. This model can be applied to any dataset with seasonality, meaning periodic fluctuations that occur at specific regular intervals, such as daily, weekly, or monthly. Their results were published in the journal Environmetrics.


“Continuous-time data often exhibits multiple sources of seasonality — daily or weekly occurrences,” Prof. Porcu explained. “Many multi-annual global climate datasets exhibit annual seasonality and can be combined with a model to account for short-term variations.”


There are many reasons to study seasonal variation, not least in climate tracking and prediction, where past patterns of seasonal variation across a year of weather inform forecasts and predicted trends.


Predictive models for anything that changes over time involve some element of randomness, but that randomness is usually assigned to the observed changes rather than to the time periods being examined. These models assume that the time factors in the equation are discrete, that is, predictable with no random component. This is a false assumption made for modeling simplicity.


The researchers propose a new model that adds a random component to the time factors as well. This improves the model’s predictive capacity.


In the model, the cyclical time patterns are viewed as circles that can overlap. In predicting cloud cover over a city, for example, these time factors could be cloud frequency over a month and over a day. Combining two such circles produces a torus shape, which can help users of the model make predictions.


Mathematically, a torus is the simplest 3D shape with a hole in it. It’s a doughnut shape.
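As a rough illustration of the geometry only, and not the researchers' published model, the sketch below wraps a time stamp onto two circles and places the result on a doughnut. The choice of a daily and a weekly cycle, the function names torus_angles and torus_point, and the two radii are assumptions made for this example.

```python
import math

HOURS_PER_DAY = 24.0    # assumed first cycle: one day
HOURS_PER_WEEK = 168.0  # assumed second cycle: one week


def torus_angles(t_hours):
    """Map a time in hours to (daily angle, weekly angle), each in [0, 2*pi)."""
    daily = 2.0 * math.pi * (t_hours % HOURS_PER_DAY) / HOURS_PER_DAY
    weekly = 2.0 * math.pi * (t_hours % HOURS_PER_WEEK) / HOURS_PER_WEEK
    return daily, weekly


def torus_point(t_hours, major_radius=2.0, minor_radius=1.0):
    """Place the two angles on a doughnut in 3D (radii are illustrative)."""
    daily, weekly = torus_angles(t_hours)
    ring = major_radius + minor_radius * math.cos(daily)
    x = ring * math.cos(weekly)
    y = ring * math.sin(weekly)
    z = minor_radius * math.sin(daily)
    return x, y, z


# Two time stamps exactly one week apart share both angles,
# so they land on the same point of the doughnut.
p_now = torus_point(5.0)
p_next_week = torus_point(5.0 + HOURS_PER_WEEK)
```

In this picture, going once around the tube corresponds to one day and going once around the ring corresponds to one week, so every repeat of the same hour on the same weekday returns to the same spot.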


The researchers used three environmental situations to assess their model. First, they focused on ground-level ozone in Mexico City.


“Urban areas like Mexico City are closely monitored to protect the population from the short- and long-term health risks associated with ground-level ozone,” Prof. Porcu said. “Because ozone formation requires heat and sunlight, we use temperature as an explanatory variable for ozone, combining it with relative humidity and the time factors of hourly and daily cycles of ozone levels.”


The model accounts for short-term, daily seasonal, and weekly seasonal variations in ozone levels and effectively captures the seasonality present in the data.
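One way to see why cyclical time matters for daily and weekly patterns is to measure how far apart two moments are on each cycle. The sketch below is a hedged illustration of that idea, not the method from the paper; the helper circular_distance and the use of hours as the unit are assumptions for this example.

```python
def circular_distance(t1, t2, period):
    """Shortest separation between two times on a cycle of the given period."""
    diff = abs(t1 - t2) % period
    return min(diff, period - diff)


# 23:00 and 01:00 are 22 hours apart on a straight timeline,
# but only 2 hours apart on a 24-hour circle.
late_vs_early = circular_distance(23, 1, 24)

# Hour 1 and hour 169 are the same moment on a 168-hour weekly circle.
same_weekly_slot = circular_distance(1, 169, 168)
```

On a straight timeline, late evening and early morning look far apart; on the daily circle they are neighbors, which is the kind of closeness a seasonal model needs to respect.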


The researchers then turned their attention to wind-speed forecasting, which is strongly connected with temperature, time of day, relative humidity, and other weather patterns. Rather than forecast wind speeds, the researchers used their model to determine the hourly average wind speeds from a single monitoring station in Utah, USA. Their model outperformed other models in its predictions.


Finally, the model was tested in predicting cloud cover around the world.


“Cloudiness is linked with global climate changes, and modeling it is important to estimating many downstream effects,” Prof. Porcu said.


Cloud coverage shows distinct trends depending on whether the clouds form over land or over water, and it also varies over time with the current season. The researchers’ new model was able to account for this seasonality and outperformed comparative models.


“To the best of our knowledge, wrapping time into the product of circles to account for multiple sources of seasonality remains unexplored,” Prof. Porcu said. “Our approach proved successful on various datasets but our work leaves many questions open for further research.


“This research shows that modern data science is twinned with mathematical creativity. Wrapping time into a fancy geometrical object is not a mere mathematical artifact, but the proper way to provide stochastic models that allow for a considerable improvement in prediction accuracy. Without mathematics, data science becomes a mere exercise of data mining or data engineering.”


Jade Sterling
Science Writer