How Big Data Can Transform Our Understanding of Microbial Communities

September 21, 2018

Our planet is teeming with countless microbes that play a pivotal role in sustaining life – they aid digestion, convert carbon and nitrogen into essential compounds for plants, create foods that are staples of the human diet, and remove toxins from the environment.

No single microbe can perform these complex energy transactions alone, however, which is why they form communities – microbial communities, also known as microbiomes.

Unfortunately, scientific understanding of microbial communities lags behind understanding of individual bacteria. This gap has prompted one Masdar Institute faculty member and his team to leverage their expertise in bioinformatics and Big Data – the term for data sets that are extremely large, complex in variety and in some cases updated frequently – to develop one of the world’s largest databases of microbial communities.

Dr. Andreas Henschel, Assistant Professor of Electrical Engineering and Computer Science at Masdar Institute, has developed one of the largest open-access, web-based community resources for analyzing and comparing over 20,000 microbial communities from different environments around the world, including the human body, oceans, soil and wastewater.

“The database provides an overview of the similarities and differences between microbial communities from all types of ecosystems,” Dr. Henschel explained.

The database maps all available environmental microbiomes with data collected from 2,426 independent studies. The database server and an open source analysis tool, developed with support from Masdar Institute Research Engineer Muhammad Anwar and former PhD student Vimitha Manohar, were recently highlighted in a paper published in the leading scientific journal PLOS Computational Biology.

The database can provide insight into microbial community formation and adaptation, which could have profound impacts on human health and on other critical areas, such as wastewater treatment and bioenergy production, that rely on or are affected by the metabolic processes of microbial communities.

The database can help scientists better understand whether communities assemble more predictably or more randomly, and how physical characteristics such as temperature, pH or salinity drive microbial community development.

“Knowing the environment in which a certain microbial community forms is extremely valuable information,” Dr. Henschel said.

Motivated by a desire to learn more about the UAE’s microbiomes, Dr. Henschel is now scanning the database to identify ecosystems that produce microbiomes similar to the ones found in the UAE.

“The microbiomes that flourish in the UAE’s hot, dry climate and high saline waters have unique properties that enable them to survive in these extreme, harsh environments. Many scientists are interested in leveraging these bacteria for various industrial and health applications. One of the reasons why we created this database was to find similar microbiomes in different environments, to determine which environmental factors attract these resilient bacterial communities,” Dr. Henschel explained.
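To make the idea of searching for "similar microbiomes" concrete, here is a minimal sketch. It is not Dr. Henschel's actual method: the article does not say how the database measures similarity, so the sketch assumes Bray-Curtis dissimilarity, a measure commonly used in ecology. The sample names, taxon labels and counts are all invented for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon-count profiles.

    Each profile is a dict mapping a taxon label to its count.
    Returns 0.0 for identical profiles and 1.0 for profiles with no shared taxa.
    """
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return diff / total


def rank_by_similarity(query, database):
    """Return (name, dissimilarity) pairs, most similar community first."""
    scored = [(name, bray_curtis(query, profile)) for name, profile in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical reference communities (assumption: made-up labels and counts).
database = {
    "salt_marsh": {"taxon_a": 40, "taxon_b": 30, "taxon_c": 10},
    "human_gut": {"taxon_d": 60, "taxon_e": 20, "taxon_c": 5},
    "forest_soil": {"taxon_c": 35, "taxon_f": 45},
}

# Hypothetical query sample, standing in for a UAE environmental sample.
uae_sample = {"taxon_a": 50, "taxon_b": 20, "taxon_c": 5}

ranking = rank_by_similarity(uae_sample, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With this toy data the salt marsh profile ranks closest to the query, because it shares the two dominant taxa. A real analysis would work on thousands of samples and would typically also account for evolutionary relatedness between taxa.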

Beyond their importance to human health, microbes also play a crucial role in industry, improving the processes used to produce valuable chemicals, plastics, bioenergy, food, and pharmaceutical products.

Dr. Henschel built this extensive database by customizing an open-source bioinformatics tool suite known as Quantitative Insights Into Microbial Ecology (QIIME). He is also applying high-performance computing methods to extract useful information from the data for further research.

“The wealth of data produced with deep DNA sequencing of entire microbial communities without the need of isolating and cultivating single bacteria provides an entirely new data intensive angle on microbiology and lends itself to the use of machine learning techniques, which are used to extract valuable information from the large datasets,” Dr. Henschel said.

By applying machine learning techniques to his microbiome database, Dr. Henschel aims to identify biomarkers in stool samples – informative combinations of bacteria that indicate the presence of diseases such as colorectal cancer and pre-diabetes – thus avoiding invasive diagnostic methods.

“If we can determine which communities form in the gut of a healthy person versus those that form in the gut of a person suffering from a disease, such as colorectal cancer or an autoimmune disorder, we can begin to determine the factors required for good bacteria to form in the person with the disease. This type of research will be instrumental in the development of effective medicines and alternative therapies that can prevent the over-use of antibiotics,” he explained.

Dr. Henschel’s leading research in bioinformatics exemplifies Masdar Institute’s robust research capabilities in the field of information science. Through innovative research projects such as this, Masdar Institute is contributing to the advancement of three critical sectors targeted by the UAE’s innovation strategy – water, healthcare and technology.

Erica Solomon
News and Features Writer
16 February 2016