A Genomic Data Science Framework for the 1,000 Arab Genome Project

Principal Investigator
Andreas Henschel
Department
Electrical & Computer Engineering
Focus Area
Healthcare

Data-intensive Whole Genome Sequencing projects are central to the ongoing transformation of 21st-century medicine. The challenge of utilizing collective genomic population data for personalized medicine in the country is threefold: a lack of data, a lack of capabilities to handle Big Genomic Data, and finally the difficulty of making sense of it. The genomics community has established a versatile analysis ecosystem, which, however, is based on flat-file data storage, thus bypassing decades of modern database development such as schema-free, scalable NoSQL systems. The envisioned Genomic Data Science framework will deploy and extend OpenCB, a Big Data system for genomic analysis that has been deployed successfully in very large-scale applications. It offers potential for additional functionality such as extended hierarchical queries, integration with genotype array data, and the derivation of artificial intelligence applications. We aim to exemplify our novel methodology on the 1,000 Arab Genome Project, addressing the lack of human sequences in the UAE.
