EBTIC researchers have developed tools to collect and analyze tweets about coronavirus in four different languages to identify trends and patterns among the population of the UAE.
Read Arabic story here: http://researchku.com/news-extended/59
In a world where more than 500 million tweets are sent every day, there is little doubt that social media platforms such as Twitter are vital to understanding the zeitgeist. As the novel coronavirus swept the globe and developed into a pandemic, social media has offered an instant insight into how Covid-19 is affecting a population.
There have been over 628 million tweets about coronavirus so far. Understanding their content and the sentiment behind their messages is crucial to assisting policymakers and, in the case of Covid-19, identifying where public health messaging can be improved. Social media is an instant way of spreading information and knowledge, and provides a snapshot of a population’s understanding and feelings about a topic for analysts to investigate.
Researchers from Emirates ICT Innovation Center (EBTIC) have developed tools to collect and analyze tweets about coronavirus in four different languages to identify trends and patterns among the population of the UAE. Mr. Ahmad Al-Rubaie, EBTIC Head of Strategy and social media research, collaborated with Dr. Di Wang, EBTIC Chief Researcher, Dr. Ahmed Al Dhanhani, EBTIC Senior Researcher, and Hamda Al Ali and Sara Al Shamsi, both EBTIC Research Associates, to produce a series of dashboards for monitoring Covid-19 tweets.
Extracting useful information and acting on it is the challenging part of any analysis. Automating these processes with machine learning techniques allows the approach to be applied across many application areas, including in difficult times such as a global pandemic.
“The two main components we developed are harvesters and classifiers,” explained Mr. Al-Rubaie. “Then, we used open source visualization tools to show the outcome of the analyses we performed in real time and in short time scales.”
EBTIC has developed ‘harvesters’ that can harvest tweets in real time. As soon as someone tweets about coronavirus, the EBTIC harvester collects it. But recognizing the content of a tweet when it doesn’t actually contain the word ‘coronavirus’ can be difficult for a human, let alone a machine. So EBTIC also developed a number of classifiers using artificial intelligence to sort tweets into pre-defined categories.
“We can categorize content through machine learning and for different use cases, primarily Arabic and English text, but also Hindi and Urdu for our coronavirus analysis,” explained Mr. Al-Rubaie. “We focus on short text messages, such as tweets, because tweets provide near instantaneous insight into the population views and opinions. However, it’s much more challenging than longer text content. Short text tends to include a very limited amount of information, grammar errors, spelling mistakes, and sometimes content that is specific to a user. These challenges make accurate classification of short text difficult.”
The researchers used their harvesters to collect tweets and then trained their classifiers on these tweets to organize them into pre-defined categories: symptoms, health advice, health news, lockdown, and other. The classifiers used both shallow learning and deep learning AI techniques, achieving high accuracy in a short time compared with other state-of-the-art techniques. The researchers also developed pre-processors to improve the classification accuracy even further.
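To make the idea of sorting short texts into these categories concrete, here is a minimal sketch of one shallow learning technique, a multinomial naive Bayes classifier, written in plain Python. It is an illustration only and not EBTIC's classifier: the article does not say which algorithms the team used, and the `NaiveBayes` class, the `tokenize` helper and the handful of training sentences are all invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical pre-processor)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative, not EBTIC's model)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # documents seen per category
        self.word_counts = defaultdict(Counter)    # word frequencies per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, two per category named in the article.
TRAIN = [
    ("fever and dry cough since yesterday", "symptoms"),
    ("lost my sense of smell and have a fever", "symptoms"),
    ("wash your hands and wear a mask", "health advice"),
    ("stay home and keep your distance", "health advice"),
    ("ministry announces new cases today", "health news"),
    ("officials report recoveries and new cases", "health news"),
    ("malls closed during the lockdown", "lockdown"),
    ("curfew extended as lockdown continues", "lockdown"),
    ("good morning everyone", "other"),
    ("watching football tonight", "other"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict("I have a cough and a fever"))
print(model.predict("lockdown extended again"))
```

A real system would need far more training data, and the simple tokenizer here is an assumption that would not carry over unchanged to Arabic, Hindi or Urdu text.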
The team also developed an intelligent method to detect and measure change in key terms used in social media per hour, per day and per week.
“Our method automatically detects the key terms with highest usage in social media to quickly identify any changes in user interest and discussion topics. We also identify sources driving discussions and whether they are official trusted sources, or private individuals, which is the foundation for our work on misinformation detection. This method is key for monitoring the impact that Covid-19 is having on the population through time,” said Dr. Al Dhanhani.
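One simple way to picture this kind of key-term monitoring is to count terms in two consecutive time windows and rank the terms whose usage has grown the most. The sketch below does that in plain Python. It is an assumption about how such a comparison could work, not the team's method: the `term_counts` and `rising_terms` functions, the stopword list, the thresholds and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer and per language.
STOPWORDS = {"the", "a", "is", "in", "to", "of", "and", "are"}


def term_counts(tweets):
    """Count how often each non-stopword term appears across a window of tweets."""
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"\w+", tweet.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


def rising_terms(previous, current, top_n=3, min_count=2):
    """Rank terms in the current window by growth relative to the previous window."""
    prev, curr = term_counts(previous), term_counts(current)
    changes = []
    for term, n in curr.items():
        if n >= min_count:
            # Add one to the denominator so brand-new terms do not divide by zero.
            ratio = n / (prev[term] + 1)
            changes.append((term, prev[term], n, ratio))
    # Largest growth first; ties broken alphabetically for a stable order.
    changes.sort(key=lambda c: (-c[3], c[0]))
    return changes[:top_n]


# Invented sample windows; each list could hold one hour, day or week of tweets.
previous_window = [
    "masks are sold out in the pharmacy",
    "where to buy masks",
    "new cases announced today",
]
current_window = [
    "vaccine trial starts in abu dhabi",
    "vaccine volunteers needed",
    "masks still required",
    "vaccine news is encouraging",
]

for term, before, after, ratio in rising_terms(previous_window, current_window):
    print(f"{term}: {before} -> {after} (x{ratio:.1f})")
```

The window length is left to the caller, so the same comparison can be run per hour, per day or per week, as the article describes.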
This data is visualized through a number of dashboards, highlighting important patterns and trends in the information. Data visualization uses visual elements like charts, graphs and maps to provide an accessible way to see and understand information and data. In the world of Big Data, these tools and technologies are essential to analyzing massive amounts of information and understanding trends and patterns in the data.
The researchers also developed harvesters to gather relevant information from the internet, such as the number of Covid-19 cases reported and the price of various goods and products sold online.
“We also collected information on Covid-19 cases announced by official sources in the UAE and plotted those against the trends from social media,” said Dr. Wang. “We use the graphs to plot the key topics of interest and the change in conversations on social media through time. This helps to highlight what type of information is important to the public and at what time. We can also gauge the level of conversation in line with the cases being announced.”
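Gauging the level of conversation against announced cases can be illustrated with a correlation coefficient between two daily series. The sketch below computes Pearson's r in plain Python. The article only says the two series were plotted against each other, so treating the comparison as a correlation is an assumption made here, and the `pearson` function and all the numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented daily figures, not real UAE data.
announced_cases = [10, 20, 30, 40, 50]
covid_tweets = [120, 210, 330, 390, 520]

r = pearson(announced_cases, covid_tweets)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 would mean the two series rise and fall together, although a correlation on its own says nothing about which one drives the other.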
Collecting information on the prices of food and other products sold online may seem unrelated to the thoughts and feelings of people on social media, but these data can highlight interesting trends among the population. A quick look at the dashboards shows not only correlations between social media discussions and the number of cases, but also valuable trends, such as the stability of the essential food market and the property market throughout the pandemic.
“We can provide an insight to the impacts of the pandemic on essential food prices and even property prices in the region, simply by plotting key trends against the number of tweets and their contents,” explained Mr. Al-Rubaie.
2 November 2020