In a world where over 600 million tweets are sent every day, making sense of it all requires specialized big data interpretation – especially when those tweets are in Arabic.
Every second, on average, around 7,800 tweets are sent. This equates to 473,000 tweets every minute, 681 million tweets per day, and around 248 billion tweets per year. Of these daily tweets, Twitter users in the Arab region contribute 17.1 million.
While English is still by far the most prevalent language used on Twitter, Arabic is the fastest-growing language used to tweet, and social media use has exploded in the Arab world.
“Social media has become not only a main way of communicating between individuals, entities and organizations, but also an instant way of spreading information and knowledge,” explained Dr. Nawaf I. Almoosa, Acting Director of Khalifa University’s Emirates ICT Innovation Center (EBTIC). “Social media provides richer information in all forms from text to images and networking interactions, and the way information flows through social networks is both dynamic and fascinating.”
EBTIC has developed a tool that measures sentiment across social media in the UAE, using Twitter to assess sentiment – or people’s attitudes and feelings – across the country and generating a detailed picture of happiness nationally.
“The aim of this project in the long term is to automatically integrate all available information we obtain from social media posts, user profiles, and networking to find out what is happening in the world and to predict what will happen next,” said Dr. Almoosa. “Posts range from text to images, videos and links, and networking covers everything from interactions and conversations online to the way information emerges and flows through communities. It’s very comprehensive.
“Extracting useful information and making use of it is the challenging part. The aim of this project is to make use of all this rich information to automate the online analysis of social media using machine learning and deep learning techniques, and the resulting analysis will be used across many application areas, including sentiment analysis.”
Sentiment analysis is the systematic study of opinions expressed in text, looking at the polarity (whether the speaker is positive or negative about the topic), the subject and the opinion holder, in this case, the tweeter.
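To make the three elements concrete, here is a minimal sketch in Python of how a single tweet might be reduced to an opinion holder, a subject and a polarity. It is purely illustrative: the word lists, the `Opinion` record and the `analyze` function are hypothetical and say nothing about how EBTIC's system works, which relies on machine learning rather than a fixed word list.

```python
from dataclasses import dataclass

# Toy word lists: illustrative assumptions, not a real sentiment lexicon.
POSITIVE = {"happy", "great", "love", "excellent"}
NEGATIVE = {"sad", "bad", "hate", "terrible"}


@dataclass
class Opinion:
    holder: str    # who expressed the opinion (the tweeter)
    subject: str   # what the opinion is about
    polarity: str  # "positive", "negative" or "neutral"


def polarity(text: str) -> str:
    """Count positive and negative words and return the overall polarity."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(holder: str, subject: str, text: str) -> Opinion:
    """Bundle the three elements of an opinion into one structured record."""
    return Opinion(holder=holder, subject=subject, polarity=polarity(text))
```

Calling `analyze("@user", "metro", "I love the new metro, great service!")` would yield a record with positive polarity. A word-counting approach like this misses negation, sarcasm and slang, which is part of why learned models are used in practice.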
“EBTIC has been working on social media analysis for years. Machine learning and deep learning techniques are applied and improved to be useful in understanding what people are talking about in short messages or texts and their feelings on what is happening,” said Dr. Almoosa.
With the help of sentiment analysis systems, the opinions found on social media can be automatically transformed into structured data on public opinion about topics including products, events, and city services. EBTIC’s system uses machine learning techniques to detect opinions and feelings in social media messages, which are typically short, informal, and often unstructured. Currently, EBTIC is working on automatically detecting the underlying reasons driving sentiment changes across time, topics and demographics, and on providing summary reports of the findings. With this new capability, the sentiment tool offers a brief and direct guide to understanding sentiment changes more deeply, and could suggest ways to improve happiness in the UAE.
“One of the big challenges for social media is informal speech and incomplete information—or even lack of information—presented in social media and short texts,” said Dr. Almoosa. “EBTIC has invented and filed a patent for a technique to enrich the short text and then let machine learning methods usually used for long formal documents work for the short texts without compromising their accuracy.”
This technique could be applied beyond social media, as it is estimated that 80 percent of the world’s data is unstructured and not organized in any pre-determined manner. These texts are usually difficult, time-consuming and expensive to sort through, understand and then analyze. This becomes even more complicated when an opinion is tweeted in any language or combination of languages other than English.
“More work has been done for English and other languages with simple structures and grammars, but the complexity of the Arabic language means much more work is needed for Arab world social media analysis,” explained Dr. Almoosa. “In the Arab world, social media messages are naturally written in Arabic, using a mixture of Arabic script and Latin script to spell out Arabic words phonetically, and this is much more popular amongst the younger generations. Arabic social media analysis, therefore, is recent and still limited. We continue to work on developing the project to work more efficiently and comprehensively across Arabic language social media posts.”
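One small, concrete piece of the mixed-script problem is simply telling which script a message uses. The Python sketch below is a hedged illustration of that step only, using the standard `unicodedata` module. The function names `script_profile` and `classify` are hypothetical and are not taken from EBTIC's work.

```python
import unicodedata


def script_profile(text: str) -> dict:
    """Count the letters in text that belong to the Arabic or Latin script."""
    counts = {"arabic": 0, "latin": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, emoji and whitespace
        name = unicodedata.name(ch, "")  # "" for characters without a name
        if name.startswith("ARABIC"):
            counts["arabic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    return counts


def classify(text: str) -> str:
    """Label a message as 'arabic', 'latin', 'mixed' or 'other'."""
    counts = script_profile(text)
    if counts["arabic"] and counts["latin"]:
        return "mixed"
    if counts["arabic"]:
        return "arabic"
    if counts["latin"]:
        return "latin"
    return "other"
```

A message labelled `latin` here may still be Arabic spelled out phonetically, so script detection alone cannot identify the language. It only flags which messages need further handling.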
Tools like EBTIC’s sentiment analysis solution can provide unbiased insight into the thoughts and feelings of the UAE’s citizens and help the UAE government put the country among the top five happiest countries in the world by 2021.
“Right now, happiness is one of the most important national goals in the UAE and Twitter offers first-hand insight into the thoughts, feelings, and concerns of the population,” added Dr. Almoosa. “This is why developing a tool that analyzes the large volume of social media content in real-time and providing it in a visual form that enables interpretation is important.”
“The results from this work have been applied to different applications and delivered to UAE government entities and EBTIC partners, including Etisalat, Abu Dhabi Police, Statistics Centre Abu Dhabi and the Ministry of Education. Our ongoing work is attracting attention and more projects and deliveries are expected.”
News and Features Writer
25 June 2019