Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach

Jia Xue; Junxiang Chen; Ran Hu; Chen Chen; Chengda Zheng; Yue Su; Tingshao Zhu

doi:10.2196/20550


J Med Internet Res. 2020 Nov 25;22(11):e20550. doi: 10.2196/20550.

Authors

Jia Xue¹ ², Junxiang Chen³, Ran Hu¹, Chen Chen⁴, Chengda Zheng², Yue Su⁵ ⁶, Tingshao Zhu⁵

Affiliations

¹ Factor-Inwentash Faculty of Social Work, University of Toronto, Toronto, ON, Canada.
² Faculty of Information, University of Toronto, Toronto, ON, Canada.
³ School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States.
⁴ Middleware System Research Group, University of Toronto, Toronto, ON, Canada.
⁵ CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China.
⁶ Department of Psychology, University of Chinese Academy of Sciences, Beijing, China.

PMID: 33119535
PMCID: PMC7690968
DOI: 10.2196/20550

Abstract

Background: It is important to measure the public response to the COVID-19 pandemic. Twitter is an important data source for infodemiology studies involving public response monitoring.

Objective: The objective of this study is to examine COVID-19-related discussions, concerns, and sentiments using tweets posted by Twitter users.

Methods: We analyzed 4 million Twitter messages related to the COVID-19 pandemic using a list of 20 hashtags (eg, "coronavirus," "COVID-19," "quarantine") from March 7 to April 21, 2020. We used a machine learning approach, Latent Dirichlet Allocation (LDA), to identify popular unigrams and bigrams, salient topics and themes, and sentiments in the collected tweets.
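The abstract names LDA but not the implementation, priors, or sampler the authors used. One standard way to fit LDA is collapsed Gibbs sampling. The sketch below is a plain-Python version of that technique and is not the study's code. It assumes tweets are already tokenized; `alpha`, `beta`, `iterations`, `seed`, and the toy corpus are illustrative assumptions. The study reports 13 topics, while the toy example uses 2.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (n_dk, n_kw, vocab), where n_dk[d][k]
    counts tokens of document d assigned to topic k and n_kw[k][v] counts
    occurrences of vocabulary word v assigned to topic k.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(K)]  # topic-word counts
    n_k = [0] * K                       # tokens per topic
    z = []                              # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return n_dk, n_kw, vocab


def top_words(n_kw, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in n_kw
    ]


if __name__ == "__main__":
    toy_docs = [
        ["stay", "home", "quarantine"],
        ["new", "cases", "deaths"],
        ["stay", "home", "lockdown"],
        ["cases", "deaths", "reported"],
    ]
    _, topic_word, words = lda_gibbs(toy_docs, num_topics=2, iterations=50, seed=1)
    print(top_words(topic_word, words))
```

Each topic is read off from its highest-count words. Labeling topics and grouping them into broader themes, as the study did with its 5 themes, is an interpretive step that this sketch does not cover.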

Results: Popular unigrams included "virus," "lockdown," and "quarantine." Popular bigrams included "COVID-19," "stay home," "corona virus," "social distancing," and "new cases." We identified 13 discussion topics and categorized them into 5 themes: (1) public health measures to slow the spread of COVID-19, (2) social stigma associated with COVID-19, (3) COVID-19 news, cases, and deaths, (4) COVID-19 in the United States, and (5) COVID-19 in the rest of the world. Across all identified topics, the dominant sentiment about the spread of COVID-19 was anticipation that measures could be taken, followed by mixed feelings of trust, anger, and fear that varied by topic. Compared with other topics, tweets discussing new COVID-19 cases and deaths expressed a significant degree of fear.
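Rankings of popular unigrams and bigrams, and the relative prominence of emotions, rest on simple counting once tweets are collected and tokenized. The sketch below shows one way to do this in plain Python. It is not the authors' pipeline. `HASHTAGS` is a hypothetical three-item subset of the study's 20 hashtags, which the abstract does not list in full. `EMOTION_LEXICON` is a tiny hypothetical stand-in, because the abstract does not name the lexicon or classifier behind its sentiment results.

```python
import re
from collections import Counter

# Hypothetical subset of collection hashtags (lowercase, without "#").
HASHTAGS = {"coronavirus", "covid19", "quarantine"}

# Hypothetical toy lexicon mapping words to emotion labels; not the study's lexicon.
EMOTION_LEXICON = {
    "hope": {"anticipation", "trust"},
    "safe": {"trust"},
    "deaths": {"fear", "sadness"},
    "angry": {"anger"},
}


def matches_hashtags(text, hashtags=HASHTAGS):
    """True if the tweet carries at least one of the collection hashtags."""
    tags = {t[1:] for t in re.findall(r"#\w+", text.lower())}
    return bool(tags & hashtags)


def tokenize(text):
    """Lowercase, drop URLs, and keep alphanumeric runs.

    Hyphens and "#" are separators, so "COVID-19" becomes ["covid", "19"]
    and "#quarantine" becomes ["quarantine"].
    """
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z0-9]+", text)


def ngram_counts(tweets, stopwords=frozenset()):
    """Count unigrams and adjacent-token bigrams across tweets.

    Stopwords are removed before pairing, so a bigram may span a removed word.
    """
    unigrams, bigrams = Counter(), Counter()
    for tweet in tweets:
        tokens = [t for t in tokenize(tweet) if t not in stopwords]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def emotion_counts(tweets, lexicon=EMOTION_LEXICON):
    """Tally emotion labels for every lexicon word found in the tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            counts.update(lexicon.get(token, ()))
    return counts


if __name__ == "__main__":
    raw = [
        "Stay home! #COVID19 new cases reported https://t.co/x",
        "Stay home and stay safe #quarantine",
        "Lunch photos #food",
    ]
    kept = [t for t in raw if matches_hashtags(t)]  # the #food tweet is dropped
    uni, bi = ngram_counts(kept, stopwords={"and"})
    print(uni.most_common(1), bi.most_common(1))
    print(emotion_counts(["Hope we stay safe", "New deaths today"]))
```

Because the tokenizer splits on hyphens, "COVID-19" surfaces as the token pair ("covid", "19"). Splitting like this is one way a single term can appear in a bigram list. Running the same tallies separately on tweets grouped by LDA topic would give per-topic emotion profiles.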

Conclusions: This study showed that Twitter data and machine learning approaches can be leveraged for an infodemiology study, enabling research into evolving public discussions and sentiments during the COVID-19 pandemic. As the situation rapidly evolves, several topics remain consistently dominant on Twitter, such as confirmed cases and death rates, preventive measures, health authorities and government policies, COVID-19 stigma, and negative psychological reactions (eg, fear). Real-time monitoring and assessment of Twitter discussions and concerns could provide useful data for public health emergency response and planning. Pandemic-related fear, stigma, and mental health concerns are already evident and may continue to influence public trust if a second wave of COVID-19 occurs or the current pandemic surges again.

Keywords: COVID-19; Twitter; Twitter data; infodemic; infodemiology; infoveillance; machine learning; public discussion; public sentiment; social media; virus.

©Jia Xue, Junxiang Chen, Ran Hu, Chen Chen, Chengda Zheng, Yue Su, Tingshao Zhu. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 25.11.2020.

MeSH terms

COVID-19 / epidemiology*
COVID-19 / psychology*
COVID-19 / virology
Data Collection / methods
Emotions / physiology*
Humans
Machine Learning*
SARS-CoV-2 / isolation & purification
Social Media*