Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach

J Med Internet Res. 2020 Nov 25;22(11):e20550. doi: 10.2196/20550.

Abstract

Background: It is important to measure the public response to the COVID-19 pandemic. Twitter is an important data source for infodemiology studies involving public response monitoring.

Objective: The objective of this study is to examine COVID-19-related discussions, concerns, and sentiments using tweets posted by Twitter users.

Methods: We analyzed 4 million Twitter messages related to the COVID-19 pandemic using a list of 20 hashtags (eg, "coronavirus," "COVID-19," "quarantine") from March 7 to April 21, 2020. We used a machine learning approach, Latent Dirichlet Allocation (LDA), to identify popular unigrams and bigrams, salient topics and themes, and sentiments in the collected tweets.
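The abstract names LDA but gives neither its implementation nor its settings. The sketch below is a minimal, from-scratch LDA fitted by collapsed Gibbs sampling in standard-library Python. Each token's topic is resampled from p(z = k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ). The function name `lda_gibbs`, the hyperparameters (α = 0.1, β = 0.01), the iteration count, and the toy corpus are all illustrative assumptions, not the study's configuration. LDA itself outputs per-document topic mixtures and per-topic word distributions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: LDA via collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word), where each
    doc_topic row and each topic_word dict is a smoothed probability distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    assignments = []  # current topic of every token, per document

    # Random initial topic assignment for every token
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove this token from the counts before resampling it
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional for each candidate topic
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    # Smoothed point estimates of theta (doc-topic) and phi (topic-word)
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[w]: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)}
        for k in range(K)
    ]
    return doc_topic, topic_word


# Toy, invented corpus of pre-tokenized "tweets"
docs = [
    "stay home social distancing lockdown".split(),
    "lockdown stay home quarantine".split(),
    "new cases deaths reported".split(),
    "deaths new cases rising".split(),
    "social distancing quarantine stay home".split(),
    "new cases deaths today".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=100)
for k, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

On a corpus of millions of tweets, a library implementation would normally replace this loop. The sketch only shows what the sampler updates at each step.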

Results: Popular unigrams included "virus," "lockdown," and "quarantine." Popular bigrams included "COVID-19," "stay home," "corona virus," "social distancing," and "new cases." We identified 13 discussion topics and categorized them into 5 different themes: (1) public health measures to slow the spread of COVID-19, (2) social stigma associated with COVID-19, (3) COVID-19 news, cases, and deaths, (4) COVID-19 in the United States, and (5) COVID-19 in the rest of the world. Across all identified topics, the dominant sentiment about the spread of COVID-19 was anticipation that measures could be taken, followed by mixed feelings of trust, anger, and fear, depending on the topic. Tweets showed a significantly stronger feeling of fear when people discussed new COVID-19 cases and deaths than when they discussed other topics.
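The abstract does not describe how the n-gram frequencies or emotion labels were computed. The sketch below illustrates frequency-based counting under several stated assumptions: simple regex tokenization, a hypothetical stopword list, and a hypothetical toy lexicon (`EMOTION_LEXICON`). The lexicon's category names echo the Results, but its word lists are invented. It is not a validated emotion lexicon and not the authors' pipeline. The function names `top_ngrams` and `emotion_profile` are likewise illustrative.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "and", "are", "i", "in", "is", "of", "the", "to", "will", "you"}

# Hypothetical toy lexicon: category names follow the Results paragraph,
# but these word lists are invented and are NOT a validated emotion lexicon.
EMOTION_LEXICON = {
    "fear": {"fear", "scared", "afraid", "panic", "deaths"},
    "anticipation": {"hope", "soon", "plan", "prepare"},
    "trust": {"thank", "doctors", "nurses", "safe"},
    "anger": {"angry", "blame", "outrage"},
}


def tokenize(text):
    """Lowercase and split on non-alphanumerics, so "#COVID-19" -> ["covid", "19"]."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_ngrams(texts, n, k=5):
    """Return the k most frequent n-grams (as tuples) after stopword removal."""
    counts = Counter()
    for text in texts:
        tokens = [t for t in tokenize(text) if t not in STOPWORDS]
        # Zipping n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


def emotion_profile(texts, lexicon=EMOTION_LEXICON):
    """Share of lexicon hits that fall in each emotion category."""
    hits = Counter()
    for text in texts:
        for token in tokenize(text):
            for emotion, words in lexicon.items():
                if token in words:
                    hits[emotion] += 1
    total = sum(hits.values())
    return {e: (hits[e] / total if total else 0.0) for e in lexicon}


# Invented example tweets
tweets = [
    "Stay home and keep social distancing #COVID-19",
    "New cases and deaths reported today, people are scared",
    "Thank you doctors and nurses, stay home stay safe",
    "New cases rising, I fear the lockdown will last",
]
print(top_ngrams(tweets, 1, k=3))
print(top_ngrams(tweets, 2, k=2))
print(emotion_profile(tweets))
```

This tokenizer splits hyphenated tags, so "#COVID-19" contributes the bigram ("covid", "19") rather than a single token. A real pipeline would need to make that normalization choice deliberately.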

Conclusions: This study showed that Twitter data and machine learning approaches can be leveraged for an infodemiology study, enabling research into evolving public discussions and sentiments during the COVID-19 pandemic. As the situation rapidly evolves, several topics are consistently dominant on Twitter, such as confirmed cases and death rates, preventive measures, health authorities and government policies, COVID-19 stigma, and negative psychological reactions (eg, fear). Real-time monitoring and assessment of Twitter discussions and concerns could provide useful data for public health emergency responses and planning. Pandemic-related fear, stigma, and mental health concerns are already evident and may continue to influence public trust when a second wave of COVID-19 occurs or there is a new surge of the current pandemic.

Keywords: COVID-19; Twitter; Twitter data; infodemic; infodemiology; infoveillance; machine learning; public discussion; public sentiment; social media; virus.

MeSH terms

  • COVID-19 / epidemiology*
  • COVID-19 / psychology*
  • COVID-19 / virology
  • Data Collection / methods
  • Emotions / physiology*
  • Humans
  • Machine Learning*
  • SARS-CoV-2 / isolation & purification
  • Social Media*