Sentiment of Emojis

PLoS One. 2015 Dec 7;10(12):e0144296. doi: 10.1371/journal.pone.0144296. eCollection 2015.

Abstract

There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds of emojis. But what are their emotional contents? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of an emoji is computed from the sentiment of the tweets in which it occurs. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. Most of the emojis are positive, especially the most popular ones. The sentiment distribution of tweets with emojis differs significantly from that of tweets without emojis. Inter-annotator agreement is higher on tweets with emojis than on tweets without them. Emojis tend to occur at the end of tweets, and their sentiment polarity increases with their distance from the beginning of the tweet. We observe no significant differences between the emoji rankings of the 13 individual languages and the overall Emoji Sentiment Ranking. Consequently, we propose the Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.
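The abstract states only that an emoji's sentiment is derived from the labels of the tweets it occurs in. The Python sketch below illustrates one plausible way to do that, and every specific in it is an assumption rather than the paper's confirmed method. Each emoji occurrence inherits its tweet's label (negative, neutral, or positive), the three class probabilities are estimated with Laplace smoothing, and the score is taken as the mean polarity over {-1, 0, +1}, i.e. p(positive) - p(negative). The function name `emoji_sentiment` and its parameters are hypothetical, and only single-code-point emojis are handled (multi-code-point sequences such as flags are ignored).

```python
from collections import Counter, defaultdict

# The three annotation labels; polarity values -1, 0, +1 are assumed.
LABELS = ("negative", "neutral", "positive")


def emoji_sentiment(tweets, emojis):
    """Hypothetical helper: return {emoji: (occurrences, sentiment_score)}.

    tweets: iterable of (text, label) pairs, with label in LABELS.
    emojis: set of single-code-point emoji characters to track
            (assumption: multi-code-point sequences are not handled).
    """
    counts = defaultdict(Counter)
    for text, label in tweets:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        for ch in text:
            if ch in emojis:
                # Each occurrence inherits the sentiment label of its tweet.
                counts[ch][label] += 1

    k = len(LABELS)
    result = {}
    for emoji, c in counts.items():
        n = sum(c.values())
        # Laplace-smoothed class probabilities (assumed estimator).
        p = {lab: (c[lab] + 1) / (n + k) for lab in LABELS}
        # Mean polarity over {-1, 0, +1} reduces to p(positive) - p(negative).
        result[emoji] = (n, p["positive"] - p["negative"])
    return result


# Toy usage with made-up tweets (not data from the paper).
tweets = [
    ("Great day \U0001F600", "positive"),
    ("ok \U0001F600", "neutral"),
    ("so sad \U0001F622", "negative"),
]
ranking = emoji_sentiment(tweets, {"\U0001F600", "\U0001F622"})
for emoji, (n, score) in sorted(ranking.items(), key=lambda kv: -kv[1][1]):
    print(f"U+{ord(emoji):X}", n, round(score, 3))
```

In this toy input, U+1F600 occurs once in a positive and once in a neutral tweet, so its smoothed score is (1 - 0) / (2 + 3) = 0.2. U+1F622 occurs once in a negative tweet and scores (0 - 1) / (1 + 3) = -0.25. Smoothing keeps rarely seen emojis from receiving extreme scores on the basis of a single tweet.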

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Emotions*
  • Europe
  • Humans
  • Internet
  • Social Media*
  • Terminology as Topic

Grants and funding

This work was supported in part by the EC projects SIMPOL (no. 610704), MULTIPLEX (no. 317532) and DOLFINS (no. 640772), and by the Slovenian ARRS programme Knowledge Technologies (no. P2-103).