What effect do emojis have on our languages? Will they allow us to convey tone of voice in text? Have you ever wondered how heart and poop emojis work grammatically?
Social media, from a linguistic point of view, can shed some light on this. But first, let’s explore how emojis relate to sentiment analysis.
Emojis are a wide range of symbols, including faces, hearts, places, smiling poops, glasses, knives, fires, pets, vegetables, pizzas and maki rolls, which more and more people use together with (and often instead of) words on social media. They are also popular in instant messaging and, to a lesser extent, in email. They are icons interacting with grammar, and this draws attention to some theoretical issues in language studies that are worth exploring.
Understanding emojis beyond the abstraction
Although theoretical, the challenge emojis represent for Natural Language Processing (NLP) is anything but abstract. Emojis are a major phenomenon in social media, and for brands and companies, keeping up with their evolution is not a simple task.
For one, there’s the stigma. According to J. Jones, writing in The Guardian, the “brainless little icons” are signs of illiteracy, and Jones promises he will still be speaking the language of Shakespeare when the post-emoji-apocalypse world falls apart.
But emojis aren’t going away.
The Unicode Consortium is a non-profit organization responsible for the design, unification and approval of emojis, so that all the different computing platforms can interpret them consistently. The latest release, Unicode Version 8.0, came out in June 2015 and brought a large batch of new emojis.
According to eMarketer, over 6 billion emoticons are sent every day from 2 billion smartphones, in text messages alone.
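For NLP, what standardization means in practice is that every emoji is stored as one or more Unicode code points, and what looks like a single symbol can be several of them. The minimal sketch below uses Python's standard `unicodedata` module to list the code points behind a string; the helper name `code_points` is hypothetical, introduced only for illustration.

```python
import unicodedata

def code_points(text):
    """Return (U+XXXX, Unicode character name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# The red heart is usually sent as HEAVY BLACK HEART followed by an
# invisible variation selector that requests the colorful emoji rendering.
for cp, name in code_points("\u2764\ufe0f"):
    print(cp, name)

# A single-code-point emoji, for comparison.
print(code_points("\U0001F4A9"))
```

One consequence for text processing: a plain character count says little about how many emojis a message contains, since `len("\u2764\ufe0f")` is 2.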
Emojis around the world
According to SwiftKey, the French mostly “❤️-fy” (they use the heart emoji four times more than speakers of any other language), Arabic speakers “speak” of hot weather and plants (sprouts, in particular), and Spanish speakers party-emojify a lot.
In over a billion pieces of emoji data, gathered over four months between October 2014 and January 2015 in 16 languages, emojis were found to convey more positive than negative sentiment. The trend holds in all languages (70% positive vs. 15% negative). French speakers are the happiest, so to speak, with 86% positive sentiment, mostly expressed with the heart emoji, while US Spanish speakers win the negative prize with 22%. Canadians use raunchy emojis the most; relatedly, US English speakers use the eggplant generously, while Italians use bananas instead in these contexts.
For Twitter, there is a mesmerizing tool that shows in real time which emojis are being used: emojitracker.
Towards a definition
What can sentiment analysis do with these observations? First of all, to handle emojis in NLP, we need to define what they have to do with language.
They are iconic entities, pieces of extra-linguistic material. According to one of the founders of modern linguistics, F. de Saussure, natural language consists of linguistic signs. Every sign has two inseparable sides: the Signifier, the “shape” of a word (its sound component, or the sequence of letters/ideograms/logograms), and the Signified, the concept evoked in the mind when we hear or read the signifier. The relation between them is arbitrary, that is, it has no reason or justification outside language.
In this sense, Mr. Jones is right. Emojis are a fairly primitive device, possibly proto-linguistic, more like cave drawings than sonnets. Emojis have little in common with natural language per se: they are not recursive, they cannot be organized into rules that produce infinitely many other emojis, and they are not grammatically distinctive.
In this respect, the case of the color of the heart emoji is very telling. In an Instagram blog post, the hearts 💙💚💛💜💖💗💌 were analyzed semantically through vector representations. The technique used, word2vec, learns to predict the context around an emoji or a word while reading through text in skip-gram mode.
Emojis and hashtags are placed together with words in a common metric space with well-defined distances between elements, so that similar items end up close to each other. The vectors of floating-point numbers were learned with the Gensim library, which reimplements word2vec, resulting in a 100-dimensional representation for both words and emojis.
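In skip-gram mode, word2vec is trained on (center, context) pairs: each token, whether an emoji, a hashtag or a word, is asked to predict the tokens within a fixed window around it. The sketch below only generates those training pairs in plain Python; it is not Gensim's implementation, and the window size of 2 and the sample sentence are assumptions chosen for illustration. In Gensim 4 itself, skip-gram is selected with `Word2Vec(..., sg=1)`, and `vector_size` (100 by default; `size` in older releases) sets the dimensionality.

```python
def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Emojis and hashtags are treated exactly like words (hypothetical sentence).
sentence = ["happy", "st", "patricks", "day", "💚", "🍀", "#stpats"]
for center, context in skipgram_pairs(sentence):
    if center == "💚":
        print(center, "->", context)
```

Because 💚 keeps appearing next to tokens like 🍀 and #stpats, training pushes its vector toward theirs, which is what makes the distances in the shared space meaningful.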
Back to the ❤️ of the matter: if the red heart is subtracted from each of the other hearts, each color shows its own pattern.
For instance, 💙 minus ❤️ tends to associate with #goblue, #letsgoduke, #bleedblue, #ibleedblue, etc.; 💚 minus ❤️ with #gogreen, loyals, #herballife, #happysaintpatricksday, 🍏, #stpats, 🍀, #jointhemovement, green, #hairskinnails, #happystpatricksday; 💛 minus ❤️ with 🌱, 🍊, #springhassprung, 🔆, #springiscoming, #springishere, #aprilshowers, #thinkspring, #hellospring, 🌻, #wildflower, #happyearthday, and so on.
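The “minus” in these lists is ordinary vector arithmetic: subtract the ❤️ vector from, say, the 💙 vector, then rank the rest of the vocabulary by cosine similarity to the difference. The sketch below runs that procedure on tiny, made-up 3-dimensional vectors (the real ones had 100 dimensions and were learned from data), so only the method, not the numbers, reflects the Instagram analysis; the function names are hypothetical.

```python
import math

RED_HEART = "\u2764\ufe0f"  # ❤️ (heart + variation selector)

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_to_difference(vectors, plus, minus, k=2):
    """Rank the other items by cosine similarity to vectors[plus] - vectors[minus]."""
    target = [a - b for a, b in zip(vectors[plus], vectors[minus])]
    candidates = [w for w in vectors if w not in (plus, minus)]
    candidates.sort(key=lambda w: cosine(vectors[w], target), reverse=True)
    return candidates[:k]

# Hypothetical toy embeddings: dim 0 ~ "love", dim 1 ~ "blue/sports", dim 2 ~ "green/spring".
toy = {
    RED_HEART: [1.0, 0.0, 0.0],
    "💙": [1.0, 0.9, 0.0],
    "💚": [1.0, 0.0, 0.9],
    "#goblue": [0.1, 1.0, 0.0],
    "#bleedblue": [0.2, 0.9, 0.1],
    "🍀": [0.1, 0.0, 1.0],
    "#stpats": [0.2, 0.1, 0.9],
}

print(nearest_to_difference(toy, "💙", RED_HEART))  # ['#goblue', '#bleedblue']
print(nearest_to_difference(toy, "💚", RED_HEART))  # ['🍀', '#stpats']
```

Subtracting the red heart removes the shared “love” component, so what remains is whatever makes each colored heart distinctive.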
The colors clearly differentiate each heart from the others. However, there is no formal reason why a green heart could not correspond to, say, “spring is coming.” It could, and it might. In language, by contrast, spring and sprint are discrete and grammatically distinctive, even though only their final sounds differ. Sometimes what distinguishes a minimal pair is even less than a whole phoneme, as in pan vs. ban, where only the voicing of the first consonant differs.
Emojis in natural language
We can say that emojis have a function but, unlike language, no form. Yet they do mix and match with language, a bit like gestures or facial expressions do, a bit like tone of voice does: ambiguously, in a context-dependent fashion, and depending on background and culture, as the eggplants vs. Italian bananas above show.
How many times have we added a smiley to an email because we thought we sounded aggressive? Or we simply add a smiling face to show pleasure in what we are writing, like smiling while we talk.
1) English
a. I got a cute tan😍
2) Spanish
a. Que lindo dormiiiii😍😍
‘I slept so well 😍😍’
b. Shawn mi amor😍❤
‘Shawn my love 😍❤’
c. ❤ Burger King SAN LUIS ❤
d. Sushiii 😍 Nos encanta!
‘Sushi 😍 we love it’
e. Bon Jovi 💞
3) French
a. On a mangé chez Dom ce midi👍🏼
‘We ate at Dom’s at noon👍🏼’
On the other hand, the data are full of emojis substituting for words rather than accompanying them. These emojis become lexical units and therefore give a hard time to linguistic purists who claim that iconicity never enters language. Though non-formal material, they are mixed with grammar. If we compare them to onomatopoeia, like purr, which indeed imitates the purring of cats, the surprise is minimal. Purr, originally an iconic sound, became a word, was verbed and is then inflected (purrs, purred). Our emojis undergo an analogous treatment:
4) English
a. I ❤️ you
b. Today is 💩
5) French
a. J’ ❤️ m’exhiber
‘I ❤️ exposing myself’
6) German
a. Meiner steht immer mit der 🔫 vor der Tür 😳
‘Mine always stands in front of the door with the 🔫 😳’
7) Portuguese
a. To 💩 e 🚶 pra você
‘I’m 💩-ing and 🚶-ing for you’, a Brazilian idiom meaning ‘I couldn’t care less about you’
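Treating emojis as lexical units, as in examples 4 to 7, starts with tokenization: each emoji has to come out as a token of its own, with any variation selector or skin-tone modifier kept attached. The sketch below is a rough approximation using only Python's standard `re` module; the code-point ranges are an assumption that covers most, but not all, emojis, and zero-width-joiner sequences (such as family emojis) are not handled. The names `tokenize` and `is_emoji` are hypothetical.

```python
import re

# Approximate emoji ranges (assumption): U+1F300..U+1FAFF plus the
# Miscellaneous Symbols and Dingbats blocks (U+2600..U+27BF).
EMOJI = (
    r"[\U0001F300-\U0001FAFF\u2600-\u27BF]"
    r"(?:\uFE0F|[\U0001F3FB-\U0001F3FF])*"  # keep variation selector / skin tone attached
)
TOKEN = re.compile(EMOJI + r"|\w+|[^\w\s]")  # emoji, word, or single punctuation mark
EMOJI_RE = re.compile(EMOJI)

def tokenize(text):
    """Split text into words, emojis and punctuation marks."""
    return TOKEN.findall(text)

def is_emoji(token):
    """True if the whole token is one (possibly modified) emoji."""
    return EMOJI_RE.fullmatch(token) is not None

tokens = tokenize("Non ci hai portato il 🍷col 🍔! 😭")
print(tokens)
print([t for t in tokens if is_emoji(t)])
```

Once emojis are separate tokens, a parser or rule system can treat 🍷 in “il 🍷col 🍔” as the noun it replaces, rather than as noise glued to the next word.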
Emojis even get compounded. They therefore become not only words but also constituents in word-formation processes, entering the morphology of the language.
8) English
a. 💩 🐃 🐃 🐃 💩💩BULL SHIT
Strictly speaking, one could argue that example 8 is less a compound than a visual representation. This also happens a lot: emojis can simply be figurative.
9) French
a. le réveille 💣💣💣💣💣💣
‘the alarm clock 💣💣💣💣💣💣’
b. J’ai envie de marcher alors c’est ce que je vais faire 👉🚶 😘!
‘I feel like walking so that’s what I am going to do 👉🚶 😘!’
10) German
a. heute so: 🔥👅💦💯 alle tage danach: 😷💩
‘today like this: 🔥👅💦💯 all the following days: 😷💩’
b. Englischer Garten 🌳🌷 #münchen
‘English Garden 🌳🌷 #munich’
11) Spanish
a. Traductor Google= 💩
‘Google Translate = 💩’
Emojis are so versatile that they can combine all the possibilities shown so far, for example visual representation with tone of voice:
12) Italian
a. 🐟Salmone😋
‘🐟Salmon😋’
b. Viaggio verso Machupicchu ❤️❤️❤️❤️Peru’ 🚂 🎶🎶🎶🔴⚪️🔴
‘Trip to Machu Picchu ❤️❤️❤️❤️Peru 🚂🎶🎶🎶🔴⚪️🔴’
c. Non vedo l’ora di andare in Sardegna con loro💕👫💏🌊🔝
‘I can’t wait to go to Sardinia with them 💕👫💏🌊🔝’
13) Spanish
a. 1 año ya😓👴🏼
‘1 year already 😓👴🏼’
Another instance is lexical use with tone of voice:
14) English
a. Taco bell 🔥sauce 💯💯
15) Italian
a. Non ci hai portato il 🍷col 🍔! 😭
‘You didn’t bring us the 🍷 with the 🍔! 😭’
The English example definitely expresses positive sentiment about Taco Bell fire sauce!
For sentiment analysis, once we have an idea, albeit a multifaceted one, of what emojis are, we can use them in NLP modules. In fact, they are already part of the NetBase rule system.
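As a rough illustration of how emojis can feed a sentiment module, the sketch below scores a text by summing polarities from a small emoji lexicon, then computes positive/negative shares over a handful of the examples above, the same kind of figure as the 70% vs. 15% split reported earlier. The lexicon, its values and the function names are hypothetical and chosen for this example; this is not the NetBase rule system. Variation selectors and skin-tone modifiers are stripped before lookup, so 👍🏼 counts as 👍.

```python
import re
from collections import Counter

# Hypothetical polarity lexicon, keyed on base emojis (no selector or skin tone).
EMOJI_POLARITY = {
    "\u2764": 1,       # red heart
    "\U0001F60D": 1,   # smiling face with heart-shaped eyes
    "\U0001F44D": 1,   # thumbs up
    "\U0001F4AF": 1,   # hundred points
    "\U0001F62D": -1,  # loudly crying face
    "\U0001F4A9": -1,  # pile of poo
    "\U0001F613": -1,  # face with cold sweat
}

# Variation selector-16 and the five skin-tone modifiers.
MODIFIERS = re.compile(r"[\uFE0F\U0001F3FB-\U0001F3FF]")

def emoji_sentiment(text):
    """Classify text as 'positive', 'negative' or 'neutral' from its emojis only."""
    base = MODIFIERS.sub("", text)
    score = sum(EMOJI_POLARITY.get(ch, 0) for ch in base)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

examples = [
    "Taco bell \U0001F525sauce \U0001F4AF\U0001F4AF",               # example 14
    "Traductor Google= \U0001F4A9",                                  # example 11
    "On a mang\u00e9 chez Dom ce midi\U0001F44D\U0001F3FC",          # example 3
    "Englischer Garten \U0001F333\U0001F337 #m\u00fcnchen",          # example 10b
]
for text in examples:
    print(emoji_sentiment(text), "|", text)

counts = Counter(emoji_sentiment(t) for t in examples)
share = {label: n / len(examples) for label, n in counts.items()}
print(share)
```

A flat lexicon is a deliberate simplification: as the eggplant and 🔥 cases show, many emojis change polarity with context, which is why production systems combine them with rules or models rather than fixed scores.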
Brand names have always played with the delicate and subtle interaction between verbal and non-verbal language, with sounds and sound symbolism. Brands have also chosen to evoke exotic settings: I. Piller has studied how brand names are “manufactured” to sound foreign, in order to appeal to an audience that speaks a specific register, or simply to be syntactically marked.
Then the exploration was pushed further.
J. Lukin, Senior Director at Oreo North America, recently explained how crucial it is to speak like the fans. “So much meaning can be communicated with a single emoji, and we’ve been able to tap into that by using emojis to share our message of seeing the world with openness and curiosity.”
In fact, Oreo launched one of its most successful campaigns by exploiting the power of emojis. An Oreo-branded account was set up on WeChat, where an app let people take family pictures with their children and paste their faces into emoji templates. Moreover, Oreo bus shelters in the biggest cities let users project their emojis onto the shelters’ screens.
Another example is GE, which came up with the #EmojiScience project. People would send GE their favorite emoji and, in return, receive a video displaying a scientific experiment evoked by that emoji. Also, an educational “Emoji Table of Experiments” was put in place to win people’s love for chemistry – and for GE.
Social media offers plenty of cases of brands exploiting this potential. Bud Light tweeted an all-emoji American flag in 2014, while Ikea, Burger King and Comedy Central branded their own emojis. Coca-Cola uses “emoticokes” and Mentos speaks with “ementicons.” James Irish Whisky did something similar.
In sum, emojis are here to stay. At the dawn of time, we communicated through gestures, expressions, symbols, and other icons. Then the Verb came, to paraphrase the Bible, and it was the beginning: the Beginning of Natural Language. Structure became the ruling force of our communication. But icons never disappeared. We kept purring, murmuring, and mumbling, and, in the end, emojis boomed back into verbal exchanges.
Langacker, a founder of Cognitive Linguistics, defines language as a “compacting machine.” Icons serve expressivity. If, for practical, sarcastic or poetic reasons, we need such a device and we have it, we will use it creatively. Shakespeare may not have mixed images and grammar, but other poets did: some of Apollinaire’s poems are written as calligrams.
Well, if Apollinaire could draw-write that he loved Lou, we can say we 💘 Apollinaire. And NetBase NLP, both for English and Multilingual, already understands our love.
Piller, I. (1999). “Iconicity in Brand Names.” In M. Nänny & O. Fischer (eds.), Form Miming Meaning: Iconicity in Language and Literature. Amsterdam: Benjamins, 325-341.
Langacker, R. W. (1977). “Syntactic Reanalysis.” In C. N. Li (ed.), Mechanisms of Syntactic Change. Austin: University of Texas Press, 57-139.
Apollinaire, G. (1918). Poèmes de paix et de guerre. “Flèche saignante. Je porte au cœur une blessure ardente et elle me vient de toi ma Lou. Lou m’a percé le cœur et j’aime Lou” (“Bleeding arrow. I carry in my heart a burning wound, and it comes from you, my Lou. Lou has pierced my heart and I love Lou”).
Image from Wicker Paradise