iamcal.com


Using Emoji in Web Apps

By Cal Henderson, October 21st 2009.

[Image: screenshot of the Emoji keyboard on the iPhone]

If you haven't been locked in a small box for the past year, then you'll be familiar with the screenshot above: the Emoji keyboard on the iPhone. If you haven't unlocked yours yet, just go into the App Store and search for 'emoji' to find an app.

Emoji is the Japanese term for small emoticon pictures. The word is made up of 'e' (picture) and 'moji' (letter), so it literally means 'picture letter'. Emoji have seen widespread adoption in the Japanese mobile market, with multiple carriers letting users send these pictograms in text messages and emails and display them on web pages.

This is achieved via the Unicode Private Use Area, a large range of code points whose meanings vendors are free to define themselves. This is all well and good, but of course nobody agrees on which code point should mean which picture.
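Because every vendor's emoji live in private-use code points, a useful first step is simply spotting such characters in incoming text. A minimal Python sketch, assuming the text has already been decoded from UTF-8: the standard `unicodedata` module reports private-use characters with the general category `Co`. The function names are hypothetical.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """True for characters in a Unicode Private Use range (category 'Co'),
    such as the BMP Private Use Area U+E000..U+F8FF used by vendor emoji."""
    return unicodedata.category(ch) == "Co"

def private_use_chars(text: str) -> list:
    """Return the private-use characters in text, formatted as U+XXXX."""
    return ["U+%04X" % ord(ch) for ch in text if is_private_use(ch)]

print(private_use_chars("hi \ue531!"))  # ['U+E531']
```

Note that finding U+E531 only tells us a vendor-defined character is present; which picture it stands for still depends on which vendor sent it.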

For the most part, the players in the space were NTT DoCoMo, KDDI/Au and Softbank/Vodafone. More recently, Apple have added emoji support to the iPhone, and Google to the Android mobile operating system. Apple chose to use the Softbank/Vodafone mappings, but all of the others are distinct, giving us four different standards.

Using Emoji

Exchanging little pictures by SMS is pretty fun for a while, but why should that fun be limited to SMS? If we're building web apps that receive data from mobile devices or display data on them, we should be able to leverage these little critters. We can even output them ourselves as iconography, without the cost of creating and serving image files.

But this isn't trivial - different devices use different code points (an iPhone can't display a smiley face entered on an Android phone) and people will be using your web app with a regular desktop browser too. Luckily, there's a reasonable solution to all these problems.

The Unified Set

There's a Unicode proposal from folks at Google and Apple to standardize emoji into a single set of code points. While these code points haven't been formalized yet, the proposal includes a mapping between the unified set and each of the four vendor schemes. By using the unified set internally within our application, we can convert from any input scheme to any output scheme.

At input time, we need to know where the data has come from, since some of the sets overlap: both Softbank's 'frog face' and KDDI's 'small white square' are represented by U+E531. If we convert from the native encoding to the unified encoding before storing the text, we don't need to store the source format alongside it. At output time, we can detect which platform we're delivering content to and convert from the unified set to that platform's set.
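Here is a minimal Python sketch of that round trip, with the unified set acting as a pivot. The native entries are the U+E531 clash from the paragraph above; the unified code points they map to (U+1F438 and U+25AB) and all table and function names are assumptions for illustration, not the proposal's actual data or the PHP library's API.

```python
# Hypothetical excerpt of the per-platform mapping tables. The native code
# point U+E531 comes from the article; the unified targets are assumed.
NATIVE_TO_UNIFIED = {
    "softbank": {"\ue531": "\U0001F438"},  # Softbank 'frog face'
    "kddi": {"\ue531": "\u25ab"},          # KDDI 'small white square'
}

# Output tables are built as the inverses of the input tables.
UNIFIED_TO_NATIVE = {
    platform: {unified: native for native, unified in table.items()}
    for platform, table in NATIVE_TO_UNIFIED.items()
}

def to_unified(text: str, source: str) -> str:
    """Input time: the caller must say which platform the text came from,
    because the same code point means different things on different platforms."""
    return text.translate(str.maketrans(NATIVE_TO_UNIFIED[source]))

def from_unified(text: str, target: str) -> str:
    """Output time: convert unified code points to the target platform's set.
    Characters with no entry are left unchanged in this sketch."""
    return text.translate(str.maketrans(UNIFIED_TO_NATIVE[target]))

stored = to_unified("ribbit \ue531", "softbank")  # only this is stored
print(ascii(from_unified(stored, "softbank")))
```

Because only the unified text is stored, the source platform never needs to be saved. A fuller version would also need a rule for unified characters that the target platform lacks.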

Not all of the platforms' emoji sets are equal, though: some symbols appear in only one set, or appear more than once. For instance, KDDI has an emoji called 'large white square' with a code point of U+E548. DoCoMo doesn't have any equivalent at all, while Softbank has U+E21B, 'white rounded square', which can be used in its place. The proposal includes one-way mappings for some missing characters and replacement text for others, so while not every emoji is supported on every platform, there is a recommended replacement for each one.
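One way to express that fallback order is to try an exact mapping first, then a one-way substitute, then replacement text. This Python sketch uses the 'large white square' example: U+E548 (KDDI) and U+E21B (Softbank) come from the article, while the unified code point U+2B1C, the replacement string and all names are assumptions for illustration.

```python
# Assumed unified code point for 'large white square'.
LARGE_WHITE_SQUARE = "\u2b1c"

# Exact equivalents per target platform (KDDI's own 'large white square').
EXACT = {"kddi": {LARGE_WHITE_SQUARE: "\ue548"}}
# One-way substitutes: Softbank's 'white rounded square' stands in on output,
# but the reverse mapping is not implied.
ONE_WAY = {"softbank": {LARGE_WHITE_SQUARE: "\ue21b"}}
# Hypothetical replacement text for platforms with no equivalent (DoCoMo).
REPLACEMENT_TEXT = {LARGE_WHITE_SQUARE: "[white square]"}

def convert_char(ch: str, target: str) -> str:
    """Pick the best available representation of ch on the target platform."""
    for table in (EXACT, ONE_WAY):
        mapped = table.get(target, {}).get(ch)
        if mapped is not None:
            return mapped
    # Fall back to replacement text; ordinary characters pass through.
    return REPLACEMENT_TEXT.get(ch, ch)

def convert_text(text: str, target: str) -> str:
    return "".join(convert_char(ch, target) for ch in text)

for platform in ("kddi", "softbank", "docomo"):
    print(platform, ascii(convert_text("box: \u2b1c", platform)))
```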

Back to the desktop

That covers phones, but what should we do if someone enters emoji on their device and we want to display them to non-mobile users? The unified set consists of Unicode code points, and some fonts do define glyphs for them, but those glyphs are monochrome and are missing from most common web fonts.

With a smart bit of output-processing, we can replace characters from the unified set with HTML to show inline images. At display time, our code can search for instances of the UTF-8 byte sequence 0xE2 0x98 0xBA (known to friends as U+263A, the smiley face) and replace them with <img src="/emoji/smiley_face.gif" class="emoji" />, or whatever HTML we wish to use.
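A minimal Python sketch of that output step. It works on decoded text, where the three UTF-8 bytes 0xE2 0x98 0xBA are the single character U+263A, and it HTML-escapes the text before inserting markup so that user content can't smuggle in its own tags. The image table and function name are hypothetical; the `<img>` markup is the one suggested above.

```python
import html
import re

# Hypothetical table: unified code point -> image file under /emoji/.
EMOJI_IMAGES = {
    "\u263a": "smiley_face.gif",  # U+263A, the smiley face
}

# One compiled pattern matching any known emoji, so the text is scanned once.
EMOJI_PATTERN = re.compile("|".join(re.escape(ch) for ch in EMOJI_IMAGES))

def emoji_to_html(text: str) -> str:
    escaped = html.escape(text)  # escapes &, <, > and quotes; emoji untouched
    return EMOJI_PATTERN.sub(
        lambda m: '<img src="/emoji/%s" class="emoji" />' % EMOJI_IMAGES[m.group(0)],
        escaped,
    )

print(emoji_to_html("Hello \u263a <b>"))
```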

Speed Concerns

The unified set is pretty big, covering 772 characters. Searching for all of them at display time could take a couple of milliseconds each time, so for heavily trafficked applications we might need to start worrying about performance. Luckily, there are a few tricks.

Assuming most of your content doesn't contain emoji, you can add an extra field called contains_emoji to your database rows. By setting it at insert time, you only need to spend time transcoding emoji at output time when there are actually some in there.
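A minimal sketch of the contains_emoji flag, using Python's built-in `sqlite3` module and an in-memory database. The column name is from the article; the table layout, the one-entry emoji set and the `render()` stand-in for the real transcoding step are assumptions for illustration.

```python
import sqlite3

# Hypothetical: the unified emoji code points (the real set has 772; one here).
UNIFIED_EMOJI = {"\u263a"}

def has_emoji(text: str) -> bool:
    return any(ch in UNIFIED_EMOJI for ch in text)

def render(text: str) -> str:
    """Stand-in for the expensive display-time transcoding step."""
    return text.replace("\u263a", '<img src="/emoji/smiley_face.gif" class="emoji" />')

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, contains_emoji INTEGER)"
)

def save_post(body: str) -> int:
    # Pay the scanning cost once, at insert time.
    cur = db.execute(
        "INSERT INTO posts (body, contains_emoji) VALUES (?, ?)",
        (body, int(has_emoji(body))),
    )
    return cur.lastrowid

def show_post(post_id: int) -> str:
    body, contains_emoji = db.execute(
        "SELECT body, contains_emoji FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    # Only transcode rows that were flagged when they were saved.
    return render(body) if contains_emoji else body
```

For example, `show_post(save_post("no pictures here"))` skips `render()` entirely, while a post containing U+263A goes through it.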

If your application has lots and lots of emoji content, then storing multiple versions of your content in different emoji formats might make sense. If you know you're primarily serving DoCoMo users, store both a unified copy and a DoCoMo copy. When you need to serve DoCoMo, use that version; otherwise, convert from the unified version.
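Sketched in Python, the idea is to precompute copies for the platforms you serve most and fall back to converting the unified copy for everyone else. The row layout, the `body_<platform>` key names and the `convert` callback (whatever unified-to-platform function your application uses) are hypothetical.

```python
from typing import Callable, Dict, Iterable

Convert = Callable[[str, str], str]  # (unified_text, platform) -> platform text

def make_row(unified_text: str, convert: Convert,
             cached_platforms: Iterable[str] = ("docomo",)) -> Dict[str, str]:
    """At write time, store the unified text plus precomputed copies."""
    row = {"body_unified": unified_text}
    for platform in cached_platforms:
        row["body_" + platform] = convert(unified_text, platform)
    return row

def body_for(row: Dict[str, str], platform: str, convert: Convert) -> str:
    """At read time, use a stored copy if there is one, else convert."""
    cached = row.get("body_" + platform)
    if cached is not None:
        return cached
    return convert(row["body_unified"], platform)
```

The trade-off is storage for CPU: each cached platform adds a copy of every row, in exchange for skipping conversion on the most common reads.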

A library to make it all happen

I've put together a simple PHP library to abstract all of the messiness away. Just include it in your application and start detecting emoji right away. Instructions are included in the readme file.

Post-Script

While I'm a bit of a Unicode nerd, I'm no expert. The above article may contain mistakes or fail to mention very important things. If you spot any glaring mistakes or omissions, drop me an email and teach me: cal [at] iamcal.com

Copyright © 2009 Cal Henderson.

The text of this article is all rights reserved. No part of these publications shall be reproduced, stored in a retrieval system, or transmitted by any means - electronic, mechanical, photocopying, recording or otherwise - without written permission from the publisher, except for the inclusion of brief quotations in a review or academic work.

All source code in this article is licensed under the GPL v3.

13 people have commented

loranger
# December 17, 2009 - 2:16 pm PST
You did an awesome job by listing all the emoji codes. Thank you
Jean-Paul Horn
# December 25, 2009 - 2:39 pm PST
Is there any chance this could be made into a WordPress plugin? A lot of our users are putting emoji in their comments, and this would be a very welcome plugin. I'd like to commission this if it's feasible (please contact me by email to discuss).
Chris
# December 27, 2009 - 1:47 pm PST
People using Chrome can also try Emoji for Chrome, which we developed.

Install the extension and you can see emoji on Twitter or other websites.

chrome.google.com/extensions/detail/fbgkphlalcbmifhkabdbodaghlhfcbbd

twitter.com/chrome_emoji
iPhoned
# February 24, 2010 - 11:39 am PST
Great way to standardize the Emoji, this PHP library is very worthfull. Have not yet seen the Emoij within the sms/email application of the (standard)iPhone software, but this is great step forward.
gumgl
# March 28, 2010 - 8:48 am PST
GREAT SUCCESS!

I was trying to encode emojis in my sql database using htmlentities() but it wasn't working.
Thank you for this library.
ccliana
# July 10, 2010 - 11:39 am PST
what do the japanese symbols in the colored squares mean?
Cal
# July 12, 2010 - 9:50 am PST
ccliana: They're all explained on this page: www.unicode.org/~scherer/emoji4unicode/20090804/utc.html
Timothy
# July 15, 2010 - 2:18 pm PST
First attack of the grammarnazi - it's worthwhile iPhoned. No need to make new words when established ones will suffice. :D

I'm looking forward to finding a way to use this in a web app.

Presuming that the squares that show up on phones when you send iPhone emojis to other models have some code behind them, surely it would be possible to write an Android app to receive/decode iPhone emojis and vice versa using the info from this page?
Timothy
# July 15, 2010 - 2:20 pm PST
Just so that I don't get accused of falling short of my own standards I'd like to point out that the first sentence of my last comment should read:

First: Attack of the Grammarnazi! It's "worthwhile", iPhoned.
William
# July 19, 2010 - 7:25 am PST
I used emoji on my iPod touch and it broke. I now play those games and use PalRingo on my Droid X. How do I get the emoji to show up? Thank you for your assistance.
chuck
# July 23, 2010 - 3:25 pm PST
Great job! Just used this to render emoji images on a team site for an iPhone game. Thanks a million!
Dao Hoang Son
# August 9, 2010 - 3:13 pm PST
Anybody knows which sets are being used in Facebook?
wiley
# August 26, 2010 - 3:10 pm PST
These are really handy when you have an embeddable twitter feed and you like to use emoji when tweeting from your phone.

There's also a russian jquery plugin out there that attempts to do the same thing, but it's a little quirky and I needed to tinker with it a while before it worked for me.

plugins.jquery.com/project/emoji
