Storing hundreds of millions of simple key-value pairs in Redis

When transitioning systems, sometimes you have to build a little scaffolding. At Instagram, we recently had to do just that: for legacy reasons, we need to keep around a mapping of about 300 million photos back to the user ID that created them, in order to know which shard to query (see more info about our sharding setup). While eventually all clients and API applications will have been updated to pass us the full information, there are still plenty who have old information cached. We needed a solution that would:

  1. Look up keys and return values very quickly
  2. Fit the data in memory, and ideally within one of the EC2 high-memory types (the 17GB or 34GB, rather than the 68GB instance type)
  3. Fit well into our existing infrastructure
  4. Be persistent, so that we wouldn’t have to re-populate it if a server died

One simple solution to this problem would be to store them as rows in a database, with “Media ID” and “User ID” columns. However, a SQL database seemed like overkill given that these IDs were never updated (only inserted), didn’t need to be transactional, and didn’t have any relations with other tables.

Instead, we turned to Redis, an advanced key-value store that we use extensively here at Instagram (for example, it powers our main feed). Redis is a key-value Swiss Army knife; rather than just normal “Set key, get key” mechanics like Memcached, it provides powerful aggregate types like sorted sets and lists. It has a configurable persistence model, where it saves to disk in the background at a specified interval, and it can be run in a master-slave setup. All of our Redis deployments run in master-slave, with the slave set to save to disk about every minute.

At first, we decided to use Redis in the simplest way possible: for each ID, the key would be the media ID, and the value would be the user ID:

SET media:1155315 939
GET media:1155315
> 939

While prototyping this solution, however, we found that Redis needed about 70 MB to store 1,000,000 keys this way. Extrapolating to the 300,000,000 we would eventually need, it was looking to be around 21GB worth of data—already bigger than the 17GB instance type on Amazon EC2.

We asked the always-helpful Pieter Noordhuis, one of Redis’ core developers, for input, and he suggested we use Redis hashes. Hashes in Redis are dictionaries that can be encoded in memory very efficiently; the Redis setting ‘hash-max-zipmap-entries’ configures the maximum number of entries a hash can have while still being encoded efficiently. We found this setting was best around 1000; any higher and the HSET commands would cause noticeable CPU activity. For more details, you can check out the zipmap source file.

To take advantage of the hash type, we bucket all our Media IDs into buckets of 1000 (we just take the ID, divide by 1000 and discard the remainder). That determines which key we fall into; next, within the hash that lives at that key, the Media ID is the lookup key *within* the hash, and the user ID is the value. For example, a Media ID of 1155315 falls into bucket 1155 (1155315 / 1000 = 1155, discarding the remainder):

HSET "mediabucket:1155" "1155315" "939"
HGET "mediabucket:1155" "1155315"
> "939"
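
The bucketing step above can be sketched in Python. This is a minimal illustration, not the code we run in production: the names bucket_for, set_owner, get_owner and the InMemoryHashes class are hypothetical, and the in-memory class only stands in for a Redis client so the example runs without a server. Any client object exposing hset(name, key, value) and hget(name, key) could be passed in its place.

```python
BUCKET_SIZE = 1000  # bucket size described in the post


def bucket_for(media_id, bucket_size=BUCKET_SIZE):
    """Return the (hash key, field) pair for a media ID."""
    bucket = media_id // bucket_size  # integer division discards the remainder
    return "mediabucket:%d" % bucket, str(media_id)


class InMemoryHashes:
    """Hypothetical stand-in for a Redis client; keeps hashes in a dict."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = value

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)


def set_owner(client, media_id, user_id):
    """Store the user ID under the media ID's bucket hash."""
    key, field = bucket_for(media_id)
    client.hset(key, field, str(user_id))


def get_owner(client, media_id):
    """Look up the user ID for a media ID, or None if it is unknown."""
    key, field = bucket_for(media_id)
    return client.hget(key, field)


client = InMemoryHashes()
set_owner(client, 1155315, 939)
print(bucket_for(1155315))         # ('mediabucket:1155', '1155315')
print(get_owner(client, 1155315))  # 939
```

With this scheme, a contiguous run of 1,000,000 IDs starting at a multiple of 1000 lands in exactly 1,000 hashes of 1,000 fields each.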

The size difference was pretty striking; with our 1,000,000 key prototype (encoded into 1,000 hashes of 1,000 sub-keys each), Redis needed only 16MB to store the information. Expanding to 300 million keys, the total is just under 5GB, which, in fact, even fits in the much cheaper m1.large instance type on Amazon, at about 1/3 of the cost of the larger instance we would have needed otherwise. Best of all, lookups in hashes are still O(1), making them very quick.
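
For reference, the extrapolation from the 1,000,000 key prototype to the full data set is simple linear scaling. The sketch below uses the measurements quoted in this post (70MB for plain keys, 16MB for hashes) and assumes, as the rounded figures here do, that 1GB = 1000MB; the function and variable names are only illustrative.

```python
TOTAL_KEYS = 300_000_000
SAMPLE_KEYS = 1_000_000

# Memory measured for the 1,000,000 key prototype, in MB (from the post).
MB_PER_SAMPLE = {
    "plain keys": 70,
    "hashes of 1000": 16,
}


def projected_gb(mb_per_sample, total_keys=TOTAL_KEYS, sample_keys=SAMPLE_KEYS):
    """Linearly scale a prototype measurement (MB) to the full key count (GB)."""
    return mb_per_sample * (total_keys / sample_keys) / 1000  # 1GB = 1000MB


for name, mb in MB_PER_SAMPLE.items():
    print("%s: about %.1f GB" % (name, projected_gb(mb)))
```

This assumes memory grows linearly with the number of keys, which is the same assumption behind the 21GB and 5GB estimates.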

If you’re interested in trying these combinations out, the script we used to run these tests is available as a Gist on GitHub (we also included Memcached in the script for comparison; it took about 52MB for the million keys). And if you’re interested in working on these sorts of problems with us, drop us a note; we’re hiring!


Comments? Questions? Discuss this post at Hacker News.


Mike Krieger, co-founder
