With last week’s Search and Explore launch, we introduced the ability to easily find interesting moments on Instagram as they happen in the world. The trending hashtags and places you see in Explore surface some of the best, most popular content from across the community, and pull from places and accounts you might not have seen otherwise. Building a system that can parse over 70m new photos each day from over 200m people was a challenge. Here’s a look at how we approached identifying, ranking and presenting the best trending content on Instagram.

Definition of a Trend

Intuitively, a trending hashtag should be one that is being used more than usual, as a result of something specific that is happening in that moment. For example, people don’t usually post about the Aurora Borealis, but on the day we launched, a significant group of people was sharing amazing photos using the hashtag #northernlights. You can see how usage of that hashtag increases over time  in the graph below.

image

And as we wrote this blog post, #equality was the top trending hashtag on Instagram.

image

Similarly, a place is trending whenever there is an unusual number of people who share photos or videos taken at that place in a given moment. As we were writing this, the U.S. Supreme Court was also trending, as there were hundreds of people physically there sharing their support of the recent decision in favor of same-sex marriage.

image

Given examples such as the above, we identified three main elements of a good trend:

  • Popularity – the trend should be of interest to many people in our community.
  • Novelty – the trend should be about something new. People were not posting about it before, or at least not with the same intensity.
  • Timeliness – the trend should surface on Instagram while the real event is taking place.

In this post we discuss the algorithms we use and the system we built to identify trends, rank them and show them in the app.

Identifying a Trend

Identifying a trend requires us to quantify how different the currently observed activity (number of shared photos and videos) is compared to an estimate of what the expected activity is. Generally speaking, if the observed activity is considerably higher than the expected activity, then we can determine that this is something that is trending, and we can rank trends by their difference from the expected values.

Let’s go back to our #equality example in the beginning of this post. Regularly, we observe only a few photos and videos using that hashtag on an hourly basis. For #equality, starting at 07:00AM PT, thousands of people shared content using that hashtag. That means that activity in #equality was well above what we expected. Conversely, more than 100k photos and videos are tagged with #love every day, as it is a very popular hashtag. Even if  we observe 10k extra posts in a day, it won’t be enough to exceed our expectations given its historical counts.

For each hashtag and place, we store counters of how many pieces of media were shared using the hashtag or place in a five-minute window over the past seven days. For simplicity, let us focus on hashtags for now, and let’s assume that C(h, t) is the counter for hashtag at time (i.e., it is the number of posts that were tagged with this hashtag from time t-5min to time t). Since this count varies a lot between different hashtags and over time, we normalize it and compute the probability P(h, t) of observing the hashtag at time t. Given the historical counters of a hashtag (aka the time series), we can build a model that predicts the expected number of observations: C’(h, t), and similarly, compute the expected probability P’(h, t). Given these two values for each hashtag, a common measure for the difference between probabilities is the KL divergence, which in our case is computed as:  

S(h, t) P(h, t) * ln(P(h, t)/P’(h, t))

Essentially, we consider both the currently observed popularity, which is captured by P(h, t), and the novelty, computed as the ratio between our current observations and the expected baseline, P(h, t)/P’(h, t). The natural log (ln) function is used to smooth the “strength” of novelty and make it comparable to the popularity. The timeliness role is played by the parameter, and by looking at the counters in the most recent time windows, trends will be picked up in real-time.

A Prediction Problem

How do you compute the expected baseline probability given past observations?

There are several facets that can influence the accuracy of the estimate and the time-and-space complexity of the computation. Usually those things don’t get along too well - the more accurate you want to be, the more time-and-space complexity the algorithm requires. We experimented with a few different alternatives, like simply taking the count of the same hour last week, and also regression models up to crazy all-knowing neural networks. Turns out that while fancy things tend to have better accuracy, simple things work out well, so we ended up with selecting the maximal probability over the past week’s worth of measurements. Why is this good?

  • Very easy to compute and relatively low memory demand.
  • Quite aggressive about suppressing non-trends with high variance.
  • Quickly identifies emerging trends.

There are two things we over-simplified in this explanation, so let us refine our model a bit more.

First, while some hashtags are extremely popular and have lots of media, most of the hashtags are not popular, and the five-minute counters are extremely low or zero. Thus, we keep an hourly granularity for older counts, as we don’t need a five-minute resolution when computing the baseline probability. We also look at a few hours worth of data so that we can minimize the “noise” caused by random usage spikes. We noted there is a trade-off between the need to get sufficient data and how quickly we can detect trends - the longer the timeframe is, the more data we have, but the slower it will be to identify a trend.

Second, if the predicted baseline P(h, t) is still zero, even after accumulating a few hours for each hashtag, we will not be able to compute the KL divergence measure (division by zero). Hence we apply smoothing, or put more simply: If we didn’t see any media for a given hashtag in the past, we mark it as if we saw three posts in that timeframe. Why three exactly? That allows us to save a large amount of memory (>90%) while storing the counters, as the majority of hashtags do not get more than three posts per hour, so we can simply drop the counters for all of those and assume every hashtag starts with at least three posts per hour.

Ranking and Blending

The next step is to rank the hashtags based on their “trendiness,” which we do by aggregating all the candidate hashtags for a given country/language (the product is enabled in the USA for now) and sorting them according to their KL divergence score, S(h, t). We noticed that some trends tend to disappear faster than the interest around them. For instance, the amount of posts using a hashtag that is trending at the moment will naturally decrease as soon as the event is finished. Therefore, its KL score will quickly decrease, then the hashtag won’t be trending anymore, even though people usually like to see photos and videos from the underlying event of a trend for a few hours after it is over.

In order to overcome those issues, we use an exponential decay function to define the time-to-live for previous trends, or how long we want to keep them for. We keep track of the maximal KL score for each trend, say SM(h), and the time tmax where S(h, tmax) = SM(h). Then, we also compute the exponential decayed value for SM(h) for each candidate hashtag at the moment so that we can blend it with their most recent KL scores.

Sd(h, t) = SM(h) * (½)^((t - tmax)/half-life)

We set the decay parameter half-life to be two hours, meaning that SM(h) is halved every two hours. This way, if a hashtag or a place was a big trend a few hours ago, it may still show up as trending alongside the most recent trends.

Grouping similar trends

People tend to use a range of different hashtags to describe the same event, and when the event is popular, multiple hashtags that describe the same event might all be trending. Since showing multiple hashtags that describe the same event can be an annoying user experience, we group together hashtags that are “conceptually” the same.

image

For example, the figure above  (kudos to Jason Sundram) illustrates all the hashtags that were used together with #equality. It shows that #equality was trending along with #lovewins, #love, #pride, #lgbt and many others. By grouping these tags together, we can show #equality as the trend, and save the need to sift through all the different tags until reaching other interesting trends.  

There are two important tasks that need to be achieved - first, understand which hashtags are talking about the same thing, and second, find the hashtag that is the best representative of the group. There are two challenges here - first we need to capture some notion of similarity between hashtags, and then we need to cluster them in an “unsupervised” way, meaning that we have no idea how many clusters there should be at any given time.

We use the following notion of similarity between hashtags:

  • Cooccurrences - hashtags that tend to be used together, for example #fashionweek, #dress and #model. Cooccurrences are computed by looking at recent media and counting the number of times each hashtag appears together with other hashtags.
  • Edit distance - different spellings (or typos) of the same hashtag - for example #valentineday and #valentinesday - these tend not to cooccur because people rarely use both together. Spelling variations are taken care of by using Levenshtein distance.
  • Topic distribution - hashtags that describe the same event - for example #gocavs, #gowarriors - these have different spelling and they are not likely to cooccur. We look at the captions used with these hashtags and run an internal tool that classifies them into a predefined set of topics. For each hashtag we look at the topic distribution (collected from all the media captions in which it appeared) and normalize it using TF-IDF.

Our hashtag grouping process computes the different similarity metrics for each pair of trending hashtags, and then decides which two are similar enough to be considered the same. During the merging process, clusters of tags emerge, and often these clusters will also be merged if they are sufficiently similar.

Now that we have talked about the process for identifying trends at Instagram, let us take a look at how the backend was implemented in order to incorporate each component described above.

System Design

Our trending backend is designed as a stream processing application with four nodes, which are connected in a linear structure like an assembly line, as depicted in the following diagram:

image

Each node consumes and produces a stream of “log” lines. The entry point receives  a stream of media creation events, and the last node outputs a ranked list of trending items (hashtags or places). Each node has a specific role as follows:

  • pre-processor - the original media creation events holds metadata about the content and its creator, and in the pre-processing phase we fetch and attach to it all the data that is needed in order to apply quality filters in the next step.
  • parser - extracts the hashtags or place used in a photo or video and applies quality filters. If a post violates our standards, it is not counted towards trending.
  • scorer - stores time-aggregated counters for each trend. This is also where our scoring function S(h, t) is computed. Every few minutes, a line with the current value for S(h, t) is emitted.
  • ranker - aggregates and ranks all the candidate trends and their trending scores.

Our system processes and stores a large amount of data in real-time, which should be efficient and tolerant to failures. This stream-lined architecture enables us to partition the trends and launch multiple instances of each node, such that each node stores a smaller amount of data, and the trends are processed in parallel. Furthermore, failures are isolated to specific partitions, so if one instance fails, trending would not be entirely compromised.

So far, we discussed the process that computes trends. The following diagram adds the components that are responsible for serving trends to requests comping from the app:

image

As the diagram illustrates, requests coming from the Instagram apps for trending hashtags and places need to be served without imposing load on our trending backend. Therefore, trends are served from a read-through caching layer powered by memcached, and a Postgres database, in case there is a cache miss. The database is populated with fresh trends by a periodical task that pulls trends from the ranker, performs the algorithm to group similar trends together and stores the result in Postgres. This way, the Instagram app is always served with the latest trends from our caching layer, which enables us to scale our storage and our stream processing infrastructures independently.

Conclusion

When we approached trending, we tried to break this project down into smaller problems that could be tackled separately by components with a very specific function. As a result, each individual on our team was able to focus on one problem at a time before moving onto the next one. We hope the community will enjoy this update, and be able to better connect to the world as it happens.

By Danilo Resende and Udi Weinsberg

This project wouldn’t be possible without the contributions of Maxime Boucher, Thomas Dimson, David Lee, George Sebastian, Bogdan State and Bai Xiao.