Usually, when we think of firms that leverage Big Data analytics and methods, we think of large retailers, stuffy insurance companies and maybe the occasional dot-com Internet business like Netflix or eBay. Chances are, few of these places explicitly encourage their Hadoop developers to actually play online and video games during the workday.
Welcome to Riot Games. You would expect a game development shop to be a more relaxed place, and Riot makes it official: the company has a policy of recruiting people who like to play games, and even has a “playfund” that gives every employee an allowance to buy their own games, expense them and, more importantly, play them during working hours. “When a big release of a game comes out, our productivity takes a nosedive,” says Barry Livingston, who is the director of engineering for the Big Data group of the company. “We take play seriously, it is an important part of our culture.” Imagine charting your build schedules around the next release of Halo!
Riot created the very successful League of Legends gaming franchise. The game is played online and is free to play, and it is widely popular: on a peak day, it has 3 million concurrent users out of more than 32 million registered players.
“We were a scrappy startup and wanted to get our game out the door. Analytics wasn’t an afterthought, but we didn’t have many resources for it initially and so started with one MySQL instance, running queries and downloading them to Excel,” said Livingston. That was fine for the first year or so, but by the summer of 2011 the company was growing rapidly and wasn’t prepared for how successful the game was going to be.
Once they opened up a European base of operations, they couldn’t fit all of their data into one instance of MySQL. “So we created a separate instance. That was a bad precedent and we needed to change that. We moved quickly to Hadoop as a scalable low-cost storage system. We use Hive to overlay an SQL-type interface on top of the Hadoop File System.” That helped them scale up, but “the downside is that it takes a long time to spin up to do your queries, some taking a minute or more to complete, so it is difficult to iterate and build complex queries using Hive.”
When you think about the millions of people playing the game in real time, and then about having to join three massive tables holding player data, game data and session data, you begin to see how difficult a problem Riot Games has. All this activity generates more than 500 GB of structured data and over four TB of operational logs every day.
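To make the three-table join concrete, here is a minimal sketch of the kind of query involved. It is not Riot's actual schema or query: the table names, columns and sample rows are all hypothetical, and Python's built-in SQLite stands in for Hive so the example runs anywhere. The SQL itself is plain enough that a similar statement could be written in Hive's query language.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three tiny, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE players  (player_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE games    (game_id INTEGER PRIMARY KEY, duration_min INTEGER);
        CREATE TABLE sessions (session_id INTEGER PRIMARY KEY,
                               player_id INTEGER, game_id INTEGER);
    """)
    # Hypothetical sample rows, invented purely for illustration.
    conn.executemany("INSERT INTO players VALUES (?, ?)",
                     [(1, "NA"), (2, "EU"), (3, "EU")])
    conn.executemany("INSERT INTO games VALUES (?, ?)",
                     [(10, 30), (11, 40)])
    conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)",
                     [(100, 1, 10), (101, 2, 10), (102, 2, 11), (103, 3, 11)])
    return conn


def minutes_played_by_region(conn):
    """Join sessions to players and games, then total the playtime per region."""
    return conn.execute("""
        SELECT p.region,
               COUNT(*)            AS sessions,
               SUM(g.duration_min) AS minutes
        FROM sessions s
        JOIN players p ON p.player_id = s.player_id
        JOIN games   g ON g.game_id   = s.game_id
        GROUP BY p.region
        ORDER BY p.region
    """).fetchall()


if __name__ == "__main__":
    for region, sessions, minutes in minutes_played_by_region(build_demo_db()):
        print(region, sessions, minutes)
```

On four sample rows this is instant. The point of the paragraph above is that the same shape of query over millions of players and terabytes of daily data is what pushes a single database instance past its limits.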
What is interesting is how far Riot has come from its humble beginnings with a single analyst: it now has an entire BI team of a dozen people and a similar-sized engineering staff, spread between its headquarters office in Los Angeles and a remote office near St. Louis. “We now have tens of people here that can do Hive queries, and we want to enable more access to these kinds of ad hoc discoveries,” Livingston told me. Why St. Louis? Some of the founders grew up there, and they found that there is a lot of talent in the area. “Very big corporations based there, and we have had great luck attracting talented engineers who used to work at Mastercard or Anheuser-Busch since our culture is very different. What makes it attractive is that our staff can work on something that millions of people see every day.”
Riot eventually ended up with a combination of tools that mixes SQL and Big Data. “We wanted to provide dashboards for our company. We want our people to think about our data when they are making decisions.” These dashboards are built using Tableau. “But it doesn’t interact with Hive very well, such as giving out stats on win rates per champion by game time. We have graphical sliders so you can interact with the data, and every time you move the slider, you get hundreds of different MapReduce jobs. So we put MySQL in between,” Livingston said. Along the way, the Riot developers have posted 60 different open source Chef and Opscode recipes, among other code samples, on GitHub.
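The idea of putting a relational database between the dashboard and Hive can be sketched as pre-aggregation: run the expensive job once, store a small summary table, and let every slider movement hit only that summary. The sketch below is an assumption about how such a layer might look, not Riot's implementation. The table names, the 10-minute bucket width and the sample match rows are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

BUCKET_MINUTES = 10  # hypothetical width of each game-length bucket


def load_raw_matches(conn):
    """Stand-in for the large raw data set that would live in Hadoop."""
    conn.execute("CREATE TABLE match_results "
                 "(champion TEXT, game_minutes INTEGER, won INTEGER)")
    # Made-up rows for illustration only.
    conn.executemany("INSERT INTO match_results VALUES (?, ?, ?)", [
        ("Ahri", 25, 1), ("Ahri", 28, 0), ("Ahri", 41, 1),
        ("Garen", 22, 1), ("Garen", 27, 1), ("Garen", 45, 0),
    ])


def build_summary(conn):
    """Batch step: aggregate once into a small table the dashboard can query."""
    conn.execute("CREATE TABLE champion_win_summary "
                 "(champion TEXT, bucket_start INTEGER, games INTEGER, wins INTEGER)")
    conn.execute("""
        INSERT INTO champion_win_summary
        SELECT champion,
               (game_minutes / :w) * :w AS bucket_start,  -- integer division
               COUNT(*),
               SUM(won)
        FROM match_results
        GROUP BY champion, (game_minutes / :w) * :w
    """, {"w": BUCKET_MINUTES})


def win_rates(conn, min_minutes, max_minutes):
    """Dashboard step: a slider change only touches the summary table."""
    return conn.execute("""
        SELECT champion, SUM(wins) * 1.0 / SUM(games) AS win_rate
        FROM champion_win_summary
        WHERE bucket_start >= ? AND bucket_start < ?
        GROUP BY champion
        ORDER BY champion
    """, (min_minutes, max_minutes)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw_matches(conn)
    build_summary(conn)
    print(win_rates(conn, 20, 30))
```

The trade-off in a design like this is freshness for speed: the summary is only as current as the last batch run, but each interaction becomes a cheap lookup instead of a new set of cluster jobs.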
All this BI work enables them to ask questions such as which champions (the playable characters) and skins (character costumes) are popular in which geographic regions, or what the win rates of individual champions are. “We had lots of unexpected results when we first started doing this analysis. One of the benefits of having all this data is we can be more scientific about it, and we can now check everything,” said Livingston.
They are also working on other tools that can make it easier for anyone to do their own queries and build out reports without having to know MapR and the Hive query language. These dashboards aren’t just window dressing, because Riot Games is trying hard to “deeply understand our game and improve the experience for all the players,” Livingston said. “We look at our game as a living, breathing service. We are very player-focused.” Part of their challenge is to maintain a level playing field for all their players while constantly tweaking game play and game mechanics to keep the game interesting for returning players. “We need lots of insight so that competitive play will continue to happen. We don’t want different versions of the game for pros and noobs for example.”
And when it comes to competitive play, don’t think that we are talking chump change. League of Legends has become perhaps the largest eSports competition around, according to game analysts at Forbes and others. Earlier this year, professional players competed for a three-million-dollar purse.
As a result, League of Legends’ popularity is increasing, and that means the engineers have to plan for more computing capacity far ahead of when they will actually need it. “It is very difficult to do. There is no easy way to do it. I like to try to think that far ahead, at least have some kind of plan for the next quarter. I know our needs are going to change. We try to guess and do a lot of ‘what ifs’ and give us some lead time for hardware purchases.”
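A back-of-the-envelope version of such a “what if” can be worked out from the daily volumes quoted earlier (more than 500 GB of structured data plus over four TB of logs). Everything else in this sketch is an assumption made for illustration: decimal units, a 90-day quarter, no compression or data expiry, a hypothetical growth rate, and HDFS's default replication factor of 3, which Riot may or may not have used.

```python
STRUCTURED_GB_PER_DAY = 500  # lower bound quoted in the article
LOG_TB_PER_DAY = 4           # lower bound quoted in the article
GB_PER_TB = 1000             # assumption: decimal units
HDFS_REPLICATION = 3         # HDFS default; assumed, not confirmed for Riot


def raw_storage_tb(days, daily_growth_rate=0.0):
    """Total new data over `days`, with volume compounding by a daily rate."""
    daily_tb = STRUCTURED_GB_PER_DAY / GB_PER_TB + LOG_TB_PER_DAY
    return sum(daily_tb * (1 + daily_growth_rate) ** day for day in range(days))


def provisioned_tb(days, daily_growth_rate=0.0, replication=HDFS_REPLICATION):
    """Disk needed once every block is stored `replication` times."""
    return raw_storage_tb(days, daily_growth_rate) * replication


if __name__ == "__main__":
    # Scenario 1: flat volume for one 90-day quarter.
    print(raw_storage_tb(90), provisioned_tb(90))
    # Scenario 2: hypothetical 1% growth in volume per day.
    print(round(provisioned_tb(90, daily_growth_rate=0.01)))
```

Even the flat scenario comes to roughly 405 TB of new data in a quarter, or about 1.2 PB of disk after threefold replication, which shows why the lead time for hardware purchases matters.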
If you are looking for more specifics on how Riot Games uses Hadoop and more of the technical choices they made, view their slide deck here. They told me they are hiring in both locations, provided you can get ready for some serious fun and games.