How SEGA’s Felix Baker delivered a data transformation
Felix Baker has quietly achieved something quite special at SEGA Europe – transforming a data environment from “one SQL database, and a spattering of spreadsheets” into a cloud-native environment that serves 25,000 events per second to help support gamer communities.
The softly spoken Head of Data Services at SEGA Europe joined the company in 2016 with a background in .NET programming. At first, he admits, the job was basically to “come in and maintain some of the legacy applications that were backed up with the SQL Server backend.”
“It all blew up, maybe one or two years into my employment, when we started thinking about Big Data. We implemented this really basic solution on AWS that I hacked together in Java,” he recalls drily.
“That formed the back-end of Football Manager analytics for about two years,” he says, referring to the popular game developed by Sports Interactive and published by SEGA, which has a vibrant online fanbase.
Now he’s running an infrastructure that pulls in extensive data on games and gamers to inform product development and more – built on a far more sophisticated setup that includes a “big cluster” of EC2 machines, MongoDB Atlas, Apache Spark, Delta Lake and more.
Felix Baker joined The Stack at MongoDB.local London to chat through the setup and what it achieves for both SEGA and its community.
Data analytics at SEGA: How things have changed
“We have a lot of data science use cases now that we didn't used to have,” Baker says. “For instance, we’re tracking customer churn potential; what's the likelihood of someone leaving a game? Potentially, what can we do to keep them playing and keep them engaged?”
This more sophisticated analytics infrastructure has also allowed SEGA to build out new interactive platforms and work with its network of studios to build engagement-rich features around their games.
Some simple wins: The Football Manager Fan Club (FMFC), for example, provides exclusive promotions and forums to gamers. Work by Baker’s team means players can link their profile to an Epic or Steam account – and use a new portal to share their gaming performance with peers.
That platform and other equivalents, says Baker, “require very low latency data. You can't have someone playing a match going into the portal and having to wait 20 minutes for that match to appear as a statistic. That's why we chose MongoDB as the database, for its low latency and queryability,” he says, detailing SEGA’s technology choices.
SEGA Europe is building out another interactive platform with Parisian game studio Amplitude, which, Baker notes, works quite differently from other studios: “They have a platform called ‘Games2Gether’ that lets its gamers step behind the scenes of the development process and work to co-create new features.” SEGA is working with Amplitude to create an interactive community platform within Games2Gether where members can compare gaming performance with friends and rivals.
“Amplitude are planning on not only allowing a user to log into this portal to look at how they're progressing through the game, but also to pull the data themselves and create their own websites and visuals via APIs – so again, that’s really enriching the community,” Baker says.
Behind the scenes, examples like these require far more than the handful of spreadsheets and the SQL Server database he first encountered.
"Our low latency stream goes into MongoDB"
Walking The Stack through the current setup, SEGA Europe’s Head of Data Services explains: “As people are playing SEGA’s games, the events are getting sent to an API in AWS, which is basically a big cluster of EC2 machines. They take that data, do some data enrichment by translating the IP address into a physical location, so we know where that player is playing, and in what part of the world. That data then goes on to Amazon Kinesis like a ‘Big Data’ pipeline, then forks in two directions.
“We have our low latency stream which goes into MongoDB; we have our main engineering pipeline, which goes to S3 and gets encrypted for GDPR purposes. Then we have a Spark stream, which pulls the data into Delta Lake – basically a massive table that contains literally every single piece of data that SEGA has ever collected, which is trillions of events.”
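To make the flow Baker describes more concrete, here is a minimal sketch of the two steps at the front of the pipeline: enriching an event with a location derived from its IP address, then handing the same event to two downstream branches. It is an illustration only, not SEGA's code. The lookup table, field names and function names are all hypothetical, the IP ranges are reserved documentation addresses, and two in-memory lists stand in for the MongoDB and S3 branches. In a real Kinesis setup, each consumer would read from the stream independently rather than being called in a loop.

```python
import ipaddress
from typing import Callable, List, Optional

# Hypothetical lookup table; a real service would query a GeoIP database.
# These ranges are reserved documentation networks, not real player IPs.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "GB",
    ipaddress.ip_network("198.51.100.0/24"): "FR",
}


def locate(ip: str) -> Optional[str]:
    """Return a country code for the IP, or None if the range is unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None


def enrich(event: dict) -> dict:
    """Copy the event and attach a location derived from its IP address."""
    enriched = dict(event)
    enriched["country"] = locate(event["ip"])
    return enriched


def fan_out(event: dict, sinks: List[Callable[[dict], None]]) -> None:
    """Deliver the same enriched event to every downstream branch."""
    for sink in sinks:
        sink(event)


# In-memory stand-ins for the two branches (assumptions for illustration).
low_latency_store: List[dict] = []  # plays the role of the MongoDB branch
archive_store: List[dict] = []      # plays the role of the S3 / Delta Lake branch

raw = {"player_id": "p1", "event": "match_end", "ip": "192.0.2.10"}
fan_out(enrich(raw), [low_latency_store.append, archive_store.append])
```

The point of the fork is that one enriched record serves both needs: a fast, queryable copy for player-facing portals and a complete copy for long-term analytics.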
Baker says: “We then have separate Spark streams that run on top of that Delta Lake, which populate what we call our ‘Silver tables’ – tidied up tables for each of the different events that we capture. And from those tables, we have lots of different use cases, including analytics.”
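The idea behind the ‘Silver tables’, one tidied-up table per event type built from a single raw table, can be sketched in a few lines of plain Python. This is not how SEGA implements it (Baker describes Spark streams running on Delta Lake), and the tidy-up rules here are assumptions made for illustration: records with no event type are skipped, type names are normalised, and empty fields are dropped. All field and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_silver_tables(raw_events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group raw events into one cleaned table per event type."""
    tables: Dict[str, List[dict]] = defaultdict(list)
    for record in raw_events:
        event_type = record.get("event")
        if not event_type:
            # Assumed tidy-up rule: skip records with no event type.
            continue
        # Assumed tidy-up rule: drop the type column and any empty fields.
        row = {k: v for k, v in record.items() if k != "event" and v is not None}
        # Assumed tidy-up rule: normalise the event type name.
        tables[event_type.strip().lower()].append(row)
    return dict(tables)


# A tiny stand-in for the raw table that holds every event.
raw_table = [
    {"event": "match_end", "player_id": "p1", "goals": 2},
    {"event": "Match_End ", "player_id": "p2", "goals": None},
    {"event": "login", "player_id": "p1"},
    {"player_id": "p3"},  # malformed: no event type
]
silver = build_silver_tables(raw_table)
```

Analysts can then query a narrow, consistent table such as `silver["match_end"]` instead of scanning the full raw history.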
SEGA Europe’s Head of Data Services is particularly warm about the support his team received from MongoDB, both early on and as they adopted the MongoDB Atlas managed cloud database offering.
He tells The Stack: “Atlas has proved to be very flexible in terms of scalability. [Users can scale up and down dramatically] and Atlas provides that scalability out of the box, which is exactly what we need.
“Before using MongoDB we were using Amazon DynamoDB, for the backend to these websites. But it really lacks the flexibility and we had some quite complex queries that it just wasn't suitable for. When we first started using MongoDB, we had no experience at all with it – we’d had a presentation and thought ‘wow, that looks great; we could use this’ but we had no internal expertise, which at the time was a bit of an issue, because we couldn't just jump in and crack on and work with it.
“So the MongoDB team came in, and spent a couple of weeks with us building a loose prototype, or proof of concept alongside us. After two weeks, we had this almost fully functioning database, which was populating as we wanted it, streaming in from Kinesis. It was almost where we were with DynamoDB already. We still didn't really have the expertise, but a few members of staff and our team went for MongoDB training. They're now pretty skilled up in MongoDB, and have taken where MongoDB left us and run with it; we now have this platform that’s completely feature rich and where we wanted it to be,” he enthuses.
Delivered in partnership with MongoDB