The Big Interview: Goldman Sachs’ Chief Data Officer (CDO) Neema Raphael
“We don’t have 200-person governance teams running around building PowerPoint; we have engineers actually understanding the data lineage…”
It’s 2008 and Lehman Brothers is collapsing in what will be the biggest corporate failure in history. There’s blood in the water and markets are in vertiginous freefall. At Goldman Sachs, software engineer Neema Raphael has been seconded to a team trying to assess Goldman’s exposure to the 164-year-old financial services firm. In every bank globally, teams are doing the same thing. Some, he recalls, “are going through their contracts – literally in filing cabinets – like ‘did we do this trade with Lehman?’”
Goldman Sachs’ “SecDB” (securities database) – a proprietary software system first developed back in 1993 – means that “we had all of our derivatives trades available in one place and we were able to assess our exposure in a data-driven way,” Raphael recalls in a call with The Stack. “That's what piqued my interest in data. It was a revelation to me, what data could do for the firm. I just thought, ‘Oh my God, this is the thing!’”
The data-centric project to assess counterparty risk and exposure went on to win a prestigious internal Goldman Sachs award; one typically handed to trailblazing deal makers. For Raphael, it catapulted him further into a career at the intersection of engineering and data that now sees him sit as Chief Data Officer (CDO) and Head of Data Engineering at the bank.
It’s a role straddling a set of responsibilities that combines what in many organisations is still a traditionally defensive CDO role, with a more creative and technology-centric one leading a team of engineers and data architects who interface closely and creatively with lines of business.
He joined The Stack to talk engineering, governance, and asking “why?” (This conversation has been lightly edited for brevity and running order.)
What does your team do?
"We very much on purpose combined our data engineering and CDO roles to make sure that the CDO role is an engineering function. At a very macro level, I talk about my team in three stripes. One is building data platforms. So we have a platform team – we actually open-sourced our whole data platform called Legend that we've been building over the last 10 years at Goldman. It's on GitHub and we gave it to a non-profit, FINOS.
“The second stripe, I call it content and curation.
“We have a team that manages content and curates it for the firm. So things like real-time market data; some of our core data services, like our securities master, client master; things that the whole firm uses. Then the third stripe with a CDO hat on is our data governance office, where we are obviously very focused on data quality and data governance."
How do you interface with other folks on platform engineering or what might traditionally be a CTO office?
“The way that engineering is set up at Goldman is that we have a horizontal team called core engineering. So I sit in the core engineering horizontal. Then we interface extensively with the lines of business.
“So that core team basically builds ‘tech for tech’ and that rolls up to the CTO and CIO. You can think of it as tech platforms mostly…”
When you took on this role in 2018 what were your immediate priorities on the data front?
“I wanted governance to be built into the tech and the platforms. That was my thesis: If we do data right at Goldman, if we build the right platforms, the right culture, the right tools, the right framework, the right policies, all that can be wrapped in engineering, so when people do the thing they need to do with data, the governance functions are just built in.
“That’s really the only way to scale these functions, right? We don't have 200-person governance teams running around building PowerPoint; we have engineers, actually understanding the data lineage; describing that in the platform; describing their data quality rules in engineering terms.
“Goldman management, David Solomon, John Waldron want to instil this OneGS culture, which is an incredible thing that they've done for the firm; that idea that you bring all of Goldman Sachs to your clients instead of business silos like, 'I'm a trader, I'm an investment manager' or whatever.
“But we didn't have the data architecture to get there, four or five years ago. We had some data of our clients here. Some data of our clients there, they didn't talk to each other. [Our priority was to] think about how we could power our salesforce or client-facing people with data to make this ethos a reality. So we went out, we worked with all the lines of business, understood their data, understood their data about clients, what a client even meant to them, and we codified all that in Legend.
“Now we have this 360 customer view, across the lines of business, which has been incredibly powerful. A similar example is counterparty risk. In 2008 it was a big deal. It's coming back a little bit now. And the fact that now we have that centralised data, and with a push of a button we can see what our exposure is to all of our clients and counterparties?
“Again, that’s super powerful for risk management purposes…”
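As a rough illustration of what "describing data quality rules in engineering terms" might look like in practice, here is a minimal Python sketch. It is not Legend's API or Goldman's implementation: every name in it (`QualityRule`, `TRADE_RULES`, `failed_rules`, `exposure_by_counterparty`) and the sample records are hypothetical, and summing notionals is only a stand-in for a real exposure calculation.

```python
from dataclasses import dataclass
from typing import Callable

# A record is a plain dictionary here; a real platform would use a typed model.
Record = dict[str, object]


@dataclass(frozen=True)
class QualityRule:
    """A named check that a single record either passes or fails."""
    name: str
    check: Callable[[Record], bool]


def _has_counterparty(record: Record) -> bool:
    return bool(record.get("counterparty"))


def _has_positive_notional(record: Record) -> bool:
    notional = record.get("notional")
    return isinstance(notional, (int, float)) and notional > 0


# Hypothetical rules, declared as code next to the dataset they govern.
TRADE_RULES = [
    QualityRule("counterparty_present", _has_counterparty),
    QualityRule("notional_positive", _has_positive_notional),
]


def failed_rules(record: Record, rules: list[QualityRule]) -> list[str]:
    """Return the names of every rule the record fails."""
    return [rule.name for rule in rules if not rule.check(record)]


def exposure_by_counterparty(
    records: list[Record], rules: list[QualityRule]
) -> dict[str, float]:
    """Sum notional per counterparty, skipping records that fail any rule."""
    totals: dict[str, float] = {}
    for record in records:
        if failed_rules(record, rules):
            continue  # excluded from the total because a quality rule failed
        name = str(record["counterparty"])
        totals[name] = totals.get(name, 0.0) + float(record["notional"])
    return totals


# Made-up sample data for illustration only.
trades = [
    {"counterparty": "ACME", "notional": 100.0},
    {"counterparty": "", "notional": 50.0},
    {"counterparty": "ACME", "notional": 25.0},
    {"counterparty": "BETA", "notional": -5},
]
```

With the rules living in code, the same checks run every time the data is used, so a query such as `exposure_by_counterparty(trades, TRADE_RULES)` only counts records that passed them, and `failed_rules` shows why a given record was left out.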
Delivering that change must have been challenging. Institutional inertia and change fatigue are very real things... What's your approach to change management?
“There's engineering culture change, then there's non-engineering culture change. The marriage and the intersection of those two is not so simple… (Laughs) I think aligning incentives is most important: Showing/driving/executing on our real tangible business outcomes...
“Yes, we had to come in and put policies and procedures in place; the standard stuff that’s just table stakes. But then within that framework we went around and talked to the lines of business, asking ‘what's actually important to you?’ ‘What business outcome are you trying to drive?’
“That’s where the engineering comes in, because we get hands-on-keyboards with people. We're like, ‘hey, we could actually help you solve that problem within our framework’, and then [you] build trust that way; build some credibility across the business….”
What is Legend to you and why does it excite you?
“Legend, to me, is an ethos: treating data as a first-class asset. When you talk to software engineers, no one questions ‘why do we have GitHub?’ ‘Why do you have to put your code in version control?’ ‘Why does your code need to be reproducible?’ ‘Why do you need to do CI/CD?’ ‘Why do you need these great IDEs?’ It's just baked in. It makes software engineers better and it makes them do their work faster. It makes them have the guardrails to do the right things at the right time. Legend is that for data.
“Legend is about bringing a software engineering ethos and tools and capabilities like culture, to data. Data management, data governance, data curation, data pipelines, all the things that a data engineer does, right? That's the million-foot view of what Legend is. It gives a platform and tools and capabilities to work with data as a first-class asset.
“Legend is thinking about the data upfront, thinking about your data flows, thinking about your data structures, thinking about your data architecture, thinking about the relationship of that data to other pieces of data, thinking about what that data actually means and then getting to single sources of truth for each of what we call a ‘data domain’. This has been super successful; like if you want all the credit counterparty risk for all of Goldman Sachs, here is the data set. It's been understood, it’s been given semantic meaning, it's been linked to the other important datasets…”
Talking of platforms, the ones you are using have evolved a lot. Are there any tools that stand out for you?
“I’m going to stay away from naming companies, but there has been a major shift in data infrastructure, from the cloud hyperscalers to some very big SaaS players on top of those clouds that have really changed the game in data engineering, data science, data analytics – where we just didn't have those tools and capabilities at all on-prem. We were very bound by what we had in our data centres, what we could buy off the shelf, we didn't have the economies of scale that these players have… There was no hope that we could centralise all our data in an infinitely elastic data infrastructure 10 years ago, that just wasn't even a thing!”
How do you balance the adoption of SaaS and DIY?
“Someone much smarter than me once told me, ‘you hire software engineers, you get software.’ Software engineers want to build software! I think it's this incentive-aligning thing again. I tell my team that deleting one line of code is worth 5X more than writing one new line of code; we need to focus on quality over quantity and what differentiates us.”
What about legacy technology? SecDB for example has real heritage. Any major rebuilds underway across the estate your team uses particularly?
“SecDB has been one of the biggest, most successful technology platforms, I think, not just within Goldman Sachs, but in all of finance, for 30 years.
“There’s ongoing innovation and retooling all the time but there’s not some big project to modernise SecDB. It’s just a platform. It exists. We work on it. It's a huge competitive advantage for us. And we continue to invest in it. This ‘legacy versus non-legacy’ thing, I think is a misnomer.
“Yes, we built our data platform 10 years ago, but we continue to invest in it, innovate, evolve. That's one of the reasons we open-sourced Legend, so that we can build a community around it to help with the innovation. To me, it's a perpetual cycle of investing in innovation and incrementally building as new infrastructure and new paradigms come about…”
Talking of new paradigms, LLMs: We’ve seen Bloomberg build one. With Goldman’s data in such good shape, is that something you will be doing too?
“Our job is to get the data organised, safe, accessible, available, curated and high quality. The things people do on top of that? Today it might be LLMs. Tomorrow it will be some other AI thing. And we've got to be ready for all of that. But my team’s job is focused on the data: Make Goldman Sachs data-driven, democratise data, make sure it's high quality. Sky's the limit!”
Wrapping up, any lessons from your career you’d like to share?
“Casting my mind back to 2008, this other team came to my team, they were like ‘hey, we need this data’ and I was the first to raise my hand.
“That’s not a tech skill, but also that's the biggest tech skill: jumping in, being able to solve problems that people care about… using technology as a tool in the toolbox to do that in a scalable, efficient, fast, easy way.
“That mindset is the thing that I really brought. The unsolicited advice that I give to new joiners, especially junior people, is ‘ask why, and ask it a lot!’ Ask why and for whom. What is it going to do? Who is it going to help? Is it going to help a client? Is it going to help us save some money? Is it going to help us reduce the risk? Understand why and for whom. And work incrementally, which is very hard. In technology, great software engineers want to build the best system ever created! You have to teach people that you could get there incrementally. You don't have to go into a cave, work for two years and emerge with something! Work on something with someone. Show them at 10%: ‘Hey, I know we're not done yet. But here's 10%. Am I on the right track?’ And finally, align incentives with the people that you work with, you work for and your users, so that your success is their success and importantly that your failure is their failure.”
How do you switch off from work?
“Outside of work: I have a one-and-a-half-year-old. So I now spend all of my free time hanging out with that dude, which I love. I'm also like a technologist at heart. So I like to tinker. I like to learn new things. I like to mess around with Python or the newest open source stuff. I like to figure out what is going on in the world of data. I'm a nerd at heart.”