The Big Interview - Pure Storage Founder and CTO John “Coz” Colgrove
“I've talked to some of the largest companies in the world who say they budget their data centres in watts not dollars. They’ve got money flowing out of their ears, but they cannot get the power.”
As always, we ask our latest interviewee for a headshot to accompany the article – the highest resolution possible please; no fuzzy thumbnails.
“We will have a photo from a modern camera that will be several megabytes,” he says, deadpan. “We’re in the storage business…”
It takes more than a few megabytes to trouble John “Coz” Colgrove, co-founder of Pure Storage, who since the company’s 2009 launch has grown it into a silverback of the storage world, with a $17 billion market capitalisation and a specialism in all-flash data offerings for the enterprise.
Coz, who holds over 170 patents, was “always attracted to ‘hard problems’, hard science,” he tells The Stack. “Storage is a set of hard problems…” We blink and he’s away, talking merrily about “erase blocks”, “read/write heads”, “FTLs”, “tombstoned” data and “garbage collection.”
Pure didn’t get to the stage where it is signing eight-figure deals on exposition about the admittedly intriguing technological challenges at this layer of the stack, however. The micro-level might require patents. The macro-level does not: customers have a lot of data to store. They want it faster than ever. They want it to be easier to manage. With power expensive and emissions surging, they also want it to be efficient.
Pure Storage thinks it has nailed the brief, with a unique approach to how it manages flash (efficient solid-state storage using flash memory chips) at both the micro-level and the macro-level. When it comes to the former challenge, it uses “raw” flash to build its DirectFlash Modules rather than buying commodity solid-state drives (SSDs), tapping a very different point in the supply chain from other solid-state array vendors.
Coz says one key reason for this is to dodge the sheer amount of abysmal firmware that commodity SSDs come with. Doing flash really, really well, with incredible resilience, I/O and the rest, is a software challenge, he tells The Stack.
“We originally used off-the-shelf SSDs. We'd study the firmware inside them, and try to write to them in only a very flash-friendly way.
“We couldn't get all of that software inside of the drive out of the way; we couldn't get the consistency of performance we wanted... So we started doing DirectFlash, where we've sucked all that remapping up into our main controllers. So we get more consistent I/O, much more consistent performance, and much more flash-friendly performance.”
(Most flash arrays’ software “talks” to flash modules as if they were legacy hard drives. DirectFlash lets Pure Storage talk to flash at the system level rather than the drive level, avoiding the convoluted “flash translation layer” software that, he says, undermines the performance of rivals. “It’s time for disks to die,” says Coz. “And for flash to stop pretending to be a disk!”)
At the macro-layer, Pure’s platform strategy provides customers with a unified data storage platform that has a single operating environment (Purity), a single management platform (Pure1), common storage components (DFMs) and a Cloud Operating Model (Pure Fusion). This can satisfy all of its customers' data storage needs (block, file and object) across the entire price and performance spectrum. It is made possible by deep integration between hardware and software at the array level, leading to a great experience for the customer.
The Platform Effect
High-performance storage is essential for modern applications like AI, a point that was well illustrated in the company’s last earnings call, with CEO Charles Giancarlo explaining how current data storage environments inhibit AI deployments. Firstly, arrays are selected to provide just enough performance for their primary function, leaving little headroom for AI access. Secondly, the lack of cross-array networking limits access to apps that aren’t provisioned directly on their primary compute stack.
This is the challenge Pure claims to solve with its single operating and management environment across protocols, coupled with invaluable insight into performance from its Pure1 software.
Storage Intelligence
It’s all a far cry from Coz’s first foray into technology.
Growing up in New Jersey, just around the corner from Bell Labs, he developed an early love of maths and science. Coz caught the computer programming bug too, and remembers tinkering with a PDP-11/40 at computer club; he speaks warmly about Bell Labs’ support of his school, which had the second-ever Unix licence outside Bell.
Back then, anyone talking about exabytes would have been met with a blank stare. Today, of course, the advent of AI is heaping enormous pressure on the storage industry. But Coz is sanguine about AI, and not just because it drives demand: “Traditionally one of the most labour-intensive jobs was micromanaging all the arrays to optimise performance,” he says. Pure’s top-down approach and machine learning models mean that’s no longer necessary; the systems can auto-tune.
Clearly, AI is a huge driver for providers like Pure: “People get very hung up on all the GPUs and the training models, but the bigger driver is all the inference that you do afterwards,” Coz notes. Doing inference well requires access to a lot of data. That needs to be easily accessible, not in cold storage on disk; again, good news for flash providers. That’s because today’s organisations are beginning to understand the storage uplift required to free the data currently locked in different silos, and that needs to be founded on flash.
Coz illustrates the challenge with the example of an (unnamed) aerospace company which generates exabytes of flight data. “Let’s say a plane experiences turbulence; they need to compare that to all of their data from previous flights to see if the plane performed as designed. When you store those exabytes on flash, you can do those inferences immediately, but if it’s on tape or cold disk, you can’t do it at all.”
The Energy Equation
With revenues approaching $3 billion and very strong growth in subscription-based revenue as customers adopt its storage-as-a-service portfolio, Pure is in a healthy financial position. So are a lot of its customers; in addition to the boom in as-a-Service, many are cash-rich and ploughing CapEx into data centres.
But there is a planetary and a power crisis happening. Pure Storage boasts heavily about its energy efficiency, touting claims that its systems are 85% more energy efficient than those of flash rivals and need 96% less data centre space than those of hybrid disk providers. Do CIOs or other customers really procure based on such things, we venture? Coz is frank: “There are some where efficiency is the big driver. There are a lot more where it should be.
“It’s good for their company and it’s good for the planet.
“You also have to recognise that all of these companies have energy efficiency goals and [many have] long term [climate] targets.
“CIOs should be more focused on understanding that their board of directors and their senior most executives actually care about hitting these targets and what they can do to help hit them,” he says.
“In some places, they’re not allowing new data centres for zoning purposes; there's not enough power. I've talked to some of the largest companies in the world who say they budget their data centre in watts, not dollars. They’ve got money flowing out of their ears, but they cannot get the power. Mark Zuckerberg talks about deploying a million GPUs… Where are we going to get those watts? There’s a climate crisis, but also a power crisis. Too many people are not paying enough attention to it.”
Coz resists the temptation to say that flash - ahh! - will save every one of us. But the implication is clear: Flash has cemented its place in the data centre but there is a lot more room for further growth and the company is not done innovating yet. Coz, one suspects, is ready to chip in.