Pure CEO: AI needs write speed and storage in place

We caught up with Pure Storage CEO Charlie Giancarlo at the company’s Accelerate event in Las Vegas this week. He gave us his view on why storage write speed is key for checkpointing in artificial intelligence (AI) workloads, why Pure aims to make corporate data rapidly available in the place where it does its day-to-day work, how his prediction of the demise of the hard disk drive (HDD) is on course, and why competitors can’t copy Pure’s high-capacity flash modules.

How is AI changing data storage?

What gets the most attention in the press is the big GPU [graphics processing unit] large-scale cloud AI-type environments, because they’re big deals and it’s kind of exciting to hear about tens of thousands of GPUs.

For storage it’s interesting but it’s one of the smaller opportunities. The next interesting opportunity is the AI inference market. It’s about enterprises taking these various models, LLMs [large language models] and so on, and applying them to their own environment and data – so pharma with protein folding or drug analysis, high-speed trading companies for stock analysis, big banks for fraud, telcos for network operations – and we’ve been working on building relevant integrated vertical solutions for those types of environments.

That means bundling it with software, the data preparation and curation, vector databases, etc, so their data scientists can get up and running quickly.

Also, there’s what we’re doing with Fusion. Traditionally, what a customer did was to process and curate data, copy it from relevant environments to a data warehouse or data lake and put it in Hadoop or similar to do the analysis. We think that’s silly. Why not make the data accessible on the arrays where it’s performing its primary mission, just by having enough performance there and having the data networked?

How does Pure Storage help with AI workloads?

Now you get into the performance areas for the really large environments. Meta has 24,000 GPUs, for example. Another customer has 10,000 GPUs. And then you need to provide gigabytes or hundreds of gigabytes per second of throughput, and our FlashBlade product is able to do that.

Something that hasn’t been talked about much – the key metric – is writes, because of checkpointing. Large LLM training runs can last for days or weeks, and if something goes wrong you don’t want to go right back to the beginning. So they take all the data that’s in the model and write it to non-volatile memory.

The writing is hundreds of gigabytes and the model stops at that point – the longer you stop, the longer you’re not moving forward.
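
The cost of a slow checkpoint can be put in rough numbers. The Python sketch below is an illustration only, not Pure’s method. It assumes training pauses completely while the checkpoint is written, as described above, and the checkpoint size, write throughputs and checkpoint interval are hypothetical figures chosen just to show the arithmetic.

```python
def checkpoint_stall_seconds(checkpoint_gb: float, write_gb_per_s: float) -> float:
    """Time the training job is paused while one checkpoint is written."""
    if write_gb_per_s <= 0:
        raise ValueError("write throughput must be positive")
    return checkpoint_gb / write_gb_per_s


def fraction_of_time_stalled(
    checkpoint_gb: float, write_gb_per_s: float, interval_s: float
) -> float:
    """Share of wall-clock time spent paused, given `interval_s` seconds
    of useful training between consecutive checkpoints."""
    stall = checkpoint_stall_seconds(checkpoint_gb, write_gb_per_s)
    return stall / (interval_s + stall)


if __name__ == "__main__":
    CHECKPOINT_GB = 500.0    # assumed checkpoint size ("hundreds of gigabytes")
    INTERVAL_S = 30 * 60.0   # assumed: one checkpoint per 30 minutes of training

    for throughput in (10.0, 100.0):  # assumed sustained write speeds in GB/s
        stall = checkpoint_stall_seconds(CHECKPOINT_GB, throughput)
        share = fraction_of_time_stalled(CHECKPOINT_GB, throughput, INTERVAL_S)
        print(f"{throughput:6.1f} GB/s: paused {stall:5.1f} s per checkpoint, "
              f"{share:.2%} of wall-clock time")
```

Under these assumptions the pause shrinks in direct proportion to write throughput, and the time lost is multiplied across every GPU that sits idle during the write.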

So the write speed is critically important. Not only do we have the best and most consistent write speeds, we don’t use caching models and we are extraordinarily good at write speeds. It’s why Meta chose us for their Research SuperCluster.

At last year’s event, you predicted the demise of the spinning disk hard drive. How’s progress on that?

I can tell you a few things.

Our E family of high-density storage arrays that are priced and positioned against HDD storage environments are continuing to grow very rapidly.

I’ve been public in saying I believe we will get our first design win at a hyperscaler this year, and that is specifically to replace their standard storage in a very large storage environment.

The conversation started around replacing their disk-based storage, of which there are multiple tiers, but now they’re talking about all their online storage. 

And if we’re talking about multiple hyperscalers, as in any situation where there’s a lead horse, I’m reasonably confident we’ll sign that lead horse this year. You’d imagine that if we weren’t at a price point that could replace disk then a hyperscaler would be unlikely to do so.

I have literally no concern about my prior prediction.

Could the supply chain handle such a shift to flash?

A good question – and one the hyperscalers are asking. If all of them said, ‘We want to do this next year’, the supply chain couldn’t handle it. But there’s a ramp to everything. There’s a ramp to the first hyperscaler, and there’s a ramp to the second and third.

I believe the ability of the supply chain to immediately meet demand and ramp over, say, five years is there.

The HDD market’s very soft underbelly is the hyperscalers – 60% of their sales are to hyperscalers. If it starts to switch, it really starts to move the volume out of the market.

What’s to stop other suppliers making high-capacity flash modules? Huawei, for example, plans a 128TB [terabyte] flash drive for next year.

There’s nothing to stop anyone from copying our DFM [DirectFlash Module]. You can look at it, buy the same chips and build it. It’s not an SSD [solid-state drive]. It’s flash on a card with something to talk to more buses, a small micro-controller.

What’s not easy to build is the software that operates these DFMs because an SSD is something very different – it is a flash module that has hardware and software to make it look like a hard disk. 

Go back 15 years and the guys making flash said, this is great and we’d like to see it in laptops, desktops and servers. But there were many different operating systems, and flash operates very differently to spinning disk. They would have had to get all the suppliers to change their operating systems. So they decided to make flash look like a hard disk and they invented the SSD.

So the SSD has hardware and software to make flash, a semiconductor, look like a mechanical hard disk. By doing so, it doesn’t provide the same level of performance or efficiency as raw flash would.

So you have to make SSDs so you can write immediately, recover cells and make them readily available, protect against power failure and do all these things on the module itself. So it contains a lot of DRAM [dynamic random access memory] and a microprocessor to do all the conversion to what a hard drive expects.
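
The kind of conversion described here is usually called a flash translation layer. The toy Python model below is a sketch of the general idea only; it is not Pure’s software or any supplier’s firmware, and the `ToyFTL` class and its methods are hypothetical names. It assumes a flash page can be written only once between erases, so an overwrite goes to a fresh page and a lookup table hides the move from the host.

```python
from typing import Dict, List, Optional, Set


class ToyFTL:
    """Toy flash translation layer (hypothetical, for illustration).

    Presents overwritable logical blocks, as a hard disk would, on top of
    flash pages that can be written only once between erases.
    """

    def __init__(self, num_pages: int) -> None:
        self.pages: List[Optional[bytes]] = [None] * num_pages  # None = erased
        self.mapping: Dict[int, int] = {}  # logical block address -> page index
        self.stale: Set[int] = set()       # pages holding superseded data
        self.next_free = 0                 # next never-written page

    def write(self, lba: int, data: bytes) -> None:
        """Write out of place: use a fresh page and remap the logical block."""
        if self.next_free >= len(self.pages):
            self._garbage_collect()
        if self.next_free >= len(self.pages):
            # Every page holds live data; real drives keep spare capacity.
            raise RuntimeError("flash full")
        old = self.mapping.get(lba)
        if old is not None:
            self.stale.add(old)            # old copy must be erased later
        self.pages[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def read(self, lba: int) -> bytes:
        """Return the latest data written to a logical block."""
        data = self.pages[self.mapping[lba]]
        assert data is not None
        return data

    def _garbage_collect(self) -> None:
        """Recover stale pages. Simplification: erase everything and rewrite
        the live data, where real firmware works one erase block at a time."""
        live = [(lba, self.pages[page]) for lba, page in self.mapping.items()]
        self.pages = [None] * len(self.pages)
        self.mapping = {}
        self.stale.clear()
        self.next_free = 0
        for lba, data in live:
            self.pages[self.next_free] = data
            self.mapping[lba] = self.next_free
            self.next_free += 1


if __name__ == "__main__":
    ftl = ToyFTL(num_pages=4)
    for payload in (b"v1", b"v2", b"v3"):
        ftl.write(0, payload)              # three overwrites of one block
    print(ftl.read(0), "stale pages:", sorted(ftl.stale))
```

Even in this toy version, the mapping table and the clean-up pass need memory and processing on the device, which is the overhead the SSD carries on the module itself.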

The secret sauce is in our software that operates on our controllers, or anywhere that communicates with DFMs.

You can think of it as a lot of extra translation that’s not necessary. That translation takes power, requires expensive hardware and limits performance.