Western Digital & PEAK:AIO unveil record-breaking AI storage speeds
Western Digital has released results from its MLPerf Storage V2 benchmark submission, highlighting the performance capabilities of its OpenFlex Data24 4000 Series NVMe-oF Storage Platform in partnership with PEAK:AIO's AI Data Server.
The benchmarking exercise was designed to assess how well storage systems can support complex and scaled artificial intelligence workloads, which rely increasingly on the ability of storage to keep pace with accelerated GPU infrastructure.
Real-world AI deployment
The OpenFlex Data24 platform utilises NVMe flash technology over an Ethernet fabric, delivering low-latency, shared storage for disaggregated AI infrastructure. This design allows storage and compute resources to scale independently, aiming to simplify deployment and control costs as demand for GPU resources increases.
In the tests, Western Digital collaborated with PEAK:AIO, a firm providing software-defined storage designed to serve large volumes of data to AI workflows. The validation submission used KIOXIA CM7-V Series NVMe SSDs, which the companies highlight for their performance in demanding AI tasks.
MLPerf Storage V2 benchmarks
MLPerf Storage benchmarks are intended to simulate AI server behaviour, generating the input/output load patterns of real-world GPU workloads in distributed environments. The tests involve two principal AI training workloads: 3D U-Net and ResNet-50.
The 3D U-Net model is commonly used in medical imaging, imposing intensive data-streaming demands on storage systems due to its large 3D input datasets. According to the results, Western Digital's OpenFlex Data24 achieved sustained read throughput of 106.5 GB/s (99.2 GiB/s), saturating 36 simulated H100 GPUs across three client nodes. With the PEAK:AIO AI Data Server, the system delivered 64.9 GB/s (59.6 GiB/s) and saturated 22 simulated H100 GPUs from a single head server and client node.
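The paired figures reflect decimal versus binary units: vendors typically report decimal gigabytes per second (1 GB = 10^9 bytes), while GiB/s uses binary gibibytes (1 GiB = 2^30 bytes). A minimal Python sketch of the conversion, checked against the 106.5 GB/s figure above:

```python
# Convert decimal gigabytes/s to binary gibibytes/s.
# 1 GB = 10**9 bytes; 1 GiB = 2**30 bytes.
def gb_to_gib(gb_per_s: float) -> float:
    return gb_per_s * 10**9 / 2**30

print(round(gb_to_gib(106.5), 1))  # → 99.2, matching the reported GiB/s figure
```

The roughly 7% gap between the two numbers is purely a unit difference, not a performance discrepancy.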
The ResNet-50 workload, which benchmarks image classification, measures the training throughput associated with high-frequency data access. In this case, the OpenFlex Data24 sustained performance across 186 simulated H100 GPUs and three client nodes. With PEAK:AIO's AI Data Server, the solution saturated 52 simulated H100 GPUs from a single head server and client node.
Executive commentary
"These results validate Western Digital's disaggregated architecture as a powerful enabler and cornerstone of next-generation AI infrastructure, maximising GPU utilisation while minimising footprint, complexity and overall total cost of ownership," said Kurt Chan, Vice President and General Manager, Western Digital Platforms Business. "The OpenFlex Data24 4000 Series NVMe-oF Storage Platform delivers near-saturation performance across demanding AI benchmarks, both standalone and with a single PEAK:AIO AI Data Server appliance, translating to faster time-to-results and reduced infrastructure sprawl."
"These MLPerf results spotlight the breakthrough efficiency achieved by combining PEAK:AIO's software-defined AI Data Server with the scalability of Western Digital's OpenFlex Data24 and the performance density of KIOXIA's CM7-V Series SSDs," said Roger Cummings, President and CEO at PEAK:AIO. "Together, we're delivering high-performance AI infrastructure that's faster to deploy, more efficient to operate, and easier to scale. It's a compelling proof point that high performance no longer requires high complexity."
Infrastructure and deployment
The OpenFlex Data24 platform incorporates Western Digital RapidFlex network adapters, supporting connectivity for up to 12 hosts without a network switch. The platform is designed to combine simplified, high-performance AI infrastructure growth with predictable scaling, avoiding the up-front costs and power demands associated with some comparable architectures.
Western Digital states that these characteristics are suitable for both organisations beginning their AI initiatives and those looking to scale operations to hundreds of GPUs with confidence.