Nvidia has introduced the BlueField-4 STX, a storage architecture designed to enhance AI inference performance by addressing the bottleneck created by moving key-value (KV) cache data. By inserting a context memory layer between GPUs and traditional storage, the design promises significant improvements in token throughput, energy efficiency, and data ingestion speed compared with conventional CPU-based storage solutions.
The STX architecture serves as a reference design for storage partners building AI-native infrastructure. Its dedicated context memory layer optimizes the handling of KV cache data, which is crucial for maintaining coherent working memory across AI sessions and reasoning steps.
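To see why KV cache data becomes a storage-tier problem rather than a GPU-memory problem, a rough sizing calculation helps. The sketch below uses the standard KV cache size formula (two tensors, key and value, per layer); the model configuration shown is a hypothetical example loosely resembling a large grouped-query-attention model, not a figure from the article.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Approximate per-session KV cache size for a transformer LLM.

    The factor of 2 accounts for the separate key and value tensors
    stored at every layer; dtype_bytes=2 assumes fp16/bf16 activations.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical config: 80 layers, 8 KV heads (grouped-query attention),
# head dimension 128, a 128K-token context window.
per_session = kv_cache_bytes(num_layers=80, num_kv_heads=8,
                             head_dim=128, seq_len=131_072)
print(f"{per_session / 2**30:.1f} GiB per session")  # → 40.0 GiB per session
```

At tens of gibibytes per long-context session, GPU memory alone cannot hold many concurrent sessions, which is the gap a fast context memory tier between GPU and bulk storage is meant to fill.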
Powered by the BlueField-4 processor, the architecture integrates Nvidia’s Vera CPU with the ConnectX-9 SuperNIC and Spectrum-X Ethernet networking. Nvidia’s DOCA software platform enables programmability, with the new CMX context memory storage platform extending GPU memory with a high-performance context layer tailored for large language models during inference.
Storage providers and cloud companies, including IBM, Dell Technologies, Oracle, and others, are collaborating on STX-based infrastructure to meet the demands of AI workloads. Nvidia’s move to position STX as the industry standard for enterprise AI deployments highlights the increasing importance of storage architecture in optimizing AI performance.
As enterprises plan for AI infrastructure upgrades, the arrival of STX-based platforms in the latter half of 2026 offers a compelling alternative to traditional storage solutions. With major storage vendors already onboard, businesses can expect tailored STX options to be available through existing vendor relationships, ushering in a new era of AI-optimized storage solutions.
Source: VentureBeat