AMD launches Instinct MI350P PCIe card for AI inference
Thu, 7th May 2026
AMD has introduced the Instinct MI350P PCIe card for on-premises AI inference, extending its Instinct line with a PCIe option aimed at enterprise deployments.
The dual-slot GPU is designed for standard air-cooled servers, and AMD is positioning it as a drop-in part for existing racks. Businesses can deploy it without major changes to power, cooling or rack infrastructure, a constraint that often shapes decisions on whether to run AI workloads in-house or in the cloud.
AMD is pitching the card at companies that want more AI processing than CPUs can provide but do not want to invest in larger accelerator platforms that may require changes to data centre design. Systems can be configured with up to eight accelerator cards, targeting inference tasks across small, medium and large AI models, including retrieval-augmented generation pipelines.
Performance focus
AMD outlined a series of performance figures for the card, while noting that they are preliminary engineering estimates. The MI350P is estimated to deliver 2,299 teraflops, and up to 4,600 peak teraflops at MXFP4, a figure exactly twice the base number and consistent with the 2:1 uplift typically quoted for structured sparsity, alongside 144GB of HBM3E memory and memory bandwidth of up to 4TB/s.
The company is also highlighting support for a range of AI precision formats used in inference workloads. Native support for MXFP6 and MXFP4 is included, while sparsity acceleration is available for most mainstream 8-bit and 16-bit precisions, including INT8 and BF16.
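To make the precision story concrete, the sketch below shows what storing model weights in a low-bit inference format involves. It is a minimal, hypothetical illustration, not anything from AMD's stack: PyTorch has no native MXFP4 or MXFP6 dtype, so FP8 (torch.float8_e4m3fn) stands in, and the per-tensor scaling scheme is assumed for the example.

```python
import torch

# Minimal sketch of low-precision weight storage for inference.
# FP8 (e4m3) stands in for the microscaling formats, which have
# no native PyTorch dtype; the scaling scheme here is illustrative.

def quantize_weight_fp8(w: torch.Tensor):
    """Scale a weight tensor into FP8's representable range and cast."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # ~448 for e4m3
    scale = w.abs().max().clamp(min=1e-12) / fp8_max
    return (w / scale).to(torch.float8_e4m3fn), scale

def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Upcast to float32, rescale, then drop to bfloat16 for compute."""
    return (w_fp8.to(torch.float32) * scale).to(torch.bfloat16)

w = torch.randn(4096, 4096)            # fp32 weights: 64 MiB
w_fp8, scale = quantize_weight_fp8(w)  # fp8 storage: 16 MiB
x = torch.randn(1, 4096, dtype=torch.bfloat16)
y = x @ dequantize(w_fp8, scale).T     # matmul runs in bf16
print(w_fp8.dtype, y.shape)
```

A 4-bit format such as MXFP4 would halve the storage again relative to FP8, which is the basic reason low-bit formats matter for fitting large models into a fixed memory budget.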
Those specifications matter in a market where buyers are weighing throughput, energy use and the practicality of fitting AI hardware into existing facilities. Air-cooled PCIe cards can appeal to organisations that want to avoid the cost and complexity of deploying denser accelerator systems that may require liquid cooling or electrical upgrades.
Software stack
Alongside the hardware, AMD tied the launch to its broader software strategy around ROCm and what it describes as an open AI ecosystem. The company says the MI350P interoperates across platforms through open standards and is intended to work with a range of established AI tools and frameworks.
Its enterprise AI stack includes the Kubernetes GPU Operator for lifecycle management, AMD Inference Microservices and native support for frameworks such as PyTorch. AMD says the combination is meant to help customers migrate inference workloads with limited code changes, a central concern for companies trying to move AI pilots into production without rewriting software.
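As a rough illustration of why limited code changes are plausible, the sketch below uses the standard device-agnostic PyTorch pattern. ROCm builds of PyTorch expose AMD GPUs through the familiar torch.cuda namespace (via HIP), so this code is unchanged whether it targets an Instinct card or another accelerator; the model is a placeholder, not an AMD-provided example.

```python
import torch

# Device-agnostic inference loop. On ROCm builds of PyTorch, AMD GPUs
# are reported through torch.cuda (via HIP), so the same code runs on
# an Instinct card or any other supported GPU without modification.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model for illustration only.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).to(device).eval()

with torch.inference_mode():
    x = torch.randn(8, 1024, device=device)
    y = model(x)

print(f"ran on {y.device}")
```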
The chipmaker also said it provides its open-source enterprise AI reference stack to partners at no licensing cost. That forms part of a broader competitive argument against dependence on a single supplier's software environment, an issue that has become more visible as enterprises seek to balance speed of deployment with longer-term flexibility.
Enterprise market
The launch reflects a broader shift in the AI infrastructure market. While hyperscale cloud providers remain a major route for AI deployment, many companies are still considering on-premises options because of data control, privacy requirements and the difficulty of forecasting cloud spending for inference workloads that can vary sharply over time.
For those customers, the choice is often between sticking with CPU-based infrastructure that may not deliver enough performance, moving workloads to the cloud, or upgrading data centres for larger GPU systems. AMD is trying to place the MI350P in the middle ground by offering a form factor that can be installed in conventional servers.
The approach also broadens AMD's position in AI beyond large accelerator systems. By adding a PCIe card to the Instinct family, it is targeting buyers that may want to start with smaller on-premises roll-outs and expand over time as demand rises. That could include businesses running internal assistants, document search, customer service tools and other inference-heavy applications where latency, data location and operating cost all matter.
AMD says support for FP8, MXFP8 and MXFP4 is one reason the card can handle current AI workloads within standard air-cooled data centres. It also says enterprises can migrate workloads without code rewrites, integrate the card with existing AI pipelines and scale systems as workloads change.
The MI350P enters a market where vendors are trying to persuade enterprises that AI infrastructure does not always require a full redesign of data centre operations. AMD's central claim is that the card offers a way to add GPU inference within existing infrastructure, with up to 144GB of HBM3E memory and estimated peak performance of 4,600 teraflops at MXFP4.