IBM, Red Hat, and Google announced at KubeCon Europe this week that they are donating llm-d, their large language model inference framework, to the Cloud Native Computing Foundation as a sandbox project. The donation represents a major effort to establish vendor-neutral infrastructure standards for serving LLMs in production, with the framework designed specifically for Kubernetes deployment at scale.
The llm-d framework delivers throughput of 120,000 tokens per second and can serve inference workloads from multiple organizations simultaneously. By donating it to the CNCF rather than keeping it under their own control, the three companies are signaling that LLM inference infrastructure should be as standardized and interoperable as container orchestration itself.
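The article does not show llm-d's actual deployment interface, but the Kubernetes-native serving it describes typically builds on standard primitives such as a Deployment that schedules GPU-backed inference replicas. The sketch below is a generic illustration of that pattern, not llm-d's API; the resource name, container image, and port are all hypothetical placeholders.

```yaml
# Illustrative only: llm-d's real resources and images are not described
# in this article, so every name below is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference-server        # hypothetical name
spec:
  replicas: 4                       # scale horizontally across the cluster
  selector:
    matchLabels:
      app: llm-inference-server
  template:
    metadata:
      labels:
        app: llm-inference-server
    spec:
      containers:
        - name: server
          image: example.com/llm-server:latest   # placeholder image
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: 1     # one GPU per inference replica
```

Frameworks in this space generally layer smarter request routing and cache-aware scheduling on top of primitives like this; llm-d's own custom resources and configuration are documented in the project's repository.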
"You need the scale, distribution, and reliability of what Kubernetes provided for the previous era."
The donation arrives as enterprises increasingly deploy LLMs in production, creating urgent demand for standardized inference frameworks. Currently, organizations face fragmented choices between vendor-specific solutions and open-source tools lacking enterprise support. llm-d's journey to CNCF could establish it as the de facto standard for Kubernetes-native LLM serving.
The move reflects a broader industry recognition that AI infrastructure, like container technology before it, benefits from open standards and community governance. With more than ten organizations already backing the project, llm-d has the momentum to become foundational infrastructure for the AI era, much as Kubernetes did for container deployment. KubeCon Europe 2026 marks the beginning of this standardization effort.