The artificial intelligence industry has found its next obsession, and it does not live in a chatbot. A new class of AI systems known as world models is rapidly moving from research curiosity to billion-dollar bet, as some of the field's most prominent figures wager that teaching machines to understand the physical world will unlock capabilities that large language models alone cannot deliver.
World models are AI systems trained to produce consistent, explorable, and often interactive representations of physical environments. Unlike the text-centric LLMs that have dominated the AI landscape since 2022, world models must internalize enough about physics, geometry, and dynamics that objects behave as they would in reality. Push something off a table, and it falls. Drive a car off a cliff, and it does not float gently to the ground. That distinction, which sounds trivially obvious to a human, represents a fundamental challenge for AI systems trained primarily on language.
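The "push it off the table" test can be made concrete with a hand-written toy in Python. This is not a world model. It hard-codes the physics (gravity, a table top, a floor) that a world model would have to learn from data such as video. What it illustrates is the shape of the problem: given the current state and an action, predict the next state, then keep rolling that prediction forward. The table dimensions, the time step, and the `State`, `step`, and `rollout` names are all illustrative assumptions.

```python
from dataclasses import dataclass

G = 9.81            # gravitational acceleration, m/s^2
DT = 0.05           # time step, s (illustrative choice)
TABLE_EDGE = 0.5    # hypothetical table top spans x in [0, 0.5] m
TABLE_HEIGHT = 0.75 # hypothetical table height, m


@dataclass(frozen=True)
class State:
    x: float   # horizontal position, m
    y: float   # height above the floor, m
    vx: float  # horizontal velocity, m/s
    vy: float  # vertical velocity, m/s


def step(s: State, push: float = 0.0) -> State:
    """Predict the next state from the current state and an action (a horizontal push, m/s)."""
    vx = s.vx + push
    x = s.x + vx * DT
    over_table = 0.0 <= x <= TABLE_EDGE
    if over_table and s.y == TABLE_HEIGHT and s.vy == 0.0:
        # Resting on the table top; it slides without friction but does not fall.
        return State(x, TABLE_HEIGHT, vx, 0.0)
    # Unsupported: gravity accelerates it downward (semi-implicit Euler update).
    vy = s.vy - G * DT
    y = s.y + vy * DT
    if over_table and s.y >= TABLE_HEIGHT > y:
        # Crossed the table top from above during this step: land on it.
        return State(x, TABLE_HEIGHT, vx, 0.0)
    if y <= 0.0:
        # Reached the floor: no bounce and no sliding in this toy, it just stops.
        return State(x, 0.0, 0.0, 0.0)
    return State(x, y, vx, vy)


def rollout(s: State, actions: list[float]) -> list[State]:
    """Apply the one-step predictor repeatedly, feeding each prediction back in."""
    states = [s]
    for a in actions:
        s = step(s, a)
        states.append(s)
    return states


start = State(x=0.4, y=TABLE_HEIGHT, vx=0.0, vy=0.0)  # resting near the table edge
pushed = rollout(start, [1.0] + [0.0] * 39)           # one push, then let physics run
print([round(s.y, 3) for s in pushed[:8]])
```

A learned world model would replace `step` with a trained network. Part of the consistency described above is that its rollouts never produce an object that hovers in mid-air or passes through the table, even many steps into the future.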
"The ChatGPT moment for robotics is here," NVIDIA CEO Jensen Huang declared at a recent press event, unveiling the company's latest Cosmos Predict 2.5 platform. "Breakthroughs in physical AI -- models that understand the real world, reason and plan actions -- are unlocking entirely new applications." NVIDIA's Cosmos world foundation models are purpose-built for physical AI, trained on 20 million hours of real-world robotics and driving videos, and can generate physics-based simulations from text, image, video, or robot sensor data.
The investment numbers tell their own story. In February, Stanford professor Fei-Fei Li's World Labs raised $1 billion in a round backed by AMD, Autodesk, Fidelity, and NVIDIA, among others, to scale its spatial intelligence technology. Just weeks later, Turing Award winner Yann LeCun's AMI Labs closed a staggering $1.03 billion seed round at a $3.5 billion pre-money valuation, making it the largest seed round in European history. The round was co-led by Cathay Innovation, Greycroft, Hiro Capital, and HV Capital, with strategic backing from NVIDIA, Samsung, Toyota Ventures, and Bezos Expeditions.
Why World Models Matter Now
The core argument for world models is that LLMs, for all their linguistic fluency, do not truly understand the physical world. They can describe gravity, but they cannot reliably predict what happens when a robotic arm encounters an unexpected obstacle. As a Nature feature published this month explained, training AI on data about physical environments could dramatically improve real-world capabilities in robotics, autonomous vehicles, and industrial automation -- domains where errors carry consequences measured in broken hardware and human safety rather than garbled text.
Google DeepMind's Genie 3, released in August 2025, demonstrated the potential by using simple text descriptions to generate photorealistic environments that users could explore in real time. The system represented a leap beyond video generation, producing interactive worlds that responded to user actions with physically plausible behavior.
Li, whose World Labs released its first model Marble in November 2025, has framed the challenge in terms that distinguish it sharply from the LLM paradigm. "Building spatially intelligent AI requires world models, a new type of generative models whose capabilities of understanding, reasoning, generation and interaction with the semantically, physically, geometrically and dynamically complex worlds are far beyond the reach of today's LLMs," she wrote in a recent essay. The practical implications are immediate: robotics companies need 3D environments with collisions, physics, and dynamics to train and evaluate their systems, and world models can generate those environments at a fraction of the cost of building physical test facilities.
The Technical Architecture
LeCun's AMI Labs is pursuing world models through JEPA, or Joint Embedding Predictive Architecture, a framework that learns by predicting representations of data rather than reconstructing raw pixels or text tokens. The approach is designed to capture abstract, structural knowledge about how the world works rather than memorizing surface-level patterns. LeCun, AMI's executive chairman, has long argued that LLMs are a dead end for achieving genuine machine understanding, and his departure from Meta to lead AMI underscored his conviction that the world-model approach offers a fundamentally different path.
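The difference between predicting representations and reconstructing raw inputs can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not AMI Labs' implementation. The single tanh-layer encoders, the sizes, the momentum of 0.99, and the exponential-moving-average target encoder are illustrative choices; an EMA target encoder is common in published JEPA variants. The sketch computes the two losses only and contains no training loop.

```python
import numpy as np

D_IN, D_EMB = 64, 16  # hypothetical sizes: raw input features vs. embedding width


def encode(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One tanh layer standing in for a real (e.g. vision transformer) encoder."""
    return np.tanh(x @ w)


def jepa_loss(w_ctx, w_pred, w_tgt, x_ctx, x_tgt) -> float:
    """JEPA-style objective: predict the target's *embedding* from the context.

    The error is measured in the D_EMB-dimensional representation space, so the
    model is never asked to reproduce every raw feature of the target.
    """
    s_pred = encode(w_ctx, x_ctx) @ w_pred  # predictor maps the context embedding forward
    s_tgt = encode(w_tgt, x_tgt)            # target embedding; treated as fixed (stop-gradient) in training
    return float(np.mean((s_pred - s_tgt) ** 2))


def reconstruction_loss(w_ctx, w_dec, x_ctx, x_tgt) -> float:
    """Generative baseline for contrast: decode back to the raw D_IN-dim target."""
    x_hat = encode(w_ctx, x_ctx) @ w_dec
    return float(np.mean((x_hat - x_tgt) ** 2))


def ema_update(w_tgt: np.ndarray, w_ctx: np.ndarray, momentum: float = 0.99) -> np.ndarray:
    """Target encoder slowly tracks the context encoder (exponential moving average)."""
    return momentum * w_tgt + (1.0 - momentum) * w_ctx


rng = np.random.default_rng(0)
w_ctx = rng.normal(scale=0.1, size=(D_IN, D_EMB))
w_pred = rng.normal(scale=0.1, size=(D_EMB, D_EMB))
w_dec = rng.normal(scale=0.1, size=(D_EMB, D_IN))
w_tgt = w_ctx.copy()  # target encoder starts as a copy of the context encoder

# Hypothetical data: 8 "context" inputs (e.g. frame t) and 8 "target"
# inputs (e.g. a slightly changed frame t+1 of the same clips).
x_ctx = rng.normal(size=(8, D_IN))
x_tgt = x_ctx + 0.1 * rng.normal(size=(8, D_IN))

print("JEPA loss:", jepa_loss(w_ctx, w_pred, w_tgt, x_ctx, x_tgt))
print("reconstruction loss:", reconstruction_loss(w_ctx, w_dec, x_ctx, x_tgt))
```

The design intent, as JEPA proponents describe it, is that scoring predictions in embedding space lets the model discard details that are inherently unpredictable, while a reconstruction objective penalizes every raw value it gets wrong.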
NVIDIA, meanwhile, is building the infrastructure layer. Alongside its Cosmos platform, the company offers GR00T N1.7, a 3-billion-parameter open vision-language-action model for humanoid robots built on the Cosmos-Reason2-2B backbone, which uses EgoScale pretraining on more than 20,000 hours of human egocentric video to learn dexterous manipulation. Partners including Boston Dynamics, Caterpillar, Franka Robotics, LG Electronics, and NEURA Robotics are already building next-generation autonomous machines on these technologies, with products expected in the second half of 2026.
Bridging the Sim-to-Real Gap
One of the most persistent problems in robotics has been the sim-to-real gap: the performance drop robots experience when moving from virtual training environments to the physical world. Cadence Design Systems and NVIDIA recently announced a partnership combining Cadence's high-fidelity multiphysics simulation engines with NVIDIA's Isaac robotics libraries and Cosmos world models, specifically targeting this problem. If world models can generate training environments that are physically accurate enough, robots could arrive in the real world already prepared for conditions their designers never explicitly programmed.
What to Watch
The convergence of over $2 billion in fresh capital, open-source model releases from NVIDIA, and competitive pressure from Google DeepMind, AMI Labs, and World Labs suggests that 2026 will be a defining year for world models. Three developments deserve close attention. First, whether AMI Labs' JEPA architecture can demonstrate clear advantages over the generative, token-predicting approach behind today's LLMs, lending evidence to LeCun's long-standing critique. Second, whether NVIDIA's Cosmos ecosystem can close the sim-to-real gap enough to accelerate commercial robot deployments beyond controlled factory environments. And third, whether the massive funding flowing into spatial intelligence translates into revenue or remains, for now, a research bet on an AI paradigm that has yet to prove it can match LLMs' commercial trajectory.
The stakes extend well beyond the AI industry. If world models deliver on their promise, the implications reach into manufacturing, logistics, healthcare, construction, and any domain where machines must navigate and manipulate the physical world. If they do not, the billions being deployed will join a long history of AI bets that arrived a decade too early. Either way, the era of physical AI has officially begun.