# OpenAI Extends Responses API With Shell Tool, Agent Loop, and Hosted Workspaces

OpenAI is turning its Responses API into a full-blown agent runtime. In a series of updates rolled out since February 2026 and expanded through April, the company has added a hosted shell tool, a built-in agent execution loop, container-based workspaces, server-side context compaction, and a reusable skills framework -- a stack of primitives that, taken together, amount to OpenAI's most aggressive move yet to own the infrastructure layer where autonomous agents actually do work.

The centerpiece is the shell tool, which gives any model called through the Responses API the ability to execute commands inside an OpenAI-managed Debian 12 container. Each container ships with Python 3.11, Node.js 22, Java 17, Go 1.23, PHP 8.2, and Ruby 3.1 pre-installed, along with a persistent `/mnt/data` working directory where files survive across tool calls. Developers who prefer to keep execution local can still do so, but the hosted option -- activated by setting the environment to `container_auto` -- means OpenAI is now in the managed-compute business, provisioning and tearing down sandboxed environments on behalf of agent workloads.
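To make the opt-in concrete, here is a minimal sketch of a request body that asks for the hosted container. Only the `container_auto` value comes from OpenAI's description above; the field layout, the placeholder model name, and the helper `build_shell_request` are assumptions for illustration and should be checked against the current API reference.

```python
import json


def build_shell_request(prompt: str, model: str = "example-model") -> dict:
    """Build a request body that enables the hosted shell tool.

    Assumptions: the tool is declared as {"type": "shell"} with an
    "environment" object, and "example-model" is a placeholder rather
    than a real model name. Only "container_auto" is taken from the article.
    """
    return {
        "model": model,
        "input": prompt,
        "tools": [
            {
                "type": "shell",
                # Ask the platform to provision and tear down the container.
                "environment": {"type": "container_auto"},
            }
        ],
    }


if __name__ == "__main__":
    body = build_shell_request("List the files in /mnt/data")
    print(json.dumps(body, indent=2))
```

The sketch only builds the payload and sends nothing, so it runs without credentials.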

Wrapping the shell tool is a new agent execution loop baked directly into the API. Rather than requiring developers to write their own orchestration logic, the Responses API now repeats a propose-execute-observe cycle automatically: the model suggests an action, the action runs in the container, the result feeds back into the context, and the loop continues until the model determines the task is complete. It is the same pattern that powered Codex internally, now exposed as a first-class API primitive.
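The cycle is easier to see in code. What follows is a rough sketch of a propose-execute-observe loop of the kind developers previously wrote by hand, not OpenAI's implementation. The names `run_agent_loop`, `Transcript`, and `Step` are hypothetical, and the model and container are replaced by plain callables so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    command: str
    output: str


@dataclass
class Transcript:
    task: str
    steps: list[Step] = field(default_factory=list)


# propose() stands in for the model: it returns the next command,
# or None once it judges the task complete.
Propose = Callable[[Transcript], Optional[str]]
# execute() stands in for the container: it runs a command and returns output.
Execute = Callable[[str], str]


def run_agent_loop(task: str, propose: Propose, execute: Execute,
                   max_steps: int = 10) -> Transcript:
    """Repeat propose -> execute -> observe until done or max_steps is hit."""
    transcript = Transcript(task=task)
    for _ in range(max_steps):
        command = propose(transcript)                    # propose
        if command is None:                              # model is finished
            break
        output = execute(command)                        # execute
        transcript.steps.append(Step(command, output))   # observe
    return transcript


if __name__ == "__main__":
    scripted = iter(["ls", "cat notes.txt", None])
    fake_outputs = {"ls": "notes.txt", "cat notes.txt": "hello"}
    result = run_agent_loop(
        "read the notes",
        propose=lambda transcript: next(scripted),
        execute=lambda command: fake_outputs.get(command, ""),
    )
    for step in result.steps:
        print(f"$ {step.command}\n{step.output}")
```

The `max_steps` cap is a design choice of this sketch: a hand-rolled loop needs some guard against a model that never declares itself done.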

"The Responses API is the foundation of our next chapter of building agents," Atty Eleti, who leads API platform at OpenAI, wrote when the API first launched. The February and April extensions make good on that framing by eliminating the boilerplate that previously sat between a model and a working agent.

Long-running sessions expose a familiar problem: context windows fill up. OpenAI's answer is server-side compaction, a mechanism that compresses earlier turns of a conversation into a shorter, encrypted representation while preserving the information the model needs to keep working. Unlike naive truncation, compaction is model-aware -- the latest OpenAI models are trained to analyze prior conversation state and produce a compaction item that retains key details in a token-efficient format. The practical result is striking. E-commerce analytics platform Triple Whale reported that its agent, Moby, successfully navigated a session involving more than 5 million tokens and 150 tool calls without a drop in accuracy, a workload far larger than a single context window could hold without compaction.
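The general shape of the idea can be shown with a deliberately naive, client-side analogue. This is not OpenAI's mechanism, which is produced by the model and returned in encrypted form. The function names, the four-characters-per-token estimate, and the placeholder summarizer are all assumptions made for the sketch.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Very rough estimate (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def compact(history: list[str], budget: int, keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Fold older turns into one summary item once history exceeds budget.

    The most recent `keep_recent` turns are kept verbatim; everything
    before them is replaced by a single item returned by `summarize`.
    """
    total = sum(estimate_tokens(turn) for turn in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


if __name__ == "__main__":
    turns = ["a" * 400, "b" * 400, "c" * 40, "d" * 40]
    placeholder = lambda old: f"[summary of {len(old)} earlier turns]"
    for item in compact(turns, budget=100, keep_recent=2, summarize=placeholder):
        print(item[:40])
```

In a real system the summarizer is where the difficulty lies; the placeholder above discards the very details that a model-aware compaction item is meant to retain.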

The final piece is skills, a new abstraction for packaging agent instructions into reusable, versioned bundles. A skill is a folder containing a `SKILL.md` manifest -- metadata, step-by-step instructions, and constraints -- plus any supporting resources such as API specs, UI assets, or reference data. "If system prompts are sticky notes, skills are formal recipe cards: structured, portable, and version-controlled," OpenAI wrote in its developer blog post on the feature. The framework aligns with an emerging open standard for agent skills, and early adopters are already reporting measurable gains. Enterprise AI search company Glean said it saw tool-use accuracy jump from 73 percent to 85 percent after adopting the skills framework across its agent fleet.
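As an illustration of what such a manifest might look like to a program, the sketch below parses a `SKILL.md`-style file into metadata and instructions. The layout (a `---`-delimited header of `key: value` lines followed by free-form instructions), the `name` and `description` fields, the example skill, and the `parse_skill` helper are all assumptions for this example; the real format may differ.

```python
EXAMPLE_SKILL = """\
---
name: weekly-report
description: Build the weekly revenue report from CSV exports.
---
1. Load every CSV in /mnt/data/exports.
2. Sum revenue by region.
3. Never overwrite the source files.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (metadata, instructions)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        # No header block: treat the whole file as instructions.
        return {}, text.strip()
    metadata: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the instruction body.
            body = "\n".join(lines[index + 1:]).strip()
            return metadata, body
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    raise ValueError("header block is never closed with '---'")


if __name__ == "__main__":
    meta, instructions = parse_skill(EXAMPLE_SKILL)
    print(meta["name"])
    print(instructions)
```

Keeping metadata separate from the body is what would let a runtime index many skills by name and description and load the full instructions only when one is selected.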

On April 15, OpenAI extended these ideas further into the Agents SDK with a formal sandbox abstraction, separating the harness -- the orchestration logic -- from the compute environment. Seven infrastructure providers ship as native integrations from day one: Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. The sandbox gives each agent an isolated Unix-like environment with a filesystem, installed packages, exposed ports, snapshots, and controlled network access, formalizing what the Responses API container pioneered into a portable, multi-provider standard.
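The harness-versus-compute split can be pictured as an interface. The sketch below is not the Agents SDK API: the `Sandbox` protocol, the `InMemorySandbox` stand-in, and the `try_risky_edit` harness function are hypothetical names chosen to show how orchestration logic can stay the same while the provider behind it changes.

```python
from typing import Callable, Protocol


class Sandbox(Protocol):
    """What the harness needs from any compute provider (hypothetical)."""

    def write_file(self, path: str, content: str) -> None: ...
    def read_file(self, path: str) -> str: ...
    def snapshot(self) -> dict[str, str]: ...
    def restore(self, snap: dict[str, str]) -> None: ...


class InMemorySandbox:
    """Stand-in provider that keeps its 'filesystem' in a dict."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def write_file(self, path: str, content: str) -> None:
        self._files[path] = content

    def read_file(self, path: str) -> str:
        return self._files[path]

    def snapshot(self) -> dict[str, str]:
        return dict(self._files)  # copy, so later writes do not leak in

    def restore(self, snap: dict[str, str]) -> None:
        self._files = dict(snap)


def try_risky_edit(sandbox: Sandbox, path: str, new_content: str,
                   is_valid: Callable[[str], bool]) -> bool:
    """Harness logic: apply an edit, roll back via snapshot if it is invalid."""
    snap = sandbox.snapshot()
    sandbox.write_file(path, new_content)
    if is_valid(sandbox.read_file(path)):
        return True
    sandbox.restore(snap)
    return False


if __name__ == "__main__":
    box = InMemorySandbox()
    box.write_file("config.txt", "mode=safe")
    kept = try_risky_edit(box, "config.txt", "mode=???",
                          is_valid=lambda text: text.endswith("safe"))
    print(kept, box.read_file("config.txt"))
```

Because `try_risky_edit` depends only on the protocol, swapping the in-memory stand-in for a provider-backed class would not require touching the harness.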

The strategic picture is hard to miss. By bundling execution, persistence, compaction, and skills into a single API surface, OpenAI is collapsing a stack that agent developers previously had to assemble from half a dozen vendors. Container orchestration, context management, prompt engineering patterns, and infrastructure provisioning all move behind one endpoint. For startups building on the Responses API, that could cut the time to a working agent from weeks to hours. For OpenAI's competitors, it raises the bar: matching the model is no longer enough if the model comes with an entire runtime attached.

The risk, as with any platform play, is lock-in. Skills, compaction, and hosted containers all create switching costs. Developers who build deeply on these primitives may find it expensive to migrate later. But in the current market, where the gap between a capable model and a capable agent remains wide, convenience is winning. OpenAI's bet is that developers will trade portability for velocity -- and that by the time they notice the trade-off, the Responses API will be too deeply embedded in their stack to leave.

What to watch: whether the skills framework gains traction as an open standard beyond OpenAI's ecosystem, whether compaction holds up under adversarial and safety-critical workloads, and whether the seven sandbox providers stay differentiated or consolidate around a single runtime. For now, OpenAI has drawn the clearest line yet between selling a model and selling an agent platform -- and it is pricing the platform into the model.