---
title: Pluggable Secure Container Runtime Support
authors:
  - "@hittyt"
creation-date: 2026-02-05
last-updated: 2026-02-09
status: implementable
---
OSEP-0004: Pluggable Secure Container Runtime Support
- Summary
- Motivation
- Requirements
- Proposal
- Design Details
- Test Plan
- Drawbacks
- Alternatives
- Infrastructure Needed
- Upgrade & Migration Strategy
Summary
This proposal introduces secure container runtime support for OpenSandbox, enabling sandboxes to run in secure container runtimes such as gVisor, Firecracker, and Kata Containers. This provides hardware-level isolation for executing untrusted AI-generated code, protecting the host system from potential malicious behavior.
The secure runtime is configured at the server level: administrators choose a single secure runtime in the server configuration, and all sandboxes on that server transparently use it. SDK users and API callers require no code changes — the isolation upgrade is entirely an infrastructure-level decision.
Motivation
OpenSandbox is designed to execute untrusted code generated by AI models (such as Claude, GPT-4, Gemini). While standard container isolation (runc) provides process-level isolation, it may not be sufficient in scenarios involving:
- Untrusted Code Execution: AI-generated code could potentially contain malicious behavior, including container escape attempts
- Multi-tenant Environments: Different users' sandboxes may require stronger isolation guarantees
- Compliance Requirements: Some industries require hardware-level virtualization for security compliance
Secure container runtimes like gVisor, Firecracker, and Kata Containers provide additional isolation layers:
| Runtime | Isolation Mechanism | Use Case |
|---|---|---|
| gVisor | User-space kernel (syscall interception) | General workloads, low overhead |
| Firecracker | Lightweight microVM | High isolation, fast boot |
| Kata Containers | Full VM with optimized hypervisor | Maximum isolation, compatibility |
Goals
- Server-Level Configuration: Secure runtime is configured once at the server level; all sandboxes use the same runtime
- Transparent to SDK Users: No SDK or API changes required — upgrading isolation is purely an infrastructure decision
- Dual-Mode Compatibility: Work seamlessly in both Local Docker and Kubernetes deployment modes
- Graceful Fallback: Default to standard runc when no secure runtime is configured
- Validation: Verify runtime availability at server startup and before sandbox creation, with clear error messages
Non-Goals
- Runtime Installation: OpenSandbox will not install or configure secure container runtimes; this is the responsibility of infrastructure administrators
- Per-Request Runtime Selection: SDK users cannot choose or override the secure runtime on a per-sandbox basis; this is an infrastructure-level decision managed by administrators
- Runtime-Specific Features: Exposing all features of each secure runtime (e.g., gVisor platforms, Kata hypervisors) is out of scope for the initial implementation
- Performance Optimization: Tuning secure runtimes for optimal performance is left to operators
- Multiple Runtimes on One Server: A single server instance supports exactly one secure runtime; mixed runtimes require separate server deployments
Requirements
| ID | Requirement | Priority |
|---|---|---|
| R1 | Server configuration defines the secure runtime for all sandboxes | Must Have |
| R2 | Support gVisor and Kata Containers (including the Firecracker backend) as runtime types | Must Have |
| R3 | Validate runtime availability at server startup | Must Have |
| R4 | Work in both Docker and Kubernetes modes | Must Have |
| R5 | Default to runc when no secure runtime is configured | Must Have |
| R6 | Clear error messages when configured runtime is unavailable | Should Have |
| R7 | No SDK or API changes required for existing users | Should Have |
Proposal
We propose adding a [secure_runtime] section to the server configuration file (~/.sandbox.toml). When configured, all sandboxes on that server transparently run in the specified secure runtime. No changes to the Sandbox Lifecycle API or SDKs are required.
Server Config Backend
┌──────────────────────┐ ┌─────────────────┐
│ [secure_runtime] │ │ Docker: │
│ type = "gvisor" │ ┌────→ │ --runtime= │
│ docker_runtime │ │ │ runsc │
│ = "runsc" │─────┤ ├─────────────────┤
│ k8s_runtime_class │ │ │ Kubernetes: │
│ = "gvisor" │ └────→ │ runtimeClass- │
│ │ │ Name: gvisor│
└──────────────────────┘ └─────────────────┘
▲
│ Infrastructure admin configures once
│ SDK users require NO code changes

Notes/Constraints/Caveats
- Infrastructure Dependency: Secure runtimes must be pre-installed and configured on the host (Docker) or cluster (Kubernetes) before use
- Performance Overhead: Secure runtimes add latency and resource overhead compared to runc:
  - gVisor: ~10-50ms additional startup, minimal memory overhead
  - Kata Containers: Performance varies significantly by hypervisor backend:

| Hypervisor | Cold Start | Memory Overhead | Notes |
|---|---|---|---|
| QEMU | ~500ms | ~20-50MB | Default, most feature-complete |
| Cloud Hypervisor (CLH) | ~200ms | ~10-20MB | Lightweight, Rust-based |
| Firecracker | ~125ms | ~5MB | Minimal footprint, limited features |
| Dragonball | ~100-200ms | ~10MB | Optimized for cloud-native |

The actual hypervisor is determined by the `RuntimeClass` handler configured by the SRE administrator (e.g., `kata-qemu`, `kata-clh`, `kata-fc`). Note: Firecracker is not a standalone OCI runtime. In this OSEP, `secure_runtime = "firecracker"` maps to Kata Containers with the Firecracker hypervisor backend (`kata-fc`). See Server Configuration for details.
- Compatibility: Not all container images work with all secure runtimes:
  - gVisor: Some syscalls may not be implemented; check gVisor compatibility
  - Kata (QEMU/CLH): Generally most compatible but highest overhead
  - Kata + Firecracker (`kata-fc`): Limited device support; some workloads requiring specific kernel features may not work
- execd Injection: The execd binary injection mechanism must work within secure runtime constraints
- Pooled Sandbox Consistency (Kubernetes): In Kubernetes mode with resource pools (Pool CRD), the Pool's `runtimeClassName` must match the server's `[secure_runtime]` configuration. Since both are managed by the same SRE administrator, this is an operational requirement validated at server startup.
Risks and Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Runtime unavailable at creation time | Sandbox creation fails | Pre-validation with clear error messages |
| Syscall compatibility issues | Application may not work | Document known limitations per runtime |
| Performance degradation | Slower sandbox creation | Administrators choose a runtime based on the security/performance tradeoff |
| Configuration complexity | Operational burden | Provide sensible defaults and clear documentation |
Design Details
Note: Code snippets in this section are illustrative and demonstrate the design intent. Actual implementation may differ in structure and details.
API and SDK Impact
No changes to the Sandbox Lifecycle API or SDKs are required.
The CreateSandboxRequest schema remains unchanged. The secure runtime is applied transparently by the server based on its configuration. Existing SDK code works as-is:
# This code works identically whether the server uses runc or gVisor.
# The SDK user does not need to know or care about the secure runtime.
sandbox = await Sandbox.create(
image="python:3.11",
entrypoint=["python", "-c", "print('hello')"],
)

This is a key advantage of server-level configuration: upgrading from runc to gVisor is a pure infrastructure change that requires zero application code modifications.
Server Configuration
Extension to ~/.sandbox.toml. A single [secure_runtime] section configures the secure runtime for all sandboxes on this server:
[runtime]
type = "docker" # or "kubernetes"
execd_image = "opensandbox/execd:v1.0.5"
# Secure container runtime configuration.
# When enabled, ALL sandboxes on this server use the specified runtime.
# Comment out or leave type empty to use standard runc.
[secure_runtime]
# Runtime type identifier. Supported values:
# "gvisor" - gVisor (runsc), user-space kernel isolation
# "kata" - Kata Containers (QEMU backend), VM-level isolation
# "firecracker" - Kata Containers with Firecracker backend (K8s only)
# "" - Standard runc (default, no secure runtime)
type = ""
# Docker mode: --runtime parameter name
# Ignored when runtime.type = "kubernetes"
docker_runtime = "runsc"
# Kubernetes mode: pod.spec.runtimeClassName value
# Ignored when runtime.type = "docker"
k8s_runtime_class = "gvisor"

Configuration examples (pick ONE per server; these are separate config files):
Example 1 — gVisor on Docker:
# ~/.sandbox.toml
[runtime]
type = "docker"
execd_image = "opensandbox/execd:v1.0.5"
[secure_runtime]
type = "gvisor"
docker_runtime = "runsc"
k8s_runtime_class = "gvisor"

Example 2 — Kata Containers (QEMU) on Kubernetes:
# ~/.sandbox.toml
[runtime]
type = "kubernetes"
execd_image = "opensandbox/execd:v1.0.5"
[secure_runtime]
type = "kata"
docker_runtime = "kata-runtime"
k8s_runtime_class = "kata-qemu"

Example 3 — Kata + Firecracker on Kubernetes:
Firecracker is a VMM, not an OCI runtime. It cannot serve as a CRI implementation directly. This OSEP recommends using Firecracker via Kata Containers (the `kata-fc` handler), which is the mature, production-ready approach. The alternative (firecracker-containerd) is less actively maintained and not recommended.
# ~/.sandbox.toml
[runtime]
type = "kubernetes"
execd_image = "opensandbox/execd:v1.0.5"
[secure_runtime]
type = "firecracker"
docker_runtime = "" # Not supported in Docker mode
k8s_runtime_class = "kata-fc"

Infrastructure Prerequisites
OpenSandbox does not install secure runtimes. The following must be configured by infrastructure administrators.
Docker Mode - gVisor Setup
Step 1: Install gVisor runsc
# Ubuntu/Debian
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | \
sudo tee /etc/apt/sources.list.d/gvisor.list
sudo apt-get update && sudo apt-get install -y runsc

Step 2: Configure Docker daemon
// /etc/docker/daemon.json
{
"runtimes": {
"runsc": {
"path": "/usr/bin/runsc",
"runtimeArgs": [
"--platform=systrap",
"--network=host"
]
}
}
}

sudo systemctl restart docker

Step 3: Verify installation
docker run --runtime=runsc hello-world

Docker Mode - Kata Containers Setup
# Install Kata Containers
bash -c "$(curl -fsSL https://raw.githubusercontent.com/kata-containers/kata-containers/main/utils/kata-manager.sh) install-docker-system"

// /etc/docker/daemon.json
{
"runtimes": {
"kata-runtime": {
"path": "/usr/bin/kata-runtime"
}
}
}

Kubernetes Mode - RuntimeClass Setup
Cluster administrators must create RuntimeClass resources:
# gVisor RuntimeClass
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: gvisor
handler: runsc # Matches containerd handler name
---
# Kata Containers (QEMU backend) RuntimeClass
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: kata-qemu
handler: kata-qemu
---
# Kata Containers (Firecracker backend) RuntimeClass
# This is what secure_runtime="firecracker" maps to
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: kata-fc
handler: kata-fc

containerd configuration (/etc/containerd/config.toml):
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
runtime_type = "io.containerd.runsc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
runtime_type = "io.containerd.kata-qemu.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc]
runtime_type = "io.containerd.kata-fc.v2"

Runtime Resolver
The server reads [secure_runtime] at startup and resolves it to the backend-specific identifier based on the deployment mode:
class SecureRuntimeResolver:
"""Resolves secure runtime config to backend-specific parameters."""
def __init__(self, config: AppConfig):
self.secure_runtime = config.secure_runtime # may be None
self.runtime_mode = config.runtime.type # "docker" or "kubernetes"
def get_docker_runtime(self) -> Optional[str]:
"""Return Docker --runtime value, or None for runc."""
if not self.secure_runtime or not self.secure_runtime.type:
return None
if not self.secure_runtime.docker_runtime:
raise ConfigError(
f"Secure runtime '{self.secure_runtime.type}' is not supported "
f"in Docker mode (docker_runtime is empty)."
)
return self.secure_runtime.docker_runtime
def get_k8s_runtime_class(self) -> Optional[str]:
"""Return K8s runtimeClassName, or None for cluster default."""
if not self.secure_runtime or not self.secure_runtime.type:
return None
return self.secure_runtime.k8s_runtime_class

Startup Validation
The server validates the configured secure runtime at startup, failing fast if the runtime is unavailable:
def validate_secure_runtime_on_startup(config: AppConfig, docker_client=None, k8s_client=None):
"""Validate secure runtime availability at server startup."""
sr = config.secure_runtime
if not sr or not sr.type:
logger.info("No secure runtime configured; using standard runc.")
return
if config.runtime.type == "docker":
if not sr.docker_runtime:
raise ConfigError(
f"secure_runtime.type='{sr.type}' but docker_runtime is empty. "
f"This runtime is not supported in Docker mode."
)
info = docker_client.info()
available = info.get("Runtimes", {}).keys()
if sr.docker_runtime not in available:
raise ConfigError(
f"Docker runtime '{sr.docker_runtime}' is not available. "
f"Available runtimes: {list(available)}. "
f"Please install and configure it in /etc/docker/daemon.json."
)
else: # kubernetes
try:
k8s_client.read_runtime_class(sr.k8s_runtime_class)
except ApiException as e:
if e.status == 404:
raise ConfigError(
f"RuntimeClass '{sr.k8s_runtime_class}' does not exist. "
f"Please create it in the cluster."
)
raise
logger.info(f"Secure runtime '{sr.type}' validated successfully.")

Docker Mode Implementation
Changes to server/src/services/docker.py. The runtime is read from server config, not from the request:
class DockerSandboxService(SandboxService):
def __init__(self, config: Optional[AppConfig] = None):
# ... existing initialization ...
self.resolver = SecureRuntimeResolver(self.app_config)
# Runtime is resolved once at init; already validated at startup
self.docker_runtime = self.resolver.get_docker_runtime()
async def create_sandbox(self, request: CreateSandboxRequest) -> CreateSandboxResponse:
# ... existing code ...
container = self.docker_client.containers.run(
image=request.image.uri,
# ... other parameters ...
runtime=self.docker_runtime, # "runsc", "kata-runtime", or None
)

Kubernetes Mode Implementation
Both Kubernetes workload providers inject runtimeClassName from server config. The runtimeClassName is resolved once at service initialization (already validated at startup).
BatchSandboxProvider
Changes to server/src/services/k8s/batchsandbox_provider.py:
- CRD: `sandbox.opensandbox.io/v1alpha1` `BatchSandbox`
- Pod spec path: `spec.template.spec`
class BatchSandboxProvider:
def __init__(self, config: AppConfig, ...):
# ... existing initialization ...
self.resolver = SecureRuntimeResolver(config)
self.runtime_class = self.resolver.get_k8s_runtime_class()
def create_workload(self, request: CreateSandboxRequest, ...):
# ... existing code ...
if self.runtime_class:
runtime_manifest["spec"]["template"]["spec"]["runtimeClassName"] = self.runtime_class
# ... template merge ...

AgentSandboxProvider
Changes to server/src/services/k8s/agent_sandbox_provider.py:
- CRD: `agents.x-k8s.io/v1alpha1` `Sandbox`
- Pod spec path: `spec.podTemplate.spec`
class AgentSandboxProvider:
def __init__(self, config: AppConfig, ...):
# ... existing initialization ...
self.resolver = SecureRuntimeResolver(config)
self.runtime_class = self.resolver.get_k8s_runtime_class()
def create_workload(self, request: CreateSandboxRequest, ...):
# ... existing code ...
pod_spec = self._build_pod_spec(request, ...)
if self.runtime_class:
pod_spec["runtimeClassName"] = self.runtime_class
runtime_manifest["spec"]["podTemplate"]["spec"] = pod_spec
# ... template merge ...

Provider Comparison
| Aspect | BatchSandboxProvider | AgentSandboxProvider |
|---|---|---|
| CRD Kind | BatchSandbox | Sandbox |
| Pod Spec Path | spec.template.spec | spec.podTemplate.spec |
| Pool Support | Yes (poolRef) | No |
| Runtime Source | Server config | Server config |
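Per the table above, the two providers differ only in where `runtimeClassName` is injected, which can be expressed as a data-driven path (a sketch; `inject_runtime_class` is an illustrative helper, not existing code):

```python
def inject_runtime_class(manifest: dict, path: list, runtime_class: str) -> dict:
    """Set runtimeClassName at the given nested pod-spec path."""
    node = manifest
    for key in path:
        node = node.setdefault(key, {})  # create intermediate maps as needed
    node["runtimeClassName"] = runtime_class
    return manifest

# Pod spec paths from the comparison table:
batch = inject_runtime_class({}, ["spec", "template", "spec"], "gvisor")
agent = inject_runtime_class({}, ["spec", "podTemplate", "spec"], "gvisor")
```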
Pooled Sandbox Consistency
In Kubernetes mode with resource pools (Pool CRD), the Pool's runtimeClassName must match the server's [secure_runtime] configuration. Since both are managed by the same SRE administrator, this is an operational requirement.
Pool configuration by SRE administrator:
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: Pool
metadata:
name: gvisor-pool
spec:
template:
spec:
runtimeClassName: "gvisor" # Must match server's secure_runtime.k8s_runtime_class
containers:
- name: sandbox-container
image: python:3.11
capacitySpec:
bufferMax: 10
bufferMin: 2
poolMax: 20
poolMin: 5

The server validates this consistency at startup. If the Pool's runtimeClassName does not match the server config, the server logs a warning and refuses to use that pool.
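The consistency check might look like the following sketch (illustrative; the real validation would also cover lookup errors and logging):

```python
def pool_matches_config(pool_manifest: dict, k8s_runtime_class) -> bool:
    """True if the Pool's runtimeClassName matches the server's configured class."""
    pool_class = (
        pool_manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("runtimeClassName")
    )
    return pool_class == k8s_runtime_class

pool = {"spec": {"template": {"spec": {"runtimeClassName": "gvisor"}}}}
pool_matches_config(pool, "gvisor")     # matching pool is usable
pool_matches_config(pool, "kata-qemu")  # mismatch: warn and refuse the pool
```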
Compatibility Matrix
| Secure Runtime | Local Docker | Kubernetes | Notes |
|---|---|---|---|
| gVisor (runsc) | Full support | Full support | Docker --runtime=runsc; K8s via RuntimeClass |
| Kata Containers | Full support | Full support | Docker --runtime=kata-runtime; K8s via RuntimeClass |
| Firecracker | Not supported | Via Kata (kata-fc) | Not a Docker OCI runtime; use Kata with Firecracker hypervisor backend in K8s |
| Custom runtimes | Via config | Via RuntimeClass | Requires pre-installation |
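Expressed as data, the matrix above reduces to a small lookup that a resolver could consult (a sketch; names are illustrative):

```python
# Which deployment modes each secure runtime type supports, per the matrix.
SUPPORTED_MODES = {
    "gvisor": {"docker", "kubernetes"},
    "kata": {"docker", "kubernetes"},
    "firecracker": {"kubernetes"},  # only via Kata's kata-fc handler
}

def is_supported(runtime_type: str, mode: str) -> bool:
    """True if the (runtime, deployment mode) combination is valid."""
    return mode in SUPPORTED_MODES.get(runtime_type, set())
```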
Test Plan
Unit Tests
| Test Case | Description |
|---|---|
| Config parsing | Verify SecureRuntimeConfig correctly parses TOML |
| Resolver (Docker) | Verify get_docker_runtime() returns correct value or None |
| Resolver (K8s) | Verify get_k8s_runtime_class() returns correct value or None |
| Empty type handling | Verify fallback to runc when type = "" |
| Firecracker in Docker | Verify error when docker_runtime is empty in Docker mode |
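Two of the cases above ("Empty type handling" and "Firecracker in Docker") might be written like this (plain-assert sketch; the minimal function below is a stand-in for the real resolver, not its actual code):

```python
class ConfigError(Exception):
    pass

def get_docker_runtime(runtime_type: str, docker_runtime: str):
    """Minimal stand-in for SecureRuntimeResolver.get_docker_runtime()."""
    if not runtime_type:
        return None  # empty type -> standard runc
    if not docker_runtime:
        raise ConfigError(f"'{runtime_type}' is not supported in Docker mode")
    return docker_runtime

def test_empty_type_falls_back_to_runc():
    assert get_docker_runtime("", "runsc") is None

def test_firecracker_rejected_in_docker_mode():
    try:
        get_docker_runtime("firecracker", "")
        raise AssertionError("expected ConfigError")
    except ConfigError:
        pass

test_empty_type_falls_back_to_runc()
test_firecracker_rejected_in_docker_mode()
```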
Integration Tests
| Test Case | Description |
|---|---|
| Startup validation (Docker) | Server fails to start when configured runtime not in Docker daemon |
| Startup validation (K8s) | Server fails to start when RuntimeClass doesn't exist |
| Docker + gVisor | Create sandbox on Docker host with [secure_runtime] type = "gvisor" |
| Docker + Kata | Create sandbox on Docker host with [secure_runtime] type = "kata" |
| K8s + gVisor | Create sandbox in cluster with gVisor RuntimeClass |
| K8s + kata-fc | Create sandbox in cluster with kata-fc RuntimeClass |
| Pool consistency | Server warns when Pool runtimeClassName doesn't match config |
E2E Tests
| Test Case | Description |
|---|---|
| SDK unaware of runtime | SDK creates sandbox without any runtime parameter; runs in gVisor |
| Runtime isolation verification | Verify syscall interception in gVisor sandbox |
| Fallback behavior | Verify standard runc when [secure_runtime] not configured |
| execd injection under gVisor | Verify execd binary injection works within gVisor runtime |
Drawbacks
- Operational Complexity: Administrators must install and configure secure runtimes
- Performance Overhead: Secure runtimes add startup latency and memory overhead
- Compatibility Issues: Some workloads may not work with certain runtimes
- Documentation Burden: Requires comprehensive setup guides for each runtime
Alternatives
Alternative 1: Per-Request Runtime Selection
Approach: Add a secureRuntime field to CreateSandboxRequest, allowing SDK users to choose the runtime per sandbox (e.g., secure_runtime="gvisor").
Pros:
- Maximum flexibility for users
- Different sandboxes can use different runtimes on the same server
- Supports mixed security levels (trusted vs untrusted workloads)
Cons:
- Secure runtime is fundamentally an infrastructure decision, not a per-request decision
- API callers could potentially downgrade security
- Adds complexity to SDK and API surface
- Most deployments only use one runtime; per-request selection is rarely needed
Decision: Rejected. Secure runtime selection is an infrastructure-level concern that should be managed by administrators, consistent with how Docker (daemon.json) and Kubernetes (RuntimeClass) handle runtime configuration. Per-request selection may be revisited as a future enhancement if demand arises.
Alternative 2: Automatic Runtime Detection
Approach: Automatically detect and use the most secure available runtime.
Pros:
- Zero configuration
- Always uses best available isolation
Cons:
- Unpredictable behavior across environments
- May break workloads with runtime incompatibilities
- Performance impact without administrator consent
Decision: Rejected. Explicit administrator choice is preferred for security/performance tradeoffs.
Infrastructure Needed
Testing Environments:
- Docker host with gVisor (runsc) configured
- Docker host with Kata Containers (kata-runtime) configured
- Kubernetes cluster with gVisor RuntimeClass (`runsc`)
- Kubernetes cluster with Kata QEMU RuntimeClass (`kata-qemu`)
- Kubernetes cluster with Kata + Firecracker RuntimeClass (`kata-fc`)
CI/CD Updates:
- Add integration tests for secure runtime validation
- Add E2E tests with gVisor-enabled environment
Documentation:
- User guide: How to use secure runtimes
- Admin guide: How to set up gVisor/Kata/Firecracker
- API reference updates
Upgrade & Migration Strategy
Backward Compatibility
- No API breaking changes: `CreateSandboxRequest` schema is unchanged
- No SDK changes: Existing SDK code works as-is
- Default behavior unchanged: Without `[secure_runtime]` config, sandboxes use standard runc
- Existing configurations work: The new `[secure_runtime]` section is optional
Migration Path
- Phase 1: Install and configure secure runtime on infrastructure (Docker daemon or K8s RuntimeClass)
- Phase 2: Add `[secure_runtime]` section to server configuration
- Phase 3: Restart server — all sandboxes now use the secure runtime
- No SDK or application code changes required at any phase
Documentation Updates
- Add infrastructure setup guide for gVisor/Kata/Firecracker
- Add server configuration reference for `[secure_runtime]`
- Add troubleshooting guide for runtime compatibility issues
This page is sourced from:
oseps/0004-secure-container-runtime.md