---
title: Pluggable Secure Container Runtime Support
authors:
  - "@hittyt"
creation-date: 2026-02-05
last-updated: 2026-02-09
status: implementable
---

OSEP-0004: Pluggable Secure Container Runtime Support

Summary

This proposal introduces secure container runtime support for OpenSandbox, enabling sandboxes to run in secure container runtimes such as gVisor, Firecracker, and Kata Containers. This provides hardware-level isolation for executing untrusted AI-generated code, protecting the host system from potential malicious behavior.

The secure runtime is configured at the server level: administrators choose a single secure runtime in the server configuration, and all sandboxes on that server transparently use it. SDK users and API callers require no code changes — the isolation upgrade is entirely an infrastructure-level decision.

Motivation

OpenSandbox is designed to execute untrusted code generated by AI models (such as Claude, GPT-4, Gemini). While standard container isolation (runc) provides process-level isolation, it may not be sufficient in scenarios such as:

  1. Untrusted Code Execution: AI-generated code could potentially contain malicious behavior, including container escape attempts
  2. Multi-tenant Environments: Different users' sandboxes may require stronger isolation guarantees
  3. Compliance Requirements: Some industries require hardware-level virtualization for security compliance

Secure container runtimes like gVisor, Firecracker, and Kata Containers provide additional isolation layers:

| Runtime | Isolation Mechanism | Use Case |
|---|---|---|
| gVisor | User-space kernel (syscall interception) | General workloads, low overhead |
| Firecracker | Lightweight microVM | High isolation, fast boot |
| Kata Containers | Full VM with optimized hypervisor | Maximum isolation, compatibility |

Goals

  1. Server-Level Configuration: Secure runtime is configured once at the server level; all sandboxes use the same runtime
  2. Transparent to SDK Users: No SDK or API changes required — upgrading isolation is purely an infrastructure decision
  3. Dual-Mode Compatibility: Work seamlessly in both Local Docker and Kubernetes deployment modes
  4. Graceful Fallback: Default to standard runc when no secure runtime is configured
  5. Validation: Verify runtime availability at server startup and before sandbox creation, with clear error messages

Non-Goals

  1. Runtime Installation: OpenSandbox will not install or configure secure container runtimes; this is the responsibility of infrastructure administrators
  2. Per-Request Runtime Selection: SDK users cannot choose or override the secure runtime on a per-sandbox basis; this is an infrastructure-level decision managed by administrators
  3. Runtime-Specific Features: Exposing all features of each secure runtime (e.g., gVisor platforms, Kata hypervisors) is out of scope for the initial implementation
  4. Performance Optimization: Tuning secure runtimes for optimal performance is left to operators
  5. Multiple Runtimes on One Server: A single server instance supports exactly one secure runtime; mixed runtimes require separate server deployments

Requirements

| ID | Requirement | Priority |
|---|---|---|
| R1 | Server configuration defines the secure runtime for all sandboxes | Must Have |
| R2 | Support gVisor and Kata (including the Firecracker backend) as runtime types | Must Have |
| R3 | Validate runtime availability at server startup | Must Have |
| R4 | Work in both Docker and Kubernetes modes | Must Have |
| R5 | Default to runc when no secure runtime is configured | Must Have |
| R6 | Clear error messages when configured runtime is unavailable | Should Have |
| R7 | No SDK or API changes required for existing users | Should Have |

Proposal

We propose adding a [secure_runtime] section to the server configuration file (~/.sandbox.toml). When configured, all sandboxes on that server transparently run in the specified secure runtime. No changes to the Sandbox Lifecycle API or SDKs are required.

```
Server Config                              Backend
┌──────────────────────┐                ┌─────────────────┐
│ [secure_runtime]     │                │ Docker:         │
│ type = "gvisor"      │     ┌────→     │   --runtime=    │
│ docker_runtime       │     │          │     runsc       │
│   = "runsc"          │─────┤          ├─────────────────┤
│ k8s_runtime_class    │     │          │ Kubernetes:     │
│   = "gvisor"         │     └────→     │   runtimeClass- │
│                      │                │     Name: gvisor│
└──────────────────────┘                └─────────────────┘

         │ Infrastructure admin configures once
         │ SDK users require NO code changes
```

Notes/Constraints/Caveats

  1. Infrastructure Dependency: Secure runtimes must be pre-installed and configured on the host (Docker) or cluster (Kubernetes) before use

  2. Performance Overhead: Secure runtimes add latency and resource overhead compared to runc:

    • gVisor: ~10-50ms additional startup, minimal memory overhead

    • Kata Containers: Performance varies significantly by hypervisor backend:

      | Hypervisor | Cold Start | Memory Overhead | Notes |
      |---|---|---|---|
      | QEMU | ~500ms | ~20-50MB | Default, most feature-complete |
      | Cloud Hypervisor (CLH) | ~200ms | ~10-20MB | Lightweight, Rust-based |
      | Firecracker | ~125ms | ~5MB | Minimal footprint, limited features |
      | Dragonball | ~100-200ms | ~10MB | Optimized for cloud-native |

      The actual hypervisor is determined by the RuntimeClass handler configured by the SRE administrator (e.g., kata-qemu, kata-clh, kata-fc).

      Note: Firecracker is not a standalone OCI runtime. In this OSEP, secure_runtime="firecracker" maps to Kata Containers with the Firecracker hypervisor backend (kata-fc). See Server Configuration for details.

  3. Compatibility: Not all container images work with all secure runtimes:

    • gVisor: Some syscalls may not be implemented; check gVisor compatibility
    • Kata (QEMU/CLH): Generally most compatible but highest overhead
    • Kata + Firecracker (kata-fc): Limited device support; some workloads requiring specific kernel features may not work
  4. execd Injection: The execd binary injection mechanism must work within secure runtime constraints

  5. Pooled Sandbox Consistency (Kubernetes): In Kubernetes mode with resource pools (Pool CRD), the Pool's runtimeClassName must match the server's [secure_runtime] configuration. Since both are managed by the same SRE administrator, this is an operational requirement validated at server startup.

Risks and Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Runtime unavailable at creation time | Sandbox creation fails | Pre-validation with clear error messages |
| Syscall compatibility issues | Application may not work | Document known limitations per runtime |
| Performance degradation | Slower sandbox creation | Administrators choose a runtime based on their security/performance tradeoff |
| Configuration complexity | Operational burden | Provide sensible defaults and clear documentation |

Design Details

Note: Code snippets in this section are illustrative and demonstrate the design intent. Actual implementation may differ in structure and details.

API and SDK Impact

No changes to the Sandbox Lifecycle API or SDKs are required.

The CreateSandboxRequest schema remains unchanged. The secure runtime is applied transparently by the server based on its configuration. Existing SDK code works as-is:

```python
# This code works identically whether the server uses runc or gVisor.
# The SDK user does not need to know or care about the secure runtime.
sandbox = await Sandbox.create(
    image="python:3.11",
    entrypoint=["python", "-c", "print('hello')"],
)
```

This is a key advantage of server-level configuration: upgrading from runc to gVisor is a pure infrastructure change that requires zero application code modifications.

Server Configuration

Extension to ~/.sandbox.toml. A single [secure_runtime] section configures the secure runtime for all sandboxes on this server:

```toml
[runtime]
type = "docker"  # or "kubernetes"
execd_image = "opensandbox/execd:v1.0.5"

# Secure container runtime configuration.
# When enabled, ALL sandboxes on this server use the specified runtime.
# Comment out or leave type empty to use standard runc.
[secure_runtime]
# Runtime type identifier. Supported values:
#   "gvisor"      - gVisor (runsc), user-space kernel isolation
#   "kata"        - Kata Containers (QEMU backend), VM-level isolation
#   "firecracker" - Kata Containers with Firecracker backend (K8s only)
#   ""            - Standard runc (default, no secure runtime)
type = ""

# Docker mode: --runtime parameter name
# Ignored when runtime.type = "kubernetes"
docker_runtime = "runsc"

# Kubernetes mode: pod.spec.runtimeClassName value
# Ignored when runtime.type = "docker"
k8s_runtime_class = "gvisor"
```

Configuration examples (pick ONE per server; each is a separate config file):

Example 1 — gVisor on Docker:

```toml
# ~/.sandbox.toml
[runtime]
type = "docker"
execd_image = "opensandbox/execd:v1.0.5"

[secure_runtime]
type = "gvisor"
docker_runtime = "runsc"
k8s_runtime_class = "gvisor"
```

Example 2 — Kata Containers (QEMU) on Kubernetes:

```toml
# ~/.sandbox.toml
[runtime]
type = "kubernetes"
execd_image = "opensandbox/execd:v1.0.5"

[secure_runtime]
type = "kata"
docker_runtime = "kata-runtime"
k8s_runtime_class = "kata-qemu"
```

Example 3 — Kata + Firecracker on Kubernetes:

Firecracker is a VMM, not an OCI runtime. It cannot serve as a CRI implementation directly. This OSEP recommends using Firecracker via Kata Containers (kata-fc handler), which is the mature, production-ready approach. The alternative (firecracker-containerd) is less actively maintained and not recommended.

```toml
# ~/.sandbox.toml
[runtime]
type = "kubernetes"
execd_image = "opensandbox/execd:v1.0.5"

[secure_runtime]
type = "firecracker"
docker_runtime = ""              # Not supported in Docker mode
k8s_runtime_class = "kata-fc"
```

Infrastructure Prerequisites

OpenSandbox does not install secure runtimes. The following must be configured by infrastructure administrators.

Docker Mode - gVisor Setup

Step 1: Install gVisor runsc

```bash
# Ubuntu/Debian
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | \
  sudo tee /etc/apt/sources.list.d/gvisor.list
sudo apt-get update && sudo apt-get install -y runsc
```

Step 2: Configure Docker daemon

```json
// /etc/docker/daemon.json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/bin/runsc",
      "runtimeArgs": [
        "--platform=systrap",
        "--network=host"
      ]
    }
  }
}
```

```bash
sudo systemctl restart docker
```

Step 3: Verify installation

```bash
docker run --runtime=runsc hello-world
```

Docker Mode - Kata Containers Setup

```bash
# Install Kata Containers
bash -c "$(curl -fsSL https://raw.githubusercontent.com/kata-containers/kata-containers/main/utils/kata-manager.sh) install-docker-system"
```

```json
// /etc/docker/daemon.json
{
  "runtimes": {
    "kata-runtime": {
      "path": "/usr/bin/kata-runtime"
    }
  }
}
```

Kubernetes Mode - RuntimeClass Setup

Cluster administrators must create RuntimeClass resources:

```yaml
# gVisor RuntimeClass
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc  # Matches containerd handler name

---
# Kata Containers (QEMU backend) RuntimeClass
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-qemu
handler: kata-qemu

---
# Kata Containers (Firecracker backend) RuntimeClass
# This is what secure_runtime="firecracker" maps to
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc
```

containerd configuration (/etc/containerd/config.toml):

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
  runtime_type = "io.containerd.kata-qemu.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc]
  runtime_type = "io.containerd.kata-fc.v2"
```

Runtime Resolver

The server reads [secure_runtime] at startup and resolves it to the backend-specific identifier based on the deployment mode:

```python
from typing import Optional

# AppConfig and ConfigError are the server's existing config/error types.
class SecureRuntimeResolver:
    """Resolves secure runtime config to backend-specific parameters."""

    def __init__(self, config: AppConfig):
        self.secure_runtime = config.secure_runtime  # may be None
        self.runtime_mode = config.runtime.type      # "docker" or "kubernetes"

    def get_docker_runtime(self) -> Optional[str]:
        """Return Docker --runtime value, or None for runc."""
        if not self.secure_runtime or not self.secure_runtime.type:
            return None
        if not self.secure_runtime.docker_runtime:
            raise ConfigError(
                f"Secure runtime '{self.secure_runtime.type}' is not supported "
                f"in Docker mode (docker_runtime is empty)."
            )
        return self.secure_runtime.docker_runtime

    def get_k8s_runtime_class(self) -> Optional[str]:
        """Return K8s runtimeClassName, or None for cluster default."""
        if not self.secure_runtime or not self.secure_runtime.type:
            return None
        return self.secure_runtime.k8s_runtime_class
```
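
A usage sketch with stand-in config objects (the real AppConfig structure may differ; the resolver logic is reproduced as a free function so the example is self-contained):

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in config types for illustration only; the server's AppConfig may differ.
@dataclass
class SecureRuntimeConfig:
    type: str = ""
    docker_runtime: str = ""
    k8s_runtime_class: str = ""

@dataclass
class RuntimeConfig:
    type: str = "docker"

@dataclass
class AppConfig:
    runtime: RuntimeConfig
    secure_runtime: Optional[SecureRuntimeConfig] = None

class ConfigError(Exception):
    pass

def resolve_docker_runtime(config: AppConfig) -> Optional[str]:
    """Mirrors SecureRuntimeResolver.get_docker_runtime() above."""
    sr = config.secure_runtime
    if not sr or not sr.type:
        return None  # standard runc
    if not sr.docker_runtime:
        raise ConfigError(f"'{sr.type}' is not supported in Docker mode")
    return sr.docker_runtime

gvisor = AppConfig(RuntimeConfig("docker"), SecureRuntimeConfig("gvisor", "runsc", "gvisor"))
plain = AppConfig(RuntimeConfig("docker"))
print(resolve_docker_runtime(gvisor))  # runsc
print(resolve_docker_runtime(plain))   # None
```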

Startup Validation

The server validates the configured secure runtime at startup, failing fast if the runtime is unavailable:

```python
from kubernetes.client.exceptions import ApiException

def validate_secure_runtime_on_startup(config: AppConfig, docker_client=None, k8s_client=None):
    """Validate secure runtime availability at server startup."""
    sr = config.secure_runtime
    if not sr or not sr.type:
        logger.info("No secure runtime configured; using standard runc.")
        return

    if config.runtime.type == "docker":
        if not sr.docker_runtime:
            raise ConfigError(
                f"secure_runtime.type='{sr.type}' but docker_runtime is empty. "
                f"This runtime is not supported in Docker mode."
            )
        info = docker_client.info()
        available = info.get("Runtimes", {}).keys()
        if sr.docker_runtime not in available:
            raise ConfigError(
                f"Docker runtime '{sr.docker_runtime}' is not available. "
                f"Available runtimes: {list(available)}. "
                f"Please install and configure it in /etc/docker/daemon.json."
            )
    else:  # kubernetes
        try:
            k8s_client.read_runtime_class(sr.k8s_runtime_class)
        except ApiException as e:
            if e.status == 404:
                raise ConfigError(
                    f"RuntimeClass '{sr.k8s_runtime_class}' does not exist. "
                    f"Please create it in the cluster."
                )
            raise

    logger.info(f"Secure runtime '{sr.type}' validated successfully.")
```
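
The Docker branch of this check can be exercised without a daemon: the stub dict below stands in for docker_client.info(), which in the Docker SDK returns the same "Runtimes" mapping shown by docker info (the helper name is illustrative):

```python
# Sketch: check whether a runtime name is registered with the Docker daemon,
# keyed on the same "Runtimes" field that docker_client.info() exposes.
def runtime_available(info: dict, name: str) -> bool:
    return name in info.get("Runtimes", {})

# Stub of a daemon with runc and gVisor's runsc installed, but no Kata.
stub_info = {"Runtimes": {"runc": {"path": "runc"}, "runsc": {"path": "/usr/bin/runsc"}}}
print(runtime_available(stub_info, "runsc"))         # True
print(runtime_available(stub_info, "kata-runtime"))  # False
```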

Docker Mode Implementation

Changes to server/src/services/docker.py. The runtime is read from server config, not from the request:

```python
class DockerSandboxService(SandboxService):
    def __init__(self, config: Optional[AppConfig] = None):
        # ... existing initialization ...
        self.resolver = SecureRuntimeResolver(self.app_config)
        # Runtime is resolved once at init; already validated at startup
        self.docker_runtime = self.resolver.get_docker_runtime()

    async def create_sandbox(self, request: CreateSandboxRequest) -> CreateSandboxResponse:
        # ... existing code ...

        container = self.docker_client.containers.run(
            image=request.image.uri,
            # ... other parameters ...
            runtime=self.docker_runtime,  # "runsc", "kata-runtime", or None
        )
```

Kubernetes Mode Implementation

Both Kubernetes workload providers inject runtimeClassName from server config. The runtimeClassName is resolved once at service initialization (already validated at startup).

BatchSandboxProvider

Changes to server/src/services/k8s/batchsandbox_provider.py:

  • CRD: sandbox.opensandbox.io/v1alpha1 BatchSandbox
  • Pod spec path: spec.template.spec

```python
class BatchSandboxProvider:
    def __init__(self, config: AppConfig, ...):
        # ... existing initialization ...
        self.resolver = SecureRuntimeResolver(config)
        self.runtime_class = self.resolver.get_k8s_runtime_class()

    def create_workload(self, request: CreateSandboxRequest, ...):
        # ... existing code ...

        if self.runtime_class:
            runtime_manifest["spec"]["template"]["spec"]["runtimeClassName"] = self.runtime_class

        # ... template merge ...
```

AgentSandboxProvider

Changes to server/src/services/k8s/agent_sandbox_provider.py:

  • CRD: agents.x-k8s.io/v1alpha1 Sandbox
  • Pod spec path: spec.podTemplate.spec

```python
class AgentSandboxProvider:
    def __init__(self, config: AppConfig, ...):
        # ... existing initialization ...
        self.resolver = SecureRuntimeResolver(config)
        self.runtime_class = self.resolver.get_k8s_runtime_class()

    def create_workload(self, request: CreateSandboxRequest, ...):
        # ... existing code ...

        pod_spec = self._build_pod_spec(request, ...)
        if self.runtime_class:
            pod_spec["runtimeClassName"] = self.runtime_class

        runtime_manifest["spec"]["podTemplate"]["spec"] = pod_spec
        # ... template merge ...
```

Provider Comparison

| Aspect | BatchSandboxProvider | AgentSandboxProvider |
|---|---|---|
| CRD Kind | BatchSandbox | Sandbox |
| Pod Spec Path | spec.template.spec | spec.podTemplate.spec |
| Pool Support | Yes (poolRef) | No |
| Runtime Source | Server config | Server config |

Pooled Sandbox Consistency

In Kubernetes mode with resource pools (Pool CRD), the Pool's runtimeClassName must match the server's [secure_runtime] configuration. Since both are managed by the same SRE administrator, this is an operational requirement.

Pool configuration by SRE administrator:

```yaml
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: Pool
metadata:
  name: gvisor-pool
spec:
  template:
    spec:
      runtimeClassName: "gvisor"  # Must match server's secure_runtime.k8s_runtime_class
      containers:
      - name: sandbox-container
        image: python:3.11
  capacitySpec:
    bufferMax: 10
    bufferMin: 2
    poolMax: 20
    poolMin: 5
```

The server validates this consistency at startup. If the Pool's runtimeClassName does not match the server config, the server logs a warning and refuses to use that pool.

Compatibility Matrix

| Secure Runtime | Local Docker | Kubernetes | Notes |
|---|---|---|---|
| gVisor (runsc) | Full support | Full support | Docker --runtime=runsc; K8s via RuntimeClass |
| Kata Containers | Full support | Full support | Docker --runtime=kata-runtime; K8s via RuntimeClass |
| Firecracker | Not supported | Via Kata (kata-fc) | Not a Docker OCI runtime; use Kata with the Firecracker hypervisor backend in K8s |
| Custom runtimes | Via config | Via RuntimeClass | Requires pre-installation |

Test Plan

Unit Tests

| Test Case | Description |
|---|---|
| Config parsing | Verify SecureRuntimeConfig correctly parses TOML |
| Resolver (Docker) | Verify get_docker_runtime() returns the correct value or None |
| Resolver (K8s) | Verify get_k8s_runtime_class() returns the correct value or None |
| Empty type handling | Verify fallback to runc when type = "" |
| Firecracker in Docker | Verify error when docker_runtime is empty in Docker mode |

Integration Tests

| Test Case | Description |
|---|---|
| Startup validation (Docker) | Server fails to start when the configured runtime is not in the Docker daemon |
| Startup validation (K8s) | Server fails to start when the RuntimeClass doesn't exist |
| Docker + gVisor | Create sandbox on a Docker host with [secure_runtime] type = "gvisor" |
| Docker + Kata | Create sandbox on a Docker host with [secure_runtime] type = "kata" |
| K8s + gVisor | Create sandbox in a cluster with the gVisor RuntimeClass |
| K8s + kata-fc | Create sandbox in a cluster with the kata-fc RuntimeClass |
| Pool consistency | Server warns when the Pool's runtimeClassName doesn't match the config |

E2E Tests

| Test Case | Description |
|---|---|
| SDK unaware of runtime | SDK creates a sandbox without any runtime parameter; it runs in gVisor |
| Runtime isolation verification | Verify syscall interception in a gVisor sandbox |
| Fallback behavior | Verify standard runc when [secure_runtime] is not configured |
| execd injection under gVisor | Verify execd binary injection works within the gVisor runtime |

Drawbacks

  1. Operational Complexity: Administrators must install and configure secure runtimes
  2. Performance Overhead: Secure runtimes add startup latency and memory overhead
  3. Compatibility Issues: Some workloads may not work with certain runtimes
  4. Documentation Burden: Requires comprehensive setup guides for each runtime

Alternatives

Alternative 1: Per-Request Runtime Selection

Approach: Add a secureRuntime field to CreateSandboxRequest, allowing SDK users to choose the runtime per sandbox (e.g., secure_runtime="gvisor").

Pros:

  • Maximum flexibility for users
  • Different sandboxes can use different runtimes on the same server
  • Supports mixed security levels (trusted vs untrusted workloads)

Cons:

  • Secure runtime is fundamentally an infrastructure decision, not a per-request decision
  • API callers could potentially downgrade security
  • Adds complexity to SDK and API surface
  • Most deployments only use one runtime; per-request selection is rarely needed

Decision: Rejected. Secure runtime selection is an infrastructure-level concern that should be managed by administrators, consistent with how Docker (daemon.json) and Kubernetes (RuntimeClass) handle runtime configuration. Per-request selection may be revisited as a future enhancement if demand arises.

Alternative 2: Automatic Runtime Detection

Approach: Automatically detect and use the most secure available runtime.

Pros:

  • Zero configuration
  • Always uses best available isolation

Cons:

  • Unpredictable behavior across environments
  • May break workloads with runtime incompatibilities
  • Performance impact without administrator consent

Decision: Rejected. Explicit administrator choice is preferred for security/performance tradeoffs.

Infrastructure Needed

  • Testing Environments:

    • Docker host with gVisor (runsc) configured
    • Docker host with Kata Containers (kata-runtime) configured
    • Kubernetes cluster with gVisor RuntimeClass (runsc)
    • Kubernetes cluster with Kata QEMU RuntimeClass (kata-qemu)
    • Kubernetes cluster with Kata + Firecracker RuntimeClass (kata-fc)
  • CI/CD Updates:

    • Add integration tests for secure runtime validation
    • Add E2E tests with gVisor-enabled environment
  • Documentation:

    • User guide: How to use secure runtimes
    • Admin guide: How to set up gVisor/Kata/Firecracker
    • API reference updates

Upgrade & Migration Strategy

Backward Compatibility

  • No API breaking changes: CreateSandboxRequest schema is unchanged
  • No SDK changes: Existing SDK code works as-is
  • Default behavior unchanged: Without [secure_runtime] config, sandboxes use standard runc
  • Existing configurations work: The new [secure_runtime] section is optional

Migration Path

  1. Phase 1: Install and configure secure runtime on infrastructure (Docker daemon or K8s RuntimeClass)
  2. Phase 2: Add [secure_runtime] section to server configuration
  3. Phase 3: Restart server — all sandboxes now use the secure runtime
  4. No SDK or application code changes required at any phase

Documentation Updates

  • Add infrastructure setup guide for gVisor/Kata/Firecracker
  • Add server configuration reference for [secure_runtime]
  • Add troubleshooting guide for runtime compatibility issues
