OpenSandbox Kubernetes Controller
OpenSandbox Kubernetes Controller is a Kubernetes operator that manages sandbox environments through custom resources. It provides automated sandbox lifecycle management, resource pooling for fast provisioning, batch sandbox creation, and optional task orchestration capabilities in Kubernetes clusters.
Key Features
- Flexible Sandbox Creation: Choose between pooled and non-pooled sandbox creation modes
- Batch and Individual Delivery: Support both single sandbox (for real-user interactions) and batch sandbox delivery (for high-throughput agentic-RL scenarios)
- Optional Task Scheduling: Integrated task orchestration with optional shard task templates for heterogeneous task distribution and customized sandbox delivery (e.g., process injection)
- Resource Pooling: Maintain pre-warmed resource pools for rapid sandbox provisioning
- Pause and Resume: Persist sandbox filesystem state via rootfs snapshots, releasing cluster resources between sessions
- Comprehensive Monitoring: Real-time status tracking of sandboxes and tasks
Features
Batch Sandbox Management
The BatchSandbox custom resource allows you to create and manage multiple identical sandbox environments. Key capabilities include:
- Flexible Creation Modes: Support both pooled (using resource pools) and non-pooled sandbox creation
- Single and Batch Delivery: Create single sandboxes (replicas=1) or batches of sandboxes (replicas=N) as needed
- Scalable Replica Management: Easily control the number of sandbox instances through replica configuration
- Automatic Expiration: Set TTL (time-to-live) for automatic cleanup of expired sandboxes
- Optional Task Scheduling: Built-in task execution engine with support for optional task templates
- Detailed Status Reporting: Comprehensive metrics on replicas, allocations, and task states
Resource Pooling
The Pool custom resource maintains a pool of pre-warmed compute resources to enable rapid sandbox provisioning:
- Configurable buffer sizes (minimum and maximum) to balance resource availability and cost
- Pool capacity limits to control overall resource consumption
- Automatic resource allocation and deallocation based on demand
- Real-time status monitoring showing total, allocated, and available resources
Pod Eviction
Pool supports graceful pod eviction for scenarios like node maintenance or resource reclamation:
How it works:
- Users label a pod with
pool.opensandbox.io/evictto request eviction - The controller skips pods already allocated to BatchSandbox (protecting in-use workloads)
- Idle pods are deleted, triggering the pool to replenish capacity
- Pods marked for eviction are excluded from new allocations
Custom eviction behavior: You can implement custom eviction strategies by:
- Setting
pool.opensandbox.io/eviction-handlerlabel on the Pool to select your handler - Implementing the
EvictionHandlerinterface withNeedsEviction()andEvict()methods - Registering your handler in the factory function
Task Orchestration
Integrated task management system that executes custom workloads within sandboxes:
- Optional Execution: Task scheduling is completely optional - sandboxes can be created without tasks
- Process-Based Tasks: Support for process-based tasks that execute within the sandbox environment
- Heterogeneous Task Distribution: Customize individual tasks for each sandbox in a batch using shardTaskPatches
Advanced Scheduling
Intelligent resource management features:
- Minimum and maximum buffer settings to ensure resource availability while controlling costs
- Pool-wide capacity limits to prevent resource exhaustion
- Automatic scaling based on demand
Pause and Resume (Rootfs Snapshot)
OpenSandbox supports pause and resume for Kubernetes sandboxes by persisting the container root filesystem as an OCI image.
Time ---------------------------------------------------------------->
Sandbox lifecycle: [Running]--[Pausing]--[Paused]--[Resuming]--[Running]
| |
commit rootfs rewrite template images
push to registry recreate runtime from snapshot
release pods/allocHow it works
- Pause: The server patches
BatchSandbox.spec.pause=true. The controller creates an internalSandboxSnapshot, runs a commit Job on the same node, commits the container rootfs, and pushes it to the configured OCI registry. After the snapshot is ready, the controller transitions the sameBatchSandboxtoPausedand releases runtime Pods / pooled allocations. - Resume: The server patches
BatchSandbox.spec.pause=false. The controller reads the latestSandboxSnapshot, rewrites theBatchSandboxtemplate images to the snapshot image URIs, recreates the runtime, and transitions the sandbox back toRunning. The publicsandboxIdremains stable across pause/resume cycles.
WARNING
Current pause/resume support is limited to BatchSandbox.spec.replicas=1. The OpenSandbox server creates Kubernetes sandboxes with replicas: 1; direct BatchSandbox CRs with any other replica count are rejected by the controller pause entry because the internal pause snapshot records one source Pod's container images.
The SandboxSnapshot CRD
The SandboxSnapshot CR is the central resource for pause/resume lifecycle:
| Field | Location | Description |
|---|---|---|
spec.sandboxName | Spec | Target BatchSandbox name in the same namespace |
status.phase | Status | Pending -> Committing -> Succeed / Failed |
status.conditions | Status | Ready / Failed conditions with reason and message |
status.containers | Status | Committed image URIs per container |
status.sourcePodName | Status | Pod name resolved by controller |
status.sourceNodeName | Status | Node selected for the commit Job |
Prerequisites
- OCI Registry: An accessible container registry for storing snapshot images.
- Kubernetes Secrets: Docker config secrets for push and pull access.
- Controller configuration: Configure the controller manager with snapshot registry and secret flags.
- Controller RBAC: The controller requires
secrets: getpermission (included in the Helm chart andmake manifestsoutput).
Controller Configuration
The snapshot controller supports the following command-line flags:
| Flag | Default | Description |
|---|---|---|
--snapshot-registry | "" | OCI registry prefix used for snapshot images |
--snapshot-push-secret | "" | Secret name used by commit Jobs to push snapshots |
--resume-pull-secret | "" | Secret name injected into resumed sandboxes for image pulls |
--image-committer-image | image-committer:dev | Image used for commit operations (must contain nerdctl tool) |
--commit-job-timeout | 10m | Timeout duration for commit jobs |
--snapshot-registry-insecure | false | Pass insecure registry mode to snapshot commit Jobs |
These flags are configured at controller startup. The image-committer-image must be a trusted container image with nerdctl to perform rootfs commit and push operations. Commit Jobs mount the host containerd socket on the source node, so the image effectively has node-level runtime access. Pin the image by digest or enforce a trusted registry/admission policy in production.
Quick Setup
# Create push secret
kubectl create secret docker-registry registry-snapshot-push-secret \
--docker-server=<your-registry> \
--docker-username=<user> \
--docker-password=<token>
# Create pull secret (can reuse push secret)
kubectl create secret docker-registry registry-pull-secret \
--docker-server=<your-registry> \
--docker-username=<user> \
--docker-password=<token>Then configure the controller manager with:
- --snapshot-registry=<your-registry>/sandboxes
- --snapshot-registry-insecure=true # only for HTTP/self-signed local registries
- --snapshot-push-secret=registry-snapshot-push-secret
- --resume-pull-secret=registry-pull-secretINFO
Snapshot image retention is registry-managed. Deleting a SandboxSnapshot removes the Kubernetes commit/unpause Jobs, but it does not delete pushed OCI images from the registry. Configure registry retention/GC for tags such as snap-gen<N> according to your environment.
Getting Started

Prerequisites
- go version v1.24.0+
- docker version 17.03+
- kubectl version v1.11.3+
- Access to a Kubernetes v1.21.1+ cluster
If you don't have access to a Kubernetes cluster, you can use kind to create a local Kubernetes cluster for testing purposes. Kind runs Kubernetes nodes in Docker containers, making it easy to set up a local development environment.
To install kind:
- Download the release binary for your OS from the releases page and move it into your
$PATH - Or use a package manager:
- macOS (Homebrew):
brew install kind - Windows (winget):
winget install Kubernetes.kind
- macOS (Homebrew):
After installing kind, create a cluster with:
kind create clusterKind Users
If you're using a kind cluster, you need to load the controller and task-executor images into the kind node after building them with make docker-build. Kind runs Kubernetes nodes in Docker containers and cannot directly access images from your local Docker daemon.
kind load docker-image <controller-image-name>:<tag>
kind load docker-image <task-executor-image-name>:<tag>Deployment
This project requires two separate images - one for the controller and another for the task-executor component.
Option 1: Deploy with Helm (Recommended)
Install from GitHub Release:
You can install OpenSandbox Controller directly from GitHub Releases. Check the Releases page for all available versions.
# Replace <version> with the desired version (e.g., 0.1.0)
helm install opensandbox-controller \
https://github.com/opensandbox-group/OpenSandbox/releases/download/helm/opensandbox-controller/<version>/opensandbox-controller-<version>.tgz \
--namespace opensandbox-system \
--create-namespaceCustomize Installation:
Use --set flags to customize the configuration:
helm install opensandbox-controller \
https://github.com/opensandbox-group/OpenSandbox/releases/download/helm/opensandbox-controller/0.1.0/opensandbox-controller-0.1.0.tgz \
--namespace opensandbox-system \
--create-namespace \
--set controller.replicaCount=2 \
--set controller.resources.limits.cpu=1000m \
--set controller.resources.limits.memory=512MiOr use a values file for complex configurations:
cat > custom-values.yaml <<EOF
controller:
replicaCount: 2
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
logLevel: debug
EOF
helm install opensandbox-controller \
https://github.com/opensandbox-group/OpenSandbox/releases/download/helm/opensandbox-controller/0.1.0/opensandbox-controller-0.1.0.tgz \
--namespace opensandbox-system \
--create-namespace \
-f custom-values.yamlInstall from source (for development):
Build and push your images:
shcd kubernetes make docker-build docker-push CONTROLLER_IMG=<some-registry>/opensandbox-controller:tag make docker-build-task-executor docker-push-task-executor TASK_EXECUTOR_IMG=<some-registry>/opensandbox-task-executor:tagInstall with Helm:
shhelm install opensandbox-controller ./charts/opensandbox-controller \ --set controller.image.repository=<some-registry>/opensandbox-controller \ --set controller.image.tag=<tag> \ --namespace opensandbox-system \ --create-namespace
Verify Installation:
kubectl get pods -n opensandbox-system
kubectl get deployment -n opensandbox-system
kubectl logs -n opensandbox-system -l control-plane=controller-manager -fUpgrade:
helm upgrade opensandbox-controller \
https://github.com/opensandbox-group/OpenSandbox/releases/download/helm/opensandbox-controller/<new-version>/opensandbox-controller-<new-version>.tgz \
--namespace opensandbox-systemUninstall:
helm uninstall opensandbox-controller -n opensandbox-systemOption 2: Deploy with Kustomize
Build and push your images:
shmake docker-build docker-push CONTROLLER_IMG=<some-registry>/opensandbox-controller:tag make docker-build-task-executor docker-push-task-executor TASK_EXECUTOR_IMG=<some-registry>/opensandbox-task-executor:tagInstall the CRDs into the cluster:
shmake installDeploy the Manager to the cluster:
shmake deploy CONTROLLER_IMG=<some-registry>/opensandbox-controller:tag
WARNING
make deploy only rewrites the controller image. Build and publish TASK_EXECUTOR_IMG separately if your Pool / BatchSandbox templates refer to it. You may also need cluster-admin privileges before running the commands.
Creating BatchSandbox and Pool Resources
Basic Example
Create a simple non-pooled sandbox without task scheduling:
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: BatchSandbox
metadata:
name: basic-batch-sandbox
spec:
replicas: 2
template:
spec:
containers:
- name: sandbox-container
image: nginx:latest
ports:
- containerPort: 80Apply and check status:
kubectl apply -f basic-batch-sandbox.yaml
kubectl get batchsandbox basic-batch-sandbox -o wideAfter the sandboxes are ready, find the endpoint information in the annotations:
kubectl get batchsandbox basic-batch-sandbox -o jsonpath='{.metadata.annotations.sandbox\.opensandbox\.io/endpoints}'Pooled Sandbox Without Task
First, create a resource pool:
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: Pool
metadata:
name: example-pool
spec:
template:
spec:
containers:
- name: sandbox-container
image: nginx:latest
ports:
- containerPort: 80
capacitySpec:
bufferMax: 10
bufferMin: 2
poolMax: 20
poolMin: 5Optional: add scaleStrategy to limit the pace of scaling:
scaleStrategy:
maxUnavailable: "20%" # or absolute number like 5Create a batch of sandboxes using the pool:
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: BatchSandbox
metadata:
name: pooled-batch-sandbox
spec:
replicas: 3
poolRef: example-poolPooled Sandbox with Heterogeneous Tasks
Create a batch of sandboxes with process-based heterogeneous tasks. For task execution to work properly, the task-executor must be deployed as a sidecar container in the pool template and share the process namespace with the sandbox container:
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: Pool
metadata:
name: task-example-pool
spec:
template:
spec:
shareProcessNamespace: true
containers:
- name: sandbox-container
image: ubuntu:latest
command: ["sleep", "3600"]
- name: task-executor
image: <task-executor-image>:<tag>
securityContext:
capabilities:
add: ["SYS_PTRACE"]
capacitySpec:
bufferMax: 10
bufferMin: 2
poolMax: 20
poolMin: 5apiVersion: sandbox.opensandbox.io/v1alpha1
kind: BatchSandbox
metadata:
name: task-batch-sandbox
spec:
replicas: 2
poolRef: task-example-pool
taskTemplate:
spec:
process:
command: ["echo", "Default task"]
shardTaskPatches:
- spec:
process:
command: ["echo", "Custom task for sandbox 1"]
- spec:
process:
command: ["echo", "Custom task for sandbox 2"]
args: ["with", "additional", "arguments"]Monitoring Resources
kubectl get pools
kubectl get batchsandboxes
kubectl describe pool example-pool
kubectl describe batchsandbox example-batch-sandboxPerformance
When both use resource pools, the total time comparison for delivering 100 Sandboxes:
| Test Scenario | Total Time (seconds) |
|---|---|
| SIG Agent-Sandbox (concurrency=1) | 76.35 |
| SIG Agent-Sandbox (concurrency=10) | 23.17 |
| SIG Agent-Sandbox (concurrency=50) | 33.85 |
| BatchSandbox | 0.92 |
The time complexity of SIG Agent-Sandbox and BatchSandbox for batch delivery of N Sandboxes is O(N) and O(1) respectively.
Project Structure
kubernetes/
api/v1alpha1/ # Custom resource definitions (BatchSandbox, Pool)
cmd/controller/ # Main controller manager entry point
cmd/task-executor/ # Task executor binary
config/crd/ # CRD manifests
config/manager/ # Controller manager configuration
config/rbac/ # RBAC manifests
config/samples/ # Sample YAML manifests
internal/controller/ # Core controller implementations
internal/scheduler/ # Resource allocation and scheduling logic
internal/task-executor/# Task execution engine internals
pkg/task-executor/ # Shared task executor packages
test/ # Test suites and utilitiesLicense
This project is open source under the Apache 2.0 License.