title: Pause and Resume via Rootfs Snapshot authors:
- "@fengcone" creation-date: 2026-03-11 last-updated: 2026-03-13 status: implementing
OSEP-0008: Pause and Resume via Rootfs Snapshot
- Summary
- Motivation
- Requirements
- Proposal
- Design Details
- Test Plan
- Drawbacks
- Alternatives
- Infrastructure Needed
- Upgrade & Migration Strategy
Summary
This proposal introduces pause and resume semantics for Kubernetes-backed sandboxes by persisting the sandbox root filesystem as an OCI image. On pause, the server creates a SandboxSnapshot CR for the running sandbox Pod, a dedicated controller creates a commit Job on the same node, and the rootfs is committed and pushed to a registry. After the snapshot becomes ready, the original BatchSandbox is removed so compute resources are released.
Resume is intentionally simpler. The server resolves the single retained snapshot for the stable sandboxId, then creates a new BatchSandbox with replicas = 1 from the snapshot image. The public sandboxId remains stable across pause and resume.
Time ------------------------------------------------------------------------>
Sandbox lifecycle: [Running]--[Pausing]--[Paused]--[Resuming]--[Running]
| |
commit + push create new BatchSandbox
delete old BatchSandbox from snapshot imageMotivation
OpenSandbox users often need to temporarily stop a sandbox without losing the filesystem state that has accumulated during a long-running task. Typical cases include releasing cluster resources overnight, pausing an agent before a risky step, or resuming a workspace later from the same working directory.
Today, Kubernetes runtime returns HTTP 501 Not Implemented for both pause and resume. Docker supports cgroup freeze, but that does not survive restart or migration. Rootfs snapshot is the practical middle ground in the persistence roadmap:
- Phase 1: persistent volumes preserve explicit mounts.
- Phase 2: rootfs snapshot preserves the container filesystem.
- Phase 3: VM or process checkpoint preserves memory and execution state.
This OSEP deliberately chooses a simple architecture:
- keep
BatchSandboxas the runtime workload resource used by the server today - add a single
SandboxSnapshotCR persandboxId - do not introduce a new per-instance lifecycle CR
- do not support multiple retained snapshots in v1
Goals
- Implement
pausefor Kubernetes sandboxes by committing a running sandbox Pod rootfs into an OCI image and pushing it to a configurable registry. - Keep the public
sandboxIdstable across pause and resume. - Release compute resources after pause by deleting the original
BatchSandbox. - Implement
resumeby creating a newBatchSandboxwithreplicas = 1from the retained snapshot image. - Expose
Pausing,Paused, andResumingthrough the existing Lifecycle API. - Keep the design minimal by retaining only one snapshot per sandbox.
Non-Goals
- Preserving in-memory process state, open sockets, or CPU registers.
- Supporting multiple historical snapshots per sandbox.
- Adding
GET /sandboxes/{sandboxId}/snapshotsin v1. - Designing a general multi-instance pause model for
BatchSandboxwithreplicas > 1. - Extending Docker runtime to rootfs snapshot.
- Implementing automatic scheduled snapshots.
Requirements
- Public
sandboxIdmust remain unchanged after pause and resume. - A sandbox has at most one retained snapshot in v1.
- Pause must work from the currently running sandbox Pod and record the concrete
podName,containerName, andnodeNamethat are being snapshotted. - The commit Job must run on the same node as the source Pod.
- Pause must complete
commit -> pushbefore the originalBatchSandboxis deleted. - At most one pause operation may be in progress for a given
sandboxId. - Resume must work when the original
BatchSandboxno longer exists. GET /sandboxes/{sandboxId}must still return200and statePausedwhile the sandbox is represented only by aSandboxSnapshot.DELETE /sandboxes/{sandboxId}must delete both the live workload and any retainedSandboxSnapshotfor that sandbox.- Registry credentials must be referenced via Kubernetes Secret, not inline API credentials.
SandboxSnapshotmust carry enough policy and workload reconstruction data to resume even after the originalBatchSandboxhas been deleted.- The API shape must leave room for future snapshot backends, especially VM snapshot, even though this revision only implements rootfs snapshot.
- The design must remain compatible with the current server behavior where Kubernetes sandboxes are created as
BatchSandboxwithreplicas = 1.
Proposal
Pause and resume are modeled around two resources:
BatchSandbox: runtime workload resource used for the live sandboxSandboxSnapshot: persisted snapshot state for one stablesandboxId
The public API stays sandbox-oriented, and the server remains the orchestrator. The snapshot controller only handles snapshot execution.
API Overview
POST /sandboxes/{sandboxId}/pause -> create or update SandboxSnapshot, return 202
POST /sandboxes/{sandboxId}/resume -> create new BatchSandbox from snapshot, return 202
GET /sandboxes/{sandboxId} -> returns Running / Pausing / Paused / ResumingThere is no GET /sandboxes/{sandboxId}/snapshots endpoint in this version because each sandbox retains only one snapshot.
Kubernetes Resource Overview
BatchSandbox (existing)
|- used by Server as the live workload resource
|- created with replicas = 1 for public sandbox lifecycle API
`- deleted after pause succeeds
SandboxSnapshot (new, one per sandboxId)
|- metadata.name = <sandboxId>
|- spec.sandboxId
|- spec.policy.type # Rootfs today, reserved for VMSnapshot later
|- spec.sourceBatchSandboxName
|- spec.sourcePodName
|- spec.sourceContainerName
|- spec.sourceNodeName
|- spec.imageUri
|- spec.snapshotPushSecretName
|- spec.resumeImagePullSecretName
|- spec.resumeTemplate
|- status.phase # Pending | Committing | Pushing | Ready | Failed
|- status.readyAt
`- status.messageThe SandboxSnapshot name is deterministic and equal to sandboxId, which enforces the “one sandbox, one snapshot” rule.
Component Interaction Overview
Pause flow:
sequenceDiagram
participant Client
participant Server
participant Batch as BatchSandbox
participant Snapshot as SandboxSnapshot
participant Ctrl as SandboxSnapshotController
participant Job as Commit Job Pod
participant Registry
Client->>Server: POST /sandboxes/{id}/pause
Server->>Batch: Read live BatchSandbox and Pod info
Server->>Snapshot: Create/Update SandboxSnapshot\n(sandboxId, podName, containerName, nodeName, imageUri, pushSecretRef, resumePullSecretRef)
Server-->>Client: 202 Accepted
Ctrl->>Job: Create same-node commit Job Pod
Job->>Registry: Push snapshot image
Job-->>Ctrl: Commit/push succeeded
Ctrl->>Snapshot: status.phase = Ready
Server->>Batch: Delete original BatchSandbox after snapshot ReadyResume flow:
sequenceDiagram
participant Client
participant Server
participant Snapshot as SandboxSnapshot
participant Batch as New BatchSandbox
participant Ctrl as BatchSandboxController
participant Pod as Sandbox Pod
Client->>Server: POST /sandboxes/{id}/resume
Server->>Snapshot: Lookup SandboxSnapshot by sandboxId
Server->>Snapshot: Validate snapshot status.phase == Ready
Server->>Batch: Create new BatchSandbox\n(replicas=1, image=snapshot.imageUri, sandboxId unchanged)
Server-->>Client: 202 Accepted
Ctrl->>Pod: Create sandbox Pod
Pod-->>Ctrl: Pod becomes Running and Ready
Server->>Server: Aggregate state as Resuming -> RunningNotes/Constraints/Caveats
BatchSandboxstill supports broader semantics in the platform, but this proposal only targets the current public server path where a sandbox maps to aBatchSandboxwithreplicas = 1.- The old
BatchSandboxis deleted after a successful pause, so the paused state exists only inSandboxSnapshot. - The server remains the orchestration owner for pause and resume. The snapshot controller is not responsible for creating or deleting
BatchSandbox. SandboxSnapshot.spec.policy.typeis reserved for future snapshot backends. This revision only supportsRootfs.- Snapshot image URI should be stable for the single retained snapshot, for example
<snapshotRegistry>/<sandboxId>:snapshot. This v1 design therefore assumes a registry/tag policy that allows replacing the retained snapshot image for a sandbox. - Snapshot push authentication and resume-time image pull authentication are modeled separately. They may reference the same Kubernetes Secret in some deployments, but the design must not assume they are identical.
- Because the original
BatchSandboxis deleted, resume cannot rely onimageUrialone.SandboxSnapshotmust retain enoughresumeTemplateinformation for the server to reconstruct a newBatchSandbox. - Registries with immutable tags are not compatible with this simplified single-snapshot design unless the implementation changes the tag strategy in a future revision.
- Resume creates a new
BatchSandbox; it does not resurrect the previous one.
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Pause succeeds in commit but old workload is deleted too early | Delete the original BatchSandbox only after SandboxSnapshot.status.phase == Ready. |
| Commit job lands on the wrong node | Store sourceNodeName in SandboxSnapshot.spec and pin the commit Job Pod to that node. |
Server cannot represent a paused sandbox once BatchSandbox is gone | Use SandboxSnapshot as the source of truth for paused state in GET /sandboxes/{sandboxId}. |
| Repeated pause requests cause inconsistent state | Allow only one in-flight pause per sandboxId; return 409 if snapshot phase is already Pending, Committing, or Pushing. |
| Snapshot image is unavailable on resume | Require status.phase == Ready before resume and surface image-pull failures through normal sandbox startup state. |
| Single-snapshot design loses rollback ability | Accept as an intentional simplification for v1; multi-snapshot support is a future extension. |
Design Details
1. Public Lifecycle API changes
This OSEP keeps the public API minimal:
CreateSandboxRequest.pausePolicyis added as an optional field.POST /sandboxes/{sandboxId}/pausePOST /sandboxes/{sandboxId}/resumeGET /sandboxes/{sandboxId}
There is no snapshots listing API in this version.
Suggested request shape:
pausePolicy:
snapshotType: Rootfs
snapshotRegistry: registry.example.com/sandbox-snapshots
snapshotPushSecretName: snapshot-registry-push-secret
resumeImagePullSecretName: snapshot-registry-pull-secretpausePolicy.snapshotType is reserved for future expansion and currently only supports Rootfs. A later revision can add VMSnapshot without breaking the API shape.
2. PausePolicy on BatchSandbox
Pause policy remains part of the live sandbox workload definition:
type PausePolicy struct {
SnapshotType string `json:"snapshotType,omitempty"` // Rootfs today, VMSnapshot reserved
SnapshotRegistry string `json:"snapshotRegistry"`
SnapshotPushSecretName string `json:"snapshotPushSecretName,omitempty"`
ResumeImagePullSecretName string `json:"resumeImagePullSecretName,omitempty"`
}
type BatchSandboxSpec struct {
// existing fields...
PausePolicy *PausePolicy `json:"pausePolicy,omitempty"`
}This policy is used by the server when constructing SandboxSnapshot.
3. SandboxSnapshot CRD
Introduce SandboxSnapshot under sandbox.opensandbox.io/v1alpha1.
type SandboxSnapshotPhase string
const (
SandboxSnapshotPhasePending SandboxSnapshotPhase = "Pending"
SandboxSnapshotPhaseCommitting SandboxSnapshotPhase = "Committing"
SandboxSnapshotPhasePushing SandboxSnapshotPhase = "Pushing"
SandboxSnapshotPhaseReady SandboxSnapshotPhase = "Ready"
SandboxSnapshotPhaseFailed SandboxSnapshotPhase = "Failed"
)
type SandboxSnapshotSpec struct {
SandboxID string `json:"sandboxId"`
Policy SnapshotPolicy `json:"policy"`
SourceBatchSandboxName string `json:"sourceBatchSandboxName"`
SourcePodName string `json:"sourcePodName"`
SourceContainerName string `json:"sourceContainerName"`
SourceNodeName string `json:"sourceNodeName"`
ImageURI string `json:"imageUri"`
SnapshotPushSecretName string `json:"snapshotPushSecretName,omitempty"`
ResumeImagePullSecretName string `json:"resumeImagePullSecretName,omitempty"`
ResumeTemplate *runtime.RawExtension `json:"resumeTemplate,omitempty"`
PausedAt metav1.Time `json:"pausedAt"`
}
type SandboxSnapshotStatus struct {
Phase SandboxSnapshotPhase `json:"phase,omitempty"`
Message string `json:"message,omitempty"`
ReadyAt *metav1.Time `json:"readyAt,omitempty"`
ImageDigest string `json:"imageDigest,omitempty"`
}
type SnapshotPolicy struct {
Type string `json:"type"` // Rootfs today, VMSnapshot reserved
}Key rules:
metadata.name = sandboxId- one namespace contains at most one
SandboxSnapshotfor a givensandboxId - creating a new pause request overwrites the retained snapshot
policy.typemust be set toRootfsin this revisionSourcePodName,SourceContainerName, andSourceNodeNameare mandatory because the commit workflow is bound to a concrete live containerSourceContainerNameidentifies the main sandbox workload container whose rootfs is being snapshotted; init containers and sidecars are not committedSnapshotPushSecretNameis used only for the in-container registry push performed by the commit JobResumeImagePullSecretNameis used only when reconstructing the resumed workload so kubelet can pull the retained snapshot imageResumeTemplatemust preserve enough information to reconstruct a newBatchSandboxafter the original workload has been deleted
4. Pause state model
State is derived from resource presence:
BatchSandboxexists and is ready, and no matching pause cleanup is pending ->RunningBatchSandboxexists and snapshot phase isPending|Committing|Pushing|Ready, and the live workload still matchessnapshot.spec.sourceBatchSandboxName->PausingBatchSandboxis absent and snapshot phase isReady->PausedBatchSandboxexists and was created from snapshot but is not ready yet ->ResumingSandboxSnapshot.status.phase == Failedand no live replacement workload ->Failed
This means GET /sandboxes/{sandboxId} must consult both BatchSandbox and SandboxSnapshot.
5. Pause flow
The pause flow is:
1. Client POST /sandboxes/{sandboxId}/pause
2. Server Resolve current BatchSandbox and running Pod for sandboxId
3. Server Validate:
- workload exists
- replicas == 1 for this server path
- pausePolicy is configured
- no existing snapshot for sandboxId is already in phase
Pending|Committing|Pushing
4. Server Create or replace SandboxSnapshot(name=sandboxId) with:
- policy.type = Rootfs
- sourceBatchSandboxName
- sourcePodName
- sourceContainerName
- sourceNodeName
- target imageUri
- snapshotPushSecretName
- resumeImagePullSecretName
- resumeTemplate
- pausedAt
- status.phase = Pending
5. Snapshot controller creates a same-node commit Job Pod
6. Job Pod commits container rootfs and pushes image
7. Snapshot controller updates phase:
Pending -> Committing -> Pushing -> Ready
8. Server-side pause orchestration deletes the original BatchSandbox
9. GET /sandboxes/{sandboxId} now returns Paused from SandboxSnapshotFailure behavior:
- If commit or push fails,
SandboxSnapshot.status.phase = Failed - The original
BatchSandboxis not deleted - The sandbox remains
Runningor transitions toFailedbased on the final server policy; this OSEP recommends keeping the workload running and exposing the snapshot failure in the message - If another pause is already in progress for the same
sandboxId, the server returns409 Conflict
6. Commit Job Pod
The snapshot controller creates one short-lived Job Pod:
apiVersion: batch/v1
kind: Job
metadata:
name: sbxsnap-commit-<sandboxId>
spec:
ttlSecondsAfterFinished: 300
template:
spec:
restartPolicy: Never
nodeName: <sourceNodeName>
containers:
- name: committer
image: <committerImage>
command: ["/bin/sh", "-c"]
args:
- |
snapshot-committer \
--containerd-namespace k8s.io \
--container-id <containerID> \
--target-image <imageUri> \
--registry-auth-file /var/run/opensandbox/registry/.dockerconfigjson
volumeMounts:
- name: containerd-sock
mountPath: /run/containerd/containerd.sock
- name: snapshot-push-auth
mountPath: /var/run/opensandbox/registry
readOnly: true
volumes:
- name: containerd-sock
hostPath:
path: /run/containerd/containerd.sock
type: Socket
- name: snapshot-push-auth
secret:
secretName: <snapshotPushSecretName>The controller resolves the source container ID from SourcePodName and SourceContainerName.
snapshot-committer in this example is a logical role, not a required product name. The implementation may be a small in-house binary, a thin wrapper around existing container tooling, or another committer client, as long as it performs the following responsibilities explicitly:
- commit the source container rootfs into a snapshot image
- read the mounted registry auth config from the Secret volume
- push the snapshot image to
spec.imageUri - return a clear success/failure signal so the controller can update
SandboxSnapshot.status.phase
Important auth semantics:
imagePullSecretson the Job Pod, if needed for thecommitterImage, only affects kubelet pulling the Job image. It does not authenticate registry operations performed by the process inside the container.snapshotPushSecretNameis mounted into the committer container and must be consumed explicitly by the snapshot push client as registry auth config.resumeImagePullSecretNameis not used by the commit Job. It is propagated to the resumed workload template so kubelet can pullsnapshot.spec.imageUriduring resume.
7. Resume flow
The resume flow is:
1. Client POST /sandboxes/{sandboxId}/resume
2. Server Resolve SandboxSnapshot(name=sandboxId)
3. Server Validate:
- snapshot exists
- snapshot status.phase == Ready
4. Server Create a new BatchSandbox:
- metadata.name reuses the same public sandbox identity mapping
- replicas = 1
- template reconstructed from snapshot.spec.resumeTemplate
- template image = snapshot.spec.imageUri
- template imagePullSecrets = snapshot.spec.resumeImagePullSecretName
- labels preserve sandboxId
5. Server Aggregate sandbox state as Resuming while the new BatchSandbox is
starting
6. BatchSandbox controller creates the new Pod
7. Once the new Pod is running and ready, GET /sandboxes/{sandboxId} returns RunningThe snapshot is retained after resume so the sandbox can be paused and resumed again later, but only the latest snapshot is kept.
8. Stable sandbox ID
The public sandboxId is stable across three states:
- live workload exists: identify by
BatchSandboxlabelopensandbox.io/id - paused workload: identify by
SandboxSnapshot.metadata.name == sandboxId - resumed workload: identify by the new
BatchSandboxlabel
The workload object identity may change, but the public sandbox identity does not.
9. List and get semantics
GET /sandboxes/{sandboxId} must:
- first resolve the live
BatchSandbox - then resolve
SandboxSnapshot - merge both views into one lifecycle status
GET /sandboxes should include:
- running sandboxes from live
BatchSandboxobjects - paused sandboxes from
SandboxSnapshotobjects that have no liveBatchSandbox
This keeps paused sandboxes visible even though their workloads have been deleted.
10. Delete semantics
DELETE /sandboxes/{sandboxId} must remove all Kubernetes state associated with the public sandbox identity:
- delete the live
BatchSandboxif it exists - delete
SandboxSnapshot(name=sandboxId)if it exists
Registry cleanup is best-effort in this revision:
- if the implementation can safely delete the retained snapshot image from the registry, it may do so
- registry image deletion failure must not block sandbox deletion success
- operators may rely on registry retention or garbage collection if image deletion is unavailable or undesirable
11. Configuration
Add a new server config section:
[pause]
default_snapshot_registry = ""
committer_image = "containerd/containerd:1.7"Semantics:
default_snapshot_registryis used whenpausePolicy.snapshotRegistryis not explicitly set.committer_imageis the image used by the commit Job Pod.
12. Security considerations
The commit Job mounts the node's container runtime socket to resolve the source container and commit its root filesystem. This is a privileged operation with node-level runtime access.
Operational constraints for this design:
committer_imageis selected by server or operator configuration, not by the public sandbox API- the commit Job spec is not user-extensible in this revision
- operators should treat the snapshot controller and committer image as trusted infrastructure components, with tighter RBAC and supply-chain controls than ordinary sandbox workloads
Test Plan
Unit tests
- Pause request creates or replaces
SandboxSnapshot(name=sandboxId). SandboxSnapshotcontainssourcePodName,sourceContainerName, andsourceNodeNamefrom the live workload.- Snapshot controller creates a Job pinned to the correct node.
- Server returns
PausedwhenBatchSandboxis absent and snapshot isReady. - Server returns
Pausingwhen snapshot isReadybut the sourceBatchSandboxstill exists. - Server returns
Resumingafter newBatchSandboxis created from snapshot but before readiness. - Pause returns
409when another pause is already in progress for the samesandboxId. - Resume fails with
409when snapshot is absent or notReady. - Delete removes
SandboxSnapshotfor paused sandboxes.
Integration tests
- End-to-end pause:
- running
BatchSandbox - snapshot becomes
Ready - original
BatchSandboxis deleted GET /sandboxes/{id}returnsPaused
- running
- End-to-end resume:
- server finds snapshot by
sandboxId - creates new
BatchSandbox - new Pod comes up from snapshot image
GET /sandboxes/{id}returnsRunning
- server finds snapshot by
- Repeat pause after resume:
- the same
SandboxSnapshotresource is reused or replaced - only one snapshot remains
- the same
- Delete after pause:
- paused sandbox is removed even when no live
BatchSandboxexists - retained
SandboxSnapshotis removed
- paused sandbox is removed even when no live
Manual and operator validation
- Confirm the committed image is present in the registry after pause.
- Confirm working directory contents survive pause and resume.
- Confirm CPU and memory are released after the old
BatchSandboxis deleted. - Confirm the commit Job Pod actually runs on the source node.
- Confirm the committed rootfs comes from the intended sandbox container rather than a sidecar.
Drawbacks
- Only one snapshot is retained, so rollback to older states is impossible.
- The design assumes the server-side Kubernetes path uses
replicas = 1. - The paused state is split from the live workload and must be reconstructed by the server from multiple resources.
- Registries that enforce immutable tags are a poor fit for the simplified single-snapshot design.
- Commit still requires node-local runtime access.
Alternatives
Introduce a dedicated SandboxInstance CR
A more general design is possible, but rejected here because the user goal is a simpler architecture aligned with the current server path. For v1, the single snapshot CR plus existing BatchSandbox is sufficient.
Store pause state directly on BatchSandbox
Rejected because the paused state must survive after the workload is deleted. Once pause succeeds, the original BatchSandbox no longer exists.
Support multiple snapshots and /snapshots API in v1
Rejected to keep the architecture minimal. Multi-snapshot history can be added later by changing SandboxSnapshot naming and list semantics.
Restore the old BatchSandbox instead of creating a new one
Rejected because pause deletes the original workload to release resources. Resume is cleaner if it always creates a fresh BatchSandbox from the retained image.
Infrastructure Needed
- An OCI registry reachable from cluster nodes.
- A registry credential Secret of type
kubernetes.io/dockerconfigjson. - A committer image that can access
containerd.sockon the source node. - RBAC for
SandboxSnapshot, Jobs, and reads on Pods andBatchSandbox.
Upgrade & Migration Strategy
This change is additive for the public API and simple for operators.
- Existing clients keep using the same sandbox lifecycle endpoints.
- Existing Kubernetes deployments without the new
SandboxSnapshotCRD continue to return501for pause and resume. - Rollout sequence:
- install the
SandboxSnapshotCRD - deploy the
SandboxSnapshotController - deploy the updated server with pause/resume orchestration
- install the
- Existing running sandboxes do not require migration. Only new pause/resume operations use the new flow.
此页内容来自仓库源文件:
oseps/0008-pause-resume-rootfs-snapshot.md