title: Developer Console for Sandbox Operations authors:
- "@divyamagrawal06" creation-date: 2026-03-05 last-updated: 2026-03-06 status: implementable
OSEP-0006: Developer Console for Sandbox Operations with Phased Auth Model
- Summary
- Motivation
- Requirements
- Proposal
- Design Details
- Test Plan
- Drawbacks
- Alternatives
- Infrastructure Needed
- Upgrade & Migration Strategy
Summary
This proposal is based on #348, which outlines the need for a Developer Console for sandbox lifecycle operations with a phased auth model.
The idea is to add a console/ web app for day-to-day sandbox management (list, create, renew, delete, get endpoint, filtering) and a server-side auth/authz layer that does not break existing API key automation.
Phase 2: OIDC JWT validation, PostgreSQL RBAC bindings, and durable audit logs.
Motivation
Today OpenSandbox exposes lifecycle APIs and Swagger docs, but developers/operators still need to manage sandbox resources via APIs. This creates friction for common workflows (search/create/renew/delete), weakens governance in multi-user environments, and raises onboarding cost for teams that are not API-first.
- Server auth today is global API key only (
server/opensandbox_server/middleware/auth.pywithOPEN-SANDBOX-API-KEY). - Lifecycle operations already exist and are stable (
server/opensandbox_server/api/lifecycle.py,specs/sandbox-lifecycle.yml). - Filtering by state/metadata already exists (
GET /sandboxesandmatches_filter). - Sandbox metadata already maps to labels in Docker/Kubernetes services and is returned in list/get responses.
This means a console can be built on top of what exists without touching core runtime behavior.
Goals
- Add a standalone React app under
console/for sandbox lifecycle operations. - Cover the MVP flows called out in #348:
- list and detail views
- create sandbox from image + basic runtime options
- renew expiration, delete sandbox, get endpoint
- filtering by state and metadata
- Enforce role boundaries server-side (not only hidden in UI):
read_onlyrole for read operationsoperatorrole for mutating operations
- Keep existing API key automation and SDK behavior backward compatible.
- Ensure browser clients never receive server API keys.
- Ensure that it is feasible and easy to scale to Phase 2:
- OIDC login and JWT validation in server
- PostgreSQL-backed RBAC and durable audit events
Non-Goals
Per the issue discussion, the following are out of scope:
- Billing or chargeback portal.
- Approval workflows for every operation.
- Replacing existing SDK/CLI/API workflows.
- Changing Docker/Kubernetes runtime internals.
- Full enterprise IAM policy language in the MVP.
Requirements
| ID | Requirement | Priority |
|---|---|---|
| R1 | Console users use core lifecycle operations from UI | Must Have |
| R2 | Role-based authorization on server for each lifecycle action | Must Have |
| R3 | Existing OPEN-SANDBOX-API-KEY flow continues unchanged | Must Have |
| R4 | No server API key is exposed to browser code | Must Have |
| R5 | Phase 1 works without introducing a database | Must Have |
| R6 | Ownership/team scoping via existing metadata/labels | Should Have |
| R7 | OIDC JWT validation and PostgreSQL RBAC/audit in Phase 2 | Should Have |
Proposal
Following the phased strategy suggested in #348 ("Ship MVP fast, no DB, validate usage and workflows"):
Phase 1 (MVP):
- Add a
console/React app. - Add a user-auth path in server (config-gated) suitable for console access without API keys in browser.
- Add authorization checks on lifecycle operations.
- Add metadata-based scoping using reserved metadata keys for owner/team.
- Emit audit logs for mutating operations.
- No new database dependency.
- Add a
Phase 2 (Hardening):
- Add OIDC JWT validation in server.
- Add PostgreSQL tables for RBAC bindings and audit events.
- Add richer operational UX (bulk safeguards, failure insights).
flowchart LR
A[Developer Browser] --> B[console React App]
B --> C[OpenSandbox Server]
C --> D[Lifecycle Service Layer]
D --> E[Docker/Kubernetes Runtime]flowchart LR
A[SDK/Automation] -->|OPEN-SANDBOX-API-KEY| C[OpenSandbox Server]
B[Console User] -->|User Auth Path| C
C --> D[AuthN + AuthZ Enforcement]
D --> E[Lifecycle Operations]Notes/Constraints/Caveats
- Metadata values must satisfy label constraints already enforced in
ensure_metadata_labels; owner/team values require canonicalization. - Kubernetes runtime currently does not support pause/resume (
501); console must reflect runtime capability. - API key requests remain privileged for backward compatibility.
- Phase 1 audit is log-based (non-durable). Durable queryable audit is planned for Phase 2.
Risks and Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Scope creep from simple console into full control plane | Delivery delay | Strict phase gates; MVP only core operations |
| Header-spoofing if pre-auth mode is misconfigured | Security | Config-gated user auth mode, trusted deployment guidance, Phase 2 JWT validation |
| Metadata-based scoping collisions | Authorization bugs | Reserve keys for access control and enforce server-side overwrite rules |
| Claim values incompatible with label format | Provisioning/authz mismatch | Canonicalization to label-safe owner/team tokens |
| Breaking API automation | Adoption risk | Keep API key path as-is; add compatibility tests |
| Lack of durable audit in MVP | Governance gap | Structured mutation logs in Phase 1 + Phase 2 audit table plan |
Design Details
Current State
Quick summary of the relevant server code as it stands today:
- Auth middleware: API key only (
server/opensandbox_server/middleware/auth.py, headerOPEN-SANDBOX-API-KEY). - Lifecycle routes:
server/opensandbox_server/api/lifecycle.py. - Service implementations:
server/opensandbox_server/services/docker.py(Docker),server/opensandbox_server/services/k8s/kubernetes_service.py(Kubernetes). - Filtering:
stateandmetadatafilters in route parsing,matches_filterhelper. - Metadata: already stored as Docker/Kubernetes labels.
There is no database for RBAC or audit today.
Phase 1 (MVP): Console + Server-Side RBAC Without DB
- Standalone React + TypeScript app under
console/. - Config-gated user-auth mode on the server (no API key in the browser).
- Authorization checks in the lifecycle API path.
- Reuse metadata labels for owner/team scoping.
- Structured audit logs for mutations (create, delete, renew, etc.).
Phase 2 (Hardening): OIDC/JWT + PostgreSQL RBAC and Audit
- Validate OIDC-issued JWT in server (issuer, audience, signature/JWKS, exp/nbf).
- Replace static role mapping with PostgreSQL RBAC bindings.
- Persist mutation audit events in PostgreSQL.
- Add query APIs for audit and governance.
Role and Permission Model
Three roles, matching the separation called for in #348:
read_only: list/get/get endpoint.operator: read_only + create/renew/delete (+ pause/resume where runtime supports).service_admin: API key automation role with full access (compatibility role).
Here's a table for ref:
| Endpoint | read_only | operator | service_admin |
|---|---|---|---|
GET /sandboxes | yes | yes | yes |
GET /sandboxes/{id} | yes | yes | yes |
GET /sandboxes/{id}/endpoints/{port} | yes | yes | yes |
POST /sandboxes | no | yes | yes |
POST /sandboxes/{id}/renew-expiration | no | yes | yes |
DELETE /sandboxes/{id} | no | yes | yes |
POST /sandboxes/{id}/pause | no | yes | yes |
POST /sandboxes/{id}/resume | no | yes | yes |
Ownership and Team Scoping Without Database
Phase 1 scope source:
metadata["access.owner"]metadata["access.team"]
How it works:
- On create, server injects/overwrites reserved scope metadata from authenticated principal.
- Non-admin users can only act on resources within their owner/team scope.
service_adminbypasses scope checks.- Existing user-provided metadata remains supported, but reserved keys are server-controlled.
Canonicalization:
- Principal identifiers from user auth claims/headers are transformed into label-safe tokens (length and charset compatible with existing metadata-label validators).
- Canonicalization must be deterministic to keep scope matching stable across requests.
Server Changes
1. Configuration
Extend server/opensandbox_server/config.py with auth/authz sections.
auth.mode controls high-level authentication behavior:
"api_key_only": current behavior; onlyOPEN-SANDBOX-API-KEYauth is accepted."api_key_and_user": dual path; API key auth remains for SDK/automation, and user-authenticated requests are also accepted for Console access.
user_mode controls how user identity is extracted when auth.mode = "api_key_and_user":
- Phase 1 supports
"trusted_header"only. - When
auth.mode = "api_key_only",user_modeis ignored.
[auth]
# Allowed values:
# - "api_key_only"
# - "api_key_and_user"
mode = "api_key_only"
# Used only when auth.mode = "api_key_and_user".
# Phase 1 supports "trusted_header".
user_mode = "trusted_header"
[auth.trusted_header]
# Used when user_mode = "trusted_header".
user_header = "X-OpenSandbox-User"
team_header = "X-OpenSandbox-Team"
roles_header = "X-OpenSandbox-Roles"
[authz]
default_role = "read_only"
owner_metadata_key = "access.owner"
team_metadata_key = "access.team"
operator_subjects = []
read_only_subjects = []Trusted-header failure behavior (Phase 1):
- Applies when
auth.mode = "api_key_and_user"anduser_mode = "trusted_header". - Requests on the user-auth path that are missing required trusted identity headers are treated as unauthenticated and rejected with
401 Unauthorized. - The server must NOT fall back to anonymous/default user access when trusted headers are missing.
- The server must NOT silently switch to another auth path UNLESS that credential is explicitly provided (for example,
OPEN-SANDBOX-API-KEYfor API key auth).
Phase 2 adds:
[auth.oidc]
issuer = "https://accounts.google.com" # or any OIDC provider
audience = "opensandbox-console"
jwks_url = "https://www.googleapis.com/oauth2/v3/certs"2. Authentication Middleware
Changes to server/opensandbox_server/middleware/auth.py:
- Preserve current API key path exactly.
- Add user principal extraction path (phase-gated by config).
- Attach normalized principal to
request.state.principal. - If trusted-header mode is active and required headers are missing, return
401 Unauthorized(unauthenticated), not403(authenticated but forbidden). - Keep proxy path exemptions behavior unchanged for sandbox proxy route.
3. Authorization Enforcement
New module server/opensandbox_server/middleware/authorization.py with a single entry point:
authorize_action(principal, action, sandbox=None).- Scope checks for owner/team.
Integrate into server/opensandbox_server/api/lifecycle.py per route before invoking mutating service operations.
For list operations: Apply server-side scope filter in addition to client-provided filters.
For get/delete/renew/endpoint: Resolve sandbox resource and evaluate scope before action.
4. Mutation Audit Logging (Phase 1)
For mutating actions, log:
- request_id
- principal subject/team/role
- action
- sandbox_id
- outcome (success/error code)
- timestamp
This extends existing request-id logging without DB dependency.
Console Application Design
Standalone React app living under console/. Pages map directly to the MVP scope from #348:
- Sandbox List: state + metadata filters, pagination.
- Sandbox Detail: status, metadata, image, entrypoint, expiration.
- Create Sandbox: image, entrypoint, timeout, resource limits, env vars, metadata.
- Operations: renew expiration, delete, endpoint retrieval.
The UI should disable buttons the user's role cannot use (e.g., hide "Create" for read_only), but the server is always the final authority. The browser only uses the user-auth path; the API key is never shipped in frontend code.
If Console requests are rejected with 401 because trusted headers are missing, the Console should render an explicit "authentication required / auth proxy misconfiguration" state instead of retrying with anonymous assumptions.
API and Spec Changes
Primary lifecycle endpoints MUST remain unchanged.
Updates to specs/sandbox-lifecycle.yml:
- Document dual auth path (API key + user auth mode).
- Add
401responses for unauthenticated user-auth requests (including trusted-header mode with missing required headers). - Add
403responses where role restrictions apply (e.g., create/renew/delete). - Clarify reserved metadata keys used for ownership/team scoping.
- Add error codes for authentication and authorization failures.
Operational Rollout
- Phase 1 stays behind a config flag (
auth.mode = "api_key_and_user"). - Deploy console + updated server in a non-prod environment first.
- Validate role boundaries and scope filtering.
- Phase 2 switches to OIDC JWT mode and runs PostgreSQL migrations.
Test Plan
Unit Tests
- Auth middleware:
- API key success/failure unchanged.
- user principal extraction in enabled mode.
- dual-mode conflict behavior.
- trusted-header mode rejects missing required headers with
401.
- Authorization logic:
- Each action against the role permission table.
- Owner/team scope checks (allow and deny cases).
- Reserved metadata injection on create.
- Canonicalization:
- deterministic label-safe owner/team tokens.
Integration Tests (Server)
- Route-level authz:
read_onlycan list/get/endpoint; gets 403 on create/renew/delete.operatorcan do all MVP operations.
- Backward compat:
- Existing API key clients work exactly as before.
- Scope filtering:
- Users only see sandboxes matching their owner/team.
- Runtime parity:
- Scoped list/get/delete/renew behaves the same on Docker and Kubernetes.
- Trusted-header deployment behavior:
- direct Console-to-server request without proxy-injected headers returns
401. - proxy misconfiguration (one or more missing identity headers) returns
401.
- direct Console-to-server request without proxy-injected headers returns
Console Tests
- Page-level API integration tests for list/detail/create/renew/delete/endpoint flows.
- Role-based UX tests (buttons disabled/hidden for read_only).
- E2E smoke path from login context to sandbox operations.
Phase 2 Tests
- JWT signature and claim validation tests.
- PostgreSQL RBAC lookup tests.
- Durable audit write/read tests.
Drawbacks
- A second auth path in the server means more code to maintain and more surface to test.
- Metadata-based scoping (Phase 1) is less flexible than a proper DB-backed policy.
- Adding a React app introduces a frontend build/release cycle into the repo.
- Durable audit and richer RBAC are punted to Phase 2.
Alternatives
Alternative 1: Keep API-only (no console)
Pros:
- Zero frontend maintenance.
- No auth model changes.
Cons:
- Does not address operator efficiency and onboarding needs.
Decision: Rejected as it does not solve the efficiency and onboarding problems raised in #348
Alternative 2: Implement Full OIDC + DB in One Phase
Pros:
- Strongest model from day one.
Cons:
- Larger scope, slower delivery, higher integration risk.
Decision: Rejected in favor of phased delivery, as mentioned in the issue.
Alternative 3: Expose the API key to the browser
Would need almost no server changes, but leaks the global API key to every console user and gives up per-user governance entirely. Rejected.
Infrastructure Needed
Phase 1:
- Node.js (for building/testing the
console/app). - Existing OpenSandbox server runtime (Docker or Kubernetes).
- If using trusted-header mode: a reverse proxy (e.g., Nginx, Envoy) that sets the identity headers after authenticating the user.
Phase 2:
- An OIDC provider (e.g., Google, Keycloak, Auth0).
- PostgreSQL instance for RBAC bindings and audit events.
- A schema migration tool (e.g., Alembic).
Upgrade & Migration Strategy
- Backward compatibility is preserved by default:
auth.mode = "api_key_only"keeps existing behavior.
- User auth path is opt-in through configuration.
- Existing SDK/automation clients continue using
OPEN-SANDBOX-API-KEY. - Enabling console/user auth does not require lifecycle API contract breaks.
- Phase 2 DB migrations are additive:
- static config role mapping can remain as fallback during cutover.
This page is sourced from:
oseps/0006-developer-console.md