Eliminate daemon-env.json — derive env at spawn time (stateless inversion) #118

@c-h-

Description

Context

Per the stateless daemon architecture (ADR 004, PR #52, PR #117): agentctl is a stateless orchestration layer. Ground truth lives in adapters. No shadow state.

daemon-env.json is shadow state: a point-in-time snapshot of the shell environment taken at daemon start. It goes stale, it gets corrupted (the current file is either 0 bytes or missing critical keys), and it opens an unexplained gap between the user's actual environment and what child processes receive.

The Problem

buildSpawnEnv() loads from daemon-env.json. If that file is stale, empty, or corrupt, child processes (Claude Code, OpenCode, etc.) launch without critical env vars such as ANTHROPIC_API_KEY and OPENAI_API_KEY. This causes silent 502 failures.

Fix: Delete the Snapshot, Derive at Spawn Time

Following the same inversion principle as PR #52/#117:

Before: daemon start → snapshot env → save JSON → load JSON at spawn → pray it's still valid
After:  spawn time → source ~/.zshenv + process.env → done

Implementation

  1. Delete saveEnvironment() from daemon/server.ts — no more env snapshot at daemon start
  2. Rewrite buildSpawnEnv() to derive env at spawn time:
    • Start with process.env (whatever the daemon/CLI has)
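    • Source ~/.zshenv via child_process.execSync('zsh -c "source ~/.zshenv && env"') and parse the output. Note that zsh already reads ~/.zshenv on every invocation, so a plain zsh -c env picks it up too and does not fail when the file is missing
    • ~/.zshenv is the zsh startup file read by every zsh invocation, including non-interactive ones (it is zsh-specific, not POSIX), which makes it the right place for API keys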
    • Apply opts.env overrides (per-launch extras, same as today)
    • Expected overhead is ~10ms per launch (one short-lived zsh process), which is negligible
  3. Delete daemon-env.json — it's shadow state, same as state.json sessions were
  4. Delete loadSavedEnvironment() — no more loading stale snapshots
  5. Keep getCommonBinDirs() — PATH augmentation is still useful
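
The steps above could be sketched roughly as follows. This is a minimal sketch, not the actual agentctl code: parseEnvOutput, readZshEnv, the readShellEnv parameter, and the 2-second timeout are assumptions made for illustration, and PATH augmentation via getCommonBinDirs() is left out. It relies on zsh reading ~/.zshenv by itself rather than on an explicit source, so a missing file is not an error.

```typescript
import { execFileSync } from "node:child_process";

type Env = Record<string, string>;

// Parse `env` output (one KEY=value per line).
// Limitation: values containing newlines are not handled.
function parseEnvOutput(output: string): Env {
  const env: Env = {};
  for (const line of output.split("\n")) {
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip blank lines and lines without a key
    env[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return env;
}

// Hypothetical helper: ask a fresh zsh for its environment.
// zsh reads ~/.zshenv (or $ZDOTDIR/.zshenv) on every invocation.
// Returns {} if zsh is missing, errors out, or times out.
function readZshEnv(): Env {
  try {
    const out = execFileSync("zsh", ["-c", "env"], {
      encoding: "utf8",
      timeout: 2000, // assumed upper bound, not from the issue
    });
    return parseEnvOutput(out);
  } catch {
    return {};
  }
}

// Layering, lowest to highest precedence:
// process.env, then the zsh-derived env, then per-launch opts.env.
function buildSpawnEnv(
  opts: { env?: Env } = {},
  readShellEnv: () => Env = readZshEnv, // injectable for tests
): Env {
  const base: Env = {};
  for (const [key, value] of Object.entries(process.env)) {
    if (value !== undefined) base[key] = value;
  }
  return { ...base, ...readShellEnv(), ...(opts.env ?? {}) };
}
```

Making the shell reader a parameter keeps the merge logic testable on machines without zsh; production callers would just call buildSpawnEnv(opts).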

Benefits

  • API key rotation takes effect immediately (no daemon restart needed)
  • No stale snapshot corruption
  • No 0-byte file failures
  • Daemon is more stateless (fewer files to manage)
  • Follows ADR 004 principle: derive from ground truth, don't cache

Note on Future Key Governance

The current Ohm architecture (V4 spec) envisions a Control Plane → Data Plane push model where API keys are part of pushed TenantContexts, not local env vars. The concept of 'server keys taking precedence over client keys' will eventually be replaced by the Control Plane pushing the authoritative key configuration. This ~/.zshenv approach is correct for the current phase (self-managed, local proxy) and will naturally be superseded when the Control Plane exists.

Testing

  • Verify Claude Code launches successfully with API keys from ~/.zshenv
  • Verify OpenCode launches successfully
  • Verify opts.env overrides still work
  • Verify PATH augmentation still works
  • Verify graceful behavior when ~/.zshenv doesn't exist (fall back to process.env only)
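
The override and fallback cases above can be checked without launching a real agent. The sketch below is only illustrative: layerEnv is a hypothetical stand-in for the layering inside buildSpawnEnv(), and the key values are made up.

```typescript
import { strict as assert } from "node:assert";

type Env = Record<string, string>;

// Hypothetical stand-in for the layering in buildSpawnEnv():
// later sources win, and a failing shell reader contributes nothing.
function layerEnv(base: Env, readShellEnv: () => Env, overrides: Env = {}): Env {
  let shell: Env = {};
  try {
    shell = readShellEnv();
  } catch {
    // e.g. zsh is unavailable: fall back to the base env only
  }
  return { ...base, ...shell, ...overrides };
}

const base: Env = { PATH: "/usr/bin", ANTHROPIC_API_KEY: "from-daemon" };
const rotated = (): Env => ({ ANTHROPIC_API_KEY: "rotated" });

// A key rotated in ~/.zshenv replaces the daemon's older value.
assert.equal(layerEnv(base, rotated).ANTHROPIC_API_KEY, "rotated");

// Per-launch overrides (opts.env) still take precedence.
assert.equal(
  layerEnv(base, rotated, { ANTHROPIC_API_KEY: "per-launch" }).ANTHROPIC_API_KEY,
  "per-launch",
);

// If the shell env cannot be read, the result is the base env unchanged.
const failing = (): Env => {
  throw new Error("zsh unavailable");
};
assert.deepEqual(layerEnv(base, failing), base);
```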
