feat: three-prong session lifecycle fuse for opencode adapter #144

@c-h-

Problem

opencode run (non-interactive mode) doesn't write session files to opencode's native storage directory. This means:

  • Sessions launched via agentctl launch --matrix are invisible to agentctl list
  • The events() generator never sees them start or stop
  • No session.stopped lifecycle events are emitted to OrgLoop
  • The supervisor never gets notified when coding agents finish

Discovered during the ENG-8709 and ENG-8669 autonomous implementation runs: agents did real work (Kimi committed code and modified 5+ files) but died before pushing or opening PRs, and the supervisor was never woken.

Session meta files ARE written to ~/.agentctl/opencode-sessions/*.json with PIDs, but the discovery logic in list() and events() only enumerates from opencode's native storage (~/.local/share/opencode/storage/session/).

Design: Three-Prong Fuse

When agentctl launches an opencode session, it registers three signals. The first to fire cancels the others (one AbortController per session).

Signal 1: Wrapper Exit Hook (primary, immediate)

  • Launch a thin wrapper script instead of opencode directly
  • Wrapper runs opencode, captures exit code, writes it to a .exit file alongside the session meta
  • On next poll cycle (≤15s), agentctl detects the exit file → emits session.stopped with exit code
  • Gives us exit code for free, which PID poll alone can't provide
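
A minimal sketch of the wrapper idea, assuming the wrapper is itself a small Node script (the issue does not say what the wrapper is written in). The names `runAndRecordExit` and `exitFileFor`, the `<uuid>.exit` file name, and the JSON payload are all hypothetical; the issue only states that the exit code is written to a .exit file alongside the session meta.

```typescript
import { spawnSync } from "node:child_process";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical naming: put "<uuid>.exit" next to "<uuid>.json".
function exitFileFor(metaFile: string): string {
  const base = path.basename(metaFile, ".json");
  return path.join(path.dirname(metaFile), `${base}.exit`);
}

// Run the command to completion and record how it ended.
// Returns the exit code, or null if the child was killed by a signal
// or could not be spawned.
function runAndRecordExit(
  command: string,
  args: string[],
  exitFile: string,
): number | null {
  const result = spawnSync(command, args, { stdio: "inherit" });
  const payload = JSON.stringify({
    exitCode: result.status,
    signal: result.signal ?? null,
  });
  // Write to a temp file, then rename, so a poller never reads a
  // half-written exit file.
  const tmp = `${exitFile}.tmp`;
  fs.writeFileSync(tmp, payload);
  fs.renameSync(tmp, exitFile);
  return result.status;
}
```

This sketch does not forward signals to the child, and it cannot write anything if the wrapper itself is killed, which is the gap Signal 2 is meant to cover.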

Signal 2: PID Death Poll (fallback, ≤15s latency)

  • The events() generator polls kill(pid, 0) every ~15s for all tracked sessions
  • Catches crashes, OOM kills, SIGKILL — anything the wrapper wouldn't survive
  • Uses stored startTime from session meta to detect PID recycling
  • If PID dies without an exit file: emit session.stopped with unknown exit code
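
The liveness check and recycling guard could be sketched as follows. `process.kill(pid, 0)` is the standard Node way to test for a process without delivering a signal. How the live process's start time is actually read is platform-specific and not specified in the issue, so it is passed in as a hypothetical `readStart` callback; `isPidAlive` and `isSameProcess` are illustrative names.

```typescript
// True if a process with this PID currently exists.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    // ESRCH (or anything else): treat as not running.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical hook: returns the start time of the live process, or null
// if it cannot be determined on this platform.
type StartTimeReader = (pid: number) => string | null;

// True only if the PID is alive AND still looks like the process we launched.
function isSameProcess(
  pid: number,
  recordedStart: string,
  readStart: StartTimeReader,
): boolean {
  if (!isPidAlive(pid)) return false;
  const actual = readStart(pid);
  // If the start time is unreadable, assume it is still our process and
  // leave the master timeout as the backstop.
  return actual === null || actual === recordedStart;
}
```

A poll loop would call `isSameProcess` for each tracked session every ~15s and emit session.stopped with an unknown exit code when it returns false and no exit file exists.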

Signal 3: Master Timeout (safety net, configurable)

  • Configurable per-session, default 3 hours
  • Emits session.timeout (distinct event type from session.stopped)
  • Does NOT kill the process — the supervisor decides what to do
  • Catches zombie PIDs, a PID recycled to another matching process, and poll bugs
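
One way the master timeout might be armed so that it cancels cleanly when another signal fires first. This is a sketch under assumptions: `armMasterTimeout`, `resolveTimeoutMs`, and the precedence of launch option over config value are illustrative, not taken from agentctl.

```typescript
const DEFAULT_TIMEOUT_MS = 3 * 60 * 60 * 1000; // 3 hours, the stated default

// Assumed precedence: launch option, then config value, then the default.
function resolveTimeoutMs(launchOpt?: number, configValue?: number): number {
  return launchOpt ?? configValue ?? DEFAULT_TIMEOUT_MS;
}

interface ArmedTimeout {
  isPending(): boolean;
}

// Arms a timer that calls onTimeout unless the signal aborts first.
// onTimeout should only emit session.timeout; it must not kill the process.
function armMasterTimeout(
  timeoutMs: number,
  signal: AbortSignal,
  onTimeout: () => void,
): ArmedTimeout {
  if (signal.aborted) return { isPending: () => false };
  let pending = true;
  const timer = setTimeout(() => {
    pending = false;
    onTimeout();
  }, timeoutMs);
  signal.addEventListener(
    "abort",
    () => {
      pending = false;
      clearTimeout(timer); // another signal won the race
    },
    { once: true },
  );
  return { isPending: () => pending };
}
```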

Fuse Pattern

launch → register { pid, exitFileWatcher, pollInterval, timeout, abortController }
any signal fires → emit event → controller.abort() → cancels the other two

Each session gets its own AbortController. Daemon tracks Map<sessionId, AbortController>.
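
The fuse itself can be sketched as a small registry, assuming a synchronous `emit` callback. `SessionFuses`, `register`, and `trip` are hypothetical names; the event shapes are a guess at what the lifecycle types might look like, with session.timeout kept distinct from session.stopped as described above.

```typescript
type LifecycleEvent =
  | { type: "session.stopped"; sessionId: string; exitCode: number | null }
  | { type: "session.timeout"; sessionId: string };

class SessionFuses {
  private readonly controllers = new Map<string, AbortController>();
  private readonly emit: (event: LifecycleEvent) => void;

  constructor(emit: (event: LifecycleEvent) => void) {
    this.emit = emit;
  }

  // Called at launch. The returned signal is handed to the exit-file
  // watcher, the PID poll, and the master timeout so each can stop itself.
  register(sessionId: string): AbortSignal {
    const controller = new AbortController();
    this.controllers.set(sessionId, controller);
    return controller.signal;
  }

  // Called by whichever signal fires. Only the first call per session
  // emits; later calls return false and do nothing.
  trip(event: LifecycleEvent): boolean {
    const controller = this.controllers.get(event.sessionId);
    if (!controller) return false;
    this.controllers.delete(event.sessionId);
    this.emit(event);
    controller.abort(); // cancels the other two signals
    return true;
  }
}
```

Removing the controller from the map before emitting means a second signal that fires during `emit` is ignored instead of producing a duplicate event.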

Implementation Notes

  • Wrapper collapses signals 1+2: exit file is the fast path, PID poll is the fallback
  • Poll interval: 15s (cheap; kill(pid, 0) costs almost nothing)
  • The list() method should enumerate the meta dir as a primary source, not just a supplementary one
  • session.timeout should be a new event type in the lifecycle types
  • Timeout is configurable via launch opts or agentctl config, not hardcoded
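
For the list() note above, enumerating the meta dir might look roughly like this. The `SessionMeta` shape is an assumption: the issue confirms only that a PID and a startTime are stored, and deriving the session id from the `<uuid>.json` file name is a guess. `readSessionMetas` is a hypothetical helper name.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed shape; only pid and startTime are confirmed by the issue.
interface SessionMeta {
  sessionId: string; // taken from the file name
  pid: number;
  startTime?: string | number;
}

// Reads every *.json file in the meta dir, skipping anything unreadable.
function readSessionMetas(metaDir: string): SessionMeta[] {
  let names: string[];
  try {
    names = fs.readdirSync(metaDir);
  } catch {
    return []; // directory missing: nothing has been launched yet
  }
  const metas: SessionMeta[] = [];
  for (const name of names) {
    if (!name.endsWith(".json")) continue; // ignores .exit and temp files
    try {
      const text = fs.readFileSync(path.join(metaDir, name), "utf8");
      const raw = JSON.parse(text);
      if (typeof raw.pid !== "number") continue;
      metas.push({
        sessionId: path.basename(name, ".json"),
        pid: raw.pid,
        startTime: raw.startTime,
      });
    } catch {
      // Partially written or corrupt file: skip it rather than fail the listing.
    }
  }
  return metas;
}

const defaultMetaDir = path.join(os.homedir(), ".agentctl", "opencode-sessions");
```

list() and events() could then merge these entries with whatever opencode's native storage reports, so sessions started by opencode run are no longer invisible.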

Current State

  • Session meta files: ~/.agentctl/opencode-sessions/<uuid>.json (already written on launch ✅)
  • PID stored with startTime for recycling detection ✅
  • events() async generator exists but only watches opencode's native storage ❌
  • No wrapper, no PID poll, no timeout ❌

Acceptance Criteria

  • agentctl list -a shows sessions launched via --matrix / launch opencode
  • session.stopped fires within 15s of opencode process exit
  • session.timeout fires after configurable duration (default 3h)
  • First signal cancels the other two (fuse pattern)
  • Exit code captured when available (via wrapper)
  • PID recycling detected and handled
  • Existing claude-code/codex lifecycle behavior unchanged
