feat: session timeout + health probe — detect zombie/hung sessions #157

@c-h-

Problem

When OpenCode hangs (for example on an empty provider response or a stuck Ohm circuit breaker), the wrapper shell stays alive but the inner process sits blocked in kevent64 and makes no progress. agentctl still reports the session as 'running' because the wrapper PID exists. No completion webhook fires, so the supervisor never triggers and the PR is left abandoned.

Proposed Solution

1. Configurable max session runtime

Add a --timeout <seconds> flag. Once the timeout elapses, agentctl kills the session and fires the completion webhook with a timeout status.
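A minimal Python sketch of the timeout behaviour, not agentctl's actual implementation. The function name `run_with_timeout`, the `on_complete` callback (a stand-in for the completion webhook), and the status strings are all hypothetical.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_with_timeout(
    cmd: Sequence[str],
    timeout_s: float,
    on_complete: Callable[[dict], None],
) -> dict:
    """Run cmd, kill it if it outlives timeout_s, then report the outcome."""
    proc = subprocess.Popen(cmd)
    try:
        exit_code = proc.wait(timeout=timeout_s)
        status = "completed" if exit_code == 0 else "failed"
    except subprocess.TimeoutExpired:
        proc.kill()   # hard kill; a real tool might send SIGTERM first
        proc.wait()   # reap the child so it does not linger as a zombie
        exit_code = proc.returncode
        status = "timeout"
    result = {"status": status, "exit_code": exit_code}
    on_complete(result)  # assumption: stands in for firing the webhook
    return result


if __name__ == "__main__":
    events = []
    # Child sleeps far longer than the 1-second limit, so it gets killed.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    outcome = run_with_timeout(sleeper, 1.0, events.append)
    print(outcome["status"])
```

The point of the design is that the callback fires on every exit path, so a supervisor would always hear about the session, including when it is killed for running too long.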

2. Output-based health probe

If a session produces no output for more than N minutes (configurable, default 15 minutes), treat it as stalled. Then either fire a warning event or kill the session and fire the completion callback.
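The stall check could be sketched in Python roughly as follows. The class `StallDetector` and its method names are hypothetical. The 15-minute default comes from the proposal above, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Optional


class StallDetector:
    """Tracks the time of the last output and flags long silences."""

    def __init__(self, stall_after_s: float = 15 * 60,
                 now: Optional[float] = None) -> None:
        self.stall_after_s = stall_after_s
        # Session start counts as the first "output" timestamp.
        self.last_output_at = time.monotonic() if now is None else now

    def record_output(self, now: Optional[float] = None) -> None:
        """Call whenever the session writes anything to its output."""
        self.last_output_at = time.monotonic() if now is None else now

    def idle_seconds(self, now: Optional[float] = None) -> float:
        current = time.monotonic() if now is None else now
        return current - self.last_output_at

    def is_stalled(self, now: Optional[float] = None) -> bool:
        """True once the silence is strictly longer than the threshold."""
        return self.idle_seconds(now) > self.stall_after_s
```

A monitoring loop would call `record_output()` from whatever reads the session's output and poll `is_stalled()` periodically. Whether a stall leads to a warning event or to kill plus callback is left to the caller.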

3. PID vs process health

Don't just check whether the wrapper PID exists. Also check whether the inner OpenCode process has open network connections or has produced output recently.
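One way to combine these signals, sketched in Python under stated assumptions: `pid_alive` uses the POSIX convention of sending signal 0, and `classify_session` is a hypothetical decision function. The connection count and idle time are passed in as plain numbers; how agentctl would gather them (for example from a tool such as lsof) is not specified here.

```python
import os


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness check: signal 0 tests existence without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def classify_session(
    wrapper_alive: bool,
    open_connections: int,
    idle_s: float,
    stall_after_s: float = 15 * 60,
) -> str:
    """Return 'exited', 'running', or 'stalled' (hypothetical labels).

    Assumption: the session counts as healthy if the inner process has
    open network connections OR has produced output recently.
    """
    if not wrapper_alive:
        return "exited"
    recently_active = idle_s <= stall_after_s
    if open_connections > 0 or recently_active:
        return "running"
    return "stalled"
```

With this rule, a live wrapper PID alone is no longer enough to report 'running'. The session also needs at least one sign of activity from the inner process.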

Impact

This is the backstop for all silent agent failures. Without it, any provider outage or model error causes sessions to hang forever.
