Problem
After ~4 days of continuous uptime, the OrgLoop daemon's HTTP control API stops responding on its configured port. The Node.js process remains alive (PID active, launchd sees it as healthy), but the HTTP server is dead. The last stderr entry before the failure is [github] Failed to poll workflow runs: The client is closed.
This means:
- `orgloop status` reports the daemon as running (it reads the PID file)
- Port 4800 returns nothing (HTTP server dead)
- All webhook deliveries fail silently
- launchd `KeepAlive` never triggers (the process didn't exit)
- Manual `orgloop stop` + restart is required to recover
Environment
- macOS, Node 24.13.0
- 4 registered modules (kindo-engineering, personal, charlie-personal, content-engine)
- Connectors: github (polling), linear (polling, auth errors), openclaw (webhook target), agentctl, cron
- Uptime at failure: 3d 21h
Likely Cause
Resource exhaustion from accumulated connector errors:
- Linear connector: continuous `Authentication required` errors every poll cycle for 4 days
- GitHub connector (Kindo): hundreds of 403 responses polling `UsableMachines/mono` PRs
- These errors likely caused HTTP client pool exhaustion or socket leaks that eventually corrupted the daemon's internal HTTP server state
Suggested Fixes
- HTTP server health watchdog: periodically verify the control API is responsive; self-restart if not
- Connector error circuit breaker: back off exponentially on repeated auth/403 errors instead of retrying at full poll rate
- Graceful HTTP client recycling: periodically recreate HTTP clients to prevent long-lived connection pool degradation
- Process-level watchdog: expose a `/health` endpoint that launchd or an external monitor can probe, with auto-restart on failure
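
To make the circuit breaker suggestion concrete, here is a minimal sketch of exponential backoff on repeated auth failures. Everything in it is an assumption for illustration: `PollBackoff`, `isAuthFailure`, and the option names are hypothetical and not taken from the OrgLoop codebase, and the delay values are placeholders.

```typescript
// Hypothetical helper; none of these names come from OrgLoop.
interface BackoffOptions {
  baseMs: number; // normal poll interval
  maxMs: number;  // upper bound on the backed-off delay
}

// Treat 401/403 as the "auth" failures that should trigger backoff (assumption).
function isAuthFailure(status: number): boolean {
  return status === 401 || status === 403;
}

class PollBackoff {
  private consecutiveFailures = 0;
  private readonly opts: BackoffOptions;

  constructor(opts: BackoffOptions) {
    this.opts = opts;
  }

  // A successful poll resets the delay to the normal interval.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Each consecutive failure doubles the next delay.
  recordFailure(): void {
    this.consecutiveFailures += 1;
  }

  // baseMs * 2^failures, capped at maxMs.
  nextDelayMs(): number {
    const exponent = Math.min(this.consecutiveFailures, 30); // keep 2^n bounded
    return Math.min(this.opts.baseMs * 2 ** exponent, this.opts.maxMs);
  }
}
```

A connector's poll loop could call `recordFailure()` whenever `isAuthFailure(status)` is true, `recordSuccess()` otherwise, and schedule its next poll with `nextDelayMs()`. With a persistently failing credential, this would reduce four days of full-rate retries to one attempt per `maxMs`.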
Workaround
Restart the daemon periodically or monitor port responsiveness externally.
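
As a rough sketch of the external monitoring half of this workaround, the probe below treats any HTTP response within a timeout as "alive" and anything else (connection refused, reset, or timeout) as "dead". It assumes Node 18+ for the global `fetch` and `AbortSignal.timeout`. The function name `isControlApiResponsive` is hypothetical, and probing the `/` path is an assumption, since the issue does not say which routes the control API serves.

```typescript
// Hypothetical external probe; the name and the "/" path are assumptions.
async function isControlApiResponsive(
  port: number,
  timeoutMs: number = 2000,
): Promise<boolean> {
  try {
    const res = await fetch(`http://127.0.0.1:${port}/`, {
      // Abort if the server accepts the connection but never answers.
      signal: AbortSignal.timeout(timeoutMs),
    });
    // Any HTTP status counts as alive; discard the body to free the socket.
    await res.body?.cancel();
    return true;
  } catch {
    // Connection refused, reset, or timed out.
    return false;
  }
}

// Example (not executed here): check the port from the report.
// isControlApiResponsive(4800).then((ok) => console.log(ok ? "up" : "down"));
```

Because the PID file and launchd both report the process as healthy in this failure mode, a check like this has to exercise the HTTP listener itself. What to do on a `false` result (for example, running `orgloop stop` and then restarting) is left to the monitor.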