Open
Description
Problem
@orgloop/transform-dedup (v0.1.10) uses an in-memory Map for dedup state. On service restart, the seen map is cleared, causing previously-deduplicated events to pass through again and trigger duplicate route firings.
Current behavior
From the README:
"State is in-memory only and lost on restart."
Config schema: store: string — Storage backend. Only "memory" currently.
Any service restart (config change, update, crash recovery) loses all dedup state, causing events within the dedup window to be re-delivered.
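To illustrate the failure mode, here is a minimal sketch of an in-memory dedup keyed by a hash, with a restart simulated by constructing a fresh instance. The `MemoryDedup` class, its `shouldPass` method, and the example key string are hypothetical stand-ins, not the package's actual code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for the transform's in-memory state (not the real implementation).
class MemoryDedup {
  private seen = new Map<string, number>(); // hash -> first-seen timestamp (ms)
  private windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  /** Returns true if the event should pass, false if it is a duplicate within the window. */
  shouldPass(key: string, now: number = Date.now()): boolean {
    const hash = createHash("sha256").update(key).digest("hex");
    const firstSeen = this.seen.get(hash);
    if (firstSeen !== undefined && now - firstSeen < this.windowMs) {
      return false; // duplicate inside the dedup window
    }
    this.seen.set(hash, now);
    return true;
  }
}

const windowMs = 15 * 60 * 1000; // "15m"
const eventKey = "issue.opened|github|42"; // hypothetical joined field values

const beforeRestart = new MemoryDedup(windowMs);
const first = beforeRestart.shouldPass(eventKey, 0); // passes: first sighting
const second = beforeRestart.shouldPass(eventKey, 1_000); // dropped: duplicate

// Simulated restart: a new process starts with an empty Map.
const afterRestart = new MemoryDedup(windowMs);
const third = afterRestart.shouldPass(eventKey, 2_000); // passes again, still inside the window
```

The third call is the duplicate route firing described above: the event is well inside the 15-minute window, but the new instance has no record of it.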
Proposal
Add store: "file" option, using the same FileCheckpointStore infrastructure from #107.
Suggested config
transforms:
  - name: dedup
    type: package
    package: "@orgloop/transform-dedup"
    config:
      window: "15m"
      store: "file"
      fields:
        - "type"
        - "source"
        - "provenance.platform_event"
        - "provenance.issue_number"
Requirements
- Persist hash + timestamp to file on each pass/drop decision
- On startup, load persisted hashes, filter out entries older than window
- Atomic file writes (write to a temp file, then rename), same as connector checkpoints
- Files in .orgloop/checkpoints/dedup-.json
- store: "memory" remains default for backward compat
- Periodic expired-entry cleanup (existing timer pattern)
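The persistence requirements above could be sketched roughly as follows. This is only an illustration under assumptions: `saveSeen`, `loadSeen`, and the demo file name are hypothetical, the JSON layout (hash mapped to timestamp) is a guess, and the actual `FileCheckpointStore` API from #107 is not used or shown here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed on-disk shape: { "<hash>": <timestamp in ms>, ... }
type SeenEntries = Record<string, number>;

// Hypothetical helper: persist the seen map with a temp-file-then-rename write.
function saveSeen(file: string, seen: Map<string, number>): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const entries: SeenEntries = {};
  seen.forEach((ts, hash) => {
    entries[hash] = ts;
  });
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(entries));
  // rename replaces the target in one step when tmp and file share a filesystem
  fs.renameSync(tmp, file);
}

// Hypothetical helper: load persisted hashes and drop entries older than the window.
function loadSeen(file: string, windowMs: number, now: number = Date.now()): Map<string, number> {
  const seen = new Map<string, number>();
  let raw: string;
  try {
    raw = fs.readFileSync(file, "utf8");
  } catch {
    return seen; // no state file yet: start empty, as with store: "memory"
  }
  // Note: this sketch does not handle a corrupt file; JSON.parse would throw.
  const parsed = JSON.parse(raw) as SeenEntries;
  for (const hash of Object.keys(parsed)) {
    const ts = parsed[hash];
    if (now - ts < windowMs) seen.set(hash, ts);
  }
  return seen;
}

// Demo in a temp directory (the real location would be under .orgloop/checkpoints/).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "dedup-demo-"));
const demoFile = path.join(demoDir, "dedup-example.json"); // hypothetical file name
const demoWindowMs = 15 * 60 * 1000;
const minute = 60 * 1000;

const state = new Map<string, number>([
  ["hash-old", 0], // seen at t = 0
  ["hash-fresh", 20 * minute], // seen at t = 20m
]);
saveSeen(demoFile, state);

// "Restart" at t = 25m: hash-old is 25m old (expired), hash-fresh is 5m old (kept).
const restored = loadSeen(demoFile, demoWindowMs, 25 * minute);
```

Writing on every pass/drop decision keeps the file current at the cost of one write per event. Batching writes would reduce I/O but reopen a small loss window on crash, so that trade-off is left to the implementation.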
Context
Related: #107 — extends connector checkpoint persistence to the dedup transform.