
feat(transform-dedup): Add file-based persistence for dedup state #136

@warren-t-c

Description


Problem

@orgloop/transform-dedup (v0.1.10) keeps its dedup state in an in-memory Map. On service restart the seen map is cleared, so events that were already deduplicated pass through again and fire their routes a second time.

Current behavior

From the README:

"State is in-memory only and lost on restart."

Config schema: store is a string naming the storage backend; only "memory" is currently supported.

Any service restart (config change, update, crash recovery) loses all dedup state, causing events within the dedup window to be re-delivered.
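
To make the failure mode concrete, here is a minimal TypeScript sketch of a time-windowed in-memory dedup check. MemoryDedup and shouldPass are hypothetical names, not the package's actual API, and the sketch assumes the window is measured from the time a hash was last let through.

```typescript
// Hypothetical sketch of an in-memory, time-windowed dedup check.
// Not the real @orgloop/transform-dedup API.
class MemoryDedup {
  // hash -> timestamp (ms) when that hash was last let through
  private seen = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the event should pass, false if the same hash
  // already passed within the window. Records the hash on pass.
  shouldPass(hash: string, now: number = Date.now()): boolean {
    const last = this.seen.get(hash);
    if (last !== undefined && now - last < this.windowMs) {
      return false; // duplicate inside the window: drop
    }
    this.seen.set(hash, now);
    return true;
  }
}

const windowMs = 15 * 60 * 1000; // "15m"
const before = new MemoryDedup(windowMs);
console.log(before.shouldPass("abc", 0));     // true: first sighting
console.log(before.shouldPass("abc", 60000)); // false: within 15m

// Simulated restart: a fresh instance starts with an empty Map.
const after = new MemoryDedup(windowMs);
console.log(after.shouldPass("abc", 120000)); // true: the duplicate slips through
```

Because the only record of "abc" lived in the discarded instance's Map, the restarted process treats the repeat as a new event.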

Proposal

Add a store: "file" option that reuses the FileCheckpointStore infrastructure from #107.

Suggested config

    transforms:
      - name: dedup
        type: package
        package: "@orgloop/transform-dedup"
        config:
          window: "15m"
          store: "file"
          fields:
            - "type"
            - "source"
            - "provenance.platform_event"
            - "provenance.issue_number"

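
One way the window and fields settings above could be turned into a window length and a dedup hash is sketched below. parseWindow, getPath and dedupKey are hypothetical helpers; the accepted duration units (ms, s, m, h) and hashing the JSON-encoded field values with SHA-256 are assumptions, not confirmed behavior of @orgloop/transform-dedup.

```typescript
import { createHash } from "node:crypto";

// Parse durations like "30s", "15m" or "2h" into milliseconds.
// Assumption: only these units are supported.
function parseWindow(window: string): number {
  const match = /^(\d+)(ms|s|m|h)$/.exec(window.trim());
  if (!match) throw new Error(`invalid window: ${window}`);
  const units: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 };
  return Number(match[1]) * units[match[2]];
}

// Resolve a dotted path such as "provenance.issue_number";
// returns undefined when any segment is missing.
function getPath(event: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (obj, key) =>
      obj !== null && typeof obj === "object"
        ? (obj as Record<string, unknown>)[key]
        : undefined,
    event,
  );
}

// Hash the configured fields, in config order, into a stable key.
// Assumption: SHA-256 over JSON of the values; missing fields become null.
function dedupKey(event: unknown, fields: string[]): string {
  const values = fields.map((f) => getPath(event, f) ?? null);
  return createHash("sha256").update(JSON.stringify(values)).digest("hex");
}
```

Only the listed fields feed the hash, so two events that differ elsewhere (for example in a delivery timestamp) still collapse to the same key.
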
Requirements
  1. Persist hash + timestamp to file on each pass/drop decision
  2. On startup, load persisted hashes, filter out entries older than window
  3. Atomic file writes (write to temp, rename) — same as connector checkpoints
  4. Files in .orgloop/checkpoints/dedup-.json
  5. store: "memory" remains default for backward compat
  6. Periodic expired-entry cleanup (existing timer pattern)
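
A minimal sketch of how requirements 1, 2, 3 and 6 might fit together. FileDedupStore is a hypothetical class: it does not use the FileCheckpointStore from #107, whose API is not shown here, and the on-disk layout (a JSON object mapping hash to timestamp in ms) is an assumption. It uses synchronous fs calls for brevity and writes only when the in-memory map changes; under its semantics a drop changes nothing, so there is nothing new to record.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical file-backed dedup store; not the real package API.
class FileDedupStore {
  private seen = new Map<string, number>(); // hash -> timestamp (ms)
  private readonly file: string;
  private readonly windowMs: number;

  constructor(file: string, windowMs: number, now: number = Date.now()) {
    this.file = file;
    this.windowMs = windowMs;
    this.load(now);
  }

  // Requirement 2: load persisted hashes, skipping entries older than the window.
  private load(now: number): void {
    try {
      const data = JSON.parse(fs.readFileSync(this.file, "utf8")) as Record<string, number>;
      for (const [hash, ts] of Object.entries(data)) {
        if (typeof ts === "number" && now - ts < this.windowMs) this.seen.set(hash, ts);
      }
    } catch {
      // Missing or unreadable file: start with empty state.
    }
  }

  // Requirement 3: write to a temp file in the same directory, then rename.
  private persist(): void {
    fs.mkdirSync(path.dirname(this.file), { recursive: true });
    const tmp = `${this.file}.${process.pid}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(Object.fromEntries(this.seen)));
    fs.renameSync(tmp, this.file);
  }

  // Requirement 1: decide pass/drop; persist when the state changes.
  shouldPass(hash: string, now: number = Date.now()): boolean {
    const last = this.seen.get(hash);
    if (last !== undefined && now - last < this.windowMs) return false;
    this.seen.set(hash, now);
    this.persist();
    return true;
  }

  // Requirement 6: drop expired entries; intended to run from a periodic timer.
  cleanup(now: number = Date.now()): void {
    let removed = false;
    for (const [hash, ts] of this.seen) {
      if (now - ts >= this.windowMs) {
        this.seen.delete(hash); // deleting during Map iteration is safe in JS
        removed = true;
      }
    }
    if (removed) this.persist();
  }
}
```

Keeping the temp file next to the target keeps the rename on one filesystem, where a POSIX rename atomically replaces the old file, so a crash mid-write leaves the previous checkpoint intact. The caller passes the full file path, leaving the exact naming under .orgloop/checkpoints/ to the real implementation.
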

Context

Related: #107 — extends connector checkpoint persistence to the dedup transform.
