01 Issue
Bug · Shipped · Swamp Club
Assignees: keeb

#491 Telemetry never retroactively credits a device's pre-association history

Opened by keeb · 5/30/2026 · Shipped 5/30/2026

Summary

The telemetry attribution pipeline records the binding between a per-device distinct_id and a username (in identity_map) but never applies that binding to history. All CLI usage emitted on a machine before it was associated with an account is permanently excluded from that account's profile and score. A user who uses swamp anonymously and creates an account later loses 100% of their pre-account activity — even though the intended design is for association to retroactively claim it.

Intended behavior

Each install has a stable per-device id (~/.config/swamp/identity.json) sent on every event. identity_map is a username -> {distinct_ids} map. Because the device id is stable across the anonymous -> authenticated transition, associating any machine with an account is meant to retroactively claim that machine's entire history into the account's score — and likewise for each additional machine (it is a 1:N map by design).
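The intended model can be sketched in a few lines. This is a minimal illustration, not the service's code: the in-memory dicts, the `associate` and `intended_score` helpers, the device ids, and the use of a raw event count as the "score" are all assumptions made for the example.

```python
# Hypothetical in-memory stand-ins for the stores named in the issue.

# Per-device event counts, keyed by the stable distinct_id from identity.json.
device_events = {"dev-laptop": 49, "dev-desktop": 12}

# identity_map: username -> set of distinct_ids (1:N by design).
identity_map = {}


def associate(username, distinct_id):
    """Record that a device belongs to an account."""
    identity_map.setdefault(username, set()).add(distinct_id)


def intended_score(username):
    """Sum the full history of every mapped device, including events
    emitted before the device was associated (score = event count here)."""
    return sum(device_events.get(d, 0) for d in identity_map.get(username, set()))


# All 49 laptop events predate the association, yet they count once it exists.
associate("alice", "dev-laptop")
score_one_device = intended_score("alice")

# A second machine is claimed the same way.
associate("alice", "dev-desktop")
score_two_devices = intended_score("alice")
```

Because the score is derived from the map at read time in this sketch, the order of "use the CLI" and "create an account" does not matter, which is the property the intended design relies on.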

Actual behavior

Association only ever affects events ingested after the binding exists. Pre-binding history is stranded forever.

Root cause (server-side: swamp-club telemetry service)

The username is resolved and frozen onto each event at ingest time, and the per-username aggregate that backs the score (username_metrics) is accumulated forward, one authenticated event at a time. identity_map is consulted only at ingest to label the incoming event — it is never read again to re-attribute already-stored events. Specifically:

  • The bind point (adding a distinct_id to a username's map) has no consumer that folds that device's existing history into the user's aggregate.
  • The serving-path identity_map -> user_metrics merge — the one place that aggregates across a user's devices — is only a fallback, and is permanently shadowed by the username_metrics doc that the binding event itself creates.
  • Reindex-by-username matches the frozen username column only; it does not expand a username to its mapped distinct_ids, so it re-reads the same already-attributed set and recovers nothing.
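The failure mode described above can be reproduced with a small model. Everything here is a hypothetical reconstruction for illustration: `StoredEvent`, `resolve`, `ingest`, and `reindex_by_username` are invented names, and the real service's storage and schema are not shown in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredEvent:
    distinct_id: str
    username: Optional[str]  # frozen at ingest, never rewritten afterwards


identity_map = {}      # username -> set of distinct_ids
events = []            # every stored event
username_metrics = {}  # per-username aggregate, accumulated forward only


def resolve(distinct_id):
    """Look up the owning username, if the device is bound."""
    for user, ids in identity_map.items():
        if distinct_id in ids:
            return user
    return None


def ingest(distinct_id, credential_user=None):
    if credential_user is not None:
        # Bind point: the mapping is recorded, but nothing folds in old history.
        identity_map.setdefault(credential_user, set()).add(distinct_id)
    user = resolve(distinct_id)
    events.append(StoredEvent(distinct_id, user))
    if user is not None:
        username_metrics[user] = username_metrics.get(user, 0) + 1


def reindex_by_username(user):
    """Matches the frozen username column only; ignores identity_map."""
    return sum(1 for e in events if e.username == user)


# Three anonymous invocations, then two authenticated ones on the same device.
for _ in range(3):
    ingest("dev-1")
ingest("dev-1", credential_user="alice")
ingest("dev-1", credential_user="alice")

total_for_device = sum(1 for e in events if e.distinct_id == "dev-1")
```

In this model the device has five stored events, but both `username_metrics["alice"]` and `reindex_by_username("alice")` see only the two ingested after the bind, mirroring the stranded history described in the bullets.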

The CLI is doing the right thing: it already sends the stable device id plus the credential on every authenticated invocation, so the server learns the binding on the user's next authenticated command. The defect is entirely server-side — the binding is recorded and never acted upon retroactively.

Evidence (production)

  • User mgreten: fully associated and active daily since 2026-05-17, but the score aggregate starts 2026-05-18 — the single pre-account day (2026-05-17, 49 cli_invocations) is missing. The device is in the map; the day is simply never folded in.
  • User gnordin: used the CLI anonymously for several days before creating an account, then associated. Profile shows 0 activity and 0 cli events despite that history existing under the device's per-distinct_id metrics. Reported the experience directly: "when I started using it I did not make an account — that came later. How do I connect my account to my swamp instance that I have used and plan to use more?"

Impact

Every user who evaluates swamp before signing up — the try-then-adopt path — sees an empty profile and score after associating, which silently understates real usage and breaks the "your account inherits your machine history" promise. The data is intact (per-device metrics exist); it is just never attributed.

Expected fix approach (high level)

Make the per-username aggregate a materialized projection of identity_map rather than a forward-only log: when a device is bound to a username, fold that device's existing history into the user's aggregate, and recompute the user's aggregate from its mapped devices on bind and on subsequent ingest. Add a one-time backfill that recomputes every existing mapping so already-associated devices are credited. No CLI change is required — the binding signal already arrives on the next authenticated command. As a secondary hardening, restrict device binding to currently-unclaimed device ids.
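One way to read this approach is sketched below. It is an assumption-laden illustration rather than the shipped fix: `device_metrics`, `owner_of`, `recompute`, `bind`, `ingest`, and `backfill` are hypothetical names, the stores are plain dicts, and the aggregate is a simple event count.

```python
device_metrics = {"dev-1": 49, "dev-2": 7}  # per-distinct_id counts, already intact
identity_map = {}                           # username -> set of distinct_ids
username_metrics = {}                       # projection derived from identity_map


def owner_of(distinct_id):
    """Return the username that has claimed this device, if any."""
    for user, ids in identity_map.items():
        if distinct_id in ids:
            return user
    return None


def recompute(user):
    """Rebuild the user's aggregate from all of its mapped devices."""
    total = sum(device_metrics.get(d, 0) for d in identity_map.get(user, set()))
    username_metrics[user] = total
    return total


def bind(user, distinct_id):
    """Bind a device, then fold its existing history into the aggregate."""
    owner = owner_of(distinct_id)
    if owner is not None and owner != user:
        return False  # secondary hardening: only unclaimed devices can be bound
    identity_map.setdefault(user, set()).add(distinct_id)
    recompute(user)
    return True


def ingest(distinct_id, count=1):
    """Record new events on a device and refresh its owner's projection."""
    device_metrics[distinct_id] = device_metrics.get(distinct_id, 0) + count
    owner = owner_of(distinct_id)
    if owner is not None:
        recompute(owner)


def backfill():
    """One-time pass: recompute every existing mapping."""
    for user in list(identity_map):
        recompute(user)


bound = bind("alice", "dev-1")     # claims the 49 pre-association events
rejected = bind("bob", "dev-1")    # already claimed by alice
ingest("dev-1")                    # later usage keeps the projection current
bind("alice", "dev-2")             # second machine, history included

# A mapping that predates the fix has no aggregate until the backfill runs.
identity_map["carol"] = {"dev-3"}
device_metrics["dev-3"] = 5
backfill()
```

Recomputing from the mapped devices makes `bind` idempotent in this sketch: binding the same device twice yields the same aggregate, which a forward-only counter would not guarantee.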

02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · SHIPPED · + 2 MORE · TRIAGE · + 8 MORE · REVIEW · + 3 MORE · PR_MERGED · + 1 MORE · NOTIFICATION_SKIPPED

Shipped

5/30/2026, 6:17:29 PM

03 Sludge Pulse
keeb assigned keeb · 5/30/2026, 5:29:57 PM
