Raxx · internal docs

internal · gated

Threat Model — Account Merge D1 (Who Picks Primary?)

Date: 2026-06-05T00:00:00Z
Author: security-agent
Status: Input for operator D1 decision — NOT a design document. Operator + detection-engineer outputs required before D1 is locked.
Scope: Decision D1 only, with cross-references to locked decisions D2 (soft-delete/tombstone) and D3 (14-day reversal window).
Architecture doc: docs/architecture/account-merge-2026-06-05.md (PR #3256, Epic #3245)
Detection-engineer memo: docs/security/threat-models/2026-06-05-account-merge-d1-detection.md (in parallel — path TBD)
Paired with: detection-engineer running behavioral-detection analysis concurrently. This memo owns boundary enforcement; detection-engineer owns signal shapes. Both are required inputs to the D1 decision.


Executive Summary

The three D1 candidate flows have meaningfully different attack surfaces. The per-flow attack-surface ranking (1 = smallest, 3 = largest) is:

Rank | Flow | Attack surface summary
1 (smallest) | CS-only | Primary picks are not customer-influenceable; attack surface reduces to CS social engineering only
2 | Hybrid | Adds a customer-swap window, but its pre-code-consumption lock is a strong invariant if implemented correctly
3 (largest) | Customer-choice during verification | Customer controls primary selection while simultaneously in possession of codes; social engineering + account compromise attacks compose

Recommendation on rejection: Customer-choice during verification (Flow 3) should be rejected for v1. The specific attack that justifies this is detailed in Section 5. The swap and verification steps happen in the same session window, enabling a single-inbox compromise to both verify and flip the primary in one authenticated pass.

The five most impactful mandatory invariants (regardless of D1 choice):

  1. Both codes must be consumed before primary is locked, and primary is locked atomically at the point the second code is consumed — no update-then-lock race.
  2. A CS user who initiates a merge must be prohibited from consuming verification codes on behalf of either customer.
  3. The customer whose email owns Account A must supply Account B's code (and vice versa) — session identity and code directionality are enforced together, not independently.
  4. Reversal requires a different CS user than the initiator (four-eyes principle, in-scope for D3 even though D3 is locked).
  5. Every state transition on account_merges and every per-table row re-key writes an audit event before the state change commits, not after.

D2 / D3 architectural flaws found: Two issues warrant reopening. See Section 9.


1. Flow Descriptions (as assessed)

Restating from the architecture doc for traceability:

Flow 1 — Hybrid: CS nominates primary. Customer may request one swap before either code is consumed. Swap locks after first code is verified.

Flow 2 — CS-only: CS nominates primary. Customer has no input.

Flow 3 — Customer-choice during verification: Customer picks primary during the cross-verification flow (the active code-entry window).

The architecture doc's proposed design is Flow 1 (Hybrid). The threat model treats all three as candidates.


2. Attack Trees by Scenario

2.1 One-Inbox Compromise — Attacker Controls Account B's Email, Not Account A's

The attacker's goal: cause the merge to complete with Account B as the primary so that B's account data, passkeys, and session surface survive the merge.

Flow 1 (Hybrid):

The swap window is the attack surface. The sequence:

  1. Attacker compromises Account B's email inbox.
  2. CS initiates merge. Merge-initiated email goes to both inboxes. Attacker intercepts B's email (contains code_B). Account A's legitimate user receives code_A.
  3. The swap affordance is triggered via POST /merges/{id}/swap-primary. The architecture doc §11 note 2 specifies this is a Console button (CS action), not customer-facing. This is the critical implementation question: who can call swap-primary?
     - If swap-primary is CS-triggered only: attacker controlling B's inbox cannot call it. Attack requires social engineering of CS (see 2.2).
     - If swap-primary is customer-callable (authenticated session on either account): attacker with a session on Account B can call POST /merges/{id}/swap-primary before any code is consumed, swapping themselves to primary. They then verify using code_B (which they intercepted) and wait for Account A's user to verify. Account B becomes primary.
  4. The architecture doc exposes POST /merges/{id}/swap-primary as a customer-facing route (Customer-facing, D1 swap, before any code consumed). This is the attack-enabling surface.

Risk in Flow 1: If /merges/{id}/swap-primary is callable by any authenticated session belonging to either account, a one-inbox compromise is sufficient to flip the primary. The attacker does not need to verify — they only need to call swap before any code is consumed. They can then verify normally with their intercepted code. This is an exploitable path in the current design spec.

Mitigation within Flow 1: Restrict swap-primary to CS-only (Console endpoint, not customer-facing). This collapses the customer-swap UI from a customer-callable API to a CS-initiated Console action. The open question in §11 note 2 of the architecture doc must be resolved as: Console button only.

Flow 2 (CS-only):

No customer-influenceable swap. Attacker controlling B's inbox can intercept code_B. They cannot change the primary designation (CS-only). However:

The DoS-on-merge vector (the attacker blocks the merge indefinitely by holding code_B and never verifying) exists in all three flows. It matters most in CS-only because it is the only leverage an Account B inbox attacker has.

Flow 3 (Customer-choice during verification):

Verification and primary selection happen in the same window. Attacker controlling Account B's email:

  1. Receives code_B in the compromised inbox.
  2. During verification, Account B's user picks Account B as primary.
  3. Account A's user must also verify, and per the design it is Account A's session that supplies code_B.

Per the sequence diagram, the primary user calls /verify supplying code_B (the secondary's code), and the secondary calls /verify supplying code_A. The intercepted code_B is therefore not what Account B's own session submits; that session needs code_A, which is delivered to Account A's inbox.

So the cross-verification scheme means a one-inbox compromise alone cannot complete a merge in any flow. The attacker needs to cause A's user to also verify, or they need to intercept code_A too.

Revised assessment of one-inbox compromise across all three flows: A pure one-inbox compromise cannot complete a merge without cooperation from the other account's holder. However:

Conclusion for 2.1: One-inbox compromise attack succeeds in Flow 1 only if swap-primary is customer-callable. Making it CS-only eliminates this vector. Flows 2 and 3 are not vulnerable to one-inbox-only primary flip (though Flow 3 has other issues per Section 5).


2.2 CS-Side Social Engineering — Attacker Convinces CS to Initiate or Flip Primary

The attacker's goal: convince a CS rep to (a) initiate a merge they should not, or (b) flip the primary designation.

Flow 1 (Hybrid):

Two social engineering surfaces:

  - Initiation: Attacker calls/emails CS claiming to be Account A's holder and asks for a merge with Account B. CS cannot verify identity at this point; the design relies on the subsequent cross-verification emails. If CS initiates, both Account A and Account B receive emails. If the attacker controls neither inbox, they cannot verify. If they control one inbox, see 2.1. The cross-verification email to both accounts is the defense here.
  - Primary flip: Attacker calls CS asking to flip the primary before codes are consumed. CS calls the Console swap-primary button. Attacker convinces CS that "Account B should be primary because that's where my real trading history is."

The primary-flip via CS social engineering does not require the attacker to have email access at all — it only requires convincing a CS rep. The mitigations:

Flow 2 (CS-only):

Every primary designation is a CS action. This means social engineering CS on primary selection is the only attack vector for primary manipulation — but it applies to every merge, not just a subset. The attack is identical to Flow 1's primary-flip path. The difference is that Flow 2 provides no customer-side defense (no customer swap, no customer-visible notification that they can trigger before the lock).

Flow 2 is not meaningfully more or less vulnerable to CS social engineering than Flow 1 on this specific vector. It simply removes the customer-swap affordance that Flow 1 provides.

Flow 3 (Customer-choice during verification):

CS initiates the merge but cannot enforce the primary. A social engineer who wants to flip the primary to their account needs only to authenticate as that account and pick their account as primary during verification — no CS manipulation required. CS social engineering as a vector is actually reduced in Flow 3 for primary selection, but this is because the attack vector moves to the customer layer.


2.3 Customer-Initiated Abuse — Bad Actor with Legitimate Access to One Account

The attacker's goal: absorb a victim's account into their own by triggering a merge.

This scenario requires CS as an intermediary — customers cannot initiate merges. The attacker must convince CS to initiate the merge in the first place. This is covered by 2.2. The cross-verification requirement means the victim receives an email when the merge is initiated and must supply a code from the other account's email. The victim can simply not verify, and the merge stalls.

The real risk here is a variant: the attacker IS a legitimate user of Account B, and they want to steal Account A's data by absorbing it into their merge. They:

  1. Call CS and say "I have two accounts, please merge them." CS initiates.
  2. Victim (real Account A holder) receives a merge-initiated email. If the victim has been phished or their email is compromised, they may verify without understanding. More realistically, a sufficiently sophisticated attacker could craft a pretext: "my partner is going to verify from their device."

Mitigations that must hold across all flows:


2.4 Race Conditions — Timing the Code Entry Window

The hybrid swap-then-verify race:

In Flow 1, the swap window closes after the first code is consumed. The race condition:

  1. Account A's user verifies (consumes code_B). primary_verified_at is set.
  2. Simultaneously, attacker calls swap-primary.

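If the lock check is "set primary_swap_requested = true if no code has been consumed yet," and Account A's verification and the swap call arrive at the same time, two outcomes are possible depending on transaction ordering: either the verify commits first and the swap correctly fails its check, or the swap's check reads the pre-verify state and its write lands after the code has been consumed, flipping the primary past the lock.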

The architecture doc does not explicitly describe the transactional relationship between swap-primary and verify. If these are not in the same serializable transaction, the race window exists.

Implementation requirement to close this race:

swap-primary and the first verify call must both lock the account_merges row with SELECT FOR UPDATE or equivalent. The swap must atomically check primary_verified_at IS NULL AND secondary_verified_at IS NULL and either succeed or fail with no gap. If using Postgres, this is a single UPDATE with a WHERE clause and a RETURNING check:

UPDATE account_merges
SET primary_user_id = $new_primary,
    secondary_user_id = $new_secondary,
    primary_swap_requested = true
WHERE id = $merge_id
  AND primary_verified_at IS NULL
  AND secondary_verified_at IS NULL
  AND status = 'initiated'
RETURNING id;

If the UPDATE affects 0 rows, the swap is rejected. This is atomic — no separate lock step needed. The first verify call uses the same row-level locking pattern.
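The compare-and-set behaviour described above can be exercised end to end. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for Postgres and a cut-down account_merges table. The helper names (make_db, swap_primary, mark_primary_verified) are hypothetical and not taken from the architecture doc, and cursor.rowcount is used where Postgres would use RETURNING.

```python
import sqlite3


def make_db():
    """Create a cut-down account_merges table with one initiated merge (id=1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE account_merges (
               id INTEGER PRIMARY KEY,
               primary_user_id INTEGER NOT NULL,
               secondary_user_id INTEGER NOT NULL,
               primary_swap_requested INTEGER NOT NULL DEFAULT 0,
               primary_verified_at TEXT,
               secondary_verified_at TEXT,
               status TEXT NOT NULL DEFAULT 'initiated'
           )"""
    )
    conn.execute(
        "INSERT INTO account_merges (id, primary_user_id, secondary_user_id) "
        "VALUES (1, 100, 200)"
    )
    conn.commit()
    return conn


def swap_primary(conn, merge_id):
    """Swap roles only if no code has been consumed; True if the swap applied."""
    cur = conn.execute(
        """UPDATE account_merges
           SET primary_user_id = secondary_user_id,
               secondary_user_id = primary_user_id,
               primary_swap_requested = 1
           WHERE id = ?
             AND primary_verified_at IS NULL
             AND secondary_verified_at IS NULL
             AND status = 'initiated'""",
        (merge_id,),
    )
    conn.commit()
    # Zero affected rows means a guard failed, so the swap is rejected.
    return cur.rowcount == 1


def mark_primary_verified(conn, merge_id):
    """Record the first code consumption using the same guarded-UPDATE pattern."""
    cur = conn.execute(
        """UPDATE account_merges
           SET primary_verified_at = datetime('now')
           WHERE id = ?
             AND status = 'initiated'
             AND primary_verified_at IS NULL""",
        (merge_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because every guard lives in the WHERE clause of the single statement that performs the write, there is no separate read step for a concurrent verify to slip behind.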

The resend-then-verify race:

When a CS operator resends a code (minting a new code and discarding the old hash), there is a window where:

  1. Customer submits old code (valid in transit, not yet discarded).
  2. CS resend fires, old hash is overwritten.
  3. Customer's old code fails verification despite being valid at send time.

On its own this is a UX failure mode rather than an attack. The resend should atomically swap the hash in a single DB transaction so that a code is either fully valid or fully replaced. It becomes an attack lever if an attacker can trigger a resend (by convincing CS) while a legitimate code is in flight: the forced verification failure extends the merge window and potentially buys time for a social engineering attempt.

Mitigation: Resend should only be callable by CS if the code has not yet been consumed (verified_at IS NULL). Resend on an already-verified code is a no-op with 409.

The in_progress state race (D2/D3 cross-concern):

Once status transitions to in_progress, the merge engine begins row re-keying. The 90-day soft-delete window (D2) starts from this point (or from merge_completed_at). If an attacker can somehow pause the merge at in_progress (through a crafted transaction failure), the secondary account's deleted_at is not set yet. If the attacker then triggers a D3 reversal from this inconsistent state, the merged data is partially migrated.

The architecture doc specifies that mid-flight failures roll back fully (Section 6.3) and status goes to failed. This should be verified in implementation: the transaction must either complete or fully roll back. There is no valid in_progress terminal state — if the transaction fails, the status must transition to failed, not remain in_progress.


2.5 Insider Threat from CS — Rogue CS Rep

The attacker is a CS rep who initiates merges to harvest data associations or facilitate credential stuffing.

What a rogue CS rep can do:

What a rogue CS rep cannot do (design must enforce):

Detection hook for insider threat: Every POST /internal/merges (initiation) must log the initiating CS operator's identity. If a single CS account initiates >N merges per day, or initiates merges where neither account matches an open FreeScout ticket, this is a detection signal. The detection-engineer's parallel output should include this signal shape.
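As a rough illustration of the two signals named above, the sketch below flags operators from a list of merge.initiated events. It is an assumption-laden example, not the detection-engineer's signal shape: the function name flag_initiations is hypothetical, the field names follow Section 4.3, and "no matching open FreeScout ticket" is simplified to "no freescout_ticket_id on the event".

```python
from collections import Counter


def flag_initiations(events, max_per_day):
    """Return operators exceeding max_per_day initiations, and operators
    with at least one initiation that carries no ticket reference.

    events: iterable of dicts with keys cs_operator_id_hash, timestamp
    (ISO 8601 string) and optionally freescout_ticket_id.
    """
    events = list(events)
    # Bucket by (operator, calendar day); the first 10 chars of an
    # ISO 8601 timestamp are the YYYY-MM-DD date.
    per_day = Counter(
        (e["cs_operator_id_hash"], e["timestamp"][:10]) for e in events
    )
    high_volume = sorted(
        {op for (op, _day), n in per_day.items() if n > max_per_day}
    )
    no_ticket = sorted(
        {e["cs_operator_id_hash"] for e in events
         if not e.get("freescout_ticket_id")}
    )
    return {"high_volume": high_volume, "no_ticket": no_ticket}
```

The threshold N is left as a parameter because the memo does not fix a value for it.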

Four-eyes principle for reversal (D3 interaction):

The architecture doc specifies reversal is "CS-only" but does not specify that the reversing CS user must differ from the initiating CS user. A rogue CS rep who initiates a fraudulent merge should not also be able to reverse it (to cover their tracks) or to complete-then-reverse (to harvest data associations between the two accounts without leaving a lasting merge artifact).

This is a gap in the locked D3 design. See Section 9.


2.6 Reversal Abuse — Attacker Reverses a Legitimate Merge

D3 is locked at 14-day, CS-only reversal. The attacker's goal here is to abuse the reversal to their advantage.

Reversal as data exfiltration:

A bad actor who caused a merge to complete can request a reversal within 14 days via CS social engineering. Reversal re-forks data to the secondary user_id. This restores both accounts but does NOT un-do the fact that the attacker observed both accounts' data during the in_progress → completed → reversed sequence. The reversal does not scrub the attacker's knowledge of the merged data set.

Reversal as disruption:

A legitimate merge is completed by two genuine users. An attacker who later compromises one email inbox calls CS and claims the merge was unauthorized. CS initiates a reversal. The real user's merged-account state is disrupted.

Mitigation: Reversal should require the same cross-verification step as initiation — both email addresses must confirm the reversal before it executes. The current D3 design requires only CS action with no customer confirmation for reversal. This is a gap. See Section 9.

D3 / D2 interaction — reversal during soft-delete window:

D2 specifies soft-delete for 90 days, tombstone after. D3 specifies 14-day reversal. These are compatible: within 14 days, deleted_at is set but tombstoned_at is not, and reversal can clear deleted_at. After 14 days, reversal is impossible (409 from the endpoint). After 90 days, tombstone fires. These windows are correct and do not conflict.
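The window arithmetic above can be written down as a small classifier. This is a sketch under stated assumptions: both windows are measured from merge_completed_at, the 14-day boundary itself is treated as already closed, and the function name merge_phase and its return labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

REVERSAL_WINDOW = timedelta(days=14)   # D3
TOMBSTONE_AFTER = timedelta(days=90)   # D2


def merge_phase(merge_completed_at, now):
    """Classify where a completed merge sits relative to the D2/D3 windows."""
    elapsed = now - merge_completed_at
    if elapsed < REVERSAL_WINDOW:
        # deleted_at is set, tombstoned_at is not; reversal may clear deleted_at.
        return "reversible"
    if elapsed < TOMBSTONE_AFTER:
        # Reversal endpoint answers 409; the row is still soft-deleted only.
        return "soft_deleted_not_reversible"
    # PII nulling is due (or has already happened).
    return "tombstone_due"


completed = datetime(2026, 6, 5, tzinfo=timezone.utc)
print(merge_phase(completed, completed + timedelta(days=13)))
print(merge_phase(completed, completed + timedelta(days=30)))
print(merge_phase(completed, completed + timedelta(days=90)))
```

Since 14 days is strictly inside 90 days, the reversible phase always ends before tombstoning can fire, which is the compatibility property claimed above.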

However: the architecture doc says reversal "re-forks all MERGE-classified rows back to the secondary user_id." It does NOT address ASK_USER rows where the customer chose to DISCARD data from the secondary ("It does not rebuild data that was classified ASK_USER if the customer chose to discard it"). An attacker who triggered a merge (social engineering) and caused a customer to discard their own data via the ASK_USER flow, then triggers a reversal, leaves the customer with irreversible data loss. The merge reversal is not a full restore. This is a customer-trust issue, not a security vulnerability per se, but it means reversal is not a true "undo" and customers should be told so explicitly at reversal time.


2.7 Tombstone Abuse — Re-Creating an Account at the Secondary Email Post-Tombstone

D2: secondary account is soft-deleted for 90 days, then tombstoned. The user_redirects table persists forever.

The attack:

  1. Merge completes. Secondary account is soft-deleted (deleted_at set, deleted_reason = 'merged').
  2. After 90 days, tombstone fires. PII columns are nulled. user_id row persists for FK integrity.
  3. An attacker (or the original secondary account holder, or a new user who happens to use the same email address) attempts to register a new Raxx account with the secondary account's email address.
  4. The email address is no longer visible in the users table (it was nulled at tombstone time).
  5. If email uniqueness is checked against the live email column only, a new account could be created at the same email address. This new account would NOT be caught by the user_redirects middleware (which matches on from_user_id, not on email).

Conditions for this to be a meaningful attack:

Risk: Low for a bad actor (they cannot re-claim the merged account's data, which lives on the primary now). Higher as a data-integrity issue: a legitimate user whose account was merged may try to re-register with their old email address after 90+ days if they have forgotten about the merge. They would create a new clean account and lose all history.

Mitigation: The tombstone process should NOT null the email column (or should maintain a separate tombstoned_emails table) to allow email uniqueness enforcement to continue working. Alternatively, tombstone should set a flag (tombstoned = true) and the registration flow must check email = $email AND tombstoned = false rather than just email = $email AND deleted_at IS NULL.

This is an architectural gap in the locked D2 design. See Section 9.


3. Invariant Analysis

3.1 "Both codes are required for any merge to complete"

How it holds: The state machine requires both primary_verified_at IS NOT NULL AND secondary_verified_at IS NOT NULL before transitioning from initiated to verified. The merge engine only runs from verified status.

How an implementation mistake breaks it:

Verification point: The merge-engine dispatch query should be:

SELECT * FROM account_merges
WHERE id = $id
  AND status = 'verified'
  AND primary_verified_at IS NOT NULL
  AND secondary_verified_at IS NOT NULL;

If the merge engine uses any looser check, it is exploitable.

3.2 "Once a code is consumed, the primary is locked" (Hybrid invariant)

How it holds: primary_swap_requested is the swap flag. The swap endpoint checks primary_verified_at IS NULL AND secondary_verified_at IS NULL. Once either is non-null, swap fails.

How an implementation mistake breaks it:

Required invariant: The swap endpoint must use the pattern:

UPDATE account_merges
SET primary_user_id = $new_primary_id,
    secondary_user_id = $new_secondary_id,
    primary_swap_requested = TRUE
WHERE id = $merge_id
  AND status = 'initiated'
  AND primary_verified_at IS NULL
  AND secondary_verified_at IS NULL
RETURNING id

If RETURNING yields no row, the swap is rejected.

3.3 "Codes are argon2-hashed at rest"

The architecture doc specifies argon2. This must be verified in the migration and in the code path.

Verification points:

  - primary_code_hash and secondary_code_hash columns store argon2 hashes.
  - At no point in the flow is a plaintext code written to account_merges, to the audit log, to collision_data, or to any log output.
  - The verify endpoint uses argon2 verify (not comparison of a re-hashed value) to resist timing attacks.
  - Resend must generate a new random code, argon2-hash it, and overwrite the old hash column; it must never log or store the plaintext.

One implementation subtlety: argon2 has a configurable work factor. The code generator and verifier must use the same work factor and salt length as the passkey backup code flow (if one exists) or define its own explicit parameters. A weak argon2 configuration (e.g., low iteration count, small memory cost) reduces the brute-force resistance of the 8-character code space. An 8-character alphanumeric code has ~2.8 trillion combinations — brute-force is infeasible if argon2 parameters are appropriately set (recommended: argon2id, t=2, m=65536, p=2).
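The code-space figure can be checked with a few lines of arithmetic. The sketch below assumes the "alphanumeric" alphabet is 36 symbols (case-insensitive letters plus digits), which is what the ~2.8 trillion figure implies. The guesses-per-second argument is a hypothetical attacker rate and not a measured argon2 benchmark.

```python
ALPHABET_SIZE = 36   # assumption: a-z plus 0-9, case-insensitive
CODE_LENGTH = 8

# Total number of distinct codes: 36^8.
code_space = ALPHABET_SIZE ** CODE_LENGTH

SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_offline_years(guesses_per_second):
    """Expected years to hit one code offline, trying half the space on average."""
    expected_guesses = code_space / 2
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


print(f"{code_space:,}")
```

The slower each argon2 evaluation is made by the t and m parameters, the lower the attacker's achievable guesses per second, and the expected time scales inversely with that rate.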

3.4 "Reversal requires a different CS user than the initiator" (four-eyes principle)

Current design status: NOT specified. The architecture doc says "CS-only, audit-logged" for D3 but does not require the reversing CS user to differ from the initiating CS user.

Why this matters: An insider threat who initiates a fraudulent merge and then reverses it covers the primary observable trace (the completed merge). The reversal audit log still exists, but if the same operator initiated and reversed, it looks like a self-correcting error. Four-eyes would require a second CS user to approve the reversal.

This is a gap in the locked D3 design. Recommendation: require reversal_initiated_by != initiated_by_cs at the API layer. If only one CS user exists (current v1 state), reversal requires operator-level approval (a second console session with customers:merge:write). This is documented as an open D3 issue in Section 9.
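A minimal sketch of the recommended API-layer check follows. The function name reversal_allowed and the approved_by parameter are hypothetical, and the fallback models "operator-level approval" as a second, distinct identity, which is an assumption about how the v1 single-CS-user case would be handled.

```python
def reversal_allowed(initiated_by_cs, reversal_initiated_by, approved_by=None):
    """Four-eyes rule: the reverser must differ from the initiator, or a
    distinct approver must sign off when the same operator does both."""
    if reversal_initiated_by != initiated_by_cs:
        return True
    # Same operator initiated and is reversing: require a second identity.
    return approved_by is not None and approved_by != initiated_by_cs
```

Keeping the rule in one pure function makes it straightforward to pin with unit tests independently of the endpoint that calls it.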

3.5 "CS cannot verify on behalf of the customer"

Current design analysis: The customer-facing /merges/{id}/verify route requires the caller's session to belong to either primary_user_id or secondary_user_id. CS Console routes proxy to internal Raptor endpoints, not to the customer-facing routes.

The gap: The architecture doc does not explicitly prohibit a CS internal endpoint that transitions status directly from initiated to verified, bypassing the cross-verification requirement. As long as no such endpoint exists, the invariant holds by design omission. But the design must explicitly state: no internal endpoint may advance account_merges.status past initiated without both primary_verified_at and secondary_verified_at being set by the customer-facing verify route.

If POST /internal/merges/{id}/cancel is the only internal mutation on initiated-state merges (besides initiation itself), this is satisfied. But if a future "force complete" capability is added for stuck merges, it would break this invariant.


4. RBAC and Audit Posture

4.1 Minimum-Privilege Role Shape

The architecture doc specifies customers:merge:write for all mutating operations. This is a single permission bit that covers initiation, cancellation, and reversal. For minimum privilege and audit clarity, these should be separate permission bits:

Operation | Recommended permission | Rationale
Initiate a merge | customers:merge:initiate | Lower sensitivity: triggers customer email; cannot complete without customer action
Cancel a merge | customers:merge:cancel | Similar sensitivity to initiate; CS fixes their own mistake
Reverse a completed merge | customers:merge:reverse | Higher sensitivity: modifies completed data. Should be a separate bit with a higher role requirement
View merge records | customers:merge:read | Already separate in architecture doc
Swap primary (if CS-triggered) | customers:merge:swap-primary | If swap is CS-only (as recommended), needs its own bit for auditability

The minimum separation that matters for security: customers:merge:reverse must require a different, higher privilege bit than customers:merge:initiate. A support agent who can initiate merges must not automatically be able to reverse them.

4.2 Can the Initiating CS User Enter Codes on Behalf of Customers?

The architecture doc establishes that /merges/{id}/verify requires the caller's session to belong to primary_user_id or secondary_user_id. CS sessions belong to console_admins, not users. Therefore, a CS session cannot call the customer-facing verify endpoint at all — the session ownership check structurally prevents it.

This invariant holds as designed, but must be explicitly tested. If the session middleware resolves primary_user_id lookup against both the users table and the console_admins table, a CS user could masquerade. The middleware must check against users only.

Required: an explicit integration test that asserts a CS console session token receives 403 when calling POST /merges/{id}/verify. This test must be in the merge feature's test suite and must not be removed.
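The session-ownership rule that this test must pin down can be sketched as follows. This is an illustration only: the Session type, its principal_type field, and the function verify_status_code are hypothetical stand-ins for whatever the real middleware exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    principal_type: str   # "user" or "console_admin" (hypothetical labels)
    principal_id: int


def verify_status_code(session, primary_user_id, secondary_user_id):
    """HTTP status the verify route would return on the ownership check alone."""
    # Console sessions are rejected before any id comparison, so an id that
    # happens to collide across users and console_admins cannot match.
    if session.principal_type != "user":
        return 403
    if session.principal_id not in (primary_user_id, secondary_user_id):
        return 403
    return 200
```

Checking the principal type first is the point of the example: comparing ids alone would let a console_admins row with the same numeric id pass as the customer.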

4.3 Audit-Log Granularity Required

For a security review of any single merge to be completable in under 5 minutes, the audit trail must answer:

Question | Required audit event | Event fields
Who initiated? | merge.initiated | merge_id, cs_operator_id_hash, primary_user_id, secondary_user_id, freescout_ticket_id, timestamp
Was a swap requested? By whom? When? | merge.primary_swapped | merge_id, actor_type (cs or customer), actor_id, old_primary_id, new_primary_id, timestamp
Who verified first? | merge.primary_verified | merge_id, verifying_session_user_id, timestamp
Who verified second? | merge.secondary_verified | Same as merge.primary_verified
Did any resend happen? | merge.code_resent | merge_id, resending_cs_operator_id_hash, account_side (primary/secondary), resend_count_after, timestamp
When did merge execute? | merge.engine_started, merge.engine_completed | merge_id, tables_touched, rows_rekeyed_count, timestamp
Was there a reversal? Who initiated it? Who approved (if four-eyes)? | merge.reversal_initiated, merge.reversal_approved, merge.reversal_completed | merge_id, actor_id, approver_id, timestamp
Did any verification attempt fail? | merge.verification_failed | merge_id, attempt_number, session_user_id, failure_reason, timestamp
Was a cancellation triggered? | merge.cancelled | merge_id, cancelled_by_actor_type, cancelled_by_id, reason, timestamp

Events merge.initiated through merge.verification_failed are the signals the detection-engineer's parallel output will monitor. These events must exist in customer_audit_events (not just in account_merges row fields) so the audit trail is immutable and queryable via the audit log UI.

Critical ordering rule: Audit events must be written BEFORE the state transition commits. If the state transition succeeds and the audit write fails, the state change has no audit trail. Use a single database transaction that includes both the state update and the audit insert. If the audit insert fails, the transaction rolls back and the state change is rejected with a 500.
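The ordering rule can be demonstrated with a single transaction that carries both writes. The sketch below assumes an in-memory SQLite database in place of the production store and a reduced schema. The helper names make_audit_db and transition_with_audit are hypothetical, and a False return stands in for the 500 response.

```python
import sqlite3


def make_audit_db():
    """Reduced schema: one merge in 'initiated' plus an empty audit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account_merges (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE customer_audit_events ("
        "id INTEGER PRIMARY KEY, merge_id INTEGER NOT NULL, event_type TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO account_merges (id, status) VALUES (1, 'initiated')")
    conn.commit()
    return conn


def transition_with_audit(conn, merge_id, from_status, to_status, event_type):
    """Write the audit event and the state change atomically; False on rollback."""
    try:
        with conn:  # commits on success, rolls back if the body raises
            conn.execute(
                "INSERT INTO customer_audit_events (merge_id, event_type) VALUES (?, ?)",
                (merge_id, event_type),
            )
            cur = conn.execute(
                "UPDATE account_merges SET status = ? WHERE id = ? AND status = ?",
                (to_status, merge_id, from_status),
            )
            if cur.rowcount != 1:
                # Guard failed: abort so the audit row is rolled back as well.
                raise RuntimeError("merge not in expected state")
    except (sqlite3.Error, RuntimeError):
        return False
    return True
```

Either both rows change or neither does, so a failed audit insert can never leave an unaudited state change behind, and a rejected transition leaves no stray audit event.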


5. Per-Flow Security Posture Comparison

Attack Surface Rankings

Flow 1 — Hybrid (proposed design)

Attack surfaces:

  - POST /merges/{id}/swap-primary customer-callable path (if not restricted to CS-only)
  - Swap-verify race window (between swap call and first code consumption)
  - CS social engineering on swap request
  - Code resend timing (extending the attack window up to 5×24h)

Residual attack surface with recommended mitigations applied (swap restricted to CS-only, atomic lock, no CS verify-on-behalf): Reduces to CS social engineering only, same as Flow 2.

Without mitigations (as currently specified in the architecture doc): swap-primary is customer-callable, creating a one-inbox-compromise primary-flip path.

Flow 2 — CS-only

Attack surfaces:

  - CS social engineering on initiation and primary designation (the only route to primary manipulation)
  - DoS-on-merge via code non-response (inbox compromise blocks merge indefinitely)

No customer-influenceable primary flip. Smallest inherent attack surface. CS social engineering is the primary vector for all flows.

Flow 3 — Customer-choice during verification

Attack surfaces:

  - Customer-callable primary selection during the active code-entry window
  - Single-inbox compromise plus customer cooperation: an attacker who controls only one inbox can still flip the primary if they social-engineer Account A's user into cooperating
  - Primary selection and code consumption happen in the same authenticated session window, so a session compromise on either account during verification gives an attacker both verification power and primary-selection power simultaneously

The specific attack that justifies rejecting Flow 3:

In Flow 3, the verification session is the same session window in which the customer selects the primary. An attacker who has achieved session hijacking of Account B (not just email compromise — session token theft, e.g., via XSS or stolen device) during the active verification window can:

  1. Call POST /merges/{id}/verify as Account B to consume code_B (from B's email, which the attacker has access to from the session).
  2. In the same session, designate Account B as primary.
  3. Wait for Account A's user to verify (who is acting in good faith).
  4. Merge completes with B as primary.

The attack does not require persistent session compromise — only during the 24-hour verification window. In Flow 1, a session compromise during this window does NOT allow primary flip if swap-primary is CS-only. In Flow 3, it does.

Flow 3 should be rejected. The coupling of verification power and primary-selection power in a single customer session window creates a broader exploitable window than either alternative. This is not compensatable within the Flow 3 design without fundamentally separating the primary-selection step from the verification step, which would make it equivalent to Hybrid.

Per-Flow Ranking

Rank | Flow | Basis
1 (smallest) | CS-only | Attack surface limited to CS social engineering; no customer-influenceable primary designation
2 | Hybrid (with mitigations applied) | Equivalent to CS-only after swap is restricted to CS-only and atomic locking enforced; the swap step adds one additional CS social-engineering surface but no direct customer attack surface
3 (largest) | Customer-choice during verification | Session-compromise-during-window enables primary flip without separate social engineering; verification power and selection power co-reside in one session

6. Mandatory Security Boundaries (Regardless of D1 Choice)

M1 — Both codes required; no CS bypass to verified state

No internal endpoint may set status = 'verified' or transition to in_progress without both primary_verified_at and secondary_verified_at being set by the customer-facing verify route.

M2 — CS cannot call the customer-facing verify endpoint

Session middleware for /merges/{id}/verify must check user_id IN (primary_user_id, secondary_user_id) against the users table only, not console_admins. Enforced by explicit integration test.

M3 — Swap is CS-only (applicable to Flow 1 if chosen)

POST /merges/{id}/swap-primary must be an internal Console endpoint, not a customer-facing route. The architecture doc currently exposes it as customer-facing. This must be corrected before implementation.

If the operator chooses Flow 3 (not recommended), this invariant is vacuously satisfied (there is no swap endpoint). But Flow 3 should be rejected for the reasons in Section 5.

M4 — Atomic swap with row-level lock

Swap and verify both use UPDATE ... WHERE ... RETURNING with all guard conditions in the WHERE clause. No TOCTOU gap.

M5 — Audit events written in the same transaction as state changes

Audit inserts are not fire-and-forget. If the audit insert fails, the state change rolls back. No state change exists without an audit trail.

M6 — Argon2id with explicit parameters

Codes stored as argon2id with t≥2, m≥65536 (KiB, i.e. 64 MiB), p≥2. Implementation must specify these parameters explicitly, not use library defaults. Plaintext codes exist only in memory at the moment of generation; they must never appear in logs, audit events, or database columns.

M7 — No-merge email carries a clear cancel CTA

The merge-initiated email to both accounts must include a one-click "I did not request this — cancel this merge" link. This is a signed time-limited token (not an auth credential) that transitions the merge to cancelled. The token expires when the verification codes expire (24h). This is a required customer-trust element, not optional.

M8 — Code verification direction is enforced at the session layer

When Account A is the primary and its session calls /verify, the submitted code is verified against secondary_code_hash (Account B's code), not primary_code_hash. The session-to-hash routing must be explicit:

session.user_id == primary_user_id → verify submitted code against secondary_code_hash
session.user_id == secondary_user_id → verify submitted code against primary_code_hash

If this routing is reversed or ambiguous, the cross-verification model breaks.
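The routing rule can be written as a single function with no default branch, so that a session belonging to neither party fails loudly instead of falling through to one of the hashes. The `Merge` dataclass and the function name are hypothetical; only the four field names come from this memo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Merge:
    primary_user_id: int
    secondary_user_id: int
    primary_code_hash: str
    secondary_code_hash: str


def hash_to_check(merge, session_user_id):
    """Return the code hash this session's submission is verified against."""
    if session_user_id == merge.primary_user_id:
        # The primary's session proves possession of the secondary's code.
        return merge.secondary_code_hash
    if session_user_id == merge.secondary_user_id:
        # The secondary's session proves possession of the primary's code.
        return merge.primary_code_hash
    raise PermissionError("session user is not a party to this merge")
```

Keeping the mapping in one place also gives the integration test for M2 a single function to exercise with a Console admin identity.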


7. Customer-Trust UX Requirements

7.1 Cancel Path for "I Did Not Initiate This"

Status: Not explicitly in scope for v1 per the architecture doc. Must be in scope.

The merge-initiated email goes to both accounts. The customer must have a path to cancel without needing to call CS. CS-only cancellation for this scenario creates a social-engineering surface (attacker calls CS to block a legitimate cancellation attempt).

Required: A signed cancel token in the merge-initiated email (per M7 above) that transitions the merge to cancelled when either account holder clicks it. Both parties can cancel; the first cancellation wins.

The cancel token must not be an auth credential. It is a single-use HMAC-signed URL token that references the merge record. It does not log the user into their account or grant any session. It only cancels the merge.
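A minimal sketch of such a token, assuming HMAC-SHA256 over the merge ID and an expiry timestamp. The function names, the payload layout, and the key handling are assumptions for illustration. Single-use behavior is not a property of the token itself; it would come from the merge state transition (a second click finds the merge already cancelled).

```python
import base64
import hashlib
import hmac
import time


def make_cancel_token(key, merge_id, expires_at):
    """Build a URL-safe token binding a merge ID to an expiry time."""
    payload = f"{merge_id}.{expires_at}".encode()
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    encode = base64.urlsafe_b64encode
    return encode(payload).decode() + "." + encode(sig).decode()


def verify_cancel_token(key, token, now=None):
    """Return the merge ID if the token is authentic and unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or invalid base64
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    merge_id, _, expires = payload.decode().rpartition(".")
    if not expires.isdigit() or now >= int(expires):
        return None
    return merge_id
```

The token carries no user identity and no session material, which is what keeps it from acting as an auth credential: the only thing a valid token can name is a merge to cancel.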

7.2 24-Hour Cooling-Off Window

Recommendation: No. Adding a 24-hour delay between initiation and first code acceptance would extend the attack window (more time for an attacker to intercept emails, more time for social engineering) rather than reduce it. A cooling-off period benefits customers only if they are expected to receive merge-initiated emails and not act on them promptly. For account merge, the right model is: act promptly or cancel. If neither happens within 12 hours, the codes remain valid until their 24-hour expiry and a reminder email fires at the 12-hour mark.

A 12-hour reminder email ("Your merge is still pending — verify or cancel here") with a cancel CTA achieves the customer-trust goal without extending attack windows.

7.3 Customer Revocation of Merge-In-Progress (Pre-Completion)

As noted in 7.1: either account holder can cancel before any code is consumed (per the architecture doc: POST /merges/{id}/cancel available while status is initiated). After the first code is consumed, the architecture doc does not provide a cancel path — only D3 reversal post-completion.

Gap: Once one party has verified but the other has not, neither party can cancel. The merge is in a partial-verification limbo. If the non-verified party has changed their mind, they can simply not verify (the merge will expire after 24h). This is acceptable for v1, but should be documented: "Not verifying by code expiry time is equivalent to cancellation."

However: if an attacker has verified as one party and is waiting for the other party to verify (which will complete the merge), the other party currently has no active cancel path — only "don't verify." The merge-initiated email's cancel CTA (M7) must remain active and functional even after one party has verified, allowing the non-verified party to explicitly cancel. This should be tested.
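The requirement that the cancel CTA stays live during partial verification can be captured as a small predicate that a test can exercise. This sketch assumes the merge remains in status `initiated` until both codes are consumed; the function name is hypothetical.

```python
def cancel_cta_active(status, primary_verified_at, secondary_verified_at):
    """True while the merge is initiated and at least one side is unverified."""
    if status != "initiated":
        return False
    return primary_verified_at is None or secondary_verified_at is None
```

The case that matters for the attack described above is the second one a test should cover: one timestamp set, the other still empty, and the CTA still accepted.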


8. Required Audit Event Additions for Detection-Engineer Coordination

The detection-engineer's parallel work will define the monitoring payload shapes. This memo establishes that the following state transitions MUST exist as discrete events for them to hook:

| Event name | Trigger | Minimum payload |
|---|---|---|
| merge.initiated | POST /internal/merges succeeds | merge_id, cs_operator_id_hash, primary_user_id, secondary_user_id, ticket_id, initiated_at |
| merge.code_sent | Postmark send call for each code | merge_id, account_side, postmark_message_id, sent_at |
| merge.verification_attempt | Any call to POST /merges/{id}/verify | merge_id, verifying_user_id, account_side, succeeded (bool), attempt_number, timestamp |
| merge.primary_swapped | Swap endpoint succeeds | merge_id, actor_type, actor_id, old_primary_id, new_primary_id, timestamp |
| merge.code_resent | Resend endpoint | merge_id, account_side, resend_count_after, cs_operator_id_hash, timestamp |
| merge.both_verified | Second verification completes | merge_id, primary_verified_at, secondary_verified_at, timestamp |
| merge.engine_started | Merge engine begins transaction | merge_id, timestamp |
| merge.row_rekeyed | Each table operation in merge engine | merge_id, table_name, row_count, policy, timestamp |
| merge.engine_completed | Merge transaction commits | merge_id, tables_touched_count, rows_rekeyed_total, timestamp |
| merge.engine_failed | Merge transaction rolls back | merge_id, error_detail, timestamp |
| merge.cancelled | Cancel endpoint succeeds | merge_id, cancelled_by_type (cs/customer/system), cancelled_by_id, cancel_reason, timestamp |
| merge.reversal_initiated | Reversal endpoint called | merge_id, reversing_cs_operator_id_hash, initiated_by_cs (original), is_four_eyes_satisfied (bool), timestamp |
| merge.reversal_completed | Reversal commits | merge_id, rows_restored_count, timestamp |

Coordination note: The detection-engineer should define the monitoring rules (e.g., "alert if merge.verification_attempt with succeeded=false occurs >5 times on a single merge") against these event shapes. The merge engine must emit all of the above; the detection rules are the detection-engineer's deliverable.
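To keep the emitting side honest about the minimum payloads, the event list could be encoded as data and checked at emit time or in tests. The sketch below transcribes the field names from this section; the `REQUIRED_FIELDS` mapping and `missing_fields` helper are hypothetical names, and the check covers presence only, not types or values.

```python
REQUIRED_FIELDS = {
    "merge.initiated": {"merge_id", "cs_operator_id_hash", "primary_user_id",
                        "secondary_user_id", "ticket_id", "initiated_at"},
    "merge.code_sent": {"merge_id", "account_side", "postmark_message_id",
                        "sent_at"},
    "merge.verification_attempt": {"merge_id", "verifying_user_id",
                                   "account_side", "succeeded",
                                   "attempt_number", "timestamp"},
    "merge.primary_swapped": {"merge_id", "actor_type", "actor_id",
                              "old_primary_id", "new_primary_id", "timestamp"},
    "merge.code_resent": {"merge_id", "account_side", "resend_count_after",
                          "cs_operator_id_hash", "timestamp"},
    "merge.both_verified": {"merge_id", "primary_verified_at",
                            "secondary_verified_at", "timestamp"},
    "merge.engine_started": {"merge_id", "timestamp"},
    "merge.row_rekeyed": {"merge_id", "table_name", "row_count", "policy",
                          "timestamp"},
    "merge.engine_completed": {"merge_id", "tables_touched_count",
                               "rows_rekeyed_total", "timestamp"},
    "merge.engine_failed": {"merge_id", "error_detail", "timestamp"},
    "merge.cancelled": {"merge_id", "cancelled_by_type", "cancelled_by_id",
                        "cancel_reason", "timestamp"},
    "merge.reversal_initiated": {"merge_id", "reversing_cs_operator_id_hash",
                                 "initiated_by_cs", "is_four_eyes_satisfied",
                                 "timestamp"},
    "merge.reversal_completed": {"merge_id", "rows_restored_count",
                                 "timestamp"},
}


def missing_fields(event_name, payload):
    """Return the sorted list of required fields absent from the payload."""
    if event_name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown event: {event_name}")
    return sorted(REQUIRED_FIELDS[event_name] - set(payload))
```

An emitter could refuse to write any event for which `missing_fields` returns a non-empty list, which pairs naturally with M5: a malformed audit event then rolls back the state change it describes.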


9. Architectural Issues Found in Locked Decisions D2 and D3

These issues were found in the course of this threat model. They interact with D1 but are not D1-specific. Both decisions warrant reopening.

Issue D2-1 — Tombstone nulls email column; new-account-at-same-email registration gap

Severity: MEDIUM
Decision affected: D2 (locked)

As described in Section 2.7: after 90 days, the tombstone job nulls PII columns including email. A new registration attempt with the same email address after tombstone may succeed, creating a new clean account at an address that was previously merged. This is a data-integrity issue and a potential customer-confusion issue.

Recommendation: Tombstone should maintain a separate tombstoned_emails table (or retain the email column in hashed form) to allow email uniqueness enforcement to continue working post-tombstone. The registration flow should check tombstoned_emails and return an appropriate error ("this email was previously associated with a merged account — if you believe this is an error, contact support").
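A sketch of the hashed-form option, assuming a keyed hash (HMAC-SHA256 with a server-side secret) over a normalized address so that the stored value is not a bare, guessable digest of the email. The function names, the normalization rule, and the use of an in-memory set as a stand-in for the tombstoned_emails table are all assumptions.

```python
import hashlib
import hmac


def email_fingerprint(pepper, email):
    """Keyed fingerprint of a normalized email address (hex string)."""
    normalized = email.strip().lower()
    return hmac.new(pepper, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def registration_allowed(pepper, email, tombstoned_fingerprints):
    """False if the address matches a fingerprint kept at tombstone time."""
    return email_fingerprint(pepper, email) not in tombstoned_fingerprints
```

The tombstone job would store the fingerprint before nulling the email column, and the registration flow would consult it before creating the account. Whether a keyed fingerprint is compatible with D2's PII-removal intent is a question for the operator.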

Reopening decision: Operator should confirm D2 is updated to add this constraint before implementation begins.

Issue D3-1 — No four-eyes requirement on reversal

Severity: HIGH
Decision affected: D3 (locked)

As described in Section 2.5: the architecture doc specifies "CS-only, audit-logged" for reversal but does not require a different CS user to approve the reversal. A rogue CS operator who initiates a fraudulent merge can also reverse it, leaving a reversal audit trail that looks like a self-correcting error.

Recommendation: D3 should require reversal_initiated_by != initiated_by_cs at the API layer. The reversal endpoint should return 403 if the requesting CS operator is the same as the initiating operator. For v1 single-operator environments: escalate reversal to the operator account (break-glass equivalent), not to the same CS role that initiated.
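The API-layer rule reduces to one comparison, sketched here as a function that returns the HTTP status the reversal endpoint would use. The function name and the choice of 200 for the allowed case are assumptions; the identifiers compared would be whatever operator identity the merge record stores.

```python
def reversal_status_code(initiated_by_cs, reversal_initiated_by):
    """Return 403 when the reverser is the initiating operator, else 200."""
    if not initiated_by_cs or not reversal_initiated_by:
        # Missing identity on either side is treated as a failed check.
        return 403
    if reversal_initiated_by == initiated_by_cs:
        return 403
    return 200
```

Treating a missing operator identity as a denial keeps the check fail-closed, so a record with an empty initiated_by_cs cannot be reversed by anyone through this path.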

Reopening decision: Operator should confirm D3 is updated before implementation begins. This is a HIGH-severity finding because it is the primary technical control against CS insider fraud on the reversal path.

Issue D3-2 — No customer confirmation required for reversal

Severity: MEDIUM
Decision affected: D3 (locked)

D3 reversal is CS-only. If a legitimate merge was completed by both parties, a subsequent social-engineering attack on CS (attacker impersonates one party and claims the merge was unauthorized) can cause a reversal without the other party's knowledge or consent.

Recommendation: Reversal should send notification emails to both the primary and former secondary accounts before executing. The emails should include a 24-hour window in which either party can block the reversal. If neither party blocks, the reversal proceeds. This is analogous to the initiation cross-verification flow but for reversal. It adds complexity but protects against CS social engineering on the reversal path.

Alternatively (simpler): At minimum, the reversal notification email to both parties should fire BEFORE the reversal executes, not after. As designed, the reversal executes first and the emails fire afterwards as completion notifications. The timing matters: a pre-reversal notification with a short delay gives customers a chance to contact CS if the reversal is fraudulent.
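Either variant comes down to a gate between "notification sent" and "reversal executes". A minimal sketch of that gate follows, assuming a 24-hour hold by default (the simpler variant would pass a shorter value) and a boolean that records whether either party blocked. All names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def reversal_may_execute(notified_at, now, blocked, hold=timedelta(hours=24)):
    """True once the hold has elapsed since notification and nobody blocked."""
    if blocked:
        return False
    return now - notified_at >= hold
```

For example, with notification at 09:00 UTC and the default hold, the gate stays closed until 09:00 UTC the next day, and stays closed permanently if either party blocks in the meantime.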


10. Summary Tables

Per-Flow Attack-Surface Ranking

| Rank | Flow | Key differentiating attack |
|---|---|---|
| 1 (smallest) | CS-only | All primary designation is CS action; one social-engineering surface |
| 2 | Hybrid (with swap restricted to CS-only) | Same as CS-only after mitigation; M3 is mandatory |
| 3 (largest) | Customer-choice during verification | Session-compromise-during-window enables primary flip + verification in one pass |

Five Most Impactful Mandatory Invariants

| # | Invariant | Enforcement mechanism |
|---|---|---|
| M1 | Both codes required; no CS bypass to verified | No internal endpoint advances status past initiated without both verified_at values set |
| M2 | CS cannot call customer verify route | Session middleware checks users table only; integration test enforces |
| M3 | Swap is CS-only | swap-primary is an internal Console endpoint, not customer-facing |
| M4 | Atomic swap with row-level lock | UPDATE ... WHERE ... RETURNING pattern; no TOCTOU |
| M5 | Audit events in same transaction as state changes | Audit insert failure rolls back state change |

Flows to Reject Outright

Flow 3 (Customer-choice during verification) — recommended for rejection.
Justification: a session compromised during the verification window gives the attacker verification power and primary-selection power at the same time. This cannot be mitigated without separating the selection step from the verification step, which makes the flow equivalent to Hybrid. The additional attack surface relative to Hybrid is material and cannot be compensated for within the Flow 3 design.

D2/D3 Issues Warranting Reopening

| Issue | Severity | Decision | Recommendation |
|---|---|---|---|
| D2-1: Tombstone nulls email; new-account-at-same-email gap | MEDIUM | D2 | Maintain tombstoned_emails table or retained hashed email for uniqueness enforcement |
| D3-1: No four-eyes on reversal | HIGH | D3 | reversal_initiated_by != initiated_by_cs enforced at API layer |
| D3-2: No customer notification before reversal executes | MEDIUM | D3 | Pre-reversal notification email with short hold window |

This memo is a security-agent deliverable. It does not declare D1's winner — that decision belongs to the operator incorporating this output alongside the detection-engineer's parallel memo. The mandatory invariants in Section 6 apply regardless of which flow is chosen.