Threat Model — Account Merge D1 (Who Picks Primary?)
Date: 2026-06-05T00:00:00Z
Author: security-agent
Status: Input for operator D1 decision — NOT a design document. Operator + detection-engineer outputs required before D1 is locked.
Scope: Decision D1 only, with cross-references to locked decisions D2 (soft-delete/tombstone) and D3 (14-day reversal window).
Architecture doc: docs/architecture/account-merge-2026-06-05.md (PR #3256, Epic #3245)
Detection-engineer memo: docs/security/threat-models/2026-06-05-account-merge-d1-detection.md (in parallel — path TBD)
Paired with: detection-engineer running behavioral-detection analysis concurrently. This memo owns boundary enforcement; detection-engineer owns signal shapes. Both are required inputs to the D1 decision.
Executive Summary
The three D1 candidate flows have meaningfully different attack surfaces. The per-flow attack-surface ranking (1 = smallest, 3 = largest) is:
| Rank | Flow | Attack surface summary |
|---|---|---|
| 1 (smallest) | CS-only | Primary picks are not customer-influenceable; attack surface reduces to CS social engineering only |
| 2 | Hybrid | Adds a customer-swap window, but its pre-code-consumption lock is a strong invariant if implemented correctly |
| 3 (largest) | Customer-choice during verification | Customer controls primary selection while simultaneously in possession of codes; social engineering + account compromise attacks compose |
Recommendation on rejection: Customer-choice during verification (Flow 3) should be rejected for v1. The specific attack that justifies this is detailed in Section 5. The swap and verification steps happen in the same session window, enabling a single-inbox compromise to both verify and flip the primary in one authenticated pass.
The five most impactful mandatory invariants (regardless of D1 choice):
- Both codes must be consumed before primary is locked, and primary is locked atomically at the point the second code is consumed — no update-then-lock race.
- A CS user who initiates a merge must be prohibited from consuming verification codes on behalf of either customer.
- The customer whose email owns Account A must supply Account B's code (and vice versa) — session identity and code directionality are enforced together, not independently.
- Reversal requires a different CS user than the initiator (four-eyes principle, in-scope for D3 even though D3 is locked).
- Every state transition on account_merges and every per-table row re-key writes an audit event before the state change commits, not after.
D2 / D3 architectural flaws found: Two issues warrant reopening. See Section 9.
1. Flow Descriptions (as assessed)
Restating from the architecture doc for traceability:
Flow 1 — Hybrid: CS nominates primary. Customer may request one swap before either code is consumed. Swap locks after first code is verified.
Flow 2 — CS-only: CS nominates primary. Customer has no input.
Flow 3 — Customer-choice during verification: Customer picks primary during the cross-verification flow (the active code-entry window).
The architecture doc's proposed design is Flow 1 (Hybrid). The threat model treats all three as candidates.
2. Attack Trees by Scenario
2.1 One-Inbox Compromise — Attacker Controls Account B's Email, Not Account A's
The attacker's goal: cause the merge to complete with Account B as the primary so that B's account data, passkeys, and session surface survive the merge.
Flow 1 (Hybrid):
The swap window is the attack surface. The sequence:
- Attacker compromises Account B's email inbox.
- CS initiates merge. Merge-initiated email goes to both inboxes. Attacker intercepts B's email (contains code_B). Account A's legitimate user receives code_A.
- The swap affordance is triggered via POST /merges/{id}/swap-primary. The architecture doc §11 note 2 specifies this is a Console button (CS action), not customer-facing. This is the critical implementation question: who can call swap-primary?
  - If swap-primary is CS-triggered only: attacker controlling B's inbox cannot call it. Attack requires social engineering of CS (see 2.2).
  - If swap-primary is customer-callable (authenticated session on either account): attacker with a session on Account B can call POST /merges/{id}/swap-primary before any code is consumed, swapping themselves to primary. They then verify using code_B (which they intercepted) and wait for Account A's user to verify. Account B becomes primary.
- The architecture doc exposes POST /merges/{id}/swap-primary as a customer-facing route (Customer-facing, D1 swap, before any code consumed). This is the attack-enabling surface.
Risk in Flow 1: If /merges/{id}/swap-primary is callable by any authenticated session belonging to either account, a one-inbox compromise is sufficient to flip the primary. The attacker does not need to verify — they only need to call swap before any code is consumed. They can then verify normally with their intercepted code. This is an exploitable path in the current design spec.
Mitigation within Flow 1: Restrict swap-primary to CS-only (Console endpoint, not customer-facing). This collapses the customer-swap UI from a customer-callable API to a CS-initiated Console action. The open question in §11 note 2 of the architecture doc must be resolved as: Console button only.
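A minimal sketch of the CS-only restriction, assuming the handler can see who is calling and which permission bits they hold. The Actor type and the may_swap_primary helper are hypothetical names for illustration; the permission string is the one recommended in Section 4.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    actor_type: str                      # "cs" or "customer" (hypothetical labels)
    actor_id: str
    permissions: frozenset = frozenset()


def may_swap_primary(actor: Actor) -> bool:
    """Allow the swap only for CS Console actors that hold the swap permission."""
    return (
        actor.actor_type == "cs"
        and "customers:merge:swap-primary" in actor.permissions
    )
```

Under this shape a customer session is rejected regardless of which account it belongs to, so holding a session on Account B gives no path to the swap.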
Flow 2 (CS-only):
No customer-influenceable swap. Attacker controlling B's inbox can intercept code_B. They cannot change the primary designation (CS-only). However:
- If attacker does not want the merge to complete, they simply do not verify with code_B. The merge stalls at initiated and expires after 24h.
- If the attacker wants to complete the merge (perhaps Account A has more value and they want to absorb it into B), they cannot make B primary. They can verify code_B (if they choose), but CS's primary designation stands.
- If the attacker wants to PREVENT a legitimate merge (Account B is legitimately a duplicate and the real user wants to merge), they can indefinitely block the merge by intercepting code_B and never verifying.
The DoS-on-merge vector (attacker blocks merge indefinitely by holding code_B and never verifying) exists in all three flows. It matters most in CS-only, where it is the only leverage an Account B inbox attacker has.
Flow 3 (Customer-choice during verification):
Verification and primary selection happen in the same window. Attacker controlling Account B's email:
- The attacker receives code_B from the intercepted email, and Account B's session can pick Account B as primary during verification.
- Per the sequence diagram, verification is crossed: the primary user calls /verify supplying code_B (the secondary's code), and the secondary user calls /verify supplying code_A. To complete the merge, BOTH sides must verify using the other's code.
- An attacker who only controls B's inbox has code_B, not code_A. code_A went to A's inbox, which the attacker does NOT control, so Account B's side of the verification cannot proceed and the attacker cannot complete the merge alone.
So the cross-verification scheme means a one-inbox compromise alone cannot complete a merge in any flow. The attacker needs to cause A's user to also verify, or they need to intercept code_A too.
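The crossed directionality can be sketched as follows. This is an illustration only: the Merge type and verify function are hypothetical, and codes are compared in plaintext here purely to keep the example short (the design stores argon2 hashes, per Section 3.3).

```python
import hmac
from dataclasses import dataclass


@dataclass
class Merge:
    primary_user_id: str
    secondary_user_id: str
    primary_code: str     # code emailed to the primary's inbox (plaintext for illustration only)
    secondary_code: str   # code emailed to the secondary's inbox


def verify(merge: Merge, session_user_id: str, submitted_code: str) -> bool:
    """Each side must present the code that was sent to the OTHER account."""
    if session_user_id == merge.primary_user_id:
        expected = merge.secondary_code
    elif session_user_id == merge.secondary_user_id:
        expected = merge.primary_code
    else:
        return False  # session belongs to neither account
    return hmac.compare_digest(submitted_code, expected)
```

With Account A as primary and Account B as secondary, a session on B that only holds code_B fails verification, which is the property the paragraph above relies on.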
Revised assessment of one-inbox compromise across all three flows: A pure one-inbox compromise cannot complete a merge without cooperation from the other account's holder. However:
- The swap-primary endpoint in Flow 1 (if customer-callable) allows the attacker to flip primary designation without completing verification. This is a primary-flip without merge, which may have no material effect if the merge never completes. But if Account A's legitimate user then verifies, and B's attacker verifies, the merge completes with B as primary despite B being compromised.
- The dangerous scenario is composite: attacker controls B's inbox AND A's legitimate user is actively cooperating with the merge (i.e., A wants to merge). In this case, A will verify. B's attacker can flip primary (if customer-callable), then verify. Merge completes with B as primary. Attacker absorbs A's data into B.
Conclusion for 2.1: One-inbox compromise attack succeeds in Flow 1 only if swap-primary is customer-callable. Making it CS-only eliminates this vector. Flows 2 and 3 are not vulnerable to one-inbox-only primary flip (though Flow 3 has other issues per Section 5).
2.2 CS-Side Social Engineering — Attacker Convinces CS to Initiate or Flip Primary
The attacker's goal: convince a CS rep to (a) initiate a merge they should not, or (b) flip the primary designation.
Flow 1 (Hybrid):
Two social engineering surfaces:
- Initiation: Attacker calls/emails CS claiming to be Account A's holder and asks for a merge with Account B. CS cannot verify identity at this point; the design relies on the subsequent cross-verification emails. If CS initiates, both Account A and Account B receive emails. If the attacker controls neither inbox, they cannot verify. If they control one inbox, see 2.1. The cross-verification email to both accounts is the defense here.
- Primary flip: Attacker calls CS asking to flip the primary before codes are consumed. CS calls the Console swap-primary button. Attacker convinces CS that "Account B should be primary because that's where my real trading history is."
The primary-flip via CS social engineering does not require the attacker to have email access at all — it only requires convincing a CS rep. The mitigations:
- CS should require a pre-established out-of-band verification step before performing a primary swap. This is a policy control, not a technical one, but it should be documented as a required CS procedure.
- The architecture doc requires initiated_by_cs be stored (as a hash) and all state transitions audit-logged. The primary swap must be an explicit audit event.
- If the primary is flipped after initiation, an email notification to both accounts should fire immediately ("The primary account for your pending merge has been updated").
Flow 2 (CS-only):
Every primary designation is a CS action. This means social engineering CS on primary selection is the only attack vector for primary manipulation — but it applies to every merge, not just a subset. The attack is identical to Flow 1's primary-flip path. The difference is that Flow 2 provides no customer-side defense (no customer swap, no customer-visible notification that they can trigger before the lock).
Flow 2 is not meaningfully more or less vulnerable to CS social engineering than Flow 1 on this specific vector. It simply removes the customer-swap affordance that Flow 1 provides.
Flow 3 (Customer-choice during verification):
CS initiates the merge but cannot enforce the primary. A social engineer who wants to flip the primary to their account needs only to authenticate as that account and pick their account as primary during verification — no CS manipulation required. CS social engineering as a vector is actually reduced in Flow 3 for primary selection, but this is because the attack vector moves to the customer layer.
2.3 Customer-Initiated Abuse — Bad Actor with Legitimate Access to One Account
The attacker's goal: absorb a victim's account into their own by triggering a merge.
This scenario requires CS as an intermediary — customers cannot initiate merges. The attacker must convince CS to initiate the merge in the first place. This is covered by 2.2. The cross-verification requirement means the victim receives an email when the merge is initiated and must supply a code from the other account's email. The victim can simply not verify, and the merge stalls.
The real risk here is a variant: the attacker IS a legitimate user of Account B, and they want to steal Account A's data by absorbing it into their merge. They:
- Call CS and say "I have two accounts, please merge them." CS initiates.
- Victim (real Account A holder) receives a merge-initiated email. If the victim has been phished or their email is compromised, they may verify without understanding. More realistically, a sufficiently sophisticated attacker could craft a pretext: "my partner is going to verify from their device."
Mitigations that must hold across all flows:
- The merge-initiated email to Account A must clearly state: "You received this because a merge was requested between your account and [Account B email]. If you did NOT request this, click here to cancel." A conspicuous "I did not request this" CTA in the email is the primary customer defense. This is not currently called out explicitly in the architecture doc.
- The merge must not complete until both parties verify. No CS shortcut to "verify on behalf of customer."
2.4 Race Conditions — Timing the Code Entry Window
The hybrid swap-then-verify race:
In Flow 1, the swap window closes after the first code is consumed. The race condition:
- Account A's user verifies (consumes code_B). primary_verified_at is set.
- Simultaneously, attacker calls swap-primary.
If the lock check is: "set primary_swap_requested = true if no code has been consumed yet," and Account A's verification and the swap call arrive at the same time, two outcomes are possible depending on transaction ordering:
- Verification commits first: swap fails with 409 "primary is locked." Correct behavior.
- Swap commits first: primary is flipped, then verification proceeds with the new primary. Potentially attacker's preferred outcome.
The architecture doc does not explicitly describe the transactional relationship between swap-primary and verify. If these are not in the same serializable transaction, the race window exists.
Implementation requirement to close this race:
swap-primary and the first verify call must both lock the account_merges row with SELECT FOR UPDATE or equivalent. The swap must atomically check primary_verified_at IS NULL AND secondary_verified_at IS NULL and either succeed or fail with no gap. If using Postgres, this is a single UPDATE with a WHERE clause and a RETURNING check:
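    -- Guarded swap: the WHERE clause is the lock check, evaluated atomically with the write.
    UPDATE account_merges
    SET primary_user_id = $new_primary,
        secondary_user_id = $new_secondary,
        primary_swap_requested = true
    WHERE id = $merge_id
      AND primary_verified_at IS NULL
      AND secondary_verified_at IS NULL
      AND status = 'initiated'
    RETURNING id;  -- no row returned means the swap was rejected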
If the UPDATE affects 0 rows, the swap is rejected. This is atomic — no separate lock step needed. The first verify call uses the same row-level locking pattern.
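The guarded-UPDATE pattern can be exercised end to end with a small sketch. It uses Python's sqlite3 as a stand-in for Postgres, checks the affected-row count instead of RETURNING, and swaps the two stored ids in place rather than taking them as parameters; the make_db and swap_primary helpers are hypothetical, and only the column names come from the architecture doc.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory table with only the columns the swap guard needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE account_merges (
               id INTEGER PRIMARY KEY,
               primary_user_id TEXT,
               secondary_user_id TEXT,
               primary_swap_requested INTEGER DEFAULT 0,
               primary_verified_at TEXT,
               secondary_verified_at TEXT,
               status TEXT)"""
    )
    conn.execute(
        "INSERT INTO account_merges (id, primary_user_id, secondary_user_id, status) "
        "VALUES (1, 'A', 'B', 'initiated')"
    )
    conn.commit()
    return conn


def swap_primary(conn: sqlite3.Connection, merge_id: int) -> bool:
    """Single-statement check-and-write; returns False when the primary is locked."""
    cur = conn.execute(
        """UPDATE account_merges
           SET primary_user_id = secondary_user_id,
               secondary_user_id = primary_user_id,
               primary_swap_requested = 1
           WHERE id = ?
             AND status = 'initiated'
             AND primary_verified_at IS NULL
             AND secondary_verified_at IS NULL""",
        (merge_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows affected: reject (the endpoint would answer 409)
```

Because the check and the write are one statement, there is no point between them at which a concurrent verify can land.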
The resend-then-verify race:
When a CS operator resends a code (minting a new code and discarding the old hash), there is a window where:
1. Customer submits old code (valid in transit, not yet discarded).
2. CS resend fires, old hash is overwritten.
3. Customer's old code fails verification despite being valid at send time.
This is not an attack vector but is a UX failure mode. The resend should atomically swap the hash in the same DB transaction. More importantly: if an attacker can trigger a resend (by convincing CS) while a legitimate code is in-flight, they can force a verification failure that extends the merge window and potentially buys time for a social engineering attempt.
Mitigation: Resend should only be callable by CS if the code has not yet been consumed (verified_at IS NULL). Resend on an already-verified code is a no-op with 409.
The in_progress state race (D2/D3 cross-concern):
Once status transitions to in_progress, the merge engine begins row re-keying. The 90-day soft-delete window (D2) starts from this point (or from merge_completed_at). If an attacker can somehow pause the merge at in_progress (through a crafted transaction failure), the secondary account's deleted_at is not set yet. If the attacker then triggers a D3 reversal from this inconsistent state, the merged data is partially migrated.
The architecture doc specifies that mid-flight failures roll back fully (Section 6.3) and status goes to failed. This should be verified in implementation: the transaction must either complete or fully roll back. There is no valid in_progress terminal state — if the transaction fails, the status must transition to failed, not remain in_progress.
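One way to picture the all-or-nothing requirement is the sketch below. It again uses sqlite3 as a stand-in for the real database; run_merge_engine and the rekey callback are hypothetical, and the status names are the ones used in this memo.

```python
import sqlite3


def run_merge_engine(conn: sqlite3.Connection, merge_id: int, rekey) -> str:
    """Run the re-key work in one transaction; any failure rolls back and records 'failed'."""
    try:
        conn.execute(
            "UPDATE account_merges SET status = 'in_progress' WHERE id = ?", (merge_id,)
        )
        rekey(conn)  # hypothetical callback that performs the per-table row re-keying
        conn.execute(
            "UPDATE account_merges SET status = 'completed' WHERE id = ?", (merge_id,)
        )
        conn.commit()
        return "completed"
    except Exception:
        conn.rollback()  # undoes the in_progress marker and every re-keyed row
        conn.execute(
            "UPDATE account_merges SET status = 'failed' WHERE id = ?", (merge_id,)
        )
        conn.commit()
        return "failed"
```

In this shape in_progress is only ever visible inside the open transaction, so a reversal request can never observe a half-migrated merge.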
2.5 Insider Threat from CS — Rogue CS Rep
The attacker is a CS rep who initiates merges to harvest data associations or facilitate credential stuffing.
What a rogue CS rep can do:
- Initiate merges between real customer accounts (both parties receive emails — detection opportunity).
- Observe collision_data JSONB contents (PII: display name, alt email, phone, mailing address of both accounts).
- If CS can view code hashes or the plain verification codes: break the cryptographic trust model entirely. The architecture doc stores codes as argon2 hashes; CS must never see plaintext codes.
- Access the merge record including initiated_by_cs (stored as SHA-256 hash of the CS operator's email, which allows attribution but not enumeration of targets).
What a rogue CS rep cannot do (design must enforce):
- Enter verification codes on behalf of either customer. The customer-facing /merges/{id}/verify route must require a session belonging to the account whose code is expected. CS Console routes must not have a "verify on behalf of" pathway. This is the most critical insider-threat boundary.
- See plaintext codes. The argon2 hash storage is correct. The Postmark delivery goes directly to the customer's email; CS does not receive a copy. This must hold.
Detection hook for insider threat: Every POST /internal/merges (initiation) must log the initiating CS operator's identity. If a single CS account initiates >N merges per day, or initiates merges where neither account matches an open FreeScout ticket, this is a detection signal. The detection-engineer's parallel output should include this signal shape.
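As a rough illustration of the per-operator volume signal, the sketch below counts merge.initiated events per operator per day. The event field names follow the table in Section 4.3; the function name is hypothetical, and the threshold is a placeholder because N is not fixed in this memo.

```python
from collections import Counter


def operators_over_daily_limit(events, limit):
    """Return (operator_hash, date) pairs with more than `limit` initiations that day."""
    counts = Counter(
        (e["cs_operator_id_hash"], e["timestamp"][:10])  # first 10 chars of ISO-8601 = date
        for e in events
    )
    return {key for key, n in counts.items() if n > limit}
```

The signal shape itself remains the detection-engineer's call; this only shows that the audit fields are sufficient to compute it.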
Four-eyes principle for reversal (D3 interaction):
The architecture doc specifies reversal is "CS-only" but does not specify that the reversing CS user must differ from the initiating CS user. A rogue CS rep who initiates a fraudulent merge should not also be able to reverse it (to cover their tracks) or to complete-then-reverse (to harvest data associations between the two accounts without leaving a lasting merge artifact).
This is a gap in the locked D3 design. See Section 9.
2.6 Reversal Abuse — Attacker Reverses a Legitimate Merge
D3 is locked at 14-day, CS-only reversal. The attacker's goal here is to abuse the reversal to their advantage.
Reversal as data exfiltration:
A bad actor who caused a merge to complete can request a reversal within 14 days via CS social engineering. Reversal re-forks data to the secondary user_id. This restores both accounts but does NOT un-do the fact that the attacker observed both accounts' data during the in_progress → completed → reversed sequence. The reversal does not scrub the attacker's knowledge of the merged data set.
Reversal as disruption:
A legitimate merge is completed by two genuine users. An attacker who later compromises one email inbox calls CS and claims the merge was unauthorized. CS initiates a reversal. The real user's merged-account state is disrupted.
Mitigation: Reversal should require the same cross-verification step as initiation — both email addresses must confirm the reversal before it executes. The current D3 design requires only CS action with no customer confirmation for reversal. This is a gap. See Section 9.
D3 / D2 interaction — reversal during soft-delete window:
D2 specifies soft-delete for 90 days, tombstone after. D3 specifies 14-day reversal. These are compatible: within 14 days, deleted_at is set but tombstoned_at is not, and reversal can clear deleted_at. After 14 days, reversal is impossible (409 from the endpoint). After 90 days, tombstone fires. These windows are correct and do not conflict.
However: the architecture doc says reversal "re-forks all MERGE-classified rows back to the secondary user_id." It does NOT address ASK_USER rows where the customer chose to DISCARD data from the secondary ("It does not rebuild data that was classified ASK_USER if the customer chose to discard it"). An attacker who triggered a merge (social engineering) and caused a customer to discard their own data via the ASK_USER flow, then triggers a reversal, leaves the customer with irreversible data loss. The merge reversal is not a full restore. This is a customer-trust issue, not a security vulnerability per se, but it means reversal is not a true "undo" and customers should be told so explicitly at reversal time.
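The D2/D3 window arithmetic can be made concrete with a short sketch. The boundary handling (day 14 inclusive, day 90 exclusive) and the use of merge_completed_at as the clock start are assumptions for illustration; the architecture doc leaves both to implementation.

```python
from datetime import datetime, timedelta, timezone

REVERSAL_WINDOW = timedelta(days=14)   # D3
TOMBSTONE_AFTER = timedelta(days=90)   # D2


def merge_phase(merge_completed_at: datetime, now: datetime) -> str:
    """Classify where a completed merge sits relative to the D3 and D2 windows."""
    age = now - merge_completed_at
    if age <= REVERSAL_WINDOW:
        return "reversible"                   # deleted_at set, reversal can clear it
    if age < TOMBSTONE_AFTER:
        return "soft_deleted_not_reversible"  # reversal endpoint returns 409
    return "tombstoned"                       # PII nulled by the tombstone job
```

The three phases never overlap, which is why the two windows do not conflict.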
2.7 Tombstone Abuse — Re-Creating an Account at the Secondary Email Post-Tombstone
D2: secondary account is soft-deleted for 90 days, then tombstoned. The user_redirects table persists forever.
The attack:
- Merge completes. Secondary account is soft-deleted (deleted_at set, deleted_reason = 'merged').
- After 90 days, tombstone fires. PII columns are nulled. The user_id row persists for FK integrity.
- An attacker (or the original secondary account holder, or a new user who happens to use the same email address) attempts to register a new Raxx account with the secondary account's email address.
- The email address is no longer visible in the users table (it was nulled at tombstone time).
- If email uniqueness is checked against the live email column only, a new account could be created at the same email address. This new account would NOT be caught by the user_redirects middleware (which matches on from_user_id, not on email).
Conditions for this to be a meaningful attack:
- The tombstone process must null the email column (by design, per D2).
- The registration flow must not check tombstoned accounts.
- The attacker must know the old secondary email address (or it is the attacker's own address).
Risk: Low for a bad actor (they cannot re-claim the merged account's data, which lives on the primary now). Higher as a data-integrity issue: a legitimate user whose account was merged may try to re-register with their old email address after 90+ days if they have forgotten about the merge. They would create a new clean account and lose all history.
Mitigation: The tombstone process should NOT null the email column (or should maintain a separate tombstoned_emails table) to allow email uniqueness enforcement to continue working. Alternatively, tombstone should set a flag (tombstoned = true) and the registration flow must check email = $email AND tombstoned = false rather than just email = $email AND deleted_at IS NULL.
This is an architectural gap in the locked D2 design. See Section 9.
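A sketch of the tombstoned_emails variant of the mitigation follows, using sqlite3 as a stand-in. The table layout, the lowercase normalization, and the email_is_available helper are assumptions for illustration; whether the table stores the address itself or a digest of it is left open here.

```python
import sqlite3


def email_is_available(conn: sqlite3.Connection, email: str) -> bool:
    """Registration check that consults live users AND tombstoned addresses."""
    normalized = email.strip().lower()  # assumes stored addresses are lowercased
    taken = conn.execute(
        "SELECT 1 FROM users WHERE email = ? "
        "UNION ALL "
        "SELECT 1 FROM tombstoned_emails WHERE email = ? "
        "LIMIT 1",
        (normalized, normalized),
    ).fetchone()
    return taken is None
```

With this check, a re-registration at the merged secondary's address is stopped even after the users.email column has been nulled, and the flow can instead point the user at the surviving primary account.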
3. Invariant Analysis
3.1 "Both codes are required for any merge to complete"
How it holds: The state machine requires both primary_verified_at IS NOT NULL AND secondary_verified_at IS NOT NULL before transitioning from initiated to verified. The merge engine only runs from verified status.
How an implementation mistake breaks it:
- If the status check at merge-engine dispatch is OR instead of AND: one verification would trigger the merge. This is a logic error in the state machine transition guard, not in the schema.
- If the merge can be triggered by a privileged API call (e.g., a CS-only endpoint that forces status = 'verified'): the two-code requirement is bypassed at the CS layer. The design must not include any endpoint that allows CS to force a status transition past verified without both verifications.
- If resend races (see 2.4) cause one verified_at to be set and then cleared (e.g., if resend incorrectly resets verified_at): a single verification would restart the process. Resend must not touch *_verified_at columns under any circumstances.
Verification point: The merge-engine dispatch query should be:
    SELECT * FROM account_merges
    WHERE id = $id
      AND status = 'initiated'
      AND primary_verified_at IS NOT NULL
      AND secondary_verified_at IS NOT NULL
If the merge engine uses any looser check, it is exploitable.
3.2 "Once a code is consumed, the primary is locked" (Hybrid invariant)
How it holds: primary_swap_requested is the swap flag. The swap endpoint checks primary_verified_at IS NULL AND secondary_verified_at IS NULL. Once either is non-null, swap fails.
How an implementation mistake breaks it:
- If the swap endpoint checks only primary_verified_at IS NULL and ignores secondary_verified_at: an attacker who has already caused the secondary verification to complete can still swap the primary.
- If there is a TOCTOU (time-of-check-time-of-use) gap between the swap endpoint's check and its update: the race described in 2.4 applies.
- If the swap is implemented as two separate operations (read: check no verified_at; write: update primary) instead of a single atomic UPDATE ... WHERE ... RETURNING: race window exists.
Required invariant: The swap endpoint must use the pattern:
    UPDATE account_merges
    SET primary_user_id = $new_primary_id,
        secondary_user_id = $new_secondary_id,
        primary_swap_requested = TRUE
    WHERE id = $merge_id
      AND status = 'initiated'
      AND primary_verified_at IS NULL
      AND secondary_verified_at IS NULL
    RETURNING id
If RETURNING yields no row, the swap is rejected.
3.3 "Codes are argon2-hashed at rest"
The architecture doc specifies argon2. This must be verified in the migration and in the code path.
Verification points:
- primary_code_hash and secondary_code_hash columns store argon2 hashes.
- At no point in the flow is a plaintext code written to account_merges, to the audit log, to collision_data, or to any log output.
- The verify endpoint uses argon2 verify (not comparison of a re-hashed value) to resist timing attacks.
- Resend must generate a new random code, argon2-hash it, and overwrite the old hash column — never log or store the plaintext.
One implementation subtlety: argon2 has a configurable work factor. The code generator and verifier must use the same work factor and salt length as the passkey backup code flow (if one exists) or define its own explicit parameters. A weak argon2 configuration (e.g., low iteration count, small memory cost) reduces the brute-force resistance of the 8-character code space. An 8-character alphanumeric code has ~2.8 trillion combinations — brute-force is infeasible if argon2 parameters are appropriately set (recommended: argon2id, t=2, m=65536, p=2).
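The code-space figure can be checked with a few lines of arithmetic. The ~2.8 trillion number corresponds to a 36-symbol alphabet (case-insensitive letters plus digits), which is assumed here; the hash rate passed to the helper is a hypothetical input, not a measured argon2 benchmark.

```python
ALPHABET_SIZE = 36   # assumption: case-insensitive letters + digits
CODE_LENGTH = 8

code_space = ALPHABET_SIZE ** CODE_LENGTH  # 2_821_109_907_456, i.e. ~2.8 trillion

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def online_guess_probability(attempts: int) -> float:
    """Chance that `attempts` distinct guesses against the live endpoint hit the code."""
    return attempts / code_space


def exhaustive_search_years(hashes_per_second: float) -> float:
    """Time to sweep the whole code space offline at a given argon2 verification rate."""
    return code_space / hashes_per_second / SECONDS_PER_YEAR
```

The offline estimate scales inversely with the hash rate, which is why a weak argon2 configuration erodes the margin directly.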
3.4 "Reversal requires a different CS user than the initiator" (four-eyes principle)
Current design status: NOT specified. The architecture doc says "CS-only, audit-logged" for D3 but does not require the reversing CS user to differ from the initiating CS user.
Why this matters: An insider threat who initiates a fraudulent merge and then reverses it covers the primary observable trace (the completed merge). The reversal audit log still exists, but if the same operator initiated and reversed, it looks like a self-correcting error. Four-eyes would require a second CS user to approve the reversal.
This is a gap in the locked D3 design. Recommendation: require reversal_initiated_by != initiated_by_cs at the API layer. If only one CS user exists (current v1 state), reversal requires operator-level approval (a second console session with customers:merge:write). This is documented as an open D3 issue in Section 9.
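A minimal sketch of the API-layer check, assuming initiated_by_cs holds the hex SHA-256 of the operator's email as described in Section 2.5. The normalization step (trim and lowercase) and both helper names are hypothetical.

```python
import hashlib
import hmac


def operator_hash(email: str) -> str:
    """Hex SHA-256 of the normalized operator email (normalization is an assumption)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def may_reverse(initiated_by_cs: str, reversing_operator_email: str) -> bool:
    """Four-eyes rule: the reversing operator must differ from the initiator."""
    same_operator = hmac.compare_digest(
        initiated_by_cs, operator_hash(reversing_operator_email)
    )
    return not same_operator
```

Comparing hashes keeps the check consistent with how the initiator is stored, so the reversal path never needs the initiator's plaintext identity.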
3.5 "CS cannot verify on behalf of the customer"
Current design analysis: The customer-facing /merges/{id}/verify route requires the caller's session to belong to either primary_user_id or secondary_user_id. CS Console routes proxy to internal Raptor endpoints, not to the customer-facing routes.
The gap: The architecture doc does not explicitly prohibit a CS internal endpoint that transitions status directly from initiated to verified, bypassing the cross-verification requirement. As long as no such endpoint exists, the invariant holds by design omission. But the design must explicitly state: no internal endpoint may advance account_merges.status past initiated without both primary_verified_at and secondary_verified_at being set by the customer-facing verify route.
If POST /internal/merges/{id}/cancel is the only internal mutation on initiated-state merges (besides initiation itself), this is satisfied. But if a future "force complete" capability is added for stuck merges, it would break this invariant.
4. RBAC and Audit Posture
4.1 Minimum-Privilege Role Shape
The architecture doc specifies customers:merge:write for all mutating operations. This is a single permission bit that covers initiation, cancellation, and reversal. For minimum privilege and audit clarity, these should be separate permission bits:
| Operation | Recommended permission | Rationale |
|---|---|---|
| Initiate a merge | customers:merge:initiate | Lower sensitivity: triggers customer email; cannot complete without customer action |
| Cancel a merge | customers:merge:cancel | Similar sensitivity to initiate; CS fixes their own mistake |
| Reverse a completed merge | customers:merge:reverse | Higher sensitivity: modifies completed data. Should be a separate bit with a higher role requirement |
| View merge records | customers:merge:read | Already separate in architecture doc |
| Swap primary (if CS-triggered) | customers:merge:swap-primary | If swap is CS-only (as recommended), needs its own bit for auditability |
The minimum separation that matters for security: customers:merge:reverse must require a different, higher privilege bit than customers:merge:initiate. A support agent who can initiate merges must not automatically be able to reverse them.
4.2 Can the Initiating CS User Enter Codes on Behalf of Customers?
The architecture doc establishes that /merges/{id}/verify requires the caller's session to belong to primary_user_id or secondary_user_id. CS sessions belong to console_admins, not users. Therefore, a CS session cannot call the customer-facing verify endpoint at all — the session ownership check structurally prevents it.
This invariant holds as designed, but must be explicitly tested. If the session middleware resolves primary_user_id lookup against both the users table and the console_admins table, a CS user could masquerade. The middleware must check against users only.
Required: an explicit integration test that asserts a CS console session token receives 403 when calling POST /merges/{id}/verify. This test must be in the merge feature's test suite and must not be removed.
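The shape of that test might look like the sketch below. The Session type and verify_status_code function are hypothetical stand-ins for the real session middleware; the point illustrated is that the principal's table is checked before the id is compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    principal_table: str   # "users" or "console_admins"
    principal_id: str


def verify_status_code(session: Session, primary_user_id: str, secondary_user_id: str) -> int:
    """Status the verify route would return based on session ownership alone."""
    if session.principal_table != "users":
        return 403  # console sessions are rejected before any id comparison
    if session.principal_id not in (primary_user_id, secondary_user_id):
        return 403
    return 200


def test_console_session_cannot_verify() -> None:
    # The console admin deliberately shares an id value with the primary customer.
    cs_session = Session("console_admins", "A")
    assert verify_status_code(cs_session, "A", "B") == 403
```

The colliding id in the test is deliberate: it is the masquerade case that arises if the middleware resolves ids against both tables.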
4.3 Audit-Log Granularity Required
For a security review of any single merge to be completable in under 5 minutes, the audit trail must answer:
| Question | Required audit event | Event fields |
|---|---|---|
| Who initiated? | merge.initiated | merge_id, cs_operator_id_hash, primary_user_id, secondary_user_id, freescout_ticket_id, timestamp |
| Was a swap requested? By whom? When? | merge.primary_swapped | merge_id, actor_type (cs or customer), actor_id, old_primary_id, new_primary_id, timestamp |
| Who verified first? | merge.primary_verified | merge_id, verifying_session_user_id, timestamp |
| Who verified second? | merge.secondary_verified | Same |
| Did any resend happen? | merge.code_resent | merge_id, resending_cs_operator_id_hash, account_side (primary/secondary), resend_count_after, timestamp |
| When did merge execute? | merge.engine_started, merge.engine_completed | merge_id, tables_touched, rows_rekeyed_count, timestamp |
| Was there a reversal? Who initiated it? Who approved (if four-eyes)? | merge.reversal_initiated, merge.reversal_approved, merge.reversal_completed | merge_id, actor_id, approver_id, timestamp |
| Did any verification attempt fail? | merge.verification_failed | merge_id, attempt_number, session_user_id, failure_reason, timestamp |
| Was a cancellation triggered? | merge.cancelled | merge_id, cancelled_by_actor_type, cancelled_by_id, reason, timestamp |
Events merge.initiated through merge.verification_failed are the signals the detection-engineer's parallel output will monitor. These events must exist in customer_audit_events (not just in account_merges row fields) so the audit trail is immutable and queryable via the audit log UI.
Critical ordering rule: Audit events must be written BEFORE the state transition commits. If the state transition succeeds and the audit write fails, the state change has no audit trail. Use a single database transaction that includes both the state update and the audit insert. If the audit insert fails, the transaction rolls back and the state change is rejected with a 500.
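The ordering rule can be sketched as a single transaction that writes the audit row first and the state change second. sqlite3 stands in for the real database, the customer_audit_events columns are reduced to two for brevity, and transition_with_audit is a hypothetical helper.

```python
import sqlite3


def transition_with_audit(conn: sqlite3.Connection, merge_id: int,
                          new_status: str, event_type: str) -> int:
    """Write the audit event and the state change together, or neither."""
    try:
        conn.execute(
            "INSERT INTO customer_audit_events (merge_id, event_type) VALUES (?, ?)",
            (merge_id, event_type),
        )
        conn.execute(
            "UPDATE account_merges SET status = ? WHERE id = ?",
            (new_status, merge_id),
        )
        conn.commit()
        return 200
    except sqlite3.Error:
        conn.rollback()  # neither the audit row nor the state change survives
        return 500
```

Either failure direction leaves the database unchanged: a state change without an audit row, or an audit row for a change that never happened, cannot be committed.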
5. Per-Flow Security Posture Comparison
Attack Surface Rankings
Flow 1 — Hybrid (proposed design)
Attack surfaces:
- POST /merges/{id}/swap-primary customer-callable path (if not restricted to CS-only)
- Swap-verify race window (between swap call and first code consumption)
- CS social engineering on swap request
- Code resend timing (extending the attack window up to 5×24h)
Residual attack surface with recommended mitigations applied (swap restricted to CS-only, atomic lock, no CS verify-on-behalf): Reduces to CS social engineering only, same as Flow 2.
Without mitigations (as currently specified in the architecture doc): swap-primary is customer-callable, creating a one-inbox-compromise primary-flip path.
Flow 2 — CS-only
Attack surfaces:
- CS social engineering on initiation and primary designation (the only surface that affects primary designation)
- DoS-on-merge via code non-response (inbox compromise blocks merge indefinitely)
No customer-influenceable primary flip. Smallest inherent attack surface. CS social engineering is the primary vector for all flows.
Flow 3 — Customer-choice during verification
Attack surfaces:
- Customer-callable primary selection during active code-entry window
- Single-inbox compromise plus customer cooperation allows a primary flip: an attacker who controls only one inbox can social-engineer Account A's user into cooperating
- Primary selection and code consumption happen in the same authenticated session window, which means a session compromise on either account during verification gives an attacker both verification power AND primary selection power simultaneously
The specific attack that justifies rejecting Flow 3:
In Flow 3, the verification session is the same session window in which the customer selects the primary. An attacker who has achieved session hijacking of Account B (not just email compromise — session token theft, e.g., via XSS or stolen device) during the active verification window can:
- Call POST /merges/{id}/verify as Account B to consume code_B (from B's email, which the attacker has access to from the session).
- In the same session, designate Account B as primary.
- Wait for Account A's user to verify (who is acting in good faith).
- Merge completes with B as primary.
The attack does not require persistent session compromise, only compromise at some point during the 24-hour verification window. In Flow 1, a session compromise during this window does NOT allow primary flip if swap-primary is CS-only. In Flow 3, it does.
Flow 3 should be rejected. The coupling of verification power and primary-selection power in a single customer session window creates a broader exploitable window than either alternative. This is not compensatable within the Flow 3 design without fundamentally separating the primary-selection step from the verification step, which would make it equivalent to Hybrid.
Per-Flow Ranking
| Rank | Flow | Basis |
|---|---|---|
| 1 (smallest) | CS-only | Attack surface limited to CS social engineering; no customer-influenceable primary designation |
| 2 | Hybrid (with mitigations applied) | Equivalent to CS-only after swap is restricted to CS-only and atomic locking enforced; the swap step adds one additional CS social-engineering surface but no direct customer attack surface |
| 3 (largest) | Customer-choice during verification | Session-compromise-during-window enables primary flip without separate social engineering; verification power and selection power co-reside in one session |
6. Mandatory Security Boundaries (Regardless of D1 Choice)
M1 — Both codes required; no CS bypass to verified state
No internal endpoint may set status = 'verified' or transition to in_progress without both primary_verified_at and secondary_verified_at being set by the customer-facing verify route.
M2 — CS cannot call the customer-facing verify endpoint
Session middleware for /merges/{id}/verify must check user_id IN (primary_user_id, secondary_user_id) against the users table only, not console_admins. Enforced by explicit integration test.
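A minimal sketch of the M2 check, assuming the session object records which table the principal was authenticated against. The `principal_table` field and the `may_call_verify` helper are hypothetical names used for illustration; the real middleware shape is not specified in this memo.

```python
def may_call_verify(session, merge):
    """Return True only for a customer session belonging to one of the two merge participants."""
    if session.get("principal_table") != "users":
        return False  # console_admins (CS) sessions are rejected outright
    return session.get("user_id") in (
        merge["primary_user_id"],
        merge["secondary_user_id"],
    )
```

The integration test called for above would assert that a console_admins session receives a rejection even when its numeric id happens to collide with a participant's user_id.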
M3 — Swap is CS-only (applicable to Flow 1 if chosen)
POST /merges/{id}/swap-primary must be an internal Console endpoint, not a customer-facing route. The architecture doc currently exposes it as customer-facing. This must be corrected before implementation.
If the operator chooses Flow 3 (not recommended), this invariant is vacuously satisfied (there is no swap endpoint). But Flow 3 should be rejected for the reasons in Section 5.
M4 — Atomic swap with row-level lock
Swap and verify both use UPDATE ... WHERE ... RETURNING with all guard conditions in the WHERE clause. No TOCTOU gap.
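The guarded-update pattern can be sketched as follows. This is an illustration under assumptions, not the architecture doc's schema: the account_merges column layout is inferred from field names used in this memo, `swap_primary` is a hypothetical helper, and `cursor.rowcount` stands in for RETURNING so the sketch runs on any SQLite version.

```python
import sqlite3

# All guard conditions live in the WHERE clause, so check and write are one statement.
SWAP_SQL = """
UPDATE account_merges
SET primary_user_id = secondary_user_id,
    secondary_user_id = primary_user_id
WHERE id = ?
  AND status = 'initiated'
  AND primary_verified_at IS NULL
  AND secondary_verified_at IS NULL
"""


def swap_primary(conn, merge_id):
    """Swap primary and secondary only if every guard in the WHERE clause holds."""
    with conn:  # one transaction around the single guarded statement
        cursor = conn.execute(SWAP_SQL, (merge_id,))
    return cursor.rowcount == 1  # 0 rows means a guard failed and nothing changed
```

There is no separate SELECT-then-UPDATE step, which is what removes the TOCTOU gap: a swap that arrives after the first code is consumed matches zero rows and is rejected.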
M5 — Audit events written in the same transaction as state changes
Audit inserts are not fire-and-forget. If the audit insert fails, the state change rolls back. No state change exists without an audit trail.
M6 — Argon2id with explicit parameters
Codes stored as argon2id with t≥2, m≥65536, p≥2. Implementation must specify these parameters explicitly, not use library defaults. Plaintext codes must never appear in logs, audit events, or database columns other than at the moment of generation (in memory only).
M7 — Merge-initiated email carries a clear cancel CTA
The merge-initiated email to both accounts must include a one-click "I did not request this — cancel this merge" link. This is a signed time-limited token (not an auth credential) that transitions the merge to cancelled. The token expires when the verification codes expire (24h). This is a required customer-trust element, not optional.
M8 — Code verification direction is enforced at the session layer
When Account A's session calls /verify, the submitted code is verified against secondary_code_hash (Account B's code), not primary_code_hash. The session-to-hash routing must be explicit:
session.user_id == primary_user_id → verify submitted code against secondary_code_hash
session.user_id == secondary_user_id → verify submitted code against primary_code_hash
If this routing is reversed or ambiguous, the cross-verification model breaks.
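The routing above can be made explicit in code. A minimal sketch, assuming the merge record is available as a mapping with the field names used in this memo; the `hash_to_check` helper is hypothetical.

```python
def hash_to_check(session_user_id, merge):
    """Pick the code hash the submitted code must match, per the cross-verification rule."""
    if session_user_id == merge["primary_user_id"]:
        return merge["secondary_code_hash"]  # primary's session enters secondary's code
    if session_user_id == merge["secondary_user_id"]:
        return merge["primary_code_hash"]  # secondary's session enters primary's code
    # Any other caller is not a participant; fail closed instead of guessing a side.
    raise PermissionError("session user is not a participant in this merge")
```

Raising for non-participants keeps the routing unambiguous: there is no default branch that could silently verify against the wrong hash.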
7. Customer-Trust UX Requirements
7.1 Cancel Path for "I Did Not Initiate This"
Status: Not explicitly in scope for v1 per the architecture doc. Must be in scope.
The merge-initiated email goes to both accounts. The customer must have a path to cancel without needing to call CS. CS-only cancellation for this scenario creates a social-engineering surface (attacker calls CS to block a legitimate cancellation attempt).
Required: A signed cancel token in the merge-initiated email (per M7 above) that transitions the merge to cancelled when any account holder clicks it, regardless of which account. Both parties can cancel; first cancellation wins.
The cancel token must not be an auth credential. It is a single-use HMAC-signed URL token that references the merge record. It does not log the user into their account or grant any session. It only cancels the merge.
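A minimal sketch of such a token, assuming HMAC-SHA256 over a payload of merge id plus expiry. The payload layout, the function names, and the way the secret is passed in are all assumptions for illustration; key storage and rotation are out of scope here.

```python
import base64
import hashlib
import hmac


def make_cancel_token(merge_id, expires_at, secret):
    """Return a URL-safe token binding merge_id to an expiry (Unix seconds)."""
    payload = f"{merge_id}.{expires_at}".encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return ".".join(
        base64.urlsafe_b64encode(part).decode("ascii") for part in (payload, signature)
    )


def verify_cancel_token(token, now, secret):
    """Return the merge_id if the token is authentic and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    # The payload is parsed only after the signature has been verified.
    merge_id, expires_at = payload.decode("utf-8").rsplit(".", 1)
    if now >= int(expires_at):
        return None
    return merge_id
```

The token carries no user identity and creates no session. Single-use behaviour is not a property of the token itself in this sketch; it would come from the cancel transition being a guarded state update, so that a second click finds the merge already cancelled and changes nothing.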
7.2 24-Hour Cooling-Off Window
Recommendation: No. Adding a 24-hour delay between initiation and first code-acceptance would extend the attack window (more time for an attacker to intercept emails, more time for social engineering), not reduce it. The cooling-off concept provides customer benefit only if customers are expected to receive merge-initiated emails and not act on them promptly. For account merge the right model is: act promptly or cancel. If the customer has not acted, the codes stay valid until their 24h expiry and a warning email fires at 12h.
A 12-hour reminder email ("Your merge is still pending — verify or cancel here") with a cancel CTA achieves the customer-trust goal without extending attack windows.
7.3 Customer Revocation of Merge-In-Progress (Pre-Completion)
As noted in 7.1: either account holder can cancel before any code is consumed (per the architecture doc: POST /merges/{id}/cancel available while status is initiated). After the first code is consumed, the architecture doc does not provide a cancel path — only D3 reversal post-completion.
Gap: Once one party has verified but the other has not, neither party can cancel. The merge is in a partial-verification limbo. If the non-verified party has changed their mind, they can simply not verify (the merge will expire after 24h). This is acceptable for v1, but should be documented: "Not verifying by code expiry time is equivalent to cancellation."
However: if an attacker has verified as one party and is waiting for the other party to verify (which will complete the merge), the other party currently has no active cancel path — only "don't verify." The merge-initiated email's cancel CTA (M7) must remain active and functional even after one party has verified, allowing the non-verified party to explicitly cancel. This should be tested.
8. Required Audit Event Additions for Detection-Engineer Coordination
The detection-engineer's parallel work will define the monitoring payload shapes. This memo establishes that the following state transitions MUST exist as discrete events for them to hook:
| Event name | Trigger | Minimum payload |
|---|---|---|
| merge.initiated | POST /internal/merges succeeds | merge_id, cs_operator_id_hash, primary_user_id, secondary_user_id, ticket_id, initiated_at |
| merge.code_sent | Postmark send call for each code | merge_id, account_side, postmark_message_id, sent_at |
| merge.verification_attempt | Any call to POST /merges/{id}/verify | merge_id, verifying_user_id, account_side, succeeded (bool), attempt_number, timestamp |
| merge.primary_swapped | Swap endpoint succeeds | merge_id, actor_type, actor_id, old_primary_id, new_primary_id, timestamp |
| merge.code_resent | Resend endpoint | merge_id, account_side, resend_count_after, cs_operator_id_hash, timestamp |
| merge.both_verified | Second verification completes | merge_id, primary_verified_at, secondary_verified_at, timestamp |
| merge.engine_started | Merge engine begins transaction | merge_id, timestamp |
| merge.row_rekeyed | Each table operation in merge engine | merge_id, table_name, row_count, policy, timestamp |
| merge.engine_completed | Merge transaction commits | merge_id, tables_touched_count, rows_rekeyed_total, timestamp |
| merge.engine_failed | Merge transaction rolls back | merge_id, error_detail, timestamp |
| merge.cancelled | Cancel endpoint succeeds | merge_id, cancelled_by_type (cs/customer/system), cancelled_by_id, cancel_reason, timestamp |
| merge.reversal_initiated | Reversal endpoint called | merge_id, reversing_cs_operator_id_hash, initiated_by_cs (original), is_four_eyes_satisfied (bool), timestamp |
| merge.reversal_completed | Reversal commits | merge_id, rows_restored_count, timestamp |
Coordination note: The detection-engineer should define the monitoring rules (e.g., "alert if merge.verification_attempt with succeeded=false occurs >5 times on a single merge") against these event shapes. The merge engine must emit all of the above; the detection rules are the detection-engineer's deliverable.
9. Architectural Issues Found in Locked Decisions D2 and D3
These issues were found in the course of this threat model. They interact with D1 but are not D1-specific. Both warrant reopening.
Issue D2-1 — Tombstone nulls email column; new-account-at-same-email registration gap
Severity: MEDIUM
Decision affected: D2 (locked)
As described in Section 2.7: after 90 days, the tombstone job nulls PII columns including email. A new registration attempt with the same email address after tombstone may succeed, creating a new clean account at an address that was previously merged. This is a data-integrity issue and a potential customer-confusion issue.
Recommendation: Tombstone should maintain a separate tombstoned_emails table (or retain the email column in hashed form) to allow email uniqueness enforcement to continue working post-tombstone. The registration flow should check tombstoned_emails and return an appropriate error ("this email was previously associated with a merged account — if you believe this is an error, contact support").
Reopening decision: Operator should confirm D2 is updated to add this constraint before implementation begins.
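One way the hashed-retention option could look, as a sketch only. It assumes a keyed hash (HMAC-SHA256) over a normalized email so the stored value is not a bare hash of guessable input; the normalization rule, the key handling, and both function names are assumptions, not part of D2.

```python
import hashlib
import hmac


def email_fingerprint(email, key):
    """Keyed hash of a normalized email; stored in place of the plaintext address."""
    normalized = email.strip().lower()  # assumed normalization, must match registration
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def can_register(email, tombstoned_fingerprints, key):
    """Return False if the address matches a tombstoned (previously merged) account."""
    return email_fingerprint(email, key) not in tombstoned_fingerprints
```

The tombstone job would write the fingerprint before nulling the email column, and the registration flow would consult it before creating the account, returning the support-contact error on a match.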
Issue D3-1 — No four-eyes requirement on reversal
Severity: HIGH
Decision affected: D3 (locked)
As described in Section 2.5: the architecture doc specifies "CS-only, audit-logged" for reversal but does not require a different CS user to approve the reversal. A rogue CS operator who initiates a fraudulent merge can also reverse it, leaving a reversal audit trail that looks like a self-correcting error.
Recommendation: D3 should require reversal_initiated_by != initiated_by_cs at the API layer. The reversal endpoint should return 403 if the requesting CS operator is the same as the initiating operator. For v1 single-operator environments: escalate reversal to the operator account (break-glass equivalent), not to the same CS role that initiated.
Reopening decision: Operator should confirm D3 is updated before implementation begins. This is a HIGH-severity finding because it is the primary technical control against CS insider fraud on the reversal path.
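The API-layer check itself is small. A minimal sketch, assuming the merge record exposes initiated_by_cs as used in this memo; the `reversal_status_code` helper and the mapping-style record are hypothetical.

```python
def reversal_status_code(requesting_operator_id, merge):
    """Return 403 when the reverser is the operator who initiated the merge, else 200."""
    if requesting_operator_id == merge["initiated_by_cs"]:
        return 403  # four-eyes violated: initiator may not reverse their own merge
    return 200
```

In a single-operator environment the same comparison still holds as long as the break-glass operator account has a distinct identity from the CS role that initiated the merge.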
Issue D3-2 — No customer confirmation required for reversal
Severity: MEDIUM
Decision affected: D3 (locked)
D3 reversal is CS-only. If a legitimate merge was completed by both parties, a subsequent social-engineering attack on CS (attacker impersonates one party and claims the merge was unauthorized) can cause a reversal without the other party's knowledge or consent.
Recommendation: Reversal should send notification emails to both the primary and former secondary accounts before executing. The emails should include a 24-hour window in which either party can block the reversal. If neither party blocks, the reversal proceeds. This is analogous to the initiation cross-verification flow but for reversal. It adds complexity but protects against CS social engineering on the reversal path.
Alternatively (simpler): At minimum, the reversal notification email to both parties should fire BEFORE the reversal executes, not after. As-designed, the reversal executes and emails fire as completion notifications. The timing matters: a pre-reversal notification with a short delay gives customers a chance to contact CS if the reversal is fraudulent.
10. Summary Tables
Per-Flow Attack-Surface Ranking
| Rank | Flow | Key differentiating attack |
|---|---|---|
| 1 — smallest | CS-only | All primary designation is CS action; one social-engineering surface |
| 2 | Hybrid (with swap restricted to CS-only) | Same as CS-only after mitigation; M3 is mandatory |
| 3 — largest | Customer-choice during verification | Session-compromise-during-window enables primary flip + verification in one pass |
Five Most Impactful Mandatory Invariants
| # | Invariant | Enforcement mechanism |
|---|---|---|
| M1 | Both codes required; no CS bypass to verified | No internal endpoint advances status past initiated without both verified_at values set |
| M2 | CS cannot call customer verify route | Session middleware checks users table only; integration test enforces |
| M3 | Swap is CS-only | swap-primary is an internal Console endpoint, not customer-facing |
| M4 | Atomic swap with row-level lock | UPDATE ... WHERE ... RETURNING pattern; no TOCTOU |
| M5 | Audit events in same transaction as state changes | Audit insert failure rolls back state change |
Flows to Reject Outright
Flow 3 (Customer-choice during verification) — recommended for rejection.
Justification: session-compromise-during-verification-window grants simultaneous verification power and primary-selection power to an attacker. This cannot be mitigated without separating the selection step from the verification step, which makes it equivalent to Hybrid. The additional attack surface vs. Hybrid is material and not compensatable within the Flow 3 design.
D2/D3 Issues Warranting Reopening
| Issue | Severity | Decision | Recommendation |
|---|---|---|---|
| D2-1: Tombstone nulls email; new-account-at-same-email gap | MEDIUM | D2 | Maintain tombstoned_emails table or retained hashed email for uniqueness enforcement |
| D3-1: No four-eyes on reversal | HIGH | D3 | reversal_initiated_by != initiated_by_cs enforced at API layer |
| D3-2: No customer notification before reversal executes | MEDIUM | D3 | Pre-reversal notification email with short hold window |
This memo is a security-agent deliverable. It does not declare D1's winner — that decision belongs to the operator incorporating this output alongside the detection-engineer's parallel memo. The mandatory invariants in Section 6 apply regardless of which flow is chosen.