Sync & CRDT System

Deep dive into how haex-vault synchronizes data across devices using Column-Level CRDTs and Hybrid Logical Clocks.

Sync Overview

haex-vault uses a CRDT (Conflict-free Replicated Data Type) based sync system that enables seamless offline-first operation with automatic conflict resolution.

Offline-First

Work without internet. Changes sync when you're back online.

Realtime Sync

A shared WebSocket to haex-sync-server triggers a debounced pull whenever another device pushes.

E2E Encrypted

Every column value is encrypted with the vault key before it leaves the device.

CRDT Fundamentals

CRDTs are data structures that can be replicated across multiple computers, modified independently, and merged without conflicts.

Last-Write-Wins (LWW): haex-vault uses LWW semantics with Hybrid Logical Clocks. The change with the highest HLC always wins, which makes conflict resolution deterministic across all devices.

CRDT Columns

Every synced table has two special columns that enable CRDT functionality:

-- All synced tables get these columns automatically:
CREATE TABLE haex_passwords (
  id TEXT PRIMARY KEY,
  title TEXT NOT NULL,
  username TEXT,
  password TEXT,

  -- CRDT Columns (added automatically by haex-vault):
  haex_hlc TEXT NOT NULL,            -- Row-level HLC (max of column HLCs)
  haex_column_hlcs TEXT NOT NULL     -- JSON object with HLC per column
                     DEFAULT '{}'
);

-- Deletes are NOT marked with a tombstone column. They are logged
-- in a separate table haex_deleted_rows by a BEFORE DELETE trigger.

Hybrid Logical Clocks

HLCs combine physical time with a stable node ID to create globally unique, monotonically increasing timestamps without requiring synchronized clocks.

HLC Format: {ntp_time_hex}/{node_uuid_hex}

"7B7E2C9E10000000/8a4e23f76b0a4d9faabbccddeeff0011"
        │                          │
   16 hex chars               32 hex chars
   NTP-style timestamp        Device UUID
   (uhlc::Timestamp)          (16 byte node ID)

// Implementation: uhlc crate (Rust), persisted in haex_crdt_configs.
// Lexicographic comparison on this string yields correct
// happens-before ordering between devices.

Key benefits of HLCs:

  • Lexicographically sortable - can use simple string comparison
  • Globally unique - the 16-byte node ID prevents collisions
  • Causally ordered - respects happens-before relationships

Column-Level Sync

Unlike row-level sync, haex-vault tracks changes at the column level. Concurrent edits to different columns of the same row merge automatically without data loss.

Conflict Resolution Example

Device A: UPDATE haex_passwords SET title = 'Gmail' WHERE id = 'abc'
         (haex_column_hlcs.title = 7B7E2C8000000000/<nodeA>)

Device B: UPDATE haex_passwords SET password = 'newpass' WHERE id = 'abc'
         (haex_column_hlcs.password = 7B7E2C9E10000000/<nodeB>)

Result after sync (column-level last-write-wins by HLC):
- title    = 'Gmail'    (kept from Device A)
- password = 'newpass'  (kept from Device B)

Both changes are preserved.

No Data Loss: With column-level tracking, two devices can edit different fields of the same record simultaneously, and both changes will be preserved.

Primary Keys Are NOT Tracked: CRDT only tracks non-primary-key columns. If your table has a composite primary key (e.g., PRIMARY KEY (item_id, group_id)) and you update one of the PK columns, that change will NOT sync. Use a single-column PK and keep changeable fields as regular columns.

Column HLC Storage

// haex_column_hlcs JSON structure
{
  "id":       "7B7E2C8000000000/8a4e23f76b0a4d9faabbccddeeff0011",
  "title":    "7B7E2C8500000000/8a4e23f76b0a4d9faabbccddeeff0011",
  "username": "7B7E2C8700000000/c44d11a4d772490e8d61dbf1b22933ee",
  "password": "7B7E2C9E10000000/c44d11a4d772490e8d61dbf1b22933ee"
}

// haex_hlc = max(all column HLCs)
// = "7B7E2C9E10000000/c44d11a4d772490e8d61dbf1b22933ee"
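A minimal sketch of how a column-level merge and the row-level maximum could fit together. The SyncedRow shape and the function names are hypothetical; haex-vault keeps this state in SQLite columns, not in plain objects, and its actual merge code may differ.

```typescript
// Hypothetical in-memory shape for illustration only.
interface SyncedRow {
  values: Record<string, unknown>
  columnHlcs: Record<string, string> // parsed haex_column_hlcs
}

interface MergedRow extends SyncedRow {
  haexHlc: string // row-level HLC
}

// Row-level HLC = largest column HLC (empty string if there are none).
function maxColumnHlc(columnHlcs: Record<string, string>): string {
  let max = ""
  for (const column of Object.keys(columnHlcs)) {
    if (columnHlcs[column] > max) max = columnHlcs[column]
  }
  return max
}

// Merge remote into local column by column; the higher HLC wins.
function mergeRows(local: SyncedRow, remote: SyncedRow): MergedRow {
  const values = { ...local.values }
  const columnHlcs = { ...local.columnHlcs }
  for (const column of Object.keys(remote.columnHlcs)) {
    const remoteHlc = remote.columnHlcs[column]
    const localHlc = columnHlcs[column]
    if (localHlc === undefined || remoteHlc > localHlc) {
      values[column] = remote.values[column]
      columnHlcs[column] = remoteHlc
    }
  }
  return { values, columnHlcs, haexHlc: maxColumnHlc(columnHlcs) }
}
```

Because each column is decided independently, merging in either direction yields the same values, which is what makes the result deterministic on every device.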

Sync Flow

The sync process has three pieces: push, pull, and a shared WebSocket that signals other devices.

Push Operation

When you make changes locally, the sync orchestrator scans dirty tables, groups column-level changes by HLC, encrypts each value, and sends chunks (<= 2000 changes) to the server.

// One entry inside a push batch (sent to haex-sync-server).
{
  tableName:      "haex_passwords",
  rowPks:         '{"id":"abc-123"}',
  columnName:     "password",
  haexHlc:        "7B7E2C9E10000000/8a4e23f76b0a4d9faabbccddeeff0011",
  encryptedValue: "<base64 AES-256-GCM ciphertext>",
  nonce:          "<base64 nonce>",
  deviceId:       "<device-id>",
}

// Push semantics:
// - Dirty tables are scanned per backend.
// - Column-changes are grouped by HLC.
// - Chunks are sent with <= 2000 changes per request.
// - Atomic HLC: all changes from one HLC tick stay in the same chunk.

Pull Operation

Pull fetches server changes since the last HLC cursor. Each change is decrypted and applied with column-level HLC comparison; only newer values win.

// Pull walks the server change log forward from the last cursor.
// For every change the local HLC for that (row, column) wins or loses.
for (const change of serverChanges) {
  const localRow = await getRow(change.tableName, change.rowPks)
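  // haex_column_hlcs is stored as JSON text, so parse it before the lookup.
  const localHlcs = JSON.parse(localRow?.haex_column_hlcs ?? '{}')
  const localHlc = localHlcs[change.columnName]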

  if (!localHlc || change.haexHlc > localHlc) {
    // Remote wins: decrypt, apply, update column HLC
    const value = decrypt(change.encryptedValue, change.nonce, vaultKey)
    await updateColumn(change.tableName, change.rowPks, change.columnName, value)
    await updateColumnHlc(change.tableName, change.rowPks, change.columnName, change.haexHlc)
  }
  // else: local is newer, ignore this change.
}

Realtime Updates

haex-vault opens a single shared WebSocket to haex-sync-server, authenticated with the device's DID. Incoming events are used purely as triggers - the authoritative data path is always the pull endpoint.

// haex-vault maintains a single shared WebSocket (haex-sync-server)
// authenticated with the device's DID identity. WebSocket events
// trigger a debounced pull from the server (the pull endpoint is the
// authoritative path; the WS only signals that there is something new).

const realtime = useRealtime()
await realtime.connect(backend.homeServerUrl, identity.privateKey, identity.did)

// One handler per relevant event type. Per-spaceId routing happens here.
realtime.on('sync', (event) => {
  if (event.spaceId !== backend.spaceId) return
  // Debounced (~500ms) to coalesce bursts.
  triggerPullForBackend(backend.id)
})

// Android/Doze: the WebSocket is reconnected when the app comes back
// to the foreground (visibilitychange listener with exponential backoff).

Fallback Polling: If the WebSocket connection drops (mobile Doze mode, network changes, ...), the orchestrator falls back to periodic pulls and reconnects with exponential backoff.
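Debouncing and exponential backoff are generic techniques; a minimal sketch of both follows. The helper names, the 1 s base delay and the 30 s cap are assumptions chosen for illustration, not values taken from haex-vault. Only the ~500 ms debounce window comes from the text above.

```typescript
// Hypothetical helpers for illustration only; not haex-vault's API.

// Delay before reconnect attempt `attempt` (0-based): the base delay
// doubles on every attempt and is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}

// Collapses a burst of calls into a single call that runs once
// `waitMs` has passed without another call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer)
    timer = setTimeout(() => fn(...args), waitMs)
  }
}

// Usage idea: wrap the pull trigger once, then call the wrapper
// from the WebSocket handler.
//   const debouncedPull = debounce(triggerPullForBackend, 500)
```

Production reconnect logic often adds random jitter to the delay so that many devices do not reconnect at the same moment; the sketch leaves that out for brevity.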

Deletes

haex-vault deletes rows for real - but a BEFORE DELETE trigger logs the deleted row's table and primary keys into haex_deleted_rows so the deletion can be replicated.

-- haex-vault does NOT add a tombstone column to your tables.
-- Instead, a BEFORE DELETE trigger logs deletions to a
-- dedicated table so they can be synced like any other change.

DELETE FROM haex_passwords WHERE id = 'abc-123';

-- The trigger writes a row into haex_deleted_rows:
--   table_name  = 'haex_passwords'
--   row_pks     = '{"id":"abc-123"}'
--   haex_hlc    = '<current HLC>'

-- Active rows are simply rows that still exist in their
-- original table. No `WHERE haex_tombstone = 0` filter is needed.

-- An age-based cleanup job prunes haex_deleted_rows
-- entries whose haex_hlc is older than the retention window.

This means your queries don't need a `WHERE haex_tombstone = 0` filter. Rows that exist are alive; rows that were deleted are gone locally and propagate via haex_deleted_rows to other devices.
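How a receiving device applies a replicated deletion is not spelled out above. The sketch below shows one plausible rule and is an assumption, not haex-vault's documented behavior: a deletion removes the local row only if its HLC is newer than the row's row-level haex_hlc, so a later edit on another device is not silently discarded. The function name is hypothetical; the field names mirror the haex_deleted_rows columns listed above.

```typescript
// Field names mirror the haex_deleted_rows columns listed above.
interface DeletedRowEntry {
  table_name: string
  row_pks: string  // JSON text, e.g. '{"id":"abc-123"}'
  haex_hlc: string // HLC at the time of the delete
}

interface LocalRowState {
  haex_hlc: string // row-level HLC (max of column HLCs)
}

// ASSUMPTION: last-write-wins between a delete and the newest edit.
// Returns true if the local row should be removed.
function shouldApplyDelete(
  entry: DeletedRowEntry,
  localRow: LocalRowState | undefined,
): boolean {
  if (localRow === undefined) return false // already gone locally
  return entry.haex_hlc > localRow.haex_hlc
}
```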

Sync Events

After a successful pull, the sync orchestrator emits events to notify the application that data has changed.

Automatic Event Emission: Extensions never need to emit these events themselves; they only listen and react. haex-vault handles emission after every successful pull.

Two Event Types

  • sync:tables-updated - for internal Pinia stores (registered via registerStoreForTables)
  • haextension:sync:tables-updated - for extensions via vault-sdk (use this in your extension)

Permission-Based Event Filtering

For security and privacy, sync events are filtered based on each extension's database permissions. Extensions only receive notifications about tables they have access to.

When sync completes, haex-vault checks each extension's DB permissions and only includes table names that match:

  • Wildcard permission (*) - sees all tables
  • Prefix match (e.g., publicKey__extName__*) - sees tables with that prefix
  • Exact match - sees only that specific table

Privacy: This prevents extensions from observing activity in other extensions' tables, ensuring data privacy between extensions.
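The three matching rules can be sketched as a small filter. The function names are hypothetical, and the sketch assumes a prefix permission is written as a prefix followed by a single trailing asterisk, as in the example above; haex-vault's real permission model may carry more detail.

```typescript
// Hypothetical helpers for illustration only; not haex-vault's API.

// True if `pattern` (wildcard, prefix or exact) grants access to `table`.
function permissionMatches(pattern: string, table: string): boolean {
  if (pattern === "*") return true // wildcard: all tables
  if (pattern.length > 1 && pattern.charAt(pattern.length - 1) === "*") {
    const prefix = pattern.slice(0, -1)
    return table.slice(0, prefix.length) === prefix // prefix match
  }
  return pattern === table // exact match
}

// Keeps only the updated tables that at least one permission covers.
function filterTablesForExtension(updatedTables: string[], permissions: string[]): string[] {
  return updatedTables.filter((table) =>
    permissions.some((pattern) => permissionMatches(pattern, table)))
}
```

An extension whose filtered list comes back empty would presumably receive no event at all, although that detail is not stated here.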

Handling Sync Events in Extensions

The event payload includes a list of table names that were updated (filtered to only tables your extension can access). Check if any of your tables are in the list and reload data accordingly.

// Inside haex-vault: stores register for sync events
// (internal API, not used from extensions).
import { registerStoreForTables } from '@/stores/sync/syncEvents'

registerStoreForTables(
  'passwordsStore',
  ['haex_passwords', '*'],
  async ({ tables }) => {
    await reloadAffectedRowsAsync(tables)
  },
)
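On the extension side, the reaction described above boils down to intersecting the payload with the tables the extension owns. A minimal sketch follows, assuming the payload exposes the list as a tables field (mirroring the internal handler above). The subscription call itself is left out because the vault-sdk API is not shown here; the type and function names are hypothetical.

```typescript
// ASSUMPTION: the payload exposes the updated table names as `tables`.
interface TablesUpdatedPayload {
  tables: string[]
}

// Returns the extension's own tables that appear in the payload.
function affectedOwnTables(payload: TablesUpdatedPayload, ownTables: string[]): string[] {
  return payload.tables.filter((table) => ownTables.indexOf(table) !== -1)
}

// Inside the extension's handler for haextension:sync:tables-updated,
// reload only when the result is non-empty:
//   const affected = affectedOwnTables(payload, MY_TABLES)
//   if (affected.length > 0) { /* reload data for `affected` */ }
```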