LibreChat/packages/data-schemas/misc/ferretdb/ferretdb-multitenancy-plan.md
Danny Avila 18f4e47afd
🐘 feat: FerretDB Compatibility (#11769)
* feat: replace unsupported MongoDB aggregation operators for FerretDB compatibility

Replace $lookup, $unwind, $sample, $replaceRoot, and $addFields aggregation
stages which are unsupported on FerretDB v2.x (postgres-documentdb backend).

- Prompt.js: Replace $lookup/$unwind/$project pipelines with find().select().lean()
  + attachProductionPrompts() batch helper. Replace $group/$replaceRoot/$sample
  in getRandomPromptGroups with distinct() + Fisher-Yates shuffle.
- Agent/Prompt migration scripts: Replace $lookup anti-join pattern with
  distinct() + $nin two-step queries for finding un-migrated resources.
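The `distinct()` + Fisher-Yates replacement for `$sample` can be sketched as below; `fisherYatesShuffle` and `pickRandom` are illustrative names, not LibreChat's actual helpers, and the id list is assumed to come from a prior `distinct()` call:

```typescript
// Sketch of the $sample replacement: FerretDB v2.x does not support the
// $sample aggregation stage, so candidate ids are fetched via distinct()
// and a random subset is picked in application code.
function fisherYatesShuffle<T>(items: T[], rand: () => number = Math.random): T[] {
  const arr = items.slice(); // do not mutate the caller's array
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1)); // uniform index in [0, i]
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

function pickRandom<T>(items: T[], n: number, rand?: () => number): T[] {
  return fisherYatesShuffle(items, rand).slice(0, n);
}
```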

All replacement patterns verified against FerretDB v2.7.0.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: use $pullAll for simple array removals, fix memberIds type mismatches

Replace $pull with $pullAll for exact-value scalar array removals. Both
operators work on MongoDB and FerretDB, but $pullAll is more explicit for
exact matching (no condition expressions).

Fix critical type mismatch bugs where ObjectId values were used against
String[] memberIds arrays in Group queries:
- config/delete-user.js: use string uid instead of ObjectId user._id
- e2e/setup/cleanupUser.ts: convert userId.toString() before query

Harden PermissionService.bulkUpdateResourcePermissions abort handling to
prevent crash when abortTransaction is called after commitTransaction.

All changes verified against FerretDB v2.7.0 and MongoDB Memory Server.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: harden transaction support probe for FerretDB compatibility

Commit the transaction before aborting in supportsTransactions probe, and
wrap abortTransaction in try-catch to prevent crashes when abort is called
after a successful commit (observed behavior on FerretDB).
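The hardened probe can be sketched as follows; the session is modeled as a minimal interface so the hardening logic is visible without a live driver, and `supportsTransactions` here is an illustrative stand-in for the real probe, which uses a driver `ClientSession`:

```typescript
// Hedged sketch of the hardened transaction-support probe: commit first,
// and never let an abort-after-commit error escape (observed on FerretDB).
interface TxnSession {
  startTransaction(): void;
  commitTransaction(): Promise<void>;
  abortTransaction(): Promise<void>;
}

async function supportsTransactions(session: TxnSession): Promise<boolean> {
  try {
    session.startTransaction();
    // Commit before any abort: leaving the probe transaction open is what
    // previously forced an abort-after-commit on FerretDB.
    await session.commitTransaction();
    return true;
  } catch {
    return false;
  } finally {
    try {
      await session.abortTransaction();
    } catch {
      // FerretDB may throw if abort is called after a successful commit;
      // swallow that instead of crashing the probe.
    }
  }
}
```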

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat: add FerretDB compatibility test suite, retry utilities, and CI config

Add comprehensive FerretDB integration test suite covering:
- $pullAll scalar array operations
- $pull with subdocument conditions
- $lookup replacement (find + manual join)
- $sample replacement (distinct + Fisher-Yates)
- $bit and $bitsAllSet operations
- Migration anti-join pattern
- Multi-tenancy (useDb, scaling, write amplification)
- Sharding proof-of-concept
- Production operations (backup/restore, schema migration, deadlock retry)

Add production retryWithBackoff utility for deadlock recovery during
concurrent index creation on FerretDB/DocumentDB backends.

Add UserController.spec.js tests for deleteUserController (runs in CI).

Configure jest and eslint to isolate FerretDB tests from CI pipelines:
- packages/data-schemas/jest.config.mjs: ignore misc/ directory
- eslint.config.mjs: ignore packages/data-schemas/misc/

Include Docker Compose config for local FerretDB v2.7 + postgres-documentdb,
dedicated jest/tsconfig for the test files, and multi-tenancy findings doc.

Co-authored-by: Cursor <cursoragent@cursor.com>

* style: brace formatting in aclEntry.ts modifyPermissionBits

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor: reorganize retry utilities and update imports

- Moved retryWithBackoff utility to a new file `retry.ts` for better structure.
- Updated imports in `orgOperations.ferretdb.spec.ts` to reflect the new location of retry utilities.
- Removed old import statement for retryWithBackoff from index.ts to streamline exports.

* test: add $pullAll coverage for ConversationTag and PermissionService

Add integration tests for deleteConversationTag verifying $pullAll
removes tags from conversations correctly, and for
syncUserEntraGroupMemberships verifying $pullAll removes user from
non-matching Entra groups while preserving local group membership.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-03-10 23:15:52 -04:00


# FerretDB Multi-Tenancy Plan
## Status: Active Investigation
## Goal
Database-per-org data isolation using FerretDB (PostgreSQL-backed) with horizontal sharding across multiple FerretDB+Postgres pairs. MongoDB and AWS DocumentDB are not options.
---
## Findings
### 1. FerretDB Architecture (DocumentDB Backend)
FerretDB with `postgres-documentdb` does **not** create separate PostgreSQL schemas per MongoDB database. All data lives in a single `documentdb_data` PG schema:
- Each MongoDB collection → `documents_<id>` + `retry_<id>` table pair
- Catalog tracked in `documentdb_api_catalog.collections` and `.collection_indexes`
- `mongoose.connection.useDb('org_X')` creates a logical database in DocumentDB's catalog
**Implication**: No PG-level schema isolation, but logical isolation is enforced by FerretDB's wire protocol layer. Backup/restore must go through FerretDB, not raw `pg_dump`.
### 2. Schema & Index Compatibility
All 29 LibreChat Mongoose models and 98 custom indexes work on FerretDB v2.7.0:
| Index Type | Count | Status |
|---|---|---|
| Sparse + unique | 9 (User OAuth IDs) | Working |
| TTL (expireAfterSeconds) | 8 models | Working |
| partialFilterExpression | 2 (File, Group) | Working |
| Compound unique | 5+ | Working |
| Concurrent creation | All 29 models | No deadlock (single org) |
### 3. Scaling Curve (Empirically Tested)
| Orgs | Collections | Catalog Indexes | Data Tables | pg_class | Init/org | Query avg | Query p95 |
|------|-------------|-----------------|-------------|----------|----------|-----------|-----------|
| 10 | 450 | 1,920 | 900 | 5,975 | 501ms | 1.03ms | 1.44ms |
| 50 | 1,650 | 7,040 | 3,300 | 20,695 | 485ms | 1.00ms | 1.46ms |
| 100 | 3,150 | 13,440 | 6,300 | 39,095 | 483ms | 0.83ms | 1.13ms |
**Key finding**: Init time and query latency are flat through 100 orgs. No degradation.
### 4. Write Amplification
User model (11+ indexes) vs zero-index collection: **1.11x** — only 11% overhead. DocumentDB's JSONB index management is efficient.
### 5. Sharding PoC
Tenant router proven with:
- Pool assignment with capacity limits (fill-then-spill)
- Warm cache routing overhead: **0.001ms** (about one microsecond)
- Cold routing (DB lookup + connection + model registration): **6ms**
- Cross-pool data isolation confirmed
- Express middleware pattern (`req.getModel('User')`) works transparently
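The fill-then-spill assignment with a warm cache can be sketched as below; `Pool` and `TenantRouter` are illustrative names for the PoC's router, not its actual API:

```typescript
// Minimal sketch of the tenant router: fill each pool to capacity before
// spilling to the next, and cache assignments so repeat lookups skip the scan.
interface Pool {
  name: string;
  capacity: number; // max orgs per FerretDB+Postgres pair (~500 in the plan)
  orgs: Set<string>;
}

class TenantRouter {
  private cache = new Map<string, Pool>(); // warm path: org -> pool

  constructor(private pools: Pool[]) {}

  assign(orgId: string): Pool {
    const cached = this.cache.get(orgId);
    if (cached) return cached; // warm hit: no pool scan
    // Cold path: honor an existing assignment first, so routing is stable.
    for (const pool of this.pools) {
      if (pool.orgs.has(orgId)) {
        this.cache.set(orgId, pool);
        return pool;
      }
    }
    // New org: first pool with free capacity (fill-then-spill).
    for (const pool of this.pools) {
      if (pool.orgs.size < pool.capacity) {
        pool.orgs.add(orgId);
        this.cache.set(orgId, pool);
        return pool;
      }
    }
    throw new Error('all pools at capacity: provision a new FerretDB+Postgres pair');
  }
}
```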
### 6. Scaling Thresholds
| Org Count | Postgres Instances | Notes |
|-----------|-------------------|-------|
| 1-300 | 1 | Default config |
| 300-700 | 1 | Tune autovacuum, PgBouncer, shared_buffers |
| 700-1,000 | 1-2 | Split when monitoring signals pressure |
| 1,000+ | N / ~500 each | One FerretDB+Postgres pair per ~500 orgs |
### 7. Deadlock Behavior
- **Single org, concurrent index creation**: No deadlock (DocumentDB handles it)
- **Bulk provisioning (10 orgs sequential)**: Deadlock occurred on Pool B, recovered via retry
- **Production requirement**: Exponential backoff + jitter retry on `createIndexes()`
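A minimal sketch of that requirement, using the same 100ms base and 10s cap described later in this document; the production utility lives at `packages/data-schemas/src/utils/retryWithBackoff.ts`, and this version just makes the delay computation pure so it is easy to reason about:

```typescript
// Exponential backoff + jitter for deadlock-prone createIndexes() calls.
// "Equal jitter": delay is drawn from [exp/2, exp), where exp doubles per
// attempt up to the cap, so retries spread out instead of re-colliding.
function backoffDelay(
  attempt: number,
  baseMs = 100,
  capMs = 10_000,
  rand: () => number = Math.random,
): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + rand() * (exp / 2);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  { retries = 5, baseMs = 100, capMs = 10_000 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // budget exhausted: surface the error
      await sleep(backoffDelay(attempt, baseMs, capMs));
    }
  }
}
```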
---
## Open Items
### A. Production Deadlock Retry ✅
- [x] Build `retryWithBackoff` utility with exponential backoff + jitter
- [x] Integrate into `initializeOrgCollections` and `migrateOrg` scripts
- [x] Tested against FerretDB — real deadlocks detected and recovered:
  - `retry_4` hit a deadlock on `createIndexes(User)`, recovered via backoff (1,839ms total)
  - `retry_5` also hit the retry path (994ms vs ~170ms clean)
- Production utility at `packages/data-schemas/src/utils/retryWithBackoff.ts`
### B. Per-Org Backup/Restore ✅
- [x] `mongodump`/`mongorestore` CLI not available — tested programmatic driver-level approach
- [x] **Backup**: `listCollections()` → `find({}).toArray()` per collection → in-memory `OrgBackup` struct
- [x] **Restore**: `collection.insertMany(docs)` per collection into fresh org database
- [x] **BSON type preservation verified**: ObjectId, Date, String all round-trip correctly
- [x] **Data integrity verified**: `_id` values, field values, document counts match exactly
- [x] **Performance**: Backup 24ms, Restore 15ms (8 docs across 29 collections)
- [x] Scales linearly with document count — no per-collection overhead beyond the query
### C. Schema Migration Across Orgs ✅
- [x] `createIndexes()` is idempotent — re-init took 86ms with 12 indexes unchanged
- [x] **New collection propagation**: Added `AuditLog` collection with 4 indexes to 5 orgs — 109ms total
- [x] **New index propagation**: Added compound `{username:1, createdAt:-1}` index to `users` across 5 orgs — 22ms total
- [x] **Full migration run**: 5 orgs × 29 models = 88ms/org average (with deadlock retry)
- [x] **Data preservation confirmed**: All existing user data intact after migration
- [x] Extrapolating: 1,000 orgs × 88ms/org = ~88 seconds for a full migration sweep
---
## Test Files
| File | Purpose |
|---|---|
| `packages/data-schemas/src/methods/multiTenancy.ferretdb.spec.ts` | 5-phase benchmark (useDb mapping, indexes, scaling, write amp, shared collection) |
| `packages/data-schemas/src/methods/sharding.ferretdb.spec.ts` | Sharding PoC (router, assignment, isolation, middleware pattern) |
| `packages/data-schemas/src/methods/orgOperations.ferretdb.spec.ts` | Production operations (backup/restore, migration, deadlock retry) |
| `packages/data-schemas/src/utils/retryWithBackoff.ts` | Production retry utility |
## Docker
| File | Purpose |
|---|---|
| `docker-compose.ferretdb.yml` | Single FerretDB + Postgres (dev/test) |
---
## Detailed Empirical Results
### Deadlock Retry Behavior
The `retryWithBackoff` utility was exercised under real FerretDB load. Key observations:
| Scenario | Attempts | Total Time | Notes |
|---|---|---|---|
| Clean org init (no contention) | 1 | 165-199ms | Most orgs complete in one shot |
| Deadlock on User indexes | 2 | 994ms | Single retry recovers cleanly |
| Deadlock with compounding retries | 2-3 | 1,839ms | Worst case in 5-org sequential batch |
The `User` model (11+ indexes including 9 sparse unique) is the most deadlock-prone collection. The retry utility's exponential backoff with jitter (100ms base, 10s cap) handles this gracefully.
### Backup/Restore Round-Trip
Tested with a realistic org containing 4 populated collections:
| Operation | Time | Details |
|---|---|---|
| Backup (full org) | 24ms | 8 docs across 29 collections (25 empty) |
| Restore (to new org) | 15ms | Including `insertMany()` for each collection |
| Index re-creation | ~500ms | Separate `initializeOrgCollections` call |
Round-trip verified:
- `_id` (ObjectId) preserved exactly
- `createdAt` / `updatedAt` (Date) preserved
- String, Number, ObjectId ref fields preserved
- Document counts match source
For larger orgs (thousands of messages/conversations), backup time scales linearly with document count. The bottleneck is network I/O to FerretDB, not serialization.
### Schema Migration Performance
| Operation | Time | Per Org |
|---|---|---|
| Idempotent re-init (no changes) | 86ms | 86ms |
| New collection + 4 indexes | 109ms | 22ms/org |
| New compound index on users | 22ms | 4.4ms/org |
| Full migration sweep (29 models) | 439ms | 88ms/org |
Migration is safe to run while the app is serving traffic — `createIndexes` and `createCollection` are non-blocking operations that don't lock existing data.
### 5-Org Provisioning with Production Retry
```
retry_1: 193ms (29 models) — clean
retry_2: 199ms (29 models) — clean
retry_3: 165ms (29 models) — clean
retry_4: 1839ms (29 models) — deadlock on User indexes, recovered
retry_5: 994ms (29 models) — deadlock on User indexes, recovered
Total: 3,390ms for 5 orgs (678ms avg, 199ms median)
```
---
## Production Recommendations
### 1. Org Provisioning
Use `initializeOrgCollections()` from `packages/data-schemas/src/utils/retryWithBackoff.ts` for all new org setup. Process orgs in batches of 10 with `Promise.all()` to parallelize across pools while minimizing per-pool contention.
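That batching scheme can be sketched as below; `provisionOrg` is a hypothetical stand-in for `initializeOrgCollections` wrapped in the deadlock retry:

```typescript
// Split the org list into fixed-size chunks and run each chunk with
// Promise.all, so concurrency is bounded per batch (10 in the plan).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function provisionAll(
  orgIds: string[],
  provisionOrg: (orgId: string) => Promise<void>,
  batchSize = 10,
): Promise<void> {
  for (const batch of chunk(orgIds, batchSize)) {
    await Promise.all(batch.map(provisionOrg)); // parallel within a batch
  }
}
```

Sequential batches keep per-pool contention low (the deadlock findings above came from sequential bulk provisioning), while parallelism inside a batch spreads load across pools.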
### 2. Backup Strategy
Implement driver-level backup (not `mongodump`):
- Enumerate collections via `listCollections()`
- Stream documents via `find({}).batchSize(1000)` for large collections
- Write to object storage (S3/GCS) as NDJSON per collection
- Restore via `insertMany()` in batches of 1,000
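A hedged sketch of the NDJSON round-trip, assuming Dates are the main BSON type that plain JSON would flatten; ObjectIds are treated as strings here, and a real implementation would use the driver's EJSON helpers for full BSON fidelity:

```typescript
// One JSON document per line (NDJSON), with Dates tagged so they survive
// the round-trip. The replacer reads this[key] because JSON.stringify
// applies Date.prototype.toJSON before the replacer sees the value.
type Doc = Record<string, unknown>;

function dateReplacer(this: Record<string, unknown>, key: string, value: unknown): unknown {
  const raw = this[key];
  return raw instanceof Date ? { $date: raw.toISOString() } : value;
}

function toNdjson(docs: Doc[]): string {
  return docs.map((doc) => JSON.stringify(doc, dateReplacer)).join('\n');
}

function dateReviver(_key: string, value: unknown): unknown {
  if (value && typeof value === 'object' && typeof (value as { $date?: unknown }).$date === 'string') {
    return new Date((value as { $date: string }).$date);
  }
  return value;
}

function fromNdjson(ndjson: string): Doc[] {
  return ndjson
    .split('\n')
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line, dateReviver) as Doc);
}
```

On restore, the parsed documents would be fed to `insertMany()` in batches of 1,000 as described above.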
### 3. Schema Migrations
Run `migrateAllOrgs()` as a deployment step:
- Enumerate all org databases from the assignment table
- For each org: register models, `createCollection()`, `createIndexesWithRetry()`
- `createIndexes()` is idempotent — safe to re-run
- At 88ms/org, 1,000 orgs complete in ~90 seconds
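The sweep loop can be sketched as below, with hypothetical helpers: `listOrgIds` reads the assignment table and `initOrg` wraps `createCollection()` plus `createIndexesWithRetry()`:

```typescript
// Migration sweep: visit every org, keep going past failures, and report
// stragglers at the end so a single bad org does not block the deploy.
async function migrateAllOrgs(
  listOrgIds: () => Promise<string[]>,
  initOrg: (orgId: string) => Promise<void>,
): Promise<{ migrated: number; failed: string[] }> {
  const failed: string[] = [];
  let migrated = 0;
  for (const orgId of await listOrgIds()) {
    try {
      await initOrg(orgId); // idempotent: safe to re-run on every deploy
      migrated++;
    } catch {
      failed.push(orgId); // retry budget exhausted; report, keep sweeping
    }
  }
  return { migrated, failed };
}
```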
### 4. Monitoring
Track per-org provisioning and migration times. If the median provisioning time rises above 500ms/org, investigate PostgreSQL catalog pressure:
- `pg_stat_user_tables.n_dead_tup` for autovacuum health
- `pg_stat_bgwriter.buffers_backend` for buffer pressure
- `documentdb_api_catalog.collections` count for total table count