Qwen3 Max Thinking's response to the 'Debug This Architecture' challenge.
Here's the architecture for a real-time collaborative document editor (like Google Docs):

**System Overview:**
- Frontend: React SPA with WebSocket connections
- Backend: Node.js API servers behind a load balancer (round-robin)
- Database: PostgreSQL for document storage, Redis for session cache
- Real-time: Each API server maintains its own WebSocket connections to clients
- Sync Strategy: Last-write-wins with timestamps from client clocks
- Storage: Documents saved as full HTML snapshots every 30 seconds
- Auth: JWT tokens with 24-hour expiry, stored in localStorage
- CDN: CloudFront for static assets, also caches API responses for 5 minutes

**Data Flow:**
1. User types → change event sent via WebSocket to their connected server
2. Server writes change to PostgreSQL
3. Server broadcasts change to all WebSocket clients connected to THAT server
4. Other servers poll PostgreSQL every 2 seconds for changes
5. Conflict resolution: if two users edit the same paragraph, last timestamp wins

**Scaling Plan:**
- Horizontal scaling by adding more API servers
- Database read replicas for read-heavy operations
- Document partitioning by organization ID

Identify all potential failure modes, race conditions, and scaling bottlenecks in this architecture. For each issue found, propose a specific solution with trade-offs.

| Issue | Impact | Solution | Trade-offs |
|---|---|---|---|
| JWT Stored in localStorage | Any XSS flaw lets injected script read the token and impersonate the user until it expires (up to 24 hours). | Store the token in an HttpOnly cookie with SameSite=Strict; implement CSRF tokens for HTTP endpoints. Use short-lived access tokens (<5 min) plus refresh tokens. | Adds CSRF complexity; requires secure cookie handling. |
| Unvalidated WebSocket Auth | Unauthorized users could inject edits or spy on docs. | Authenticate at WebSocket upgrade using validated JWT. Cache user permissions on connection; verify document access on every message. | Adds latency at connection; requires permission cache invalidation. |
| CDN Caches Mutable API Responses (5-min TTL) | Users see stale documents; breaks real-time collaboration. | Disable CDN caching for dynamic APIs (use Cache-Control: private, no-cache). Serve static assets via CDN only. | Increases API server load; mitigated by read replicas + edge caching of static assets. |
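
The "verify document access on every message" step from the table above can be sketched as follows. This is a minimal illustration, not the author's implementation: the `Session`, `EditMessage`, `authorizeEdit`, and `revokeAccess` names are hypothetical, and it assumes the JWT was already validated once during the WebSocket upgrade, with the user's permissions loaded into the session at that point.

```typescript
// Hypothetical types for illustration; none of these names come from the original design.
type Role = "viewer" | "editor";

interface Session {
  userId: string;
  // Permissions cached when the connection was authenticated: docId -> role.
  permissions: Map<string, Role>;
}

interface EditMessage {
  docId: string;
  op: string;
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Run for every incoming WebSocket message, before the edit is applied.
function authorizeEdit(session: Session, msg: EditMessage): Verdict {
  const role = session.permissions.get(msg.docId);
  if (role === undefined) {
    return { ok: false, reason: "no access to document" };
  }
  if (role !== "editor") {
    return { ok: false, reason: "read-only access" };
  }
  return { ok: true };
}

// Permission changes must evict the cached entry; otherwise a revoked
// user could keep editing until the connection is closed.
function revokeAccess(session: Session, docId: string): void {
  session.permissions.delete(docId);
}
```

The trade-off named in the table shows up in `revokeAccess`: the cache keeps per-message checks cheap, but every permission change has to reach every live session that holds a copy.
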

| Issue | Impact | Solution | Trade-offs |
|---|---|---|---|
| Client Timestamps for LWW | Clock skew lets a stale edit overwrite a newer one; even with accurate clocks, LWW cannot merge concurrent edits (e.g., two users typing in the same paragraph: one user's changes are silently discarded). | Replace with CRDTs (Yjs, Automerge) or OT. Use server-assigned sequence numbers or version vectors for ordering instead of client clocks. | High engineering complexity; larger message payloads (CRDT) or transformation logic (OT). |
| Per-Keystroke DB Writes | PostgreSQL becomes write bottleneck; I/O saturation; latency spikes. | Client-side batching (send every 500ms or 10 chars). Server-side buffering: queue changes → batch write to DB or dedicated write-optimized log (Apache Kafka → async DB persist). | Risk of data loss if batch fails; requires client queue + retransmission logic. |
| Full HTML Snapshots Every 30s | Massive storage bloat, write amplification, high DB cost. | Store operation log (deltas only). Generate snapshots asynchronously to cheap object stores (S3). Use CRDT to reconstruct state. | Adds recovery complexity; requires snapshot generation workers. |
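
The client-side batching rule from the table above (send every 500 ms or every 10 characters, whichever comes first) can be sketched like this. The `ChangeBatcher` class and its method names are hypothetical, and the current time is passed in explicitly so the logic can be exercised without real timers; in a browser, `tick` would presumably be driven by `setInterval`.

```typescript
// Hypothetical sketch; thresholds mirror the table row (500 ms or 10 chars).
type Flush = (batch: string, sentAt: number) => void;

class ChangeBatcher {
  private buffer = "";
  private firstBufferedAt: number | null = null;
  private readonly onFlush: Flush;
  private readonly maxChars: number;
  private readonly maxDelayMs: number;

  constructor(onFlush: Flush, maxChars = 10, maxDelayMs = 500) {
    this.onFlush = onFlush;
    this.maxChars = maxChars;
    this.maxDelayMs = maxDelayMs;
  }

  // Record typed text; flushes immediately once the size threshold is hit.
  push(text: string, now: number): void {
    if (this.firstBufferedAt === null) this.firstBufferedAt = now;
    this.buffer += text;
    if (this.buffer.length >= this.maxChars) this.send(now);
  }

  // Call periodically; flushes if the oldest buffered text is too old.
  tick(now: number): void {
    if (this.firstBufferedAt !== null && now - this.firstBufferedAt >= this.maxDelayMs) {
      this.send(now);
    }
  }

  private send(now: number): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = "";
    this.firstBufferedAt = null;
    this.onFlush(batch, now);
  }
}
```

This sketch does not cover the retransmission logic mentioned in the trade-offs column: a real client would keep each batch until the server acknowledges it.
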

| Issue | Impact | Solution | Trade-offs |
|---|---|---|---|
| Server-Limited Broadcast + 2s Polling | Edits reach clients on other servers 2+ s late; every server polling PostgreSQL hammers the DB with reads; missed updates. | Deploy Redis Pub/Sub or Kafka. On edit, publish to a per-document channel (Redis) or a topic keyed by document ID (Kafka). Each server subscribes for the documents its clients have open and broadcasts immediately. | Adds external infra dependency. Redis Pub/Sub = non-durable (OK for real-time); Kafka = durable but heavier footprint. |
| No Client Reconnection Recovery | Lost edits during disconnect; stale UI after reconnect. | Client buffers unsynced ops (IndexedDB). Server assigns monotonic version IDs. On reconnect: client sends last seen version; server sends missing ops (or full CRDT snapshot). | Client-side state management; requires versioning logic. |
| No Sticky Sessions for WebSockets | Reconnections routed randomly; broken state sync. | Configure LB with session affinity (cookie or IP hash). | Uneven server distribution (hotspots). Mitigate with connection limits + LB health checks. |
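
The reconnection flow described in the table above (server-assigned monotonic versions, client reports its last seen version, server replies with the missing ops or a full snapshot) can be sketched as an in-memory log. All names here (`DocumentLog`, `catchUp`, `maxRetained`) are hypothetical, and string concatenation stands in for applying a real operation or CRDT update.

```typescript
// Hypothetical sketch of version-based catch-up after a reconnect.
interface Op {
  version: number;
  payload: string;
}

type CatchUp =
  | { kind: "ops"; ops: Op[] }
  | { kind: "snapshot"; version: number; state: string };

class DocumentLog {
  private ops: Op[] = [];
  private nextVersion = 1;
  private state = "";
  private readonly maxRetained: number;

  constructor(maxRetained = 1000) {
    this.maxRetained = maxRetained;
  }

  // The server assigns the version, so ordering never depends on client clocks.
  append(payload: string): Op {
    const op: Op = { version: this.nextVersion++, payload };
    this.state += payload; // stand-in for applying the operation
    this.ops.push(op);
    if (this.ops.length > this.maxRetained) this.ops.shift();
    return op;
  }

  // Reply to a reconnecting client that reports the last version it saw.
  catchUp(lastSeenVersion: number): CatchUp {
    const first = this.ops[0];
    const oldest = first !== undefined ? first.version : this.nextVersion;
    if (lastSeenVersion + 1 < oldest) {
      // Needed ops were already trimmed: fall back to a full snapshot.
      return { kind: "snapshot", version: this.nextVersion - 1, state: this.state };
    }
    return { kind: "ops", ops: this.ops.filter((op) => op.version > lastSeenVersion) };
  }
}
```

The retention limit is the design knob here: a longer log means fewer snapshot transfers after long disconnects, at the cost of more memory or storage per document.
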

| Issue | Impact | Solution | Trade-offs |
|---|---|---|---|
| Organization-Based Partitioning | Hot org → server overload; cold orgs leave capacity idle. | Document-level sharding with consistent hashing. Use service discovery (etcd/ZooKeeper) to map doc → server. | Higher routing complexity; requires dynamic partition management. |
| Primary DB Single Point of Failure | Full system outage on primary failure. | PostgreSQL HA cluster (Patroni + streaming replication). Use write-through cache (Redis) for document state to reduce DB load. | Failover latency (seconds); cache invalidation complexity. |
| Read Replica Staleness on Document Load | UI jumps as stale doc loads → real-time update arrives. | Read initial doc from primary DB OR write-through Redis cache (updated via pub/sub on edit). | Increases primary load; cache adds ops overhead. |
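
The document-level sharding with consistent hashing proposed in the table above can be sketched as a small hash ring. This is an illustrative sketch under stated assumptions: the `HashRing` class, the server names, and the choice of 100 virtual nodes per server are all hypothetical, the hash is an FNV-1a-style function over UTF-16 code units, and the linear scan in `lookup` would normally be a binary search.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units (illustrative choice).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical consistent-hash ring mapping document IDs to servers.
class HashRing {
  private ring: { point: number; server: string }[] = [];
  private readonly replicas: number;

  constructor(servers: string[], replicas = 100) {
    this.replicas = replicas;
    for (const server of servers) this.add(server);
  }

  // Each server gets several virtual nodes to spread load more evenly.
  add(server: string): void {
    for (let i = 0; i < this.replicas; i++) {
      this.ring.push({ point: fnv1a(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(server: string): void {
    this.ring = this.ring.filter((entry) => entry.server !== server);
  }

  // Owner is the first virtual node at or after the key's hash, wrapping around.
  lookup(docId: string): string {
    const h = fnv1a(docId);
    const entry = this.ring.find((e) => e.point >= h) ?? this.ring[0];
    if (entry === undefined) throw new Error("no servers in ring");
    return entry.server;
  }
}
```

The property that matters for the scaling plan is that removing a server only reassigns the documents that server owned; every other document keeps its owner, so most WebSocket sessions are unaffected by a resize.
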

| Issue | Impact | Solution | Trade-offs |
|---|---|---|---|
| No Offline Support | Edits lost during brief disconnects. | Client buffers ops in IndexedDB. Sync on reconnect with conflict resolution. | Increased frontend complexity; UX for conflict resolution. |
| No Document Access Validation | Users could subscribe to unauthorized docs via WebSocket. | On connection: validate user has access to document (via DB/cache). Subscribe only to permitted topics. | Adds latency; requires permission cache with TTL/invalidation. |
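| No Rate Limiting | Malicious clients flood server with edits. | Rate-limit messages per user on the WebSocket server (e.g., max 10 ops/sec), since a load balancer cannot see individual messages inside an established connection; limit connection attempts at the LB/gateway. | May block legitimate fast typists; configure thresholds carefully. |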
| Stateless JWT + Redis Session Cache Mismatch | Ambiguous session management (JWT stateless vs Redis cache). | Clarify Redis is used for: (1) Pub/Sub (critical) and (2) Document state cache. Remove session storage from Redis. | Eliminates confusion; aligns with JWT statelessness. |
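
One common way to implement the per-user rate limit from the table above is a token bucket, which allows short bursts while capping the sustained rate. The sketch below is an assumption about how it might look, not part of the original design: the `TokenBucket` name and its parameters are hypothetical, and time is passed in explicitly (in milliseconds) so the behavior is deterministic.

```typescript
// Hypothetical token bucket: `capacity` bounds bursts, `refillPerSec` bounds the sustained rate.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, now: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so a new connection can burst
    this.lastRefill = now;
  }

  // Returns true if the message may proceed, false if it should be rejected.
  tryConsume(now: number): boolean {
    if (now > this.lastRefill) {
      const elapsedSec = (now - this.lastRefill) / 1000;
      this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
      this.lastRefill = now;
    }
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Setting the capacity above the per-second rate (for example, a burst of 30 with a refill of 10 per second) is one way to soften the "blocks legitimate fast typists" trade-off, especially when edits are batched on the client.
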
Trade-off Philosophy: Accept increased engineering complexity in exchange for correctness, scalability, and user trust. Real-time co-editing is a hard distributed systems problem; shortcuts (client clocks, polling, LWW) create fragility at scale. Invest in battle-tested patterns (CRDT, pub/sub, HA DB) to build a resilient product.