Technology Web Development

Deep Dive into Real-Time Frameworks: From REST to CRDTs

Building a collaborative AI whiteboard taught us everything about real-time communication - from basic polling to sophisticated conflict-free data structures. Here's the complete journey.

“Why do I have to refresh to see David’s changes?”

Sarah Kim slammed her laptop shut with more force than necessary. The 26-year-old Product Manager had cut her teeth at Google, where “seamless user experience” wasn’t just a buzzword—it was religion. She could spot UX friction from a mile away, and right now their prototype collaborative whiteboard, “MeetMind,” was a masterclass in everything that frustrated users.

“Maya!” she called across the open-plan office, her voice carrying the exasperation of someone who’d spent the last hour watching changes appear and disappear like magic tricks. “I just lost twenty minutes of sprint planning notes. Again.”

Maya Chen looked up from her MacBook, coffee cup suspended halfway to her lips. At 28, she’d already survived three startup acquisitions and had the particular look of a Senior Frontend Engineer who knew exactly what was wrong but hoped nobody would ask—the kind of engineer who could debug a race condition at 2 AM and still show up bright-eyed for standup. “The polling is killing us,” she admitted. “We’re hammering the API every 5 seconds and it’s still not fast enough.”

Alex Rodriguez spun around in his chair, abandoning the server logs that had been consuming his morning. The 32-year-old backend engineer had earned his gray hairs optimizing databases at scale-ups, and his instinct for performance bottlenecks was legendary—he could smell a memory leak from three terminal windows away. “Sarah’s team of 12 people is generating 144 requests per minute just checking for updates,” he said, adjusting his glasses with the precision of someone who’d spent the last two hours in htop. “And that’s before we even turn on the AI analysis features I’ve been building.”

This is the story of how three engineers went from basic REST APIs to building a production-ready real-time collaborative system that works seamlessly offline, handles voice streaming for AI-powered meeting analysis, and scales to hundreds of concurrent users. Along the way, they discovered that real-time isn’t just about speed—it’s about fundamentally rethinking how data flows between humans and machines.

What started as a simple polling problem would lead them through Server-Sent Events, WebSockets, conflict-free replicated data types, and eventually into the cutting edge of real-time voice analysis. Each solution solved one problem only to reveal three more.

Chapter 1: The Humble Beginning

When Everything Was Simple (And Slow)

Let me paint you a picture of MeetMind Version 0.1. Picture the most basic collaborative document you can imagine—Google Docs from 2006, but somehow worse.

Maya had started MeetMind the way every developer starts: with the RESTful CRUD pattern that feels as natural as breathing. Clean, predictable, and completely inadequate for real-time collaboration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// src/lib/api.js - The "simple" days
class DocumentAPI {
  async getDocument(docId) {
    const response = await fetch(`/api/documents/${docId}`);
    return response.json();
  }

  async updateDocument(docId, changes) {
    const response = await fetch(`/api/documents/${docId}`, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(changes)
    });
    return response.json();
  }
}

// The dreaded polling loop
setInterval(async () => {
  const latestDoc = await api.getDocument(currentDocId);
  if (latestDoc.updatedAt > lastKnownUpdate) {
    updateUI(latestDoc);
    lastKnownUpdate = latestDoc.updatedAt;
  }
}, 5000); // Poll every 5 seconds

This architecture worked beautifully in Maya’s localhost world of one. But then Sarah, with her product manager’s instinct for breaking things at scale, decided to do what product managers do best: invite her entire 12-person team to stress-test the prototype simultaneously during their weekly sprint planning session.

The Problems Started Immediately:

David Thompson, a 35-year-old freelance strategy consultant who’d made peace with Portland’s coffee shop WiFi being his primary office, was the first casualty. Working on project specifications between sips of his third espresso, he’d type a paragraph, save it, and watch it vanish into the digital void—only to reappear 5 seconds later like a ghost.

“This is completely unusable,” David’s voice crackled over the Slack call, barely audible over coffee shop chatter. “I’m trying to document requirements here, and I genuinely can’t tell if my changes are saved or if the app is having an existential crisis. My paragraphs keep doing this disappearing act.”

Lisa Wang, their 29-year-old UX Designer with an eye for detail that bordered on obsessive (she once redesigned a form seven times to shave off two clicks), was collaborating with Kevin in Tokyo on wireframe layouts and had her own horror stories. “Picture this: I move a button from the header to the sidebar. Feels good, right? Then—boom—five seconds later it teleports to wherever Kevin just moved it. We’re not collaborating, we’re fighting each other through the medium of UI elements.”

Maya measured the damage:

  • User Experience: 5-second delays killed the flow state completely
  • Server Load: 12 users × 12 requests/minute = 144 requests just for updates
  • Waste: 90% of those requests returned “no changes”
  • Battery Drain: Mobile users complained about rapid battery consumption
  • Race Conditions: Users would make conflicting edits during the 5-second gaps

But the nail in the coffin came when they added the AI features Alex had been working on.

The AI Analysis Disaster

Alex had built what he considered his magnum opus: an LLM integration that could analyze meeting content in real-time and provide insights. It could detect when conversations were getting heated, automatically identify action items, and suggest relevant documents from their knowledge base. The AI was brilliant—when it worked.

The problem was that processing a document through their AI pipeline took anywhere from 2-15 seconds depending on content length. And Alex, in his backend engineer wisdom, had made the classic mistake of blocking everything on the AI response.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# server.py - The blocking nightmare
@app.route('/api/documents/<doc_id>', methods=['PUT'])
def update_document(doc_id):
    # Update the document
    doc = update_document_in_db(doc_id, request.json)
    
    # Run AI analysis - THIS BLOCKS EVERYTHING
    ai_insights = analyze_document_with_llm(doc.content)  # 2-15 seconds!
    doc.ai_insights = ai_insights
    save_document(doc)
    
    return jsonify(doc.to_dict())

The first time Sarah’s team tried to collaboratively edit a document while AI analysis was running, the entire system had what could charitably be called a nervous breakdown.

“I clicked save and the page just… died,” Sarah explained the next morning, gesturing dramatically with her coffee mug. “Twelve seconds of nothing. Complete radio silence. Then suddenly everyone’s changes came flooding in at once like a dam burst, completely out of order, overlapping, conflicting. It was chaos.”

Maya pulled up Chrome DevTools to investigate, and what she saw in the Network tab made her stomach drop. Requests were queuing up like cars in a traffic jam, timing out, retrying, failing. The beautiful clean API she’d built had become a digital pileup.

The Async Awakening

Alex, staring at his server logs with the thousand-yard stare of a backend engineer whose system was melting down, had a moment of clarity.

“What if,” he said slowly, turning away from his monitor, “we just… don’t make users wait for the AI analysis?”

Maya looked up from her debugging session. “You mean…?”

“Decouple them. Save the document immediately, queue the AI analysis for later. Users get instant feedback, AI gets its time to think.”

It seems obvious in retrospect—the kind of solution that makes you wonder why you didn’t think of it earlier. But at the time, it felt revolutionary:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# server.py - Version 2: Basic async processing
import asyncio
from queue import Queue

ai_processing_queue = Queue()

@app.route('/api/documents/<doc_id>', methods=['PUT'])
def update_document(doc_id):
    # Update document immediately
    doc = update_document_in_db(doc_id, request.json)
    
    # Queue AI processing for later
    ai_processing_queue.put({
        'doc_id': doc_id,
        'content': doc.content,
        'version': doc.version
    })
    
    return jsonify(doc.to_dict())  # Return immediately!

# Separate background worker
def ai_worker():
    while True:
        task = ai_processing_queue.get()
        try:
            insights = analyze_document_with_llm(task['content'])
            update_ai_insights_in_db(task['doc_id'], insights, task['version'])
        except Exception as e:
            log_error(f"AI processing failed: {e}")

The improvement was dramatic. Document updates that used to take 12 seconds now returned in 200 milliseconds. Users could edit freely while AI analysis happened in its own background world, no longer holding the entire application hostage.

Sarah was thrilled. “Finally! I can actually type without the app having an existential crisis.”

But Alex’s solution created a new problem: users had no idea when AI analysis was complete. They’d make changes and then… wait. Check back in a few minutes. Maybe refresh the page.

Maya’s solution was pragmatic, if not elegant—more polling:

1
2
3
4
5
6
7
8
// Check for AI updates every 10 seconds
setInterval(async () => {
  const insights = await api.getAIInsights(currentDocId);
  if (insights && insights.version > lastKnownAIVersion) {
    displayInsights(insights);
    lastKnownAIVersion = insights.version;
  }
}, 10000);

This worked, technically. But Maya stared at her code with growing unease. They now had two different polling loops running constantly: one checking for document changes every 5 seconds, another checking for AI insights every 10 seconds. Her laptop fan was spinning like a helicopter, and users were still waiting up to 10 seconds to see their AI analysis results.

“We’re building a Rube Goldberg machine,” Maya muttered to herself. “There has to be a better way.”

The Real-Time Epiphany

The breakthrough came during one of Maya’s late-night debugging sessions. She was watching the Network tab, tracing through the polling requests, when she noticed something that made her sit up straight.

She could see the exact moment when AI analysis completed on the server. The database would update, server logs would cheerfully announce “Analysis complete for doc_123,” and the data would be sitting there, ready to go. But the user, blissfully unaware, would keep polling every 10 seconds like a patient person knocking on a door.

“We’re living in the computational stone age,” Maya said to the empty office, her voice echoing off the exposed brick walls. “The server knows when things happen. We have the information. Why are we making users play this guessing game?”

Alex, still hunched over his monitor debugging memory leaks, looked up. “What if we could just… tell them? Like, immediately?”

It was one of those beautifully simple questions that changes everything.

The next morning, Maya started researching Server-Sent Events (SSE). The concept was beautifully simple: instead of the client constantly asking “anything new?” the server could just announce “hey, something new happened!”

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
// The SSE revelation
const eventSource = new EventSource(`/api/documents/${docId}/stream`);

eventSource.onmessage = function(event) {
  const update = JSON.parse(event.data);
  
  switch(update.type) {
    case 'document_changed':
      updateDocument(update.document);
      break;
    case 'ai_analysis_complete':
      displayInsights(update.insights);
      break;
    case 'user_joined':
      showUserPresence(update.user);
      break;
  }
};

For the first time since starting MeetMind, document updates felt instantaneous. Sarah could type a sentence and watch it appear on David’s screen in real-time, even with his unreliable coffee shop WiFi. When AI analysis completed, insights would pop up immediately instead of waiting for the next polling cycle.

“This feels like magic,” Lisa said during their next team demo. “It’s like the app is actually alive.”

The Performance Revolution

The numbers told the whole story:

Before (REST + Polling) (estimate):

  • Average update latency: 2.5 seconds (polling interval / 2)
  • Worst case latency: 5 seconds
  • Server requests per minute: 144 (just for updates)
  • AI insight delivery: 5-15 seconds after completion
  • Battery impact: High (constant network activity)

After (REST + SSE) (estimate):

  • Average update latency: 200-400ms (network time only)
  • Worst case latency: 800ms
  • Server requests per minute: 24 (actual updates only)
  • AI insight delivery: 200-400ms after completion
  • Battery impact: Minimal (server pushes only when needed)

For two blissful weeks, Maya felt like she’d conquered real-time collaboration. Users were happy, AI insights flowed seamlessly, and the server wasn’t crying under load.

Then Sarah came to the weekly standup with a new request that would change everything.

“Hey Maya,” Sarah said, pulling up her laptop. “The instant updates are amazing, but I have a problem. Yesterday I spent 20 minutes polishing a section of our product spec, only to find out David had been editing the exact same section the whole time. We basically overwrote each other’s work without realizing it.”

David’s voice crackled over the video call from his Portland coffee shop. “Yeah, it’s weird. I can see Sarah’s changes appear instantly now, which is great. But I have no idea when she’s actually working on something. I can’t tell where she’s looking, what she’s thinking about, whether she’s about to edit the paragraph I’m editing. We’re like ghosts to each other until our changes materialize.”

Maya suddenly realized the fundamental limitation of their Server-Sent Events approach: it was a one-way street.

Chapter 2: The WebSocket Revolution

The Invisible Problem

Three weeks after deploying SSE, Maya was feeling pretty good about herself. The performance metrics were stellar, users were happy, and Sarah had stopped complaining about refresh delays. But then Sarah raised her hand during the weekly standup.

“Hey Maya, I love that I can see David’s changes instantly now, but… I have no idea when he’s actually working. Yesterday I spent 20 minutes editing a paragraph, only to discover David had been working on the same section. We basically overwrote each other’s work.”

David nodded from his video call, coffee shop noise echoing in the background. “Yeah, and when I’m trying to reference something Sarah wrote, I can’t tell where she’s looking or what she’s focused on. It feels like we’re working in parallel universes that occasionally sync up.”

SSE was perfect for the server broadcasting “hey, something changed!” But when users wanted to send lightweight signals back—cursor positions, typing indicators, “I’m looking at this section”—they were stuck. SSE couldn’t handle client-to-server communication at all, and making full HTTP requests for every cursor movement would be ridiculous.

What they needed was presence: the ability to see cursors moving in real-time, know when someone was typing, understand who was looking at what section of the document. All the subtle social cues that make in-person collaboration natural.

“We need bidirectional communication,” Maya announced to Alex, gesturing at her whiteboard covered in message flow diagrams. “SSE solved the server-to-client problem beautifully, but we need the client-to-server piece too. And not with HTTP requests—those are too heavy for cursor positions.”

Enter WebSockets

Alex looked up from his monitor, where he’d been hunting down a memory leak that was eating 200MB of RAM every hour. “WebSockets?”

“Exactly!” Maya’s eyes lit up with the particular enthusiasm of an engineer who’d just seen the solution. “Think about it—one persistent connection per user. We keep all the instant server-to-client updates we love from SSE, but now users can send lightweight events back. Cursor positions, typing indicators, selection changes, presence info. No HTTP overhead, no request/response dance, just pure bidirectional real-time communication.”

Alex saved his debugging session and spun his chair around to face her fully. “So we’re talking about a persistent connection that can handle both document updates and all the ephemeral presence stuff?”

“Exactly. It’s like SSE grew up and learned to talk back.”

The concept was elegant: establish a persistent connection when a user opens a document, then stream events bidirectionally until they close it. Unlike HTTP’s request-response pattern, WebSockets allowed true real-time conversation between client and server.

Maya spent the weekend in pure development mode, building a WebSocket-based collaboration layer from scratch:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
// src/lib/collaboration/websocket.js
class CollaborationSocket {
  constructor(docId, userId) {
    this.docId = docId;
    this.userId = userId;
    this.socket = null;
    this.handlers = new Map();
    this.connect();
  }

  connect() {
    const wsUrl = `wss://${window.location.host}/collab/${this.docId}?userId=${this.userId}`;
    this.socket = new WebSocket(wsUrl);
    
    this.socket.onopen = () => {
      console.log('Connected to collaboration socket');
      this.emit('presence', { 
        type: 'join', 
        cursor: null, 
        selection: null 
      });
    };

    this.socket.onmessage = (event) => {
      const data = JSON.parse(event.data);
      const handler = this.handlers.get(data.type);
      if (handler) handler(data);
    };

    this.socket.onclose = () => {
      console.log('Collaboration socket closed, attempting to reconnect...');
      setTimeout(() => this.connect(), 1000);
    };
  }

  emit(type, payload) {
    if (this.socket && this.socket.readyState === WebSocket.OPEN) {
      this.socket.send(JSON.stringify({ 
        type, 
        payload,
        timestamp: Date.now()
      }));
    }
  }

  on(type, handler) {
    this.handlers.set(type, handler);
  }
}

The Magic Moment

The first time Sarah and David tested the new WebSocket implementation felt like watching magic happen in real-time.

Sarah opened the document and started typing a sentence about user onboarding. On his screen in Portland, David watched her cursor move letter by letter, saw her select a word, watched her pause to think. When she highlighted a section, it lit up on his screen instantly.

“Holy shit,” David’s voice came through the Slack call, barely containing his excitement. “Sarah, I can see exactly where you’re looking. I can see you thinking. This feels like we’re sitting at the same desk, editing the same piece of paper.”

Sarah grinned, selecting another paragraph just to watch David react. “You can see this selection?”

“Instantly. And there’s this little ‘Sarah is typing…’ indicator that just appeared. Maya, this is incredible.”

Maya had implemented several presence features:

  1. Real-time cursors: See where everyone is looking
  2. Live selections: Highlighted text shows what others are selecting
  3. Typing indicators: Know when someone is actively editing
  4. User avatars: Visual indication of who’s in the document
  5. Edit highlights: Briefly highlight where changes are made

The technical implementation was surprisingly elegant. Maya created a presence manager that could track and display multiple users’ activities:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
// src/lib/collaboration/presence.js
class PresenceManager {
  constructor(collaborationSocket, editor) {
    this.socket = collaborationSocket;
    this.editor = editor;
    this.cursors = new Map(); // userId -> cursor element
    this.throttledCursorUpdate = this.throttle(this.sendCursorUpdate.bind(this), 100);
    
    this.setupEventListeners();
    this.setupSocketHandlers();
  }

  setupEventListeners() {
    // Track local cursor and selection changes
    this.editor.on('cursorActivity', () => {
      this.throttledCursorUpdate();
    });
    
    // Track typing indicators
    let typingTimer = null;
    this.editor.on('keydown', () => {
      this.socket.emit('presence', { type: 'typing', isTyping: true });
      
      clearTimeout(typingTimer);
      typingTimer = setTimeout(() => {
        this.socket.emit('presence', { type: 'typing', isTyping: false });
      }, 1000);
    });
  }

  sendCursorUpdate() {
    const cursor = this.editor.getCursor();
    this.socket.emit('cursor_position', {
      line: cursor.line,
      ch: cursor.ch,
      timestamp: Date.now()
    });
  }

  throttle(func, delay) {
    let timeoutId;
    let lastExecTime = 0;
    return function (...args) {
      const currentTime = Date.now();
      
      if (currentTime - lastExecTime > delay) {
        func.apply(this, args);
        lastExecTime = currentTime;
      } else {
        clearTimeout(timeoutId);
        timeoutId = setTimeout(() => {
          func.apply(this, args);
          lastExecTime = Date.now();
        }, delay - (currentTime - lastExecTime));
      }
    };
  }
}

The Performance Sweet Spot

The WebSocket implementation delivered everything they’d hoped for:

The User Experience:

  • Instant presence awareness: Users could see who was active and where they were working
  • Real-time collaboration: Changes appeared immediately on all screens
  • Smooth interactions: Cursor movements and selections felt natural and responsive
  • Reduced conflicts: Knowing where others are working prevented accidental overwrites

The Technical Metrics (estimate):

  • Connection overhead: One persistent connection per user instead of constant HTTP requests
  • Latency: Sub-200ms for presence updates, maintaining the fast document sync from SSE
  • Bandwidth efficiency: Tiny JSON messages for cursor positions and presence updates
  • Battery life: Eliminated the constant polling, extending mobile device battery life

During their next all-hands demo, Sarah practically bounced in her chair as she shared her screen. “Watch this,” she said, opening a document where David was already working. His cursor was visible, moving through a paragraph about feature requirements.

“I can see exactly where David is focused,” Sarah explained to the team. “When he starts selecting text, I know he’s about to make a change. When I see his cursor in a section I want to edit, I can just… work somewhere else for a minute. It’s like pair programming, but for document writing. We’re finally collaborating instead of accidentally sabotaging each other.”

The Growing Complexity

But success bred new challenges. As more teams started using MeetMind, Maya began noticing problems she hadn’t anticipated.

The Race Condition Problem: Two users would edit the same paragraph simultaneously. Even with WebSockets, the order of edits wasn’t guaranteed. Sometimes Sarah’s edit would arrive first, sometimes David’s. The document would flicker between states, and occasionally, one person’s work would vanish entirely.

The Offline Nightmare: David’s coffee shop WiFi would cut out mid-edit. When he reconnected, his changes were gone. Even worse, the document was in a completely different state—other people had kept working while he was offline, and there was no way to reconcile his local changes with the server state.

The Scale Challenge: With 50+ concurrent users across multiple documents, the WebSocket server was struggling. Each presence update was being broadcast to every connected user, creating an O(n²) message amplification problem.

Maya found herself staying later and later, debugging increasingly complex scenarios that kept her staring at her monitor until 2 AM:

1
2
3
4
5
6
// The nightmare debugging session
WebSocket connection lost at 14:32:15
User had unsaved changes: "Modified paragraph 3, added bullet point"
Server state at reconnect: "Paragraph 3 deleted by another user"
User's local state: "Paragraph 3 with new bullet point"
// How do we reconcile this?

“You know what our problem is?” Alex said, looking up from his third cup of coffee during what had become a nightly ritual of debugging sessions. “We’re fighting the fundamental problem of distributed systems. When you have multiple sources of truth—and every connected client is a source of truth—conflicts aren’t just possible, they’re mathematically inevitable.”

Maya stepped back from the whiteboard that had become her obsession—a maze of arrows showing message flows, conflict scenarios, and failed resolution strategies. Every solution they’d tried just moved the problem around.

“There has to be a better way,” she said, more to herself than to Alex. “Google Docs doesn’t have these problems. Figma doesn’t have these problems. Notion, Miro, Linear—they all handle real-time collaboration without users losing work. What are we missing? What do they know that we don’t?”

That question would lead them down a rabbit hole that would reshape their entire understanding of real-time collaboration.


Maya had built a beautiful real-time system with WebSockets, but she was about to discover that real-time communication was only the beginning. The hard part wasn’t getting the messages to flow—it was making sure those messages resulted in a consistent, conflict-free state for everyone.

What happens when two people edit the same sentence simultaneously? How do you handle users going offline mid-edit? And how do you scale this to hundreds of concurrent users without the system melting down?

Coming up: The complexity wall that every real-time collaborative application hits, and why sometimes the simplest-looking solutions require the most sophisticated technology underneath.

Chapter 3: The Complexity Wall

The Research Deep Dive

Maya spent the next week doing what every engineer does when faced with an impossible problem: she disappeared down a research rabbit hole. Her browser tabs multiplied like digital rabbits—academic papers on operational transformation, blog posts about Figma’s architecture, deep dives into Google’s Wave protocol, discussions of conflict-free replicated data types.

Alex found her at 1 AM on Thursday, surrounded by empty coffee cups and Post-it notes covered in illegible diagrams.

“Find anything useful?” he asked, settling into the chair next to her with his own late-night coffee.

Maya turned her laptop screen toward him. “Look at this,” she said, pointing to a blog post titled “You might not need a CRDT” by someone at a company called Jamsocket. “This guy is arguing that most collaborative apps don’t actually use CRDTs—they use centralized conflict resolution.”

“CRDTs?” Alex squinted at the screen.

“Conflict-free Replicated Data Types. Think of them as data structures that automatically merge changes without conflicts, even when multiple people edit simultaneously.” Maya’s eyes had that particular gleam of an engineer who’d found something fascinating. “But here’s the thing—this post says companies like Figma don’t use them. They use server-ordered operations instead.”

Alex leaned back in his chair. “So there are two different approaches to this problem?”

“At least two,” Maya said, pulling up another tab. “There’s the CRDT approach—make the data structures themselves conflict-free, so any order of operations produces the same result. And there’s the centralized approach—have one authoritative server that decides the order of all operations.”

The Scaling Crisis

Their philosophical discussion was interrupted the next morning when Sarah burst into the office with the kind of energy that usually meant either great news or catastrophic problems.

“We have a situation,” she announced, laptop already open and connecting to the projector. “I just got off a call with that consulting firm in Chicago—they want to use MeetMind for a client workshop next week. Fifty people. All editing the same strategic planning document. At the same time.”

Maya felt her stomach drop. Their current WebSocket solution was already struggling with 10-15 concurrent users. Fifty would be a disaster.

“There’s more,” Sarah continued, pulling up her notes. “They specifically asked about offline support. Apparently, their client has these strategy sessions at remote retreat centers with terrible WiFi. They need people to be able to work offline and sync up when connectivity comes back.”

Alex and Maya exchanged glances. Their WebSocket solution required constant connectivity. If someone went offline, they’d lose all their changes when they reconnected.

“How long do we have?” Maya asked.

“The workshop is in two weeks,” Sarah said. “Can we make it work?”

The Breaking Point

Maya and Alex spent the weekend stress-testing their WebSocket implementation with simulated users. The results were… educational.

“Look at this,” Alex said, pointing to his monitoring dashboard. “With 20 concurrent users, we’re seeing message collisions every few seconds. The server is trying to broadcast every cursor movement, every keystroke, every selection change to every connected client. It’s an O(n²) nightmare.”

Maya nodded grimly, watching the browser performance profiler. “The client can’t keep up either. We’re dropping frames, the UI is stuttering, and that’s just with message processing. When we add actual conflict resolution…”

She trailed off, pulling up a test scenario they’d been running. Two simulated users were editing the same paragraph simultaneously, and the results were chaotic. Text would appear and disappear, merge incorrectly, sometimes duplicate entire sentences.

“We need a fundamentally different approach,” Alex said. “WebSockets solved the bidirectional communication problem, but they didn’t solve the distributed systems problem.”

The Offline Revelation

That’s when Maya had her second epiphany of the month. She was reading about Google Docs’ architecture when she stumbled across a paper about something called Operational Transformation—the algorithm that powers collaborative text editing.

“Alex,” she called across the office, “come look at this.”

The paper was dense with mathematical notation, but the core concept was elegant: instead of sending the final state of changes, send the operations that create those changes. When conflicts arise, transform the operations so they can be applied in any order while producing consistent results.

“It’s like the difference between saying ’the document now contains XYZ’ versus ‘insert ABC at position 5, delete 3 characters at position 12,’” Maya explained. “Operations can be transformed and reordered. Final states… can’t.”

Alex studied the diagrams. “But this still requires a central server to do the transformation, right?”

“For Operational Transform, yes. But that’s where CRDTs come in.” Maya pulled up another paper, this one about Yjs, a JavaScript library that implemented CRDTs for collaborative editing. “CRDTs handle the conflict resolution locally, without needing a central authority. Multiple users can edit offline, and when they reconnect, the changes merge automatically without data loss.”

Alex raised an eyebrow. “Automatically? Like, actually automatically?”

“That’s what they claim. Let me show you something.” Maya pulled up a demo of Yjs running in two browser windows. She typed in one window, disconnected it from the network, typed in the other window, then reconnected the first. The changes merged seamlessly, with no conflicts or lost data.

“Okay, that’s impressive,” Alex admitted. “But there must be trade-offs, right?”

The Architecture Decision

Maya spent the next few days diving deep into both approaches, building simple prototypes and testing edge cases. The trade-offs were becoming clear.

Server-Ordered Operations (like Figma):

  • Simpler to reason about and debug
  • Central authority can enforce business rules and permissions
  • Works well for always-online applications
  • Requires constant connectivity
  • Server becomes a bottleneck at scale

CRDTs (like Yjs):

  • Complex internal mechanics, but simple to use
  • Works offline with automatic merge on reconnection
  • True peer-to-peer collaboration possible
  • Harder to enforce business rules or permissions
  • Some operations can’t be made conflict-free (complex business logic)

“The offline requirement makes the decision for us,” Maya told Alex during their architecture review meeting. “If people need to work offline and sync later, server-ordered operations are out. We can’t have a central authority if clients might not be able to reach it.”

Alex nodded slowly. “And with 50 concurrent users, the server bottleneck would kill us anyway.”

“There’s something else,” Maya continued, pulling up the Yjs documentation. “Look at how they handle presence—cursors, selections, typing indicators. It’s built in, but it’s ephemeral. Presence data doesn’t persist, it just flows between connected clients. So we get all our WebSocket presence features, but the document state itself is conflict-free.”

The Moment of Truth

Sarah walked over to their corner of the office, attracted by the energy of their discussion. “You two look like you’ve figured something out.”

“We think we have,” Maya said. “But it means rewriting our entire collaboration layer.”

She walked Sarah through the decision: replace their custom WebSocket conflict resolution with Yjs, a mature CRDT library. Users could edit offline, changes would sync automatically when connectivity returned, and the system could scale to hundreds of concurrent users without the server becoming a bottleneck.

“The catch,” Alex added, “is that we’re essentially betting our collaborative features on a technology neither of us has used in production before.”

Sarah considered this. “What’s the alternative?”

“Tell the Chicago firm we can’t handle 50 users, can’t support offline editing, and watch them choose a competitor,” Maya said bluntly.

“Or,” Alex added with a slight smile, “spend the next six months building our own conflict resolution system and probably getting it wrong.”

Sarah looked between them. “How confident are you that this CRDT thing will work?”

Maya pulled up the Yjs website, showing a list of companies using it in production: Linear, Nimbus, Tiptap, and dozens of others. “It’s not experimental. Real companies are betting real money on this technology. The question isn’t whether it works—it’s whether we can integrate it fast enough.”

“Two weeks?” Sarah asked.

“Two weeks,” Maya confirmed.

The Deep Dive Begins

That afternoon, Maya cloned the Yjs repository and started reading the source code. What she found was both elegantly simple and mind-bendingly complex.

On the surface, Yjs provided data structures that looked familiar—Y.Map for objects, Y.Array for lists, Y.Text for collaborative text editing. You could manipulate them just like normal JavaScript objects, and the library handled all the conflict resolution automatically.

Under the hood, it was a different story. The library maintained a complex web of vector clocks, operation logs, and state representations that ensured every client could independently resolve conflicts to the same final state.

“It’s like the library has two faces,” Maya explained to Alex over lunch. “The face it shows to developers is dead simple—just modify Y.Map like you would a regular Map. But internally, it’s tracking every operation, assigning unique IDs, building transformation matrices…”

“Sounds like the kind of complexity we should let someone else handle,” Alex replied pragmatically. “If we tried to build this ourselves, we’d spend the next year just on the mathematical foundations.”

Maya nodded. “That’s exactly what I was thinking. We don’t need to understand how CRDTs work internally—we just need to understand how to use them effectively.”

She pulled out her laptop and showed Alex her proof of concept: a simple collaborative text editor built with Yjs that worked across multiple browser tabs, synced through WebSockets, and even persisted to localStorage for offline editing.

“The integration pattern is actually pretty clean,” she continued. “Yjs handles the document state and conflict resolution. We handle the UI, the persistence, and the business logic on top of the collaborative layer.”

The Implementation Strategy

By the end of the week, Maya had outlined their integration strategy:

  1. Replace document state management with Yjs data structures
  2. Keep WebSockets for real-time sync and presence, but use Yjs protocols instead of custom messages
  3. Add offline persistence using Yjs’s built-in IndexedDB support
  4. Maintain server-side persistence by storing Yjs document snapshots and update logs
  5. Layer business logic on top of the CRDT layer for features like permissions and validation

“The beauty of this approach,” Maya explained to the team during Friday’s architecture review, “is that we get to keep all the user experience improvements we’ve built—real-time cursors, presence indicators, instant updates—while solving the fundamental conflict resolution problem.”

Alex nodded approvingly. “And if someone goes offline, edits for an hour, then reconnects, their changes will merge automatically without any ‘please refresh’ messages or lost work.”

“What about performance?” Sarah asked. “Fifty concurrent users still sounds like a lot.”

“CRDTs are designed for this,” Maya replied. “Instead of broadcasting every change to every user, Yjs sends minimal binary updates. Only the actual changes, compressed and efficient. And because conflict resolution happens locally on each client, the server isn’t a bottleneck anymore.”

The Weekend That Changed Everything

Maya spent the weekend building a complete proof of concept—a collaborative document editor that could handle multiple simultaneous editors, work offline, and sync seamlessly when reconnected. She invited David and Lisa to test it remotely.

Sunday evening, she got a Slack message from David: “I just spent 2 hours editing a document while my internet was down. When I reconnected, everything synced perfectly with the changes Lisa made while I was offline. This is exactly what we needed for that Chicago workshop.”

Maya stared at the message, then at her proof of concept running in three browser tabs, then at the clean, elegant Yjs code that made it all possible.

For the first time since starting MeetMind, she felt like they weren’t fighting the technology anymore. They were partnering with it.


Maya had discovered that sometimes the most sophisticated technology is the kind that makes complex problems look simple. CRDTs weren’t just a technical solution—they were a paradigm shift from fighting conflicts to embracing them.

But implementing CRDTs in production would bring its own challenges. How do you migrate existing data? How do you debug distributed state? And how do you build business logic on top of conflict-free foundations?

Coming up: The deep dive into Yjs implementation, the migration strategy, and the unexpected discoveries that come from building on top of conflict-free replicated data types.

Chapter 4: The CRDT Success Story

The Race Against Time

Maya and Alex had two weeks to implement a solution that could handle 50 concurrent users, work offline, and seamlessly sync when connectivity returned. After their deep dive into CRDTs and the decision to use Yjs, they faced the classic startup challenge: make it work, make it right, make it fast—in that order.

“We’re not building a research project,” Maya told Alex as they mapped out their implementation plan. “We’re solving a real business problem with proven technology.”

Their strategy was elegantly simple: leverage Yjs for all the hard parts—conflict resolution, offline sync, and distributed state management—while building a clean integration layer that felt natural to their existing SvelteKit application.

The Implementation Sprint

Maya spent the first week building what she called the “CRDT sandwich”: a thin layer that made Yjs documents behave like normal reactive Svelte stores, hiding all the CRDT complexity from their UI components.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Conceptual approach - hide CRDT complexity behind familiar APIs
function createCollaborativeDocument(docId, token) {
  // Set up Yjs with WebSocket for online sync and IndexedDB for offline persistence
  // Create reactive stores that automatically update when CRDT state changes
  // Handle authentication and permissions at the transport layer
  return {
    title: reactiveStore,    // Updates automatically when anyone changes the title
    sections: reactiveStore, // Real-time collaborative sections
    presence: reactiveStore, // Live cursors and typing indicators
    methods: {
      setTitle: (newTitle) => { /* Updates CRDT, syncs to all clients */ },
      addSection: (type) => { /* Collaborative section management */ }
    }
  };
}

Alex handled the server side, setting up Yjs document persistence and authentication. His approach was equally pragmatic: store Yjs document snapshots in their existing PostgreSQL database, with an append-only log of updates for recovery.

“The beautiful thing about this approach,” Alex explained during their mid-week check-in, “is that we’re not throwing away our existing infrastructure. Yjs handles the complex distributed systems stuff, but we still use our normal database, authentication, and permissions systems.”

The Migration Magic

One of Maya’s cleverest solutions was making the migration from their old system completely invisible to users. Instead of a big-bang migration, they implemented lazy migration: documents were converted to the CRDT format the first time someone opened them after the deployment.

“Users won’t even know it’s happening,” Maya demonstrated, opening a document in their staging environment. “From their perspective, they just open a document and it works exactly like before—except now it supports real-time collaboration and offline editing.”

The Moment of Truth

By Thursday afternoon, they had a complete working system. The test with David and Lisa was everything they’d hoped for: simultaneous editing worked flawlessly, David could edit offline for hours and his changes merged perfectly when he reconnected, and the real-time presence features made collaboration feel natural rather than chaotic.

“This is exactly what we needed for the Chicago workshop,” David said after testing the offline capabilities with his unreliable coffee shop WiFi. “I can work on strategy documents during the flights, in the hotel, wherever, and everything syncs up perfectly when I get back online.”

The Chicago Workshop Success

Two weeks later, Maya got a text message from Sarah that made her week: “Chicago workshop was incredible. 50 people editing the same strategic planning document simultaneously, some working offline during breaks, everything syncing perfectly. The client called it ’transformative.’ We just signed a $200K annual contract.”

But the real validation came from the user feedback. People weren’t just tolerating the collaborative features—they were changing how they worked because of them.

“It’s the confidence factor,” one workshop participant explained. “I know my changes won’t get lost or overwritten, so I’m more willing to contribute ideas. And seeing other people’s cursors and edits in real-time makes the whole process feel more collaborative and less like we’re working in isolation.”

The Unexpected Benefits

The CRDT implementation brought benefits Maya hadn’t anticipated. Because Yjs handled all conflict resolution locally on each client, their servers were no longer the bottleneck for collaboration. They could scale to hundreds of concurrent users without the O(n²) message broadcasting problems that had plagued their WebSocket implementation.

“Our server costs actually went down,” Alex reported during the post-project review. “We’re processing fewer messages, doing less work, and users are getting a better experience. It’s the holy grail of engineering: better performance and lower costs.”

The offline capabilities also opened up new market opportunities. Teams working in environments with unreliable connectivity—field researchers, consultants, remote locations—suddenly had a tool that worked regardless of network conditions.

The Business Transformation

Within three months of the CRDT deployment, MeetMind had transformed from a struggling productivity tool into a must-have collaboration platform. They signed deals with consulting firms, research organizations, and distributed teams who needed rock-solid collaborative editing.

Sarah’s product metrics told the story (estimate):

  • User retention increased by 40%
  • Average session length doubled
  • Customer support tickets for “lost changes” dropped to near zero
  • Net Promoter Score jumped from 6 to 8.2

“We solved a problem our users didn’t even know they had,” Sarah reflected during the quarterly review. “They thought the occasional lost edit or sync conflict was just the price of collaboration tools. When those problems disappeared, they realized how much friction we’d removed from their workflows.”

The Team’s Growth

The CRDT project had also transformed the engineering team. Maya had become genuinely excited about distributed systems and conflict-free data structures. Alex had developed a deep appreciation for choosing the right tool for the job rather than building everything from scratch.

“Six months ago, I would have spent a year trying to build our own operational transform system,” Alex admitted. “Learning to leverage mature technology for the hard problems was a game-changer. Now I can focus on the business logic that actually differentiates our product.”

Even David, their remote consultant, had been so impressed with the collaborative features that he’d started recommending MeetMind to other clients. “It’s the first collaboration tool that actually gets out of my way and lets me focus on the work,” he said.

Looking Forward

By the end of the year, MeetMind had grown from a three-person startup to a 12-person company with real revenue and happy customers. The CRDT foundation had enabled features Maya never would have attempted with their old WebSocket approach: collaborative diagramming, real-time data visualization, and even basic version control for document templates.

“The funny thing is,” Maya told Alex over coffee one Friday afternoon, “the users don’t care about CRDTs or operational transforms or any of the technical complexity underneath. They just know that the tool works the way they expect it to work. And that’s exactly how it should be.”

Sarah joined their conversation with news that would set the stage for their next chapter: “The consulting firms love the collaborative editing, but they want something more. They want real-time insights about their workshops and strategy sessions. Things like: who’s dominating the conversation? Are we getting off track? What topics are we avoiding?”

Maya and Alex looked at each other. They’d mastered conflict-free collaborative editing, but Sarah was describing something entirely different: real-time analysis of human collaboration itself.

“You’re talking about voice analysis,” Maya said slowly. “Processing speech in real-time, understanding conversational dynamics, providing insights without being intrusive.”

“Exactly,” Sarah replied. “The next frontier isn’t just helping people collaborate on documents—it’s helping them understand and improve how they collaborate as humans.”

Alex grinned. “Well, we’ve gotten pretty good at impossible technical challenges.”

And with that, MeetMind’s next chapter began.


Maya and Alex had built something remarkable: a collaborative editing platform that truly got out of the users’ way. But Sarah’s vision for real-time collaboration analysis would push them into entirely new technical territory, combining the stability of their CRDT foundation with the cutting edge of AI-powered conversation analysis.

The question wasn’t whether they could build it—it was whether they could build it without losing the elegant simplicity that had made their collaboration features so successful.


Technical Deep Dive: Real-Time Frameworks for AI Applications

Now that we’ve seen Maya, Alex, and Sarah’s journey from polling to CRDTs, let’s step back and examine the technical landscape they navigated. Here’s a practical guide to choosing the right real-time framework for AI-powered applications, with a focus on text, voice, and multimodal collaboration.

The Real-Time Technology Stack

The journey from REST polling to CRDTs represents a fundamental shift in how we think about real-time data synchronization. Each approach has its place, and understanding when to use which technology is crucial for building AI applications that feel responsive and intelligent.

REST + Polling: The Foundation**

  • Best for: Simple status updates, periodic data fetching, prototype validation
  • AI use cases: Batch processing results, model training status, scheduled report generation
  • Pros: Simple to implement, works with any HTTP client, easy to debug
  • Cons: High latency, server resource waste, poor user experience for real-time needs
  • When to use: When updates happen infrequently (>30 seconds apart) or during MVP phases

Server-Sent Events (SSE): One-Way Streams**

  • Best for: Live data feeds, streaming AI model outputs, real-time notifications
  • AI use cases: LLM token streaming, live transcription display, model inference results
  • Pros: Simple HTTP-based protocol, automatic reconnection, works through proxies
  • Cons: One-way communication only, limited browser connection pools
  • When to use: When you need to stream AI-generated content (text, audio) but don’t need client interaction
1
2
3
4
5
6
// Perfect for streaming LLM responses
const eventSource = new EventSource('/api/chat/stream');
eventSource.onmessage = (event) => {
  const token = JSON.parse(event.data);
  displayToken(token); // Real-time text generation
};

WebSockets: Bidirectional Real-Time**

  • Best for: Interactive AI applications, collaborative editing, voice/video communication
  • AI use cases: Conversational AI, collaborative document editing, real-time voice analysis
  • Pros: Low latency, bidirectional, can handle binary data (audio/images)
  • Cons: More complex than SSE, requires custom reconnection logic, can be blocked by proxies
  • When to use: When you need two-way communication for interactive AI features
1
2
3
4
5
6
7
8
9
// Ideal for interactive AI conversations
const socket = new WebSocket('/api/voice-chat');
socket.onmessage = (event) => {
  const { type, data } = JSON.parse(event.data);
  if (type === 'transcription') showTranscription(data);
  if (type === 'ai_response') playAudioResponse(data);
};
// Send audio chunks as user speaks
socket.send(audioBuffer);

CRDTs: Conflict-Free Collaboration

  • Best for: Offline-first applications, multi-user editing, distributed AI workflows
  • AI use cases: Collaborative AI-assisted writing, distributed model training, shared knowledge bases
  • Pros: Automatic conflict resolution, works offline, scales horizontally
  • Cons: Complex mental model, some operations can’t be made conflict-free
  • When to use: When multiple users need to collaborate on AI-generated content or when offline support is critical
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Collaborative AI-assisted document editing
const doc = new Y.Doc();
const sharedText = doc.getText('content');
const aiSuggestions = doc.getMap('ai_suggestions');

// AI adds suggestions that merge conflict-free with user edits
aiSuggestions.set(`suggestion_${timestamp}`, {
  type: 'grammar_fix',
  range: [start, end],
  suggestion: aiGeneratedText
});

Framework Comparison for AI Applications

Framework Latency Offline Support Scalability Complexity Best AI Use Case
REST + Polling High (2-5s) Poor Low Batch AI processing
Server-Sent Events Low (100-500ms) Good Medium LLM token streaming
WebSockets Very Low (<100ms) Medium Medium Interactive AI chat
CRDTs (Yjs) Low (200-400ms) Excellent High Collaborative AI tools

WebRTC: Peer-to-Peer AI Applications

WebRTC deserves special mention for AI applications involving real-time audio, video, or large data transfers:

  • Best for: Voice-to-voice AI, real-time video processing, edge AI deployment
  • AI use cases: Voice cloning, real-time video filters, distributed AI inference
  • Pros: Ultra-low latency, peer-to-peer reduces server costs, handles multimedia natively
  • Cons: Complex NAT traversal, requires signaling server, limited by network topology
1
2
3
4
5
6
7
8
// Real-time voice AI processing
const peerConnection = new RTCPeerConnection();
const localAudio = await navigator.mediaDevices.getUserMedia({ audio: true });

// Process audio locally with AI, send enhanced version to peer
const audioProcessor = new AudioWorklet('ai-voice-enhancer');
localAudio.addTrack(audioProcessor.outputTrack);
peerConnection.addTrack(audioProcessor.outputTrack);

Socket.IO: The Pragmatic Choice

Socket.IO often represents the sweet spot for AI applications requiring real-time features:

  • Best for: Rapid prototyping, cross-platform compatibility, AI applications with mixed transport needs
  • AI use cases: Multi-modal AI interfaces, collaborative AI workflows, real-time model monitoring
  • Pros: Automatic fallbacks, built-in room management, excellent debugging tools
  • Cons: Larger bundle size, vendor lock-in, can be overkill for simple use cases
1
2
3
4
5
6
7
// Multi-modal AI interaction with automatic transport selection
const socket = io('/ai-workspace');

socket.emit('start_voice_session', { userId, language: 'en' });
socket.on('transcription_partial', (text) => updateTranscript(text));
socket.on('ai_response', (response) => displayResponse(response));
socket.on('collaboration_update', (changes) => applyChanges(changes));

Choosing the Right Framework: Decision Matrix

For LLM-Powered Text Applications:

  • Simple chat interface: WebSockets or Socket.IO
  • Streaming text generation: Server-Sent Events
  • Collaborative writing: CRDTs (Yjs) + WebSockets for presence
  • Document analysis: REST for processing, SSE for results

For Voice AI Applications:

  • Real-time transcription: WebSockets with binary data support
  • Voice-to-voice AI: WebRTC for ultra-low latency
  • Meeting analysis: WebSockets for audio streams + server processing
  • Offline voice notes: CRDTs with local AI processing

For Visual AI Applications:

  • Image generation: REST for heavy processing, SSE for progress updates
  • Real-time filters: WebRTC with Canvas API
  • Collaborative design: CRDTs for document state, WebSockets for cursors/selections
  • Video analysis: WebSockets for metadata, CDN for video streams

Implementation Patterns

The Hybrid Approach (Most Recommended): Many successful AI applications combine multiple technologies:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// Use the right tool for each job
class AIApplication {
  constructor() {
    this.httpClient = axios.create(); // REST for heavy operations
    this.eventSource = null;          // SSE for streaming results  
    this.socket = io();              // WebSockets for interaction
    this.collaborativeDoc = new Y.Doc(); // CRDTs for shared state
  }
  
  async processDocument(doc) {
    // Heavy AI processing via REST
    const jobId = await this.httpClient.post('/api/ai/analyze', doc);
    
    // Stream results via SSE  
    this.eventSource = new EventSource(`/api/ai/stream/${jobId}`);
    this.eventSource.onmessage = (event) => {
      this.updateProgress(JSON.parse(event.data));
    };
  }
  
  startRealTimeChat() {
    // Interactive features via WebSockets
    this.socket.on('ai_message', (message) => this.displayMessage(message));
    this.socket.emit('user_message', userInput);
  }
  
  enableCollaboration() {
    // Shared editing via CRDTs
    this.collaborativeDoc.getText().observe(() => this.syncChanges());
  }
}

Production Considerations

Authentication & Security:

  • JWT tokens work across all transports
  • WebSocket connections need careful authorization
  • CRDTs require access control at the document level

Scaling & Performance:

  • SSE: Use CDN for static content, Redis for pub/sub
  • WebSockets: Sticky sessions or Redis adapter for horizontal scaling
  • CRDTs: Periodic snapshots, update log compaction

Error Handling:

  • Implement exponential backoff for all reconnections
  • Graceful degradation when real-time features fail
  • User feedback for network issues and processing delays

The key lesson from Maya’s journey is that real-time isn’t about choosing the fastest technology—it’s about choosing the right technology for each specific use case. Modern AI applications often benefit from a thoughtful combination of approaches, each handling the aspects they’re best suited for.

Whether you’re building the next collaborative AI writing tool, real-time voice assistant, or distributed machine learning platform, understanding these trade-offs will help you create experiences that feel both intelligent and responsive.


(Written by Human, improved using AI where applicable.)