About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
I have built real-time features into more systems than I can count: chat, live dashboards, IoT telemetry pipelines, collaborative editors, trading feeds, notification systems. Every one of them started with the same question: which protocol? The answer has never been the same twice, and the wrong choice has cost me weeks of rework more than once. WebSockets get all the attention. SSE gets overlooked. gRPC streaming gets misunderstood. Long polling gets dismissed too quickly. MQTT gets ignored entirely outside IoT circles. Each of these protocols solves a different problem, and the differences become painfully obvious only after you have built, deployed, and tried to scale the wrong one.
This is an architecture reference comparing WebSockets, Server-Sent Events, gRPC streaming, HTTP long polling, and MQTT. It covers how each protocol works at the wire level, where each one breaks under load, and when each one earns its place in a production architecture. WebTransport (HTTP/3) gets a section too, because it is coming whether you are ready or not.
The Protocol Landscape
Before diving into each protocol, here is the comparison that would have saved me several bad decisions early in my career:
| Protocol | Direction | Transport | Data Format | Browser Support | Connection Model |
|---|---|---|---|---|---|
| WebSocket | Bidirectional | TCP (upgraded from HTTP) | Text or binary | Full | Persistent, stateful |
| SSE | Server-to-client only | HTTP/1.1 or HTTP/2 | UTF-8 text only | Full (except IE) | Persistent, stateless |
| gRPC | Four modes (unary, server/client/bidi streaming) | HTTP/2 | Protocol Buffers (binary) | Via gRPC-Web proxy | Persistent, multiplexed |
| Long Polling | Server-to-client (simulated) | HTTP/1.1 | Any | Full | Repeated short-lived |
| MQTT | Bidirectional (pub/sub) | TCP or WebSocket | Binary (topic-based) | Via WebSocket bridge | Persistent, lightweight |
| WebTransport | Bidirectional + datagrams | HTTP/3 (QUIC) | Binary or text | Chrome, Edge, Firefox | Persistent, multiplexed |
The direction column is the first filter. If you only need server-to-client updates, SSE handles it with a fraction of the operational complexity of WebSockets. If you need bidirectional communication between microservices, gRPC streaming beats WebSockets on type safety, multiplexing, and protocol efficiency. If your clients are constrained IoT devices on unreliable cellular networks, MQTT was designed for exactly that scenario.
WebSockets
How WebSockets Work
A WebSocket connection starts as an HTTP request. The client sends an Upgrade: websocket header, the server responds with 101 Switching Protocols, and the TCP connection transitions from HTTP to the WebSocket frame protocol. From that point, both sides can send messages at any time with minimal framing overhead (2-14 bytes per frame, depending on payload size and masking).
The protocol operates at a lower level than HTTP. No headers per message. No request/response semantics. Just frames flowing in both directions over a single TCP connection. This makes WebSockets fast: round-trip latency is limited only by network propagation, with no HTTP overhead per message.
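The upgrade handshake itself is small enough to verify by hand. As a sketch, this is the server-side computation of the `Sec-WebSocket-Accept` value from RFC 6455 (the magic GUID is fixed by the spec; the sample key is the one from the RFC):

```python
import base64
import hashlib

# RFC 6455 magic GUID that the server appends to the client's key.
WS_MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must return
    in its 101 Switching Protocols response."""
    digest = hashlib.sha1((sec_websocket_key + WS_MAGIC).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The sample key from RFC 6455 section 1.3:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this one exchange, no further handshaking occurs; everything else is frames.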
Scaling WebSockets
WebSocket scaling is where the engineering gets real. Every active connection consumes server memory (the connection object, send/receive buffers, application state). A single well-tuned server with an event-driven architecture can handle 100,000-240,000 concurrent connections before CPU becomes the bottleneck, primarily from SSL/TLS termination and packet processing.
Getting to one million concurrent connections requires horizontal scaling, and horizontal scaling with WebSockets is harder than with stateless HTTP. The core problem: connections are sticky. User A on Server 1 needs to receive a message from User B on Server 50. You need a message routing layer between servers.
| Scaling Approach | Throughput | Complexity | Use Case |
|---|---|---|---|
| Single server (vertical) | 100K-240K connections | Low | Prototypes, small-scale apps |
| Redis Pub/Sub fan-out | 500K-1M connections | Medium | Chat, notifications |
| NATS JetStream routing | 1M+ connections | Medium-High | High-throughput event systems |
| Dedicated WebSocket platform (Ably, Pusher) | Millions | Low (managed) | When you do not want to operate infrastructure |
| Custom sharded architecture | Millions | High | Large-scale consumer apps (Discord, Slack) |
Failure Modes
WebSocket connections die silently. Unlike HTTP requests that return error codes, a WebSocket connection can enter a half-open state where one side believes the connection is alive while the other has moved on. TCP keepalives help, but they operate on timescales of minutes. Application-level heartbeats (ping/pong frames every 15-30 seconds) are mandatory in production.
Proxies and load balancers add another failure layer. Many HTTP proxies terminate idle connections after 60 seconds. Nginx defaults to proxy_read_timeout 60s for WebSocket connections. Cloud load balancers have their own idle timeouts: ALB defaults to 60 seconds, and you need to set the idle timeout higher for long-lived WebSocket connections. I have debugged WebSocket disconnection bugs that turned out to be a proxy three hops away silently closing connections after 90 seconds of inactivity.
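A minimal Nginx location block for proxying WebSocket upgrades looks roughly like this (the `/ws/` path and `backend` upstream are placeholders; the directives are standard Nginx):

```nginx
location /ws/ {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    # Raise the idle timeouts well above the 60s default so quiet
    # connections are not severed between heartbeats.
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
```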
Reconnection logic is entirely your responsibility. The browser's WebSocket API has no auto-reconnect. You implement exponential backoff, state reconciliation after reconnection, and buffering of messages that arrived while disconnected. Libraries like Socket.IO and reconnecting-websocket handle some of this, but they add their own protocol layer and trade-offs.
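The backoff half of that logic fits in a few lines. A sketch of full-jitter exponential backoff, which spreads reconnect attempts out so a server restart does not trigger a thundering herd of simultaneous reconnects:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: the upper bound grows as
    base * 2^attempt (capped), and the actual delay is a uniform
    random value below it."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# Upper bounds for the first few reconnect attempts: 0.5s, 1s, 2s, 4s, 8s
for attempt in range(5):
    print(f"attempt {attempt}: up to {min(30.0, 0.5 * 2 ** attempt):.1f}s")
```

State reconciliation and message buffering remain application-specific; the backoff schedule is the only part that generalizes cleanly.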
Server-Sent Events (SSE)
How SSE Works
SSE uses plain HTTP. The client sends a GET request; the server responds with Content-Type: text/event-stream and keeps the connection open. The server writes events in a simple text format:
```
event: price-update
data: {"symbol": "AAPL", "price": 187.42}
id: 12847

event: price-update
data: {"symbol": "GOOG", "price": 174.89}
id: 12848
```
That is the entire protocol. No handshake upgrade. No binary framing. No special headers beyond the content type. The simplicity is the feature.
Built-In Reconnection
SSE's reconnection handling is automatic and robust. When a connection drops, the browser's EventSource API reconnects after a configurable retry interval (default: 3 seconds). On reconnection, it sends a Last-Event-ID header containing the ID of the last event received. The server can use this to resume the stream exactly where the client left off.
Compare this to WebSockets, where you build all reconnection and state recovery logic from scratch. For server-to-client streaming, SSE gives you production-grade reliability out of the box.
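Producing this format on the server is a one-function job. A minimal serializer sketch (on reconnect, a real handler would read the client's `Last-Event-ID` request header and replay events with higher IDs):

```python
from typing import Optional

def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one SSE event: 'field: value' lines terminated by a
    blank line. Multi-line data becomes repeated 'data:' lines, which
    the browser's EventSource rejoins with newlines."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    return "\n".join(lines) + "\n\n"

print(format_sse('{"symbol": "AAPL", "price": 187.42}',
                 event="price-update", event_id="12847"))
```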
The LLM Streaming Connection
SSE has experienced a resurgence because of large language models. When ChatGPT, Claude, and every other LLM streams tokens back to the browser, they use SSE. The pattern fits perfectly: unidirectional server-to-client streaming of text data with automatic reconnection. OpenAI's streaming API, Anthropic's Messages API, and nearly every LLM provider uses SSE for token streaming. If you are building anything that wraps an LLM, SSE is the protocol you will use for the frontend.
Limitations and Gotchas
SSE has real constraints that surface in production:
| Limitation | Impact | Workaround |
|---|---|---|
| Text only (UTF-8) | Cannot stream binary data natively | Base64 encoding (33% overhead) or use WebSockets for binary |
| Unidirectional | Client cannot send data over the SSE connection | Use standard HTTP POST/PUT for client-to-server messages |
| Connection limit (HTTP/1.1) | 6 connections per domain per browser, across all tabs | Use HTTP/2 (raises limit to ~100) or a single multiplexed stream |
| Proxy buffering | Intermediaries buffer the stream, delaying delivery | Disable buffering in Nginx (proxy_buffering off), add X-Accel-Buffering: no header |
| No built-in backpressure | Fast server can overwhelm slow clients | Application-level flow control or rate limiting |
The proxy buffering issue is the most common production surprise. A reverse proxy between your server and the client can legally buffer the entire stream and only forward it when the connection closes. I have seen SSE streams arrive as a single burst of hundreds of events after a proxy accumulated 30 seconds of data. The fix is always the same: disable response buffering in every proxy in the chain.
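For Nginx specifically, the fix looks roughly like this (the `/events` path and `backend` upstream are placeholders):

```nginx
location /events {
    proxy_pass http://backend;
    # Forward events as they are written instead of accumulating them.
    proxy_buffering off;
    proxy_cache off;
    # Alternatively, the upstream app can send "X-Accel-Buffering: no"
    # on its response to disable buffering for that request only.
}
```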
gRPC Streaming
Architecture and HTTP/2 Foundation
gRPC is an RPC framework, not a messaging protocol. That distinction matters. You define service contracts in Protocol Buffer .proto files, generate client and server stubs, and make calls that look like local function invocations. The transport layer is HTTP/2, which gives gRPC connection multiplexing (many concurrent RPCs over a single TCP connection), header compression (HPACK), and flow control (built-in backpressure).
Protocol Buffers serialize data into compact binary format, typically 3-10x smaller than equivalent JSON. Combined with HTTP/2's binary framing and header compression, gRPC moves data over the wire significantly more efficiently than any text-based protocol.
Four Streaming Modes
gRPC supports four interaction patterns:
| Mode | Client Sends | Server Sends | Use Case |
|---|---|---|---|
| Unary | 1 request | 1 response | Standard RPC calls |
| Server streaming | 1 request | Stream of responses | Fetching large datasets, real-time feeds |
| Client streaming | Stream of requests | 1 response | File uploads, telemetry ingestion |
| Bidirectional streaming | Stream of requests | Stream of responses | Chat, collaborative editing, interactive sessions |
Server streaming is the gRPC equivalent of SSE, with better performance characteristics. Bidirectional streaming competes with WebSockets, with the advantage of strong typing, built-in flow control, and no custom protocol to design.
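All four modes are expressed in the service contract itself via the `stream` keyword. An illustrative `.proto` sketch (the service and message names here are invented for the example):

```protobuf
syntax = "proto3";

// One service showing all four gRPC interaction modes.
service Feed {
  rpc GetSnapshot (Query) returns (Update);              // unary
  rpc Subscribe (Query) returns (stream Update);         // server streaming
  rpc UploadTelemetry (stream Update) returns (Ack);     // client streaming
  rpc Session (stream Update) returns (stream Update);   // bidirectional
}

message Query  { string topic = 1; }
message Update { string topic = 1; bytes payload = 2; }
message Ack    { uint32 received = 1; }
```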
Performance Characteristics
gRPC delivers roughly 10x the throughput of REST+JSON for equivalent payloads, primarily due to Protocol Buffer serialization and HTTP/2 framing. A single HTTP/2 connection supports up to 100 concurrent streams (configurable), and each stream can carry an independent RPC without head-of-line blocking at the HTTP level (though TCP head-of-line blocking still applies).
Flow control in HTTP/2 prevents fast producers from overwhelming slow consumers. The receiver advertises a window size, and the sender stops transmitting when the window fills. This built-in backpressure eliminates an entire class of buffer overflow bugs that plague WebSocket implementations.
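The window mechanics can be modeled in a few lines. A toy sketch (65,535 bytes is the HTTP/2 default initial window; real implementations track a window per stream and per connection):

```python
class FlowWindow:
    """Toy model of HTTP/2 flow control: the sender may only transmit
    while the receiver-advertised window has room, and the window
    refills when the receiver sends WINDOW_UPDATE frames."""

    def __init__(self, size: int = 65_535):  # HTTP/2 default initial window
        self.available = size

    def send(self, nbytes: int) -> int:
        """Try to send nbytes; returns how many bytes were allowed."""
        allowed = min(nbytes, self.available)
        self.available -= allowed
        return allowed

    def window_update(self, increment: int) -> None:
        self.available += increment

w = FlowWindow(size=10)
print(w.send(8))   # 8 - fits in the window
print(w.send(8))   # 2 - only 2 bytes of window remain
print(w.send(1))   # 0 - sender must pause until a WINDOW_UPDATE
w.window_update(5)
print(w.send(1))   # 1 - window replenished, sending resumes
```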
Browser Limitations
gRPC does not work natively in browsers. HTTP/2's binary framing requires a level of network control that browser APIs do not expose. gRPC-Web exists as a workaround: a JavaScript client that speaks a modified gRPC protocol, with a proxy (Envoy, typically) translating between gRPC-Web and native gRPC. Client streaming and bidirectional streaming are not supported through gRPC-Web; only unary and server streaming work.
For browser-facing real-time features, gRPC-Web with server streaming is viable. For bidirectional browser communication, use WebSockets or SSE. For backend-to-backend communication, native gRPC is the clear winner.
HTTP Long Polling
How Long Polling Works
Long polling simulates server push using standard HTTP. The client sends a request; the server holds the connection open until it has new data or a timeout expires. When the server responds, the client immediately sends another request. The result: near-real-time server-to-client updates using nothing more than HTTP/1.1.
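The server-side "hold" step reduces to a blocking wait with a timeout. A sketch using a queue to stand in for whatever event source the handler consumes:

```python
import queue

def long_poll(events: "queue.Queue[str]", timeout: float = 30.0):
    """Block until an event arrives or the timeout expires, mirroring
    the hold-the-request-open step. Returns the event, or None for an
    empty (timed-out) response that tells the client to re-poll."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None

q: "queue.Queue[str]" = queue.Queue()
q.put("price: 187.42")
print(long_poll(q, timeout=0.1))  # "price: 187.42" - data was waiting
print(long_poll(q, timeout=0.1))  # None - timed out, client polls again
```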
```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client->>Server: GET /events (request 1)
    Note over Server: Holds connection open<br/>waiting for data...
    Server-->>Client: Response with new data
    Client->>Server: GET /events (request 2)
    Note over Server: Holds connection open<br/>30s timeout...
    Server-->>Client: Empty response (timeout)
    Client->>Server: GET /events (request 3)
    Note over Server: New data arrives<br/>after 5 seconds
    Server-->>Client: Response with new data
```
When Long Polling Makes Sense
Long polling is the protocol of last resort and the protocol of first deployment. It works through every proxy, firewall, and load balancer ever built. It requires no special server configuration. It uses standard HTTP semantics that every monitoring tool, debugging proxy, and CDN understands.
I use long polling in two scenarios. First, when the infrastructure is locked down and WebSocket upgrades are blocked (corporate networks, heavily proxied environments). Second, as the initial implementation before I know whether WebSockets justify the operational complexity. Many teams start with long polling and never migrate because the update frequency (a few times per minute) never demands anything faster.
The Overhead Tax
Long polling pays a tax on every message: TCP connection setup, TLS handshake (if HTTPS), HTTP headers, and response processing. For infrequent updates (every 5-30 seconds), this overhead is negligible. For high-frequency updates (multiple per second), the overhead compounds rapidly. At 10 updates per second, long polling generates 10 full HTTP request-response cycles per second per client. WebSockets would handle the same throughput with near-zero per-message overhead after the initial handshake.
MQTT
Protocol Design for Constrained Networks
MQTT was designed in 1999 by Andy Stanford-Clark (IBM) and Arlen Nipper for connecting oil pipeline sensors over satellite links. That origin explains everything about its design: minimal bandwidth consumption, tolerance for unreliable networks, and a tiny client footprint. The protocol header is 2 bytes. A connect packet is as small as 14 bytes. Implementations exist for microcontrollers with 32KB of RAM.
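That 14-byte figure is easy to verify. A sketch that assembles a minimal MQTT 3.1.1 CONNECT packet (clean session, empty client ID; a single remaining-length byte is valid here because the body is under 128 bytes):

```python
def minimal_connect(keepalive: int = 60) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet: clean session,
    empty client ID - 14 bytes total."""
    var_header = (
        b"\x00\x04MQTT"                 # length-prefixed protocol name
        + b"\x04"                       # protocol level 4 (MQTT 3.1.1)
        + b"\x02"                       # connect flags: clean session
        + keepalive.to_bytes(2, "big")  # keepalive interval in seconds
    )
    payload = b"\x00\x00"               # zero-length client ID
    body = var_header + payload
    # Fixed header: packet type CONNECT (0x10) + remaining length.
    return bytes([0x10, len(body)]) + body

pkt = minimal_connect()
print(len(pkt))  # 14
```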
Publish/Subscribe Architecture
MQTT uses a broker-based pub/sub model. Clients publish messages to topics and subscribe to topic patterns. The broker handles all routing. Clients never communicate directly.
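Subscription patterns use two wildcards: `+` matches exactly one topic level, and `#` (valid only as the last level) matches all remaining levels. A simplified matcher sketch (real brokers add rules for `$`-prefixed system topics, omitted here):

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """MQTT topic filter matching: '+' matches exactly one level,
    '#' (last level only) matches all remaining levels."""
    f_parts, t_parts = filter_.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)

print(topic_matches("sensors/#", "sensors/pressure/psi"))  # True
print(topic_matches("sensors/+", "sensors/pressure"))      # True
print(topic_matches("sensors/+", "sensors/pressure/psi"))  # False
```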
```mermaid
flowchart LR
    S1[Temperature<br/>Sensor] -->|Publish<br/>QoS 0| B[MQTT<br/>Broker]
    S2[Pressure<br/>Sensor] -->|Publish<br/>QoS 1| B
    S3[Safety<br/>Valve] -->|Publish<br/>QoS 2| B
    B -->|Subscribe<br/>sensors/#| D1[Dashboard]
    B -->|Subscribe<br/>sensors/pressure| D2[Alert<br/>Service]
    B -->|Subscribe<br/>sensors/safety| D3[Emergency<br/>System]
```
Quality of Service Levels
MQTT defines three QoS levels that let you trade delivery guarantees against network overhead:
| QoS Level | Name | Delivery Guarantee | Overhead | Use Case |
|---|---|---|---|---|
| 0 | At most once | Fire and forget. Message may be lost. | Lowest (1 packet) | Sensor readings where occasional loss is acceptable |
| 1 | At least once | Guaranteed delivery. Duplicates possible. | Medium (2 packets) | Commands, alerts, state updates |
| 2 | Exactly once | Guaranteed single delivery. No duplicates. | Highest (4 packets) | Financial transactions, billing events |
Most production IoT deployments use QoS 1. QoS 0 is appropriate for high-frequency sensor telemetry where individual readings are expendable. QoS 2's four-packet handshake adds enough overhead that I avoid it unless the business logic requires exactly-once semantics, which is rarer than most teams think.
AWS IoT Core supports QoS 0 and QoS 1 only. No QoS 2. If you need exactly-once delivery on AWS, you need to implement idempotency at the application layer, which is what you should be doing regardless.
Scaling MQTT Brokers
Self-hosted brokers like EMQX scale horizontally to millions of concurrent connections. EMQX clusters handle 100 million+ connections in production deployments. AWS IoT Core handles scaling transparently but imposes limits: 500,000 connections per account per region by default, with a maximum publish rate of 20,000 messages per second per account.
For IoT architectures on AWS, see the telemetry pipeline patterns in iOS Telemetry Pipeline with Kinesis, Glue, and Athena.
WebTransport: The HTTP/3 Future
What WebTransport Changes
WebTransport is a browser API built on HTTP/3 and QUIC. It provides bidirectional communication (like WebSockets), multiplexed streams (like HTTP/2), unreliable datagrams (like UDP), and zero head-of-line blocking (unlike TCP-based protocols).
QUIC uses a connection ID rather than the source IP/port tuple, so connections survive network transitions. A user switching from Wi-Fi to cellular keeps their WebTransport connection alive. WebSocket connections die on network change.
Current Status
WebTransport works in Chrome, Edge, and Firefox today; Safari support is still in progress. Server support is limited: Go, Rust, and .NET have experimental implementations, and Node.js does not yet have production-grade WebTransport support.
For production systems in 2026, WebTransport is a technology to prototype with, not to ship on. WebSockets and SSE remain the production-grade choices. In 2-3 years, as browser support reaches 95%+ and server libraries mature, WebTransport will replace WebSockets for new projects.
AWS Integration Patterns
API Gateway WebSocket APIs
AWS API Gateway supports WebSocket APIs with managed scaling. You define routes ($connect, $disconnect, $default, and custom routes), and API Gateway handles connection management. Lambda functions process messages. DynamoDB typically stores connection state.
The pricing model: $1.00 per million messages (first billion), plus connection minutes at $0.25 per million. The connection rate limit is 500 new connections per second by default, supporting up to 3.6 million concurrent connections over the two-hour maximum connection duration.
For API Gateway architecture details, see Amazon API Gateway: An Architecture Deep-Dive.
ALB WebSocket Support
Application Load Balancers support WebSocket connections natively. No configuration required; ALB detects the Upgrade header and maintains the persistent connection. Set the idle timeout appropriately (default 60 seconds is too low for most WebSocket use cases). Sticky sessions ensure a client reconnects to the same target, which matters for applications that store connection state in memory.
For load balancer configuration, see AWS Elastic Load Balancing: An Architecture Deep-Dive.
IoT Core for MQTT
AWS IoT Core is a managed MQTT broker with device authentication, rules engine, and integration with 20+ AWS services. The rules engine can route MQTT messages directly to Kinesis, Lambda, S3, DynamoDB, or SNS without any intermediate processing code. For MQTT workloads on AWS, IoT Core eliminates the operational burden of running your own broker.
AppSync for GraphQL Subscriptions
AWS AppSync provides real-time GraphQL subscriptions over WebSockets. If your application already uses GraphQL, AppSync subscriptions are the simplest path to real-time updates. Behind the scenes, AppSync manages WebSocket connections, handles fan-out, and integrates with DynamoDB, Lambda, and other data sources.
Choosing the Right Protocol
The Decision Framework
| Scenario | Recommended Protocol | Why |
|---|---|---|
| Live dashboard, stock ticker, score updates | SSE | Server-to-client only. Auto-reconnect. Simplest to operate. |
| LLM token streaming | SSE | Industry standard. Every LLM API uses it. |
| Chat, collaborative editing | WebSocket | Bidirectional. Low latency. Client sends frequently. |
| Multiplayer game state | WebSocket (or WebTransport) | Bidirectional. Sub-frame latency required. |
| Microservice-to-microservice streaming | gRPC | Type safety. Multiplexing. Built-in flow control. Binary efficiency. |
| IoT sensor telemetry | MQTT | Designed for constrained devices. QoS levels. Tiny overhead. |
| Corporate environment with aggressive proxies | Long polling | Works through everything. No special infrastructure. |
| Low-frequency notifications (< 1/min) | Long polling or SSE | Both work. Long polling is simpler to debug. |
| Real-time + unreliable datagrams needed | WebTransport | Only option for UDP-like browser communication. Not production-ready in 2026. |
The Three-Question Filter
When I evaluate protocols for a new service, I apply three questions in order:
1. Does the client need to send data in real time? If the answer is no (dashboards, feeds, notifications, LLM streaming), use SSE. Full stop. SSE is simpler to implement, simpler to scale, simpler to debug, and automatically handles reconnection. The only reason to use WebSockets for a server-to-client stream is if you also need the client-to-server direction.
2. Are both endpoints services you control? If you are building backend-to-backend communication (microservices, data pipelines, internal tools), use gRPC streaming. You get type safety, efficient serialization, multiplexing, and flow control without designing a custom message protocol. Using WebSockets between services you control is almost always the wrong choice.
3. What are your client constraints? If your clients are IoT devices with limited CPU and memory on unreliable networks, use MQTT. If your clients are browsers in corporate environments where WebSocket upgrades are blocked, use long polling. If your clients are modern browsers with no proxy restrictions, WebSockets give you the best bidirectional experience today.
Key Patterns
Default to SSE for server-to-client streaming. Most real-time features are unidirectional. SSE provides automatic reconnection, event IDs for resumable streams, and works with standard HTTP infrastructure. Reaching for WebSockets when you only need server push adds operational complexity with no benefit.
Use gRPC streaming between services, WebSockets between browser and server. gRPC gives you a contract-first protocol with strong typing, multiplexing, and backpressure. WebSockets give you the browser API you need for bidirectional user-facing features. Do not mix these up.
Plan your reconnection strategy before you write your first WebSocket handler. Silent connection death is the default behavior. Implement application-level heartbeats, exponential backoff on reconnection, and state reconciliation after reconnection. These are day-one requirements for production, not day-thirty optimizations.
Audit your proxy chain for SSE buffering. Every reverse proxy, CDN, and load balancer between your server and the client is a potential buffering point. Disable response buffering explicitly at every hop. One misconfigured proxy turns your real-time stream into a batch delivery system.
MQTT belongs in IoT architectures. Using MQTT for browser-based real-time features adds a broker dependency and a WebSocket bridge without meaningful benefit over native WebSockets. Using WebSockets for IoT devices ignores MQTT's purpose-built features: QoS levels, retained messages, last-will-and-testament, and a 2-byte header.
Watch WebTransport. HTTP/3 and QUIC solve TCP's head-of-line blocking and network migration problems. When browser support reaches 95% and server libraries mature, WebTransport will replace WebSockets for new projects. Start prototyping now; ship with WebSockets until the ecosystem is ready.
Additional Resources
- MDN: Using Server-Sent Events
- MDN: WebSocket API
- gRPC Performance Best Practices
- AWS API Gateway WebSocket API documentation
- AWS IoT Core MQTT documentation
- The Challenge of Scaling WebSockets (Ably)
- MDN: WebTransport API
- gRPC Stream Performance Gotchas (Ably)
- MQTT Protocol Specification (OASIS)
- AWS AppSync Real-Time Subscriptions
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

