About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
I have built real-time features into more systems than I can count: chat, live dashboards, IoT telemetry pipelines, collaborative editors, trading feeds, notification systems. Every one of them started with the same question: which protocol? The answer has never been the same twice, and the wrong choice has cost me weeks of rework more than once. WebSockets get all the attention. SSE gets overlooked. gRPC streaming gets misunderstood. Long polling gets dismissed too quickly. MQTT gets ignored entirely outside IoT circles. Each of these protocols solves a different problem, and the differences become painfully obvious only after you have built, deployed, and tried to scale the wrong one.
This is an architecture reference comparing WebSockets, Server-Sent Events, gRPC streaming, HTTP long polling, and MQTT. It covers how each protocol works at the wire level, where each one breaks under load, and when each one earns its place in a production architecture. WebTransport (HTTP/3) gets a section too, because it is coming whether you are ready or not.
The Protocol Landscape
Before diving into each protocol, here is the comparison that would have saved me several bad decisions early in my career:
| Protocol | Direction | Transport | Data Format | Browser Support | Connection Model |
|---|---|---|---|---|---|
| WebSocket | Bidirectional | TCP (upgraded from HTTP) | Text or binary | Full | Persistent, stateful |
| SSE | Server-to-client only | HTTP/1.1 or HTTP/2 | UTF-8 text only | Full (except IE) | Persistent, stateless |
| gRPC | Four modes (unary, server/client/bidi streaming) | HTTP/2 | Protocol Buffers (binary) | Via gRPC-Web proxy | Persistent, multiplexed |
| Long Polling | Server-to-client (simulated) | HTTP/1.1 | Any | Full | Repeated short-lived |
| MQTT | Bidirectional (pub/sub) | TCP or WebSocket | Binary (topic-based) | Via WebSocket bridge | Persistent, lightweight |
| WebTransport | Bidirectional + datagrams | HTTP/3 (QUIC) | Binary or text | Chrome, Edge, Firefox | Persistent, multiplexed |
The direction column is the first filter. If you only need server-to-client updates, SSE handles it with a fraction of the operational complexity of WebSockets. If you need bidirectional communication between microservices, gRPC streaming beats WebSockets on type safety, multiplexing, and protocol efficiency. If your clients are constrained IoT devices on unreliable cellular networks, MQTT was designed for exactly that scenario.
WebSockets
How WebSockets Work
A WebSocket connection starts as an HTTP request. The client sends an Upgrade: websocket header, the server responds with 101 Switching Protocols, and the TCP connection transitions from HTTP to the WebSocket frame protocol. From that point, both sides can send messages at any time with minimal framing overhead (2-14 bytes per frame, depending on payload size and masking).
The protocol operates at a lower level than HTTP. No headers per message. No request/response semantics. Just frames flowing in both directions over a single TCP connection. This makes WebSockets fast: round-trip latency is limited only by network propagation, with no HTTP overhead per message.
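The upgrade handshake itself is small enough to verify by hand. As a sketch, this is the server-side computation of the `Sec-WebSocket-Accept` value from RFC 6455 (the magic GUID is fixed by the spec; the sample key is the one from the RFC):

```python
import base64
import hashlib

# RFC 6455 magic GUID that the server appends to the client's key.
WS_MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must return
    in its 101 Switching Protocols response."""
    digest = hashlib.sha1((sec_websocket_key + WS_MAGIC).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The sample key from RFC 6455 section 1.3:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this one exchange, no further handshaking occurs; everything else is frames.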
Scaling WebSockets
WebSocket scaling is where the engineering gets real. Every active connection consumes server memory (the connection object, send/receive buffers, application state). A single well-tuned server with an event-driven architecture can handle 100,000-240,000 concurrent connections before CPU becomes the bottleneck, primarily from SSL/TLS termination and packet processing.
Getting to one million concurrent connections requires horizontal scaling, and horizontal scaling with WebSockets is harder than with stateless HTTP. The core problem: connections are sticky. User A on Server 1 needs to receive a message from User B on Server 50. You need a message routing layer between servers.
| Scaling Approach | Throughput | Complexity | Use Case |
|---|---|---|---|
| Single server (vertical) | 100K-240K connections | Low | Prototypes, small-scale apps |
| Redis Pub/Sub fan-out | 500K-1M connections | Medium | Chat, notifications |
| NATS JetStream routing | 1M+ connections | Medium-High | High-throughput event systems |
| Dedicated WebSocket platform (Ably, Pusher) | Millions | Low (managed) | When you do not want to operate infrastructure |
| Custom sharded architecture | Millions | High | Large-scale consumer apps (Discord, Slack) |
Failure Modes
WebSocket connections die silently. Unlike HTTP requests that return error codes, a WebSocket connection can enter a half-open state where one side believes the connection is alive while the other has moved on. TCP keepalives help, but they operate on timescales of minutes. Application-level heartbeats (ping/pong frames every 15-30 seconds) are mandatory in production.
Proxies and load balancers add another failure layer. Many HTTP proxies terminate idle connections after 60 seconds. Nginx defaults to proxy_read_timeout 60s for WebSocket connections. Cloud load balancers have their own idle timeouts: ALB defaults to 60 seconds, and you need to set the idle timeout higher for long-lived WebSocket connections. I have debugged WebSocket disconnection bugs that turned out to be a proxy three hops away silently closing connections after 90 seconds of inactivity.
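A minimal Nginx location block for proxying WebSocket upgrades looks roughly like this (the `/ws/` path and `backend` upstream are placeholders; the directives are standard Nginx):

```nginx
location /ws/ {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    # Raise the idle timeouts well above the 60s default so quiet
    # connections are not severed between heartbeats.
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
```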
Reconnection logic is entirely your responsibility. The browser's WebSocket API has no auto-reconnect. You implement exponential backoff, state reconciliation after reconnection, and buffering of messages that arrived while disconnected. Libraries like Socket.IO and reconnecting-websocket handle some of this, but they add their own protocol layer and trade-offs.
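The backoff half of that logic fits in a few lines. A sketch of full-jitter exponential backoff, which spreads reconnect attempts out so a server restart does not trigger a thundering herd of simultaneous reconnects:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: the upper bound grows as
    base * 2^attempt (capped), and the actual delay is a uniform
    random value below it."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# Upper bounds for the first few reconnect attempts: 0.5s, 1s, 2s, 4s, 8s
for attempt in range(5):
    print(f"attempt {attempt}: up to {min(30.0, 0.5 * 2 ** attempt):.1f}s")
```

State reconciliation and message buffering remain application-specific; the backoff schedule is the only part that generalizes cleanly.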
Server-Sent Events (SSE)
How SSE Works
SSE uses plain HTTP. The client sends a GET request; the server responds with Content-Type: text/event-stream and keeps the connection open. The server writes events in a simple text format:
```
event: price-update
data: {"symbol": "AAPL", "price": 187.42}
id: 12847

event: price-update
data: {"symbol": "GOOG", "price": 174.89}
id: 12848
```
That is the entire protocol. No handshake upgrade. No binary framing. No special headers beyond the content type. The simplicity is the feature.
Built-In Reconnection
SSE's reconnection handling is automatic and robust. When a connection drops, the browser's EventSource API reconnects after a configurable retry interval (default: 3 seconds). On reconnection, it sends a Last-Event-ID header containing the ID of the last event received. The server can use this to resume the stream exactly where the client left off.
Compare this to WebSockets, where you build all reconnection and state recovery logic from scratch. For server-to-client streaming, SSE gives you production-grade reliability out of the box.
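Producing this format on the server is a one-function job. A minimal serializer sketch (on reconnect, a real handler would read the client's `Last-Event-ID` request header and replay events with higher IDs):

```python
from typing import Optional

def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one SSE event: 'field: value' lines terminated by a
    blank line. Multi-line data becomes repeated 'data:' lines, which
    the browser's EventSource rejoins with newlines."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    return "\n".join(lines) + "\n\n"

print(format_sse('{"symbol": "AAPL", "price": 187.42}',
                 event="price-update", event_id="12847"))
```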
The LLM Streaming Connection
SSE has experienced a resurgence because of large language models. When ChatGPT, Claude, and every other LLM streams tokens back to the browser, they use SSE. The pattern fits perfectly: unidirectional server-to-client streaming of text data with automatic reconnection. OpenAI's streaming API, Anthropic's Messages API, and nearly every LLM provider uses SSE for token streaming. If you are building anything that wraps an LLM, SSE is the protocol you will use for the frontend.
Limitations and Gotchas
SSE has real constraints that surface in production:
| Limitation | Impact | Workaround |
|---|---|---|
| Text only (UTF-8) | Cannot stream binary data natively | Base64 encoding (33% overhead) or use WebSockets for binary |
| Unidirectional | Client cannot send data over the SSE connection | Use standard HTTP POST/PUT for client-to-server messages |
| Connection limit (HTTP/1.1) | 6 connections per domain per browser, across all tabs | Use HTTP/2 (raises limit to ~100) or a single multiplexed stream |
| Proxy buffering | Intermediaries buffer the stream, delaying delivery | Disable buffering in Nginx (proxy_buffering off), add X-Accel-Buffering: no header |
| No built-in backpressure | Fast server can overwhelm slow clients | Application-level flow control or rate limiting |
The proxy buffering issue is the most common production surprise. A reverse proxy between your server and the client can legally buffer the entire stream and only forward it when the connection closes. I have seen SSE streams arrive as a single burst of hundreds of events after a proxy accumulated 30 seconds of data. The fix is always the same: disable response buffering in every proxy in the chain.
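For Nginx specifically, the fix looks roughly like this (the `/events` path and `backend` upstream are placeholders):

```nginx
location /events {
    proxy_pass http://backend;
    # Forward events as they are written instead of accumulating them.
    proxy_buffering off;
    proxy_cache off;
    # Alternatively, the upstream app can send "X-Accel-Buffering: no"
    # on its response to disable buffering for that request only.
}
```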
gRPC Streaming
Architecture and HTTP/2 Foundation
gRPC is an RPC framework, not a messaging protocol. That distinction matters. You define service contracts in Protocol Buffer .proto files, generate client and server stubs, and make calls that look like local function invocations. The transport layer is HTTP/2, which gives gRPC connection multiplexing (many concurrent RPCs over a single TCP connection), header compression (HPACK), and flow control (built-in backpressure).
Protocol Buffers serialize data into compact binary format, typically 3-10x smaller than equivalent JSON. Combined with HTTP/2's binary framing and header compression, gRPC moves data over the wire significantly more efficiently than any text-based protocol.
Four Streaming Modes
gRPC supports four interaction patterns:
| Mode | Client Sends | Server Sends | Use Case |
|---|---|---|---|
| Unary | 1 request | 1 response | Standard RPC calls |
| Server streaming | 1 request | Stream of responses | Fetching large datasets, real-time feeds |
| Client streaming | Stream of requests | 1 response | File uploads, telemetry ingestion |
| Bidirectional streaming | Stream of requests | Stream of responses | Chat, collaborative editing, interactive sessions |
Server streaming is the gRPC equivalent of SSE, with better performance characteristics. Bidirectional streaming competes with WebSockets, with the advantage of strong typing, built-in flow control, and no custom protocol to design.
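All four modes are expressed in the service contract itself via the `stream` keyword. An illustrative `.proto` sketch (the service and message names here are invented for the example):

```protobuf
syntax = "proto3";

// One service showing all four gRPC interaction modes.
service Feed {
  rpc GetSnapshot (Query) returns (Update);              // unary
  rpc Subscribe (Query) returns (stream Update);         // server streaming
  rpc UploadTelemetry (stream Update) returns (Ack);     // client streaming
  rpc Session (stream Update) returns (stream Update);   // bidirectional
}

message Query  { string topic = 1; }
message Update { string topic = 1; bytes payload = 2; }
message Ack    { uint32 received = 1; }
```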
Performance Characteristics
gRPC delivers roughly 10x the throughput of REST+JSON for equivalent payloads, primarily due to Protocol Buffer serialization and HTTP/2 framing. A single HTTP/2 connection supports up to 100 concurrent streams (configurable), and each stream can carry an independent RPC without head-of-line blocking at the HTTP level (though TCP head-of-line blocking still applies).
Flow control in HTTP/2 prevents fast producers from overwhelming slow consumers. The receiver advertises a window size, and the sender stops transmitting when the window fills. This built-in backpressure eliminates an entire class of buffer overflow bugs that plague WebSocket implementations.
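The window mechanics can be modeled in a few lines. A toy sketch (65,535 bytes is the HTTP/2 default initial window; real implementations track a window per stream and per connection):

```python
class FlowWindow:
    """Toy model of HTTP/2 flow control: the sender may only transmit
    while the receiver-advertised window has room, and the window
    refills when the receiver sends WINDOW_UPDATE frames."""

    def __init__(self, size: int = 65_535):  # HTTP/2 default initial window
        self.available = size

    def send(self, nbytes: int) -> int:
        """Try to send nbytes; returns how many bytes were allowed."""
        allowed = min(nbytes, self.available)
        self.available -= allowed
        return allowed

    def window_update(self, increment: int) -> None:
        self.available += increment

w = FlowWindow(size=10)
print(w.send(8))   # 8 - fits in the window
print(w.send(8))   # 2 - only 2 bytes of window remain
print(w.send(1))   # 0 - sender must pause until a WINDOW_UPDATE
w.window_update(5)
print(w.send(1))   # 1 - window replenished, sending resumes
```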
Browser Limitations
gRPC does not work natively in browsers. HTTP/2's binary framing requires a level of network control that browser APIs do not expose. gRPC-Web exists as a workaround: a JavaScript client that speaks a modified gRPC protocol, with a proxy (Envoy, typically) translating between gRPC-Web and native gRPC. Client streaming and bidirectional streaming are not supported through gRPC-Web; only unary and server streaming work.
For browser-facing real-time features, gRPC-Web with server streaming is viable. For bidirectional browser communication, use WebSockets or SSE. For backend-to-backend communication, native gRPC is the clear winner.
HTTP Long Polling
How Long Polling Works
Long polling simulates server push using standard HTTP. The client sends a request; the server holds the connection open until it has new data or a timeout expires. When the server responds, the client immediately sends another request. The result: near-real-time server-to-client updates using nothing more than HTTP/1.1.
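The server-side "hold" step reduces to a blocking wait with a timeout. A sketch using a queue to stand in for whatever event source the handler consumes:

```python
import queue

def long_poll(events: "queue.Queue[str]", timeout: float = 30.0):
    """Block until an event arrives or the timeout expires, mirroring
    the hold-the-request-open step. Returns the event, or None for an
    empty (timed-out) response that tells the client to re-poll."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None

q: "queue.Queue[str]" = queue.Queue()
q.put("price: 187.42")
print(long_poll(q, timeout=0.1))  # "price: 187.42" - data was waiting
print(long_poll(q, timeout=0.1))  # None - timed out, client polls again
```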
```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client->>Server: GET /events (request 1)
    Note over Server: Holds connection open<br/>waiting for data...
    Server-->>Client: Response with new data
    Client->>Server: GET /events (request 2)
    Note over Server: Holds connection open<br/>30s timeout...
    Server-->>Client: Empty response (timeout)
    Client->>Server: GET /events (request 3)
    Note over Server: New data arrives<br/>after 5 seconds
    Server-->>Client: Response with new data
```
When Long Polling Makes Sense
Long polling is the protocol of last resort and the protocol of first deployment. It works through every proxy, firewall, and load balancer ever built. It requires no special server configuration. It uses standard HTTP semantics that every monitoring tool, debugging proxy, and CDN understands.
I use long polling in two scenarios. First, when the infrastructure is locked down and WebSocket upgrades are blocked (corporate networks, heavily proxied environments). Second, as the initial implementation before I know whether WebSockets justify the operational complexity. Many teams start with long polling and never migrate because the update frequency (a few times per minute) never demands anything faster.
The Overhead Tax
Long polling pays a tax on every message: TCP connection setup, TLS handshake (if HTTPS), HTTP headers, and response processing. For infrequent updates (every 5-30 seconds), this overhead is negligible. For high-frequency updates (multiple per second), the overhead compounds rapidly. At 10 updates per second, long polling generates 10 full HTTP request-response cycles per second per client. WebSockets would handle the same throughput with near-zero per-message overhead after the initial handshake.
MQTT
Protocol Design for Constrained Networks
MQTT was designed in 1999 by Andy Stanford-Clark (IBM) and Arlen Nipper for connecting oil pipeline sensors over satellite links. That origin explains everything about its design: minimal bandwidth consumption, tolerance for unreliable networks, and a tiny client footprint. The protocol header is 2 bytes. A connect packet is as small as 14 bytes. Implementations exist for microcontrollers with 32KB of RAM.
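That 14-byte figure is easy to verify. A sketch that assembles a minimal MQTT 3.1.1 CONNECT packet (clean session, empty client ID; a single remaining-length byte is valid here because the body is under 128 bytes):

```python
def minimal_connect(keepalive: int = 60) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet: clean session,
    empty client ID - 14 bytes total."""
    var_header = (
        b"\x00\x04MQTT"                 # length-prefixed protocol name
        + b"\x04"                       # protocol level 4 (MQTT 3.1.1)
        + b"\x02"                       # connect flags: clean session
        + keepalive.to_bytes(2, "big")  # keepalive interval in seconds
    )
    payload = b"\x00\x00"               # zero-length client ID
    body = var_header + payload
    # Fixed header: packet type CONNECT (0x10) + remaining length.
    return bytes([0x10, len(body)]) + body

pkt = minimal_connect()
print(len(pkt))  # 14
```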
Publish/Subscribe Architecture
MQTT uses a broker-based pub/sub model. Clients publish messages to topics and subscribe to topic patterns. The broker handles all routing. Clients never communicate directly.
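Subscription patterns use two wildcards: `+` matches exactly one topic level, and `#` (valid only as the last level) matches all remaining levels. A simplified matcher sketch (real brokers add rules for `$`-prefixed system topics, omitted here):

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """MQTT topic filter matching: '+' matches exactly one level,
    '#' (last level only) matches all remaining levels."""
    f_parts, t_parts = filter_.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)

print(topic_matches("sensors/#", "sensors/pressure/psi"))  # True
print(topic_matches("sensors/+", "sensors/pressure"))      # True
print(topic_matches("sensors/+", "sensors/pressure/psi"))  # False
```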
```mermaid
flowchart LR
    S1[Temperature<br/>Sensor] -->|Publish<br/>QoS 0| B[MQTT<br/>Broker]
    S2[Pressure<br/>Sensor] -->|Publish<br/>QoS 1| B
    S3[Safety<br/>Valve] -->|Publish<br/>QoS 2| B
    B -->|Subscribe<br/>sensors/#| D1[Dashboard]
    B -->|Subscribe<br/>sensors/pressure| D2[Alert<br/>Service]
    B -->|Subscribe<br/>sensors/safety| D3[Emergency<br/>System]
```
Quality of Service Levels
MQTT defines three QoS levels that let you trade delivery guarantees against network overhead:
| QoS Level | Name | Delivery Guarantee | Overhead | Use Case |
|---|---|---|---|---|
| 0 | At most once | Fire and forget. Message may be lost. | Lowest (1 packet) | Sensor readings where occasional loss is acceptable |
| 1 | At least once | Guaranteed delivery. Duplicates possible. | Medium (2 packets) | Commands, alerts, state updates |
| 2 | Exactly once | Guaranteed single delivery. No duplicates. | Highest (4 packets) | Financial transactions, billing events |
Most production IoT deployments use QoS 1. QoS 0 is appropriate for high-frequency sensor telemetry where individual readings are expendable. QoS 2's four-packet handshake adds enough overhead that I avoid it unless the business logic requires exactly-once semantics, which is rarer than most teams think.
AWS IoT Core supports QoS 0 and QoS 1 only. No QoS 2. If you need exactly-once delivery on AWS, you need to implement idempotency at the application layer, which is what you should be doing regardless.
Scaling MQTT Brokers
Self-hosted brokers like EMQX scale horizontally to millions of concurrent connections. EMQX clusters handle 100 million+ connections in production deployments. AWS IoT Core handles scaling transparently but imposes limits: 500,000 connections per account per region by default, with a maximum publish rate of 20,000 messages per second per account.
For IoT architectures on AWS, see the telemetry pipeline patterns in iOS Telemetry Pipeline with Kinesis, Glue, and Athena.
WebTransport: The HTTP/3 Future
What WebTransport Changes
WebTransport is a browser API built on HTTP/3 and QUIC. It provides bidirectional communication (like WebSockets), multiplexed streams (like HTTP/2), unreliable datagrams (like UDP), and zero head-of-line blocking (unlike TCP-based protocols).
QUIC uses a connection ID rather than the source IP/port tuple, so connections survive network transitions. A user switching from Wi-Fi to cellular keeps their WebTransport connection alive. WebSocket connections die on network change.
Current Status
WebTransport works in Chrome, Edge, and Firefox today; Safari support is still in progress. Server support is limited: Go, Rust, and .NET have experimental implementations, and Node.js does not yet have production-grade WebTransport support.
For production systems in 2026, WebTransport is a technology to prototype with, not to ship on. WebSockets and SSE remain the production-grade choices. In 2-3 years, as browser support reaches 95%+ and server libraries mature, WebTransport will replace WebSockets for new projects.
AWS Integration Patterns
API Gateway WebSocket APIs
AWS API Gateway supports WebSocket APIs with managed scaling. You define routes ($connect, $disconnect, $default, and custom routes), and API Gateway handles connection management. Lambda functions process messages. DynamoDB typically stores connection state.
The pricing model: $1.00 per million messages (first billion), plus connection minutes at $0.25 per million. The connection rate limit is 500 new connections per second by default, supporting up to 3.6 million concurrent connections over the two-hour maximum connection duration.
For API Gateway architecture details, see Amazon API Gateway: An Architecture Deep-Dive.
ALB WebSocket Support
Application Load Balancers support WebSocket connections natively. No configuration required; ALB detects the Upgrade header and maintains the persistent connection. Set the idle timeout appropriately (default 60 seconds is too low for most WebSocket use cases). Sticky sessions ensure a client reconnects to the same target, which matters for applications that store connection state in memory.
For load balancer configuration, see AWS Elastic Load Balancing: An Architecture Deep-Dive.
IoT Core for MQTT
AWS IoT Core is a managed MQTT broker with device authentication, rules engine, and integration with 20+ AWS services. The rules engine can route MQTT messages directly to Kinesis, Lambda, S3, DynamoDB, or SNS without any intermediate processing code. For MQTT workloads on AWS, IoT Core eliminates the operational burden of running your own broker.
AppSync for GraphQL Subscriptions
AWS AppSync provides real-time GraphQL subscriptions over WebSockets. If your application already uses GraphQL, AppSync subscriptions are the simplest path to real-time updates. Behind the scenes, AppSync manages WebSocket connections, handles fan-out, and integrates with DynamoDB, Lambda, and other data sources.
Choosing the Right Protocol
The Decision Framework
| Scenario | Recommended Protocol | Why |
|---|---|---|
| Live dashboard, stock ticker, score updates | SSE | Server-to-client only. Auto-reconnect. Simplest to operate. |
| LLM token streaming | SSE | Industry standard. Every LLM API uses it. |
| Chat, collaborative editing | WebSocket | Bidirectional. Low latency. Client sends frequently. |
| Multiplayer game state | WebSocket (or WebTransport) | Bidirectional. Sub-frame latency required. |
| Microservice-to-microservice streaming | gRPC | Type safety. Multiplexing. Built-in flow control. Binary efficiency. |
| IoT sensor telemetry | MQTT | Designed for constrained devices. QoS levels. Tiny overhead. |
| Corporate environment with aggressive proxies | Long polling | Works through everything. No special infrastructure. |
| Low-frequency notifications (< 1/min) | Long polling or SSE | Both work. Long polling is simpler to debug. |
| Real-time + unreliable datagrams needed | WebTransport | Only option for UDP-like browser communication. Not production-ready in 2026. |
The Three-Question Filter
When I evaluate protocols for a new service, I apply three questions in order:
1. Does the client need to send data in real time? If the answer is no (dashboards, feeds, notifications, LLM streaming), use SSE. Full stop. SSE is simpler to implement, simpler to scale, simpler to debug, and automatically handles reconnection. The only reason to use WebSockets for a server-to-client stream is if you also need the client-to-server direction.
2. Are both endpoints services you control? If you are building backend-to-backend communication (microservices, data pipelines, internal tools), use gRPC streaming. You get type safety, efficient serialization, multiplexing, and flow control without designing a custom message protocol. Using WebSockets between services you control is almost always the wrong choice.
3. What are your client constraints? If your clients are IoT devices with limited CPU and memory on unreliable networks, use MQTT. If your clients are browsers in corporate environments where WebSocket upgrades are blocked, use long polling. If your clients are modern browsers with no proxy restrictions, WebSockets give you the best bidirectional experience today.
Key Patterns
Default to SSE for server-to-client streaming. Most real-time features are unidirectional. SSE provides automatic reconnection, event IDs for resumable streams, and works with standard HTTP infrastructure. Reaching for WebSockets when you only need server push adds operational complexity with no benefit.
Use gRPC streaming between services, WebSockets between browser and server. gRPC gives you a contract-first protocol with strong typing, multiplexing, and backpressure. WebSockets give you the browser API you need for bidirectional user-facing features. Do not mix these up.
Plan your reconnection strategy before you write your first WebSocket handler. Silent connection death is the default behavior. Implement application-level heartbeats, exponential backoff on reconnection, and state reconciliation after reconnection. These are day-one requirements for production, not day-thirty optimizations.
Audit your proxy chain for SSE buffering. Every reverse proxy, CDN, and load balancer between your server and the client is a potential buffering point. Disable response buffering explicitly at every hop. One misconfigured proxy turns your real-time stream into a batch delivery system.
MQTT belongs in IoT architectures. Using MQTT for browser-based real-time features adds a broker dependency and a WebSocket bridge without meaningful benefit over native WebSockets. Using WebSockets for IoT devices ignores MQTT's purpose-built features: QoS levels, retained messages, last-will-and-testament, and a 2-byte header.
Watch WebTransport. HTTP/3 and QUIC solve TCP's head-of-line blocking and network migration problems. When browser support reaches 95% and server libraries mature, WebTransport will replace WebSockets for new projects. Start prototyping now; ship with WebSockets until the ecosystem is ready.
Additional Resources
- MDN: Using Server-Sent Events
- MDN: WebSocket API
- gRPC Performance Best Practices
- AWS API Gateway WebSocket API documentation
- AWS IoT Core MQTT documentation
- The Challenge of Scaling WebSockets (Ably)
- MDN: WebTransport API
- gRPC Stream Performance Gotchas (Ably)
- MQTT Protocol Specification (OASIS)
- AWS AppSync Real-Time Subscriptions
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

