About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
I've yet to ship a production architecture on AWS that doesn't involve Elastic Load Balancing somewhere. Most teams slap a load balancer in front of their service and move on. Fair enough. That works until it doesn't. After debugging enough 502 cascades at 2 AM, I can tell you: the differences between the four ELB types, and what happens when you pick wrong, deserve way more attention than they typically get. So here it is. Patterns, trade-offs, and operational scars from years of running load-balanced architectures at scale.
Skip this if you want a getting-started tutorial. AWS has plenty. What follows is an architecture reference for engineers who want to know what these load balancers actually do under the hood, when each type earns its place, and which failure modes will bite you in production.
Overview of Elastic Load Balancing
Elastic Load Balancing (ELB) is an umbrella service. Four distinct load balancer types, each targeting different network layers and use cases:
| Load Balancer | Layer | Protocol Support | Primary Use Case |
|---|---|---|---|
| Application Load Balancer (ALB) | Layer 7 | HTTP, HTTPS, gRPC, WebSocket | Web applications, microservices, API routing |
| Network Load Balancer (NLB) | Layer 4 | TCP, UDP, TLS | Ultra-low latency, static IPs, extreme throughput |
| Gateway Load Balancer (GLB) | Layer 3 | IP (GENEVE encapsulation) | Inline network appliances: firewalls, IDS/IPS |
| Classic Load Balancer (CLB) | Layer 4/7 | HTTP, HTTPS, TCP, SSL | Legacy: do not use for new deployments |
AWS shipped these incrementally. CLB came first in 2009, ALB in 2016, NLB in 2017, GLB in 2020. Each exists because the previous types couldn't handle certain architectural patterns well. That history explains why features land where they do, and why CLB is dead for anything new.
Don't think of ALB, NLB, and GLB as interchangeable. Different network layers. Different connection handling. Different scaling behavior. Pick wrong and you'll get performance problems and failure modes that are genuinely painful to diagnose while production is on fire.
```mermaid
flowchart TD
    A["New Load Balancer<br/>Needed"] --> B{"Need inline<br/>network appliance<br/>inspection?"}
    B -->|Yes| GLB["Gateway Load<br/>Balancer"]
    B -->|No| C{"Need HTTP<br/>routing, WAF, or<br/>authentication?"}
    C -->|Yes| ALB["Application Load<br/>Balancer"]
    C -->|No| D{"Need static IPs,<br/>PrivateLink, or<br/>non-HTTP protocol?"}
    D -->|Yes| NLB["Network Load<br/>Balancer"]
    D -->|No| E{"Primarily HTTP<br/>traffic?"}
    E -->|Yes| ALB
    E -->|No| NLB
```
How Load Balancers Actually Work in AWS
All four types share common infrastructure mechanics, and they matter more than most people realize.
Elastic Network Interfaces and AZ Mechanics
When you create a load balancer and enable it across multiple Availability Zones, AWS provisions Elastic Network Interfaces (ENIs) in each enabled AZ. One or more per AZ. These ENIs are the actual network endpoints receiving traffic. Each gets a private IP in your subnet; internet-facing load balancers also get a public IP.
Here's why this matters operationally. Your load balancer is really a collection of nodes spread across AZs, each with its own ENI and IP address. DNS resolution for the load balancer hostname returns all of these IPs, and clients should round-robin across them. If your application pins to a single resolved IP (I see this constantly with long-lived DNS caches), you're funneling all traffic through one AZ node. Distribution gone. Fault tolerance gone. The whole point of multi-AZ disappears.
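To make the round-robin behavior concrete, here's a minimal sketch of a well-behaved client: resolve every A record behind the load balancer hostname and rotate across them instead of pinning the first IP. The hostname and IPs are hypothetical, and `resolve_all` is an illustrative helper, not part of any AWS SDK.

```python
import itertools
import socket

def resolve_all(hostname: str, port: int = 443) -> list[str]:
    """Return every A-record IP behind a load balancer hostname.
    ELB DNS answers contain one IP per active load balancer node,
    so clients should spread connections across all of them."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def connection_rotator(ips: list[str]):
    """Round-robin over the resolved node IPs instead of pinning one."""
    return itertools.cycle(ips)

# Simulated result of resolve_all("my-alb-123.us-east-1.elb.amazonaws.com"):
nodes = ["10.0.1.17", "10.0.2.41"]  # one node per enabled AZ
rotator = connection_rotator(nodes)
picks = [next(rotator) for _ in range(4)]
print(picks)  # ['10.0.1.17', '10.0.2.41', '10.0.1.17', '10.0.2.41']
```

The other half of the fix is honoring DNS TTLs: re-resolve periodically so you pick up nodes the load balancer adds or removes as it scales.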
Control Plane vs. Data Plane
AWS load balancers maintain a clean separation between control plane and data plane:
- Control plane: Handles configuration changes: creating listeners, registering targets, modifying rules. These are API calls that propagate asynchronously. When you register a new target, it doesn't receive traffic instantly. The registration has to propagate to all load balancer nodes.
- Data plane: Handles actual traffic. Accepts connections, evaluates routing rules, forwards requests, returns responses. The data plane operates independently of the control plane, so once configured, a load balancer keeps routing traffic even if the control plane is having a bad day.
That separation is why load balancers stay resilient during AWS control plane outages. Your existing configuration keeps working. You just can't change anything until the control plane recovers. I learned early on to pre-configure load balancers during quiet periods. Scrambling to make changes mid-incident when the control plane is also degraded? Don't recommend it.
How Load Balancer Nodes Scale
ALB and NLB nodes scale automatically based on traffic. The mechanics differ, though, in ways that matter:
- ALB adds more nodes per AZ when connection counts, request rates, or active rule evaluations exceed current capacity. Transparent scaling. Not instant. During sudden massive spikes (the classic Super Bowl ad scenario, millions of simultaneous requests), ALB can take several minutes to catch up. You'll see 503s in the gap.
- NLB operates at Layer 4, so each node handles millions of requests per second. Scaling events are rare, and when they do happen, they complete faster because NLB skips HTTP parsing and rule evaluation entirely.
Got a predictable traffic spike coming? Contact AWS Support to pre-warm ALB capacity. There's no self-service API for this. You file a support case specifying expected request rate, average request/response size, and the time window.
Cross-Zone Load Balancing
Cross-zone load balancing controls whether a load balancer node in one AZ can send traffic to targets in other AZs, or only to targets in its own AZ.
| Load Balancer | Cross-Zone Default | Configurable | Cost Implications |
|---|---|---|---|
| ALB | Enabled | Yes, at target group level | No additional data transfer charges |
| NLB | Disabled | Yes, at target group level | Cross-AZ data transfer charges apply |
| GLB | Disabled | Yes | Cross-AZ data transfer charges apply |
| CLB | Disabled via API/CLI; enabled via console | Yes | No additional data transfer charges |
The default difference between ALB and NLB trips people up constantly. ALB has cross-zone on by default, so targets get roughly equal traffic regardless of AZ. NLB has it off by default, meaning each AZ node sends traffic only to targets in its own AZ.
Think about what that means with uneven target distribution. Say you've got 3 instances in us-east-1a and 1 in us-east-1b. NLB sends 50% of traffic to each AZ. That lone instance in 1b absorbs the same total traffic as all three in 1a combined. I've watched this melt a single instance while three others sat idle. Happens more often than you'd expect.
Why does NLB default to cross-zone disabled? Cost. NLB often fronts high-throughput workloads where cross-AZ data transfer charges add up fast. AWS charges for cross-AZ data transfer with NLB but not with ALB. For latency-sensitive workloads, keeping traffic within the same AZ also dodges the 1-2ms cross-AZ penalty.
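The uneven-distribution arithmetic is easy to sketch. This hypothetical model assumes DNS splits traffic evenly across AZ nodes, which is the behavior described above; `per_target_share` is an illustrative helper, not an AWS API.

```python
def per_target_share(targets_per_az: dict[str, int], cross_zone: bool) -> dict[str, float]:
    """Fraction of total traffic EACH target in an AZ receives.
    With cross-zone off, each AZ node fans out only to its own AZ's
    targets; with it on, every node sees every target equally."""
    total = sum(targets_per_az.values())
    if cross_zone:
        return {az: 1 / total for az in targets_per_az}
    # Each AZ node gets 1/num_AZs of traffic, split among local targets.
    num_azs = len(targets_per_az)
    return {az: (1 / num_azs) / n for az, n in targets_per_az.items()}

# The 3-vs-1 example from the text: 3 targets in us-east-1a, 1 in us-east-1b.
uneven = {"us-east-1a": 3, "us-east-1b": 1}
print(per_target_share(uneven, cross_zone=False))
# each 1a target: ~16.7% of traffic; the lone 1b target: 50%
print(per_target_share(uneven, cross_zone=True))
# every target: 25%
```

Three-to-one load skew on the lone instance, exactly the melt-one-instance scenario described above.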
```mermaid
flowchart TB
    subgraph CZ_ON["Cross-Zone ON (ALB Default)"]
        direction TB
        LB1["ALB Node<br/>AZ-1"] -->|25%| T1["Target 1<br/>AZ-1"]
        LB1 -->|25%| T2["Target 2<br/>AZ-1"]
        LB1 -->|25%| T3["Target 3<br/>AZ-2"]
        LB1 -->|25%| T4["Target 4<br/>AZ-2"]
        LB2["ALB Node<br/>AZ-2"] -->|25%| T1
        LB2 -->|25%| T2
        LB2 -->|25%| T3
        LB2 -->|25%| T4
    end
    subgraph CZ_OFF["Cross-Zone OFF (NLB Default)"]
        direction TB
        LB3["NLB Node<br/>AZ-1<br/>50% traffic"] -->|50%| T5["Target 1<br/>AZ-1"]
        LB4["NLB Node<br/>AZ-2<br/>50% traffic"] -->|50%| T6["Target 2<br/>AZ-2"]
    end
```

Application Load Balancer (ALB)
ALB operates at Layer 7 (HTTP/HTTPS). For most web application and API workloads, it's the right choice. Richest feature set of the four types: content-based routing, native authentication, deep HTTP protocol awareness.
Layer 7 Architecture
ALB fully terminates TCP connections from clients, parses the HTTP request, evaluates routing rules, then opens a new connection to the backend target. Pure proxy architecture. Client talks to ALB. ALB talks to target. Two separate connections. Here's what that means in practice:
- ALB can inspect and modify HTTP headers, paths, query strings, and response codes.
- Connection characteristics (TLS version, cipher suite, client certificate) between client and ALB are independent of the connection between ALB and target.
- The target sees the ALB's IP address as the source, not the client's. The original client IP lives in the `X-Forwarded-For` header.
Request Routing
ALB's routing engine evaluates rules on each listener (a listener binds to a port; typically 80 and 443). Each rule has conditions and actions, evaluated in priority order:
| Condition Type | Matches On | Example |
|---|---|---|
| Host header | HTTP Host header | api.example.com, *.staging.example.com |
| Path | URL path | /api/v2/*, /images/* |
| HTTP header | Any HTTP header | X-Custom-Header: blue |
| HTTP method | Request method | GET, POST |
| Query string | Query parameters | ?version=2&format=json |
| Source IP | Client IP CIDR | 10.0.0.0/8, 203.0.113.0/24 |
Conditions combine with AND logic within a single rule. Each ALB supports up to 100 rules (a soft limit you can raise). This routing flexibility is exactly why ALB anchors so many microservice architectures. One ALB routing to dozens of different target groups based on request characteristics. Pretty powerful.
Routing Evaluation Order
Rules evaluate by priority number, lowest first. The default rule (you can't delete it) catches everything else. Typical pattern:
- Priority 1: `/api/auth/*` routes to the authentication service target group.
- Priority 2: `/api/v2/*` routes to the v2 API target group.
- Priority 3: `/api/*` routes to the v1 API target group.
- Default: All other requests route to the frontend target group.
Order matters here. Place /api/* before /api/v2/* and all v2 requests match the broader pattern first. They never reach the v2 target group. I've fixed this exact misconfiguration in production more than once. Embarrassingly easy to get wrong.
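The shadowing failure is easy to demonstrate in a few lines. This sketch approximates ALB's wildcard matching with Python's `fnmatch` (ALB's actual pattern syntax differs slightly: `*` matches zero or more characters, `?` exactly one); the rule lists and target group names are hypothetical.

```python
from fnmatch import fnmatch

def route(path: str, rules: list[tuple[int, str, str]], default: str) -> str:
    """Evaluate listener rules the way ALB does: lowest priority number
    first, first matching pattern wins, default rule as the catch-all."""
    for _priority, pattern, target_group in sorted(rules):
        if fnmatch(path, pattern):
            return target_group
    return default

good = [(1, "/api/auth/*", "auth-svc"), (2, "/api/v2/*", "api-v2"), (3, "/api/*", "api-v1")]
bad  = [(1, "/api/auth/*", "auth-svc"), (2, "/api/*", "api-v1"), (3, "/api/v2/*", "api-v2")]

print(route("/api/v2/users", good, "frontend"))  # api-v2
print(route("/api/v2/users", bad, "frontend"))   # api-v1 -- the broad rule shadows v2
```

With the broad `/api/*` rule at a lower priority number, the v2 target group is unreachable, which is exactly the misconfiguration described above.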
```mermaid
flowchart TD
    R["Incoming Request"] --> P1["Priority 1<br/>/api/auth/*"]
    P1 -->|Match| TG1["Auth Service<br/>Target Group"]
    P1 -->|No match| P2["Priority 2<br/>/api/v2/*"]
    P2 -->|Match| TG2["V2 API<br/>Target Group"]
    P2 -->|No match| P3["Priority 3<br/>/api/*"]
    P3 -->|Match| TG3["V1 API<br/>Target Group"]
    P3 -->|No match| DEF["Default Rule<br/>*"]
    DEF --> TG4["Frontend<br/>Target Group"]
```
Target Types
ALB supports three target types, each for different architectures:
| Target Type | Description | Use Case |
|---|---|---|
| Instance | EC2 instance by instance ID | Traditional EC2-based applications, Auto Scaling Groups |
| IP | Specific IP address (can be outside VPC) | ECS tasks with awsvpc networking, on-premises targets via Direct Connect, peered VPC targets |
| Lambda | AWS Lambda function | Serverless HTTP APIs, lightweight webhooks |
IP targets deserve a closer look for modern container architectures. When ECS tasks use awsvpc networking mode (giving each task its own ENI), the target is the task's IP address. Direct routing to individual containers, no port mapping needed.
HTTP/2 and gRPC Support
ALB supports HTTP/2 on the frontend (client to ALB) by default with an HTTPS listener. On the backend (ALB to target), you configure the target group's protocol version:
| Protocol Version | Frontend | Backend | Use Case |
|---|---|---|---|
| HTTP/1.1 | HTTP/2 negotiated via ALPN | HTTP/1.1 to targets | Traditional web applications |
| HTTP/2 | HTTP/2 | HTTP/2 to targets | gRPC services, HTTP/2-native backends |
| gRPC | HTTP/2 | HTTP/2 with gRPC framing | gRPC microservices |
There's a subtle gotcha here. ALB defaults to multiplexing HTTP/2 streams from the frontend into separate HTTP/1.1 requests on the backend. If your targets support HTTP/2, you need to explicitly set the target group protocol version. Otherwise you lose end-to-end multiplexing: more connection overhead, higher latency for services making lots of requests. I've seen teams puzzled by poor gRPC performance, only to discover they'd never changed this default.
WebSocket Support
ALB handles WebSocket connections natively. Client sends an HTTP Upgrade request; ALB establishes a persistent, bidirectional WebSocket connection to the target. Connection stays open until either side closes it. The idle timeout setting doesn't apply to active WebSocket connections.
One operational wrinkle worth knowing: WebSocket connections are long-lived, which complicates connection draining during deployments. Set a reasonable deregistration delay and make sure your application handles reconnection cleanly.
Sticky Sessions
ALB supports two types of session affinity:
- Duration-based stickiness: ALB generates a cookie (`AWSALB`) routing subsequent requests from the same client to the same target for a configurable duration. Simple but coarse.
- Application-based stickiness: Your application generates a cookie (custom name), and ALB uses it for affinity. Your application controls reset and expiry, giving you finer control.
I'll be blunt: avoid sticky sessions. They create uneven load distribution. Popular sessions pile onto specific targets. You can't safely remove a target with active sticky sessions without breaking those sessions. Failure recovery gets unpredictable. Externalize session state to ElastiCache or DynamoDB instead and let the load balancer distribute freely.
If sticky sessions are truly unavoidable (legacy apps with in-memory session state), go with application-based stickiness and the shortest practical duration.
Authentication Integration
ALB integrates natively with Amazon Cognito and any OIDC-compliant identity provider (Okta, Auth0, Azure AD, Google) to authenticate users before requests hit your application:
- User sends a request to ALB.
- ALB redirects the user to the IdP login page.
- After successful authentication, the IdP sends a token back to ALB.
- ALB validates the token, sets user claims in HTTP headers (`x-amzn-oidc-data`, `x-amzn-oidc-accesstoken`, `x-amzn-oidc-identity`), and forwards the request to the target.
I use this pattern heavily for internal tools and admin interfaces. Bolt authentication onto any HTTP service without touching the application code. The `x-amzn-oidc-data` header is a JWT cryptographically signed by the ALB, so the backend can trust its claims without contacting the IdP. Dead simple. Works great.
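On the backend, reading those claims amounts to decoding the payload segment of the `x-amzn-oidc-data` JWT. A minimal sketch, with one loud caveat: production code must also verify the ES256 signature against the ALB's regional public key before trusting anything. The token below is fabricated for illustration.

```python
import base64
import json

def decode_oidc_claims(oidc_data_header: str) -> dict:
    """Decode the payload of ALB's x-amzn-oidc-data JWT.
    WARNING: illustration only -- this skips signature verification,
    which real backends must perform against the ALB's public key."""
    payload_b64 = oidc_data_header.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def _b64(obj) -> str:
    """Helper to fabricate a JWT segment for the demo."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip("=")

# Hypothetical token: real ones carry a valid ES256 signature, not "sig".
token = ".".join([_b64({"alg": "ES256"}),
                  _b64({"sub": "user-123", "email": "a@example.com"}),
                  "sig"])
print(decode_oidc_claims(token)["sub"])  # user-123
```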
WAF Integration
ALB integrates directly with AWS WAF. Attach a Web ACL to the load balancer and WAF rules evaluate before routing rules. Malicious requests get killed before reaching your application. Common configurations:
- AWS Managed Rules for common vulnerabilities (SQL injection, XSS, known bad IPs)
- Rate-based rules to throttle abusive clients
- Geographic restrictions
- Bot control rules
WAF pricing is per-rule and per-million-requests, and that adds up at high traffic volumes. For any internet-facing ALB, though, the cost is justified. Skip WAF on a public ALB and you'll regret it eventually. Ask me how I know.
Slow Start Mode
When a new target registers with an ALB target group, it normally gets its full traffic share immediately. Fine for stateless services that start fast. For applications that need warm-up time (populating caches, initializing connection pools, JIT compilation), this hammers the new target with load before it's ready. The others sit underutilized while the new one struggles.
Slow start mode fixes this. Ramps traffic to a new target linearly over a configurable window (30-900 seconds). JVM-based applications benefit enormously. Those first few thousand requests trigger class loading and JIT compilation that can add hundreds of milliseconds per request. Without slow start, you're effectively DDoS-ing your own freshly launched instances.
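AWS documents the ramp as linear over the configured window, so the target's share of traffic can be sketched as a simple weight function. The function name and the 300-second window below are illustrative choices, not AWS APIs or defaults.

```python
def slow_start_weight(seconds_since_register: float, window: float = 300.0) -> float:
    """Approximate traffic weight for a target in slow start: ramps
    linearly from zero to its full share over the configured window
    (30-900 seconds on ALB)."""
    if seconds_since_register >= window:
        return 1.0
    return max(seconds_since_register / window, 0.0)

for t in (0, 75, 150, 300):
    print(t, slow_start_weight(t, window=300))
# 0 -> 0.0, 75 -> 0.25, 150 -> 0.5, 300 -> 1.0
```

Size the window to your warm-up reality: long enough for the JVM to JIT-compile the hot paths, short enough that scale-out events still add capacity in time.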
Network Load Balancer (NLB)
NLB operates at Layer 4 (TCP/UDP). It exists for extreme performance, static IP addresses, and protocol transparency. ALB is a feature-rich HTTP proxy. NLB is a high-performance packet router. Different tools, different problems.
Layer 4 Architecture
NLB uses a flow-based model. It doesn't terminate TCP connections the way ALB does. For each new TCP connection (or UDP flow), NLB selects a target and then passes packets directly between client and target for the connection's lifetime. No HTTP header parsing. No URL path evaluation. No application-layer awareness at all. Routing decisions use IP address, port, and protocol. That's it.
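The flow-based model can be sketched as a deterministic hash over the connection tuple: every packet of a TCP connection carries the same tuple, so the whole flow lands on one target. This is a simplified illustration; NLB's actual algorithm also factors in the TCP sequence number on new flows, and the IPs below are hypothetical.

```python
import hashlib

def pick_target(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                proto: str, targets: list[str]) -> str:
    """Simplified 5-tuple flow hash: deterministic, so all packets of
    one connection consistently map to the same backend target."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return targets[digest % len(targets)]

targets = ["10.0.1.10", "10.0.1.11", "10.0.2.10"]
a = pick_target("203.0.113.9", 50123, "10.0.0.5", 443, "tcp", targets)
b = pick_target("203.0.113.9", 50123, "10.0.0.5", 443, "tcp", targets)
print(a == b)  # True -- same flow, same target, no per-packet decisions
```

No parsing, no buffering, no per-request state: one hash at flow setup, then packets move. That's where the microsecond latency comes from.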
The performance implications are significant:
- Latency: NLB adds microseconds. ALB, which has to parse HTTP, adds measurable milliseconds.
- Throughput: NLB handles millions of requests per second and millions of simultaneous connections per AZ node. ALB, constrained by HTTP parsing overhead, handles substantially less.
- Protocol agnosticism: NLB works with any TCP or UDP protocol. Databases, MQTT, custom binary protocols, SIP, whatever you throw at it. ALB only speaks HTTP-family protocols.
Static IPs and Elastic IPs
This is the feature that drives many NLB adoption decisions. Each NLB node in each AZ gets a static IP address that never changes for the load balancer's lifetime. You can also assign Elastic IPs for even more control.
Why it matters in practice:
- Firewall allowlisting: Partners and clients who need to allowlist your endpoint by IP. ALB IPs shift as nodes scale, which makes IP-based allowlisting impossible. I've watched teams burn weeks trying to work around this with ALB before accepting they needed NLB.
- DNS-independent routing: Some protocols and client libraries handle DNS-based load balancing poorly. Static IPs sidestep the whole problem.
- Hybrid connectivity: On-premises systems over Direct Connect or VPN frequently require static IPs.
Source IP Preservation
NLB preserves the client's source IP by default when forwarding packets to targets. The target sees the actual client IP in the TCP packet's source address field. No X-Forwarded-For parsing required. Helpful for:
- Applications needing client IP for logging, rate limiting, or security decisions at the network layer.
- Network appliances (firewalls, IDS) operating on packet-level source IPs.
- Protocols where header modification is impossible.
One caveat, though. When the target sits in a different AZ than the NLB node (cross-zone enabled), return packets must route back through the NLB node. NLB handles this transparently, but it adds a cross-AZ hop on the return path. Same-AZ targets get a more direct return.
And here's one that bites people: IP-address targets (like ECS awsvpc tasks) behave differently. With IP targets, the source IP becomes the NLB node's private IP. To recover the client IP, enable Proxy Protocol v2. It prepends a header to the TCP stream containing the original client IP and port.
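Proxy Protocol v2 is a small binary header prepended to the TCP stream, so recovering the client IP is a parsing exercise. Here's a minimal sketch for the IPv4/TCP case; real NLB headers also carry TLV extensions (for example the VPC endpoint ID on PrivateLink connections), which this ignores.

```python
import socket
import struct

PP2_SIG = b"\r\n\r\n\x00\r\nQUIT\n"  # fixed 12-byte v2 signature

def build_ppv2_tcp4(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build a minimal Proxy Protocol v2 header for an IPv4 TCP flow."""
    body = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    # 0x21 = version 2 / PROXY command; 0x11 = AF_INET / STREAM
    return PP2_SIG + bytes([0x21, 0x11]) + struct.pack("!H", len(body)) + body

def parse_ppv2_tcp4(data: bytes):
    """Recover the original client IP/port from the front of the stream."""
    assert data[:12] == PP2_SIG, "not a Proxy Protocol v2 header"
    length = struct.unpack("!H", data[14:16])[0]
    body = data[16:16 + length]
    src_ip = socket.inet_ntoa(body[0:4])       # original client address
    src_port = struct.unpack("!H", body[8:10])[0]
    rest = data[16 + length:]                  # application bytes follow
    return src_ip, src_port, rest

hdr = build_ppv2_tcp4("203.0.113.9", 50123, "10.0.1.10", 443)
print(parse_ppv2_tcp4(hdr + b"GET / HTTP/1.1\r\n"))
# ('203.0.113.9', 50123, b'GET / HTTP/1.1\r\n')
```

The catch in practice: your application (or its TCP library) must expect and consume this header. Enable Proxy Protocol v2 on a target group whose backends don't parse it and every connection breaks at the first read.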
TLS Termination at NLB
NLB supports TLS termination (the TLS listener protocol). Decrypts incoming TLS, forwards plaintext TCP to targets. You get:
- Centralized certificate management via ACM.
- TLS offloading from targets, reducing their CPU load.
- SNI support for hosting multiple TLS domains on a single NLB.
The alternative? Use a TCP listener and pass TLS traffic straight through to targets for end-to-end encryption. You'll need this when the target requires the client certificate (mutual TLS) or when regulations mandate end-to-end encryption with no intermediate termination.
NLB and AWS PrivateLink
NLB is the only load balancer type that works as a PrivateLink service provider. PrivateLink lets you expose your service to other AWS accounts (or the AWS Marketplace) via a VPC Endpoint. Traffic stays off the public internet. No VPC peering needed.
Here's the flow:
- Deploy your service behind an NLB.
- Create a VPC Endpoint Service pointing to the NLB.
- Consumers in other accounts create interface VPC Endpoints connecting to your endpoint service.
- Traffic flows privately over the AWS network, with ENIs in the consumer's VPC.
Standard pattern for SaaS providers building AWS-native integrations. Internal platform teams also use it to expose shared services across accounts in an organization. I've set this up dozens of times. Works well once you sort out the acceptance workflow and DNS configuration.
Gateway Load Balancer (GLB)
GLB is the newest and most specialized of the bunch. Solves one specific problem: transparently inserting third-party network appliances (firewalls, IDS, deep packet inspection) into VPC traffic flows.
The GENEVE Protocol
GLB uses GENEVE (Generic Network Virtualization Encapsulation) to tunnel traffic between the GLB and appliance targets. Encapsulates the original packet (all headers preserved, including original source and destination IP) inside a UDP envelope on port 6081. The appliance receives the encapsulated packet, inspects or modifies it, and returns it through the same GENEVE tunnel.
Fundamentally different from ALB and NLB. GLB doesn't terminate connections. Doesn't modify packets. It's a transparent bump-in-the-wire that routes traffic through appliances while both source and destination remain unaware of the detour.
How Traffic Flows Through GLB
Three components in the GLB traffic flow:
- GLB Endpoints (GWLBe): VPC endpoints (type: Gateway Load Balancer) in the consumer VPC's route tables. Traffic destined for inspection routes to a GWLBe via route table entries.
- GLB: Receives traffic from GWLBe via PrivateLink, distributes it across appliance targets using GENEVE encapsulation.
- Appliance targets: EC2 instances or IP addresses running firewall/IDS software that processes GENEVE-encapsulated traffic.
A typical flow:
- An EC2 instance sends outbound traffic.
- The VPC route table directs the traffic to a GWLBe.
- The GWLBe sends the traffic to the GLB via PrivateLink.
- GLB encapsulates the packet in GENEVE and forwards it to an appliance target.
- The appliance inspects the packet, decides to allow or drop it, and returns it to GLB via GENEVE.
- GLB sends the (potentially modified) packet back through the GWLBe.
- The GWLBe routes the packet to its original destination.
Use Cases
| Use Case | Appliance Type | Examples |
|---|---|---|
| Next-gen firewall | Palo Alto, Fortinet, Check Point | Centralized egress/ingress filtering |
| Intrusion detection/prevention | Suricata, Snort, CrowdStrike | Real-time threat detection |
| Deep packet inspection | Custom appliances | Compliance monitoring, data loss prevention |
| Traffic mirroring/analysis | Network monitoring tools | Performance analysis, forensics |
Most teams never touch GLB directly. It serves organizations with compliance mandates for inline network inspection, or network security teams centralizing appliance management across VPCs and accounts. If you're reading about GLB and wondering whether you need it, you probably don't. The teams that need it already know.
Classic Load Balancer
The original. Launched 2009. Operates at both Layer 4 and Layer 7, does neither well by today's standards. No host-based routing. No path-based routing. No WebSocket support. No HTTP/2. No SNI (one certificate per load balancer). No native container support. Yikes.
AWS has been pushing customers off CLB for years, and the message is clear. Use ALB for Layer 7, NLB for Layer 4. There's a Migration Wizard in the EC2 console that analyzes your CLB and creates an equivalent ALB or NLB.
Still running Classic Load Balancers in production? Migrate them. No feature advantage. No cost advantage. CLB receives zero new features. The only defensible reason to delay: your application depends on TCP passthrough with HTTP health checks simultaneously. CLB supports that combination natively, and reproducing it with NLB requires careful handling.
ALB vs. NLB: The Core Decision
Most teams face this choice. Getting it wrong creates real operational pain.
| Dimension | ALB | NLB |
|---|---|---|
| OSI Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP/TLS) |
| Latency added | Low milliseconds | Microseconds |
| Requests per second | Hundreds of thousands | Millions |
| Static IP | No (DNS-based) | Yes (one per AZ, supports EIP) |
| Source IP to target | In X-Forwarded-For header | Preserved natively (instance targets) |
| Content-based routing | Yes (host, path, header, query, method, source IP) | No |
| WebSocket | Yes (native) | Yes (TCP passthrough) |
| gRPC | Yes (native) | Yes (TCP passthrough) |
| WAF integration | Yes | No |
| Authentication | Yes (Cognito, OIDC) | No |
| PrivateLink provider | No | Yes |
| Lambda targets | Yes | No |
| Health check | HTTP/HTTPS with path, status code matching | TCP, HTTP, HTTPS |
| Cross-zone default | Enabled (free) | Disabled (charges apply) |
| Idle timeout | Configurable (1-4000s) | TCP: default 350s, configurable |
| Connection draining | Configurable deregistration delay | Configurable deregistration delay |
| Sticky sessions | Cookie-based | Source IP affinity, or flow-hash for UDP |
| Certificate management | ACM, SNI for multiple certs | ACM, SNI for multiple certs |
When to Use ALB
- You need content-based routing (host, path, headers).
- You need built-in authentication (Cognito, OIDC).
- You need WAF protection.
- You are running an HTTP/HTTPS web application or REST API.
- You need Lambda as a target.
- You want detailed HTTP-level access logs and metrics.
When to Use NLB
- You need static IP addresses or Elastic IPs.
- You need ultra-low latency (microseconds, not milliseconds).
- You need to handle millions of requests per second.
- You are load balancing non-HTTP protocols (databases, MQTT, gRPC-direct, custom TCP/UDP).
- You need to be a PrivateLink service provider.
- You need the client's source IP preserved at the network layer without header parsing.
- You need to pass TLS through to the target without termination.
Common Mistakes
Using ALB when NLB is needed. Classic case: a partner requires IP allowlisting. ALB IPs are dynamic. You can't give them stable IPs. Period. Fix: NLB with Elastic IPs. You can also place an NLB in front of an ALB (valid, adds cost) when you need both static IPs and Layer 7 routing.
Using NLB when ALB is needed. I see teams pick NLB for raw performance when they also need path-based routing or WAF. NLB supports neither. The resulting workaround (routing logic jammed into the application, or a separate WAF appliance) costs more and adds more complexity than just using ALB would have.
Using ALB for gRPC without configuring HTTP/2 backend. ALB defaults to HTTP/1.1 on the backend. gRPC requires HTTP/2. Skip the target group protocol version setting and connections fail with cryptic errors. This one wastes hours every time.
Health Checks
Health checks determine whether targets can receive traffic. Sounds simple enough. Getting them wrong is one of the most common sources of production incidents I run into. Too aggressive and healthy targets get yanked during transient blips. Too lenient and traffic keeps flowing to dead targets.
Health Check Mechanics by Type
| Parameter | ALB | NLB | GLB |
|---|---|---|---|
| Protocol | HTTP, HTTPS | TCP, HTTP, HTTPS | TCP, HTTP, HTTPS |
| Path | Configurable (e.g., /health) | Configurable for HTTP/HTTPS | Configurable for HTTP/HTTPS |
| Interval | 5-300s (default 30s) | 10 or 30s | 5-300s |
| Healthy threshold | 2-10 (default 5) | 2-10 (default 5) | 2-10 (default 5) |
| Unhealthy threshold | 2-10 (default 2) | 2-10 (default 2) | 2-10 (default 2) |
| Timeout | 2-120s (default 5s) | 10s (TCP), 6s (HTTP) | 2-120s |
| Success codes | 200-499 configurable | 200-399 for HTTP | 200-399 for HTTP |
Health Check Recommendations
Build a dedicated health check endpoint. Don't point health checks at your homepage. Create /health or /healthz that verifies the application can actually serve requests: database connectivity, critical dependency availability, memory pressure. You want granular control over what "healthy" means.
Tune the unhealthy threshold carefully. The default (2 consecutive failures) means two missed checks at 30-second intervals removes a target. Sixty seconds from first failure to removal. For latency-sensitive services, drop the interval to 5-10 seconds and keep the unhealthy threshold at 2-3 for faster detection. For services prone to transient blips (JVM GC pauses, brief network hiccups), raise the unhealthy threshold to 3-5 so you don't churn targets needlessly.
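The detection-time arithmetic is worth writing down, since it drives the tuning above. A hypothetical helper for the back-of-envelope math:

```python
def seconds_to_removal(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate time to remove a dead target: the load balancer
    needs `unhealthy_threshold` consecutive failed probes, one per
    interval, before it stops sending traffic."""
    return interval_s * unhealthy_threshold

print(seconds_to_removal(30, 2))  # 60  -- the defaults described above
print(seconds_to_removal(5, 2))   # 10  -- tightened for latency-sensitive services
print(seconds_to_removal(30, 5))  # 150 -- lenient, tolerates GC pauses
```

Every second in that window, the dead target keeps receiving its share of traffic, so this number is effectively your error budget for a single-target failure.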
Mind the grace period. When a target first registers, the health check grace period (set on the ECS service or Auto Scaling Group, not on the target group) prevents premature removal while the target starts up. Application takes 60 seconds to initialize? Set the grace period to at least 90 seconds. Without it, the load balancer marks the target unhealthy and kills it before it finishes booting. I've watched entire deployments fail in a loop because someone forgot to set the grace period. Really fun to debug at 3 AM.
Health Check Cascading
Multi-layer load balancer setups (NLB fronting ALB, or a load balancer routing to ECS services) create cascading health check failures. One target behind an ALB fails its check, ALB removes it. All targets in a target group fail, the group itself goes unhealthy. An NLB health-checking that ALB then marks the ALB target unhealthy and stops sending traffic entirely. Even if the ALB has other healthy target groups serving different paths. Total traffic blackout from one bad target group.
Fix: make health check endpoints specific to each target group's purpose. Make sure target groups have enough healthy targets to absorb individual failures without the whole group going unhealthy.
TLS/SSL Handling
Certificate and TLS management used to be a real operational burden. AWS has made it almost trivial, provided you understand the patterns.
Certificate Management with ACM
AWS Certificate Manager (ACM) provides free public TLS certificates with automatic renewal. ALB and NLB integrate directly. Select a certificate when configuring an HTTPS or TLS listener. ACM renews it. You do nothing. It's one of those AWS features that just works.
Multiple certificates on a single listener work through SNI (Server Name Indication). The load balancer reads the SNI field in the client's TLS ClientHello and picks the right certificate. One load balancer serving dozens or hundreds of domains. Easy.
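Certificate selection against the SNI value can be sketched as exact-match-then-wildcard lookup. This is a simplified illustration (ALB's documented "smart selection" also weighs key types and client capabilities); the hostnames and cert names are hypothetical.

```python
from typing import Optional

def select_cert(sni_hostname: str, certs: dict[str, str]) -> Optional[str]:
    """Pick a certificate for the TLS ClientHello's SNI value:
    exact hostname match first, then a single-label wildcard."""
    if sni_hostname in certs:
        return certs[sni_hostname]
    labels = sni_hostname.split(".")
    wildcard = ".".join(["*"] + labels[1:])  # *.example.com covers ONE label
    return certs.get(wildcard)

certs = {
    "api.example.com": "cert-api",
    "*.staging.example.com": "cert-staging-wildcard",
}
print(select_cert("api.example.com", certs))          # cert-api
print(select_cert("web.staging.example.com", certs))  # cert-staging-wildcard
print(select_cert("deep.sub.example.com", certs))     # None -- wildcard spans one label only
```

The `None` case matters operationally: a client presenting an SNI name no certificate covers gets the listener's default certificate, and then fails validation on its side.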
| Feature | ALB | NLB |
|---|---|---|
| ACM integration | Yes | Yes |
| SNI support | Yes (up to 25 certs per listener, plus smart cert selection for more) | Yes |
| Custom certificates | Yes (IAM uploaded) | Yes (IAM uploaded) |
| Mutual TLS (mTLS) | Yes (client certificate verification) | No (requires TLS passthrough) |
| Security policies | Multiple predefined policies | Multiple predefined policies |
TLS Termination Patterns
Termination at the load balancer. Most common pattern. Load balancer decrypts TLS; backend connection uses HTTP (or optionally HTTPS for encryption in transit). Offloads TLS processing and centralizes certificate management. The trade-off: traffic between load balancer and target travels unencrypted unless you configure HTTPS on the backend.
End-to-end encryption. Load balancer terminates TLS, then re-encrypts to the target over HTTPS. Both hops encrypted. Satisfies compliance requirements for encryption in transit. The target can use a self-signed certificate since the load balancer doesn't validate backend certificates by default.
TLS passthrough (NLB only). Load balancer forwards the encrypted TCP stream directly to the target. No decryption at all. The target handles termination. You'll need this for mutual TLS (where the target needs the client certificate) or regulatory mandates prohibiting intermediate termination.
Security Policies
Both ALB and NLB offer predefined security policies specifying supported TLS protocol versions and cipher suites. Naming convention: ELBSecurityPolicy-TLS13-* for TLS 1.3 support, ELBSecurityPolicy-* for TLS 1.2.
Use the most restrictive policy your clients can handle. For modern web applications, ELBSecurityPolicy-TLS13-1-2-2021-06 (TLS 1.2 and 1.3 only, strong ciphers) works well. Serving older clients or IoT devices? You may need broader TLS 1.2 cipher support. Test with your actual client population before tightening policies in production. Don't guess.
Cost Model
ELB pricing deserves careful analysis, particularly at scale. Load balancer costs can quietly become a significant budget line if you're not watching.
ALB Pricing
Two components:
| Component | Cost (us-east-1) | Notes |
|---|---|---|
| Hourly charge | ~$0.0225/hour (~$16.20/month) | Per ALB, regardless of traffic |
| LCU-hour | ~$0.008/LCU-hour | Based on the dimension with the highest consumption |
A Load Balancer Capacity Unit (LCU) measures consumption across four dimensions. You pay for whichever dimension is highest:
| LCU Dimension | 1 LCU Equals | Typical Bottleneck |
|---|---|---|
| New connections | 25/second | Short-lived API requests |
| Active connections | 3,000 concurrent | WebSocket or long-polling applications |
| Processed bytes | 1 GB/hour | Media streaming, large file transfers |
| Rule evaluations | 1,000/second | Complex routing with many rules |
Typical web application serving 10,000 requests/minute with 2 KB average request/response size and 10 routing rules? LCU cost is modest. Roughly $5-15/month on top of the hourly charge. High-traffic applications with complex routing or WebSocket connections? Those can easily push past $100/month per ALB.
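The billing arithmetic above reduces to a quick estimator. The rates and dimension divisors are the public us-east-1 numbers quoted in the table; treat them as illustrative, since pricing changes by region and over time:

```python
def alb_monthly_cost(new_conns_per_sec: float,
                     active_conns: float,
                     gb_per_hour: float,
                     rule_evals_per_sec: float,
                     hours: float = 730.0) -> float:
    """Estimate monthly ALB cost: hourly charge plus the highest LCU dimension."""
    lcus = max(
        new_conns_per_sec / 25,        # 1 LCU = 25 new connections/s
        active_conns / 3000,           # 1 LCU = 3,000 active connections
        gb_per_hour / 1.0,             # 1 LCU = 1 GB/hour processed
        rule_evals_per_sec / 1000,     # 1 LCU = 1,000 rule evaluations/s
    )
    hourly_rate, lcu_rate = 0.0225, 0.008   # us-east-1 rates, subject to change
    return hours * (hourly_rate + lcus * lcu_rate)

# The typical-web-app scenario: ~167 req/s with keep-alive reuse keeping new
# connections low, ~1.2 GB/hour processed, 10 rules evaluated per request.
print(round(alb_monthly_cost(25, 1000, 1.2, 1670), 2))  # ~$26: $16 base + ~$10 LCU
```

Note which dimension dominates here: rule evaluations, not bytes. Complex routing quietly becomes the billing bottleneck before traffic volume does.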
NLB Pricing
Similar structure, different unit (NLCU):
| Component | Cost (us-east-1) | Notes |
|---|---|---|
| Hourly charge | ~$0.0225/hour (~$16.20/month) | Per NLB |
| NLCU-hour | ~$0.006/NLCU-hour | Based on highest dimension |
| NLCU Dimension | 1 NLCU Equals | Typical Bottleneck |
|---|---|---|
| New connections/flows | 800/second (TCP), 400/second (UDP) | High-churn connection patterns |
| Active connections/flows | 100,000 concurrent (TCP), 50,000 (UDP) | Long-lived connections |
| Processed bytes | 1 GB/hour | High-throughput data transfer |
NLB is generally cheaper than ALB for equivalent throughput. Less processing per request means fewer capacity units consumed. But watch out for TCP workloads with millions of concurrent connections (IoT, gaming). The active connections dimension can drive costs up fast.
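The same arithmetic makes the IoT trap concrete: a million mostly-idle TCP connections is 10 NLCUs around the clock, even with almost no data flowing. A sketch using the TCP rates quoted above (us-east-1, illustrative):

```python
def nlb_monthly_cost(new_flows_per_sec: float,
                     active_flows: float,
                     gb_per_hour: float,
                     hours: float = 730.0) -> float:
    """Estimate monthly NLB cost for TCP: hourly charge plus the highest NLCU dimension."""
    nlcus = max(
        new_flows_per_sec / 800,    # 1 NLCU = 800 new TCP flows/s
        active_flows / 100_000,     # 1 NLCU = 100,000 active TCP flows
        gb_per_hour / 1.0,          # 1 NLCU = 1 GB/hour processed
    )
    return hours * (0.0225 + nlcus * 0.006)   # us-east-1 rates, subject to change

# 1M long-lived IoT connections, trickle traffic: active flows dominate.
print(round(nlb_monthly_cost(100, 1_000_000, 0.5), 2))
```

The per-unit rates look tiny until the active-flows dimension is pinned at a large number 24/7; that dimension never scales down overnight the way request rates do.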
GLB Pricing
| Component | Cost (us-east-1) |
|---|---|
| Hourly charge | ~$0.0125/hour (~$9/month) |
| LCU-hour | ~$0.004/LCU-hour |
Cheapest hourly rate of the four, reflecting that GLB typically deploys once per VPC/region as a traffic chokepoint.
Cost Optimization Strategies
Consolidate ALBs. Every ALB costs ~$16/month just to exist. I've walked into environments with separate ALBs for staging.example.com and api.staging.example.com. Merge them. Use host-based routing rules. One consolidation project I ran cut ELB costs by 60%, just by collapsing a dozen underutilized ALBs.
Right-size target groups. If a target group has 50 registered targets but only 5 ever see traffic (sticky sessions, extreme request imbalance), you're paying for health check traffic to 45 targets doing nothing useful. Review periodically.
Choose NLB when ALB features are unnecessary. Don't need content-based routing, WAF, authentication, or Lambda targets? NLB is often cheaper for the same throughput.
Monitor LCU consumption. The ConsumedLCUs CloudWatch metric shows actual consumption. If it stays consistently low, the load balancer probably isn't the cost problem you think it is. Look elsewhere.
Scaling Patterns
How ALB Scales
ALB scales automatically by adding nodes within each AZ. The triggers:
- Connection rate (new connections per second)
- Active connection count
- Request rate
- Bandwidth consumption
Transparent scaling, but AWS targets a 1-5 minute response window. During that window, clients get HTTP 503 errors if traffic exceeds existing node capacity.
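During that scaling window, well-behaved clients should treat 503 as retryable with jittered exponential backoff rather than failing immediately. A minimal client-side sketch; the retry policy values and the `send` callable are illustrative, not a specific library's API:

```python
import random
import time

def request_with_backoff(send, max_attempts: int = 5,
                         base_delay: float = 0.2, cap: float = 5.0) -> int:
    """Retry a request on HTTP 503 using full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        status = send()
        if status != 503:
            return status
        # Full jitter: sleep a random amount up to the exponential ceiling,
        # which avoids synchronized retry stampedes during an ALB scale-out.
        time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
    return 503

# Simulate an ALB that returns 503 twice while scaling, then recovers.
responses = iter([503, 503, 200])
print(request_with_backoff(lambda: next(responses), base_delay=0.01))  # 200
```

This doesn't replace pre-warming for known spikes; it just keeps transient scaling windows from turning into visible outages.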
Pre-warming for predictable spikes. Know traffic will surge at a specific time (product launch, marketing campaign, seasonal event)? Contact AWS Support at least 48 hours out. Provide:
- Expected peak requests per second
- Average request size and response size
- Start time and duration of the spike
- The ARN of the load balancer
AWS pre-provisions extra capacity at no charge beyond normal LCU consumption. Two minutes filing the case can save hours of degraded availability.
How NLB Scales
NLB's scaling model is fundamentally different. It's built to handle millions of requests per second without a warm-up period. AWS distributes connections across NLB nodes per AZ via flow hashing, and capacity adjusts faster and more granularly than with ALB because per-packet processing is minimal.
In practice, NLB scaling is rarely a concern. The scenarios where NLB can't keep up involve traffic volumes that would also crush the backend targets' network interfaces. If your targets can handle it, NLB can handle it.
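Flow hashing is worth picturing: the connection's 5-tuple is hashed to pick a node, so every packet of a given flow lands on the same node. A toy sketch of that stickiness property; the actual hash function is internal to AWS:

```python
import hashlib

def pick_node(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              proto: str, nodes: list[str]) -> str:
    """Deterministically map a 5-tuple flow to one NLB node."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
flow = ("10.0.1.5", 49152, "10.0.2.10", 443, "tcp")
# Same flow always hits the same node; a new source port is a new flow
# and may hash elsewhere.
assert pick_node(*flow, nodes) == pick_node(*flow, nodes)
```

One operational consequence: a single enormous flow (one client, one connection) cannot be spread across nodes. Throughput scaling at the NLB level assumes many flows.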
What Happens During Traffic Spikes
| Scenario | ALB Behavior | NLB Behavior |
|---|---|---|
| Gradual 2x increase over 30 minutes | Scales transparently | No scaling needed |
| Sudden 10x spike in seconds | 503s for 1-5 minutes while scaling | Handles it immediately |
| Sustained high traffic | Stable after scaling | Stable |
| Flash crowd (millions of new users) | Pre-warming required | Usually handles it; contact AWS if >10M connections/s |
Common Failure Modes
Load balancer error codes and failure modes. Every production team needs to know these. Here are the ones I end up debugging most often.
HTTP 502 (Bad Gateway)
Load balancer got an invalid response from the target, or no response at all. Common causes:
- Target crashed or isn't listening. Process died after passing health checks. LB sent a request into a void.
- Response timeout. Target took longer to respond than the load balancer's idle timeout. ALB defaults to 60 seconds. Bump it if your application has long-running requests.
- Malformed response. Target sent something that doesn't look like valid HTTP. Custom HTTP servers with non-standard headers are the usual culprit here.
- Connection reset by target. Target closed the connection (RST) while the load balancer waited for a response. Happens when targets have keep-alive timeouts shorter than ALB's idle timeout.
The keep-alive race condition fix. Set your application's keep-alive timeout higher than ALB's idle timeout. ALB idle timeout is 60 seconds? Set application keep-alive to 65. That way ALB closes idle connections first, preventing the race condition. Simple rule. Fixes an enormous number of intermittent 502s.
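The rule generalizes into a preflight check you can run against your own configuration values before a deploy. The function and thresholds here are mine, not an AWS API; the 3x drain-time heuristic is a judgment call:

```python
def check_timeouts(alb_idle_s: float, app_keepalive_s: float,
                   dereg_delay_s: float, longest_request_s: float) -> list[str]:
    """Flag timeout combinations that cause intermittent 502s or slow deploys."""
    problems = []
    if app_keepalive_s <= alb_idle_s:
        problems.append(
            "502 risk: application keep-alive must exceed the ALB idle timeout "
            "so the ALB, not the target, closes idle connections first."
        )
    if dereg_delay_s < longest_request_s:
        problems.append("in-flight requests may be cut off during deregistration.")
    if dereg_delay_s > 3 * longest_request_s:
        problems.append("deployments will drain longer than necessary.")
    return problems

print(check_timeouts(60, 65, 90, 60))    # [] — the recommended shape
print(check_timeouts(60, 60, 300, 30))   # flags the 502 race and slow draining
```

Wiring a check like this into CI for infrastructure code catches the mismatch before it becomes a 2 AM page.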
HTTP 503 (Service Unavailable)
No healthy target available for the request. Three causes:
- All targets in the target group are unhealthy.
- The target group has no registered targets.
- ALB is mid-scaling and lacks capacity (transient during traffic spikes).
HTTP 504 (Gateway Timeout)
Target didn't respond within the timeout. Unlike 502, the connection to the target was established successfully. The response just never came back.
- For ALB: the idle timeout (default 60s, configurable up to 4000s) controls this.
- For NLB: idle timeout is fixed at 350 seconds for TCP. If backend processing takes longer, implement application-level keep-alive to prevent the connection from going idle.
Connection Draining and Deregistration Delay
When a target gets deregistered (deployment, scale-in, health check failure), the load balancer stops sending new requests but continues serving in-flight requests for the deregistration delay period (default 300 seconds, configurable 0-3600 seconds).
Tune this or pay the price. Requests typically complete within 30 seconds? A 300-second deregistration delay means every deployment waits 5 minutes per target for draining. Set it slightly above your longest expected request duration. Sixty seconds works for most web applications.
NLB draining operates at the connection level. Existing TCP connections continue until they close naturally or the delay expires.
Timeout Tuning Summary
| Timeout | Default | Recommendation |
|---|---|---|
| ALB idle timeout | 60s | Match to your longest typical request. Increase for WebSocket or long-polling |
| NLB TCP idle timeout | 350s | Not configurable. Implement application-level keep-alive if needed |
| Deregistration delay | 300s | Set to slightly above your longest request duration |
| Health check interval | 30s | 5-10s for latency-sensitive; 30s for cost-sensitive |
| Health check timeout | 5s (ALB) | Keep at default unless targets are consistently slow to respond |
| Application keep-alive | Varies | Set higher than ALB idle timeout to prevent 502 race condition |
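One derived number worth computing from the table above: how long a dead target keeps receiving traffic before health checks remove it. Roughly the check interval times the consecutive failures required (the unhealthy threshold count, a standard target group setting), plus up to one interval of slack because the target can fail just after a passing check:

```python
def worst_case_detection_s(interval_s: float, unhealthy_threshold: int) -> float:
    """Worst-case time a failed target keeps getting traffic before removal."""
    # +1 interval: the target can die immediately after a successful check.
    return interval_s * (unhealthy_threshold + 1)

# Default-ish settings (30s interval, threshold 2): up to ~90s of errors.
print(worst_case_detection_s(30, 2))   # 90.0
# Tightened for latency-sensitive services: 5s interval, threshold 2.
print(worst_case_detection_s(5, 2))    # 15.0
```

This is the real cost of the "cost-sensitive" 30-second interval: a minute and a half of 502s per dead target, which retry logic and multiple healthy targets must absorb.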
Integration Patterns
With ECS and EKS
Load balancer integration with container orchestrators is where the real architectural complexity lives. I've spent more time getting these integrations right than I'd like to admit.
ECS with ALB is the pattern I deploy most often. Each ECS service registers tasks as targets in an ALB target group. awsvpc networking mode with IP targets gives each task its own IP. Precise routing, no port conflicts. The ECS service deployment controller coordinates with the ALB during rolling updates: new tasks register, pass health checks, then old tasks deregister.
ECS with NLB serves non-HTTP services (gRPC without ALB, TCP databases, message brokers). Same awsvpc + IP targets pattern applies.
EKS with ALB runs through the AWS Load Balancer Controller (formerly ALB Ingress Controller). Creates and manages ALBs based on Kubernetes Ingress resources, mapping Ingress rules to ALB listener rules. Pod IPs become targets in IP mode; node ports become targets in instance mode. Use IP mode. Performs better, routes more directly.
EKS with NLB uses Kubernetes Service resources of type LoadBalancer with the service.beta.kubernetes.io/aws-load-balancer-type: external annotation. The AWS Load Balancer Controller handles NLB provisioning and target registration.
With Auto Scaling Groups
Auto Scaling Groups integrate natively with ALB and NLB target groups. ASG launches an instance, it auto-registers with specified target groups. Instance terminates, it deregisters with the configured delay. Clean.
The critical configuration is the health check type on the ASG:
- EC2 health check: ASG only replaces instances failing EC2 status checks (hardware/hypervisor failures).
- ELB health check: ASG also replaces instances failing load balancer health checks. This is the correct setting for any ASG behind a load balancer. Without it, an instance can serve errors all day (failing LB health checks) while the ASG leaves it running because the EC2 instance itself looks "healthy." I've seen this cause extended outages. Maddening to troubleshoot when you don't know to look for it.
With CloudFront
CloudFront integrates with ALB and NLB as origins. Common architecture: CloudFront in front of an ALB to cache static content, terminate TLS at edge locations, and provide DDoS protection via AWS Shield.
One security point I feel strongly about. When CloudFront is your entry point, lock down the ALB security group to accept traffic only from CloudFront. AWS publishes CloudFront IP ranges (updated regularly), or use the AWS-managed prefix list (com.amazonaws.global.cloudfront.origin-facing) in your security group rules to auto-allow CloudFront IPs.
Without this restriction, anyone can bypass CloudFront and hit the ALB directly. Your caching, WAF rules, geographic restrictions? All decoration.
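Verifying the lockdown is straightforward once you have the published ranges. A sketch using Python's stdlib `ipaddress` module; the CIDRs below are illustrative placeholders, so in practice pull the real origin-facing ranges from the ip-ranges.amazonaws.com feed or rely on the managed prefix list:

```python
from ipaddress import ip_address, ip_network

# Illustrative CIDRs only — fetch the current CLOUDFRONT_ORIGIN_FACING ranges
# from https://ip-ranges.amazonaws.com/ip-ranges.json in practice.
CLOUDFRONT_RANGES = [ip_network(c) for c in ("13.124.199.0/24", "130.176.0.0/18")]

def allowed_by_lockdown(source_ip: str) -> bool:
    """Would a locked-down ALB security group accept this source address?"""
    addr = ip_address(source_ip)
    return any(addr in net for net in CLOUDFRONT_RANGES)

print(allowed_by_lockdown("130.176.0.10"))   # True: inside a CloudFront range
print(allowed_by_lockdown("203.0.113.9"))    # False: direct-to-ALB bypass attempt
```

The managed prefix list is the better production answer since AWS keeps it current for you; a script like this is mainly useful for auditing existing security group rules.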
With Global Accelerator
AWS Global Accelerator provides static Anycast IP addresses that route traffic to your load balancers over the AWS global network instead of the public internet. Users enter the AWS network at the nearest edge location, which cuts latency for geographically distributed traffic.
Works with ALB and NLB endpoints across multiple regions:
- Static IP addresses (two Anycast IPs) as a fixed entry point.
- Automatic failover between regions based on health checks.
- Traffic dials for gradual blue/green deployments across regions.
- Client affinity to route the same client to the same endpoint.
Global Accelerator + NLB gives you static IPs at every layer: global Anycast IPs plus regional NLB static IPs. Great for organizations that need IP-based routing end to end.
Conclusion
Elastic Load Balancing looks simple from the outside. Point traffic at a load balancer. It distributes. Done. But the decisions underneath (which type, health check tuning, TLS termination placement, failure handling) are what separate resilient architectures from fragile ones.
Here are the patterns I keep coming back to after years of running these in production:
- Default to ALB for HTTP workloads. Content-based routing, WAF, and authentication justify the slightly higher latency and cost for the vast majority of web applications and APIs.
- Use NLB when the requirements demand it. Static IPs, PrivateLink, non-HTTP protocols, genuine microsecond-latency requirements. Performance anxiety alone doesn't justify NLB. ALB handles hundreds of thousands of requests per second.
- Always enable multiple Availability Zones. A single-AZ load balancer provides zero fault tolerance against the most common cloud failure mode. Zero.
- Tune your timeouts as a system. ALB idle timeout, application keep-alive, deregistration delay: these three values must be coordinated. Mismatches cause intermittent 502s. Every single time.
- Pre-warm for predictable spikes. Two minutes filing a support case can save hours of degraded availability.
- Consolidate load balancers aggressively. Every ALB costs money just to exist. Use host-based and path-based routing to serve multiple services from one ALB before spinning up another.
- Set the ASG health check type to ELB. One configuration line. Prevents a whole class of incidents where unhealthy instances linger because the ASG doesn't know the load balancer has marked them down.
The primitives are excellent. Combining them correctly for your specific workload? That's where it gets interesting, and it takes understanding both what each load balancer type does and how it does it.
Additional Resources
- Elastic Load Balancing Documentation
- Application Load Balancer Developer Guide
- Network Load Balancer Developer Guide
- Gateway Load Balancer Developer Guide
- AWS Load Balancer Controller for Kubernetes
- AWS Global Accelerator Documentation
- ELB Best Practices - AWS Well-Architected Framework
- AWS Architecture Blog - Elastic Load Balancing Posts
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

