About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
Amazon API Gateway sits in front of most serverless and microservice architectures on AWS. Three distinct API types, a control plane versus data plane split, a layered throttling hierarchy, a caching layer, a rich integration model. Most teams deploying API Gateway never dig into these mechanics. I have spent years building and operating API Gateway-backed systems handling everything from low-traffic internal tools to production APIs processing tens of thousands of requests per second, and I learned most of the hard lessons the hard way.
This is the architecture reference I wish I had found when I was making foundational decisions about API design on AWS. It covers internals, trade-offs between API types, and the failure modes that only show up under real production load.
The Three API Gateway Types
API Gateway is really three distinct services sharing a name and a console. Picking the right one is the single most consequential decision you will make when adopting the service, and I still see teams get it wrong.
REST API was the original, launched in 2015. It was built as a full-featured API management platform with request/response transformation, model validation, API keys, usage plans, caching, canary deployments, and extensive CloudWatch integration. It is powerful, flexible, and relatively expensive.
HTTP API launched in late 2019 as a response to customer feedback that REST API was too complex and too expensive for the majority of use cases, particularly the common pattern of "receive HTTP request, invoke Lambda, return response." HTTP API strips out most of the advanced features and delivers a simpler, faster, cheaper product.
WebSocket API launched in late 2018 for real-time, bidirectional communication. It manages persistent WebSocket connections and routes messages to backend integrations based on message content.
| Dimension | REST API | HTTP API | WebSocket API |
|---|---|---|---|
| Release year | 2015 | 2019 | 2018 |
| Protocol | HTTP/1.1 | HTTP/1.1, HTTP/2 | WebSocket |
| Endpoint types | Edge-optimized, Regional, Private | Regional | Regional |
| Request pricing (us-east-1) | $3.50 per million | $1.00 per million | $1.00 per million (messages) + $0.25 per million (connection minutes) |
| Payload limit | 10 MB | 10 MB | 128 KB per frame |
| Timeout | 29 seconds | 29 seconds | 29 seconds (integration), 10 minutes (idle), 2 hours (max connection duration) |
| API key management | Yes | No | No |
| Usage plans and throttling | Yes (per-key, per-stage, per-method) | No usage plans (stage/route throttling only) | No usage plans (stage/route throttling only) |
| Request/response transformation | Yes (VTL mapping templates) | No (parameter mapping only) | Yes (VTL mapping templates) |
| Caching | Yes (0.5 GB - 237 GB) | No | No |
| WAF integration | Yes | No | No |
| Resource policies | Yes | No | No |
| Mutual TLS | Yes | Yes | No |
| Custom domain names | Yes | Yes | Yes |
| Lambda authorizers | Request and Token types | Request type (v2 payload format) | Request type |
| Cognito authorizers | Yes (native) | Yes (via JWT authorizer) | No |
| Private integrations (VPC Link) | Yes (NLB-based) | Yes (ALB, NLB, Cloud Map) | No |
| AWS service integrations | Yes (direct, broad service coverage) | Limited (SQS, Kinesis, Step Functions, EventBridge, AppConfig) | No |
| Mock integrations | Yes | No | No |
| Request validation | Yes | No | No |
| X-Ray tracing | Yes | No | No |
The decision comes down to how much control you need over the request/response pipeline. REST API gives you full control at every stage: validate, transform, route, transform again, cache. HTTP API gives you almost none of that; it is a managed reverse proxy handling TLS termination, authorization, and routing. For most Lambda-backed APIs, HTTP API is the correct choice. Reach for REST API only when you need a specific capability that HTTP API lacks.
```mermaid
flowchart TD
    A[New API on AWS] --> B{Need real-time<br/>bidirectional<br/>communication?}
    B -->|Yes| WS[WebSocket API]
    B -->|No| C{Need caching, VTL<br/>transformation, WAF,<br/>usage plans, or<br/>request validation?}
    C -->|Yes| REST[REST API]
    C -->|No| D{Need JWT auth,<br/>HTTP/2, or<br/>lowest cost?}
    D -->|Yes| HTTP[HTTP API]
    D -->|No| HTTP2[HTTP API<br/>default choice]
```

Architecture Internals
Control Plane vs. Data Plane
API Gateway follows the standard AWS pattern of separating the control plane from the data plane. The distinction matters more than most people realize.
The control plane handles API configuration: creating APIs, defining resources and methods, deploying stages, configuring authorizers, and managing domain names. Control plane operations are API calls that propagate asynchronously to the data plane. When you deploy a new stage, the configuration must propagate to all data plane nodes before it is fully active. This propagation typically completes within seconds but can take up to a minute under heavy control plane load.
The data plane handles actual request processing: accepting incoming API calls, executing authorizers, routing to integrations, transforming requests and responses, enforcing throttling, and returning results. Once an API stage is deployed, the data plane keeps serving traffic even if the control plane goes down. You just cannot make configuration changes until the control plane recovers.
The operational implication: never depend on real-time control plane operations during incident response. If you need to disable an API method or tighten throttling during a production incident, those changes require control plane availability. I keep WAF rules and Lambda authorizer kill-switches as the first line of defense for exactly this reason.
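The kill-switch idea is simple enough to sketch. Below is a minimal Lambda authorizer in Python; the KILL_SWITCH variable name is my own convention, not an AWS feature, and in a real authorizer your normal credential checks would run alongside the flag check.

```python
import os

def handler(event, context):
    # The ARN of the method being authorized, e.g.
    # arn:aws:execute-api:us-east-1:123456789012:abc123/prod/GET/orders
    method_arn = event["methodArn"]

    # KILL_SWITCH is a hypothetical Lambda environment variable. Flipping it
    # is a Lambda control plane operation, not an API Gateway one, so the
    # switch keeps working even while API Gateway's control plane is degraded.
    effect = "Deny" if os.environ.get("KILL_SWITCH") == "true" else "Allow"

    return {
        "principalId": "kill-switch",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
    }
```

One caveat: if authorizer caching is enabled, Allow policies cached before you flip the switch stay valid until their TTL expires, so keep the TTL short on any method you intend to guard this way.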
How API Gateway Processes a Request
You need to understand this pipeline to debug latency issues and design efficient APIs. For REST APIs, the sequence is:
1. TLS termination and HTTP parsing. API Gateway terminates TLS and parses the incoming HTTP request.
2. CloudFront processing (edge-optimized only). For edge-optimized endpoints, the request passes through the CloudFront network to reach the API Gateway regional endpoint. This adds latency for nearby clients but reduces latency for geographically distributed clients.
3. Resource policy evaluation. If a resource policy is attached, API Gateway evaluates it first. Deny results short-circuit the request.
4. Method request validation. If request validation is configured, API Gateway validates the request body and parameters against the defined model. Invalid requests receive a 400 error before reaching the integration.
5. Authorization. API Gateway evaluates the configured authorizer (IAM, Cognito, Lambda, or API key). Unauthorized requests receive a 401 or 403.
6. Throttling. API Gateway checks the request against account-level, stage-level, and method-level throttle settings. If the request exceeds any limit, a 429 Too Many Requests response is returned.
7. Cache lookup (REST API only). If caching is enabled, API Gateway checks the cache. A cache hit returns the cached response and skips the integration entirely.
8. Request mapping (REST API only). VTL mapping templates transform the incoming request into the format expected by the backend integration.
9. Integration execution. API Gateway forwards the request to the backend (Lambda, HTTP endpoint, AWS service, VPC Link target, or mock).
10. Response mapping (REST API only). VTL mapping templates transform the integration response into the format returned to the client.
11. Response return. The final response is sent back to the client.
```mermaid
flowchart TD
    A[TLS Termination & HTTP Parsing] --> B[CloudFront Processing<br/>edge-optimized only]
    B --> C{Resource Policy}
    C -->|Deny| Z[403 Forbidden]
    C -->|Allow| D[Request Validation]
    D -->|Invalid| Z2[400 Bad Request]
    D -->|Valid| E{Authorization}
    E -->|Unauthorized| Z3[401 / 403]
    E -->|Authorized| F{Throttle Check}
    F -->|Over limit| Z4[429 Too Many Requests]
    F -->|Within limit| G{Cache Lookup}
    G -->|Hit| K[Return Cached Response]
    G -->|Miss| H[Request Mapping VTL]
    H --> I[Integration Execution]
    I --> J[Response Mapping VTL]
    J --> K2[Return Response]
```
The HTTP API pipeline is far simpler. It omits VTL transformation, request validation, caching, and resource policies. That stripped-down pipeline is why HTTP APIs consistently add 5-10ms less overhead than REST APIs.
Endpoint Types
REST APIs support three endpoint types, each with different networking and performance characteristics:
| Endpoint Type | Architecture | Best For | Latency Profile |
|---|---|---|---|
| Edge-optimized | Request routes through an AWS-managed CloudFront distribution to regional API Gateway | APIs with geographically distributed clients who do not need a custom CloudFront configuration | Lower latency for distant clients, higher for same-region clients due to the CloudFront hop |
| Regional | Request goes directly to regional API Gateway | APIs with same-region clients, or when you manage your own CloudFront distribution in front | Lowest latency for same-region clients, full control when paired with your own CloudFront |
| Private | Accessible only via interface VPC endpoints (AWS PrivateLink) | Internal APIs that must never be accessible from the internet | Same-region only, VPC-level network isolation |
I keep seeing teams use edge-optimized endpoints for APIs where the clients live in the same region as the API. A frontend hosted in us-east-1 calling its own backend API in us-east-1, routed through CloudFront for no reason. That extra hop adds latency to same-region calls. Use regional endpoints for this pattern.
For APIs that genuinely serve global traffic, deploy a regional endpoint and place your own CloudFront distribution in front of it. You get full control over CloudFront caching behavior, WAF rules at the edge, custom error pages, and edge compute functions. None of that is configurable when API Gateway manages the CloudFront distribution for edge-optimized endpoints.
Private endpoints are the correct choice for any API that should never be accessible from the public internet. They use interface VPC endpoints (powered by AWS PrivateLink), which means traffic never leaves the AWS network. Combine private endpoints with resource policies to restrict access to specific VPCs, accounts, or VPC endpoints for defense-in-depth.
HTTP APIs and WebSocket APIs are regional only, with no edge-optimized or private variant. If you need edge optimization for an HTTP API, place CloudFront in front of it. If you need private access for HTTP backends, use a private ALB or NLB with VPC Link.
How the CloudFront Integration Works
Edge-optimized REST APIs deploy a hidden, AWS-managed CloudFront distribution behind the scenes. You cannot see it in the CloudFront console. You cannot attach WAF web ACLs, configure cache behaviors, or add Lambda@Edge or CloudFront Functions. The distribution forwards all headers and query parameters to the API Gateway regional endpoint with a single cache behavior.
So what do you actually get? TLS termination and global request routing. No edge caching. The API Gateway cache (if enabled) operates at the regional layer, after the request has already traversed CloudFront. Distant clients get lower latency from CloudFront's optimized network, but API responses are never cached at the edge.
If you want edge caching of API responses (and for read-heavy APIs, you almost certainly do), you must deploy your own CloudFront distribution in front of a regional endpoint and configure cache behaviors with appropriate cache policies.
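As a sketch, a cache policy for that pattern might look like the following. The dict mirrors CloudFront's CachePolicyConfig shape; the TTLs and header choices are illustrative assumptions, not recommendations.

```python
def api_cache_policy(name, default_ttl=60):
    # Cache policy for GET-heavy API routes behind a regional API Gateway
    # origin. Illustrative values; tune TTLs to your data's staleness budget.
    return {
        "Name": name,
        "MinTTL": 0,
        "DefaultTTL": default_ttl,  # used when the origin sends no Cache-Control
        "MaxTTL": 3600,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            # Vary the cache key on the auth header so callers never receive
            # each other's cached responses.
            "HeadersConfig": {
                "HeaderBehavior": "whitelist",
                "Headers": {"Quantity": 1, "Items": ["Authorization"]},
            },
            "QueryStringsConfig": {"QueryStringBehavior": "all"},
            "CookiesConfig": {"CookieBehavior": "none"},
        },
    }

# The actual call (shown commented so the snippet stays self-contained):
# import boto3
# boto3.client("cloudfront").create_cache_policy(
#     CachePolicyConfig=api_cache_policy("api-reads"))
```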
REST API vs. HTTP API: The Core Decision
Teams agonize over this decision. They should. The two API types overlap significantly in capability but diverge in ways that are expensive to reverse later.
When to Use REST API
Choose REST API when you need any of the following capabilities that HTTP API does not support:
- API Gateway caching. If you want API Gateway to cache integration responses and serve subsequent identical requests from cache without hitting the backend, you must use REST API.
- Request/response transformation. VTL mapping templates for transforming request and response payloads are REST API only. HTTP API supports only simple parameter mapping (headers, query strings, path parameters).
- Request validation. Validating request bodies and parameters against JSON Schema models before they reach the integration is REST API only.
- WAF integration. Attaching an AWS WAF web ACL directly to your API requires REST API. HTTP API has no native WAF integration.
- Usage plans and API keys. Metering and throttling per API key with usage plans is REST API only.
- AWS service integrations. Direct integration with AWS services (SQS, Step Functions, DynamoDB, Kinesis, SNS, S3, EventBridge) without a Lambda function in between is a REST API strength. HTTP API offers only a small first-class set (SQS, Kinesis, Step Functions, EventBridge, AppConfig); for anything beyond that, such as DynamoDB or S3, you need REST API.
- Resource policies. Cross-account access control and IP-based access restrictions via resource policies are REST API only.
- X-Ray tracing. Built-in AWS X-Ray integration for distributed tracing is REST API only.
- Mock integrations. Returning static responses without any backend integration is REST API only.
- Canary deployments. Routing a percentage of stage traffic to a new deployment is REST API only.
- Private endpoints. VPC-only accessibility via interface VPC endpoints is REST API only.
When to Use HTTP API
Choose HTTP API when:
- Your primary use case is proxying requests to Lambda functions or HTTP backends.
- You want lower latency. HTTP API is measurably faster, typically adding 5-10ms less overhead than REST API.
- You want lower cost. HTTP API costs $1.00 per million requests versus $3.50 per million for REST API, a 71% savings.
- You need native JWT authorization. HTTP API has built-in JWT authorizer support that validates tokens from any OIDC-compliant provider without a custom Lambda authorizer.
- You need ALB or Cloud Map as a VPC Link target. HTTP API's VPC Link is more flexible, supporting ALB, NLB, and Cloud Map service discovery, whereas REST API's VPC Link only supports NLB.
- You want automatic deployments. HTTP API supports auto-deploy, which deploys changes to a stage immediately without a separate deployment step.
- You need HTTP/2 support. HTTP API supports HTTP/2, which REST API does not.
- You want simplified CORS configuration. HTTP API handles CORS at the API level with a few parameters rather than requiring manual OPTIONS methods and response header configuration.
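To make the CORS point concrete, here is a sketch of the API-level configuration as you would pass it to the apigatewayv2 CreateApi call. The origin, header names, and TTL are placeholders.

```python
def cors_configuration(allowed_origins):
    # One API-level setting replaces the per-method OPTIONS plumbing
    # that REST API requires.
    return {
        "AllowOrigins": allowed_origins,
        "AllowMethods": ["GET", "POST", "PUT", "DELETE"],
        "AllowHeaders": ["content-type", "authorization"],
        "ExposeHeaders": ["x-request-id"],  # hypothetical response header
        "MaxAge": 3600,                     # browsers cache preflight for an hour
        "AllowCredentials": False,
    }

# The actual call (commented so the snippet stays self-contained):
# import boto3
# boto3.client("apigatewayv2").create_api(
#     Name="orders-api",  # hypothetical
#     ProtocolType="HTTP",
#     CorsConfiguration=cors_configuration(["https://app.example.com"]),
# )
```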
Performance Comparison
In my production measurements, the overhead difference between REST API and HTTP API is consistent:
| Metric | REST API | HTTP API |
|---|---|---|
| API Gateway overhead (P50) | ~15-20 ms | ~5-10 ms |
| API Gateway overhead (P99) | ~50-80 ms | ~20-40 ms |
| Cold start impact | No difference | No difference |
| Maximum throughput | 10,000 RPS (default account limit) | 10,000 RPS (default account limit) |
These numbers are pure API Gateway overhead; they exclude integration execution time (Lambda duration, database queries, etc.). If your Lambda functions run 100ms+, the 10ms difference between API types is noise. But for latency-sensitive APIs where you are fighting for every millisecond, HTTP API's lower overhead matters.
Cost Comparison at Scale
The pricing difference is substantial and compounds quickly:
| Monthly Request Volume | REST API Cost | HTTP API Cost | Monthly Savings | Annual Savings |
|---|---|---|---|---|
| 1 million | $3.50 | $1.00 | $2.50 | $30 |
| 10 million | $35.00 | $10.00 | $25.00 | $300 |
| 100 million | $350.00 | $100.00 | $250.00 | $3,000 |
| 1 billion | $3,500.00 | $1,000.00 | $2,500.00 | $30,000 |
| 10 billion | $35,000.00 | $10,000.00 | $25,000.00 | $300,000 |
At high volume, the 71% cost reduction of HTTP API is large enough to justify rearchitecting around its limitations. If you are spending thousands per month on REST API and you are not using caching, VTL transformations, or WAF, you are overpaying. Full stop.
Default to HTTP API. Need WAF? Put CloudFront in front and attach WAF there. Need caching? CloudFront caching. Need request validation? Do it in Lambda. The 71% cost savings and lower latency justify these architectural adjustments for most workloads.
Integration Types
The integration type determines how API Gateway connects to backend services. This is where the service earns its keep as an API management layer.
Lambda Proxy Integration
Lambda proxy integration is the most common pattern and the one I recommend as the default for new APIs. API Gateway passes the entire request (headers, body, path parameters, query parameters, request context) to the Lambda function as a structured event, and the Lambda function returns a structured response that API Gateway passes back to the client.
The advantage is simplicity. The Lambda function has full control over the response: status code, headers, body. No VTL mapping template to maintain, no model to define, no transformation logic living outside your code.
For REST API, use the AWS_PROXY integration type. For HTTP API, Lambda proxy is the default and essentially the only Lambda integration mode, using the v2 payload format.
The v2 payload format (HTTP API) is simpler and more ergonomic than the v1 format (REST API). The v2 format provides a cleaner event structure with deduplicated fields and simpler response formatting. If you are starting a new project and considering REST API solely for its richer event format, that is not a valid reason; v2 is better designed.
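A minimal v2-format handler shows how little ceremony is involved. The greeting logic is obviously illustrative; the event shape is the v2 format's.

```python
import json

def handler(event, context=None):
    # v2 puts method, path, and sourceIp under requestContext.http,
    # instead of scattering them across the v1 event.
    http = event["requestContext"]["http"]
    name = (event.get("queryStringParameters") or {}).get("name", "world")

    # Returning this dict is all the response formatting required;
    # statusCode even defaults to 200 if omitted.
    return {
        "statusCode": 200,
        "headers": {"content-type": "application/json"},
        "body": json.dumps({"greeting": f"hello, {name}", "path": http["path"]}),
    }
```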
HTTP Proxy Integration
HTTP proxy integration forwards requests to an HTTP endpoint (any publicly accessible URL or a private endpoint via VPC Link). API Gateway acts as a transparent proxy, passing the request through to the backend and returning the response to the client.
Common use cases:
- Fronting existing HTTP services with API Gateway's authentication, throttling, and monitoring capabilities.
- Migrating monolithic APIs incrementally: route some paths through API Gateway to new Lambda functions while proxying the rest to the legacy service.
- Connecting to third-party APIs through a controlled gateway with API key injection or request transformation.
For REST API, the HTTP_PROXY integration type passes requests through with minimal processing. The HTTP (non-proxy) integration type enables VTL mapping templates for request and response transformation before and after the backend call.
AWS Service Integrations
REST API's direct AWS service integration is its most underappreciated feature. API Gateway calls AWS services directly. No Lambda function in the middle. No cold start. No execution cost.
| AWS Service | Operation | Use Case | Why Skip Lambda |
|---|---|---|---|
| SQS | SendMessage, SendMessageBatch | Async message ingestion, webhook receivers | Eliminates cold starts, eliminates Lambda execution cost for simple queue writes |
| Step Functions | StartExecution, StartSyncExecution | Workflow orchestration | Sync Express Workflows return results directly; no glue Lambda needed |
| DynamoDB | PutItem, GetItem, Query, Scan, UpdateItem, DeleteItem | Simple CRUD APIs | Eliminates Lambda overhead for straightforward key-value operations |
| Kinesis | PutRecord, PutRecords | High-throughput event/stream ingestion | Lowest-latency path to a Kinesis stream |
| SNS | Publish | Fan-out notifications | Publish to topics without Lambda intermediary |
| S3 | GetObject, PutObject | File upload/download | Direct object serving or presigned URL pattern |
| EventBridge | PutEvents | Event-driven architectures | Publish events without Lambda intermediary |
The implementation uses VTL mapping templates to transform the incoming HTTP request into the AWS service's API request format (and vice versa for the response). Templates get gnarly fast for non-trivial transformations. But for simple operations like putting a message on an SQS queue, starting a Step Functions execution, or writing a DynamoDB item, the templates are straightforward and the cost savings from eliminating a Lambda function compound quickly at scale.
I use the SQS integration constantly for write-heavy APIs where I want to decouple request acceptance from processing. API Gateway writes the message to SQS and returns 200 immediately. A Lambda function polls the queue and processes asynchronously. Traffic spikes get absorbed by the queue instead of throttling clients or overwhelming downstream services.
My recommendation: use direct AWS service integrations for simple, stateless operations where the API Gateway request can be mapped to a single AWS API call. Use Lambda when you need conditional logic, multiple service calls, error handling beyond what VTL can express, or any non-trivial business logic.
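For reference, the queue-fronting pattern above reduces to a single VTL template plus a handful of integration settings. This sketch expresses them as put_integration parameters; the IDs, queue name, and role ARN are placeholders.

```python
# VTL template mapping the raw HTTP body onto SQS's SendMessage query API.
SEND_MESSAGE_TEMPLATE = "Action=SendMessage&MessageBody=$util.urlEncode($input.body)"

def sqs_integration_params(rest_api_id, resource_id, region,
                           account_id, queue_name, role_arn):
    return {
        "restApiId": rest_api_id,
        "resourceId": resource_id,
        "httpMethod": "POST",
        "type": "AWS",                 # non-proxy AWS service integration
        "integrationHttpMethod": "POST",
        # Service-integration URI format:
        # arn:aws:apigateway:{region}:sqs:path/{account}/{queue}
        "uri": f"arn:aws:apigateway:{region}:sqs:path/{account_id}/{queue_name}",
        "credentials": role_arn,       # role API Gateway assumes to call SQS
        "requestParameters": {
            "integration.request.header.Content-Type":
                "'application/x-www-form-urlencoded'",
        },
        "requestTemplates": {"application/json": SEND_MESSAGE_TEMPLATE},
    }

# The actual call (commented so the snippet stays self-contained):
# import boto3
# boto3.client("apigateway").put_integration(
#     **sqs_integration_params("a1b2c3", "res123", "us-east-1",
#                              "123456789012", "ingest-queue",
#                              "arn:aws:iam::123456789012:role/apigw-sqs"))
```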
VPC Link
VPC Link enables API Gateway to reach resources inside your VPC (ECS services, EC2 instances, internal load balancers, or any other private resource) without exposing them to the public internet.
| Feature | REST API VPC Link | HTTP API VPC Link |
|---|---|---|
| Target | Network Load Balancer only | ALB, NLB, or Cloud Map |
| Architecture | One VPC Link per NLB | One VPC Link per VPC (shared across routes) |
| Setup complexity | Requires NLB even if backend is behind ALB | Can target ALB directly |
| Cost | NLB hourly cost + NLCU; VPC Link itself is free | ALB/NLB cost if applicable; VPC Link itself is free |
| Service discovery | None | AWS Cloud Map integration |
HTTP API's VPC Link is more flexible and usually simpler to configure. If you need to reach an ALB-backed service, HTTP API lets you point directly at the ALB. With REST API, you would need to deploy an NLB in front of the ALB (or replace the ALB with an NLB), which adds approximately $16/month in NLB cost plus complexity.
The Cloud Map integration with HTTP API VPC Link is well-suited for microservice architectures on ECS. Your ECS services register with Cloud Map, and HTTP API routes directly to healthy instances via service discovery without any load balancer at all.
Mock Integration
Mock integration (REST API only) returns a response directly from API Gateway without calling any backend. Combined with VTL mapping templates, you can return dynamic mock responses based on request parameters. This is useful for:
- CORS preflight responses (OPTIONS methods): return the CORS headers without invoking a Lambda function.
- API stubs during development: return realistic responses while the backend is being built.
- Health check endpoints: return a 200 OK without any backend dependency.
- Maintenance mode responses: return a 503 with a maintenance message when the backend is intentionally offline.
Authentication and Authorization
API Gateway provides multiple authentication mechanisms. They target different client types and security requirements, and you can layer them for defense-in-depth.
IAM Authorization
IAM authorization uses AWS Signature Version 4 (SigV4) to authenticate requests. The caller must have valid AWS credentials and the appropriate IAM permissions to invoke the API method.
Best for: service-to-service communication within AWS, where both the caller and the API are AWS resources with IAM roles. IAM authorization is free (no additional cost beyond the API request itself), has no cold start, integrates with the full IAM policy model (including conditions on source IP, VPC endpoint, time of day, and tags), and provides the strongest authentication guarantee available on API Gateway.
Operational consideration: IAM authorization requires the caller to have AWS credentials and to sign the request with SigV4. This is straightforward for AWS SDKs and Lambda functions but impractical for browser-based JavaScript clients or mobile apps that should not have long-lived AWS credentials. For those scenarios, use Cognito to obtain temporary credentials and sign requests with the AWS Amplify library, or use JWT-based authorization.
Amazon Cognito Authorizers
Cognito authorizers (REST API) validate JWT tokens issued by a Cognito User Pool. The caller obtains a token from Cognito (via the hosted UI, the Cognito API, or a federated identity provider) and passes it in the Authorization header. API Gateway validates the token signature, expiration, and optionally the audience and scopes without calling Cognito for each request, since JWT validation is performed locally against the Cognito JWKS keys that API Gateway caches.
Best for: consumer-facing REST APIs where users authenticate through Cognito. The token validation is fast (sub-millisecond) because it is purely cryptographic verification performed locally.
For HTTP API, use the built-in JWT authorizer instead, which works with any OIDC-compliant provider including Cognito. The JWT authorizer on HTTP API is more flexible because it is not Cognito-specific.
Lambda Authorizers
Lambda authorizers execute a Lambda function to make authorization decisions. The function receives the request (or a token) and returns an IAM policy document specifying which API methods the caller is allowed to invoke.
There are two types:
| Type | Input | Caching Key | Best For |
|---|---|---|---|
| Token-based (REST API only) | Authorization header value | The token itself | Simple bearer token validation (JWT, OAuth, custom tokens) |
| Request-based (REST API, HTTP API) | Full request context (headers, query strings, path, stage variables) | Configurable (any combination of identity sources) | Complex authorization logic using multiple request attributes |
Lambda authorizer results can be cached for up to 3600 seconds (1 hour). When caching is enabled, API Gateway caches the returned IAM policy and applies it to subsequent requests with the same caching key, without re-invoking the Lambda function. This dramatically reduces authorizer invocation costs and latency for APIs with repeat callers.
The caching behavior has a critical implication: the IAM policy returned by the authorizer applies to all methods and resources matching the policy's resource ARN. If your authorizer returns a policy that allows arn:aws:execute-api:*:*:*/*/*, a cached policy from one endpoint authorizes access to all endpoints for the duration of the TTL. Design your authorizer policies carefully; scope them to specific methods and resources to avoid unintended access.
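One way to stay out of that trap is to derive the allowed resource ARNs from the incoming methodArn instead of returning wildcards. A sketch, where the route list is an assumption about your authorization model:

```python
def scoped_policy(method_arn, principal_id, allowed_routes):
    # methodArn format:
    # arn:aws:execute-api:{region}:{acct}:{apiId}/{stage}/{verb}/{path}
    # Keep the "arn...:{apiId}/{stage}" prefix and append explicit routes.
    api_prefix = "/".join(method_arn.split("/")[:2])
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow",
                # e.g. allowed_routes = ["GET/orders", "POST/orders"]
                "Resource": [f"{api_prefix}/{route}" for route in allowed_routes],
            }],
        },
    }
```

Because the policy names only the routes this caller may invoke, a cached copy cannot silently authorize other endpoints for the duration of the TTL.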
The cold start trade-off: Lambda authorizers introduce a cold start risk on the first request after cache expiration. For latency-sensitive APIs, use Provisioned Concurrency on the authorizer function, keep the cache TTL long enough that cold starts are rare, or consider HTTP API's native JWT authorizer which has no cold start at all.
Mutual TLS (mTLS)
Mutual TLS requires the client to present a valid X.509 certificate during the TLS handshake, in addition to the server presenting its certificate. API Gateway validates the client certificate against a truststore (a CA bundle stored in S3).
Best for: B2B integrations, financial services APIs, healthcare APIs, and any scenario where strong client identity is required at the transport layer. mTLS is supported on REST API and HTTP API with custom domain names.
Operational consideration: mTLS certificate management is non-trivial. You need to manage the certificate authority, issue client certificates, handle revocation (API Gateway supports CRLs stored in the same S3 truststore), and rotate certificates before expiration. Plan for this operational overhead before choosing mTLS.
API Keys and Usage Plans
API keys (REST API only) serve as identification tokens for throttling and metering. API keys identify which client is making the request, and usage plans associate API keys with throttle limits and quota allocations.
| Concept | Purpose | Example |
|---|---|---|
| API key | Identifies a client | x-api-key: abc123def456 |
| Usage plan | Defines throttle and quota limits | 100 RPS burst, 50 RPS steady-state, 10,000 requests/day |
| Association | Links API keys to usage plans and plans to stages | Client A gets the "Premium" plan with 500 RPS; Client B gets the "Free" plan with 10 RPS |
Usage plans support per-client throttle rates (requests per second), burst limits, and daily/weekly/monthly quotas. This is how you implement tiered API access (free, basic, premium) without custom throttling logic.
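Sketching the tiers as create_usage_plan parameters makes the model concrete. The tier names and limits here are invented:

```python
# Hypothetical tier definitions; your rates, bursts, and quotas will differ.
TIERS = {
    "free":    {"rate": 10,  "burst": 20,   "quota": 10_000},
    "premium": {"rate": 500, "burst": 1000, "quota": 5_000_000},
}

def usage_plan_params(tier, api_id, stage):
    t = TIERS[tier]
    return {
        "name": tier,
        "throttle": {"rateLimit": float(t["rate"]), "burstLimit": t["burst"]},
        "quota": {"limit": t["quota"], "period": "DAY"},
        "apiStages": [{"apiId": api_id, "stage": stage}],
    }

# Creating the plan and attaching a client's key (commented so the snippet
# stays self-contained):
# import boto3
# apigw = boto3.client("apigateway")
# plan = apigw.create_usage_plan(**usage_plan_params("premium", "a1b2c3", "prod"))
# apigw.create_usage_plan_key(usagePlanId=plan["id"],
#                             keyId=client_key_id,  # hypothetical
#                             keyType="API_KEY")
```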
Important: API keys are sent in plaintext in the x-api-key header. They are trivially easy to share, steal, or leak. Never use API keys as the sole authentication mechanism. Combine them with IAM, Cognito, or Lambda authorizers for actual security, and use API keys purely for metering and throttling.
Composing Authorization
Authorization mechanisms compose in a specific evaluation order:
1. Resource policy (if configured): evaluated first, can deny before anything else runs
2. mTLS (if configured): client certificate validated during the TLS handshake
3. Method authorizer (IAM, Lambda, Cognito, or JWT): evaluated per method
4. API key (if required): validated after the authorizer succeeds
A request must pass all configured mechanisms. This layered approach enables defense-in-depth: resource policies restrict network-level access, mTLS verifies client identity at the transport layer, method authorizers enforce application-level permissions, and API keys track and throttle per-client usage.
```mermaid
flowchart LR
    A[Incoming<br/>Request] --> B{Resource<br/>Policy}
    B -->|Deny| X[403 Denied]
    B -->|Allow| C{mTLS<br/>Validation}
    C -->|Fail| X
    C -->|Pass| D{Method<br/>Authorizer}
    D -->|Fail| X2[401 / 403]
    D -->|Pass| E{API Key<br/>Validation}
    E -->|Fail| X3[403 Forbidden]
    E -->|Pass| F[Request<br/>Proceeds]
```
Throttling and Quotas
API Gateway's throttling model is layered, and the layers interact in ways that bite you in production. Understanding the hierarchy saves you from throttling-related outages.
Account-Level Limits
Every AWS account has a default API Gateway throttle limit that applies across all APIs in a region:
| Limit | Default Value | Adjustable |
|---|---|---|
| Steady-state request rate | 10,000 requests per second | Yes (via service quota increase request) |
| Burst limit | 5,000 requests | Yes (via service quota increase request) |
The burst limit uses the token bucket algorithm. API Gateway maintains a bucket of tokens that refills at the steady-state rate. Each request consumes one token. When the bucket is full (5,000 tokens by default), it can absorb a burst of 5,000 concurrent requests. Sustained traffic above 10,000 RPS depletes the bucket and triggers 429 responses.
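The algorithm is easy to model. Here is a toy token bucket that mirrors the behavior described above, with no claim to matching API Gateway's internal implementation:

```python
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate           # steady-state refill, tokens per second
        self.capacity = burst      # burst limit
        self.tokens = float(burst) # bucket starts full
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True            # request proceeds
        return False               # 429 Too Many Requests
```

With the defaults (rate=10_000, burst=5_000), a full bucket absorbs a 5,000-request spike instantly, after which admission converges to the steady-state 10,000 requests per second.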
These limits are shared across all REST APIs, HTTP APIs, and WebSocket APIs in the account and region. Read that again. If you have a high-traffic public API and a low-traffic internal API in the same account, a spike on the public API can throttle the internal API. I have seen this happen. Use separate accounts for APIs with different criticality levels.
Stage-Level and Method-Level Throttling
API Gateway supports granular throttling at multiple levels, though availability varies by API type:
| Level | Scope | Configuration | Available On |
|---|---|---|---|
| Account | All APIs in the account/region | Service quotas (default 10,000 RPS) | All API types |
| Usage plan | APIs and stages associated with the plan | Rate and burst per plan | REST API only |
| Stage | All methods/routes in a stage | Stage settings (overrides account limit downward) | REST API and HTTP API |
| Method/route | Individual method or route | Method- or route-level throttle settings | REST API and HTTP API |
| Per-key | Individual API key within a usage plan | Key-level rate and burst | REST API only |
The effective throttle limit for any given request is the minimum of all applicable limits. If the account limit is 10,000 RPS, the stage limit is 5,000 RPS, and the method limit is 1,000 RPS, the effective limit for that method is 1,000 RPS.
HTTP API supports stage-level and route-level throttling, but not usage plans or per-key throttling, because it has no API key support. The lack of per-client limits is one of HTTP API's significant operational limitations for multi-tenant or rate-sensitive use cases.
Throttling Cascades
The most dangerous throttling scenario is a cascade. One API or client consumes the account-level throttle budget, and suddenly unrelated APIs in the same account start returning 429s. I have watched this pattern take down internal tooling APIs during traffic spikes on production-facing APIs, twice at different companies.
Mitigations:
- Account isolation. Run production APIs in a dedicated account, separate from development, staging, and internal APIs. This is the single most effective mitigation.
- Method-level throttling. Set explicit throttle limits on high-traffic methods so they cannot consume the entire account budget.
- Usage plans with per-key limits. Assign per-client throttle limits so a single client cannot monopolize capacity.
- Request service quota increases proactively. If your production traffic is approaching 10,000 RPS, request an increase before you hit the limit. AWS typically grants increases to 50,000-100,000 RPS for well-architected APIs.
- Client-side exponential backoff with jitter. Ensure all clients implement backoff to prevent retry storms from amplifying throttling events.
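As a concrete example of the usage-plan mitigation, here is a boto3 sketch that creates a plan with per-key limits and attaches a fresh API key. The plan and key names and the 100 RPS / 200 burst numbers are illustrative assumptions.

```python
def usage_plan_request(plan_name, api_id, stage, rate, burst):
    """Arguments for create_usage_plan (kept pure so it is testable)."""
    return {
        "name": plan_name,
        "throttle": {"rateLimit": float(rate), "burstLimit": int(burst)},
        "apiStages": [{"apiId": api_id, "stage": stage}],
    }

def attach_throttled_key(api_id, stage, client_name, rate=100, burst=200):
    """Create a usage plan with per-key limits and attach a fresh API key.
    Names and numbers are illustrative, not prescriptive."""
    import boto3  # deferred import so usage_plan_request needs no AWS deps
    apigw = boto3.client("apigateway")
    plan = apigw.create_usage_plan(
        **usage_plan_request(f"{client_name}-plan", api_id, stage, rate, burst))
    key = apigw.create_api_key(name=f"{client_name}-key", enabled=True)
    apigw.create_usage_plan_key(usagePlanId=plan["id"], keyId=key["id"],
                                keyType="API_KEY")
    return plan["id"], key["id"]
```

Clients then present the key in the x-api-key header, and API Gateway enforces the per-key rate before the request ever reaches your backend.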
Caching
API Gateway caching (REST API only) stores integration responses and serves them for subsequent identical requests, reducing backend load and response latency.
Cache Configuration
| Parameter | Options | Default | Notes |
|---|---|---|---|
| Cache capacity | 0.5 GB, 1.6 GB, 6.1 GB, 13.5 GB, 28.4 GB, 58.2 GB, 118 GB, 237 GB | None (disabled) | Larger caches cost more per hour; provisioned per stage |
| Default TTL | 0 - 3600 seconds | 300 seconds (5 minutes) | 0 effectively disables caching |
| Per-method TTL override | Yes | Inherits stage TTL | Allows different TTLs for different endpoints |
| Per-key cache invalidation | Enabled/disabled | Disabled | Allows clients to bypass cache with Cache-Control: max-age=0 |
| Encryption | Enabled/disabled | Disabled | Encrypts cached data at rest |
When caching is enabled, API Gateway checks the cache before invoking the backend integration. A cache hit returns the cached response immediately, bypassing Lambda invocation, integration calls, and backend latency. The cache is provisioned per stage, not per method, so all methods in a stage share the same cache capacity.
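Enabling the cache is a stage update. A boto3 sketch, assuming a REST API ID and stage name you supply; the `/*/*` wildcard paths and the 0.5 GB / 300-second values mirror the defaults discussed above.

```python
def cache_patch_ops(size_gb="0.5", ttl_seconds=300):
    """Stage patch operations enabling the cache (sizes from the table above)."""
    return [
        {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
        {"op": "replace", "path": "/cacheClusterSize", "value": str(size_gb)},
        # The /*/* wildcard applies caching and a TTL to every method in the stage.
        {"op": "replace", "path": "/*/*/caching/enabled", "value": "true"},
        {"op": "replace", "path": "/*/*/caching/ttlInSeconds", "value": str(ttl_seconds)},
    ]

def enable_stage_cache(rest_api_id, stage_name, **kwargs):
    import boto3  # deferred import so cache_patch_ops stays testable without AWS
    boto3.client("apigateway").update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=cache_patch_ops(**kwargs),
    )
```

Per-method TTL overrides use the same path shape with a specific resource path and method in place of the wildcards.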
Cache Pricing
Cache pricing is hourly, independent of request volume:
| Cache Size | Hourly Cost (us-east-1) | Monthly Cost |
|---|---|---|
| 0.5 GB | $0.020 | ~$14.40 |
| 1.6 GB | $0.038 | ~$27.36 |
| 6.1 GB | $0.200 | ~$144.00 |
| 13.5 GB | $0.250 | ~$180.00 |
| 28.4 GB | $0.500 | ~$360.00 |
| 58.2 GB | $1.000 | ~$720.00 |
| 118 GB | $1.900 | ~$1,368.00 |
| 237 GB | $3.800 | ~$2,736.00 |
The 0.5 GB cache at ~$14/month is a reasonable starting point for most APIs. Profile your cache hit ratio and response sizes before scaling up. A cache that is too small has frequent evictions and low hit ratios, which means you are paying for caching without the benefit.
Cache Key Design
The cache key determines which requests share a cached response. By default, the cache key includes the HTTP method, resource path, and query string parameters. You can add additional cache key components:
- Specific query string parameters. Include only the parameters that affect the response (exclude pagination tokens, request IDs, timestamps).
- HTTP headers. Include headers like Accept-Language when the response varies by header value.
- Stage variables. Include stage-specific values when the response varies by stage.
The cardinal rule of cache key design: include the minimum necessary to produce a correct response. Every additional parameter kills your hit ratio. Include the Authorization header in the cache key and you have a per-user cache. Your hit ratio drops to near zero and you are paying for a cache that does almost nothing.
Cache Invalidation
There are two approaches to invalidating the API Gateway cache:
- Full flush. From the console or API, flush the entire stage cache. This removes all cached responses and forces all subsequent requests to hit the backend until the cache is repopulated. Use this during deployments or when cached data is known to be stale.
- Client-initiated per-key invalidation. When per-key cache invalidation is enabled, clients can send Cache-Control: max-age=0 in their request header to bypass the cache for that specific request and refresh the cached entry. This requires the execute-api:InvalidateCache IAM permission.
Important: client-initiated invalidation is a security consideration. If any caller can invalidate your cache, a malicious client could send Cache-Control: max-age=0 on every request, effectively disabling caching and increasing your backend load. Restrict the InvalidateCache permission to trusted callers only, or disable per-key invalidation entirely and manage cache freshness through TTL.
API Gateway Cache vs. CloudFront Cache
| Aspect | API Gateway Cache | CloudFront Cache |
|---|---|---|
| Availability | REST API only | All API types (when fronted by CloudFront) |
| Location | Regional (single cache in API region) | Global (600+ edge locations) |
| Cost model | Hourly per provisioned capacity | Per-request + data transfer |
| Cache key flexibility | Method-level, header/query string keys | Full cache policies with extensive key control |
| Invalidation | Client header or full flush | Path-based invalidation, wildcard support |
| Best for | Backend offloading for REST APIs | Global latency reduction + backend offloading for any API type |
My recommendation: if you are fronting your API with CloudFront (which you should be for any internet-facing API), prefer CloudFront caching. It is more flexible, globally distributed, and works with HTTP API. API Gateway caching makes sense when you are using REST API without CloudFront and need a simple, managed cache with no additional infrastructure.
Request/Response Transformation
Mapping Templates (VTL)
REST API uses Apache Velocity Template Language (VTL) for request and response transformation. Mapping templates execute within API Gateway itself, requiring no Lambda function, no compute cost, and no cold start.
VTL templates have access to the full request context:
| Variable | Description |
|---|---|
| $input.body | The raw request body |
| $input.json('$.field') | JSONPath extraction from the request body |
| $input.params('name') | Path, query string, or header parameter by name |
| $context.requestId | Unique request identifier |
| $context.identity.sourceIp | Client IP address |
| $context.authorizer.claims | Claims from the Cognito/JWT authorizer |
| $stageVariables.name | Stage variable value |
| $util.escapeJavaScript() | Utility for escaping strings |
| $util.urlEncode() | Utility for URL encoding |
| $util.base64Encode() | Utility for Base64 encoding |
Common use cases for VTL templates:
- Transforming a REST request into an AWS service API call format (e.g., converting a JSON body into an SQS SendMessage request or a DynamoDB PutItem request).
- Renaming or restructuring fields between the client's expected format and the backend's format.
- Extracting values from headers, query strings, or path parameters and placing them in the request body.
- Filtering sensitive fields from the response before returning it to the client.
- Generating error response bodies from integration error outputs.
Here is the reality of VTL: powerful in theory, miserable in practice. No local development environment. Errors are opaque; you get CloudWatch logs of the template output if you are lucky. Complex templates become unreadable within weeks. I limit VTL to simple, well-defined transformations (AWS service integrations, straightforward field mapping) and reach for Lambda the moment I need conditional logic, real error handling, or anything I would call business logic.
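When a transformation outgrows VTL, the Lambda equivalent is short and testable locally. A sketch of the kind of field mapping and response filtering described above; every field name and the call_backend stub are illustrative.

```python
import json

def to_backend(body, source_ip):
    """Rename the client's camelCase fields to the backend's snake_case schema."""
    return {"customer_id": body.get("customerId"), "source_ip": source_ip}

def to_client(backend_response, drop=("internal_score",)):
    """Strip fields the client should never see from the response."""
    return {k: v for k, v in backend_response.items() if k not in drop}

def call_backend(request):
    """Stand-in for the real downstream call (hypothetical)."""
    return {"customer_id": request["customer_id"], "status": "ok",
            "internal_score": 0.93}

def handler(event, context=None):
    """Lambda proxy-integration handler doing the mapping end to end."""
    body = json.loads(event.get("body") or "{}")
    request = to_backend(body, event["requestContext"]["identity"]["sourceIp"])
    response = to_client(call_backend(request))
    return {"statusCode": 200, "body": json.dumps(response)}
```

Unlike a VTL template, each transform function here can be unit tested in isolation, which is most of the argument for moving this logic into Lambda.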
Request Validation
REST API can validate incoming requests against JSON Schema models before the request reaches your backend. Validation can check:
- Request body. Validates the JSON body against a defined JSON Schema model (required properties, data types, string patterns, numeric ranges, array constraints).
- Request parameters. Validates that required query string parameters and headers are present.
When validation fails, API Gateway returns a 400 Bad Request response without invoking the integration. This saves Lambda execution costs and provides consistent error responses for malformed requests.
My recommendation: always enable request validation for public-facing REST APIs. It catches malformed requests at the API Gateway layer (free, sub-millisecond) rather than in your Lambda function (billable, slower). For internal APIs where you control both sides, request validation is less critical but still good hygiene.
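Wiring up validation takes a JSON Schema model (REST API models use draft-04) plus a request validator. A boto3 sketch with an illustrative Order schema; the model and validator names are assumptions.

```python
import json

# JSON Schema model for the request body; REST API models use draft-04.
# Field names here are illustrative.
ORDER_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["sku", "quantity"],
    "properties": {
        "sku": {"type": "string", "pattern": "^[A-Z0-9-]+$"},
        "quantity": {"type": "integer", "minimum": 1, "maximum": 100},
    },
}

def attach_validation(rest_api_id):
    """Register the model and a validator checking both body and parameters."""
    import boto3  # deferred import so the schema above is testable without AWS
    apigw = boto3.client("apigateway")
    apigw.create_model(restApiId=rest_api_id, name="Order",
                       contentType="application/json",
                       schema=json.dumps(ORDER_SCHEMA))
    apigw.create_request_validator(restApiId=rest_api_id, name="body-and-params",
                                   validateRequestBody=True,
                                   validateRequestParameters=True)
```

A request missing `sku` or with `quantity` out of range gets a 400 from API Gateway itself, before any integration is invoked.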
HTTP API Parameter Mapping
HTTP API does not support VTL templates, but it does support parameter mapping for simple transformations:
- Appending, overwriting, or removing request and response headers
- Appending, overwriting, or removing query string parameters
- Overwriting the request path
This covers basic transformation needs (adding a custom header, injecting a stage variable into a query parameter) but does not support body transformation. If you need body transformation with HTTP API, handle it in the Lambda function.
WebSocket APIs
WebSocket APIs enable real-time, bidirectional communication between clients and backend services. The architecture is fundamentally different from REST and HTTP APIs.
Connection Management
WebSocket API manages persistent connections using a connection lifecycle with three special routes:
| Route | Trigger | Purpose | Integration Required |
|---|---|---|---|
| $connect | Client initiates WebSocket handshake | Authentication, connection initialization, store connection ID in DynamoDB | Yes (Lambda, HTTP, or AWS service) |
| $disconnect | Client or server closes connection, or connection times out | Clean up connection state, remove connection ID from storage | Yes (best-effort delivery) |
| $default | Message does not match any custom route | Catch-all for unrecognized message types | Yes |
| Custom routes | Message matches a route selection expression | Route-specific message handling | Yes |
The connection ID is a unique identifier assigned by API Gateway when a client connects. Your backend must store this connection ID (typically in DynamoDB) to send messages back to specific clients. API Gateway provides a callback URL that your backend POSTs to in order to send messages to connected clients:
POST https://{api-id}.execute-api.{region}.amazonaws.com/{stage}/@connections/{connectionId}
You can also use GET to retrieve connection status and DELETE to force-disconnect a client.
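With boto3, the callback URL becomes the endpoint_url of an apigatewaymanagementapi client, which appends the /@connections path for you. A sketch; the API ID, region, and stage are placeholders you supply.

```python
def callback_endpoint(api_id, region, stage):
    """Base endpoint for the management API; boto3 appends /@connections/{id}."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}"

def connections_client(api_id, region, stage):
    import boto3  # deferred import so callback_endpoint is testable without AWS
    return boto3.client("apigatewaymanagementapi",
                        endpoint_url=callback_endpoint(api_id, region, stage))

def push(client, connection_id, payload):
    """POST a message to one client. get_connection reports status (GET);
    delete_connection force-disconnects (DELETE)."""
    client.post_to_connection(ConnectionId=connection_id, Data=payload)
```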
Route Selection
Route selection determines which integration handles an incoming WebSocket message. You configure a route selection expression (typically $request.body.action) and define routes that match specific values.
For example, with the route selection expression $request.body.action:
| Client Message | Matched Route | Integration |
|---|---|---|
| {"action": "sendMessage", "data": "hello"} | sendMessage | SendMessageFunction |
| {"action": "subscribe", "channel": "updates"} | subscribe | SubscribeFunction |
| {"action": "unknown"} | $default | DefaultFunction |
| {"noAction": true} | $default | DefaultFunction |
Scaling Characteristics
| Limit | Default Value | Notes |
|---|---|---|
| Concurrent connections | 500 (soft limit) | Increasable to hundreds of thousands via service quota request |
| Idle connection timeout | 10 minutes | Configurable up to 2 hours; implement ping/pong to keep alive |
| Maximum connection duration | 2 hours | Hard limit; client must implement automatic reconnection |
| Message payload size | 128 KB | Frames are limited to 32 KB; larger messages are split into frames |
| Callback message rate | 500 messages/second per connection | Backend-to-client message rate |
| New connections per second | 500 per second (soft limit) | Increasable via service quota request |
The 2-hour maximum connection duration is a hard limit that cannot be increased. Your client must implement automatic reconnection logic. The idle connection timeout (10 minutes by default) disconnects clients that are not sending or receiving messages, so implement periodic ping/pong frames to keep connections alive if your use case involves long periods of silence.
For high-scale WebSocket applications (tens of thousands of concurrent connections), DynamoDB is the standard choice for connection state management. Each connection ID is stored as a DynamoDB item with a TTL set to the expected connection expiration. Your backend queries the table to find connection IDs for broadcasting messages. With DynamoDB on-demand capacity, this scales seamlessly without capacity planning.
WebSocket API Architectural Considerations
WebSocket API is a connection management and message routing layer. All application logic (message broadcasting, presence tracking, connection state management, pub/sub) lives in your backend. API Gateway functions as a WebSocket-aware reverse proxy; it does not maintain a pub/sub system or message broker.
Broadcasting a message to all connected clients requires your backend to iterate through stored connection IDs and POST to each one via the callback URL. For large numbers of connections, this can take significant time and should be parallelized using Lambda concurrency or SQS fan-out.
Connection cleanup requires care. The $disconnect route is best-effort; if a client's network drops abruptly, $disconnect may be delayed or in rare cases may not fire. Design your system to handle stale connection IDs gracefully: catch GoneException (410) errors when posting to the callback URL, and remove the stale connection ID from your store.
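Putting the last two points together, a broadcast sketch that iterates a DynamoDB table of connection IDs and prunes stale ones on GoneException. The table layout (a connectionId key) and single-page scan are simplifying assumptions.

```python
def broadcast(mgmt, table, message):
    """Send `message` to every stored connection and prune stale IDs.
    `mgmt` is an apigatewaymanagementapi client bound to the callback URL;
    `table` is a boto3 DynamoDB Table keyed by connectionId. Single-page
    scan for brevity; paginate in production."""
    stale = []
    for item in table.scan(ProjectionExpression="connectionId")["Items"]:
        cid = item["connectionId"]
        try:
            mgmt.post_to_connection(ConnectionId=cid, Data=message)
        except mgmt.exceptions.GoneException:
            stale.append(cid)  # 410: client vanished and $disconnect never fired
    for cid in stale:
        table.delete_item(Key={"connectionId": cid})
    return stale
```

For tens of thousands of connections, fan this out across Lambda invocations or SQS rather than looping in a single function.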
Observability
CloudWatch Metrics
API Gateway publishes detailed metrics to CloudWatch. The most operationally important metrics:
| Metric | REST API | HTTP API | WebSocket API | What It Tells You |
|---|---|---|---|---|
| Count | Yes | Yes | Yes | Total API requests/messages |
| 4XXError | Yes | Yes | Yes | Client errors (bad requests, auth failures, throttling) |
| 5XXError | Yes | Yes | Yes | Server errors (integration failures, timeouts) |
| Latency | Yes | Yes | Yes | End-to-end latency (client request to client response) |
| IntegrationLatency | Yes | Yes | Yes | Time spent in the backend integration only |
| CacheHitCount | Yes | No | No | Cache hits (REST API with caching enabled) |
| CacheMissCount | Yes | No | No | Cache misses |
| ConnectCount | No | No | Yes | New WebSocket connections |
| MessageCount | No | No | Yes | WebSocket messages processed |
| DisconnectCount | No | No | Yes | WebSocket disconnections |
The difference between Latency and IntegrationLatency is the most important diagnostic signal. If Latency is 200ms and IntegrationLatency is 180ms, API Gateway's overhead is 20ms, which is normal. If the gap is 100ms+, investigate your authorizer execution time, VTL mapping template complexity, or request validation bottlenecks.
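You can compute that gap directly with CloudWatch metric math. A boto3 sketch; the ApiName dimension value and the Average statistic are assumptions to adjust for your API.

```python
def overhead_queries(api_name, period=300):
    """MetricDataQueries computing Latency minus IntegrationLatency, i.e.
    API Gateway's own processing overhead."""
    def metric(qid, name):
        return {"Id": qid, "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "AWS/ApiGateway", "MetricName": name,
                       "Dimensions": [{"Name": "ApiName", "Value": api_name}]},
            "Period": period, "Stat": "Average"}}
    return [metric("lat", "Latency"),
            metric("integ", "IntegrationLatency"),
            {"Id": "overhead", "Expression": "lat - integ",
             "Label": "API Gateway overhead (ms)"}]

def fetch_overhead(api_name, start, end):
    import boto3  # deferred import so overhead_queries is testable without AWS
    return boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=overhead_queries(api_name),
        StartTime=start, EndTime=end)
```

The same expression works as a CloudWatch dashboard widget or alarm, which is a convenient way to catch an authorizer or mapping-template regression.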
Access Logging
API Gateway supports access logging to CloudWatch Logs with configurable log formats using context variables. I configure access logs in JSON format for easy parsing with CloudWatch Insights, OpenSearch, or any log analytics tool.
Essential context variables for access logs:
| Variable | Description |
|---|---|
| $context.requestId | Unique request identifier (essential for debugging) |
| $context.identity.sourceIp | Client IP address |
| $context.httpMethod | HTTP method |
| $context.resourcePath | Resource path |
| $context.status | Response status code |
| $context.responseLatency | Total response latency in milliseconds |
| $context.integrationLatency | Integration execution time in milliseconds |
| $context.error.message | Error message (if any) |
| $context.authorizer.error | Authorizer error (if any) |
| $context.requestTime | Request timestamp |
| $context.protocol | Request protocol |
My recommendation: enable access logging for every API stage. The cost is modest (CloudWatch Logs ingestion + storage), and the operational value during incidents is immense. Include at minimum: request ID, source IP, method, path, status, latency, integration latency, and error message.
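A minimal JSON access-log format covering that list, plus the stage update that applies it. The log group ARN is one you supply; the patch paths are the standard access-log settings paths for a REST API stage.

```python
import json

# JSON access-log format built from the context variables in the table above.
ACCESS_LOG_FORMAT = json.dumps({
    "requestId": "$context.requestId",
    "sourceIp": "$context.identity.sourceIp",
    "method": "$context.httpMethod",
    "path": "$context.resourcePath",
    "status": "$context.status",
    "latency": "$context.responseLatency",
    "integrationLatency": "$context.integrationLatency",
    "error": "$context.error.message",
})

def enable_access_logs(rest_api_id, stage_name, log_group_arn):
    """Attach the format to a stage; the log group ARN is one you supply."""
    import boto3  # deferred import so ACCESS_LOG_FORMAT is testable without AWS
    boto3.client("apigateway").update_stage(
        restApiId=rest_api_id, stageName=stage_name,
        patchOperations=[
            {"op": "replace", "path": "/accessLogSettings/destinationArn",
             "value": log_group_arn},
            {"op": "replace", "path": "/accessLogSettings/format",
             "value": ACCESS_LOG_FORMAT},
        ])
```

Because each log line is JSON, CloudWatch Insights queries like `filter status >= 500` work with no parsing rules.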
Execution Logging
Execution logging (REST API only) provides detailed request/response logs for every stage of the API Gateway pipeline: authorizer execution, VTL template evaluation, integration request/response bodies, mapping template output, and error details. This level of detail is essential for debugging but extremely verbose and expensive at scale.
| Setting | Log Volume | Cost | Use Case |
|---|---|---|---|
| OFF | None | None | Production (default) |
| Errors only | Low | Low | Production (recommended minimum) |
| Full | Very high | High | Development and active debugging only |
Enable full execution logging for development and staging environments. In production, set it to "errors only" or disable it entirely. Leaving full execution logging enabled in production generates enormous log volumes. At 10,000 RPS, you can easily generate terabytes of logs per month, with CloudWatch Logs costs exceeding the API Gateway cost itself.
X-Ray Tracing
REST API integrates with AWS X-Ray for distributed tracing. When enabled, API Gateway generates trace segments for each request, capturing timing data for authorization, integration execution, and response processing. Combined with X-Ray instrumentation in your Lambda functions and downstream services, this provides end-to-end visibility into request latency across your entire architecture.
X-Ray tracing adds approximately 1-2ms of overhead per request. For most APIs, this is acceptable. For ultra-low-latency APIs where every millisecond counts, measure the impact before enabling it in production.
HTTP API does not support X-Ray natively. If you need distributed tracing with HTTP API, instrument your Lambda functions with the X-Ray SDK directly. You will get traces for everything downstream of API Gateway, but the API Gateway processing stage itself will not appear in traces.
Cost Analysis
Per-Request Pricing
The per-request pricing model is straightforward but varies significantly between API types:
| Tier (Monthly Requests) | REST API | HTTP API |
|---|---|---|
| First 333 million | $3.50 per million | $1.00 per million |
| Next 667 million | $2.80 per million | $0.90 per million |
| Next 19 billion | $2.38 per million | $0.80 per million |
| Over 20 billion | $1.51 per million | $0.72 per million |
REST API pricing decreases at higher tiers, but even at the highest tier ($1.51/million), it is still more than double the cost of HTTP API at the same tier ($0.72/million).
WebSocket API Pricing
WebSocket API has two pricing components:
| Component | Cost (us-east-1) |
|---|---|
| Messages (first 1 billion/month) | $1.00 per million messages (sent or received) |
| Messages (over 1 billion/month) | $0.80 per million messages |
| Connection minutes | $0.25 per million connection minutes |
Messages are billed in 32 KB increments. A 65 KB message counts as 3 message units. Connection minutes accumulate for the duration each WebSocket connection is open.
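The 32 KB increment and the tiered message pricing from the table are easy to get wrong in cost models, so here is the arithmetic as code:

```python
import math

def message_units(size_kb):
    """WebSocket messages bill in 32 KB increments: a 65 KB message is 3 units."""
    return max(1, math.ceil(size_kb / 32))

def message_cost(units_per_month):
    """Tiered message pricing from the table above (us-east-1)."""
    first_tier = min(units_per_month, 1_000_000_000)  # first 1B at $1.00/M
    overflow = units_per_month - first_tier           # remainder at $0.80/M
    return first_tier / 1e6 * 1.00 + overflow / 1e6 * 0.80
```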
For 10,000 concurrent connections maintained 24/7 with an average of 10 messages per minute each, the monthly cost is approximately:
- Connection minutes: 10,000 x 60 x 24 x 30 = 432 million minutes = ~$108
- Messages: 10,000 x 10 x 60 x 24 x 30 = 4.32 billion messages = first 1 billion at $1.00/M (~$1,000) + 3.32 billion at $0.80/M (~$2,656) = ~$3,656
- Total: ~$3,764/month
Data Transfer Costs
Data transfer out from API Gateway to the internet follows standard AWS data transfer pricing:
| Tier | Cost per GB |
|---|---|
| First 10 TB/month | $0.09 |
| Next 40 TB/month | $0.085 |
| Next 100 TB/month | $0.07 |
| Over 150 TB/month | $0.05 |
Data transfer within the same region (to Lambda, to VPC Link targets) is free. Data transfer from CloudFront to API Gateway is also free when both are in the same region, which is a significant cost advantage of placing CloudFront in front of your API.
Cache Pricing Impact
For a typical production REST API serving 50 million requests per month with a 60% cache hit ratio:
| Component | Without Cache | With 0.5 GB Cache |
|---|---|---|
| API Gateway requests | 50M x $3.50/M = $175.00 | 50M x $3.50/M = $175.00 |
| Cache cost | $0 | ~$14.40/month |
| Lambda invocations (at $0.20/M) | 50M = $10.00 | 20M (40% miss) = $4.00 |
| Lambda duration (128 MB, 1 second avg) | 50M x $0.000002083 = $104.15 | 20M x $0.000002083 = $41.66 |
| Total | ~$289.15 | ~$235.06 |
| Monthly savings | N/A | ~$54.09 |
The cache pays for itself when the Lambda execution cost savings exceed the cache hourly cost. For compute-intensive backends (long execution times, large memory allocations), the savings are even more dramatic. For lightweight backends, the cache cost may exceed the savings; always do the math for your specific workload.
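The break-even check is one line of arithmetic. Plugging in the table's per-invocation figures reproduces its ~$54 saving; all inputs are assumptions you should measure for your own workload.

```python
def monthly_cache_saving(requests, hit_ratio, cost_per_invocation,
                         cache_hourly, hours=720):
    """Net monthly saving: Lambda spend avoided by cache hits, minus the
    cache's hourly cost. Positive means the cache pays for itself."""
    avoided = requests * hit_ratio * cost_per_invocation
    return avoided - cache_hourly * hours

# Per-invocation Lambda cost from the table above:
# $0.20 per million requests plus $0.000002083 duration charge.
PER_INVOCATION = 0.20 / 1e6 + 0.000002083
```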
Total Cost Comparison: REST API vs. HTTP API
For a production API serving 100 million requests per month with 5 KB average response size:
| Component | REST API | HTTP API | HTTP API + CloudFront |
|---|---|---|---|
| Request cost | $350.00 | $100.00 | $100.00 |
| API Gateway cache (0.5 GB) | $14.40 | N/A | N/A |
| CloudFront (if applicable) | N/A | N/A | ~$10-20 (depends on cache hit ratio) |
| Data transfer (500 GB) | $45.00 | $45.00 | CloudFront pricing applies |
| Approximate total | ~$409/month | ~$145/month | ~$130-150/month |
| Annual cost | ~$4,908 | ~$1,740 | ~$1,560-1,800 |
Common Failure Modes and Operational Lessons
The 29-Second Timeout Limit
API Gateway imposes a 29-second default timeout on every integration type: REST API, HTTP API, and WebSocket API (for the integration invocation; the WebSocket connection itself persists longer). For years this limit could not be changed at all; since mid-2024, a service quota increase can raise the integration timeout for Regional REST and HTTP APIs, at the cost of a reduced account-level throttle quota, while edge-optimized APIs keep the hard limit. Treat 29 seconds as the design constraint regardless. If your backend fails to respond within the timeout, the client gets a 504 Gateway Timeout.
This is the single most impactful constraint in API Gateway. You cannot use it for synchronous long-running operations: report generation, large data exports, complex ML inference, batch processing. Anything that routinely takes more than 29 seconds is out.
Your Lambda function's 15-minute timeout is irrelevant here. If it has not returned a response to API Gateway in 29 seconds, API Gateway times out.
Workarounds:
- Asynchronous pattern. Return a 202 Accepted immediately with a task ID. The client polls a status endpoint or receives a callback (webhook) when the long-running operation completes. Use Step Functions, SQS, or EventBridge for orchestration.
- Step Functions sync integration. REST API can invoke Express Workflows synchronously via StartSyncExecution. Express Workflows can run for up to 5 minutes, but the API Gateway 29-second timeout still applies; the workflow must complete within that window.
- WebSocket API. Use WebSocket for operations where the server needs to push results asynchronously. The integration timeout is still 29 seconds per message, but the client connection persists for up to 2 hours, allowing the backend to send results whenever they are ready.
- Response streaming. Lambda response streaming lets a function begin sending response chunks before the full response is ready, but it is delivered through Lambda function URLs rather than API Gateway, which buffers Lambda responses. If streaming is a hard requirement, expose that endpoint via a function URL (optionally behind CloudFront) instead of API Gateway.
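The asynchronous 202 pattern in sketch form: the API-facing handler assigns a task ID, enqueues the work, and returns immediately. The enqueue callable is injected here for testability; in production it would wrap an SQS send_message call, and the /tasks/{id} status path is an illustrative convention.

```python
import json
import uuid

def submit_handler(event, context=None, enqueue=None):
    """API-facing Lambda for the 202 pattern: assign a task ID, enqueue the
    work, return immediately. `enqueue` is injected for testability; in
    production it would wrap boto3.client("sqs").send_message(...)."""
    task_id = str(uuid.uuid4())
    if enqueue is not None:
        enqueue(json.dumps({"taskId": task_id, "body": event.get("body")}))
    return {
        "statusCode": 202,  # Accepted: work continues in the background
        "headers": {"Location": f"/tasks/{task_id}"},  # client polls here
        "body": json.dumps({"taskId": task_id, "status": "PENDING"}),
    }
```

A worker Lambda consumes the queue and writes the result under the task ID; a second, trivial handler serves the status endpoint.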
Payload Size Limits
| API Type | Request Payload | Response Payload |
|---|---|---|
| REST API | 10 MB | 10 MB |
| HTTP API | 10 MB | 10 MB |
| WebSocket API | 128 KB (per message) | 128 KB (per message) |
For REST and HTTP APIs, the 10 MB limit is sufficient for most API payloads but insufficient for file uploads. For large file operations, use presigned S3 URLs. API Gateway returns a signed URL, and the client uploads directly to S3, bypassing API Gateway entirely. This pattern avoids the payload limit, reduces API Gateway costs, and takes advantage of S3's multipart upload capabilities for files up to 5 TB.
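A sketch of the presigned-URL handler, with the signing call injected for testability; in production `presign` would be boto3's S3 generate_presigned_url, and the bucket name and key scheme are assumptions.

```python
import json

def _default_presign(*args, **kwargs):
    import boto3  # deferred import so tests can inject a stub instead
    return boto3.client("s3").generate_presigned_url(*args, **kwargs)

def presign_handler(event, context=None, presign=None):
    """Hand the client a presigned S3 PUT URL so the upload bypasses API
    Gateway's 10 MB limit. Bucket name and key scheme are assumptions."""
    presign = presign or _default_presign
    key = f"uploads/{event['requestContext']['requestId']}"
    url = presign("put_object",
                  Params={"Bucket": "my-upload-bucket", "Key": key},
                  ExpiresIn=300)  # URL valid for five minutes
    return {"statusCode": 200, "body": json.dumps({"uploadUrl": url, "key": key})}
```

The client PUTs the file straight to the returned URL; API Gateway never sees the payload, so neither the size limit nor per-request pricing applies to the upload itself.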
WebSocket's 128 KB message size limit (with 32 KB frames underneath) requires application-level chunking for larger payloads. Design your WebSocket message protocol to split anything over 128 KB across multiple messages.
Throttling Cascades
When API Gateway returns 429 Too Many Requests, aggressive client retries can create a vicious cycle. Retries add to request volume, which increases throttling, which generates more retries. This positive feedback loop sustains elevated error rates long after the original traffic spike has passed.
I have seen this play out the same way every time. Ten clients at 1,000 RPS each hit the 10,000 RPS account limit. All ten get 429s and retry immediately. Now you have 10,000+ retries on top of new organic requests. The actual request volume doubles to 20,000+ RPS, and the API stays throttled until clients back off or someone intervenes.
Mitigations:
- Implement exponential backoff with jitter on the client side. This is a requirement for production API clients.
- Set per-client throttle limits using usage plans (REST API) to prevent any single client from consuming the full budget.
- Monitor 429 error rates and alert on sustained occurrences.
- Request service quota increases proactively. Do not wait until you hit the limit in production.
- Use account isolation to prevent cross-API throttling entirely.
Cold Starts with Lambda Integration
Cold starts hit API Gateway latency when a Lambda function has not been invoked recently or when concurrent invocations exceed warm capacity. The duration varies by runtime, and the range is wide:
| Runtime | Typical Cold Start | With VPC |
|---|---|---|
| Python | 100-300 ms | 100-300 ms |
| Node.js | 100-300 ms | 100-300 ms |
| Java | 500-3,000 ms | 500-3,000 ms |
| Go | 50-150 ms | 50-150 ms |
| .NET | 200-1,000 ms | 200-1,000 ms |
VPC cold starts were dramatically improved in 2019 when AWS introduced Hyperplane ENI. VPC-connected Lambda functions no longer incur the additional 10+ second cold start penalty that was common previously.
Cold starts on Lambda authorizers are particularly impactful because they delay every request, including subsequent requests to different backend functions. If your authorizer has a 2-second cold start, every request after cache expiration experiences that delay.
Mitigations:
- Provisioned Concurrency. Pre-initializes execution environments. Eliminates cold starts entirely for provisioned instances. Costs approximately $0.015 per GB-hour. Use this for latency-sensitive APIs and for Lambda authorizers.
- HTTP API JWT authorizer. Eliminates the authorizer cold start entirely by performing JWT validation within the API Gateway data plane. No Lambda function involved.
- Optimize package size. Smaller deployment packages initialize faster. Remove unnecessary dependencies, use Lambda layers for shared code, and avoid including test files or documentation in the deployment package.
- Keep functions warm. For low-traffic APIs, scheduled pings (CloudWatch Events every 5 minutes) keep at least one environment warm. This is a partial mitigation; it only keeps one environment warm and does not help with concurrency-driven cold starts.
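Provisioned Concurrency is one API call per function alias, and its cost is simple to estimate from the ~$0.015/GB-hour figure above. The alias name and instance count below are illustrative.

```python
def provisioned_cost(instances, memory_gb, hours=720, rate=0.015):
    """Approximate monthly cost at the ~$0.015 per GB-hour figure quoted above."""
    return instances * memory_gb * hours * rate

def warm(function_name, alias, instances=5):
    """Pre-initialize execution environments for a latency-sensitive function,
    e.g. a Lambda authorizer. Alias name and count are illustrative."""
    import boto3  # deferred import so provisioned_cost stays testable without AWS
    boto3.client("lambda").put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,  # must target a published version or alias, not $LATEST
        ProvisionedConcurrentExecutions=instances,
    )
```

Five instances of a 512 MB authorizer run about $27/month at that rate, which is usually cheap insurance for a public API's tail latency.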
Deployment Gotchas
REST API deployments are immutable snapshots of your API configuration. When you change resources, methods, or integrations, nothing goes live until you create a new deployment and associate it with a stage. This is a safety mechanism, but it trips people up constantly. I have watched teams spend hours debugging an API that "is not working" when the real problem was that they modified the configuration and forgot to deploy it.
HTTP API supports auto-deploy, which deploys changes immediately when you update the API configuration. This is simpler for development but removes the safety net of explicit deployments. For production HTTP APIs, I recommend disabling auto-deploy and managing deployments explicitly through CI/CD pipelines.
Stage variable misconfiguration is another common failure mode. Stage variables are resolved at request time, not deployment time. If a stage variable references a Lambda function alias that does not exist, every request to that stage fails with a 500 error. Validate stage variable references in your CI/CD pipeline before deployment.
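That stage-variable check is easy to automate in CI. A sketch: the `*FunctionAlias` naming convention is an assumption, and `alias_exists` is injected so it can wrap boto3's lambda get_alias in production and a stub in tests.

```python
def missing_aliases(stage_variables, alias_exists):
    """Return stage variables that look like Lambda alias references but do not
    resolve. The `*FunctionAlias` naming convention is an assumption for this
    sketch; `alias_exists(value)` is injected so CI can wrap
    boto3.client("lambda").get_alias(...) while tests use a stub."""
    return sorted(
        name
        for name, value in stage_variables.items()
        if name.endswith("FunctionAlias") and not alias_exists(value)
    )
```

In a pipeline, fetch the stage's variables with apigateway get_stage, run this check, and fail the deployment if the returned list is non-empty.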
Integration with Other AWS Services
Lambda
Lambda is the primary integration target for API Gateway. The combination is the backbone of serverless on AWS. Three integration patterns matter:
- Synchronous invocation (default). API Gateway invokes Lambda synchronously and waits for the response. Subject to the 29-second timeout.
- Asynchronous invocation. REST API can invoke Lambda asynchronously using the Event invocation type in a non-proxy custom integration. API Gateway returns 202 Accepted immediately without waiting for the Lambda result. Useful for fire-and-forget operations.
- Response streaming. Lambda response streaming sends chunks to the client as they are produced, but it works through Lambda function URLs, not API Gateway, which buffers Lambda responses. Route streaming workloads to a function URL, optionally behind CloudFront.
Application Load Balancer
ALB is a real alternative to API Gateway for HTTP-based APIs. The choice depends on your requirements, and I see teams pick wrong in both directions:
| Capability | API Gateway (REST) | API Gateway (HTTP) | ALB |
|---|---|---|---|
| Request pricing model | $3.50/M requests | $1.00/M requests | ~$0.008/LCU-hour (usage-based) |
| Fixed hourly cost | None | None | ~$0.0225/hour (~$16.20/month) |
| Throttling | Built-in (multi-level) | Account-level only | None native (use WAF rate rules) |
| Caching | Built-in (REST only) | None | None (use CloudFront) |
| Request transformation | VTL templates | Parameter mapping | None |
| WebSocket | Dedicated API type | No | Native passthrough |
| Authentication | IAM, Cognito, Lambda authorizer | JWT, Lambda authorizer | Cognito, OIDC |
| Lambda targets | Yes | Yes | Yes |
| Container targets | Via VPC Link | Via VPC Link | Direct registration |
| Cost at 100M req/month | ~$350 (REST) / ~$100 (HTTP) | ~$100 | ~$30-50 (depends on LCU) |
For high-volume APIs with container backends, ALB is often cheaper than API Gateway. For serverless Lambda backends with API management needs (throttling, caching, API keys, request validation), API Gateway is the better fit. For the common pattern of Lambda + simple auth + low cost, HTTP API is the sweet spot.
Step Functions
REST API can invoke Step Functions directly via AWS service integration, without a Lambda function:
| Integration | Workflow Type | Behavior | Timeout Consideration |
|---|---|---|---|
| StartExecution | Standard Workflow | Returns execution ARN immediately (async) | No timeout concern: execution runs independently |
| StartSyncExecution | Express Workflow | Waits for workflow to complete and returns result | Must complete within API Gateway's 29-second timeout |
The synchronous Express Workflow pattern is particularly powerful for APIs that need to orchestrate multiple steps (validation, data enrichment, conditional branching, parallel processing) without writing a monolithic Lambda function. Express Workflows can execute for up to 5 minutes, but the API Gateway 29-second timeout is the binding constraint.
CloudFront
Placing CloudFront in front of API Gateway provides capabilities that API Gateway lacks natively:
| Capability | What CloudFront Adds |
|---|---|
| Edge caching | Cache API responses at 600+ edge locations; especially valuable for HTTP API which has no built-in cache |
| DDoS protection | AWS Shield Standard (automatic) and Shield Advanced |
| WAF at the edge | Evaluate WAF rules before requests reach API Gateway; essential for HTTP API which has no native WAF |
| Geographic restrictions | Block or allow traffic by country |
| Custom error pages | Return branded error responses for 4XX/5XX errors |
| Edge compute | CloudFront Functions and Lambda@Edge for request/response manipulation |
| TLS optimization | TLS termination at the edge reduces latency for global clients |
This pattern is especially valuable for HTTP APIs. By placing CloudFront in front of an HTTP API, you gain caching, WAF, and global edge optimization while retaining HTTP API's lower cost and latency. The data transfer from CloudFront to a same-region API Gateway endpoint is free, so the CloudFront cost is primarily per-request and cache storage.
WAF
AWS WAF integrates directly with REST API (not HTTP API or WebSocket API). WAF rules are evaluated before any API Gateway processing, providing:
- Protection against common web exploits (SQL injection, XSS) via AWS Managed Rules.
- Rate-based rules for DDoS mitigation and abuse prevention.
- IP-based allow/block lists.
- Bot control (managed rules).
- Custom rules based on headers, query strings, body content, and geographic origin.
For HTTP APIs that need WAF protection, place CloudFront in front of the API and attach WAF to the CloudFront distribution. This provides the same protection with the added benefit of WAF evaluation at the edge (closest to the attacker) rather than at the regional API Gateway endpoint.
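As an illustration of the rate-based rules mentioned above, a WAF v2 rule that blocks any source IP exceeding a request threshold looks roughly like this (the name, priority, and limit are example values; the limit is counted over a trailing 5-minute window):

```json
{
  "Name": "rate-limit-per-ip",
  "Priority": 0,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "rate-limit-per-ip"
  }
}
```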
Key Architectural Patterns
After years of building and operating API Gateway deployments in production, these are the patterns I come back to every time:
- Default to HTTP API unless you need REST API features. 71% cost savings. Lower latency. HTTP API is the correct choice for most Lambda-backed APIs. Choose REST API only when you genuinely need caching, VTL transformation, WAF integration, usage plans, request validation, or direct AWS service integrations, and even then verify first that you cannot achieve the same result another way.
- Use direct AWS service integrations for simple operations. Putting an SQS message on a queue, starting a Step Functions execution, or writing a DynamoDB item does not need a Lambda function. REST API's AWS service integrations eliminate a component, reduce latency, eliminate cold starts, and lower cost.
- Design for the 29-second timeout from day one. Any operation that might exceed 29 seconds needs an asynchronous pattern. Retrofitting async onto a synchronous API is painful. Designing for it upfront is straightforward.
- Isolate production APIs in dedicated AWS accounts. The shared account-level throttle limit (10,000 RPS default) is the most common source of cross-API interference. Account isolation is the cleanest and most reliable solution.
- Place CloudFront in front of HTTP APIs for caching and WAF. This compensates for HTTP API's lack of built-in caching and WAF while preserving the cost and latency benefits. CloudFront-to-API Gateway data transfer is free in the same region.
- Enable request validation on REST APIs. Rejecting malformed requests at the API Gateway layer is faster and cheaper than rejecting them in Lambda. It also produces consistent error responses without custom code.
- Use Provisioned Concurrency for latency-sensitive Lambda authorizers. Cold starts on authorizers are particularly impactful because they delay every request. Provisioned Concurrency or HTTP API's native JWT authorizer eliminates this variability.
- Set per-method throttle limits on REST APIs. Protect your backend from traffic spikes on high-volume endpoints while leaving low-traffic endpoints unconstrained. Without method-level limits, a spike on any endpoint can consume the entire stage or account budget.
- Monitor the Latency minus IntegrationLatency gap. This gap tells you how much time API Gateway itself is adding. If it grows, investigate authorizer performance, VTL template complexity, or request validation bottlenecks.
- Never use API keys for authentication. They are usage tracking and metering tokens, not security tokens. Always pair them with a proper authentication mechanism.
- Plan for the 10 MB payload limit. Use presigned S3 URLs for file uploads and downloads rather than proxying large payloads through API Gateway. This avoids the limit, reduces API Gateway costs, and leverages S3's multipart upload for large files.
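The Latency-minus-IntegrationLatency pattern above amounts to a simple calculation over exported CloudWatch datapoints. A sketch, with purely illustrative values:

```python
# Sketch: estimate API Gateway's own overhead from CloudWatch datapoints.
# The series mirror API Gateway's Latency and IntegrationLatency metrics;
# all values here are illustrative, not real measurements.

def gateway_overhead_ms(latency_ms, integration_latency_ms):
    """Per-datapoint overhead: time spent inside API Gateway itself
    (authorizers, validation, VTL mapping), i.e. Latency - IntegrationLatency."""
    if len(latency_ms) != len(integration_latency_ms):
        raise ValueError("datapoint series must align")
    return [round(l - i, 1) for l, i in zip(latency_ms, integration_latency_ms)]

# Illustrative p99 datapoints over five periods (milliseconds).
latency = [112.0, 118.5, 140.2, 305.7, 119.9]
integration = [98.3, 104.1, 125.0, 130.2, 106.4]

overhead = gateway_overhead_ms(latency, integration)
print(overhead)
```

In this made-up series, the fourth period's latency spike comes almost entirely from the gateway layer (overhead jumps from ~14 ms to ~175 ms while the backend stays flat), which would point at an authorizer cold start or a validation bottleneck rather than the integration.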
Additional Resources
- Amazon API Gateway Developer Guide: comprehensive reference covering all three API types, integrations, authorization, and deployment
- API Gateway REST API vs. HTTP API feature comparison: AWS documentation with a complete matrix of every feature difference between the two types
- API Gateway quotas and important notes: account-level and per-API limits with instructions for requesting increases
- API Gateway pricing page: current per-request pricing, cache pricing, and data transfer costs for all three API types
- API Gateway mapping template reference: VTL syntax, context variables, and utility functions for request/response transformation
- Working with Lambda authorizers: configuration patterns for token-based and request-based Lambda authorizers, including caching behavior
- Working with WebSocket APIs: connection management, route selection, and the @connections callback API
- VPC Link configuration for REST and HTTP APIs: setup instructions for private integrations including NLB, ALB, and Cloud Map targets
- AWS Well-Architected Serverless Applications Lens: architectural best practices for serverless workloads including API Gateway patterns
- Serverless Land patterns library: community-contributed integration patterns with infrastructure as code templates for common API Gateway architectures
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

