About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
Amazon API Gateway sits in front of most serverless and microservice architectures on AWS. Three distinct API types, a control plane versus data plane split, a layered throttling hierarchy, a caching layer, a rich integration model. Most teams deploying API Gateway never dig into these mechanics. I have spent years building and operating API Gateway-backed systems handling everything from low-traffic internal tools to production APIs processing tens of thousands of requests per second, and I learned most of the hard lessons the hard way.
This is the architecture reference I wish I had found when I was making foundational decisions about API design on AWS. It covers internals, trade-offs between API types, and the failure modes that only show up under real production load.
The Three API Gateway Types
API Gateway is really three distinct services sharing a name and a console. Picking the right one is the single most consequential decision you will make when adopting the service, and I still see teams get it wrong.
REST API was the original, launched in 2015. It was built as a full-featured API management platform with request/response transformation, model validation, API keys, usage plans, caching, canary deployments, and extensive CloudWatch integration. It is powerful, flexible, and relatively expensive.
HTTP API launched in late 2019 as a response to customer feedback that REST API was too complex and too expensive for the majority of use cases, particularly the common pattern of "receive HTTP request, invoke Lambda, return response." HTTP API strips out most of the advanced features and delivers a simpler, faster, cheaper product.
WebSocket API launched in late 2018 for real-time, bidirectional communication. It manages persistent WebSocket connections and routes messages to backend integrations based on message content.
| Dimension | REST API | HTTP API | WebSocket API |
|---|---|---|---|
| Release year | 2015 | 2019 | 2018 |
| Protocol | HTTP/1.1 | HTTP/1.1, HTTP/2 | WebSocket |
| Endpoint types | Edge-optimized, Regional, Private | Regional | Regional |
| Request pricing (us-east-1) | $3.50 per million | $1.00 per million | $1.00 per million (messages) + $0.25 per million (connection minutes) |
| Payload limit | 10 MB | 10 MB | 128 KB per frame |
| Timeout | 29 seconds | 29 seconds | 29 seconds (integration), 10 minutes (idle), 2 hours (max connection duration) |
| API key management | Yes | No | No |
| Usage plans and throttling | Yes (per-key, per-stage, per-method) | No usage plans (stage/route throttling only) | No usage plans (stage/route throttling only) |
| Request/response transformation | Yes (VTL mapping templates) | No (parameter mapping only) | Yes (VTL mapping templates) |
| Caching | Yes (0.5 GB - 237 GB) | No | No |
| WAF integration | Yes | No | No |
| Resource policies | Yes | No | No |
| Mutual TLS | Yes | Yes | No |
| Custom domain names | Yes | Yes | Yes |
| Lambda authorizers | Request and Token types | Request type (v2 payload format) | Request type |
| Cognito authorizers | Yes (native) | Yes (via JWT authorizer) | No |
| Private integrations (VPC Link) | Yes (NLB-based) | Yes (ALB, NLB, Cloud Map) | No |
| AWS service integrations | Yes (direct, broad service coverage) | Limited (SQS, Kinesis, Step Functions, EventBridge, AppConfig) | No |
| Mock integrations | Yes | No | No |
| Request validation | Yes | No | No |
| X-Ray tracing | Yes | No | No |
The decision comes down to how much control you need over the request/response pipeline. REST API gives you full control at every stage: validate, transform, route, transform again, cache. HTTP API gives you almost none of that; it is a managed reverse proxy handling TLS termination, authorization, and routing. For most Lambda-backed APIs, HTTP API is the correct choice. Reach for REST API only when you need a specific capability that HTTP API lacks.
```mermaid
flowchart TD
    A[New API on AWS] --> B{Need real-time<br/>bidirectional<br/>communication?}
    B -->|Yes| WS[WebSocket API]
    B -->|No| C{Need caching, VTL<br/>transformation, WAF,<br/>usage plans, or<br/>request validation?}
    C -->|Yes| REST[REST API]
    C -->|No| D{Need JWT auth,<br/>HTTP/2, or<br/>lowest cost?}
    D -->|Yes| HTTP[HTTP API]
    D -->|No| HTTP2[HTTP API<br/>default choice]
```

Architecture Internals
Control Plane vs. Data Plane
API Gateway follows the standard AWS pattern of separating the control plane from the data plane. The distinction matters more than most people realize.
The control plane handles API configuration: creating APIs, defining resources and methods, deploying stages, configuring authorizers, and managing domain names. Control plane operations are API calls that propagate asynchronously to the data plane. When you deploy a new stage, the configuration must propagate to all data plane nodes before it is fully active. This propagation typically completes within seconds but can take up to a minute under heavy control plane load.
The data plane handles actual request processing: accepting incoming API calls, executing authorizers, routing to integrations, transforming requests and responses, enforcing throttling, and returning results. Once an API stage is deployed, the data plane keeps serving traffic even if the control plane goes down. You just cannot make configuration changes until the control plane recovers.
The operational implication: never depend on real-time control plane operations during incident response. If you need to disable an API method or tighten throttling during a production incident, those changes require control plane availability. I keep WAF rules and Lambda authorizer kill-switches as the first line of defense for exactly this reason.
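The kill-switch idea is simple enough to sketch. Below is a minimal Lambda authorizer in Python; the KILL_SWITCH variable name is my own convention, not an AWS feature, and in a real authorizer your normal credential checks would run alongside the flag check.

```python
import os

def handler(event, context):
    # The ARN of the method being authorized, e.g.
    # arn:aws:execute-api:us-east-1:123456789012:abc123/prod/GET/orders
    method_arn = event["methodArn"]

    # KILL_SWITCH is a hypothetical Lambda environment variable. Flipping it
    # is a Lambda control plane operation, not an API Gateway one, so the
    # switch keeps working even while API Gateway's control plane is degraded.
    effect = "Deny" if os.environ.get("KILL_SWITCH") == "true" else "Allow"

    return {
        "principalId": "kill-switch",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
    }
```

One caveat: if authorizer caching is enabled, Allow policies cached before you flip the switch stay valid until their TTL expires, so keep the TTL short on any method you intend to guard this way.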
How API Gateway Processes a Request
You need to understand this pipeline to debug latency issues and design efficient APIs. For REST APIs, the sequence is:
1. TLS termination and HTTP parsing. API Gateway terminates TLS and parses the incoming HTTP request.
2. CloudFront processing (edge-optimized only). For edge-optimized endpoints, the request passes through the CloudFront network to reach the API Gateway regional endpoint. This adds latency for nearby clients but reduces latency for geographically distributed clients.
3. Resource policy evaluation. If a resource policy is attached, API Gateway evaluates it first. Deny results short-circuit the request.
4. Method request validation. If request validation is configured, API Gateway validates the request body and parameters against the defined model. Invalid requests receive a 400 error before reaching the integration.
5. Authorization. API Gateway evaluates the configured authorizer (IAM, Cognito, Lambda, or API key). Unauthorized requests receive a 401 or 403.
6. Throttling. API Gateway checks the request against account-level, stage-level, and method-level throttle settings. If the request exceeds any limit, a 429 Too Many Requests response is returned.
7. Cache lookup (REST API only). If caching is enabled, API Gateway checks the cache. A cache hit returns the cached response and skips the integration entirely.
8. Request mapping (REST API only). VTL mapping templates transform the incoming request into the format expected by the backend integration.
9. Integration execution. API Gateway forwards the request to the backend (Lambda, HTTP endpoint, AWS service, VPC Link target, or mock).
10. Response mapping (REST API only). VTL mapping templates transform the integration response into the format returned to the client.
11. Response return. The final response is sent back to the client.
```mermaid
flowchart TD
    A[TLS Termination & HTTP Parsing] --> B[CloudFront Processing<br/>edge-optimized only]
    B --> C{Resource Policy}
    C -->|Deny| Z[403 Forbidden]
    C -->|Allow| D[Request Validation]
    D -->|Invalid| Z2[400 Bad Request]
    D -->|Valid| E{Authorization}
    E -->|Unauthorized| Z3[401 / 403]
    E -->|Authorized| F{Throttle Check}
    F -->|Over limit| Z4[429 Too Many Requests]
    F -->|Within limit| G{Cache Lookup}
    G -->|Hit| K[Return Cached Response]
    G -->|Miss| H[Request Mapping VTL]
    H --> I[Integration Execution]
    I --> J[Response Mapping VTL]
    J --> K2[Return Response]
```
The HTTP API pipeline is far simpler. It omits VTL transformation, request validation, caching, and resource policies. That stripped-down pipeline is why HTTP APIs consistently add 5-10ms less overhead than REST APIs.
Endpoint Types
REST APIs support three endpoint types, each with different networking and performance characteristics:
| Endpoint Type | Architecture | Best For | Latency Profile |
|---|---|---|---|
| Edge-optimized | Request routes through an AWS-managed CloudFront distribution to regional API Gateway | APIs with geographically distributed clients who do not need a custom CloudFront configuration | Lower latency for distant clients, higher for same-region clients due to the CloudFront hop |
| Regional | Request goes directly to regional API Gateway | APIs with same-region clients, or when you manage your own CloudFront distribution in front | Lowest latency for same-region clients, full control when paired with your own CloudFront |
| Private | Accessible only via interface VPC endpoints (AWS PrivateLink) | Internal APIs that must never be accessible from the internet | Same-region only, VPC-level network isolation |
I keep seeing teams use edge-optimized endpoints for APIs where the clients live in the same region as the API. A frontend hosted in us-east-1 calling its own backend API in us-east-1, routed through CloudFront for no reason. That extra hop adds latency to same-region calls. Use regional endpoints for this pattern.
For APIs that genuinely serve global traffic, deploy a regional endpoint and place your own CloudFront distribution in front of it. You get full control over CloudFront caching behavior, WAF rules at the edge, custom error pages, and edge compute functions. None of that is configurable when API Gateway manages the CloudFront distribution for edge-optimized endpoints.
Private endpoints are the correct choice for any API that should never be accessible from the public internet. They use interface VPC endpoints (powered by AWS PrivateLink), which means traffic never leaves the AWS network. Combine private endpoints with resource policies to restrict access to specific VPCs, accounts, or VPC endpoints for defense-in-depth.
HTTP APIs and WebSocket APIs are regional only, with no edge-optimized or private variant. If you need edge optimization for an HTTP API, place CloudFront in front of it. If you need private access for HTTP backends, use a private ALB or NLB with VPC Link.
How the CloudFront Integration Works
Edge-optimized REST APIs deploy a hidden, AWS-managed CloudFront distribution behind the scenes. You cannot see it in the CloudFront console. You cannot attach WAF web ACLs, configure cache behaviors, or add Lambda@Edge or CloudFront Functions. The distribution forwards all headers and query parameters to the API Gateway regional endpoint with a single cache behavior.
So what do you actually get? TLS termination and global request routing. No edge caching. The API Gateway cache (if enabled) operates at the regional layer, after the request has already traversed CloudFront. Distant clients get lower latency from CloudFront's optimized network, but API responses are never cached at the edge.
If you want edge caching of API responses (and for read-heavy APIs, you almost certainly do), you must deploy your own CloudFront distribution in front of a regional endpoint and configure cache behaviors with appropriate cache policies.
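As a sketch, a cache policy for that pattern might look like the following. The dict mirrors CloudFront's CachePolicyConfig shape; the TTLs and header choices are illustrative assumptions, not recommendations.

```python
def api_cache_policy(name, default_ttl=60):
    # Cache policy for GET-heavy API routes behind a regional API Gateway
    # origin. Illustrative values; tune TTLs to your data's staleness budget.
    return {
        "Name": name,
        "MinTTL": 0,
        "DefaultTTL": default_ttl,  # used when the origin sends no Cache-Control
        "MaxTTL": 3600,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            # Vary the cache key on the auth header so callers never receive
            # each other's cached responses.
            "HeadersConfig": {
                "HeaderBehavior": "whitelist",
                "Headers": {"Quantity": 1, "Items": ["Authorization"]},
            },
            "QueryStringsConfig": {"QueryStringBehavior": "all"},
            "CookiesConfig": {"CookieBehavior": "none"},
        },
    }

# The actual call (shown commented so the snippet stays self-contained):
# import boto3
# boto3.client("cloudfront").create_cache_policy(
#     CachePolicyConfig=api_cache_policy("api-reads"))
```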
REST API vs. HTTP API: The Core Decision
Teams agonize over this decision. They should. The two API types overlap significantly in capability but diverge in ways that are expensive to reverse later.
When to Use REST API
Choose REST API when you need any of the following capabilities that HTTP API does not support:
- API Gateway caching. If you want API Gateway to cache integration responses and serve subsequent identical requests from cache without hitting the backend, you must use REST API.
- Request/response transformation. VTL mapping templates for transforming request and response payloads are REST API only. HTTP API supports only simple parameter mapping (headers, query strings, path parameters).
- Request validation. Validating request bodies and parameters against JSON Schema models before they reach the integration is REST API only.
- WAF integration. Attaching an AWS WAF web ACL directly to your API requires REST API. HTTP API has no native WAF integration.
- Usage plans and API keys. Metering and throttling per API key with usage plans is REST API only.
- AWS service integrations. Direct integration with AWS services (SQS, Step Functions, DynamoDB, Kinesis, SNS, S3, EventBridge) without a Lambda function in between is a REST API strength. HTTP API offers only a small first-class set (SQS, Kinesis, Step Functions, EventBridge, AppConfig); for anything beyond that, such as DynamoDB or S3, you need REST API.
- Resource policies. Cross-account access control and IP-based access restrictions via resource policies are REST API only.
- X-Ray tracing. Built-in AWS X-Ray integration for distributed tracing is REST API only.
- Mock integrations. Returning static responses without any backend integration is REST API only.
- Canary deployments. Routing a percentage of stage traffic to a new deployment is REST API only.
- Private endpoints. VPC-only accessibility via interface VPC endpoints is REST API only.
When to Use HTTP API
Choose HTTP API when:
- Your primary use case is proxying requests to Lambda functions or HTTP backends.
- You want lower latency. HTTP API is measurably faster, typically adding 5-10ms less overhead than REST API.
- You want lower cost. HTTP API costs $1.00 per million requests versus $3.50 per million for REST API, a 71% savings.
- You need native JWT authorization. HTTP API has built-in JWT authorizer support that validates tokens from any OIDC-compliant provider without a custom Lambda authorizer.
- You need ALB or Cloud Map as a VPC Link target. HTTP API's VPC Link is more flexible, supporting ALB, NLB, and Cloud Map service discovery, whereas REST API's VPC Link only supports NLB.
- You want automatic deployments. HTTP API supports auto-deploy, which deploys changes to a stage immediately without a separate deployment step.
- You need HTTP/2 support. HTTP API supports HTTP/2, which REST API does not.
- You want simplified CORS configuration. HTTP API handles CORS at the API level with a few parameters rather than requiring manual OPTIONS methods and response header configuration.
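To make the CORS point concrete, here is a sketch of the API-level configuration as you would pass it to the apigatewayv2 CreateApi call. The origin, header names, and TTL are placeholders.

```python
def cors_configuration(allowed_origins):
    # One API-level setting replaces the per-method OPTIONS plumbing
    # that REST API requires.
    return {
        "AllowOrigins": allowed_origins,
        "AllowMethods": ["GET", "POST", "PUT", "DELETE"],
        "AllowHeaders": ["content-type", "authorization"],
        "ExposeHeaders": ["x-request-id"],  # hypothetical response header
        "MaxAge": 3600,                     # browsers cache preflight for an hour
        "AllowCredentials": False,
    }

# The actual call (commented so the snippet stays self-contained):
# import boto3
# boto3.client("apigatewayv2").create_api(
#     Name="orders-api",  # hypothetical
#     ProtocolType="HTTP",
#     CorsConfiguration=cors_configuration(["https://app.example.com"]),
# )
```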
Performance Comparison
In my production measurements, the overhead difference between REST API and HTTP API is consistent:
| Metric | REST API | HTTP API |
|---|---|---|
| API Gateway overhead (P50) | ~15-20 ms | ~5-10 ms |
| API Gateway overhead (P99) | ~50-80 ms | ~20-40 ms |
| Cold start impact | No difference | No difference |
| Maximum throughput | 10,000 RPS (default account limit) | 10,000 RPS (default account limit) |
These numbers are pure API Gateway overhead; they exclude integration execution time (Lambda duration, database queries, etc.). If your Lambda functions run 100ms+, the 10ms difference between API types is noise. But for latency-sensitive APIs where you are fighting for every millisecond, HTTP API's lower overhead matters.
Cost Comparison at Scale
The pricing difference is substantial and compounds quickly:
| Monthly Request Volume | REST API Cost | HTTP API Cost | Monthly Savings | Annual Savings |
|---|---|---|---|---|
| 1 million | $3.50 | $1.00 | $2.50 | $30 |
| 10 million | $35.00 | $10.00 | $25.00 | $300 |
| 100 million | $350.00 | $100.00 | $250.00 | $3,000 |
| 1 billion | $3,500.00 | $1,000.00 | $2,500.00 | $30,000 |
| 10 billion | $35,000.00 | $10,000.00 | $25,000.00 | $300,000 |
At high volume, the 71% cost reduction of HTTP API is large enough to justify rearchitecting around its limitations. If you are spending thousands per month on REST API and you are not using caching, VTL transformations, or WAF, you are overpaying. Full stop.
Default to HTTP API. Need WAF? Put CloudFront in front and attach WAF there. Need caching? CloudFront caching. Need request validation? Do it in Lambda. The 71% cost savings and lower latency justify these architectural adjustments for most workloads.
Integration Types
The integration type determines how API Gateway connects to backend services. This is where the service earns its keep as an API management layer.
Lambda Proxy Integration
Lambda proxy integration is the most common pattern and the one I recommend as the default for new APIs. API Gateway passes the entire request (headers, body, path parameters, query parameters, request context) to the Lambda function as a structured event, and the Lambda function returns a structured response that API Gateway passes back to the client.
The advantage is simplicity. The Lambda function has full control over the response: status code, headers, body. No VTL mapping template to maintain, no model to define, no transformation logic living outside your code.
For REST API, use the AWS_PROXY integration type. For HTTP API, Lambda proxy is the default and essentially the only Lambda integration mode, using the v2 payload format.
The v2 payload format (HTTP API) is simpler and more ergonomic than the v1 format (REST API). The v2 format provides a cleaner event structure with deduplicated fields and simpler response formatting. If you are starting a new project and considering REST API solely for its richer event format, that is not a valid reason; v2 is better designed.
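A minimal v2-format handler shows how little ceremony is involved. The greeting logic is obviously illustrative; the event shape is the v2 format's.

```python
import json

def handler(event, context=None):
    # v2 puts method, path, and sourceIp under requestContext.http,
    # instead of scattering them across the v1 event.
    http = event["requestContext"]["http"]
    name = (event.get("queryStringParameters") or {}).get("name", "world")

    # Returning this dict is all the response formatting required;
    # statusCode even defaults to 200 if omitted.
    return {
        "statusCode": 200,
        "headers": {"content-type": "application/json"},
        "body": json.dumps({"greeting": f"hello, {name}", "path": http["path"]}),
    }
```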
HTTP Proxy Integration
HTTP proxy integration forwards requests to an HTTP endpoint (any publicly accessible URL or a private endpoint via VPC Link). API Gateway acts as a transparent proxy, passing the request through to the backend and returning the response to the client.
Common use cases:
- Fronting existing HTTP services with API Gateway's authentication, throttling, and monitoring capabilities.
- Migrating monolithic APIs incrementally: route some paths through API Gateway to new Lambda functions while proxying the rest to the legacy service.
- Connecting to third-party APIs through a controlled gateway with API key injection or request transformation.
For REST API, the HTTP_PROXY integration type passes requests through with minimal processing. The HTTP (non-proxy) integration type enables VTL mapping templates for request and response transformation before and after the backend call.
AWS Service Integrations
REST API's direct AWS service integration is its most underappreciated feature. API Gateway calls AWS services directly. No Lambda function in the middle. No cold start. No execution cost.
| AWS Service | Operation | Use Case | Why Skip Lambda |
|---|---|---|---|
| SQS | SendMessage, SendMessageBatch | Async message ingestion, webhook receivers | Eliminates cold starts, eliminates Lambda execution cost for simple queue writes |
| Step Functions | StartExecution, StartSyncExecution | Workflow orchestration | Sync Express Workflows return results directly; no glue Lambda needed |
| DynamoDB | PutItem, GetItem, Query, Scan, UpdateItem, DeleteItem | Simple CRUD APIs | Eliminates Lambda overhead for straightforward key-value operations |
| Kinesis | PutRecord, PutRecords | High-throughput event/stream ingestion | Lowest-latency path to a Kinesis stream |
| SNS | Publish | Fan-out notifications | Publish to topics without Lambda intermediary |
| S3 | GetObject, PutObject | File upload/download | Direct object serving or presigned URL pattern |
| EventBridge | PutEvents | Event-driven architectures | Publish events without Lambda intermediary |
The implementation uses VTL mapping templates to transform the incoming HTTP request into the AWS service's API request format (and vice versa for the response). Templates get gnarly fast for non-trivial transformations. But for simple operations like putting a message on an SQS queue, starting a Step Functions execution, or writing a DynamoDB item, the templates are straightforward and the cost savings from eliminating a Lambda function compound quickly at scale.
I use the SQS integration constantly for write-heavy APIs where I want to decouple request acceptance from processing. API Gateway writes the message to SQS and returns 200 immediately. A Lambda function polls the queue and processes asynchronously. Traffic spikes get absorbed by the queue instead of throttling clients or overwhelming downstream services.
My recommendation: use direct AWS service integrations for simple, stateless operations where the API Gateway request can be mapped to a single AWS API call. Use Lambda when you need conditional logic, multiple service calls, error handling beyond what VTL can express, or any non-trivial business logic.
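For reference, the queue-fronting pattern above reduces to a single VTL template plus a handful of integration settings. This sketch expresses them as put_integration parameters; the IDs, queue name, and role ARN are placeholders.

```python
# VTL template mapping the raw HTTP body onto SQS's SendMessage query API.
SEND_MESSAGE_TEMPLATE = "Action=SendMessage&MessageBody=$util.urlEncode($input.body)"

def sqs_integration_params(rest_api_id, resource_id, region,
                           account_id, queue_name, role_arn):
    return {
        "restApiId": rest_api_id,
        "resourceId": resource_id,
        "httpMethod": "POST",
        "type": "AWS",                 # non-proxy AWS service integration
        "integrationHttpMethod": "POST",
        # Service-integration URI format:
        # arn:aws:apigateway:{region}:sqs:path/{account}/{queue}
        "uri": f"arn:aws:apigateway:{region}:sqs:path/{account_id}/{queue_name}",
        "credentials": role_arn,       # role API Gateway assumes to call SQS
        "requestParameters": {
            "integration.request.header.Content-Type":
                "'application/x-www-form-urlencoded'",
        },
        "requestTemplates": {"application/json": SEND_MESSAGE_TEMPLATE},
    }

# The actual call (commented so the snippet stays self-contained):
# import boto3
# boto3.client("apigateway").put_integration(
#     **sqs_integration_params("a1b2c3", "res123", "us-east-1",
#                              "123456789012", "ingest-queue",
#                              "arn:aws:iam::123456789012:role/apigw-sqs"))
```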
VPC Link
VPC Link enables API Gateway to reach resources inside your VPC (ECS services, EC2 instances, internal load balancers, or any other private resource) without exposing them to the public internet.
| Feature | REST API VPC Link | HTTP API VPC Link |
|---|---|---|
| Target | Network Load Balancer only | ALB, NLB, or Cloud Map |
| Architecture | One VPC Link per NLB | One VPC Link per VPC (shared across routes) |
| Setup complexity | Requires NLB even if backend is behind ALB | Can target ALB directly |
| Cost | NLB hourly cost + NLCU; VPC Link itself is free | ALB/NLB cost if applicable; VPC Link itself is free |
| Service discovery | None | AWS Cloud Map integration |
HTTP API's VPC Link is more flexible and usually simpler to configure. If you need to reach an ALB-backed service, HTTP API lets you point directly at the ALB. With REST API, you would need to deploy an NLB in front of the ALB (or replace the ALB with an NLB), which adds approximately $16/month in NLB cost plus complexity.
The Cloud Map integration with HTTP API VPC Link is well-suited for microservice architectures on ECS. Your ECS services register with Cloud Map, and HTTP API routes directly to healthy instances via service discovery without any load balancer at all.
Mock Integration
Mock integration (REST API only) returns a response directly from API Gateway without calling any backend. Combined with VTL mapping templates, you can return dynamic mock responses based on request parameters. This is useful for:
- CORS preflight responses (OPTIONS methods): return the CORS headers without invoking a Lambda function.
- API stubs during development: return realistic responses while the backend is being built.
- Health check endpoints: return a 200 OK without any backend dependency.
- Maintenance mode responses: return a 503 with a maintenance message when the backend is intentionally offline.
Authentication and Authorization
API Gateway provides multiple authentication mechanisms. They target different client types and security requirements, and you can layer them for defense-in-depth.
IAM Authorization
IAM authorization uses AWS Signature Version 4 (SigV4) to authenticate requests. The caller must have valid AWS credentials and the appropriate IAM permissions to invoke the API method.
Best for: service-to-service communication within AWS, where both the caller and the API are AWS resources with IAM roles. IAM authorization is free (no additional cost beyond the API request itself), has no cold start, integrates with the full IAM policy model (including conditions on source IP, VPC endpoint, time of day, and tags), and provides the strongest authentication guarantee available on API Gateway.
Operational consideration: IAM authorization requires the caller to have AWS credentials and to sign the request with SigV4. This is straightforward for AWS SDKs and Lambda functions but impractical for browser-based JavaScript clients or mobile apps that should not have long-lived AWS credentials. For those scenarios, use Cognito to obtain temporary credentials and sign requests with the AWS Amplify library, or use JWT-based authorization.
Amazon Cognito Authorizers
Cognito authorizers (REST API) validate JWT tokens issued by a Cognito User Pool. The caller obtains a token from Cognito (via the hosted UI, the Cognito API, or a federated identity provider) and passes it in the Authorization header. API Gateway validates the token signature, expiration, and optionally the audience and scopes without calling Cognito for each request, since JWT validation is performed locally against the Cognito JWKS keys that API Gateway caches.
Best for: consumer-facing REST APIs where users authenticate through Cognito. The token validation is fast (sub-millisecond) because it is purely cryptographic verification performed locally.
For HTTP API, use the built-in JWT authorizer instead, which works with any OIDC-compliant provider including Cognito. The JWT authorizer on HTTP API is more flexible because it is not Cognito-specific.
Lambda Authorizers
Lambda authorizers execute a Lambda function to make authorization decisions. The function receives the request (or a token) and returns an IAM policy document specifying which API methods the caller is allowed to invoke.
There are two types:
| Type | Input | Caching Key | Best For |
|---|---|---|---|
| Token-based (REST API only) | Authorization header value | The token itself | Simple bearer token validation (JWT, OAuth, custom tokens) |
| Request-based (REST API, HTTP API) | Full request context (headers, query strings, path, stage variables) | Configurable (any combination of identity sources) | Complex authorization logic using multiple request attributes |
Lambda authorizer results can be cached for up to 3600 seconds (1 hour). When caching is enabled, API Gateway caches the returned IAM policy and applies it to subsequent requests with the same caching key, without re-invoking the Lambda function. This dramatically reduces authorizer invocation costs and latency for APIs with repeat callers.
The caching behavior has a critical implication: the IAM policy returned by the authorizer applies to all methods and resources matching the policy's resource ARN. If your authorizer returns a policy that allows arn:aws:execute-api:*:*:*/*/*, a cached policy from one endpoint authorizes access to all endpoints for the duration of the TTL. Design your authorizer policies carefully; scope them to specific methods and resources to avoid unintended access.
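One way to stay out of that trap is to derive the allowed resource ARNs from the incoming methodArn instead of returning wildcards. A sketch, where the route list is an assumption about your authorization model:

```python
def scoped_policy(method_arn, principal_id, allowed_routes):
    # methodArn format:
    # arn:aws:execute-api:{region}:{acct}:{apiId}/{stage}/{verb}/{path}
    # Keep the "arn...:{apiId}/{stage}" prefix and append explicit routes.
    api_prefix = "/".join(method_arn.split("/")[:2])
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow",
                # e.g. allowed_routes = ["GET/orders", "POST/orders"]
                "Resource": [f"{api_prefix}/{route}" for route in allowed_routes],
            }],
        },
    }
```

Because the policy names only the routes this caller may invoke, a cached copy cannot silently authorize other endpoints for the duration of the TTL.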
The cold start trade-off: Lambda authorizers introduce a cold start risk on the first request after cache expiration. For latency-sensitive APIs, use Provisioned Concurrency on the authorizer function, keep the cache TTL long enough that cold starts are rare, or consider HTTP API's native JWT authorizer which has no cold start at all.
Mutual TLS (mTLS)
Mutual TLS requires the client to present a valid X.509 certificate during the TLS handshake, in addition to the server presenting its certificate. API Gateway validates the client certificate against a truststore (a CA bundle stored in S3).
Best for: B2B integrations, financial services APIs, healthcare APIs, and any scenario where strong client identity is required at the transport layer. mTLS is supported on REST API and HTTP API with custom domain names.
Operational consideration: mTLS certificate management is non-trivial. You need to manage the certificate authority, issue client certificates, handle revocation (API Gateway supports CRLs stored in the same S3 truststore), and rotate certificates before expiration. Plan for this operational overhead before choosing mTLS.
API Keys and Usage Plans
API keys (REST API only) serve as identification tokens for throttling and metering. API keys identify which client is making the request, and usage plans associate API keys with throttle limits and quota allocations.
| Concept | Purpose | Example |
|---|---|---|
| API key | Identifies a client | x-api-key: abc123def456 |
| Usage plan | Defines throttle and quota limits | 100 RPS burst, 50 RPS steady-state, 10,000 requests/day |
| Association | Links API keys to usage plans and plans to stages | Client A gets the "Premium" plan with 500 RPS; Client B gets the "Free" plan with 10 RPS |
Usage plans support per-client throttle rates (requests per second), burst limits, and daily/weekly/monthly quotas. This is how you implement tiered API access (free, basic, premium) without custom throttling logic.
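Sketching the tiers as create_usage_plan parameters makes the model concrete. The tier names and limits here are invented:

```python
# Hypothetical tier definitions; your rates, bursts, and quotas will differ.
TIERS = {
    "free":    {"rate": 10,  "burst": 20,   "quota": 10_000},
    "premium": {"rate": 500, "burst": 1000, "quota": 5_000_000},
}

def usage_plan_params(tier, api_id, stage):
    t = TIERS[tier]
    return {
        "name": tier,
        "throttle": {"rateLimit": float(t["rate"]), "burstLimit": t["burst"]},
        "quota": {"limit": t["quota"], "period": "DAY"},
        "apiStages": [{"apiId": api_id, "stage": stage}],
    }

# Creating the plan and attaching a client's key (commented so the snippet
# stays self-contained):
# import boto3
# apigw = boto3.client("apigateway")
# plan = apigw.create_usage_plan(**usage_plan_params("premium", "a1b2c3", "prod"))
# apigw.create_usage_plan_key(usagePlanId=plan["id"],
#                             keyId=client_key_id,  # hypothetical
#                             keyType="API_KEY")
```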
Important: API keys are sent in plaintext in the x-api-key header. They are trivially easy to share, steal, or leak. Never use API keys as the sole authentication mechanism. Combine them with IAM, Cognito, or Lambda authorizers for actual security, and use API keys purely for metering and throttling.
Composing Authorization
Authorization mechanisms compose in a specific evaluation order:
1. Resource policy (if configured): evaluated first, can deny before anything else runs
2. mTLS (if configured): client certificate validated during the TLS handshake
3. Method authorizer (IAM, Lambda, Cognito, or JWT): evaluated per method
4. API key (if required): validated after the authorizer succeeds
A request must pass all configured mechanisms. This layered approach enables defense-in-depth: resource policies restrict network-level access, mTLS verifies client identity at the transport layer, method authorizers enforce application-level permissions, and API keys track and throttle per-client usage.
```mermaid
flowchart LR
    A[Incoming<br/>Request] --> B{Resource<br/>Policy}
    B -->|Deny| X[403 Denied]
    B -->|Allow| C{mTLS<br/>Validation}
    C -->|Fail| X
    C -->|Pass| D{Method<br/>Authorizer}
    D -->|Fail| X2[401 / 403]
    D -->|Pass| E{API Key<br/>Validation}
    E -->|Fail| X3[403 Forbidden]
    E -->|Pass| F[Request<br/>Proceeds]
```
Throttling and Quotas
API Gateway's throttling model is layered, and the layers interact in ways that bite you in production. Understanding the hierarchy saves you from throttling-related outages.
Account-Level Limits
Every AWS account has a default API Gateway throttle limit that applies across all APIs in a region:
| Limit | Default Value | Adjustable |
|---|---|---|
| Steady-state request rate | 10,000 requests per second | Yes (via service quota increase request) |
| Burst limit | 5,000 requests | Yes (via service quota increase request) |
The burst limit uses the token bucket algorithm. API Gateway maintains a bucket of tokens that refills at the steady-state rate. Each request consumes one token. When the bucket is full (5,000 tokens by default), it can absorb a burst of 5,000 concurrent requests. Sustained traffic above 10,000 RPS depletes the bucket and triggers 429 responses.
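The algorithm is easy to model. Here is a toy token bucket that mirrors the behavior described above, with no claim to matching API Gateway's internal implementation:

```python
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate           # steady-state refill, tokens per second
        self.capacity = burst      # burst limit
        self.tokens = float(burst) # bucket starts full
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True            # request proceeds
        return False               # 429 Too Many Requests
```

With the defaults (rate=10_000, burst=5_000), a full bucket absorbs a 5,000-request spike instantly, after which admission converges to the steady-state 10,000 requests per second.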
These limits are shared across all REST APIs, HTTP APIs, and WebSocket APIs in the account and region. Read that again. If you have a high-traffic public API and a low-traffic internal API in the same account, a spike on the public API can throttle the internal API. I have seen this happen. Use separate accounts for APIs with different criticality levels.
Stage-Level and Method-Level Throttling
API Gateway supports granular throttling at multiple levels, though availability varies by API type:
| Level | Scope | Configuration | Available On |
|---|---|---|---|
| Account | All APIs in the account/region | Service quotas (default 10,000 RPS) | All API types |
| Usage plan | APIs and stages associated with the plan | Rate and burst per plan | REST API only |
| Stage | All methods/routes in a stage | Stage settings (overrides account limit downward) | REST API and HTTP API |
| Method/route | Individual method or route | Method- or route-level throttle settings | REST API and HTTP API |
| Per-key | Individual API key within a usage plan | Key-level rate and burst | REST API only |
The effective throttle limit for any given request is the minimum of all applicable limits. If the account limit is 10,000 RPS, the stage limit is 5,000 RPS, and the method limit is 1,000 RPS, the effective limit for that method is 1,000 RPS.
HTTP API supports stage-level and route-level throttling, but not usage plans or per-key throttling, because it has no API key support. The lack of per-client limits is one of HTTP API's significant operational limitations for multi-tenant or rate-sensitive use cases.
Throttling Cascades
The most dangerous throttling scenario is a cascade. One API or client consumes the account-level throttle budget, and suddenly unrelated APIs in the same account start returning 429s. I have watched this pattern take down internal tooling APIs during traffic spikes on production-facing APIs, twice at different companies.
Mitigations:
- Account isolation. Run production APIs in a dedicated account, separate from development, staging, and internal APIs. This is the single most effective mitigation.
- Method-level throttling. Set explicit throttle limits on high-traffic methods so they cannot consume the entire account budget.
- Usage plans with per-key limits. Assign per-client throttle limits so a single client cannot monopolize capacity.
- Request service quota increases proactively. If your production traffic is approaching 10,000 RPS, request an increase before you hit the limit. AWS typically grants increases to 50,000-100,000 RPS for well-architected APIs.
- Client-side exponential backoff with jitter. Ensure all clients implement backoff to prevent retry storms from amplifying throttling events.
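As a concrete example of the usage-plan mitigation, here is a boto3 sketch that creates a plan with per-key limits and attaches a fresh API key. The plan and key names and the 100 RPS / 200 burst numbers are illustrative assumptions.

```python
def usage_plan_request(plan_name, api_id, stage, rate, burst):
    """Arguments for create_usage_plan (kept pure so it is testable)."""
    return {
        "name": plan_name,
        "throttle": {"rateLimit": float(rate), "burstLimit": int(burst)},
        "apiStages": [{"apiId": api_id, "stage": stage}],
    }

def attach_throttled_key(api_id, stage, client_name, rate=100, burst=200):
    """Create a usage plan with per-key limits and attach a fresh API key.
    Names and numbers are illustrative, not prescriptive."""
    import boto3  # deferred import so usage_plan_request needs no AWS deps
    apigw = boto3.client("apigateway")
    plan = apigw.create_usage_plan(
        **usage_plan_request(f"{client_name}-plan", api_id, stage, rate, burst))
    key = apigw.create_api_key(name=f"{client_name}-key", enabled=True)
    apigw.create_usage_plan_key(usagePlanId=plan["id"], keyId=key["id"],
                                keyType="API_KEY")
    return plan["id"], key["id"]
```

Clients then present the key in the x-api-key header, and API Gateway enforces the per-key rate before the request ever reaches your backend.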
Caching
API Gateway caching (REST API only) stores integration responses and serves them for subsequent identical requests, reducing backend load and response latency.
Cache Configuration
| Parameter | Options | Default | Notes |
|---|---|---|---|
| Cache capacity | 0.5 GB, 1.6 GB, 6.1 GB, 13.5 GB, 28.4 GB, 58.2 GB, 118 GB, 237 GB | None (disabled) | Larger caches cost more per hour; provisioned per stage |
| Default TTL | 0 - 3600 seconds | 300 seconds (5 minutes) | 0 effectively disables caching |
| Per-method TTL override | Yes | Inherits stage TTL | Allows different TTLs for different endpoints |
| Per-key cache invalidation | Enabled/disabled | Disabled | Allows clients to bypass cache with Cache-Control: max-age=0 |
| Encryption | Enabled/disabled | Disabled | Encrypts cached data at rest |
When caching is enabled, API Gateway checks the cache before invoking the backend integration. A cache hit returns the cached response immediately, bypassing Lambda invocation, integration calls, and backend latency. The cache is provisioned per stage, not per method, so all methods in a stage share the same cache capacity.
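Enabling the cache is a stage update. A boto3 sketch, assuming a REST API ID and stage name you supply; the `/*/*` wildcard paths and the 0.5 GB / 300-second values mirror the defaults discussed above.

```python
def cache_patch_ops(size_gb="0.5", ttl_seconds=300):
    """Stage patch operations enabling the cache (sizes from the table above)."""
    return [
        {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
        {"op": "replace", "path": "/cacheClusterSize", "value": str(size_gb)},
        # The /*/* wildcard applies caching and a TTL to every method in the stage.
        {"op": "replace", "path": "/*/*/caching/enabled", "value": "true"},
        {"op": "replace", "path": "/*/*/caching/ttlInSeconds", "value": str(ttl_seconds)},
    ]

def enable_stage_cache(rest_api_id, stage_name, **kwargs):
    import boto3  # deferred import so cache_patch_ops stays testable without AWS
    boto3.client("apigateway").update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=cache_patch_ops(**kwargs),
    )
```

Per-method TTL overrides use the same path shape with a specific resource path and method in place of the wildcards.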
Cache Pricing
Cache pricing is hourly, independent of request volume:
| Cache Size | Hourly Cost (us-east-1) | Monthly Cost |
|---|---|---|
| 0.5 GB | $0.020 | ~$14.40 |
| 1.6 GB | $0.038 | ~$27.36 |
| 6.1 GB | $0.200 | ~$144.00 |
| 13.5 GB | $0.250 | ~$180.00 |
| 28.4 GB | $0.500 | ~$360.00 |
| 58.2 GB | $1.000 | ~$720.00 |
| 118 GB | $1.900 | ~$1,368.00 |
| 237 GB | $3.800 | ~$2,736.00 |
The 0.5 GB cache at ~$14/month is a reasonable starting point for most APIs. Profile your cache hit ratio and response sizes before scaling up. A cache that is too small has frequent evictions and low hit ratios, which means you are paying for caching without the benefit.
Cache Key Design
The cache key determines which requests share a cached response. By default, the cache key includes the HTTP method, resource path, and query string parameters. You can add additional cache key components:
- Specific query string parameters. Include only the parameters that affect the response (exclude pagination tokens, request IDs, timestamps).
- HTTP headers. Include headers like Accept-Language when the response varies by header value.
- Stage variables. Include stage-specific values when the response varies by stage.
The cardinal rule of cache key design: include the minimum necessary to produce a correct response. Every additional parameter kills your hit ratio. Include the Authorization header in the cache key and you have a per-user cache. Your hit ratio drops to near zero and you are paying for a cache that does almost nothing.
Cache Invalidation
There are two approaches to invalidating the API Gateway cache:
- Full flush. From the console or API, flush the entire stage cache. This removes all cached responses and forces all subsequent requests to hit the backend until the cache is repopulated. Use this during deployments or when cached data is known to be stale.
- Client-initiated per-key invalidation. When per-key cache invalidation is enabled, clients can send Cache-Control: max-age=0 in their request header to bypass the cache for that specific request and refresh the cached entry. This requires the execute-api:InvalidateCache IAM permission.
Important: client-initiated invalidation is a security consideration. If any caller can invalidate your cache, a malicious client could send Cache-Control: max-age=0 on every request, effectively disabling caching and increasing your backend load. Restrict the InvalidateCache permission to trusted callers only, or disable per-key invalidation entirely and manage cache freshness through TTL.
API Gateway Cache vs. CloudFront Cache
| Aspect | API Gateway Cache | CloudFront Cache |
|---|---|---|
| Availability | REST API only | All API types (when fronted by CloudFront) |
| Location | Regional (single cache in API region) | Global (600+ edge locations) |
| Cost model | Hourly per provisioned capacity | Per-request + data transfer |
| Cache key flexibility | Method-level, header/query string keys | Full cache policies with extensive key control |
| Invalidation | Client header or full flush | Path-based invalidation, wildcard support |
| Best for | Backend offloading for REST APIs | Global latency reduction + backend offloading for any API type |
My recommendation: if you are fronting your API with CloudFront (which you should be for any internet-facing API), prefer CloudFront caching. It is more flexible, globally distributed, and works with HTTP API. API Gateway caching makes sense when you are using REST API without CloudFront and need a simple, managed cache with no additional infrastructure.
Request/Response Transformation
Mapping Templates (VTL)
REST API uses Apache Velocity Template Language (VTL) for request and response transformation. Mapping templates execute within API Gateway itself, requiring no Lambda function, no compute cost, and no cold start.
VTL templates have access to the full request context:
| Variable | Description |
|---|---|
| $input.body | The raw request body |
| $input.json('$.field') | JSONPath extraction from the request body |
| $input.params('name') | Path, query string, or header parameter by name |
| $context.requestId | Unique request identifier |
| $context.identity.sourceIp | Client IP address |
| $context.authorizer.claims | Claims from the Cognito/JWT authorizer |
| $stageVariables.name | Stage variable value |
| $util.escapeJavaScript() | Utility for escaping strings |
| $util.urlEncode() | Utility for URL encoding |
| $util.base64Encode() | Utility for Base64 encoding |
Common use cases for VTL templates:
- Transforming a REST request into an AWS service API call format (e.g., converting a JSON body into an SQS SendMessage request or a DynamoDB PutItem request).
- Renaming or restructuring fields between the client's expected format and the backend's format.
- Extracting values from headers, query strings, or path parameters and placing them in the request body.
- Filtering sensitive fields from the response before returning it to the client.
- Generating error response bodies from integration error outputs.
Here is the reality of VTL: powerful in theory, miserable in practice. No local development environment. Errors are opaque; you get CloudWatch logs of the template output if you are lucky. Complex templates become unreadable within weeks. I limit VTL to simple, well-defined transformations (AWS service integrations, straightforward field mapping) and reach for Lambda the moment I need conditional logic, real error handling, or anything I would call business logic.
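When a transformation outgrows VTL, the Lambda equivalent is short and testable locally. A sketch of the kind of field mapping and response filtering described above; every field name and the call_backend stub are illustrative.

```python
import json

def to_backend(body, source_ip):
    """Rename the client's camelCase fields to the backend's snake_case schema."""
    return {"customer_id": body.get("customerId"), "source_ip": source_ip}

def to_client(backend_response, drop=("internal_score",)):
    """Strip fields the client should never see from the response."""
    return {k: v for k, v in backend_response.items() if k not in drop}

def call_backend(request):
    """Stand-in for the real downstream call (hypothetical)."""
    return {"customer_id": request["customer_id"], "status": "ok",
            "internal_score": 0.93}

def handler(event, context=None):
    """Lambda proxy-integration handler doing the mapping end to end."""
    body = json.loads(event.get("body") or "{}")
    request = to_backend(body, event["requestContext"]["identity"]["sourceIp"])
    response = to_client(call_backend(request))
    return {"statusCode": 200, "body": json.dumps(response)}
```

Unlike a VTL template, each transform function here can be unit tested in isolation, which is most of the argument for moving this logic into Lambda.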
Request Validation
REST API can validate incoming requests against JSON Schema models before the request reaches your backend. Validation can check:
- Request body. Validates the JSON body against a defined JSON Schema model (required properties, data types, string patterns, numeric ranges, array constraints).
- Request parameters. Validates that required query string parameters and headers are present.
When validation fails, API Gateway returns a 400 Bad Request response without invoking the integration. This saves Lambda execution costs and provides consistent error responses for malformed requests.
My recommendation: always enable request validation for public-facing REST APIs. It catches malformed requests at the API Gateway layer (free, sub-millisecond) rather than in your Lambda function (billable, slower). For internal APIs where you control both sides, request validation is less critical but still good hygiene.
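Wiring up validation takes a JSON Schema model (REST API models use draft-04) plus a request validator. A boto3 sketch with an illustrative Order schema; the model and validator names are assumptions.

```python
import json

# JSON Schema model for the request body; REST API models use draft-04.
# Field names here are illustrative.
ORDER_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["sku", "quantity"],
    "properties": {
        "sku": {"type": "string", "pattern": "^[A-Z0-9-]+$"},
        "quantity": {"type": "integer", "minimum": 1, "maximum": 100},
    },
}

def attach_validation(rest_api_id):
    """Register the model and a validator checking both body and parameters."""
    import boto3  # deferred import so the schema above is testable without AWS
    apigw = boto3.client("apigateway")
    apigw.create_model(restApiId=rest_api_id, name="Order",
                       contentType="application/json",
                       schema=json.dumps(ORDER_SCHEMA))
    apigw.create_request_validator(restApiId=rest_api_id, name="body-and-params",
                                   validateRequestBody=True,
                                   validateRequestParameters=True)
```

A request missing `sku` or with `quantity` out of range gets a 400 from API Gateway itself, before any integration is invoked.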
HTTP API Parameter Mapping
HTTP API does not support VTL templates, but it does support parameter mapping for simple transformations:
- Appending, overwriting, or removing request and response headers
- Appending, overwriting, or removing query string parameters
- Overwriting the request path
This covers basic transformation needs (adding a custom header, injecting a stage variable into a query parameter) but does not support body transformation. If you need body transformation with HTTP API, handle it in the Lambda function.
WebSocket APIs
WebSocket APIs enable real-time, bidirectional communication between clients and backend services. The architecture is fundamentally different from REST and HTTP APIs.
Connection Management
WebSocket API manages persistent connections using a connection lifecycle with three special routes:
| Route | Trigger | Purpose | Integration Required |
|---|---|---|---|
| $connect | Client initiates WebSocket handshake | Authentication, connection initialization, store connection ID in DynamoDB | Yes (Lambda, HTTP, or AWS service) |
| $disconnect | Client or server closes connection, or connection times out | Clean up connection state, remove connection ID from storage | Yes (best-effort delivery) |
| $default | Message does not match any custom route | Catch-all for unrecognized message types | Yes |
| Custom routes | Message matches a route selection expression | Route-specific message handling | Yes |
The connection ID is a unique identifier assigned by API Gateway when a client connects. Your backend must store this connection ID (typically in DynamoDB) to send messages back to specific clients. API Gateway provides a callback URL that your backend POSTs to in order to send messages to connected clients:
POST https://{api-id}.execute-api.{region}.amazonaws.com/{stage}/@connections/{connectionId}
You can also use GET to retrieve connection status and DELETE to force-disconnect a client.
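With boto3, the callback URL becomes the endpoint_url of an apigatewaymanagementapi client, which appends the /@connections path for you. A sketch; the API ID, region, and stage are placeholders you supply.

```python
def callback_endpoint(api_id, region, stage):
    """Base endpoint for the management API; boto3 appends /@connections/{id}."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}"

def connections_client(api_id, region, stage):
    import boto3  # deferred import so callback_endpoint is testable without AWS
    return boto3.client("apigatewaymanagementapi",
                        endpoint_url=callback_endpoint(api_id, region, stage))

def push(client, connection_id, payload):
    """POST a message to one client. get_connection reports status (GET);
    delete_connection force-disconnects (DELETE)."""
    client.post_to_connection(ConnectionId=connection_id, Data=payload)
```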
Route Selection
Route selection determines which integration handles an incoming WebSocket message. You configure a route selection expression (typically $request.body.action) and define routes that match specific values.
For example, with the route selection expression $request.body.action:
| Client Message | Matched Route | Integration |
|---|---|---|
| {"action": "sendMessage", "data": "hello"} | sendMessage | SendMessageFunction |
| {"action": "subscribe", "channel": "updates"} | subscribe | SubscribeFunction |
| {"action": "unknown"} | $default | DefaultFunction |
| {"noAction": true} | $default | DefaultFunction |
Scaling Characteristics
| Limit | Default Value | Notes |
|---|---|---|
| Concurrent connections | 500 (soft limit) | Increasable to hundreds of thousands via service quota request |
| Idle connection timeout | 10 minutes | Configurable up to 2 hours; implement ping/pong to keep alive |
| Maximum connection duration | 2 hours | Hard limit; client must implement automatic reconnection |
| Message payload size | 128 KB | Frames are limited to 32 KB; larger messages are split into frames |
| Callback message rate | 500 messages/second per connection | Backend-to-client message rate |
| New connections per second | 500 per second (soft limit) | Increasable via service quota request |
The 2-hour maximum connection duration is a hard limit that cannot be increased. Your client must implement automatic reconnection logic. The idle connection timeout (10 minutes by default) disconnects clients that are not sending or receiving messages, so implement periodic ping/pong frames to keep connections alive if your use case involves long periods of silence.
For high-scale WebSocket applications (tens of thousands of concurrent connections), DynamoDB is the standard choice for connection state management. Each connection ID is stored as a DynamoDB item with a TTL set to the expected connection expiration. Your backend queries the table to find connection IDs for broadcasting messages. With DynamoDB on-demand capacity, this scales seamlessly without capacity planning.
WebSocket API Architectural Considerations
WebSocket API is a connection management and message routing layer. All application logic (message broadcasting, presence tracking, connection state management, pub/sub) lives in your backend. API Gateway functions as a WebSocket-aware reverse proxy; it does not maintain a pub/sub system or message broker.
Broadcasting a message to all connected clients requires your backend to iterate through stored connection IDs and POST to each one via the callback URL. For large numbers of connections, this can take significant time and should be parallelized using Lambda concurrency or SQS fan-out.
Connection cleanup requires care. The $disconnect route is best-effort; if a client's network drops abruptly, $disconnect may be delayed or in rare cases may not fire. Design your system to handle stale connection IDs gracefully: catch GoneException (410) errors when posting to the callback URL, and remove the stale connection ID from your store.
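Putting the last two points together, a broadcast sketch that iterates a DynamoDB table of connection IDs and prunes stale ones on GoneException. The table layout (a connectionId key) and single-page scan are simplifying assumptions.

```python
def broadcast(mgmt, table, message):
    """Send `message` to every stored connection and prune stale IDs.
    `mgmt` is an apigatewaymanagementapi client bound to the callback URL;
    `table` is a boto3 DynamoDB Table keyed by connectionId. Single-page
    scan for brevity; paginate in production."""
    stale = []
    for item in table.scan(ProjectionExpression="connectionId")["Items"]:
        cid = item["connectionId"]
        try:
            mgmt.post_to_connection(ConnectionId=cid, Data=message)
        except mgmt.exceptions.GoneException:
            stale.append(cid)  # 410: client vanished and $disconnect never fired
    for cid in stale:
        table.delete_item(Key={"connectionId": cid})
    return stale
```

For tens of thousands of connections, fan this out across Lambda invocations or SQS rather than looping in a single function.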
Observability
CloudWatch Metrics
API Gateway publishes detailed metrics to CloudWatch. The most operationally important metrics:
| Metric | REST API | HTTP API | WebSocket API | What It Tells You |
|---|---|---|---|---|
| Count | Yes | Yes | Yes | Total API requests/messages |
| 4XXError | Yes | Yes | Yes | Client errors (bad requests, auth failures, throttling) |
| 5XXError | Yes | Yes | Yes | Server errors (integration failures, timeouts) |
| Latency | Yes | Yes | Yes | End-to-end latency (client request to client response) |
| IntegrationLatency | Yes | Yes | Yes | Time spent in the backend integration only |
| CacheHitCount | Yes | No | No | Cache hits (REST API with caching enabled) |
| CacheMissCount | Yes | No | No | Cache misses |
| ConnectCount | No | No | Yes | New WebSocket connections |
| MessageCount | No | No | Yes | WebSocket messages processed |
| DisconnectCount | No | No | Yes | WebSocket disconnections |
The difference between Latency and IntegrationLatency is the most important diagnostic signal. If Latency is 200ms and IntegrationLatency is 180ms, API Gateway's overhead is 20ms, which is normal. If the gap is 100ms+, investigate your authorizer execution time, VTL mapping template complexity, or request validation bottlenecks.
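You can compute that gap directly with CloudWatch metric math. A boto3 sketch; the ApiName dimension value and the Average statistic are assumptions to adjust for your API.

```python
def overhead_queries(api_name, period=300):
    """MetricDataQueries computing Latency minus IntegrationLatency, i.e.
    API Gateway's own processing overhead."""
    def metric(qid, name):
        return {"Id": qid, "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "AWS/ApiGateway", "MetricName": name,
                       "Dimensions": [{"Name": "ApiName", "Value": api_name}]},
            "Period": period, "Stat": "Average"}}
    return [metric("lat", "Latency"),
            metric("integ", "IntegrationLatency"),
            {"Id": "overhead", "Expression": "lat - integ",
             "Label": "API Gateway overhead (ms)"}]

def fetch_overhead(api_name, start, end):
    import boto3  # deferred import so overhead_queries is testable without AWS
    return boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=overhead_queries(api_name),
        StartTime=start, EndTime=end)
```

The same expression works as a CloudWatch dashboard widget or alarm, which is a convenient way to catch an authorizer or mapping-template regression.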
Access Logging
API Gateway supports access logging to CloudWatch Logs with configurable log formats using context variables. I configure access logs in JSON format for easy parsing with CloudWatch Insights, OpenSearch, or any log analytics tool.
Essential context variables for access logs:
| Variable | Description |
|---|---|
| $context.requestId | Unique request identifier (essential for debugging) |
| $context.identity.sourceIp | Client IP address |
| $context.httpMethod | HTTP method |
| $context.resourcePath | Resource path |
| $context.status | Response status code |
| $context.responseLatency | Total response latency in milliseconds |
| $context.integrationLatency | Integration execution time in milliseconds |
| $context.error.message | Error message (if any) |
| $context.authorizer.error | Authorizer error (if any) |
| $context.requestTime | Request timestamp |
| $context.protocol | Request protocol |
My recommendation: enable access logging for every API stage. The cost is modest (CloudWatch Logs ingestion + storage), and the operational value during incidents is immense. Include at minimum: request ID, source IP, method, path, status, latency, integration latency, and error message.
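A minimal JSON access-log format covering that list, plus the stage update that applies it. The log group ARN is one you supply; the patch paths are the standard access-log settings paths for a REST API stage.

```python
import json

# JSON access-log format built from the context variables in the table above.
ACCESS_LOG_FORMAT = json.dumps({
    "requestId": "$context.requestId",
    "sourceIp": "$context.identity.sourceIp",
    "method": "$context.httpMethod",
    "path": "$context.resourcePath",
    "status": "$context.status",
    "latency": "$context.responseLatency",
    "integrationLatency": "$context.integrationLatency",
    "error": "$context.error.message",
})

def enable_access_logs(rest_api_id, stage_name, log_group_arn):
    """Attach the format to a stage; the log group ARN is one you supply."""
    import boto3  # deferred import so ACCESS_LOG_FORMAT is testable without AWS
    boto3.client("apigateway").update_stage(
        restApiId=rest_api_id, stageName=stage_name,
        patchOperations=[
            {"op": "replace", "path": "/accessLogSettings/destinationArn",
             "value": log_group_arn},
            {"op": "replace", "path": "/accessLogSettings/format",
             "value": ACCESS_LOG_FORMAT},
        ])
```

Because each log line is JSON, CloudWatch Insights queries like `filter status >= 500` work with no parsing rules.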
Execution Logging
Execution logging (REST API only) provides detailed request/response logs for every stage of the API Gateway pipeline: authorizer execution, VTL template evaluation, integration request/response bodies, mapping template output, and error details. This level of detail is essential for debugging but extremely verbose and expensive at scale.
| Setting | Log Volume | Cost | Use Case |
|---|---|---|---|
| OFF | None | None | Production (default) |
| Errors only | Low | Low | Production (recommended minimum) |
| Full | Very high | High | Development and active debugging only |
Enable full execution logging for development and staging environments. In production, set it to "errors only" or disable it entirely. Leaving full execution logging enabled in production generates enormous log volumes. At 10,000 RPS, you can easily generate terabytes of logs per month, with CloudWatch Logs costs exceeding the API Gateway cost itself.
X-Ray Tracing
REST API integrates with AWS X-Ray for distributed tracing. When enabled, API Gateway generates trace segments for each request, capturing timing data for authorization, integration execution, and response processing. Combined with X-Ray instrumentation in your Lambda functions and downstream services, this provides end-to-end visibility into request latency across your entire architecture.
X-Ray tracing adds approximately 1-2ms of overhead per request. For most APIs, this is acceptable. For ultra-low-latency APIs where every millisecond counts, measure the impact before enabling it in production.
HTTP API does not support X-Ray natively. If you need distributed tracing with HTTP API, instrument your Lambda functions with the X-Ray SDK directly. You will get traces for everything downstream of API Gateway, but the API Gateway processing stage itself will not appear in traces.
Cost Analysis
Per-Request Pricing
The per-request pricing model is straightforward but varies significantly between API types:
| Tier (Monthly Requests) | REST API | HTTP API |
|---|---|---|
| First 333 million | $3.50 per million | $1.00 per million |
| Next 667 million | $2.80 per million | $0.90 per million |
| Next 19 billion | $2.38 per million | $0.80 per million |
| Over 20 billion | $1.51 per million | $0.72 per million |
REST API pricing decreases at higher tiers, but even at the highest tier ($1.51/million), it is still more than double the cost of HTTP API at the same tier ($0.72/million).
WebSocket API Pricing
WebSocket API has two pricing components:
| Component | Cost (us-east-1) |
|---|---|
| Messages (first 1 billion/month) | $1.00 per million messages (sent or received) |
| Messages (over 1 billion/month) | $0.80 per million messages |
| Connection minutes | $0.25 per million connection minutes |
Messages are billed in 32 KB increments. A 65 KB message counts as 3 message units. Connection minutes accumulate for the duration each WebSocket connection is open.
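The 32 KB increment and the tiered message pricing from the table are easy to get wrong in cost models, so here is the arithmetic as code:

```python
import math

def message_units(size_kb):
    """WebSocket messages bill in 32 KB increments: a 65 KB message is 3 units."""
    return max(1, math.ceil(size_kb / 32))

def message_cost(units_per_month):
    """Tiered message pricing from the table above (us-east-1)."""
    first_tier = min(units_per_month, 1_000_000_000)  # first 1B at $1.00/M
    overflow = units_per_month - first_tier           # remainder at $0.80/M
    return first_tier / 1e6 * 1.00 + overflow / 1e6 * 0.80
```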
For 10,000 concurrent connections maintained 24/7 with an average of 10 messages per minute each, the monthly cost is approximately:
- Connection minutes: 10,000 x 60 x 24 x 30 = 432 million minutes = ~$108
- Messages: 10,000 x 10 x 60 x 24 x 30 = 4.32 billion messages = first 1 billion at $1.00/M (~$1,000) + 3.32 billion at $0.80/M (~$2,656) = ~$3,656
- Total: ~$3,764/month
Data Transfer Costs
Data transfer out from API Gateway to the internet follows standard AWS data transfer pricing:
| Tier | Cost per GB |
|---|---|
| First 10 TB/month | $0.09 |
| Next 40 TB/month | $0.085 |
| Next 100 TB/month | $0.07 |
| Over 150 TB/month | $0.05 |
Data transfer within the same region (to Lambda, to VPC Link targets) is free. Data transfer from CloudFront to API Gateway is also free when both are in the same region, which is a significant cost advantage of placing CloudFront in front of your API.
Cache Pricing Impact
For a typical production REST API serving 50 million requests per month with a 60% cache hit ratio:
| Component | Without Cache | With 0.5 GB Cache |
|---|---|---|
| API Gateway requests | 50M x $3.50/M = $175.00 | 50M x $3.50/M = $175.00 |
| Cache cost | $0 | ~$14.40/month |
| Lambda invocations (at $0.20/M) | 50M = $10.00 | 20M (40% miss) = $4.00 |
| Lambda duration (128 MB, 1 second avg) | 50M x $0.000002083 = $104.15 | 20M x $0.000002083 = $41.66 |
| Total | ~$289.15 | ~$235.06 |
| Monthly savings | N/A | ~$54.09 |
The cache pays for itself when the Lambda execution cost savings exceed the cache hourly cost. For compute-intensive backends (long execution times, large memory allocations), the savings are even more dramatic. For lightweight backends, the cache cost may exceed the savings; always do the math for your specific workload.
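The break-even check is one line of arithmetic. Plugging in the table's per-invocation figures reproduces its ~$54 saving; all inputs are assumptions you should measure for your own workload.

```python
def monthly_cache_saving(requests, hit_ratio, cost_per_invocation,
                         cache_hourly, hours=720):
    """Net monthly saving: Lambda spend avoided by cache hits, minus the
    cache's hourly cost. Positive means the cache pays for itself."""
    avoided = requests * hit_ratio * cost_per_invocation
    return avoided - cache_hourly * hours

# Per-invocation Lambda cost from the table above:
# $0.20 per million requests plus $0.000002083 duration charge.
PER_INVOCATION = 0.20 / 1e6 + 0.000002083
```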
Total Cost Comparison: REST API vs. HTTP API
For a production API serving 100 million requests per month with 5 KB average response size:
| Component | REST API | HTTP API | HTTP API + CloudFront |
|---|---|---|---|
| Request cost | $350.00 | $100.00 | $100.00 |
| API Gateway cache (0.5 GB) | $14.40 | N/A | N/A |
| CloudFront (if applicable) | N/A | N/A | ~$10-20 (depends on cache hit ratio) |
| Data transfer (500 GB) | $45.00 | $45.00 | CloudFront pricing applies |
| Approximate total | ~$409/month | ~$145/month | ~$130-150/month |
| Annual cost | ~$4,908 | ~$1,740 | ~$1,560-1,800 |
Common Failure Modes and Operational Lessons
The 29-Second Timeout Limit
API Gateway imposes a 29-second default timeout on every integration type: REST API, HTTP API, and WebSocket API (for the integration invocation; the WebSocket connection itself persists longer). For years this limit could not be changed at all; since mid-2024, a service quota increase can raise the integration timeout for Regional REST and HTTP APIs, at the cost of a reduced account-level throttle quota, while edge-optimized APIs keep the hard limit. Treat 29 seconds as the design constraint regardless. If your backend fails to respond within the timeout, the client gets a 504 Gateway Timeout.
This is the single most impactful constraint in API Gateway. You cannot use it for synchronous long-running operations: report generation, large data exports, complex ML inference, batch processing. Anything that routinely takes more than 29 seconds is out.
Your Lambda function's 15-minute timeout is irrelevant here. If it has not returned a response to API Gateway in 29 seconds, API Gateway times out.
Workarounds:
- Asynchronous pattern. Return a 202 Accepted immediately with a task ID. The client polls a status endpoint or receives a callback (webhook) when the long-running operation completes. Use Step Functions, SQS, or EventBridge for orchestration.
- Step Functions sync integration. REST API can invoke Express Workflows synchronously via StartSyncExecution. Express Workflows can run for up to 5 minutes, but the API Gateway 29-second timeout still applies; the workflow must complete within that window.
- WebSocket API. Use WebSocket for operations where the server needs to push results asynchronously. The integration timeout is still 29 seconds per message, but the client connection persists for up to 2 hours, allowing the backend to send results whenever they are ready.
- Response streaming. Lambda response streaming lets a function begin sending response chunks before the full response is ready, but it is delivered through Lambda function URLs rather than API Gateway, which buffers Lambda responses. If streaming is a hard requirement, expose that endpoint via a function URL (optionally behind CloudFront) instead of API Gateway.
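The asynchronous 202 pattern in sketch form: the API-facing handler assigns a task ID, enqueues the work, and returns immediately. The enqueue callable is injected here for testability; in production it would wrap an SQS send_message call, and the /tasks/{id} status path is an illustrative convention.

```python
import json
import uuid

def submit_handler(event, context=None, enqueue=None):
    """API-facing Lambda for the 202 pattern: assign a task ID, enqueue the
    work, return immediately. `enqueue` is injected for testability; in
    production it would wrap boto3.client("sqs").send_message(...)."""
    task_id = str(uuid.uuid4())
    if enqueue is not None:
        enqueue(json.dumps({"taskId": task_id, "body": event.get("body")}))
    return {
        "statusCode": 202,  # Accepted: work continues in the background
        "headers": {"Location": f"/tasks/{task_id}"},  # client polls here
        "body": json.dumps({"taskId": task_id, "status": "PENDING"}),
    }
```

A worker Lambda consumes the queue and writes the result under the task ID; a second, trivial handler serves the status endpoint.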
Payload Size Limits
| API Type | Request Payload | Response Payload |
|---|---|---|
| REST API | 10 MB | 10 MB |
| HTTP API | 10 MB | 10 MB |
| WebSocket API | 128 KB (per message) | 128 KB (per message) |
For REST and HTTP APIs, the 10 MB limit is sufficient for most API payloads but insufficient for file uploads. For large file operations, use presigned S3 URLs. API Gateway returns a signed URL, and the client uploads directly to S3, bypassing API Gateway entirely. This pattern avoids the payload limit, reduces API Gateway costs, and takes advantage of S3's multipart upload capabilities for files up to 5 TB.
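A sketch of the presigned-URL handler, with the signing call injected for testability; in production `presign` would be boto3's S3 generate_presigned_url, and the bucket name and key scheme are assumptions.

```python
import json

def _default_presign(*args, **kwargs):
    import boto3  # deferred import so tests can inject a stub instead
    return boto3.client("s3").generate_presigned_url(*args, **kwargs)

def presign_handler(event, context=None, presign=None):
    """Hand the client a presigned S3 PUT URL so the upload bypasses API
    Gateway's 10 MB limit. Bucket name and key scheme are assumptions."""
    presign = presign or _default_presign
    key = f"uploads/{event['requestContext']['requestId']}"
    url = presign("put_object",
                  Params={"Bucket": "my-upload-bucket", "Key": key},
                  ExpiresIn=300)  # URL valid for five minutes
    return {"statusCode": 200, "body": json.dumps({"uploadUrl": url, "key": key})}
```

The client PUTs the file straight to the returned URL; API Gateway never sees the payload, so neither the size limit nor per-request pricing applies to the upload itself.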
WebSocket's 128 KB message size limit (with 32 KB frames underneath) requires application-level chunking for larger payloads. Design your WebSocket message protocol to split anything over 128 KB across multiple messages.
Throttling Cascades
When API Gateway returns 429 Too Many Requests, aggressive client retries can create a vicious cycle. Retries add to request volume, which increases throttling, which generates more retries. This positive feedback loop sustains elevated error rates long after the original traffic spike has passed.
I have seen this play out the same way every time. Ten clients at 1,000 RPS each hit the 10,000 RPS account limit. All ten get 429s and retry immediately. Now you have 10,000+ retries on top of new organic requests. The actual request volume doubles to 20,000+ RPS, and the API stays throttled until clients back off or someone intervenes.
Mitigations:
- Implement exponential backoff with jitter on the client side. This is a requirement for production API clients.
- Set per-client throttle limits using usage plans (REST API) to prevent any single client from consuming the full budget.
- Monitor 429 error rates and alert on sustained occurrences.
- Request service quota increases proactively. Do not wait until you hit the limit in production.
- Use account isolation to prevent cross-API throttling entirely.
Cold Starts with Lambda Integration
Cold starts hit API Gateway latency when a Lambda function has not been invoked recently or when concurrent invocations exceed warm capacity. The duration varies by runtime, and the range is wide:
| Runtime | Typical Cold Start | With VPC |
|---|---|---|
| Python | 100-300 ms | 100-300 ms |
| Node.js | 100-300 ms | 100-300 ms |
| Java | 500-3,000 ms | 500-3,000 ms |
| Go | 50-150 ms | 50-150 ms |
| .NET | 200-1,000 ms | 200-1,000 ms |
VPC cold starts were dramatically improved in 2019 when AWS introduced Hyperplane ENI. VPC-connected Lambda functions no longer incur the additional 10+ second cold start penalty that was common previously.
Cold starts on Lambda authorizers are particularly impactful because they delay every request, including subsequent requests to different backend functions. If your authorizer has a 2-second cold start, every request after cache expiration experiences that delay.
Mitigations:
- Provisioned Concurrency. Pre-initializes execution environments. Eliminates cold starts entirely for provisioned instances. Costs approximately $0.015 per GB-hour. Use this for latency-sensitive APIs and for Lambda authorizers.
- HTTP API JWT authorizer. Eliminates the authorizer cold start entirely by performing JWT validation within the API Gateway data plane. No Lambda function involved.
- Optimize package size. Smaller deployment packages initialize faster. Remove unnecessary dependencies, use Lambda layers for shared code, and avoid including test files or documentation in the deployment package.
- Keep functions warm. For low-traffic APIs, scheduled pings (CloudWatch Events every 5 minutes) keep at least one environment warm. This is a partial mitigation; it only keeps one environment warm and does not help with concurrency-driven cold starts.
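Provisioned Concurrency is one API call per function alias, and its cost is simple to estimate from the ~$0.015/GB-hour figure above. The alias name and instance count below are illustrative.

```python
def provisioned_cost(instances, memory_gb, hours=720, rate=0.015):
    """Approximate monthly cost at the ~$0.015 per GB-hour figure quoted above."""
    return instances * memory_gb * hours * rate

def warm(function_name, alias, instances=5):
    """Pre-initialize execution environments for a latency-sensitive function,
    e.g. a Lambda authorizer. Alias name and count are illustrative."""
    import boto3  # deferred import so provisioned_cost stays testable without AWS
    boto3.client("lambda").put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,  # must target a published version or alias, not $LATEST
        ProvisionedConcurrentExecutions=instances,
    )
```

Five instances of a 512 MB authorizer run about $27/month at that rate, which is usually cheap insurance for a public API's tail latency.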
Deployment Gotchas
REST API deployments are immutable snapshots of your API configuration. When you change resources, methods, or integrations, nothing goes live until you create a new deployment and associate it with a stage. This is a safety mechanism, but it trips people up constantly. I have watched teams spend hours debugging an API that "is not working" when the real problem was that they modified the configuration and forgot to deploy it.
HTTP API supports auto-deploy, which deploys changes immediately when you update the API configuration. This is simpler for development but removes the safety net of explicit deployments. For production HTTP APIs, I recommend disabling auto-deploy and managing deployments explicitly through CI/CD pipelines.
Stage variable misconfiguration is another common failure mode. Stage variables are resolved at request time, not deployment time. If a stage variable references a Lambda function alias that does not exist, every request to that stage fails with a 500 error. Validate stage variable references in your CI/CD pipeline before deployment.
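That stage-variable check is easy to automate in CI. A sketch: the `*FunctionAlias` naming convention is an assumption, and `alias_exists` is injected so it can wrap boto3's lambda get_alias in production and a stub in tests.

```python
def missing_aliases(stage_variables, alias_exists):
    """Return stage variables that look like Lambda alias references but do not
    resolve. The `*FunctionAlias` naming convention is an assumption for this
    sketch; `alias_exists(value)` is injected so CI can wrap
    boto3.client("lambda").get_alias(...) while tests use a stub."""
    return sorted(
        name
        for name, value in stage_variables.items()
        if name.endswith("FunctionAlias") and not alias_exists(value)
    )
```

In a pipeline, fetch the stage's variables with apigateway get_stage, run this check, and fail the deployment if the returned list is non-empty.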
Integration with Other AWS Services
Lambda
Lambda is the primary integration target for API Gateway. The combination is the backbone of serverless on AWS. Three integration patterns matter:
- Synchronous invocation (default). API Gateway invokes Lambda synchronously and waits for the response. Subject to the 29-second timeout.
- Asynchronous invocation. REST API can invoke Lambda asynchronously using the Event invocation type in a non-proxy custom integration. API Gateway returns 202 Accepted immediately without waiting for the Lambda result. Useful for fire-and-forget operations.
- Response streaming. Lambda response streaming sends chunks to the client as they are produced, but it works through Lambda function URLs, not API Gateway, which buffers Lambda responses. Route streaming workloads to a function URL, optionally behind CloudFront.
Application Load Balancer
ALB is a real alternative to API Gateway for HTTP-based APIs. The choice depends on your requirements, and I see teams pick wrong in both directions:
| Capability | API Gateway (REST) | API Gateway (HTTP) | ALB |
|---|---|---|---|
| Request pricing model | $3.50/M requests | $1.00/M requests | ~$0.008/LCU-hour (usage-based) |
| Fixed hourly cost | None | None | ~$0.0225/hour (~$16.20/month) |
| Throttling | Built-in (multi-level) | Account-level only | None native (use WAF rate rules) |
| Caching | Built-in (REST only) | None | None (use CloudFront) |
| Request transformation | VTL templates | Parameter mapping | None |
| WebSocket | Dedicated API type | No | Native passthrough |
| Authentication | IAM, Cognito, Lambda authorizer | JWT, Lambda authorizer | Cognito, OIDC |
| Lambda targets | Yes | Yes | Yes |
| Container targets | Via VPC Link | Via VPC Link | Direct registration |
| Cost at 100M req/month | ~$350 (REST) / ~$100 (HTTP) | ~$100 | ~$30-50 (depends on LCU) |
For high-volume APIs with container backends, ALB is often cheaper than API Gateway. For serverless Lambda backends with API management needs (throttling, caching, API keys, request validation), API Gateway is the better fit. For the common pattern of Lambda + simple auth + low cost, HTTP API is the sweet spot.
Step Functions
REST API can invoke Step Functions directly via AWS service integration, without a Lambda function:
| Integration | Workflow Type | Behavior | Timeout Consideration |
|---|---|---|---|
| StartExecution | Standard Workflow | Returns execution ARN immediately (async) | No timeout concern: execution runs independently |
| StartSyncExecution | Express Workflow | Waits for workflow to complete and returns result | Must complete within API Gateway's 29-second timeout |
The synchronous Express Workflow pattern is particularly powerful for APIs that need to orchestrate multiple steps (validation, data enrichment, conditional branching, parallel processing) without writing a monolithic Lambda function. Express Workflows can execute for up to 5 minutes, but the API Gateway 29-second timeout is the binding constraint.
CloudFront
Placing CloudFront in front of API Gateway provides capabilities that API Gateway lacks natively:
| Capability | What CloudFront Adds |
|---|---|
| Edge caching | Cache API responses at 600+ edge locations; especially valuable for HTTP API which has no built-in cache |
| DDoS protection | AWS Shield Standard (automatic) and Shield Advanced |
| WAF at the edge | Evaluate WAF rules before requests reach API Gateway; essential for HTTP API which has no native WAF |
| Geographic restrictions | Block or allow traffic by country |
| Custom error pages | Return branded error responses for 4XX/5XX errors |
| Edge compute | CloudFront Functions and Lambda@Edge for request/response manipulation |
| TLS optimization | TLS termination at the edge reduces latency for global clients |
This pattern is especially valuable for HTTP APIs. By placing CloudFront in front of an HTTP API, you gain caching, WAF, and global edge optimization while retaining HTTP API's lower cost and latency. The data transfer from CloudFront to a same-region API Gateway endpoint is free, so the CloudFront cost is primarily per-request and cache storage.
WAF
AWS WAF integrates directly with REST API (not HTTP API or WebSocket API). WAF rules are evaluated before any API Gateway processing, providing:
- Protection against common web exploits (SQL injection, XSS) via AWS Managed Rules.
- Rate-based rules for DDoS mitigation and abuse prevention.
- IP-based allow/block lists.
- Bot control (managed rules).
- Custom rules based on headers, query strings, body content, and geographic origin.
For HTTP APIs that need WAF protection, place CloudFront in front of the API and attach WAF to the CloudFront distribution. This provides the same protection with the added benefit of WAF evaluation at the edge (closest to the attacker) rather than at the regional API Gateway endpoint.
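As an illustration of the rate-based rules mentioned above, a WAF v2 rule that blocks any source IP exceeding a request threshold looks roughly like this (the name, priority, and limit are example values; the limit is counted over a trailing 5-minute window):

```json
{
  "Name": "rate-limit-per-ip",
  "Priority": 0,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "rate-limit-per-ip"
  }
}
```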
Key Architectural Patterns
After years of building and operating API Gateway deployments in production, these are the patterns I come back to every time:
- Default to HTTP API unless you need REST API features. 71% cost savings. Lower latency. HTTP API is the correct choice for most Lambda-backed APIs. Choose REST API only when you genuinely need caching, VTL transformation, WAF integration, usage plans, request validation, or direct AWS service integrations, and even then verify first that you cannot achieve the same result another way.
- Use direct AWS service integrations for simple operations. Putting an SQS message on a queue, starting a Step Functions execution, or writing a DynamoDB item does not need a Lambda function. REST API's AWS service integrations eliminate a component, reduce latency, eliminate cold starts, and lower cost.
- Design for the 29-second timeout from day one. Any operation that might exceed 29 seconds needs an asynchronous pattern. Retrofitting async onto a synchronous API is painful. Designing for it upfront is straightforward.
- Isolate production APIs in dedicated AWS accounts. The shared account-level throttle limit (10,000 RPS default) is the most common source of cross-API interference. Account isolation is the cleanest and most reliable solution.
- Place CloudFront in front of HTTP APIs for caching and WAF. This compensates for HTTP API's lack of built-in caching and WAF while preserving the cost and latency benefits. CloudFront-to-API Gateway data transfer is free in the same region.
- Enable request validation on REST APIs. Rejecting malformed requests at the API Gateway layer is faster and cheaper than rejecting them in Lambda. It also produces consistent error responses without custom code.
- Use Provisioned Concurrency for latency-sensitive Lambda authorizers. Cold starts on authorizers are particularly impactful because they delay every request. Provisioned Concurrency or HTTP API's native JWT authorizer eliminates this variability.
- Set per-method throttle limits on REST APIs. Protect your backend from traffic spikes on high-volume endpoints while leaving low-traffic endpoints unconstrained. Without method-level limits, a spike on any endpoint can consume the entire stage or account budget.
- Monitor the Latency minus IntegrationLatency gap. This gap tells you how much time API Gateway itself is adding. If it grows, investigate authorizer performance, VTL template complexity, or request validation bottlenecks.
- Never use API keys for authentication. They are usage tracking and metering tokens, not security tokens. Always pair them with a proper authentication mechanism.
- Plan for the 10 MB payload limit. Use presigned S3 URLs for file uploads and downloads rather than proxying large payloads through API Gateway. This avoids the limit, reduces API Gateway costs, and leverages S3's multipart upload for large files.
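The Latency-minus-IntegrationLatency pattern above amounts to a simple calculation over exported CloudWatch datapoints. A sketch, with purely illustrative values:

```python
# Sketch: estimate API Gateway's own overhead from CloudWatch datapoints.
# The series mirror API Gateway's Latency and IntegrationLatency metrics;
# all values here are illustrative, not real measurements.

def gateway_overhead_ms(latency_ms, integration_latency_ms):
    """Per-datapoint overhead: time spent inside API Gateway itself
    (authorizers, validation, VTL mapping), i.e. Latency - IntegrationLatency."""
    if len(latency_ms) != len(integration_latency_ms):
        raise ValueError("datapoint series must align")
    return [round(l - i, 1) for l, i in zip(latency_ms, integration_latency_ms)]

# Illustrative p99 datapoints over five periods (milliseconds).
latency = [112.0, 118.5, 140.2, 305.7, 119.9]
integration = [98.3, 104.1, 125.0, 130.2, 106.4]

overhead = gateway_overhead_ms(latency, integration)
print(overhead)
```

In this made-up series, the fourth period's latency spike comes almost entirely from the gateway layer (overhead jumps from ~14 ms to ~175 ms while the backend stays flat), which would point at an authorizer cold start or a validation bottleneck rather than the integration.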
Additional Resources
- Amazon API Gateway Developer Guide: comprehensive reference covering all three API types, integrations, authorization, and deployment
- API Gateway REST API vs. HTTP API feature comparison: AWS documentation with a complete matrix of every feature difference between the two types
- API Gateway quotas and important notes: account-level and per-API limits with instructions for requesting increases
- API Gateway pricing page: current per-request pricing, cache pricing, and data transfer costs for all three API types
- API Gateway mapping template reference: VTL syntax, context variables, and utility functions for request/response transformation
- Working with Lambda authorizers: configuration patterns for token-based and request-based Lambda authorizers, including caching behavior
- Working with WebSocket APIs: connection management, route selection, and the @connections callback API
- VPC Link configuration for REST and HTTP APIs: setup instructions for private integrations including NLB, ALB, and Cloud Map targets
- AWS Well-Architected Serverless Applications Lens: architectural best practices for serverless workloads including API Gateway patterns
- Serverless Land patterns library: community-contributed integration patterns with infrastructure as code templates for common API Gateway architectures
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

