About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
Having spent years packaging Lambda functions as zip archives, I hit the wall that every team eventually hits: the 250 MB deployment package limit. The first time it happened was an ML inference function with a PyTorch model and its dependency tree. We burned weeks trying to strip binaries, use Lambda Layers creatively, and shave megabytes from scipy. When AWS launched container image support for Lambda in December 2020, it raised the size ceiling to 10 GB and fundamentally changed how I think about Lambda packaging, base image standardization, CI/CD pipelines, and the boundary between serverless and container workloads. Container images let you use the same Dockerfile, the same build toolchain, and the same base image across Lambda, ECS, and Fargate, which eliminates an entire category of "works in my container but not in Lambda" problems.
This article is an architecture reference for engineers and architects who need to understand how Lambda container images work under the hood, how to design images that are fast, secure, and cost-effective, and how to build CI/CD pipelines that deploy them reliably. It covers the architecture, trade-offs, performance characteristics, and operational patterns that inform how I design container-based Lambda workloads in production.
Why Container Images for Lambda
The zip deployment model served Lambda well for years, but it imposes constraints that become painful as functions grow in complexity. The 250 MB limit (50 MB for direct upload, 250 MB unzipped from S3) excludes many legitimate workloads: ML models, scientific computing libraries, functions with large native dependency trees, and custom runtimes with embedded interpreters. Lambda Layers help but introduce their own complexity. You are limited to five layers, each layer counts against the 250 MB total, and layer versioning creates a matrix of compatibility that is tedious to manage.
Container images raise the ceiling to 10 GB and bring the entire Docker ecosystem to Lambda. You get multi-stage builds for minimal images, layer caching for fast rebuilds, the same Dockerfile across Lambda and ECS, and the ability to run and debug locally with the exact same image that runs in production.
For small, simple functions, zip archives remain the right choice.
| Dimension | Zip Archive | Container Image |
|---|---|---|
| Maximum size | 250 MB (unzipped) | 10 GB |
| Artifact format | .zip file in S3 or direct upload | OCI image in ECR |
| Cold start | Generally faster for small packages | Comparable; optimized by Lambda's chunk caching |
| Supported runtimes | AWS-managed runtimes only | Any runtime: bring your own |
| Lambda Layers | Up to 5 layers | Not supported (use Docker layers instead) |
| Local testing | Requires SAM CLI or manual invocation | docker run with RIE (identical to production) |
| Build toolchain | zip, SAM, CDK asset bundling | Docker/Buildx, any CI/CD system |
| Registry | S3 (managed by Lambda) | ECR (you manage) |
| Image sharing | Not applicable | Same base image across Lambda, ECS, Fargate |
| Versioning | S3 object versioning or CodeUri hash | Image tags and SHA256 digests |
| Rollback | Redeploy previous zip | Point Lambda to previous image digest |
| Pricing | Lambda compute only | Lambda compute + ECR storage ($0.10/GB/month) |
The decision framework: use container images when your deployment package exceeds 250 MB, when you need a custom runtime, when you want to share base images across compute platforms, or when your team already has Docker-based CI/CD and wants a unified build pipeline. Use zip archives when your function is small, your dependencies are minimal, and you value the simplicity of sam deploy or inline code editing in the console.
Architecture Internals
Lambda's internal handling of container images drives the performance characteristics and operational behaviors you observe in production. Lambda performs a one-time optimization when you create or update a function, and then uses a caching system to assemble the runtime environment quickly rather than pulling the full container image from ECR on every cold start.
When you update a Lambda function's image URI, the Lambda service pulls the image from ECR, decompresses it, encrypts the layers, and breaks them into small chunks. These chunks are stored in a Lambda-managed cache distributed across the fleet of workers in the region. On cold start, the worker assembles only the chunks it needs (starting with the handler and its immediate dependencies) rather than downloading the entire image. A 2 GB image therefore does not necessarily produce a 2 GB cold start penalty; Lambda loads chunks on demand and caches them at the fleet level.
The Runtime Interface Client (RIC) is the critical piece that makes a container image compatible with Lambda. The RIC implements the Lambda Runtime API, the HTTP-based protocol that Lambda workers use to send invocation events to your function and receive responses. AWS provides RIC implementations for Python, Node.js, Java, .NET, Go, Ruby, C++, and Rust. When you use an AWS base image, the RIC is pre-installed. When you build from scratch, you must install the RIC yourself.
The Runtime Interface Emulator (RIE) is a companion tool for local testing. It emulates the Lambda Runtime API on your development machine, allowing you to invoke your containerized function with curl and receive the same JSON response format you would get in production. The RIE is included in AWS base images and can be added to custom images for local development.
When you update a Lambda function to point to a new image (or a new digest behind the same tag), warm execution environments continue running the old image until they are recycled. Lambda does not terminate warm environments to pick up the new image. New cold starts use the new image, but you may observe both old and new versions serving traffic simultaneously during the transition period. I recommend deploying via Lambda aliases with CodeDeploy traffic shifting rather than updating the function directly.
graph TD
A[Docker Build] --> B[Push to ECR]
B --> C[Update Lambda Function Config]
C --> D[Lambda Pulls Image from ECR]
D --> E[Decompress, Encrypt, and Chunk Layers]
E --> F[Store Chunks in Regional Cache]
F --> G["Cold Start: Assemble from Cached Chunks"]
G --> H[RIC Initializes Runtime]
H --> I[Handler Invoked]

| Runtime | RIC Package | Install Command |
|---|---|---|
| Python | awslambdaric | pip install awslambdaric |
| Node.js | aws-lambda-ric | npm install aws-lambda-ric |
| Java | aws-lambda-java-runtime-interface-client | Maven/Gradle dependency |
| .NET | Amazon.Lambda.RuntimeSupport | NuGet package |
| Go | aws-lambda-go | go get github.com/aws/aws-lambda-go |
| Ruby | aws_lambda_ric | gem install aws_lambda_ric |
| C++ | aws-lambda-cpp | CMake build from source |
| Rust | lambda_runtime | cargo add lambda_runtime |
AWS Base Images vs. Custom Images
AWS publishes official Lambda base images in ECR Public Gallery at public.ecr.aws/lambda/. These images include the Lambda runtime, the RIC, the RIE, and a minimal Amazon Linux operating system. They are the fastest path to a working container-based Lambda function and the approach I recommend for most teams starting out.
| Runtime | Image URI | Approximate Size | OS |
|---|---|---|---|
| Python 3.12 | public.ecr.aws/lambda/python:3.12 | ~580 MB | Amazon Linux 2023 |
| Python 3.13 | public.ecr.aws/lambda/python:3.13 | ~590 MB | Amazon Linux 2023 |
| Node.js 20 | public.ecr.aws/lambda/nodejs:20 | ~490 MB | Amazon Linux 2023 |
| Node.js 22 | public.ecr.aws/lambda/nodejs:22 | ~500 MB | Amazon Linux 2023 |
| Java 21 | public.ecr.aws/lambda/java:21 | ~620 MB | Amazon Linux 2023 |
| .NET 8 | public.ecr.aws/lambda/dotnet:8 | ~560 MB | Amazon Linux 2023 |
| Ruby 3.3 | public.ecr.aws/lambda/ruby:3.3 | ~510 MB | Amazon Linux 2023 |
Building from a custom base image gives you full control over the operating system, system libraries, and image size. This is the right choice when you need a specific Linux distribution for compliance reasons, need to minimize image size, or need system-level packages that are not available in Amazon Linux. The trade-off: you are responsible for installing and maintaining the RIC and for applying OS patches that AWS otherwise handles automatically.
| Dimension | AWS Base Image | Custom Base Image |
|---|---|---|
| Maintenance | AWS patches the OS and runtime | You patch everything |
| RIC included | Yes | You install it |
| RIE included | Yes | You install it |
| OS choice | Amazon Linux 2023 only | Any Linux distribution |
| Size control | Limited: base image is fixed | Full control via multi-stage builds |
| Compliance | AWS-managed, SOC2/HIPAA eligible | You validate compliance |
| Cold start | AWS-optimized | Depends on your optimization |
| Native libraries | Amazon Linux packages | Any packages you install |
| FIPS 140-2 | Not available in base images | Can use FIPS-validated OS |
Example: Dockerfile using AWS base image (Python)
FROM public.ecr.aws/lambda/python:3.12
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]
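The CMD above points at app.handler, meaning the handler function in app.py. A minimal handler that would satisfy it might look like this (the greeting logic and event shape are purely illustrative):

```python
import json

def handler(event, context):
    # Lambda invokes this with the parsed event payload; the RIC handles
    # the HTTP plumbing between the Lambda worker and this function.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```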
Example: Dockerfile from custom base image (Python)
FROM python:3.12-slim
RUN pip install awslambdaric
WORKDIR /var/task
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
ENTRYPOINT ["python", "-m", "awslambdaric"]
CMD ["app.handler"]
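Either image can be exercised locally before it ever touches ECR. A sketch using the RIE (the my-func:local tag is arbitrary; the AWS base image variant works as-is because the RIE is bundled, while the custom-base variant needs the RIE added first):

```shell
# Build and run the image locally; AWS base images ship with the RIE,
# which listens on port 8080 inside the container.
docker build -t my-func:local .
docker run --rm -p 9000:8080 my-func:local

# From another terminal, invoke the handler through the emulated Runtime API.
curl -X POST \
  "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"name": "local-test"}'
```

The curl response is the same JSON your handler would return in production, which makes this loop useful for smoke tests in CI as well as local debugging.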
Building Lambda Container Images
The ENTRYPOINT and CMD instructions in your Dockerfile have specific meaning for Lambda container images. The ENTRYPOINT specifies the executable that implements the Runtime Interface Client, and CMD specifies the handler function in module.handler format. When using AWS base images, the ENTRYPOINT is pre-configured; you only need to set CMD. When using custom base images, you must set both.
Multi-stage builds are essential for keeping images small. A common pattern is to use a full build image (with compilers, headers, and build tools) to compile native extensions, then copy only the compiled artifacts into a minimal runtime image. This can reduce image size by 50–80% compared to a single-stage build.
Layer ordering matters for Docker build cache efficiency. Docker invalidates the cache for a layer and all subsequent layers when any file in a COPY instruction changes. By copying dependency files (like requirements.txt or package.json) before copying application code, you ensure that the expensive dependency installation step is cached across builds where only application code changes.
The /var/task working directory is a Lambda convention. Both AWS base images and the Lambda execution environment expect your function code to be in /var/task. While you can use a different directory with custom images, sticking with the convention avoids subtle issues with relative path resolution.
| Best Practice | Rationale | Impact |
|---|---|---|
| Use multi-stage builds | Exclude build tools, headers, and intermediate artifacts from runtime image | 50–80% size reduction |
| Order layers by change frequency | Copy dependency manifests before source code | Faster rebuilds via cache hits |
| Pin base image tags | Use python:3.12.4-slim not python:3.12-slim | Reproducible builds |
| Use .dockerignore | Exclude tests, docs, .git, __pycache__ | Smaller build context, faster uploads |
| Set WORKDIR /var/task | Match Lambda's expected function directory | Consistent path resolution |
| Minimize layer count | Combine related RUN commands with && | Smaller image, fewer cache layers |
| Remove package manager caches | Add --no-cache-dir (pip) or npm cache clean | 50–200 MB savings |
| Use --target for build artifacts | Install Python packages to ${LAMBDA_TASK_ROOT} | Clean separation of runtime deps |
Example: Optimized multi-stage Python Dockerfile
FROM python:3.12-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt --target /build/deps
FROM public.ecr.aws/lambda/python:3.12
COPY --from=builder /build/deps ${LAMBDA_TASK_ROOT}
COPY src/ ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]
Example: Optimized Node.js Dockerfile
FROM public.ecr.aws/lambda/nodejs:20
COPY package.json package-lock.json ${LAMBDA_TASK_ROOT}/
RUN npm ci --production
COPY src/ ${LAMBDA_TASK_ROOT}/
CMD ["app.handler"]
ECR as the Lambda Image Registry
Lambda container images must be stored in Amazon Elastic Container Registry. This is a hard requirement. Lambda cannot pull images from Docker Hub, GitHub Container Registry, or any other registry. Your CI/CD pipeline must push images to ECR, and your Lambda function must reference an ECR image URI.
Image tag strategy matters more than most teams realize. Lambda resolves an image tag to a specific SHA256 digest at the time you create or update the function configuration. If you subsequently push a new image to the same tag (a mutable tag like latest or v1), Lambda continues running the old digest until you explicitly update the function. Mutable tags therefore provide no automatic rollout and create confusion about which version is actually deployed. I recommend using immutable tags (ECR supports an immutable tag policy) or deploying by digest (123456789.dkr.ecr.us-east-1.amazonaws.com/my-func@sha256:abc123...).
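A pipeline can recover the digest-form URI from the RepoDigests field of docker inspect output. The parser below is a hypothetical helper, but the repo@sha256:... format it handles is what ECR reports:

```python
def digest_uri_from_repo_digest(repo_digest: str) -> tuple[str, str]:
    # RepoDigests entries look like:
    #   123456789.dkr.ecr.us-east-1.amazonaws.com/my-func@sha256:abc123...
    repo, sep, digest = repo_digest.partition("@")
    if sep != "@" or not digest.startswith("sha256:"):
        raise ValueError(f"not a digest reference: {repo_digest!r}")
    return repo, digest

repo, digest = digest_uri_from_repo_digest(
    "123456789.dkr.ecr.us-east-1.amazonaws.com/my-func@sha256:abc123"
)
```

Deploying `repo + "@" + digest` instead of a tag makes the deployment immune to later pushes against the same tag.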
ECR lifecycle policies are essential for cost control. Without them, every image you push accumulates at $0.10/GB/month. A team pushing 500 MB images daily will accumulate 15 GB/month, costing $1.50/month. That amount seems small, but it compounds across dozens of functions and environments. A sensible lifecycle policy retains the last N tagged images and expires untagged images after a few days.
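A policy along those lines, expressed in ECR's lifecycle policy JSON format (the v tag prefix is an assumption about your tagging scheme; adjust it to match your pipeline):

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 3 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 3
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the last 10 tagged images",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }
  ]
}
```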
| Setting | Recommendation | Rationale |
|---|---|---|
| Tag immutability | Enabled | Prevents accidental overwrite; deploy by digest for safety |
| Image scanning on push | Enabled | Catches known vulnerabilities before deployment |
| Lifecycle policy | Keep last 10 tagged images; expire untagged after 3 days | Controls storage costs |
| Encryption | AWS-managed KMS key (default) or customer-managed CMK | At-rest encryption; CMK for compliance |
| Cross-region replication | Enable for multi-region Lambda deployments | Images available in each region Lambda runs |
| Repository policy | Grant lambda.amazonaws.com ECR pull access | Required for Lambda to pull images |
| Pull-through cache | Not needed for Lambda | Lambda only pulls from ECR directly |
ECR offers two scanning modes. Basic scanning uses the open-source Clair project and scans for OS package vulnerabilities. Enhanced scanning uses Amazon Inspector and adds programming language package scanning (pip, npm, Maven, etc.) plus continuous monitoring that re-scans images when new CVEs are published.
| Dimension | Basic Scanning | Enhanced Scanning |
|---|---|---|
| Engine | Clair (open source) | Amazon Inspector |
| OS package CVEs | Yes | Yes |
| Language package CVEs | No | Yes (pip, npm, Maven, Go, .NET) |
| Scan trigger | On push or manual | Continuous: re-scans on new CVEs |
| Cost | Free | Inspector pricing ($0.09/image/month for continuous) |
| Findings format | ECR native | Inspector findings + Security Hub |
| Severity levels | Critical, High, Medium, Low, Informational | Same, with CVSS scoring |
Cold Start Performance
Cold starts are the primary concern teams raise when evaluating container images for Lambda. The performance gap between zip and container deployments has narrowed since the feature launched. Lambda's chunk-based caching system means that cold start time is not linearly proportional to image size. In practice, I see container cold starts ranging from 200 ms to 2 seconds for typical workloads, with image size being the dominant (but not the only) factor.
Lambda's cold start sequence for container images involves six phases: downloading cached image chunks to the worker, setting up the container filesystem, initializing the runtime environment, starting the RIC, executing your function's initialization code (module-level imports, global variable setup), and finally invoking the handler. The chunk download phase is where container images differ from zip, but Lambda's caching means that popular images (including your own, after the first invocation in a region) are already cached on the worker fleet.
graph LR
A[Chunk Download] --> B[Container Setup] --> C[Runtime Init] --> D[RIC Startup] --> E[Function Init Code] --> F[Handler Ready]

| Runtime | Image Size | Cold Start (p50) | Cold Start (p99) | Notes |
|---|---|---|---|---|
| Python 3.12 | ~580 MB (base only) | ~300 ms | ~800 ms | AWS base image, no additional deps |
| Python 3.12 | ~1.2 GB (with numpy/pandas) | ~600 ms | ~1.5 s | Common data science stack |
| Python 3.12 | ~3 GB (with PyTorch) | ~1.2 s | ~3 s | ML inference workload |
| Node.js 20 | ~490 MB (base only) | ~250 ms | ~700 ms | AWS base image, no additional deps |
| Node.js 20 | ~800 MB (with dependencies) | ~450 ms | ~1.2 s | Typical API function |
| Java 21 | ~620 MB (base only) | ~3 s | ~8 s | JVM startup dominates; use SnapStart |
| Java 21 + SnapStart | ~620 MB (base only) | ~200 ms | ~500 ms | SnapStart eliminates JVM init |
| .NET 8 | ~560 MB (base only) | ~400 ms | ~1 s | AOT compilation improves this |
| Go (custom) | ~50 MB (Alpine + binary) | ~80 ms | ~200 ms | Compiled binary, minimal runtime |
SnapStart, originally available only for Java 11 and 17 zip deployments, now supports Java 21 container images. It takes a Firecracker microVM snapshot after initialization and restores from that snapshot on cold start, reducing Java cold starts from seconds to milliseconds. If you are running Java on Lambda, SnapStart with containers is the single most impactful optimization available.
Provisioned Concurrency eliminates cold starts entirely by keeping a specified number of execution environments warm. This works identically for zip and container deployments. At $0.0000041667 per GB-second of provisioned concurrency, it is cost-effective for latency-sensitive workloads with predictable traffic patterns.
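The cost is easy to estimate. Keeping 10 environments warm at 1 GB each for a 30-day month, at the provisioned concurrency rate quoted above (invocation compute and request charges are additional):

```python
rate_per_gb_second = 0.0000041667   # provisioned concurrency, per GB-second
environments = 10
memory_gb = 1.0
seconds_per_month = 30 * 24 * 3600  # 2,592,000 seconds

monthly_cost = rate_per_gb_second * environments * memory_gb * seconds_per_month
print(f"${monthly_cost:.2f}/month")  # roughly $108/month
```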
For a detailed look at Lambda networking, invocation patterns, and integration with ALB and CloudFront, see the Lambda Behind ALB Behind CloudFront: An Architecture Deep-Dive.
Multi-Architecture Support
Lambda supports both x86_64 and ARM64 (Graviton2) architectures. ARM64 Lambda functions are priced 20% lower than x86_64 ($0.0000133334 per GB-second versus $0.0000166667) and in my experience deliver comparable or better performance for most workloads. For container-based functions, supporting both architectures requires building multi-architecture images using Docker's buildx and manifest lists.
A multi-architecture image is a single ECR repository tag that resolves to different image digests depending on the requesting platform's architecture. When you configure a Lambda function with arm64 architecture and point it to a multi-arch image tag, Lambda automatically pulls the ARM64 variant. This lets you maintain a single image tag across architectures while Lambda handles the selection.
graph LR
A[Source Code] --> B[x86_64 Build]
A --> C[ARM64 Build]
B --> D[Manifest List]
C --> D
D --> E[Push to ECR]
E --> F[x86_64 Lambda]
E --> G[ARM64 Lambda]

The critical CI/CD consideration is build speed. Building ARM64 images on x86_64 build hosts requires QEMU emulation, which is 5–10x slower than native builds. For production pipelines, I recommend using ARM64-native CodeBuild instances (available since 2023) or a build matrix that runs each architecture on its native hardware.
| Dimension | x86_64 (Intel/AMD) | ARM64 (Graviton2) |
|---|---|---|
| Lambda pricing | $0.0000166667/GB-s | $0.0000133334/GB-s (20% cheaper) |
| Compute performance | Baseline | Comparable; better for some workloads |
| Native build speed | Fast on x86 hosts | Fast on ARM hosts; slow via QEMU |
| CodeBuild support | All compute types | ARM-native instances available |
| Base image availability | All AWS base images | All AWS base images |
| Third-party library support | Universal | Most major libraries; verify native extensions |
| Docker buildx required | Yes (for multi-arch) | Yes (for multi-arch) |
| ECR storage | Per-architecture layers | Per-architecture layers |
| Graviton3 support | Not applicable | Not yet available for Lambda |
| Migration effort | None (default) | Test native extensions; rebuild images |
To build multi-architecture images with buildx:
# Create and use a buildx builder
docker buildx create --name lambda-builder --use
# Build and push both architectures
docker buildx build \
--platform linux/amd64,linux/arm64 \
--tag 123456789.dkr.ecr.us-east-1.amazonaws.com/my-func:v1.0.0 \
--push .
CI/CD Patterns
The canonical CI/CD pipeline for Lambda container images follows a predictable flow: source code triggers a build, the build produces a container image and pushes it to ECR, and a deployment step updates the Lambda function to use the new image. The details (how you trigger builds, how you tag images, how you deploy safely) determine whether this pipeline is robust or fragile.
graph LR
A[GitHub Push] --> B[CodePipeline Source]
B --> C["CodeBuild Build & Push"]
C --> D[ECR Image]
D --> E[CloudFormation or CDK Deploy]
E --> F[Lambda New Version]
F --> G[CodeDeploy Alias Shift]

The most important CI/CD principle for container-based Lambda: always deploy by image digest, never by mutable tag. When CodeBuild pushes a new image, capture the digest from the docker push output and pass it to the deployment step. This guarantees that the Lambda function runs exactly the image that was built and tested, not whatever happens to be tagged latest at deployment time.
| Deployment Strategy | How It Works | Rollback Speed | Risk Level | Use Case |
|---|---|---|---|---|
| All-at-once | Update function image URI directly | Manual revert | Highest | Development and testing |
| Lambda versioning + alias | Publish version, shift alias | Point alias to previous version | Medium | Simple production deployments |
| CodeDeploy Linear10PercentEvery1Minute | Shift 10% traffic per minute | Automatic on alarm | Low | Production APIs with health metrics |
| CodeDeploy Canary10Percent5Minutes | 10% for 5 min, then 100% | Automatic on alarm | Lowest | Production with strict SLAs |
| CodeDeploy AllAtOnce | Shift all traffic immediately via CodeDeploy | Automatic on alarm | Medium | When you want alarm-based rollback without gradual shift |
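If you deploy with AWS SAM, the alias-plus-CodeDeploy pattern in the table above is declarative. A sketch of a canary configuration with alarm-based rollback (FunctionErrorAlarm is a placeholder for a CloudWatch alarm resource defined elsewhere in the template):

```yaml
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    PackageType: Image
    ImageUri: 123456789.dkr.ecr.us-east-1.amazonaws.com/my-func@sha256:abc123...
    AutoPublishAlias: live
    DeploymentPreference:
      Type: Canary10Percent5Minutes
      Alarms:
        - !Ref FunctionErrorAlarm
```

AutoPublishAlias makes SAM publish a new version on every deploy and shift the alias via CodeDeploy, so warm environments drain onto the new image gradually instead of all at once.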
Example: CodeBuild buildspec for Lambda container image
version: 0.2
env:
variables:
ECR_REPO: 123456789.dkr.ecr.us-east-1.amazonaws.com/my-func
FUNCTION_NAME: my-function
phases:
pre_build:
commands:
- aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ECR_REPO
- IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-8)
build:
commands:
- docker build -t $ECR_REPO:$IMAGE_TAG .
- docker push $ECR_REPO:$IMAGE_TAG
- IMAGE_URI=$(docker inspect --format='{{index .RepoDigests 0}}' $ECR_REPO:$IMAGE_TAG)
post_build:
commands:
- aws lambda update-function-code --function-name $FUNCTION_NAME --image-uri $IMAGE_URI
- printf '{"ImageUri":"%s"}' "$IMAGE_URI" > imageDetail.json
artifacts:
files:
- imageDetail.json
For deeper coverage of the build, deployment, and pipeline services referenced here, see the AWS CodeBuild: An Architecture Deep-Dive, the AWS CodeDeploy: An Architecture Deep-Dive, and the AWS CodePipeline: An Architecture Deep-Dive.
Infrastructure as Code
Defining container-based Lambda functions in infrastructure as code differs from zip-based functions in a few important ways. The most notable: you do not specify a runtime or handler in the Lambda function configuration. Both are embedded in the container image via the ENTRYPOINT and CMD instructions. You specify PackageType: Image and the ImageUri. That is all.
Lambda does allow you to override CMD, ENTRYPOINT, and WORKDIR at the function configuration level, which is useful for running different handlers from the same image (for example, separate Lambda functions for API handling and background processing, all built from a single image with multiple handler modules).
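For example, a single image might ship one module with two entry points (the module and handler names here are hypothetical); one Lambda function overrides CMD to ["handlers.api_handler"], another to ["handlers.worker_handler"]:

```python
# handlers.py -- one image, two entry points selected via the CMD override
def _process(records):
    # Shared business logic used by both entry points.
    return [r.upper() for r in records]

def api_handler(event, context):
    # Synchronous API path: process and return the results inline.
    return {"statusCode": 200, "results": _process(event.get("records", []))}

def worker_handler(event, context):
    # Background path (e.g. queue-triggered): same logic, different response shape.
    return {"processed": len(_process(event.get("records", [])))}
```

Both functions share one build, one scan, and one ECR repository, which keeps the image matrix small as the number of handlers grows.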
| Property | Zip Deployment | Container Deployment |
|---|---|---|
| Package type | Zip (default) | Image |
| Code location | S3 bucket/key or inline | ECR image URI |
| Runtime | Required (e.g., python3.12) | Not set: embedded in image |
| Handler | Required (e.g., app.handler) | Not set: embedded in image CMD |
| Layers | Up to 5 layer ARNs | Not supported |
| Architecture | x86_64 or arm64 | x86_64 or arm64 |
| CMD override | Not applicable | Optional: overrides Dockerfile CMD |
| ENTRYPOINT override | Not applicable | Optional: overrides Dockerfile ENTRYPOINT |
Example: CDK TypeScript
import { DockerImageFunction, DockerImageCode } from 'aws-cdk-lib/aws-lambda';
import { Repository } from 'aws-cdk-lib/aws-ecr';
const repo = Repository.fromRepositoryName(this, 'Repo', 'my-func');
new DockerImageFunction(this, 'MyFunction', {
code: DockerImageCode.fromEcr(repo, {
tagOrDigest: 'sha256:abc123...',
}),
memorySize: 1024,
timeout: Duration.seconds(30),
architecture: Architecture.ARM_64,
environment: {
TABLE_NAME: table.tableName,
},
});
Example: Terraform HCL
resource "aws_lambda_function" "my_function" {
function_name = "my-function"
package_type = "Image"
image_uri = "123456789.dkr.ecr.us-east-1.amazonaws.com/my-func@sha256:abc123..."
role = aws_iam_role.lambda_role.arn
memory_size = 1024
timeout = 30
architectures = ["arm64"]
environment {
variables = {
TABLE_NAME = aws_dynamodb_table.my_table.name
}
}
}
Security Architecture
Container images introduce a different security surface compared to zip deployments. The image itself becomes an artifact that must be secured across its lifecycle: build, storage, deployment, and runtime. The key principle is establishing an immutable chain from source to running function. You should be able to trace any running Lambda back to the exact source commit and Dockerfile that produced it.
ECR repository policies control who can push and pull images. Lambda needs pull access (granted to the lambda.amazonaws.com service principal), and your CI/CD role needs push access. Avoid granting broad ecr:* permissions. Scope push access to CI/CD roles and pull access to Lambda's service principal and your deployment roles.
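A minimal repository policy along those lines might look like this (the account ID and region in the source-ARN condition are placeholders; the condition scopes the grant to functions in your own account):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LambdaECRImageRetrieval",
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": [
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ],
      "Condition": {
        "StringLike": {
          "aws:sourceArn": "arn:aws:lambda:us-east-1:123456789:function:*"
        }
      }
    }
  ]
}
```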
graph TD
A[CI/CD Role] -->|ecr:PutImage, ecr:InitiateLayerUpload| B[ECR Repository]
C[Lambda Service] -->|ecr:GetDownloadUrlForLayer, ecr:BatchGetImage| B
B --> D[Image Scan on Push]
D -->|Findings| E{Severity Gate}
E -->|Critical/High| F[Block Deployment]
E -->|Low/Medium| G[Alert and Continue]
B --> H[Immutable Tags + Digest Deploy]

| Security Control | Implementation | Layer |
|---|---|---|
| Image scanning | ECR basic or enhanced scanning on push | Build/Registry |
| Deployment gate | Block deployments with Critical/High CVE findings | CI/CD |
| Immutable tags | Enable ECR immutable tag policy | Registry |
| Digest-based deployment | Deploy by @sha256:..., not by tag | CI/CD/IaC |
| Least-privilege ECR policy | Separate push (CI/CD) and pull (Lambda) permissions | Registry |
| No secrets in images | Use Lambda environment variables, Secrets Manager, or Parameter Store | Build |
| Base image patching | Scheduled weekly rebuilds pulling latest base image | CI/CD |
| Image signing | AWS Signer or Notation for supply chain integrity | Build/Registry |
| VPC deployment | Run Lambda in VPC for private resource access | Runtime |
| Execution role scoping | Minimal IAM permissions per function | Runtime |
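The deployment gate in the table above can be a few lines in the pipeline. The sketch below assumes you have already fetched the findingSeverityCounts map that aws ecr describe-image-scan-findings returns, and decides whether to fail the build:

```python
def should_block(severity_counts: dict, blocked=("CRITICAL", "HIGH")) -> bool:
    # severity_counts example: {"CRITICAL": 0, "HIGH": 2, "MEDIUM": 7}
    # Block the deployment if any finding at a blocked severity exists.
    return any(severity_counts.get(level, 0) > 0 for level in blocked)
```

In a CodeBuild post_build phase, a non-zero exit when should_block returns True is enough to stop the pipeline before the image reaches Lambda.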
Never bake secrets, API keys, or database credentials into your container image. Images are stored in ECR and can be inspected by anyone with repository read access. Use Lambda environment variables (encrypted at rest with KMS), AWS Secrets Manager, or Systems Manager Parameter Store for runtime secrets. If you need secrets during build time (for example, to access a private package registry), use Docker BuildKit secret mounts (--mount=type=secret) which are excluded from the final image layers.
Scheduled base image rebuilds are essential for security hygiene. Even if your application code has not changed, the base image's OS packages may have new CVE patches. A weekly CI/CD job that rebuilds all Lambda images from the latest base image tag and runs a vulnerability scan catches these patches automatically.
Cost Considerations
Lambda pricing is identical for zip and container deployments; you pay the same per-GB-second rate regardless of package type. The additional costs specific to container images come from ECR storage, image builds, and (optionally) enhanced scanning.
| Cost Component | Rate | Example | Monthly Cost |
|---|---|---|---|
| Lambda compute (x86_64) | $0.0000166667/GB-s | 1M invocations, 512 MB, 200 ms avg | ~$1.67 |
| Lambda compute (ARM64) | $0.0000133334/GB-s | Same workload as above | ~$1.33 |
| Lambda requests | $0.20/million | 1M invocations | $0.20 |
| ECR storage | $0.10/GB/month | 10 images × 1 GB each | $1.00 |
| ECR data transfer (same region) | Free | Lambda pulls from same-region ECR | $0.00 |
| ECR cross-region transfer | $0.01/GB | Replication to second region | Varies |
| CodeBuild (general1.small) | $0.005/minute | 100 builds × 5 min | $2.50 |
| Enhanced scanning | $0.09/image/month (continuous) | 10 images | $0.90 |
The most common cost surprise is ECR storage accumulation. Without lifecycle policies, every build pushes a new image that persists indefinitely. A single function with daily deployments and 1 GB images accumulates 30 GB/month ($3/month), and across 50 functions that reaches $150/month of dead images. Lifecycle policies that retain only the last 10 tagged images and expire untagged images after 3 days eliminate this waste entirely.
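The arithmetic behind that estimate, as a quick check:

```python
image_gb = 1.0          # size of each pushed image
pushes_per_month = 30   # daily deployments
functions = 50
ecr_rate = 0.10         # ECR storage, $/GB/month

monthly_storage_cost = image_gb * pushes_per_month * functions * ecr_rate
print(f"${monthly_storage_cost:.2f}/month")  # $150.00/month
```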
ARM64 (Graviton2) provides the easiest cost optimization: a 20% reduction in Lambda compute cost with a one-line configuration change (set the architecture to arm64). For most workloads, the performance is equivalent or better. The only prerequisite is that your container image and all native dependencies are built for the ARM64 architecture.
Common Failure Modes
| Failure Mode | Symptom | Root Cause | Mitigation |
|---|---|---|---|
| Image exceeds 10 GB | ImageTooLargeException on function create/update | Unoptimized image with unnecessary dependencies | Multi-stage builds; remove build tools, caches, docs |
| Missing RIC | Function times out immediately; no logs | Custom image does not include the Runtime Interface Client | Install the language-specific RIC package |
| Wrong architecture | exec format error in logs | ARM64 function referencing x86_64 image or vice versa | Build for target architecture; use multi-arch manifest |
| ECR permission denied | AccessDeniedException on function create/update | Lambda service principal lacks ecr:GetDownloadUrlForLayer and ecr:BatchGetImage | Add ECR repository policy granting Lambda pull access |
| Cross-account pull failure | ImageNotFoundException or access denied | ECR repository in different account lacks cross-account policy | Add cross-account resource policy on ECR repository |
| Cold start timeout | Function timeout on first invocation; subsequent invocations succeed | Init duration exceeds function timeout; large image or slow module imports | Increase timeout; optimize imports; use Provisioned Concurrency |
| Stale image after tag update | Old behavior persists after pushing new image to same tag | Lambda resolves tag to digest at deploy time, not at invocation time | Redeploy function with update-function-code; prefer digest-based deploys |
| Image not found | ImageNotFoundException | Typo in image URI, deleted image, or wrong region | Verify URI, check ECR lifecycle policies, confirm region |
| ENTRYPOINT conflict | Runtime.InvalidEntrypoint or handler not found | Custom ENTRYPOINT overrides RIC; or CMD missing handler | Ensure ENTRYPOINT runs RIC; CMD specifies module.handler |
| Slow QEMU builds | CI/CD pipeline takes 20+ minutes for ARM64 builds | Building ARM64 on x86_64 via QEMU emulation | Use ARM64-native CodeBuild instances or build matrix |
Key Architectural Recommendations
- Use container images when your deployment package exceeds 250 MB or when you need a custom runtime. For small, simple functions with standard runtimes, zip archives remain simpler and faster to deploy.
- Start with AWS base images unless you have a specific reason to build from scratch. They include the RIC, the RIE, and receive automatic OS patches. Move to custom base images only when you need a different OS, aggressive size optimization, or FIPS compliance.
- Deploy by image digest, never by mutable tag. Capture the digest from `docker push` in your CI/CD pipeline and pass it through to the deployment step. This guarantees reproducibility and makes rollback deterministic.
- Enable immutable tags on ECR repositories. This prevents accidental overwrites and forces your pipeline to use unique tags, which makes audit trails clear and rollback reliable.
- Optimize image size. Use multi-stage builds, remove build tools and caches, pin slim base images, and use `.dockerignore`. Every 100 MB you remove saves 50–150 ms of cold start time.
- Use Graviton2 (ARM64) by default. The 20% cost saving requires minimal effort for most workloads. Build multi-architecture images with `docker buildx` so you can switch architectures with a single configuration change.
- Implement ECR lifecycle policies from day one. Retain the last 10 tagged images, expire untagged images after 3 days. This prevents unbounded storage cost growth.
- Use CodeDeploy traffic shifting for production deployments. The `Canary10Percent5Minutes` or `Linear10PercentEvery1Minute` strategies with CloudWatch alarm-based rollback provide safe, automated deployments. See the AWS CodeDeploy: An Architecture Deep-Dive for details.
- Scan images on push and gate deployments on findings. At minimum, enable ECR basic scanning. For production workloads, use enhanced scanning with Amazon Inspector for continuous monitoring and language-package CVE detection.
- Never bake secrets into container images. Use Lambda environment variables with KMS encryption, Secrets Manager, or Parameter Store. Use Docker BuildKit secret mounts for build-time secrets.
Additional Resources
- AWS Lambda Container Image Support Documentation
- AWS Lambda Container Image Best Practices
- ECR User Guide
- Lambda Runtime Interface Client for Python
- Lambda Runtime Interface Emulator
- Docker Multi-Platform Builds
- AWS Lambda Pricing
- ECR Pricing
- AWS Lambda SnapStart
- CodeDeploy with Lambda
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

