About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
Deployment automation is the single most impactful investment a team can make in operational reliability. Manual deployments (SSH into a box, pull the latest code, restart the service, pray) are slow, and they are the root cause of a disproportionate number of production incidents. Every manual step is an opportunity for human error: the wrong branch, a missed configuration file, a forgotten service restart, a deployment to the wrong environment. Having spent years building and operating deployment pipelines across hundreds of EC2 instances, Lambda functions, and ECS services, I have watched CodeDeploy evolve from a simple EC2 deployment tool into the foundational deployment engine that underpins most serious AWS CI/CD architectures. It lacks glamour and thorough documentation of its deeper behaviors, yet it is the service that actually puts your code onto your compute.
This article is an architecture reference for engineers and architects who need to understand how CodeDeploy works under the hood: the component hierarchy, the lifecycle event model, the deployment strategies per compute platform, the failure modes that will bite you in production, and the architectural patterns that keep deployments safe at scale.
What CodeDeploy Actually Is
AWS CodeDeploy is a fully managed deployment service that automates software deployments to Amazon EC2 instances, on-premises servers, AWS Lambda functions, and Amazon ECS services. It coordinates the entire deployment process, from pulling your application revision out of S3 or GitHub, to stopping your running application, installing the new files, restarting services, running validation scripts, and shifting traffic, all without requiring you to SSH into anything or manually orchestrate any step.
CodeDeploy is deliberately unopinionated about what you deploy. Java WAR file, Node.js bundle, compiled Go binary, static assets, Docker image reference: it does not care. No imposed framework, no required project structure, no mandated build system. You tell it what files to put where (via the AppSpec file), what scripts to run at each lifecycle stage, and how to shift traffic. Everything else is your business.
This generality drives most of the confusion around the service. Elastic Beanstalk provides a complete platform abstraction. ECS service updates are tightly coupled to the container orchestration model. CodeDeploy is neither. It is a deployment primitive, a building block you compose into your own deployment architecture.
| Dimension | CodeDeploy | Elastic Beanstalk | ECS Service Update | CloudFormation Deploy | Manual (SSH/SCP) |
|---|---|---|---|---|---|
| Compute Targets | EC2, on-premises, Lambda, ECS | EC2 (managed), Docker on EC2 | ECS tasks only | Any CloudFormation resource | Any server you can reach |
| Deployment Strategies | In-place, blue/green, canary, linear | Rolling, immutable, blue/green | Rolling, blue/green | Stack update (replacement or update) | Whatever you script |
| Rollback | Automatic (failure or alarm), manual | Automatic, environment swap | Circuit breaker, manual | Stack rollback | Manual redeploy |
| Health Monitoring | Lifecycle hooks, CloudWatch alarms | Enhanced health reporting | ELB health checks, container health | Stack events, drift detection | You watching logs |
| Agent Required | Yes (EC2/on-premises only) | No (managed by Beanstalk) | No | No | No |
| Cost | Free for EC2; $0.02/deployment for Lambda/ECS | No additional charge | No additional charge | No additional charge | Your time |
| Complexity | Medium: AppSpec + hooks | Low: platform-managed | Low: task definition update | Medium: template authoring | High: entirely manual |
| Flexibility | High: any app, any language | Medium: supported platforms only | Low: containers only | High: infrastructure + app | Unlimited: but unmanageable |
Architecture Internals
CodeDeploy is built around a hierarchy of concepts that maps directly to how deployments are organized and executed. You need to internalize this hierarchy before you can design deployment pipelines that scale across multiple environments, regions, and teams.
At the top level, you create an Application, which is a logical container that groups everything related to deploying a single application. Within each application, you define Deployment Groups, which represent the target environments: your staging EC2 fleet, your production Lambda function, your ECS service in us-east-1. Each deployment group specifies the target compute, the deployment configuration to use, and optional triggers, alarms, and rollback settings. A Deployment Configuration controls the mechanics of how the deployment proceeds: all at once, one at a time, a percentage at a time, or a custom traffic-shifting schedule. A Revision is the actual artifact being deployed: an S3 bundle, a GitHub commit, or an AppSpec-only definition for Lambda and ECS. When you create a Deployment, CodeDeploy combines a deployment group with a revision and executes the deployment according to the deployment configuration.
```mermaid
flowchart TD
    A[Application] --> DG1["Deployment Group<br/>e.g., Production EC2"]
    A --> DG2["Deployment Group<br/>e.g., Staging EC2"]
    A --> DG3["Deployment Group<br/>e.g., Lambda Prod"]
    DG1 --> DC["Deployment<br/>Configuration"]
    DG1 --> REV["Revision<br/>S3 / GitHub"]
    DC --> DEP[Deployment]
    REV --> DEP
    DEP --> I1[Instance 1]
    DEP --> I2[Instance 2]
    DEP --> I3[Instance N]
    DG3 --> DC2["Deployment<br/>Configuration"]
    DG3 --> REV2["Revision<br/>AppSpec YAML"]
    DC2 --> DEP2[Deployment]
    REV2 --> DEP2
    DEP2 --> LF["Lambda Function<br/>Version + Alias"]
```

The CodeDeploy Agent
For EC2 and on-premises deployments, the CodeDeploy agent is a mandatory component running on every target instance. The agent is a Ruby-based process that polls the CodeDeploy service every 15 seconds, asking "do you have a deployment for me?" When it receives a deployment command, the agent downloads the revision from S3 or GitHub, parses the AppSpec file, and executes the lifecycle event hooks in order. The agent manages its own log files (found at /var/log/aws/codedeploy-agent/ on Linux and C:\ProgramData\Amazon\CodeDeploy\log\ on Windows), which are your first stop when debugging deployment failures.
The polling model is important to understand. CodeDeploy does not push deployments to instances; instances pull them. This means network connectivity from the instance to the CodeDeploy service endpoint and to S3 (for revision download) is a hard requirement. In VPC environments without internet access, you need VPC endpoints for codedeploy (recent agent versions also require the codedeploy-commands-secure endpoint) and a gateway endpoint for s3. A missing endpoint is one of the most common reasons deployments hang indefinitely with the agent showing no activity.
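In an isolated subnet, the required endpoints might be declared as follows in CloudFormation. This is a sketch: the VPC, subnet, and route table references (`AppVpc`, `PrivateSubnet`, `PrivateRouteTable`) are illustrative names, not part of any real template.

```yaml
CodeDeployEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref AppVpc
    VpcEndpointType: Interface
    ServiceName: !Sub com.amazonaws.${AWS::Region}.codedeploy
    SubnetIds: [!Ref PrivateSubnet]
    PrivateDnsEnabled: true

# Recent agent versions poll a separate secure commands endpoint
CodeDeployCommandsEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref AppVpc
    VpcEndpointType: Interface
    ServiceName: !Sub com.amazonaws.${AWS::Region}.codedeploy-commands-secure
    SubnetIds: [!Ref PrivateSubnet]
    PrivateDnsEnabled: true

# Revision downloads reach S3 through a gateway endpoint
S3Endpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref AppVpc
    VpcEndpointType: Gateway
    ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
    RouteTableIds: [!Ref PrivateRouteTable]
```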
Lambda and ECS deployments do not use the agent at all. CodeDeploy interacts directly with the Lambda and ECS APIs to shift traffic between function versions or task sets.
| Concept | Definition | Scope |
|---|---|---|
| Application | Logical container for deployment groups and revisions | Per-account, per-region |
| Deployment Group | Set of target instances, Lambda function, or ECS service plus deployment settings | Per-application |
| Deployment Configuration | Rules governing how many targets are updated simultaneously and how traffic shifts | Per-deployment group or per-deployment |
| Revision | The application artifact: S3 bundle (zip/tar), GitHub commit, or AppSpec-only definition | Per-application, versioned in S3 |
| Deployment | A single execution of deploying a specific revision to a specific deployment group | Unique ID, tracks status per target |
| Instance (or Target) | An individual EC2 instance, Lambda function version, or ECS task set receiving the deployment | Per-deployment |
The Three Compute Platforms
CodeDeploy supports three fundamentally different compute platforms, and the deployment model differs so significantly across them that they are practically three different services sharing a control plane. The compute platform you choose determines the AppSpec format, the available lifecycle hooks, the deployment types, and the rollback mechanics.
EC2/On-Premises is the original and most flexible platform. Deployments work by copying files to instances and running scripts at defined lifecycle stages. You have full control over what happens at every step: stop the app, update files, run database migrations, start the app, validate health. The trade-off is complexity: you write and maintain all the hook scripts, and you are responsible for ensuring they are idempotent and handle failure gracefully.
Lambda deployments operate on a completely different model. There are no files to copy and no instances to manage. Instead, CodeDeploy shifts traffic between two versions of a Lambda function by updating an alias. The deployment configuration controls whether traffic shifts all at once, linearly over time, or in a canary pattern. Hook functions (which are themselves Lambda functions) run at defined points to validate the new version before and after traffic shifts.
ECS deployments are conceptually similar to Lambda deployments. CodeDeploy manages traffic shifting between two ECS task sets (the "blue" original and the "green" replacement) behind a load balancer. The deployment creates a new task set running the updated task definition, optionally routes test traffic to it, and then shifts production traffic according to the deployment configuration.
| Dimension | EC2/On-Premises | Lambda | ECS |
|---|---|---|---|
| Deployment Types | In-place, blue/green | Canary, linear, all-at-once | Canary, linear, all-at-once |
| AppSpec Format | YAML: files, permissions, hooks | YAML: resources, hooks | YAML: resources, hooks |
| Agent Required | Yes | No | No |
| Hooks Available | 10+ lifecycle events | BeforeAllowTraffic, AfterAllowTraffic | BeforeInstall, AfterInstall, AfterAllowTestTraffic, BeforeAllowTraffic, AfterAllowTraffic |
| Traffic Shifting | Via load balancer (blue/green only) | Lambda alias weighted routing | ECS task set + ALB/NLB listener rules |
| Rollback Mechanism | Redeploy previous revision (in-place) or reroute traffic (blue/green) | Revert alias to previous version | Reroute traffic to original task set |
| Health Checks | Hook scripts, CloudWatch alarms | Hook Lambda functions, CloudWatch alarms | ELB health checks, hook functions, CloudWatch alarms |
| Blue/Green Support | Yes: provision new ASG or use existing | All deployments are effectively blue/green | All deployments are effectively blue/green |
| Revision Storage | S3 bucket or GitHub repository | AppSpec in S3 or inline | AppSpec in S3 or inline |
| IAM Model | Service role + instance profile | Service role + function execution role | Service role + task execution role + task role |
| Pricing | Free | $0.02 per deployment | $0.02 per deployment |
| Primary Use Case | Traditional server-based apps, legacy migrations | Serverless function version management | Container workload updates |
Deployment Types
CodeDeploy offers two fundamental deployment types for EC2/On-Premises, and a traffic-shifting model for Lambda and ECS that is conceptually always blue/green.
In-Place Deployments
In-place deployments update the application on existing instances without provisioning new ones. The CodeDeploy agent on each instance stops the running application, pulls the new revision, installs the updated files, runs configuration scripts, starts the application, and validates health, all on the same instance. During the update, the instance is typically deregistered from the load balancer to avoid serving requests while the application is down.
In-place deployments are simple and cheap. No spare capacity, no provisioning new instances, no complexity. The trade-off is risk. If the new revision has a bug, rolling back means a full redeployment of the previous revision, which takes just as long as the original deployment. Each instance also goes offline during its update window, though deploying one at a time behind a load balancer mitigates this.
```mermaid
flowchart LR
    A["Deregister from<br/>Load Balancer"] --> B["ApplicationStop<br/>Hook Scripts"]
    B --> C["Download<br/>Revision"]
    C --> D["BeforeInstall<br/>Hook Scripts"]
    D --> E["Install Files<br/>to Disk"]
    E --> F["AfterInstall<br/>Hook Scripts"]
    F --> G["ApplicationStart<br/>Hook Scripts"]
    G --> H["ValidateService<br/>Hook Scripts"]
    H --> I["Register with<br/>Load Balancer"]
```

Blue/Green Deployments
Blue/green deployments provision a completely new set of instances (the "green" environment), deploy the revision to them, run validation, shift traffic from the old instances (the "blue" environment) to the green, and then terminate the blue instances after a configurable wait period. For EC2 deployments, this requires an Auto Scaling group; CodeDeploy creates a new ASG with the same configuration, deploys to the new instances, and updates the load balancer target group.
I use blue/green for every production workload, full stop. Rollback is near-instantaneous: reroute traffic back to the original instances, which are still running the previous version. No redeployment, no downtime, no waiting. Yes, you run double capacity during the deployment window. Worth it every time.
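The termination wait period and green-fleet provisioning behavior are configured on the deployment group. A sketch of the JSON passed via `--blue-green-deployment-configuration` to `aws deploy create-deployment-group` (the values shown are illustrative choices, not defaults):

```json
{
  "terminateBlueInstancesOnDeploymentSuccess": {
    "action": "TERMINATE",
    "terminationWaitTimeInMinutes": 60
  },
  "deploymentReadyOption": {
    "actionOnTimeout": "CONTINUE_DEPLOYMENT"
  },
  "greenFleetProvisioningOption": {
    "action": "COPY_AUTO_SCALING_GROUP"
  }
}
```

Setting `action` to `KEEP_ALIVE` instead of `TERMINATE` retains the blue instances after traffic shifts, trading cost for an even longer rollback window.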
```mermaid
flowchart LR
    A["Provision Green<br/>Instances via ASG"] --> B["Deploy Revision<br/>to Green"]
    B --> C["Run Lifecycle<br/>Hooks on Green"]
    C --> D["Validate Green<br/>Health"]
    D --> E["Shift Traffic<br/>Blue to Green"]
    E --> F["Wait Period<br/>for Observation"]
    F --> G["Terminate Blue<br/>Instances"]
```

For Lambda and ECS, all deployments are inherently blue/green; CodeDeploy creates a new version or task set and shifts traffic according to the deployment configuration. The "in-place" concept does not apply because there are no persistent instances to update.
| Dimension | In-Place | Blue/Green |
|---|---|---|
| Downtime Risk | Yes: each instance is offline during update | No: traffic shifts only after green is healthy |
| Rollback Speed | Slow: requires full redeployment of previous revision | Fast: reroute traffic to original instances (seconds) |
| Cost During Deploy | No additional cost | Double capacity during deployment window |
| Complexity | Low: straightforward file replacement | Medium: requires ASG, load balancer, traffic shifting |
| Compute Platforms | EC2/On-Premises only | EC2/On-Premises, Lambda, ECS |
| Traffic Shifting | Per-instance (deregister/register) | Load balancer rerouting or alias/task set shifting |
| Capacity Requirement | No spare capacity needed | Must have capacity for two full environments |
| Data Migration | Not applicable: same instances, same storage | Must handle shared state (databases, caches) carefully |
The AppSpec File
The AppSpec file (appspec.yml) is the deployment manifest that tells CodeDeploy exactly what to do during a deployment. Its structure varies significantly across the three compute platforms, and getting the format wrong is one of the most common deployment failures.
For EC2/On-Premises, the AppSpec file defines which files from the revision should be copied to which locations on the instance, what permissions to apply, and which scripts to run at each lifecycle hook. It lives at the root of your revision bundle.
For Lambda, the AppSpec file specifies the Lambda function to deploy, the current version and the target version, and any hook functions to run for validation. There are no files to copy. The revision is the AppSpec itself plus a reference to the already-published Lambda function versions.
For ECS, the AppSpec file specifies the ECS task definition, the container name and port that the load balancer routes to, and hook functions for validation. Like Lambda, the revision is primarily the AppSpec plus a reference to the task definition registered separately.
| Section | EC2/On-Premises | Lambda | ECS |
|---|---|---|---|
| version | 0.0 (only valid value) | 0.0 | 0.0 |
| os | linux or windows | Not applicable | Not applicable |
| files | Source-to-destination file mappings | Not applicable | Not applicable |
| permissions | Owner, group, mode for installed files | Not applicable | Not applicable |
| resources | Not applicable | Function name, alias, current version, target version | Task definition ARN, container name, container port |
| hooks | Script-based hooks (path, timeout, runas) | Lambda function ARNs for validation hooks | Lambda function ARNs for validation hooks |
The files section in EC2 deployments maps source paths within your revision archive to destination paths on the instance. For example, mapping source: /config/app.conf to destination: /etc/myapp/ places the file at /etc/myapp/app.conf. The permissions section then lets you set ownership and file modes on the installed files, which is critical for applications that run as non-root users.
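A minimal EC2 AppSpec tying these sections together. The paths, user names, and script names are illustrative, not prescriptive:

```yaml
version: 0.0
os: linux
files:
  - source: /build/app          # path inside the revision archive
    destination: /opt/myapp     # path on the instance
  - source: /config/app.conf
    destination: /etc/myapp     # lands at /etc/myapp/app.conf
permissions:
  - object: /opt/myapp
    owner: appuser
    group: appuser
    mode: 755
hooks:
  ApplicationStop:
    - location: scripts/stop.sh
      timeout: 60
      runas: root
  AfterInstall:
    - location: scripts/configure.sh
      timeout: 120
      runas: root
  ApplicationStart:
    - location: scripts/start.sh
      timeout: 60
      runas: appuser
  ValidateService:
    - location: scripts/health_check.sh
      timeout: 180
      runas: appuser
```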
For Lambda, the resources section is where you specify the function's name, the alias that CodeDeploy will update, the CurrentVersion (the version currently receiving traffic), and the TargetVersion (the version to shift traffic to). CodeDeploy manages the alias weights to implement the traffic-shifting pattern defined by the deployment configuration.
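A Lambda AppSpec is tiny by comparison. The function name, alias, version numbers, and hook function names below are illustrative:

```yaml
version: 0.0
Resources:
  - myFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: "my-function"
        Alias: "live"
        CurrentVersion: "7"
        TargetVersion: "8"
Hooks:
  - BeforeAllowTraffic: "preTrafficValidator"
  - AfterAllowTraffic: "postTrafficValidator"
```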
For ECS, the resources section references the task definition (as a full ARN or <family>:<revision>), the container name that is the target of the load balancer, and the container port. CodeDeploy uses this information to create a replacement task set and configure the load balancer's target groups for traffic shifting.
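An ECS AppSpec follows the same resources-plus-hooks shape. The task definition ARN, account ID, container name, and port are illustrative:

```yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:111111111111:task-definition/my-task:42"
        LoadBalancerInfo:
          ContainerName: "web"
          ContainerPort: 8080
Hooks:
  - AfterAllowTestTraffic: "testTrafficValidator"
```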
Lifecycle Event Hooks
Lifecycle events are the mechanism by which CodeDeploy gives you control over what happens at each stage of a deployment. For EC2/On-Premises, hooks are shell scripts (or PowerShell scripts on Windows) that run on the instance. For Lambda and ECS, hooks are Lambda functions that run externally and validate the deployment state.
EC2 In-Place Hook Order
The in-place deployment lifecycle follows a strict sequence. Some events are scriptable (you provide the script), and some are managed by CodeDeploy internally.
```mermaid
flowchart TD
    A[ApplicationStop] --> B[DownloadBundle]
    B --> C[BeforeInstall]
    C --> D[Install]
    D --> E[AfterInstall]
    E --> F[ApplicationStart]
    F --> G[ValidateService]
    style B fill:#555,color:#fff
    style D fill:#555,color:#fff
```

EC2 Blue/Green Hook Order
Blue/green deployments on EC2 add traffic-management hooks around the core lifecycle. The green instances go through the full install lifecycle, then traffic is blocked to the blue instances and allowed to the green instances.
```mermaid
flowchart TD
    A[BeforeInstall] --> B[Install]
    B --> C[AfterInstall]
    C --> D[ApplicationStart]
    D --> E[ValidateService]
    E --> F[BeforeBlockTraffic]
    F --> G[BlockTraffic]
    G --> H[AfterBlockTraffic]
    H --> I[BeforeAllowTraffic]
    I --> J[AllowTraffic]
    J --> K[AfterAllowTraffic]
    style B fill:#555,color:#fff
    style G fill:#555,color:#fff
    style J fill:#555,color:#fff
```

Complete Hook Reference
The following table covers all lifecycle events across all compute platforms. "Scriptable" means you provide the script or Lambda function; "Managed" means CodeDeploy handles it internally.
| Lifecycle Event | EC2 In-Place | EC2 Blue/Green | Lambda | ECS | Type |
|---|---|---|---|---|---|
| ApplicationStop | Yes | No | No | No | Scriptable |
| DownloadBundle | Yes | Yes | No | No | Managed |
| BeforeInstall | Yes | Yes | No | Yes | Scriptable |
| Install | Yes | Yes | No | Yes | Managed |
| AfterInstall | Yes | Yes | No | Yes | Scriptable |
| ApplicationStart | Yes | Yes | No | No | Scriptable |
| ValidateService | Yes | Yes | No | No | Scriptable |
| BeforeBlockTraffic | No | Yes | No | No | Scriptable |
| BlockTraffic | No | Yes | No | No | Managed |
| AfterBlockTraffic | No | Yes | No | No | Scriptable |
| BeforeAllowTraffic | No | Yes | Yes | Yes | Scriptable |
| AllowTraffic | No | Yes | Yes | Yes | Managed |
| AfterAllowTraffic | No | Yes | Yes | Yes | Scriptable |
| AfterAllowTestTraffic | No | No | No | Yes | Scriptable |
For EC2 hook scripts, each hook entry in the AppSpec file specifies three properties: location (the path to the script within the revision), timeout (maximum execution time in seconds, default 3600), and runas (the OS user to run the script as). A hook script signals success with exit code 0 and failure with any non-zero exit code. When a hook script fails, the deployment fails for that instance, and CodeDeploy proceeds according to the deployment configuration's minimum healthy hosts threshold.
For Lambda and ECS, hook functions are Lambda functions that receive a deployment lifecycle event as input and must call back to CodeDeploy with a PutLifecycleEventHookExecutionStatus API call indicating Succeeded or Failed. If the hook function does not call back within the timeout, CodeDeploy treats it as a failure. This callback model is a common source of bugs. Forgetting the callback causes the deployment to hang until timeout and then fail.
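A sketch of a BeforeAllowTraffic hook function for a Lambda deployment, showing the required callback. The validation logic and function names are placeholders; the event fields (`DeploymentId`, `LifecycleEventHookExecutionId`) and the boto3 callback are the actual contract.

```python
def validate_new_version() -> bool:
    """Placeholder check: invoke the TargetVersion, probe a health endpoint,
    run a smoke test. Hypothetical; replace with real validation."""
    return True

def handler(event, context):
    # CodeDeploy passes these two IDs; both are required in the callback.
    deployment_id = event["DeploymentId"]
    execution_id = event["LifecycleEventHookExecutionId"]

    status = "Succeeded" if validate_new_version() else "Failed"

    import boto3  # deferred so the validation logic above is testable offline
    boto3.client("codedeploy").put_lifecycle_event_hook_execution_status(
        deploymentId=deployment_id,
        lifecycleEventHookExecutionId=execution_id,
        status=status,  # omit this call and the deployment hangs, then fails
    )
    return {"status": status}
```

The entire hook is a no-op to CodeDeploy until `put_lifecycle_event_hook_execution_status` is called; returning a value from the handler is not enough.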
Deployment Configurations
Deployment configurations control the pace and pattern of a deployment. They determine how many instances are updated simultaneously (EC2), or how traffic is shifted between versions (Lambda and ECS).
EC2/On-Premises Configurations
For EC2, the deployment configuration defines the minimum number of healthy instances that must remain in service during the deployment. CodeDeploy deploys to instances in batches, ensuring the healthy count never drops below the threshold.
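The batching rule reduces to simple arithmetic: the number of instances CodeDeploy can take offline at once is the fleet size minus the minimum healthy count. A sketch of that arithmetic (illustrative only, not an AWS API; CodeDeploy special-cases very small fleets):

```python
def max_concurrent_updates(fleet_size: int, minimum_healthy_hosts: int) -> int:
    """Instances CodeDeploy may update simultaneously while honoring the
    minimum healthy hosts threshold. Sketch; small fleets are special-cased."""
    return fleet_size - minimum_healthy_hosts
```

For a 10-instance fleet: OneAtATime implies a minimum of 9 healthy hosts (batch size 1), HalfAtATime a minimum of 5 (batch size 5), and AllAtOnce a minimum of 0 (all 10 at once).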
Lambda and ECS Configurations
For Lambda and ECS, deployment configurations control traffic shifting: the percentage of traffic routed to the new version over time. CodeDeploy offers three patterns: All-at-once (immediate 100% shift), Linear (equal increments at regular intervals), and Canary (a small percentage first, then the remainder after a wait period).
| Configuration | Platform | Behavior | Use Case |
|---|---|---|---|
| CodeDeployDefault.AllAtOnce | EC2 | Deploy to all instances simultaneously; succeed if any instance succeeds | Development, testing |
| CodeDeployDefault.HalfAtATime | EC2 | Deploy to up to half the instances at once; maintain 50% healthy | Staging environments |
| CodeDeployDefault.OneAtATime | EC2 | Deploy to one instance at a time; all but one must remain healthy | Production: maximum safety |
| CodeDeployDefault.LambdaAllAtOnce | Lambda | Shift 100% of traffic immediately | Development, testing |
| CodeDeployDefault.LambdaCanary10Percent5Minutes | Lambda | Shift 10% of traffic, wait 5 minutes, shift remaining 90% | Production: quick validation |
| CodeDeployDefault.LambdaCanary10Percent10Minutes | Lambda | Shift 10% of traffic, wait 10 minutes, shift remaining 90% | Production: extended validation |
| CodeDeployDefault.LambdaCanary10Percent15Minutes | Lambda | Shift 10% of traffic, wait 15 minutes, shift remaining 90% | Production: conservative validation |
| CodeDeployDefault.LambdaLinear10PercentEvery1Minute | Lambda | Shift 10% every minute over 10 minutes | Production: gradual rollout |
| CodeDeployDefault.LambdaLinear10PercentEvery2Minutes | Lambda | Shift 10% every 2 minutes over 20 minutes | Production: slower gradual rollout |
| CodeDeployDefault.LambdaLinear10PercentEvery3Minutes | Lambda | Shift 10% every 3 minutes over 30 minutes | Production: most conservative gradual |
| CodeDeployDefault.ECSAllAtOnce | ECS | Shift 100% of traffic immediately | Development, testing |
| CodeDeployDefault.ECSCanary10Percent5Minutes | ECS | Shift 10% of traffic, wait 5 minutes, shift remaining 90% | Production: quick validation |
| CodeDeployDefault.ECSCanary10Percent15Minutes | ECS | Shift 10% of traffic, wait 15 minutes, shift remaining 90% | Production: extended validation |
| CodeDeployDefault.ECSLinear10PercentEvery1Minute | ECS | Shift 10% every minute over 10 minutes | Production: gradual rollout |
| CodeDeployDefault.ECSLinear10PercentEvery3Minutes | ECS | Shift 10% every 3 minutes over 30 minutes | Production: conservative gradual |
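The built-in linear configurations reduce to a fixed schedule of traffic checkpoints. A quick sketch of that arithmetic (illustrative only, not an AWS API):

```python
def linear_schedule(step_percent: int, interval_minutes: int):
    """Yield (elapsed_minutes, traffic_percent_on_new_version) checkpoints
    for a linear deployment configuration."""
    shifted, elapsed = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        elapsed += interval_minutes
        yield elapsed, shifted

# CodeDeployDefault.LambdaLinear10PercentEvery3Minutes:
# 10% at t=3, 20% at t=6, ..., 100% at t=30 minutes.
schedule = list(linear_schedule(10, 3))
```

The key operational consequence: a linear 10%-every-3-minutes deployment keeps two versions serving production traffic for a full 30 minutes, which is exactly the window your CloudWatch alarms have to catch a regression before the shift completes.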
You can also create custom deployment configurations for any platform. For EC2, you specify a minimumHealthyHosts value as either a count or percentage. For Lambda and ECS, you define custom linear or canary intervals and percentages. Custom configurations are essential when the built-in options do not match your risk tolerance. For example, you might define a canary that shifts 1% for 30 minutes before proceeding, or a linear that shifts 5% every 10 minutes.
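As a sketch, the 1%-for-30-minutes canary could be defined as a custom deployment configuration in CloudFormation (the resource and configuration names are illustrative):

```yaml
UltraConservativeCanary:
  Type: AWS::CodeDeploy::DeploymentConfig
  Properties:
    DeploymentConfigName: Custom.LambdaCanary1Percent30Minutes
    ComputePlatform: Lambda
    TrafficRoutingConfig:
      Type: TimeBasedCanary
      TimeBasedCanary:
        CanaryPercentage: 1
        CanaryInterval: 30   # minutes before the remaining 99% shifts
```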
Rollback Strategies
Rollback is where your deployment architecture proves its worth or falls apart. CodeDeploy provides several rollback mechanisms depending on the deployment type and platform.
Automatic Rollback on Deployment Failure
When any instance or target fails its lifecycle hooks, CodeDeploy can automatically roll back the entire deployment. This is configured at the deployment group level. For in-place deployments, the rollback deploys the last known good revision to all instances, essentially a full new deployment in the reverse direction. For blue/green, the rollback reroutes traffic back to the original (blue) environment without redeploying anything.
Automatic Rollback on CloudWatch Alarm
I configure this rollback trigger on every production deployment group. You associate one or more CloudWatch alarms with the deployment group, typically monitoring error rates, latency percentiles, or custom application health metrics. If any alarm enters the ALARM state during the deployment, CodeDeploy automatically triggers a rollback. This catches issues that individual lifecycle hook validations miss, particularly the subtle regressions that only manifest under real production traffic.
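The alarm association and rollback triggers are both deployment group settings. A sketch of the JSON fragments passed to `aws deploy update-deployment-group` via `--alarm-configuration` and `--auto-rollback-configuration` (the alarm names are illustrative):

```json
{
  "alarmConfiguration": {
    "enabled": true,
    "alarms": [
      { "name": "api-5xx-error-rate" },
      { "name": "api-p99-latency" }
    ]
  },
  "autoRollbackConfiguration": {
    "enabled": true,
    "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }
}
```

Enabling `DEPLOYMENT_STOP_ON_ALARM` without also enabling `DEPLOYMENT_FAILURE` is a common misconfiguration: a hook failure then strands the deployment instead of rolling it back.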
Manual Rollback
You can always manually roll back by creating a new deployment that deploys the previous revision. For blue/green deployments that are still in the termination wait period, you can also manually stop the deployment and reroute traffic back to the original environment.
Rollback speed is everything. Blue/green rollbacks take seconds because the original environment is still running. In-place rollbacks take the same amount of time as the original deployment because they are a full redeployment. That speed difference alone justifies blue/green for production.
| Scenario | Deployment Type | Platform | Rollback Behavior | Rollback Speed |
|---|---|---|---|---|
| Hook failure, blue/green | Blue/green | EC2 | Reroute traffic to original instances | Seconds |
| Hook failure, in-place | In-place | EC2 | Redeploy previous revision to all instances | Minutes (full deployment cycle) |
| CloudWatch alarm, blue/green | Blue/green | EC2 | Reroute traffic to original instances | Seconds |
| CloudWatch alarm, Lambda | Canary/Linear | Lambda | Revert alias to previous version | Seconds |
| CloudWatch alarm, ECS | Canary/Linear | ECS | Reroute traffic to original task set | Seconds |
| Manual stop during blue/green | Blue/green | EC2/ECS | Traffic remains on or reverts to original | Seconds (if original still running) |
Security Architecture
CodeDeploy's security model involves multiple IAM roles and policies working in concert. Get any of these wrong and you will spend an hour staring at permission-denied errors with maddeningly generic messages.
The CodeDeploy service role is an IAM role assumed by the CodeDeploy service itself. It needs permissions to interact with the compute resources that are the targets of deployments: reading Auto Scaling group configurations, modifying load balancer target groups, invoking Lambda functions, and managing ECS task sets. This role is attached to the deployment group.
The EC2 instance profile is the IAM role attached to the EC2 instances that are deployment targets. The instances need permission to read revisions from S3 (or access GitHub), communicate with the CodeDeploy service, and write logs to CloudWatch. Without the correct instance profile, the CodeDeploy agent can poll the service but cannot download the revision.
For Lambda deployments, the CodeDeploy service role needs lambda:GetFunction, lambda:GetAlias, lambda:UpdateAlias, and lambda:InvokeFunction permissions on the target function. The hook functions (which are separate Lambda functions) need their own execution roles with permission to call codedeploy:PutLifecycleEventHookExecutionStatus.
For ECS deployments, the service role needs permissions to manage ECS services, task sets, and task definitions, as well as modify load balancer target groups and listeners. The breadth of required permissions for ECS blue/green is significant.
| Relationship | Source | Target | Key Permissions Required |
|---|---|---|---|
| CodeDeploy to EC2 | CodeDeploy service role | EC2, Auto Scaling | ec2:Describe*, autoscaling:*, elasticloadbalancing:*, tag:GetResources |
| CodeDeploy to S3 | EC2 instance profile | S3 revision bucket | s3:GetObject, s3:GetBucketLocation, s3:ListBucket on the revision bucket |
| CodeDeploy to Lambda | CodeDeploy service role | Lambda function | lambda:GetFunction, lambda:GetAlias, lambda:UpdateAlias, lambda:InvokeFunction |
| CodeDeploy to ECS | CodeDeploy service role | ECS, ELB | ecs:*, elasticloadbalancing:*, iam:PassRole for task execution role |
| CodeDeploy to Auto Scaling | CodeDeploy service role | Auto Scaling groups | autoscaling:CompleteLifecycleAction, autoscaling:CreateAutoScalingGroup, autoscaling:DeleteAutoScalingGroup, autoscaling:UpdateAutoScalingGroup |
A common security mistake is granting overly broad s3:GetObject permissions on the instance profile, allowing instances to read any S3 object in the account. Scope the instance profile's S3 permissions to the specific bucket and prefix where revisions are stored.
Cost Model
CodeDeploy's pricing for EC2 workloads is hard to beat: free. Zero charge for deploying to EC2 instances or on-premises servers, regardless of instance count, deployment frequency, or configuration complexity.
For Lambda and ECS deployments, AWS charges $0.02 per deployment. A "deployment" is a single invocation of CreateDeployment, regardless of how many Lambda function invocations or ECS tasks are involved, or how long the traffic-shifting window lasts.
The real cost of CodeDeploy is in the underlying compute, not the service itself. Blue/green EC2 deployments double your instance count during the deployment window. Lambda deployments may run two concurrent versions, consuming concurrency quota. ECS deployments run two task sets simultaneously.
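The service-charge side of the model fits in a few lines of arithmetic (a sketch using the per-deployment price stated above; compute overhead is deliberately excluded):

```python
def monthly_codedeploy_cost(lambda_deploys: int, ecs_deploys: int,
                            ec2_deploys: int = 0) -> float:
    """CodeDeploy service charge only: EC2/on-premises deployments are free,
    Lambda and ECS deployments are $0.02 per CreateDeployment call."""
    PER_DEPLOYMENT_USD = 0.02
    return round((lambda_deploys + ecs_deploys) * PER_DEPLOYMENT_USD, 2)
```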
| Scenario | Platform | Deployments/Month | CodeDeploy Cost | Compute Overhead |
|---|---|---|---|---|
| 10-instance EC2, 20 deploys/month | EC2 | 20 | $0.00 | Blue/green: ~$0 (minutes of double capacity) |
| 50-instance EC2, 100 deploys/month | EC2 | 100 | $0.00 | Blue/green: minutes of double capacity per deploy |
| 5 Lambda functions, 200 deploys/month | Lambda | 200 | $4.00 | Minimal: concurrent versions share concurrency pool |
| 10 ECS services, 150 deploys/month | ECS | 150 | $3.00 | Double task count during traffic shifting (minutes) |
| Mixed: 20 EC2 + 10 Lambda + 5 ECS, 300 deploys/month | Mixed | 300 | $3.00 (Lambda + ECS only) | Variable by platform |
At these price points, CodeDeploy itself is a rounding error in your AWS bill. The real cost discussion centers on compute overhead during blue/green deployments, the operational burden of maintaining hook scripts and AppSpec files, and (most overlooked) the opportunity cost of not having automated deployments at all.
Common Failure Modes
These are the failure modes I have encountered repeatedly in production. Every one of them has cost teams hours of debugging time.
| Failure Mode | Symptom | Root Cause | Mitigation |
|---|---|---|---|
| Agent not running | Deployment hangs indefinitely; instance shows "Pending" | CodeDeploy agent crashed, was never installed, or cannot reach the CodeDeploy endpoint | Monitor agent status with the CloudWatch agent or Systems Manager; ensure VPC endpoints exist for codedeploy, codedeploy-commands-secure, and s3 |
| AppSpec parse error | Deployment fails immediately with "Invalid AppSpec" | YAML syntax error, wrong version field, missing required section, incorrect indentation | Lint the AppSpec with a YAML linter in CI; deploy every revision to a staging deployment group before production |
| Hook script timeout | Deployment fails after the configured timeout (default 3600s) | Script hangs on a network call, waits for user input, or enters an infinite loop | Set explicit, shorter timeouts per hook; ensure scripts have proper error handling and exit conditions |
| Health check failure | Instance fails ValidateService hook | Application did not start correctly, port not listening, dependency unavailable | Make ValidateService scripts check actual application health (HTTP endpoint, process status, port availability) beyond file existence |
| IAM permission denied | Deployment fails with "AccessDenied" during DownloadBundle or AllowTraffic | Instance profile lacks S3 read permissions, or service role lacks ELB/ASG/Lambda/ECS permissions | Use IAM Access Analyzer to audit roles; test with minimal permissions first, then restrict |
| Auto-rollback loop | Deployment rolls back, next deployment rolls back, cycle repeats | CloudWatch alarm stays in ALARM state from the previous failure; new deployment immediately triggers alarm-based rollback | Reset alarm state before redeploying; add an alarm evaluation period that spans the deployment warmup time |
| AllowTraffic timeout | Blue/green deployment hangs at AllowTraffic stage | Target group health check failing on new instances (wrong health check path, port, or protocol); security group blocking health check traffic | Verify target group health check configuration matches the application; check security groups allow health check traffic from the load balancer |
| Revision not found in S3 | Deployment fails with "Revision does not exist" | S3 key path is wrong, bucket is in a different region, revision was deleted by lifecycle policy, or ETag mismatch | Use aws deploy push to register revisions properly; set S3 lifecycle policies carefully; keep at least N previous revisions |
| Insufficient blue/green capacity | Blue/green deployment fails to provision green instances | ASG launch template references an instance type with no capacity in the AZ, or account EC2 limits reached | Use multiple instance types in the launch template; request limit increases proactively; test deployments in staging first |
| CodeDeploy throttling | API calls return "ThrottlingException"; deployments queued | Too many concurrent deployments, too many GetDeployment polling calls, or deployment history queries at scale | Stagger deployments across deployment groups; use exponential backoff in automation scripts; avoid polling deployment status in tight loops |
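Several of these failure modes (hook timeouts, weak ValidateService checks) are best mitigated in the AppSpec itself. Here is a minimal EC2/on-premises AppSpec sketch with explicit per-hook timeouts; the install path and script names are illustrative, not prescriptive:

```yaml
version: 0.0
os: linux
files:
  - source: /
    destination: /opt/myapp            # assumed install path
hooks:
  ApplicationStop:
    - location: scripts/stop_service.sh
      timeout: 60                      # fail fast instead of waiting out the 3600s default
      runas: root
  AfterInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_service.sh
      timeout: 120
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh  # hit the health endpoint, not just the filesystem
      timeout: 180
      runas: root
```

Tight timeouts turn a hung hook script into a fast, diagnosable failure rather than an hour-long stall.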
Key Architectural Recommendations
These recommendations come from years of operating CodeDeploy across production environments of various scales.
- Always use blue/green deployments for production. The near-instantaneous rollback capability alone justifies the temporary cost of double capacity. In-place deployments are acceptable for development and testing environments, but production workloads should never tolerate the slow rollback cycle of in-place.
- Use lifecycle hooks for smoke tests beyond file installation. The `ValidateService` hook (EC2) and `AfterAllowTestTraffic` hook (ECS) exist specifically for running real health checks against the newly deployed application. Hit the health endpoint, verify the version response, run a subset of integration tests. A deployment that installs files correctly but leaves the application in a broken state is worse than a deployment that fails outright.
- Configure CloudWatch alarm-based auto-rollback on every production deployment group. This is the safety net that catches regressions your smoke tests miss. Monitor error rates (5xx responses), latency (p99), and any custom application health metrics. An alarm that fires within the first 10 minutes of a deployment and triggers an automatic rollback has saved me from more incidents than I can count.
- Keep hook scripts idempotent. Hook scripts can execute multiple times during rollbacks and redeployments. If your `AfterInstall` script creates a database table, it must handle the case where the table already exists. If it starts a background process, it must handle the case where the process is already running. Non-idempotent hooks cause cascading failures during rollback.
- Use separate deployment groups per environment, not separate applications. A single CodeDeploy application with deployment groups for dev, staging, and production keeps your revision history unified and makes it easy to promote the same revision through environments. Using separate applications fragments your deployment history and makes it harder to track which revision is where.
- Use linear traffic shifting for Lambda and ECS production deployments. Canary is tempting because it validates quickly, but linear shifting gives you a gradual ramp that catches load-dependent issues that a 10% canary might miss. `Linear10PercentEvery3Minutes` is my default for production: 30 minutes of gradual rollout with ample time for alarms to fire.
- Monitor CodeDeploy agent status on EC2 instances continuously. A dead agent means the instance silently drops out of the deployment pool. Use Systems Manager Run Command to periodically check agent status, or deploy a CloudWatch agent metric that reports agent health. An instance without a running CodeDeploy agent is an instance that will never receive deployments.
- Use CodeDeploy for Lambda deployments over in-console alias flipping. Manual alias updates in the Lambda console provide zero rollback capability, no traffic shifting, and no validation hooks. CodeDeploy's Lambda deployment model adds canary/linear traffic shifting, automatic rollback on CloudWatch alarms, and lifecycle hooks for validation, all for $0.02 per deployment. There is no reason to deploy Lambda functions any other way in production.
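The idempotency rule is easiest to enforce with small helper functions inside your hook scripts. A minimal sketch, assuming a bash hook environment (function and file names here are illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical AfterInstall fragment: every step tolerates re-execution,
# because CodeDeploy re-runs hooks during rollbacks and redeployments.
set -euo pipefail

# Append a line to a file only if it is not already present, so running
# the hook twice leaves the file unchanged.
ensure_line() {
  local line="$1" file="$2"
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Create a directory only if missing (mkdir -p is inherently idempotent).
ensure_dir() {
  mkdir -p "$1"
}
```

The same check-before-act pattern extends to the other cases the rule names: `CREATE TABLE IF NOT EXISTS` for schema changes, and a process-status check before starting a background service.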
For related deployment pipeline architecture, see AWS CodeBuild: An Architecture Deep-Dive for build-stage patterns and AWS CodePipeline: An Architecture Deep-Dive for end-to-end pipeline orchestration.
Additional Resources
- AWS CodeDeploy User Guide: Comprehensive service documentation
- AppSpec File Reference: Complete AppSpec syntax for all three compute platforms
- Deployment Configurations: Built-in and custom configuration reference
- Lifecycle Event Hooks: Hook reference by compute platform and deployment type
- Tutorial: Deploy a Lambda Function with CodeDeploy: Lambda-specific deployment walkthrough
- Tutorial: Deploy an ECS Service with CodeDeploy: ECS blue/green deployment walkthrough
- CodeDeploy Agent Reference: Agent installation, configuration, and troubleshooting
- Working with Deployment Groups: Deployment group configuration and management
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

