About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
Deployment automation is the single most impactful investment a team can make in operational reliability. Manual deployments (SSH into a box, pull the latest code, restart the service, pray) are slow, and they are the root cause of a disproportionate number of production incidents. Every manual step is an opportunity for human error: the wrong branch, a missed configuration file, a forgotten service restart, a deployment to the wrong environment. Having spent years building and operating deployment pipelines across hundreds of EC2 instances, Lambda functions, and ECS services, I have watched CodeDeploy evolve from a simple EC2 deployment tool into the foundational deployment engine that underpins most serious AWS CI/CD architectures. It lacks glamour and thorough documentation of its deeper behaviors, yet it is the service that actually puts your code onto your compute.
This article is an architecture reference for engineers and architects who need to understand how CodeDeploy works under the hood: the component hierarchy, the lifecycle event model, the deployment strategies per compute platform, the failure modes that will bite you in production, and the architectural patterns that keep deployments safe at scale.
What CodeDeploy Actually Is
AWS CodeDeploy is a fully managed deployment service that automates software deployments to Amazon EC2 instances, on-premises servers, AWS Lambda functions, and Amazon ECS services. It coordinates the entire deployment process, from pulling your application revision out of S3 or GitHub, to stopping your running application, installing the new files, restarting services, running validation scripts, and shifting traffic, all without requiring you to SSH into anything or manually orchestrate any step.
CodeDeploy is deliberately unopinionated about what you deploy. Java WAR file, Node.js bundle, compiled Go binary, static assets, Docker image reference: it does not care. No imposed framework, no required project structure, no mandated build system. You tell it what files to put where (via the AppSpec file), what scripts to run at each lifecycle stage, and how to shift traffic. Everything else is your business.
This generality drives most of the confusion around the service. Elastic Beanstalk provides a complete platform abstraction. ECS service updates are tightly coupled to the container orchestration model. CodeDeploy is neither. It is a deployment primitive, a building block you compose into your own deployment architecture.
| Dimension | CodeDeploy | Elastic Beanstalk | ECS Service Update | CloudFormation Deploy | Manual (SSH/SCP) |
|---|---|---|---|---|---|
| Compute Targets | EC2, on-premises, Lambda, ECS | EC2 (managed), Docker on EC2 | ECS tasks only | Any CloudFormation resource | Any server you can reach |
| Deployment Strategies | In-place, blue/green, canary, linear | Rolling, immutable, blue/green | Rolling, blue/green | Stack update (replacement or update) | Whatever you script |
| Rollback | Automatic (failure or alarm), manual | Automatic, environment swap | Circuit breaker, manual | Stack rollback | Manual redeploy |
| Health Monitoring | Lifecycle hooks, CloudWatch alarms | Enhanced health reporting | ELB health checks, container health | Stack events, drift detection | You watching logs |
| Agent Required | Yes (EC2/on-premises only) | No (managed by Beanstalk) | No | No | No |
| Cost | Free for EC2; $0.02/deployment for Lambda/ECS | No additional charge | No additional charge | No additional charge | Your time |
| Complexity | Medium: AppSpec + hooks | Low: platform-managed | Low: task definition update | Medium: template authoring | High: entirely manual |
| Flexibility | High: any app, any language | Medium: supported platforms only | Low: containers only | High: infrastructure + app | Unlimited: but unmanageable |
Architecture Internals
CodeDeploy is built around a hierarchy of concepts that maps directly to how deployments are organized and executed. You need to internalize this hierarchy before you can design deployment pipelines that scale across multiple environments, regions, and teams.
At the top level, you create an Application, which is a logical container that groups everything related to deploying a single application. Within each application, you define Deployment Groups, which represent the target environments: your staging EC2 fleet, your production Lambda function, your ECS service in us-east-1. Each deployment group specifies the target compute, the deployment configuration to use, and optional triggers, alarms, and rollback settings. A Deployment Configuration controls the mechanics of how the deployment proceeds: all at once, one at a time, a percentage at a time, or a custom traffic-shifting schedule. A Revision is the actual artifact being deployed: an S3 bundle, a GitHub commit, or an AppSpec-only definition for Lambda and ECS. When you create a Deployment, CodeDeploy combines a deployment group with a revision and executes the deployment according to the deployment configuration.
```mermaid
flowchart TD
    A[Application] --> DG1["Deployment Group<br/>e.g., Production EC2"]
    A --> DG2["Deployment Group<br/>e.g., Staging EC2"]
    A --> DG3["Deployment Group<br/>e.g., Lambda Prod"]
    DG1 --> DC["Deployment<br/>Configuration"]
    DG1 --> REV["Revision<br/>S3 / GitHub"]
    DC --> DEP[Deployment]
    REV --> DEP
    DEP --> I1[Instance 1]
    DEP --> I2[Instance 2]
    DEP --> I3[Instance N]
    DG3 --> DC2["Deployment<br/>Configuration"]
    DG3 --> REV2["Revision<br/>AppSpec YAML"]
    DC2 --> DEP2[Deployment]
    REV2 --> DEP2
    DEP2 --> LF["Lambda Function<br/>Version + Alias"]
```

The CodeDeploy Agent
For EC2 and on-premises deployments, the CodeDeploy agent is a mandatory component running on every target instance. The agent is a Ruby-based process that polls the CodeDeploy service every 15 seconds, asking "do you have a deployment for me?" When it receives a deployment command, the agent downloads the revision from S3 or GitHub, parses the AppSpec file, and executes the lifecycle event hooks in order. The agent manages its own log files (found at /var/log/aws/codedeploy-agent/ on Linux and C:\ProgramData\Amazon\CodeDeploy\log\ on Windows), which are your first stop when debugging deployment failures.
The polling model is important to understand. CodeDeploy does not push deployments to instances; instances pull them. This means network connectivity from the instance to the CodeDeploy service endpoint and to S3 (for revision download) is a hard requirement. In VPC environments without internet access, you need VPC endpoints for codedeploy (recent agent versions also require the codedeploy-commands-secure endpoint) and a gateway endpoint for s3. A missing endpoint is one of the most common reasons deployments hang indefinitely with the agent showing no activity.
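In an isolated subnet, the required endpoints might be declared as follows in CloudFormation. This is a sketch: the VPC, subnet, and route table references (`AppVpc`, `PrivateSubnet`, `PrivateRouteTable`) are illustrative names, not part of any real template.

```yaml
CodeDeployEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref AppVpc
    VpcEndpointType: Interface
    ServiceName: !Sub com.amazonaws.${AWS::Region}.codedeploy
    SubnetIds: [!Ref PrivateSubnet]
    PrivateDnsEnabled: true

# Recent agent versions poll a separate secure commands endpoint
CodeDeployCommandsEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref AppVpc
    VpcEndpointType: Interface
    ServiceName: !Sub com.amazonaws.${AWS::Region}.codedeploy-commands-secure
    SubnetIds: [!Ref PrivateSubnet]
    PrivateDnsEnabled: true

# Revision downloads reach S3 through a gateway endpoint
S3Endpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref AppVpc
    VpcEndpointType: Gateway
    ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
    RouteTableIds: [!Ref PrivateRouteTable]
```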
Lambda and ECS deployments do not use the agent at all. CodeDeploy interacts directly with the Lambda and ECS APIs to shift traffic between function versions or task sets.
| Concept | Definition | Scope |
|---|---|---|
| Application | Logical container for deployment groups and revisions | Per-account, per-region |
| Deployment Group | Set of target instances, Lambda function, or ECS service plus deployment settings | Per-application |
| Deployment Configuration | Rules governing how many targets are updated simultaneously and how traffic shifts | Per-deployment group or per-deployment |
| Revision | The application artifact: S3 bundle (zip/tar), GitHub commit, or AppSpec-only definition | Per-application, versioned in S3 |
| Deployment | A single execution of deploying a specific revision to a specific deployment group | Unique ID, tracks status per target |
| Instance (or Target) | An individual EC2 instance, Lambda function version, or ECS task set receiving the deployment | Per-deployment |
The Three Compute Platforms
CodeDeploy supports three fundamentally different compute platforms, and the deployment model differs so significantly across them that they are practically three different services sharing a control plane. The compute platform you choose determines the AppSpec format, the available lifecycle hooks, the deployment types, and the rollback mechanics.
EC2/On-Premises is the original and most flexible platform. Deployments work by copying files to instances and running scripts at defined lifecycle stages. You have full control over what happens at every step: stop the app, update files, run database migrations, start the app, validate health. The trade-off is complexity: you write and maintain all the hook scripts, and you are responsible for ensuring they are idempotent and handle failure gracefully.
Lambda deployments operate on a completely different model. There are no files to copy and no instances to manage. Instead, CodeDeploy shifts traffic between two versions of a Lambda function by updating an alias. The deployment configuration controls whether traffic shifts all at once, linearly over time, or in a canary pattern. Hook functions (which are themselves Lambda functions) run at defined points to validate the new version before and after traffic shifts.
ECS deployments are conceptually similar to Lambda deployments. CodeDeploy manages traffic shifting between two ECS task sets (the "blue" original and the "green" replacement) behind a load balancer. The deployment creates a new task set running the updated task definition, optionally routes test traffic to it, and then shifts production traffic according to the deployment configuration.
| Dimension | EC2/On-Premises | Lambda | ECS |
|---|---|---|---|
| Deployment Types | In-place, blue/green | Canary, linear, all-at-once | Canary, linear, all-at-once |
| AppSpec Format | YAML: files, permissions, hooks | YAML: resources, hooks | YAML: resources, hooks |
| Agent Required | Yes | No | No |
| Hooks Available | 10+ lifecycle events | BeforeAllowTraffic, AfterAllowTraffic | BeforeInstall, AfterInstall, AfterAllowTestTraffic, BeforeAllowTraffic, AfterAllowTraffic |
| Traffic Shifting | Via load balancer (blue/green only) | Lambda alias weighted routing | ECS task set + ALB/NLB listener rules |
| Rollback Mechanism | Redeploy previous revision (in-place) or reroute traffic (blue/green) | Revert alias to previous version | Reroute traffic to original task set |
| Health Checks | Hook scripts, CloudWatch alarms | Hook Lambda functions, CloudWatch alarms | ELB health checks, hook functions, CloudWatch alarms |
| Blue/Green Support | Yes: provision new ASG or use existing | All deployments are effectively blue/green | All deployments are effectively blue/green |
| Revision Storage | S3 bucket or GitHub repository | AppSpec in S3 or inline | AppSpec in S3 or inline |
| IAM Model | Service role + instance profile | Service role + function execution role | Service role + task execution role + task role |
| Pricing | Free | $0.02 per deployment | $0.02 per deployment |
| Primary Use Case | Traditional server-based apps, legacy migrations | Serverless function version management | Container workload updates |
Deployment Types
CodeDeploy offers two fundamental deployment types for EC2/On-Premises, and a traffic-shifting model for Lambda and ECS that is conceptually always blue/green.
In-Place Deployments
In-place deployments update the application on existing instances without provisioning new ones. The CodeDeploy agent on each instance stops the running application, pulls the new revision, installs the updated files, runs configuration scripts, starts the application, and validates health, all on the same instance. During the update, the instance is typically deregistered from the load balancer to avoid serving requests while the application is down.
In-place deployments are simple and cheap. No spare capacity, no provisioning new instances, no complexity. The trade-off is risk. If the new revision has a bug, rolling back means a full redeployment of the previous revision, which takes just as long as the original deployment. Each instance also goes offline during its update window, though deploying one at a time behind a load balancer mitigates this.
```mermaid
flowchart LR
    A["Deregister from<br/>Load Balancer"] --> B["ApplicationStop<br/>Hook Scripts"]
    B --> C["Download<br/>Revision"]
    C --> D["BeforeInstall<br/>Hook Scripts"]
    D --> E["Install Files<br/>to Disk"]
    E --> F["AfterInstall<br/>Hook Scripts"]
    F --> G["ApplicationStart<br/>Hook Scripts"]
    G --> H["ValidateService<br/>Hook Scripts"]
    H --> I["Register with<br/>Load Balancer"]
```

Blue/Green Deployments
Blue/green deployments provision a completely new set of instances (the "green" environment), deploy the revision to them, run validation, shift traffic from the old instances (the "blue" environment) to the green, and then terminate the blue instances after a configurable wait period. For EC2 deployments, this requires an Auto Scaling group; CodeDeploy creates a new ASG with the same configuration, deploys to the new instances, and updates the load balancer target group.
I use blue/green for every production workload, full stop. Rollback is near-instantaneous: reroute traffic back to the original instances, which are still running the previous version. No redeployment, no downtime, no waiting. Yes, you run double capacity during the deployment window. Worth it every time.
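The termination wait period and green-fleet provisioning behavior are configured on the deployment group. A sketch of the JSON passed via `--blue-green-deployment-configuration` to `aws deploy create-deployment-group` (the values shown are illustrative choices, not defaults):

```json
{
  "terminateBlueInstancesOnDeploymentSuccess": {
    "action": "TERMINATE",
    "terminationWaitTimeInMinutes": 60
  },
  "deploymentReadyOption": {
    "actionOnTimeout": "CONTINUE_DEPLOYMENT"
  },
  "greenFleetProvisioningOption": {
    "action": "COPY_AUTO_SCALING_GROUP"
  }
}
```

Setting `action` to `KEEP_ALIVE` instead of `TERMINATE` retains the blue instances after traffic shifts, trading cost for an even longer rollback window.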
```mermaid
flowchart LR
    A["Provision Green<br/>Instances via ASG"] --> B["Deploy Revision<br/>to Green"]
    B --> C["Run Lifecycle<br/>Hooks on Green"]
    C --> D["Validate Green<br/>Health"]
    D --> E["Shift Traffic<br/>Blue to Green"]
    E --> F["Wait Period<br/>for Observation"]
    F --> G["Terminate Blue<br/>Instances"]
```

For Lambda and ECS, all deployments are inherently blue/green; CodeDeploy creates a new version or task set and shifts traffic according to the deployment configuration. The "in-place" concept does not apply because there are no persistent instances to update.
| Dimension | In-Place | Blue/Green |
|---|---|---|
| Downtime Risk | Yes: each instance is offline during update | No: traffic shifts only after green is healthy |
| Rollback Speed | Slow: requires full redeployment of previous revision | Fast: reroute traffic to original instances (seconds) |
| Cost During Deploy | No additional cost | Double capacity during deployment window |
| Complexity | Low: straightforward file replacement | Medium: requires ASG, load balancer, traffic shifting |
| Compute Platforms | EC2/On-Premises only | EC2/On-Premises, Lambda, ECS |
| Traffic Shifting | Per-instance (deregister/register) | Load balancer rerouting or alias/task set shifting |
| Capacity Requirement | No spare capacity needed | Must have capacity for two full environments |
| Data Migration | Not applicable: same instances, same storage | Must handle shared state (databases, caches) carefully |
The AppSpec File
The AppSpec file (appspec.yml) is the deployment manifest that tells CodeDeploy exactly what to do during a deployment. Its structure varies significantly across the three compute platforms, and getting the format wrong is one of the most common deployment failures.
For EC2/On-Premises, the AppSpec file defines which files from the revision should be copied to which locations on the instance, what permissions to apply, and which scripts to run at each lifecycle hook. It lives at the root of your revision bundle.
For Lambda, the AppSpec file specifies the Lambda function to deploy, the current version and the target version, and any hook functions to run for validation. There are no files to copy. The revision is the AppSpec itself plus a reference to the already-published Lambda function versions.
For ECS, the AppSpec file specifies the ECS task definition, the container name and port that the load balancer routes to, and hook functions for validation. Like Lambda, the revision is primarily the AppSpec plus a reference to the task definition registered separately.
| Section | EC2/On-Premises | Lambda | ECS |
|---|---|---|---|
| version | 0.0 (only valid value) | 0.0 | 0.0 |
| os | linux or windows | Not applicable | Not applicable |
| files | Source-to-destination file mappings | Not applicable | Not applicable |
| permissions | Owner, group, mode for installed files | Not applicable | Not applicable |
| resources | Not applicable | Function name, alias, current version, target version | Task definition ARN, container name, container port |
| hooks | Script-based hooks (path, timeout, runas) | Lambda function ARNs for validation hooks | Lambda function ARNs for validation hooks |
The files section in EC2 deployments maps source paths within your revision archive to destination paths on the instance. For example, mapping source: /config/app.conf to destination: /etc/myapp/ places the file at /etc/myapp/app.conf. The permissions section then lets you set ownership and file modes on the installed files, which is critical for applications that run as non-root users.
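A minimal EC2 AppSpec tying these sections together. The paths, user names, and script names are illustrative, not prescriptive:

```yaml
version: 0.0
os: linux
files:
  - source: /build/app          # path inside the revision archive
    destination: /opt/myapp     # path on the instance
  - source: /config/app.conf
    destination: /etc/myapp     # lands at /etc/myapp/app.conf
permissions:
  - object: /opt/myapp
    owner: appuser
    group: appuser
    mode: 755
hooks:
  ApplicationStop:
    - location: scripts/stop.sh
      timeout: 60
      runas: root
  AfterInstall:
    - location: scripts/configure.sh
      timeout: 120
      runas: root
  ApplicationStart:
    - location: scripts/start.sh
      timeout: 60
      runas: appuser
  ValidateService:
    - location: scripts/health_check.sh
      timeout: 180
      runas: appuser
```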
For Lambda, the resources section is where you specify the function's name, the alias that CodeDeploy will update, the CurrentVersion (the version currently receiving traffic), and the TargetVersion (the version to shift traffic to). CodeDeploy manages the alias weights to implement the traffic-shifting pattern defined by the deployment configuration.
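A Lambda AppSpec is tiny by comparison. The function name, alias, version numbers, and hook function names below are illustrative:

```yaml
version: 0.0
Resources:
  - myFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: "my-function"
        Alias: "live"
        CurrentVersion: "7"
        TargetVersion: "8"
Hooks:
  - BeforeAllowTraffic: "preTrafficValidator"
  - AfterAllowTraffic: "postTrafficValidator"
```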
For ECS, the resources section references the task definition (as a full ARN or <family>:<revision>), the container name that is the target of the load balancer, and the container port. CodeDeploy uses this information to create a replacement task set and configure the load balancer's target groups for traffic shifting.
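An ECS AppSpec follows the same resources-plus-hooks shape. The task definition ARN, account ID, container name, and port are illustrative:

```yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:111111111111:task-definition/my-task:42"
        LoadBalancerInfo:
          ContainerName: "web"
          ContainerPort: 8080
Hooks:
  - AfterAllowTestTraffic: "testTrafficValidator"
```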
Lifecycle Event Hooks
Lifecycle events are the mechanism by which CodeDeploy gives you control over what happens at each stage of a deployment. For EC2/On-Premises, hooks are shell scripts (or PowerShell scripts on Windows) that run on the instance. For Lambda and ECS, hooks are Lambda functions that run externally and validate the deployment state.
EC2 In-Place Hook Order
The in-place deployment lifecycle follows a strict sequence. Some events are scriptable (you provide the script), and some are managed by CodeDeploy internally.
```mermaid
flowchart TD
    A[ApplicationStop] --> B[DownloadBundle]
    B --> C[BeforeInstall]
    C --> D[Install]
    D --> E[AfterInstall]
    E --> F[ApplicationStart]
    F --> G[ValidateService]
    style B fill:#555,color:#fff
    style D fill:#555,color:#fff
```

EC2 Blue/Green Hook Order
Blue/green deployments on EC2 add traffic-management hooks around the core lifecycle. The green instances go through the full install lifecycle, then traffic is blocked to the blue instances and allowed to the green instances.
```mermaid
flowchart TD
    A[BeforeInstall] --> B[Install]
    B --> C[AfterInstall]
    C --> D[ApplicationStart]
    D --> E[ValidateService]
    E --> F[BeforeBlockTraffic]
    F --> G[BlockTraffic]
    G --> H[AfterBlockTraffic]
    H --> I[BeforeAllowTraffic]
    I --> J[AllowTraffic]
    J --> K[AfterAllowTraffic]
    style B fill:#555,color:#fff
    style G fill:#555,color:#fff
    style J fill:#555,color:#fff
```

Complete Hook Reference
The following table covers all lifecycle events across all compute platforms. "Scriptable" means you provide the script or Lambda function; "Managed" means CodeDeploy handles it internally.
| Lifecycle Event | EC2 In-Place | EC2 Blue/Green | Lambda | ECS | Type |
|---|---|---|---|---|---|
| ApplicationStop | Yes | No | No | No | Scriptable |
| DownloadBundle | Yes | Yes | No | No | Managed |
| BeforeInstall | Yes | Yes | No | Yes | Scriptable |
| Install | Yes | Yes | No | Yes | Managed |
| AfterInstall | Yes | Yes | No | Yes | Scriptable |
| ApplicationStart | Yes | Yes | No | No | Scriptable |
| ValidateService | Yes | Yes | No | No | Scriptable |
| BeforeBlockTraffic | No | Yes | No | No | Scriptable |
| BlockTraffic | No | Yes | No | No | Managed |
| AfterBlockTraffic | No | Yes | No | No | Scriptable |
| BeforeAllowTraffic | No | Yes | Yes | Yes | Scriptable |
| AllowTraffic | No | Yes | Yes | Yes | Managed |
| AfterAllowTraffic | No | Yes | Yes | Yes | Scriptable |
| AfterAllowTestTraffic | No | No | No | Yes | Scriptable |
For EC2 hook scripts, each hook entry in the AppSpec file specifies three properties: location (the path to the script within the revision), timeout (maximum execution time in seconds, default 3600), and runas (the OS user to run the script as). A hook script signals success with exit code 0 and failure with any non-zero exit code. When a hook script fails, the deployment fails for that instance, and CodeDeploy proceeds according to the deployment configuration's minimum healthy hosts threshold.
For Lambda and ECS, hook functions are Lambda functions that receive a deployment lifecycle event as input and must call back to CodeDeploy with a PutLifecycleEventHookExecutionStatus API call indicating Succeeded or Failed. If the hook function does not call back within the timeout, CodeDeploy treats it as a failure. This callback model is a common source of bugs. Forgetting the callback causes the deployment to hang until timeout and then fail.
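A sketch of a BeforeAllowTraffic hook function for a Lambda deployment, showing the required callback. The validation logic and function names are placeholders; the event fields (`DeploymentId`, `LifecycleEventHookExecutionId`) and the boto3 callback are the actual contract.

```python
def validate_new_version() -> bool:
    """Placeholder check: invoke the TargetVersion, probe a health endpoint,
    run a smoke test. Hypothetical; replace with real validation."""
    return True

def handler(event, context):
    # CodeDeploy passes these two IDs; both are required in the callback.
    deployment_id = event["DeploymentId"]
    execution_id = event["LifecycleEventHookExecutionId"]

    status = "Succeeded" if validate_new_version() else "Failed"

    import boto3  # deferred so the validation logic above is testable offline
    boto3.client("codedeploy").put_lifecycle_event_hook_execution_status(
        deploymentId=deployment_id,
        lifecycleEventHookExecutionId=execution_id,
        status=status,  # omit this call and the deployment hangs, then fails
    )
    return {"status": status}
```

The entire hook is a no-op to CodeDeploy until `put_lifecycle_event_hook_execution_status` is called; returning a value from the handler is not enough.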
Deployment Configurations
Deployment configurations control the pace and pattern of a deployment. They determine how many instances are updated simultaneously (EC2), or how traffic is shifted between versions (Lambda and ECS).
EC2/On-Premises Configurations
For EC2, the deployment configuration defines the minimum number of healthy instances that must remain in service during the deployment. CodeDeploy deploys to instances in batches, ensuring the healthy count never drops below the threshold.
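The batching rule reduces to simple arithmetic: the number of instances CodeDeploy can take offline at once is the fleet size minus the minimum healthy count. A sketch of that arithmetic (illustrative only, not an AWS API; CodeDeploy special-cases very small fleets):

```python
def max_concurrent_updates(fleet_size: int, minimum_healthy_hosts: int) -> int:
    """Instances CodeDeploy may update simultaneously while honoring the
    minimum healthy hosts threshold. Sketch; small fleets are special-cased."""
    return fleet_size - minimum_healthy_hosts
```

For a 10-instance fleet: OneAtATime implies a minimum of 9 healthy hosts (batch size 1), HalfAtATime a minimum of 5 (batch size 5), and AllAtOnce a minimum of 0 (all 10 at once).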
Lambda and ECS Configurations
For Lambda and ECS, deployment configurations control traffic shifting: the percentage of traffic routed to the new version over time. CodeDeploy offers three patterns: All-at-once (immediate 100% shift), Linear (equal increments at regular intervals), and Canary (a small percentage first, then the remainder after a wait period).
| Configuration | Platform | Behavior | Use Case |
|---|---|---|---|
| CodeDeployDefault.AllAtOnce | EC2 | Deploy to all instances simultaneously; succeed if any instance succeeds | Development, testing |
| CodeDeployDefault.HalfAtATime | EC2 | Deploy to up to half the instances at once; maintain 50% healthy | Staging environments |
| CodeDeployDefault.OneAtATime | EC2 | Deploy to one instance at a time; all but one must remain healthy | Production: maximum safety |
| CodeDeployDefault.LambdaAllAtOnce | Lambda | Shift 100% of traffic immediately | Development, testing |
| CodeDeployDefault.LambdaCanary10Percent5Minutes | Lambda | Shift 10% of traffic, wait 5 minutes, shift remaining 90% | Production: quick validation |
| CodeDeployDefault.LambdaCanary10Percent10Minutes | Lambda | Shift 10% of traffic, wait 10 minutes, shift remaining 90% | Production: extended validation |
| CodeDeployDefault.LambdaCanary10Percent15Minutes | Lambda | Shift 10% of traffic, wait 15 minutes, shift remaining 90% | Production: conservative validation |
| CodeDeployDefault.LambdaLinear10PercentEvery1Minute | Lambda | Shift 10% every minute over 10 minutes | Production: gradual rollout |
| CodeDeployDefault.LambdaLinear10PercentEvery2Minutes | Lambda | Shift 10% every 2 minutes over 20 minutes | Production: slower gradual rollout |
| CodeDeployDefault.LambdaLinear10PercentEvery3Minutes | Lambda | Shift 10% every 3 minutes over 30 minutes | Production: most conservative gradual |
| CodeDeployDefault.ECSAllAtOnce | ECS | Shift 100% of traffic immediately | Development, testing |
| CodeDeployDefault.ECSCanary10Percent5Minutes | ECS | Shift 10% of traffic, wait 5 minutes, shift remaining 90% | Production: quick validation |
| CodeDeployDefault.ECSCanary10Percent15Minutes | ECS | Shift 10% of traffic, wait 15 minutes, shift remaining 90% | Production: extended validation |
| CodeDeployDefault.ECSLinear10PercentEvery1Minute | ECS | Shift 10% every minute over 10 minutes | Production: gradual rollout |
| CodeDeployDefault.ECSLinear10PercentEvery3Minutes | ECS | Shift 10% every 3 minutes over 30 minutes | Production: conservative gradual |
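The built-in linear configurations reduce to a fixed schedule of traffic checkpoints. A quick sketch of that arithmetic (illustrative only, not an AWS API):

```python
def linear_schedule(step_percent: int, interval_minutes: int):
    """Yield (elapsed_minutes, traffic_percent_on_new_version) checkpoints
    for a linear deployment configuration."""
    shifted, elapsed = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        elapsed += interval_minutes
        yield elapsed, shifted

# CodeDeployDefault.LambdaLinear10PercentEvery3Minutes:
# 10% at t=3, 20% at t=6, ..., 100% at t=30 minutes.
schedule = list(linear_schedule(10, 3))
```

The key operational consequence: a linear 10%-every-3-minutes deployment keeps two versions serving production traffic for a full 30 minutes, which is exactly the window your CloudWatch alarms have to catch a regression before the shift completes.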
You can also create custom deployment configurations for any platform. For EC2, you specify a minimumHealthyHosts value as either a count or percentage. For Lambda and ECS, you define custom linear or canary intervals and percentages. Custom configurations are essential when the built-in options do not match your risk tolerance. For example, you might define a canary that shifts 1% for 30 minutes before proceeding, or a linear that shifts 5% every 10 minutes.
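As a sketch, the 1%-for-30-minutes canary could be defined as a custom deployment configuration in CloudFormation (the resource and configuration names are illustrative):

```yaml
UltraConservativeCanary:
  Type: AWS::CodeDeploy::DeploymentConfig
  Properties:
    DeploymentConfigName: Custom.LambdaCanary1Percent30Minutes
    ComputePlatform: Lambda
    TrafficRoutingConfig:
      Type: TimeBasedCanary
      TimeBasedCanary:
        CanaryPercentage: 1
        CanaryInterval: 30   # minutes before the remaining 99% shifts
```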
Rollback Strategies
Rollback is where your deployment architecture proves its worth or falls apart. CodeDeploy provides several rollback mechanisms depending on the deployment type and platform.
Automatic Rollback on Deployment Failure
When any instance or target fails its lifecycle hooks, CodeDeploy can automatically roll back the entire deployment. This is configured at the deployment group level. For in-place deployments, the rollback deploys the last known good revision to all instances, essentially a full new deployment in the reverse direction. For blue/green, the rollback reroutes traffic back to the original (blue) environment without redeploying anything.
Automatic Rollback on CloudWatch Alarm
I configure this rollback trigger on every production deployment group. You associate one or more CloudWatch alarms with the deployment group, typically monitoring error rates, latency percentiles, or custom application health metrics. If any alarm enters the ALARM state during the deployment, CodeDeploy automatically triggers a rollback. This catches issues that individual lifecycle hook validations miss, particularly the subtle regressions that only manifest under real production traffic.
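The alarm association and rollback triggers are both deployment group settings. A sketch of the JSON fragments passed to `aws deploy update-deployment-group` via `--alarm-configuration` and `--auto-rollback-configuration` (the alarm names are illustrative):

```json
{
  "alarmConfiguration": {
    "enabled": true,
    "alarms": [
      { "name": "api-5xx-error-rate" },
      { "name": "api-p99-latency" }
    ]
  },
  "autoRollbackConfiguration": {
    "enabled": true,
    "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }
}
```

Enabling `DEPLOYMENT_STOP_ON_ALARM` without also enabling `DEPLOYMENT_FAILURE` is a common misconfiguration: a hook failure then strands the deployment instead of rolling it back.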
Manual Rollback
You can always manually roll back by creating a new deployment that deploys the previous revision. For blue/green deployments that are still in the termination wait period, you can also manually stop the deployment and reroute traffic back to the original environment.
Rollback speed is everything. Blue/green rollbacks take seconds because the original environment is still running. In-place rollbacks take the same amount of time as the original deployment because they are a full redeployment. That speed difference alone justifies blue/green for production.
| Scenario | Deployment Type | Platform | Rollback Behavior | Rollback Speed |
|---|---|---|---|---|
| Hook failure, blue/green | Blue/green | EC2 | Reroute traffic to original instances | Seconds |
| Hook failure, in-place | In-place | EC2 | Redeploy previous revision to all instances | Minutes (full deployment cycle) |
| CloudWatch alarm, blue/green | Blue/green | EC2 | Reroute traffic to original instances | Seconds |
| CloudWatch alarm, Lambda | Canary/Linear | Lambda | Revert alias to previous version | Seconds |
| CloudWatch alarm, ECS | Canary/Linear | ECS | Reroute traffic to original task set | Seconds |
| Manual stop during blue/green | Blue/green | EC2/ECS | Traffic remains on or reverts to original | Seconds (if original still running) |
Security Architecture
CodeDeploy's security model involves multiple IAM roles and policies working in concert. Get any of these wrong and you will spend an hour staring at permission-denied errors with maddeningly generic messages.
The CodeDeploy service role is an IAM role assumed by the CodeDeploy service itself. It needs permissions to interact with the compute resources that are the targets of deployments: reading Auto Scaling group configurations, modifying load balancer target groups, invoking Lambda functions, and managing ECS task sets. This role is attached to the deployment group.
The EC2 instance profile is the IAM role attached to the EC2 instances that are deployment targets. The instances need permission to read revisions from S3 (or access GitHub), communicate with the CodeDeploy service, and write logs to CloudWatch. Without the correct instance profile, the CodeDeploy agent can poll the service but cannot download the revision.
For Lambda deployments, the CodeDeploy service role needs lambda:GetFunction, lambda:GetAlias, lambda:UpdateAlias, and lambda:InvokeFunction permissions on the target function. The hook functions (which are separate Lambda functions) need their own execution roles with permission to call codedeploy:PutLifecycleEventHookExecutionStatus.
For ECS deployments, the service role needs permissions to manage ECS services, task sets, and task definitions, as well as modify load balancer target groups and listeners. The breadth of required permissions for ECS blue/green is significant.
| Relationship | Source | Target | Key Permissions Required |
|---|---|---|---|
| CodeDeploy to EC2 | CodeDeploy service role | EC2, Auto Scaling | ec2:Describe*, autoscaling:*, elasticloadbalancing:*, tag:GetResources |
| CodeDeploy to S3 | EC2 instance profile | S3 revision bucket | s3:GetObject, s3:GetBucketLocation, s3:ListBucket on the revision bucket |
| CodeDeploy to Lambda | CodeDeploy service role | Lambda function | lambda:GetFunction, lambda:GetAlias, lambda:UpdateAlias, lambda:InvokeFunction |
| CodeDeploy to ECS | CodeDeploy service role | ECS, ELB | ecs:*, elasticloadbalancing:*, iam:PassRole for task execution role |
| CodeDeploy to Auto Scaling | CodeDeploy service role | Auto Scaling groups | autoscaling:CompleteLifecycleAction, autoscaling:CreateAutoScalingGroup, autoscaling:DeleteAutoScalingGroup, autoscaling:UpdateAutoScalingGroup |
A common security mistake is granting overly broad s3:GetObject permissions on the instance profile, allowing instances to read any S3 object in the account. Scope the instance profile's S3 permissions to the specific bucket and prefix where revisions are stored.
Cost Model
CodeDeploy's pricing for EC2 workloads is hard to beat: free. Zero charge for deploying to EC2 instances or on-premises servers, regardless of instance count, deployment frequency, or configuration complexity.
For Lambda and ECS deployments, AWS charges $0.02 per deployment. A "deployment" is a single invocation of CreateDeployment, regardless of how many Lambda function invocations or ECS tasks are involved, or how long the traffic-shifting window lasts.
The real cost of CodeDeploy is in the underlying compute, not the service itself. Blue/green EC2 deployments double your instance count during the deployment window. Lambda deployments may run two concurrent versions, consuming concurrency quota. ECS deployments run two task sets simultaneously.
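The service-charge side of the model fits in a few lines of arithmetic (a sketch using the per-deployment price stated above; compute overhead is deliberately excluded):

```python
def monthly_codedeploy_cost(lambda_deploys: int, ecs_deploys: int,
                            ec2_deploys: int = 0) -> float:
    """CodeDeploy service charge only: EC2/on-premises deployments are free,
    Lambda and ECS deployments are $0.02 per CreateDeployment call."""
    PER_DEPLOYMENT_USD = 0.02
    return round((lambda_deploys + ecs_deploys) * PER_DEPLOYMENT_USD, 2)
```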
| Scenario | Platform | Deployments/Month | CodeDeploy Cost | Compute Overhead |
|---|---|---|---|---|
| 10-instance EC2, 20 deploys/month | EC2 | 20 | $0.00 | Blue/green: ~$0 (minutes of double capacity) |
| 50-instance EC2, 100 deploys/month | EC2 | 100 | $0.00 | Blue/green: minutes of double capacity per deploy |
| 5 Lambda functions, 200 deploys/month | Lambda | 200 | $4.00 | Minimal: concurrent versions share concurrency pool |
| 10 ECS services, 150 deploys/month | ECS | 150 | $3.00 | Double task count during traffic shifting (minutes) |
| Mixed: 20 EC2 + 10 Lambda + 5 ECS, 300 deploys/month | Mixed | 300 | $3.00 (Lambda + ECS only) | Variable by platform |
At these price points, CodeDeploy itself is a rounding error in your AWS bill. The real cost discussion centers on compute overhead during blue/green deployments, the operational burden of maintaining hook scripts and AppSpec files, and (most overlooked) the opportunity cost of not having automated deployments at all.
Common Failure Modes
These are the failure modes I have encountered repeatedly in production. Every one of them has cost teams hours of debugging time.
| Failure Mode | Symptom | Root Cause | Mitigation |
|---|---|---|---|
| Agent not running | Deployment hangs indefinitely; instance shows "Pending" | CodeDeploy agent crashed, was never installed, or cannot reach the CodeDeploy endpoint | Monitor agent status with the CloudWatch agent or Systems Manager; ensure VPC endpoints exist for codedeploy, codedeploy-commands-secure, and s3 |
| AppSpec parse error | Deployment fails immediately with "Invalid AppSpec" | YAML syntax error, wrong version field, missing required section, incorrect indentation | Lint the AppSpec with a YAML linter in CI; deploy every revision to a staging deployment group before production |
| Hook script timeout | Deployment fails after the configured timeout (default 3600s) | Script hangs on a network call, waits for user input, or enters an infinite loop | Set explicit, shorter timeouts per hook; ensure scripts have proper error handling and exit conditions |
| Health check failure | Instance fails ValidateService hook | Application did not start correctly, port not listening, dependency unavailable | Make ValidateService scripts check actual application health (HTTP endpoint, process status, port availability) beyond file existence |
| IAM permission denied | Deployment fails with "AccessDenied" during DownloadBundle or AllowTraffic | Instance profile lacks S3 read permissions, or service role lacks ELB/ASG/Lambda/ECS permissions | Use IAM Access Analyzer to audit roles; test with minimal permissions first, then restrict |
| Auto-rollback loop | Deployment rolls back, next deployment rolls back, cycle repeats | CloudWatch alarm stays in ALARM state from the previous failure; new deployment immediately triggers alarm-based rollback | Reset alarm state before redeploying; add an alarm evaluation period that spans the deployment warmup time |
| AllowTraffic timeout | Blue/green deployment hangs at AllowTraffic stage | Target group health check failing on new instances (wrong health check path, port, or protocol); security group blocking health check traffic | Verify target group health check configuration matches the application; check security groups allow health check traffic from the load balancer |
| Revision not found in S3 | Deployment fails with "Revision does not exist" | S3 key path is wrong, bucket is in a different region, revision was deleted by lifecycle policy, or ETag mismatch | Use aws deploy push to register revisions properly; set S3 lifecycle policies carefully; keep at least N previous revisions |
| Insufficient blue/green capacity | Blue/green deployment fails to provision green instances | ASG launch template references an instance type with no capacity in the AZ, or account EC2 limits reached | Use multiple instance types in the launch template; request limit increases proactively; test deployments in staging first |
| CodeDeploy throttling | API calls return "ThrottlingException"; deployments queued | Too many concurrent deployments, too many GetDeployment polling calls, or deployment history queries at scale | Stagger deployments across deployment groups; use exponential backoff in automation scripts; avoid polling deployment status in tight loops |
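Several of these failure modes (hook timeouts, weak ValidateService checks) are best mitigated in the AppSpec itself. Here is a minimal EC2/on-premises AppSpec sketch with explicit per-hook timeouts; the install path and script names are illustrative, not prescriptive:

```yaml
version: 0.0
os: linux
files:
  - source: /
    destination: /opt/myapp            # assumed install path
hooks:
  ApplicationStop:
    - location: scripts/stop_service.sh
      timeout: 60                      # fail fast instead of waiting out the 3600s default
      runas: root
  AfterInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_service.sh
      timeout: 120
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh  # hit the health endpoint, not just the filesystem
      timeout: 180
      runas: root
```

Tight timeouts turn a hung hook script into a fast, diagnosable failure rather than an hour-long stall.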
Key Architectural Recommendations
These recommendations come from years of operating CodeDeploy across production environments of various scales.
- Always use blue/green deployments for production. The near-instantaneous rollback capability alone justifies the temporary cost of double capacity. In-place deployments are acceptable for development and testing environments, but production workloads should never tolerate the slow rollback cycle of in-place.
- Use lifecycle hooks for smoke tests beyond file installation. The `ValidateService` hook (EC2) and `AfterAllowTestTraffic` hook (ECS) exist specifically for running real health checks against the newly deployed application. Hit the health endpoint, verify the version response, run a subset of integration tests. A deployment that installs files correctly but leaves the application in a broken state is worse than a deployment that fails outright.
- Configure CloudWatch alarm-based auto-rollback on every production deployment group. This is the safety net that catches regressions your smoke tests miss. Monitor error rates (5xx responses), latency (p99), and any custom application health metrics. An alarm that fires within the first 10 minutes of a deployment and triggers an automatic rollback has saved me from more incidents than I can count.
- Keep hook scripts idempotent. Hook scripts can execute multiple times during rollbacks and redeployments. If your `AfterInstall` script creates a database table, it must handle the case where the table already exists. If it starts a background process, it must handle the case where the process is already running. Non-idempotent hooks cause cascading failures during rollback.
- Use separate deployment groups per environment, not separate applications. A single CodeDeploy application with deployment groups for dev, staging, and production keeps your revision history unified and makes it easy to promote the same revision through environments. Using separate applications fragments your deployment history and makes it harder to track which revision is where.
- Use linear traffic shifting for Lambda and ECS production deployments. Canary is tempting because it validates quickly, but linear shifting gives you a gradual ramp that catches load-dependent issues that a 10% canary might miss. `Linear10PercentEvery3Minutes` is my default for production: 30 minutes of gradual rollout with ample time for alarms to fire.
- Monitor CodeDeploy agent status on EC2 instances continuously. A dead agent means the instance silently drops out of the deployment pool. Use Systems Manager Run Command to periodically check agent status, or deploy a CloudWatch agent metric that reports agent health. An instance without a running CodeDeploy agent is an instance that will never receive deployments.
- Use CodeDeploy for Lambda deployments over in-console alias flipping. Manual alias updates in the Lambda console provide zero rollback capability, no traffic shifting, and no validation hooks. CodeDeploy's Lambda deployment model adds canary/linear traffic shifting, automatic rollback on CloudWatch alarms, and lifecycle hooks for validation, all for $0.02 per deployment. There is no reason to deploy Lambda functions any other way in production.
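The idempotency rule is easiest to enforce with small helper functions inside your hook scripts. A minimal sketch, assuming a bash hook environment (function and file names here are illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical AfterInstall fragment: every step tolerates re-execution,
# because CodeDeploy re-runs hooks during rollbacks and redeployments.
set -euo pipefail

# Append a line to a file only if it is not already present, so running
# the hook twice leaves the file unchanged.
ensure_line() {
  local line="$1" file="$2"
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Create a directory only if missing (mkdir -p is inherently idempotent).
ensure_dir() {
  mkdir -p "$1"
}
```

The same check-before-act pattern extends to the other cases the rule names: `CREATE TABLE IF NOT EXISTS` for schema changes, and a process-status check before starting a background service.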
For related deployment pipeline architecture, see AWS CodeBuild: An Architecture Deep-Dive for build-stage patterns and AWS CodePipeline: An Architecture Deep-Dive for end-to-end pipeline orchestration.
Additional Resources
- AWS CodeDeploy User Guide: Comprehensive service documentation
- AppSpec File Reference: Complete AppSpec syntax for all three compute platforms
- Deployment Configurations: Built-in and custom configuration reference
- Lifecycle Event Hooks: Hook reference by compute platform and deployment type
- Tutorial: Deploy a Lambda Function with CodeDeploy: Lambda-specific deployment walkthrough
- Tutorial: Deploy an ECS Service with CodeDeploy: ECS blue/green deployment walkthrough
- CodeDeploy Agent Reference: Agent installation, configuration, and troubleshooting
- Working with Deployment Groups: Deployment group configuration and management
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

