
AWS CodeDeploy: An Architecture Deep-Dive

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

Deployment automation is the single most impactful investment a team can make in operational reliability. Manual deployments (SSH into a box, pull the latest code, restart the service, pray) are slow, and they are the root cause of a disproportionate number of production incidents. Every manual step is an opportunity for human error: the wrong branch, a missed configuration file, a forgotten service restart, a deployment to the wrong environment. Having spent years building and operating deployment pipelines across hundreds of EC2 instances, Lambda functions, and ECS services, I have watched CodeDeploy evolve from a simple EC2 deployment tool into the foundational deployment engine that underpins most serious AWS CI/CD architectures. It is not glamorous, and its deeper behaviors are thinly documented, yet it is the service that actually puts your code onto your compute.

This article is an architecture reference for engineers and architects who need to understand how CodeDeploy works under the hood: the component hierarchy, the lifecycle event model, the deployment strategies per compute platform, the failure modes that will bite you in production, and the architectural patterns that keep deployments safe at scale.

What CodeDeploy Actually Is

AWS CodeDeploy is a fully managed deployment service that automates software deployments to Amazon EC2 instances, on-premises servers, AWS Lambda functions, and Amazon ECS services. It coordinates the entire deployment process, from pulling your application revision out of S3 or GitHub, to stopping your running application, installing the new files, restarting services, running validation scripts, and shifting traffic, all without requiring you to SSH into anything or manually orchestrate any step.

CodeDeploy is deliberately unopinionated about what you deploy. Java WAR file, Node.js bundle, compiled Go binary, static assets, Docker image reference: it does not care. No imposed framework, no required project structure, no mandated build system. You tell it what files to put where (via the AppSpec file), what scripts to run at each lifecycle stage, and how to shift traffic. Everything else is your business.

This generality drives most of the confusion around the service. Elastic Beanstalk provides a complete platform abstraction. ECS service updates are tightly coupled to the container orchestration model. CodeDeploy is neither. It is a deployment primitive, a building block you compose into your own deployment architecture.

| Dimension | CodeDeploy | Elastic Beanstalk | ECS Service Update | CloudFormation Deploy | Manual (SSH/SCP) |
| --- | --- | --- | --- | --- | --- |
| Compute Targets | EC2, on-premises, Lambda, ECS | EC2 (managed), Docker on EC2 | ECS tasks only | Any CloudFormation resource | Any server you can reach |
| Deployment Strategies | In-place, blue/green, canary, linear | Rolling, immutable, blue/green | Rolling, blue/green | Stack update (replacement or update) | Whatever you script |
| Rollback | Automatic (failure or alarm), manual | Automatic, environment swap | Circuit breaker, manual | Stack rollback | Manual redeploy |
| Health Monitoring | Lifecycle hooks, CloudWatch alarms | Enhanced health reporting | ELB health checks, container health | Stack events, drift detection | You watching logs |
| Agent Required | Yes (EC2/on-premises only) | No (managed by Beanstalk) | No | No | No |
| Cost | Free for EC2; $0.02/deployment for Lambda/ECS | No additional charge | No additional charge | No additional charge | Your time |
| Complexity | Medium: AppSpec + hooks | Low: platform-managed | Low: task definition update | Medium: template authoring | High: entirely manual |
| Flexibility | High: any app, any language | Medium: supported platforms only | Low: containers only | High: infrastructure + app | Unlimited, but unmanageable |

Architecture Internals

CodeDeploy is built around a hierarchy of concepts that maps directly to how deployments are organized and executed. You need to internalize this hierarchy before you can design deployment pipelines that scale across multiple environments, regions, and teams.

At the top level, you create an Application, which is a logical container that groups everything related to deploying a single application. Within each application, you define Deployment Groups, which represent the target environments: your staging EC2 fleet, your production Lambda function, your ECS service in us-east-1. Each deployment group specifies the target compute, the deployment configuration to use, and optional triggers, alarms, and rollback settings. A Deployment Configuration controls the mechanics of how the deployment proceeds: all at once, one at a time, a percentage at a time, or a custom traffic-shifting schedule. A Revision is the actual artifact being deployed: an S3 bundle, a GitHub commit, or an AppSpec-only definition for Lambda and ECS. When you create a Deployment, CodeDeploy combines a deployment group with a revision and executes the deployment according to the deployment configuration.

```mermaid
flowchart TD
    A[Application] --> DG1[Deployment Group<br/>e.g., Production EC2]
    A --> DG2[Deployment Group<br/>e.g., Staging EC2]
    A --> DG3[Deployment Group<br/>e.g., Lambda Prod]
    DG1 --> DC[Deployment<br/>Configuration]
    DG1 --> REV[Revision<br/>S3 / GitHub]
    DC --> DEP[Deployment]
    REV --> DEP
    DEP --> I1[Instance 1]
    DEP --> I2[Instance 2]
    DEP --> I3[Instance N]
    DG3 --> DC2[Deployment<br/>Configuration]
    DG3 --> REV2[Revision<br/>AppSpec YAML]
    DC2 --> DEP2[Deployment]
    REV2 --> DEP2
    DEP2 --> LF[Lambda Function<br/>Version + Alias]
```
CodeDeploy component hierarchy

The CodeDeploy Agent

For EC2 and on-premises deployments, the CodeDeploy agent is a mandatory component running on every target instance. The agent is a Ruby-based process that polls the CodeDeploy service every 15 seconds, asking "do you have a deployment for me?" When it receives a deployment command, the agent downloads the revision from S3 or GitHub, parses the AppSpec file, and executes the lifecycle event hooks in order. The agent manages its own log files (found at /var/log/aws/codedeploy-agent/ on Linux and C:\ProgramData\Amazon\CodeDeploy\log\ on Windows), which are your first stop when debugging deployment failures.

The polling model is important to understand. CodeDeploy does not push deployments to instances; instances pull them. This means network connectivity from the instance to the CodeDeploy service endpoint and to S3 (for revision download) is a hard requirement. In VPC environments without internet access, you need VPC endpoints for both codedeploy and s3. Missing either endpoint is one of the most common reasons deployments hang indefinitely with the agent showing no activity.

Lambda and ECS deployments do not use the agent at all. CodeDeploy interacts directly with the Lambda and ECS APIs to shift traffic between function versions or task sets.

| Concept | Definition | Scope |
| --- | --- | --- |
| Application | Logical container for deployment groups and revisions | Per-account, per-region |
| Deployment Group | Set of target instances, Lambda function, or ECS service plus deployment settings | Per-application |
| Deployment Configuration | Rules governing how many targets are updated simultaneously and how traffic shifts | Per-deployment group or per-deployment |
| Revision | The application artifact: S3 bundle (zip/tar), GitHub commit, or AppSpec-only definition | Per-application, versioned in S3 |
| Deployment | A single execution of deploying a specific revision to a specific deployment group | Unique ID, tracks status per target |
| Instance (or Target) | An individual EC2 instance, Lambda function version, or ECS task set receiving the deployment | Per-deployment |

The Three Compute Platforms

CodeDeploy supports three fundamentally different compute platforms, and the deployment model differs so significantly across them that they are practically three different services sharing a control plane. The compute platform you choose determines the AppSpec format, the available lifecycle hooks, the deployment types, and the rollback mechanics.

EC2/On-Premises is the original and most flexible platform. Deployments work by copying files to instances and running scripts at defined lifecycle stages. You have full control over what happens at every step: stop the app, update files, run database migrations, start the app, validate health. The trade-off is complexity: you write and maintain all the hook scripts, and you are responsible for ensuring they are idempotent and handle failure gracefully.

Lambda deployments operate on a completely different model. There are no files to copy and no instances to manage. Instead, CodeDeploy shifts traffic between two versions of a Lambda function by updating an alias. The deployment configuration controls whether traffic shifts all at once, linearly over time, or in a canary pattern. Hook functions (which are themselves Lambda functions) run at defined points to validate the new version before and after traffic shifts.

ECS deployments are conceptually similar to Lambda deployments. CodeDeploy manages traffic shifting between two ECS task sets (the "blue" original and the "green" replacement) behind a load balancer. The deployment creates a new task set running the updated task definition, optionally routes test traffic to it, and then shifts production traffic according to the deployment configuration.

| Dimension | EC2/On-Premises | Lambda | ECS |
| --- | --- | --- | --- |
| Deployment Types | In-place, blue/green | Canary, linear, all-at-once | Canary, linear, all-at-once |
| AppSpec Format | YAML: files, permissions, hooks | YAML: resources, hooks | YAML: resources, hooks |
| Agent Required | Yes | No | No |
| Hooks Available | 10+ lifecycle events | BeforeAllowTraffic, AfterAllowTraffic | BeforeInstall, AfterInstall, AfterAllowTestTraffic, BeforeAllowTraffic, AfterAllowTraffic |
| Traffic Shifting | Via load balancer (blue/green only) | Lambda alias weighted routing | ECS task set + ALB/NLB listener rules |
| Rollback Mechanism | Redeploy previous revision (in-place) or reroute traffic (blue/green) | Revert alias to previous version | Reroute traffic to original task set |
| Health Checks | Hook scripts, CloudWatch alarms | Hook Lambda functions, CloudWatch alarms | ELB health checks, hook functions, CloudWatch alarms |
| Blue/Green Support | Yes: provision new ASG or use existing | All deployments are effectively blue/green | All deployments are effectively blue/green |
| Revision Storage | S3 bucket or GitHub repository | AppSpec in S3 or inline | AppSpec in S3 or inline |
| IAM Model | Service role + instance profile | Service role + function execution role | Service role + task execution role + task role |
| Pricing | Free | $0.02 per deployment | $0.02 per deployment |
| Primary Use Case | Traditional server-based apps, legacy migrations | Serverless function version management | Container workload updates |

Deployment Types

CodeDeploy offers two fundamental deployment types for EC2/On-Premises, and a traffic-shifting model for Lambda and ECS that is conceptually always blue/green.

In-Place Deployments

In-place deployments update the application on existing instances without provisioning new ones. The CodeDeploy agent on each instance stops the running application, pulls the new revision, installs the updated files, runs configuration scripts, starts the application, and validates health, all on the same instance. During the update, the instance is typically deregistered from the load balancer to avoid serving requests while the application is down.

In-place deployments are simple and cheap. No spare capacity, no provisioning new instances, no complexity. The trade-off is risk. If the new revision has a bug, rolling back means a full redeployment of the previous revision, which takes just as long as the original deployment. Each instance also goes offline during its update window, though deploying one at a time behind a load balancer mitigates this.

```mermaid
flowchart LR
    A[Deregister from<br/>Load Balancer] --> B[ApplicationStop<br/>Hook Scripts]
    B --> C[Download<br/>Revision]
    C --> D[BeforeInstall<br/>Hook Scripts]
    D --> E[Install Files<br/>to Disk]
    E --> F[AfterInstall<br/>Hook Scripts]
    F --> G[ApplicationStart<br/>Hook Scripts]
    G --> H[ValidateService<br/>Hook Scripts]
    H --> I[Register with<br/>Load Balancer]
```
In-place deployment flow

Blue/Green Deployments

Blue/green deployments provision a completely new set of instances (the "green" environment), deploy the revision to them, run validation, shift traffic from the old instances (the "blue" environment) to the green, and then terminate the blue instances after a configurable wait period. For EC2 deployments, this requires an Auto Scaling group; CodeDeploy creates a new ASG with the same configuration, deploys to the new instances, and updates the load balancer target group.

I use blue/green for every production workload, full stop. Rollback is near-instantaneous: reroute traffic back to the original instances, which are still running the previous version. No redeployment, no downtime, no waiting. Yes, you run double capacity during the deployment window. Worth it every time.

```mermaid
flowchart LR
    A[Provision Green<br/>Instances via ASG] --> B[Deploy Revision<br/>to Green]
    B --> C[Run Lifecycle<br/>Hooks on Green]
    C --> D[Validate Green<br/>Health]
    D --> E[Shift Traffic<br/>Blue to Green]
    E --> F[Wait Period<br/>for Observation]
    F --> G[Terminate Blue<br/>Instances]
```
Blue/green deployment flow

For Lambda and ECS, all deployments are inherently blue/green; CodeDeploy creates a new version or task set and shifts traffic according to the deployment configuration. The "in-place" concept does not apply because there are no persistent instances to update.

| Dimension | In-Place | Blue/Green |
| --- | --- | --- |
| Downtime Risk | Yes: each instance is offline during update | No: traffic shifts only after green is healthy |
| Rollback Speed | Slow: requires full redeployment of previous revision | Fast: reroute traffic to original instances (seconds) |
| Cost During Deploy | No additional cost | Double capacity during deployment window |
| Complexity | Low: straightforward file replacement | Medium: requires ASG, load balancer, traffic shifting |
| Compute Platforms | EC2/On-Premises only | EC2/On-Premises, Lambda, ECS |
| Traffic Shifting | Per-instance (deregister/register) | Load balancer rerouting or alias/task set shifting |
| Capacity Requirement | No spare capacity needed | Must have capacity for two full environments |
| Data Migration | Not applicable: same instances, same storage | Must handle shared state (databases, caches) carefully |

The AppSpec File

The AppSpec file (appspec.yml) is the deployment manifest that tells CodeDeploy exactly what to do during a deployment. Its structure varies significantly across the three compute platforms, and getting the format wrong is one of the most common deployment failures.

For EC2/On-Premises, the AppSpec file defines which files from the revision should be copied to which locations on the instance, what permissions to apply, and which scripts to run at each lifecycle hook. It lives at the root of your revision bundle.

For Lambda, the AppSpec file specifies the Lambda function to deploy, the current version and the target version, and any hook functions to run for validation. There are no files to copy. The revision is the AppSpec itself plus a reference to the already-published Lambda function versions.

For ECS, the AppSpec file specifies the ECS task definition, the container name and port that the load balancer routes to, and hook functions for validation. Like Lambda, the revision is primarily the AppSpec plus a reference to the task definition registered separately.

| Section | EC2/On-Premises | Lambda | ECS |
| --- | --- | --- | --- |
| version | 0.0 (only valid value) | 0.0 | 0.0 |
| os | linux or windows | Not applicable | Not applicable |
| files | Source-to-destination file mappings | Not applicable | Not applicable |
| permissions | Owner, group, mode for installed files | Not applicable | Not applicable |
| resources | Not applicable | Function name, alias, current version, target version | Task definition ARN, container name, container port |
| hooks | Script-based hooks (path, timeout, runas) | Lambda function ARNs for validation hooks | Lambda function ARNs for validation hooks |

The files section in EC2 deployments maps source paths within your revision archive to destination paths on the instance. For example, mapping source: /config/app.conf to destination: /etc/myapp/ places the file at /etc/myapp/app.conf. The permissions section then lets you set ownership and file modes on the installed files, which is critical for applications that run as non-root users.
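Putting the EC2 sections together, a minimal appspec.yml might look like the sketch below. The script paths, user names, and timeouts are illustrative, not prescribed:

```yaml
version: 0.0
os: linux
files:
  - source: /config/app.conf
    destination: /etc/myapp
permissions:
  - object: /etc/myapp
    owner: appuser        # hypothetical non-root service user
    group: appuser
    mode: 644
hooks:
  ApplicationStop:
    - location: scripts/stop.sh      # paths are relative to the revision root
      timeout: 60
      runas: root
  ApplicationStart:
    - location: scripts/start.sh
      timeout: 120
      runas: appuser
  ValidateService:
    - location: scripts/validate.sh
      timeout: 300
```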

For Lambda, the resources section is where you specify the function's name, the alias that CodeDeploy will update, the CurrentVersion (the version currently receiving traffic), and the TargetVersion (the version to shift traffic to). CodeDeploy manages the alias weights to implement the traffic-shifting pattern defined by the deployment configuration.
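As a sketch, a Lambda AppSpec tying these pieces together might look like this; the function name, alias, version numbers, and hook function names are hypothetical:

```yaml
version: 0.0
Resources:
  - MyFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: my-function        # hypothetical function
        Alias: live              # alias whose weights CodeDeploy shifts
        CurrentVersion: "7"      # version currently receiving traffic
        TargetVersion: "8"       # version to shift traffic to
Hooks:
  - BeforeAllowTraffic: "preTrafficValidator"   # hypothetical hook functions
  - AfterAllowTraffic: "postTrafficValidator"
```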

For ECS, the resources section references the task definition (as a full ARN or <family>:<revision>), the container name that is the target of the load balancer, and the container port. CodeDeploy uses this information to create a replacement task set and configure the load balancer's target groups for traffic shifting.
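An ECS AppSpec sketch, with a placeholder task definition ARN, container name, and port:

```yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:111122223333:task-definition/my-task:3"  # placeholder ARN
        LoadBalancerInfo:
          ContainerName: "web"   # container the load balancer routes to
          ContainerPort: 8080
Hooks:
  - AfterAllowTestTraffic: "testTrafficValidator"  # hypothetical hook function
```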

Lifecycle Event Hooks

Lifecycle events are the mechanism by which CodeDeploy gives you control over what happens at each stage of a deployment. For EC2/On-Premises, hooks are shell scripts (or PowerShell scripts on Windows) that run on the instance. For Lambda and ECS, hooks are Lambda functions that run externally and validate the deployment state.

EC2 In-Place Hook Order

The in-place deployment lifecycle follows a strict sequence. Some events are scriptable (you provide the script), and some are managed by CodeDeploy internally.

```mermaid
flowchart TD
    A[ApplicationStop] --> B[DownloadBundle]
    B --> C[BeforeInstall]
    C --> D[Install]
    D --> E[AfterInstall]
    E --> F[ApplicationStart]
    F --> G[ValidateService]

    style B fill:#555,color:#fff
    style D fill:#555,color:#fff
```
EC2 in-place deployment lifecycle event order

EC2 Blue/Green Hook Order

Blue/green deployments on EC2 add traffic-management hooks around the core lifecycle. The green instances go through the full install lifecycle, then traffic is blocked to the blue instances and allowed to the green instances.

```mermaid
flowchart TD
    A[BeforeInstall] --> B[Install]
    B --> C[AfterInstall]
    C --> D[ApplicationStart]
    D --> E[ValidateService]
    E --> F[BeforeBlockTraffic]
    F --> G[BlockTraffic]
    G --> H[AfterBlockTraffic]
    H --> I[BeforeAllowTraffic]
    I --> J[AllowTraffic]
    J --> K[AfterAllowTraffic]

    style B fill:#555,color:#fff
    style G fill:#555,color:#fff
    style J fill:#555,color:#fff
```
EC2 blue/green deployment lifecycle event order

Complete Hook Reference

The following table covers all lifecycle events across all compute platforms. "Scriptable" means you provide the script or Lambda function; "Managed" means CodeDeploy handles it internally.

| Lifecycle Event | EC2 In-Place | EC2 Blue/Green | Lambda | ECS | Type |
| --- | --- | --- | --- | --- | --- |
| ApplicationStop | Yes | No | No | No | Scriptable |
| DownloadBundle | Yes | Yes | No | No | Managed |
| BeforeInstall | Yes | Yes | No | Yes | Scriptable |
| Install | Yes | Yes | No | Yes | Managed |
| AfterInstall | Yes | Yes | No | Yes | Scriptable |
| ApplicationStart | Yes | Yes | No | No | Scriptable |
| ValidateService | Yes | Yes | No | No | Scriptable |
| BeforeBlockTraffic | No | Yes | No | No | Scriptable |
| BlockTraffic | No | Yes | No | No | Managed |
| AfterBlockTraffic | No | Yes | No | No | Scriptable |
| BeforeAllowTraffic | No | Yes | Yes | Yes | Scriptable |
| AllowTraffic | No | Yes | Yes | Yes | Managed |
| AfterAllowTraffic | No | Yes | Yes | Yes | Scriptable |
| AfterAllowTestTraffic | No | No | No | Yes | Scriptable |

For EC2 hook scripts, each hook entry in the AppSpec file specifies three properties: location (the path to the script within the revision), timeout (maximum execution time in seconds, default 3600), and runas (the OS user to run the script as). A hook script signals success with exit code 0 and failure with any non-zero exit code. When a hook script fails, the deployment fails for that instance, and CodeDeploy proceeds according to the deployment configuration's minimum healthy hosts threshold.

For Lambda and ECS, hook functions are Lambda functions that receive a deployment lifecycle event as input and must call back to CodeDeploy with a PutLifecycleEventHookExecutionStatus API call indicating Succeeded or Failed. If the hook function does not call back within the timeout, CodeDeploy treats it as a failure. This callback model is a common source of bugs. Forgetting the callback causes the deployment to hang until timeout and then fail.
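A minimal hook function illustrating that callback contract might look like the sketch below (Python with boto3; the validation step is a stand-in for your real checks):

```python
def status_from_checks(checks_passed: bool) -> str:
    # Map a validation outcome onto the status string the API accepts.
    return "Succeeded" if checks_passed else "Failed"

def handler(event, context):
    import boto3  # imported inside the handler so this sketch loads without the AWS SDK

    # Hypothetical validation step: replace with real smoke tests against the new version.
    checks_passed = True

    # CodeDeploy passes these two IDs to every hook invocation. The callback is
    # mandatory: without it, the deployment hangs until the hook timeout and fails.
    boto3.client("codedeploy").put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status_from_checks(checks_passed),
    )
```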

Deployment Configurations

Deployment configurations control the pace and pattern of a deployment. They determine how many instances are updated simultaneously (EC2), or how traffic is shifted between versions (Lambda and ECS).

EC2/On-Premises Configurations

For EC2, the deployment configuration defines the minimum number of healthy instances that must remain in service during the deployment. CodeDeploy deploys to instances in batches, ensuring the healthy count never drops below the threshold.

Lambda and ECS Configurations

For Lambda and ECS, deployment configurations control traffic shifting: the percentage of traffic routed to the new version over time. CodeDeploy offers three patterns: All-at-once (immediate 100% shift), Linear (equal increments at regular intervals), and Canary (a small percentage first, then the remainder after a wait period).

| Configuration | Platform | Behavior | Use Case |
| --- | --- | --- | --- |
| CodeDeployDefault.AllAtOnce | EC2 | Deploy to all instances simultaneously; succeed if any instance succeeds | Development, testing |
| CodeDeployDefault.HalfAtATime | EC2 | Deploy to up to half the instances at once; maintain 50% healthy | Staging environments |
| CodeDeployDefault.OneAtATime | EC2 | Deploy to one instance at a time; all but one must remain healthy | Production: maximum safety |
| CodeDeployDefault.LambdaAllAtOnce | Lambda | Shift 100% of traffic immediately | Development, testing |
| CodeDeployDefault.LambdaCanary10Percent5Minutes | Lambda | Shift 10% of traffic, wait 5 minutes, shift remaining 90% | Production: quick validation |
| CodeDeployDefault.LambdaCanary10Percent10Minutes | Lambda | Shift 10% of traffic, wait 10 minutes, shift remaining 90% | Production: extended validation |
| CodeDeployDefault.LambdaCanary10Percent15Minutes | Lambda | Shift 10% of traffic, wait 15 minutes, shift remaining 90% | Production: conservative validation |
| CodeDeployDefault.LambdaLinear10PercentEvery1Minute | Lambda | Shift 10% every minute over 10 minutes | Production: gradual rollout |
| CodeDeployDefault.LambdaLinear10PercentEvery2Minutes | Lambda | Shift 10% every 2 minutes over 20 minutes | Production: slower gradual rollout |
| CodeDeployDefault.LambdaLinear10PercentEvery3Minutes | Lambda | Shift 10% every 3 minutes over 30 minutes | Production: most conservative gradual |
| CodeDeployDefault.ECSAllAtOnce | ECS | Shift 100% of traffic immediately | Development, testing |
| CodeDeployDefault.ECSCanary10Percent5Minutes | ECS | Shift 10% of traffic, wait 5 minutes, shift remaining 90% | Production: quick validation |
| CodeDeployDefault.ECSCanary10Percent15Minutes | ECS | Shift 10% of traffic, wait 15 minutes, shift remaining 90% | Production: extended validation |
| CodeDeployDefault.ECSLinear10PercentEvery1Minute | ECS | Shift 10% every minute over 10 minutes | Production: gradual rollout |
| CodeDeployDefault.ECSLinear10PercentEvery3Minutes | ECS | Shift 10% every 3 minutes over 30 minutes | Production: conservative gradual |

You can also create custom deployment configurations for any platform. For EC2, you specify a minimumHealthyHosts value as either a count or percentage. For Lambda and ECS, you define custom linear or canary intervals and percentages. Custom configurations are essential when the built-in options do not match your risk tolerance. For example, you might define a canary that shifts 1% for 30 minutes before proceeding, or a linear that shifts 5% every 10 minutes.
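As a sketch, the 1%-for-30-minutes canary mentioned above could be registered with a CreateDeploymentConfig request like this; the configuration name is arbitrary:

```python
def lambda_canary_config(name: str, canary_percent: int, wait_minutes: int) -> dict:
    # Build the CreateDeploymentConfig request body for a custom Lambda canary:
    # shift canary_percent of traffic, hold for wait_minutes, then shift the rest.
    return {
        "deploymentConfigName": name,
        "computePlatform": "Lambda",
        "trafficRoutingConfig": {
            "type": "TimeBasedCanary",
            "timeBasedCanary": {
                "canaryPercentage": canary_percent,
                "canaryInterval": wait_minutes,
            },
        },
    }

# Applying it (requires AWS credentials):
# import boto3
# boto3.client("codedeploy").create_deployment_config(
#     **lambda_canary_config("Canary1Percent30Minutes", 1, 30))
```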

Rollback Strategies

Rollback is where your deployment architecture proves its worth or falls apart. CodeDeploy provides several rollback mechanisms depending on the deployment type and platform.

Automatic Rollback on Deployment Failure

When any instance or target fails its lifecycle hooks, CodeDeploy can automatically roll back the entire deployment. This is configured at the deployment group level. For in-place deployments, the rollback deploys the last known good revision to all instances, essentially a full new deployment in the reverse direction. For blue/green, the rollback reroutes traffic back to the original (blue) environment without redeploying anything.

Automatic Rollback on CloudWatch Alarm

I configure this rollback trigger on every production deployment group. You associate one or more CloudWatch alarms with the deployment group, typically monitoring error rates, latency percentiles, or custom application health metrics. If any alarm enters the ALARM state during the deployment, CodeDeploy automatically triggers a rollback. This catches issues that individual lifecycle hook validations miss, particularly the subtle regressions that only manifest under real production traffic.
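A sketch of the deployment group settings that wire this up, expressed as the relevant fields of an UpdateDeploymentGroup request; the alarm names are hypothetical:

```python
def alarm_rollback_settings(alarm_names: list) -> dict:
    # Fields for UpdateDeploymentGroup that enable rollback both on deployment
    # failure and on any associated CloudWatch alarm entering ALARM state.
    return {
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": n} for n in alarm_names],
        },
    }

# Applying it (requires AWS credentials):
# import boto3
# boto3.client("codedeploy").update_deployment_group(
#     applicationName="my-app",                      # hypothetical names
#     currentDeploymentGroupName="production",
#     **alarm_rollback_settings(["api-5xx-rate", "api-p99-latency"]))
```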

Manual Rollback

You can always manually roll back by creating a new deployment that deploys the previous revision. For blue/green deployments that are still in the termination wait period, you can also manually stop the deployment and reroute traffic back to the original environment.

Rollback speed is everything. Blue/green rollbacks take seconds because the original environment is still running. In-place rollbacks take the same amount of time as the original deployment because they are a full redeployment. That speed difference alone justifies blue/green for production.

| Scenario | Deployment Type | Platform | Rollback Behavior | Rollback Speed |
| --- | --- | --- | --- | --- |
| Hook failure, blue/green | Blue/green | EC2 | Reroute traffic to original instances | Seconds |
| Hook failure, in-place | In-place | EC2 | Redeploy previous revision to all instances | Minutes (full deployment cycle) |
| CloudWatch alarm, blue/green | Blue/green | EC2 | Reroute traffic to original instances | Seconds |
| CloudWatch alarm, Lambda | Canary/Linear | Lambda | Revert alias to previous version | Seconds |
| CloudWatch alarm, ECS | Canary/Linear | ECS | Reroute traffic to original task set | Seconds |
| Manual stop during blue/green | Blue/green | EC2/ECS | Traffic remains on or reverts to original | Seconds (if original still running) |

Security Architecture

CodeDeploy's security model involves multiple IAM roles and policies working in concert. Get any of these wrong and you will spend an hour staring at permission-denied errors with maddeningly generic messages.

The CodeDeploy service role is an IAM role assumed by the CodeDeploy service itself. It needs permissions to interact with the compute resources that are the targets of deployments: reading Auto Scaling group configurations, modifying load balancer target groups, invoking Lambda functions, and managing ECS task sets. This role is attached to the deployment group.

The EC2 instance profile is the IAM role attached to the EC2 instances that are deployment targets. The instances need permission to read revisions from S3 (or access GitHub), communicate with the CodeDeploy service, and write logs to CloudWatch. Without the correct instance profile, the CodeDeploy agent can poll the service but cannot download the revision.

For Lambda deployments, the CodeDeploy service role needs lambda:GetFunction, lambda:GetAlias, lambda:UpdateAlias, and lambda:InvokeFunction permissions on the target function. The hook functions (which are separate Lambda functions) need their own execution roles with permission to call codedeploy:PutLifecycleEventHookExecutionStatus.

For ECS deployments, the service role needs permissions to manage ECS services, task sets, and task definitions, as well as modify load balancer target groups and listeners. The breadth of required permissions for ECS blue/green is significant.

| Relationship | Source | Target | Key Permissions Required |
| --- | --- | --- | --- |
| CodeDeploy to EC2 | CodeDeploy service role | EC2, Auto Scaling | ec2:Describe*, autoscaling:*, elasticloadbalancing:*, tag:GetResources |
| CodeDeploy to S3 | EC2 instance profile | S3 revision bucket | s3:GetObject, s3:GetBucketLocation, s3:ListBucket on the revision bucket |
| CodeDeploy to Lambda | CodeDeploy service role | Lambda function | lambda:GetFunction, lambda:GetAlias, lambda:UpdateAlias, lambda:InvokeFunction |
| CodeDeploy to ECS | CodeDeploy service role | ECS, ELB | ecs:*, elasticloadbalancing:*, iam:PassRole for task execution role |
| CodeDeploy to Auto Scaling | CodeDeploy service role | Auto Scaling groups | autoscaling:CompleteLifecycleAction, autoscaling:CreateAutoScalingGroup, autoscaling:DeleteAutoScalingGroup, autoscaling:UpdateAutoScalingGroup |

A common security mistake is granting overly broad s3:GetObject permissions on the instance profile, allowing instances to read any S3 object in the account. Scope the instance profile's S3 permissions to the specific bucket and prefix where revisions are stored.
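A scoped policy for the instance profile might look like the following sketch, assuming revisions live under a releases/ prefix in a dedicated bucket; the bucket name and prefix are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-revision-bucket/releases/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-revision-bucket"
    }
  ]
}
```

Note that s3:GetObject applies to objects (bucket/prefix/*) while s3:ListBucket and s3:GetBucketLocation apply to the bucket ARN itself, so they belong in separate statements.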

Cost Model

CodeDeploy's pricing for EC2 workloads is hard to beat: free. Zero charge for deploying to EC2 instances or on-premises servers, regardless of instance count, deployment frequency, or configuration complexity.

For Lambda and ECS deployments, AWS charges $0.02 per deployment. A "deployment" is a single invocation of CreateDeployment, regardless of how many Lambda function invocations or ECS tasks are involved, or how long the traffic-shifting window lasts.

The real cost of CodeDeploy is in the underlying compute, not the service itself. Blue/green EC2 deployments double your instance count during the deployment window. Lambda deployments may run two concurrent versions, consuming concurrency quota. ECS deployments run two task sets simultaneously.

| Scenario | Platform | Deployments/Month | CodeDeploy Cost | Compute Overhead |
| --- | --- | --- | --- | --- |
| 10-instance EC2 fleet | EC2 | 20 | $0.00 | Blue/green: ~$0 (minutes of double capacity) |
| 50-instance EC2 fleet | EC2 | 100 | $0.00 | Blue/green: minutes of double capacity per deploy |
| 5 Lambda functions | Lambda | 200 | $4.00 | Minimal: concurrent versions share concurrency pool |
| 10 ECS services | ECS | 150 | $3.00 | Double task count during traffic shifting (minutes) |
| Mixed: 20 EC2 + 10 Lambda + 5 ECS | Mixed | 300 | $3.00 (Lambda + ECS only) | Variable by platform |

At these price points, CodeDeploy itself is a rounding error in your AWS bill. The real cost discussion centers on compute overhead during blue/green deployments, the operational burden of maintaining hook scripts and AppSpec files, and (most overlooked) the opportunity cost of not having automated deployments at all.

Common Failure Modes

These are the failure modes I have encountered repeatedly in production. Every one of them has cost teams hours of debugging time.

| Failure Mode | Symptom | Root Cause | Mitigation |
| --- | --- | --- | --- |
| Agent not running | Deployment hangs indefinitely; instance shows "Pending" | CodeDeploy agent crashed, was never installed, or cannot reach the CodeDeploy endpoint | Monitor agent status with the CloudWatch agent or Systems Manager; ensure VPC endpoints exist for codedeploy-agent and s3 |
| AppSpec parse error | Deployment fails immediately with "Invalid AppSpec" | YAML syntax error, wrong version field, missing required section, incorrect indentation | Validate the AppSpec locally with aws deploy push --dry-run; use a YAML linter in CI |
| Hook script timeout | Deployment fails after the configured timeout (default 3600s) | Script hangs on a network call, waits for user input, or enters an infinite loop | Set explicit, shorter timeouts per hook; ensure scripts have proper error handling and exit conditions |
| Health check failure | Instance fails the ValidateService hook | Application did not start correctly, port not listening, dependency unavailable | Make ValidateService scripts check actual application health (HTTP endpoint, process status, port availability) beyond file existence |
| IAM permission denied | Deployment fails with "AccessDenied" during DownloadBundle or AllowTraffic | Instance profile lacks S3 read permissions, or service role lacks ELB/ASG/Lambda/ECS permissions | Use IAM Access Analyzer to audit roles; start from minimal permissions and add only what the deployment actually needs |
| Auto-rollback loop | Deployment rolls back, next deployment rolls back, cycle repeats | CloudWatch alarm stays in ALARM state from the previous failure; new deployment immediately triggers alarm-based rollback | Reset alarm state before redeploying; add an alarm evaluation period that spans the deployment warmup time |
| AllowTraffic timeout | Blue/green deployment hangs at the AllowTraffic stage | Target group health check failing on new instances (wrong health check path, port, or protocol); security group blocking health check traffic | Verify the target group health check configuration matches the application; check that security groups allow health check traffic from the load balancer |
| Revision not found in S3 | Deployment fails with "Revision does not exist" | S3 key path is wrong, bucket is in a different region, revision was deleted by lifecycle policy, or ETag mismatch | Use aws deploy push to register revisions properly; set S3 lifecycle policies carefully; keep at least N previous revisions |
| Insufficient blue/green capacity | Blue/green deployment fails to provision green instances | ASG launch template references an instance type with no capacity in the AZ, or account EC2 limits reached | Use multiple instance types in the launch template; request limit increases proactively; test deployments in staging first |
| CodeDeploy throttling | API calls return "ThrottlingException"; deployments queued | Too many concurrent deployments, too many GetDeployment polling calls, or deployment history queries at scale | Stagger deployments across deployment groups; use exponential backoff in automation scripts; avoid polling deployment status in tight loops |
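
The "Health check failure" row is worth a concrete sketch. Here is a minimal ValidateService-style hook body that polls an HTTP health endpoint instead of merely checking that files landed on disk. The URL, attempt count, and delay are illustrative defaults, not CodeDeploy settings, and the script assumes curl is installed on the instance:

```shell
#!/usr/bin/env bash
# Sketch of a ValidateService hook: poll the application's health endpoint
# until it responds, or fail the lifecycle event with a non-zero exit code.
set -euo pipefail

wait_for_healthy() {
  # All three knobs are hypothetical defaults, overridable via environment.
  local url="${HEALTH_URL:-http://localhost:8080/health}"
  local max_attempts="${MAX_ATTEMPTS:-30}"
  local delay="${SLEEP_SECONDS:-5}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if curl -fsS --max-time 3 "$url" > /dev/null 2>&1; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "service failed health check after ${max_attempts} attempts" >&2
  return 1   # non-zero exit marks the ValidateService event as failed
}

# In the actual hook file, end with:
# wait_for_healthy
```

CodeDeploy treats any non-zero exit from a hook script as a lifecycle event failure, which is exactly what you want here: a deployment that cannot prove the service is healthy should never proceed to traffic shifting.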
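
For the throttling row, the standard mitigation is exponential backoff around status polling. A generic sketch of the pattern; the wrapped aws deploy get-deployment invocation in the trailing comment is illustrative, and the deployment ID is a placeholder:

```shell
#!/usr/bin/env bash
# Sketch: retry any command with exponential backoff, useful for
# throttle-prone polling such as `aws deploy get-deployment`.
set -euo pipefail

with_backoff() {
  local max_tries="$1"; shift
  local delay=1 try=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$try" -ge "$max_tries" ]; then
      echo "command failed after ${max_tries} attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))   # 1s, 2s, 4s, ... between attempts
    try=$((try + 1))
  done
}

# Illustrative usage:
# with_backoff 5 aws deploy get-deployment \
#   --deployment-id "$DEPLOYMENT_ID" \
#   --query 'deploymentInfo.status' --output text
```

The same wrapper works for any AWS CLI call in deployment automation, which is why it takes the command as arguments rather than hardcoding one.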

Key Architectural Recommendations

These recommendations come from years of operating CodeDeploy across production environments of various scales.

  1. Always use blue/green deployments for production. The near-instantaneous rollback capability alone justifies the temporary cost of running double capacity. In-place deployments are acceptable for development and testing environments, but production workloads should never tolerate the slow rollback cycle of an in-place deployment.
  2. Use lifecycle hooks for smoke tests beyond file installation. The ValidateService hook (EC2) and AfterAllowTestTraffic hook (ECS) exist specifically for running real health checks against the newly deployed application. Hit the health endpoint, verify the version response, run a subset of integration tests. A deployment that installs files correctly but leaves the application in a broken state is worse than a deployment that fails outright.
  3. Configure CloudWatch alarm-based auto-rollback on every production deployment group. This is the safety net that catches regressions your smoke tests miss. Monitor error rates (5xx responses), latency (p99), and any custom application health metrics. An alarm that fires within the first 10 minutes of a deployment and triggers an automatic rollback has saved me from more incidents than I can count.
  4. Keep hook scripts idempotent. Hook scripts can execute multiple times during rollbacks and redeployments. If your AfterInstall script creates a database table, it must handle the case where the table already exists. If it starts a background process, it must handle the case where the process is already running. Non-idempotent hooks cause cascading failures during rollback.
  5. Use separate deployment groups per environment, not separate applications. A single CodeDeploy application with deployment groups for dev, staging, and production keeps your revision history unified and makes it easy to promote the same revision through environments. Using separate applications fragments your deployment history and makes it harder to track which revision is where.
  6. Use linear traffic shifting for Lambda and ECS production deployments. Canary is tempting because it validates quickly, but linear shifting gives you a gradual ramp that catches load-dependent issues that a 10% canary might miss. Linear10PercentEvery3Minutes is my default for production: 30 minutes of gradual rollout with ample time for alarms to fire.
  7. Monitor CodeDeploy agent status on EC2 instances continuously. A dead agent means the instance silently drops out of the deployment pool. Use Systems Manager Run Command to periodically check agent status, or deploy a CloudWatch agent metric that reports agent health. An instance without a running CodeDeploy agent is an instance that will never receive deployments.
  8. Use CodeDeploy for Lambda deployments over in-console alias flipping. Manual alias updates in the Lambda console provide zero rollback capability, no traffic shifting, and no validation hooks. CodeDeploy's Lambda deployment model adds canary/linear traffic shifting, automatic rollback on CloudWatch alarms, and lifecycle hooks for validation, at no additional CodeDeploy charge. There is no reason to deploy Lambda functions any other way in production.
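
Recommendation 4 is easiest to see in code. Below is a sketch of an idempotent ApplicationStart-style step; the install path is an assumption (not a CodeDeploy default) and the backgrounded sleep stands in for a real worker process:

```shell
#!/usr/bin/env bash
# Sketch of an idempotent hook step: safe to run twice, as happens
# during rollbacks and redeployments.
set -euo pipefail

APP_DIR="${APP_DIR:-./myapp}"        # assumed install path for illustration
PID_FILE="$APP_DIR/worker.pid"

start_worker() {
  mkdir -p "$APP_DIR"                # mkdir -p is already idempotent
  # Guard: if a previous run left a live process behind, do nothing.
  if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo "worker already running (pid $(cat "$PID_FILE"))"
    return 0
  fi
  sleep 300 &                        # stand-in for the real worker process
  echo "$!" > "$PID_FILE"
  echo "worker started (pid $!)"
}

start_worker
start_worker   # second invocation is a no-op instead of a double-start
```

The guard clause is the whole point: without it, a rollback that re-runs this hook would launch a second copy of the worker alongside the first.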

For related deployment pipeline architecture, see AWS CodeBuild: An Architecture Deep-Dive for build-stage patterns and AWS CodePipeline: An Architecture Deep-Dive for end-to-end pipeline orchestration.

Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.