
AWS CodeBuild: An Architecture Deep-Dive

Tags: AWS, Architecture, CodeBuild, CI/CD, DevOps

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

Nobody wants to own build infrastructure. Everybody depends on it. I have spent years managing Jenkins clusters, debugging flaky build agents, patching security holes on build servers, and scaling CI/CD capacity for growing engineering teams. The operational overhead? Wildly disproportionate to the business value. AWS CodeBuild kills that burden. It is a fully managed, container-based build service. Fresh, isolated compute for every build. Automatic scaling to any workload. You pay only for the minutes you actually use. The architectural decisions baked into CodeBuild (ephemeral containers, pay-per-minute pricing, deep AWS service integration) reflect hard-won lessons about what matters in build infrastructure. And what does not.

This is not a getting-started tutorial. It is an architecture reference for engineers who need to understand how CodeBuild works under the hood. How to design build configurations that are fast, cost-effective, and secure. How to dodge the failure modes that catch teams in production. Looking for "Hello World with CodeBuild"? The AWS documentation covers that. This article covers architecture, trade-offs, pricing mechanics, and the operational patterns I actually use when designing CI/CD systems on AWS.

What CodeBuild Actually Is

CodeBuild is a fully managed build service. It runs your build commands inside a fresh, isolated Docker container on AWS-managed infrastructure. You define a build specification (the buildspec), point CodeBuild at your source code, and the service handles the rest. Provisioning compute, pulling source, executing build phases, uploading artifacts, tearing down the environment. No servers to provision. No build agents to patch. No capacity planning. Every build runs in a clean container with zero residual state from previous builds. That kills an entire category of "works on my build agent but not on yours" problems. Anyone who has managed Jenkins agents knows exactly what I mean.

The architecture is simple. CodeBuild receives a build request, provisions a container from an AWS-managed image or a custom Docker image you provide, clones source code into the container, executes the phases in your buildspec, uploads artifacts, and terminates the container. Then it destroys the container. No persistent state. No shared filesystem between builds. No opportunity for one build to contaminate another. This ephemeral model is CodeBuild's most important architectural property, full stop.

CodeBuild plugs into the AWS ecosystem natively. CodePipeline for orchestration. S3 and ECR for artifacts and images. CloudWatch for logging and metrics. EventBridge for event-driven automation. IAM for access control. VPC for network isolation. It also works fine as a standalone service, triggered via the CLI, SDK, or console. You are not locked into CodePipeline to use CodeBuild. Plenty of teams run CodeBuild as their build engine while orchestrating with other tools entirely.

| Dimension | CodeBuild | Jenkins | GitHub Actions | GitLab CI |
|---|---|---|---|---|
| Managed | Fully managed, zero infrastructure | Self-managed servers and agents | Managed runners (or self-hosted) | Managed runners (or self-hosted) |
| Scaling | Automatic, unlimited concurrent builds | Manual agent provisioning or plugins | Auto-scales managed runners | Auto-scales shared runners |
| Pricing | Per-minute, pay only for build time | Infrastructure cost (EC2, EBS, etc.) | Per-minute on managed runners, free tier available | Per-minute on shared runners, free tier available |
| Isolation | Fresh container per build, fully isolated | Shared agents unless containerized | Fresh VM per job (managed), varies (self-hosted) | Fresh VM per job (shared), varies (self-hosted) |
| VPC Support | Native VPC integration with private subnets | Runs wherever you deploy agents | No native VPC (self-hosted runners can be in VPC) | No native VPC (self-hosted runners can be in VPC) |
| Caching | S3 and local caching modes | Filesystem caching on persistent agents | Built-in cache action (limited) | Cache artifacts via S3 or GCS |
| Custom Images | ECR, Docker Hub, or any registry | Any Docker image or native install | Custom container actions, limited base images | Any Docker image |
| Max Duration | 8 hours (480 minutes) | No limit (agent-bound) | 6 hours (managed runners) | 24 hours (shared runners) |
| Concurrent Builds | Default 60, increasable to thousands | Limited by agent count | 20 concurrent jobs (free), varies by plan | Varies by plan |
| Ecosystem | Deep AWS integration (CodePipeline, ECR, S3, IAM) | 1,800+ plugins, vast ecosystem | GitHub Marketplace, 15,000+ actions | Built-in DevSecOps, registry, Kubernetes integration |

The trade-offs are right there in the table. CodeBuild wins on operational simplicity, build isolation, and AWS integration. Jenkins wins on flexibility and that massive plugin ecosystem. GitHub Actions wins on developer experience. GitLab CI wins on integrated DevSecOps tooling. Building on AWS and want zero CI/CD infrastructure management? CodeBuild. Need the broadest plugin ecosystem or run multi-cloud? Probably Jenkins or GitHub Actions.

Architecture Internals

You need to understand how CodeBuild provisions and executes builds. Otherwise you are guessing when optimizing performance, diagnosing failures, and predicting costs.

When you start a build, the control plane accepts the request, validates the configuration, and drops the build into a queue. The data plane picks it up. It provisions a container on AWS-managed infrastructure (on-demand EC2 or, for Lambda compute, a Lambda execution environment), attaches any required VPC networking, and begins executing the build lifecycle. Important detail here: the control plane and data plane are separated. Configuration changes, project updates, API operations; those happen on the control plane. Build execution is entirely data plane. So even during control plane issues, running builds finish normally.

The build lifecycle follows a fixed sequence of phases. Each one has distinct behavior and failure semantics.

flowchart LR
  A[SUBMITTED] --> B[QUEUED]
  B --> C[PROVISIONING]
  C --> D[DOWNLOAD_SOURCE]
  D --> E[INSTALL]
  E --> F[PRE_BUILD]
  F --> G[BUILD]
  G --> H[POST_BUILD]
  H --> I[UPLOAD_ARTIFACTS]
  I --> J[FINALIZING]
  J --> K[SUCCEEDED]
CodeBuild build lifecycle phases

Phases transition sequentially. A failure in any phase can short-circuit the rest, though the exact behavior depends on which phase and the on-failure configuration. Worth understanding thoroughly. Both for debugging failures and for figuring out where your build time is actually going.

| Phase | Description | Typical Duration | Failure Behavior |
|---|---|---|---|
| SUBMITTED | Build request accepted by control plane | < 1 second | Rare failure; API validation errors return immediately |
| QUEUED | Build waiting for available compute capacity | 0-60 seconds (can be minutes at scale) | Build stays queued until capacity is available or timeout |
| PROVISIONING | Container is created, image pulled, VPC ENI attached | 20-90 seconds (longer with VPC or large custom images) | Failure if image pull fails, VPC ENI unavailable, or provisioning timeout |
| DOWNLOAD_SOURCE | Source code cloned from CodeCommit, S3, GitHub, Bitbucket | 5-60 seconds depending on repo size | Failure if credentials invalid, repo unreachable, or branch missing |
| INSTALL | Buildspec install phase executes (runtime install, dependencies) | 10-120 seconds | Build fails; remaining buildspec phases, including POST_BUILD, are skipped |
| PRE_BUILD | Buildspec pre_build phase executes (login to ECR, run linters) | Varies | Build fails; BUILD and POST_BUILD are skipped |
| BUILD | Buildspec build phase executes (compile, test, package) | Varies: this is where your actual build runs | Build marked failed; POST_BUILD still runs by default (set on-failure: ABORT to stop) |
| POST_BUILD | Buildspec post_build phase executes (push images, notifications) | Varies | Build marked failed; UPLOAD_ARTIFACTS still runs |
| UPLOAD_ARTIFACTS | Artifacts uploaded to S3 or configured destination | 5-30 seconds depending on artifact size | Build fails if S3 permissions are missing or bucket unreachable |
| FINALIZING | Build environment torn down, logs flushed, metrics emitted | 5-15 seconds | Transparent to user; always completes |

PROVISIONING deserves a closer look. It is the most variable phase and the least understood. For on-demand builds, provisioning means pulling the Docker image, creating the container, and setting up the filesystem. Using VPC mode? Add 20-40 seconds for attaching an Elastic Network Interface. Using a large custom image from ECR? The image pull can take 30-60 seconds or more. These provisioning times hit every single build. You cannot optimize them away in the buildspec. They are pure infrastructure setup costs.

Compute Types and Build Environments

Three compute modes. Different performance characteristics, pricing, and use cases for each.

On-demand compute is the default. CodeBuild provisions a fresh container for each build from a pool of shared EC2 capacity. No reserved infrastructure. You pay only for the minutes your builds run. On-demand supports the widest range of compute types and fits most workloads.

Reserved capacity (fleets) lets you pre-provision a fixed number of build instances that stay running and ready. This eliminates the PROVISIONING phase overhead entirely (containers are already warm) and guarantees capacity during traffic spikes. The catch: you are billed hourly whether builds are running or not. Only cost-effective for teams running a high volume of builds consistently throughout the day.
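A fleet is defined once and referenced by projects. A CloudFormation-style sketch (resource names, fleet name, and the elided project properties are illustrative; verify property names against the current AWS::CodeBuild::Fleet schema):

```yaml
BuildFleet:
  Type: AWS::CodeBuild::Fleet
  Properties:
    Name: shared-build-fleet
    BaseCapacity: 2                      # two always-warm instances, billed hourly
    ComputeType: BUILD_GENERAL1_MEDIUM
    EnvironmentType: LINUX_CONTAINER

MyProject:
  Type: AWS::CodeBuild::Project
  Properties:
    # ...source, artifacts, service role omitted for brevity...
    Environment:
      ComputeType: BUILD_GENERAL1_MEDIUM
      Image: aws/codebuild/standard:7.0
      Type: LINUX_CONTAINER
      Fleet:
        FleetArn: !GetAtt BuildFleet.Arn  # builds land on the warm fleet
```

Any build the fleet cannot absorb overflows to on-demand capacity, so sizing BaseCapacity below peak concurrency is a reasonable cost hedge.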

Lambda compute runs your build in a Lambda execution environment instead of a Docker container. Provisioning takes seconds, not tens of seconds. Great for short builds: linting, formatting checks, lightweight tests, small deployments. But Lambda compute has real restrictions. No Docker-in-Docker. No custom Docker images. Limited runtimes. Maximum build duration of 15 minutes. If your build fits those constraints? Significantly faster and cheaper than on-demand.
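A buildspec suited to Lambda compute looks like any other, just minus the things Lambda cannot do. A sketch, assuming the project is configured with a Lambda compute type (such as BUILD_LAMBDA_1GB) and a Lambda-compatible curated image:

```yaml
# Lint-only buildspec: no Docker commands, no privileged mode,
# short enough to fit well inside the 15-minute Lambda cap.
version: 0.2
phases:
  build:
    commands:
      - npm ci --prefer-offline
      - npm run lint
      - npm run format:check
```

The same source repository can carry separate buildspec files for Lambda-compute checks and on-demand full builds, selected per project.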

Compute Type Specifications

| Compute Type | vCPUs | Memory | Disk | On-Demand Price (Linux, per minute) |
|---|---|---|---|---|
| build.general1.small | 2 | 3 GB | 64 GB | $0.005 |
| build.general1.medium | 4 | 7 GB | 128 GB | $0.010 |
| build.general1.large | 8 | 15 GB | 128 GB | $0.020 |
| build.general1.xlarge | 36 | 70 GB | 128 GB | $0.040 |
| build.general1.2xlarge | 72 | 145 GB | 824 GB | $0.080 |
| build.lambda.1gb | 2 | 1 GB | 10 GB | $0.002 |
| build.lambda.2gb | 2 | 2 GB | 10 GB | $0.004 |
| build.lambda.4gb | 2 | 4 GB | 10 GB | $0.007 |
| build.lambda.8gb | 4 | 8 GB | 10 GB | $0.013 |
| build.lambda.10gb | 4 | 10 GB | 10 GB | $0.017 |

Pricing scales linearly with capability: 2xlarge costs 16x what small costs but gives you 36x the vCPUs and 48x the memory. For CPU-bound builds (compilation, image building), bigger compute types are actually more cost-efficient. Builds complete proportionally faster. For I/O-bound builds (downloading dependencies, uploading artifacts)? Diminishing returns. The bottleneck is network throughput, not CPU. Throwing more vCPUs at a slow npm install does nothing.

AWS Managed Images

AWS provides pre-built Docker images with common runtimes and tools pre-installed:

| Image | Operating System | Architecture | Pre-installed Runtimes |
|---|---|---|---|
| aws/codebuild/amazonlinux2-x86_64-standard:5.0 | Amazon Linux 2 | x86_64 | Python, Node.js, Java, Go, Ruby, .NET, Docker, PHP |
| aws/codebuild/amazonlinux2-aarch64-standard:3.0 | Amazon Linux 2 | ARM64 | Python, Node.js, Java, Go, Ruby, Docker |
| aws/codebuild/standard:7.0 | Ubuntu 22.04 | x86_64 | Python, Node.js, Java, Go, Ruby, .NET, Docker, PHP |
| aws/codebuild/amazonlinux2-x86_64-standard:corretto11 | Amazon Linux 2 | x86_64 | Amazon Corretto 11 (Java), Docker |
| aws/codebuild/windows-base:2019-3.0 | Windows Server 2019 | x86_64 | .NET Framework, .NET Core, PowerShell |

For most workloads, start with the Ubuntu standard image. Broadest runtime coverage. Most frequent updates. The Amazon Linux 2 ARM image is also worth a look: ARM-based builds run roughly 20% cheaper than their x86 equivalents at comparable performance for many workloads.

Custom Docker images from ECR or Docker Hub work fine. Use one when your build requires specific system libraries, proprietary tools, or a preconfigured environment that would take too long to install during every build's INSTALL phase. The trade-off is image maintenance. You own the patching. You own the vulnerability scanning. You own the version management. That overhead adds up.

The Buildspec File

The buildspec file is the heart of every CodeBuild project. A YAML file, typically named buildspec.yml, sitting at the root of your source repository. It defines what happens during each build phase. You can also specify the buildspec inline in the project definition. Useful for simple builds or when you want the build definition managed outside the source repo.

Six top-level sections:

version: 0.2

env:
  variables:
    APP_ENV: "production"
  parameter-store:
    DB_PASSWORD: "/myapp/db/password"
  secrets-manager:
    API_KEY: "myapp/api-key:api_key"

phases:
  install:
    runtime-versions:
      nodejs: 18
    commands:
      - npm ci
  pre_build:
    commands:
      - aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_URI
      - npm run lint
  build:
    commands:
      - npm run build
      - npm test
      - docker build -t $IMAGE_URI .
  post_build:
    commands:
      - docker push $IMAGE_URI
      - echo "Build completed at $(date)"

artifacts:
  files:
    - 'dist/**/*'
    - 'package.json'
  base-directory: .
  discard-paths: no

cache:
  paths:
    - 'node_modules/**/*'
    - '/root/.npm/**/*'

reports:
  jest-reports:
    files:
      - 'coverage/clover.xml'
    file-format: CLOVERXML

Buildspec Phases

| Phase | Purpose | Common Commands | Failure Behavior |
|---|---|---|---|
| install | Install build dependencies and configure runtime versions | npm ci, pip install, apt-get install, runtime-versions | Build fails immediately; subsequent phases skipped |
| pre_build | Pre-build steps: authenticate, validate, lint | aws ecr get-login-password, docker login, npm run lint, terraform validate | Build fails; BUILD and POST_BUILD skipped |
| build | Core build commands: compile, test, package | npm run build, mvn package, docker build, go build | Build marked as failed; POST_BUILD still runs unless on-failure: ABORT is set |
| post_build | Post-build steps: push images, deploy, notify | docker push, aws s3 cp, aws ecs update-service, notification scripts | Build marked as failed; artifacts still upload |

The failure behavior between phases matters more than you think. If INSTALL or PRE_BUILD fails, POST_BUILD never runs. If BUILD fails, POST_BUILD still runs by default. Both directions bite. A failure notification in POST_BUILD never fires when the break happens in PRE_BUILD. And a docker push in POST_BUILD happily ships an image whose tests just failed, unless you gate it on the build status or set on-failure: ABORT on the BUILD phase. Easy mistakes. Painful to debug.
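A sketch of a buildspec that makes the post-failure behavior explicit rather than relying on defaults (the notification script path is hypothetical):

```yaml
version: 0.2
phases:
  build:
    on-failure: CONTINUE          # explicit: proceed to post_build even on failure
    commands:
      - npm ci
      - npm test
  post_build:
    commands:
      # CODEBUILD_BUILD_SUCCEEDING is set by CodeBuild:
      # "1" while the build is succeeding, "0" after any failure.
      - |
        if [ "$CODEBUILD_BUILD_SUCCEEDING" = "1" ]; then
          docker push "$IMAGE_URI"
        else
          ./scripts/notify-failure.sh   # hypothetical notification hook
        fi
```

The explicit on-failure setting plus the status check means the push never happens on a failed build, and the notification always does.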

Environment Variables

Three sources for environment variables. Each with different security properties.

Plaintext variables live directly in the buildspec or project configuration. Visible in build logs. Visible in the project definition. Never use them for secrets. I still see this constantly. It is the single most common security mistake in CodeBuild configurations.

Parameter Store references pull values from AWS Systems Manager Parameter Store at build start. Values resolve before the INSTALL phase and stay available as environment variables throughout the build. Use SecureString parameters for sensitive values; they get decrypted at build time and masked in logs.

Secrets Manager references work similarly but pull from AWS Secrets Manager. The syntax lets you extract individual keys from JSON secrets: SECRET_NAME:JSON_KEY:VERSION_STAGE:VERSION_ID. Secrets Manager is the right choice for credentials, API keys, and secrets that benefit from automatic rotation.

Artifacts

The artifacts section controls which files get uploaded to S3 after the build completes. Key options:

  • files: Glob patterns for which files to include. Use **/* to grab everything.
  • base-directory: The directory relative to the build root from which artifact paths are calculated.
  • discard-paths: When yes, artifacts upload flat, no directory structure. All files dumped into the artifact root.
  • secondary-artifacts: Multiple artifact outputs with different file selections and destinations. Useful when one build produces artifacts for several downstream consumers.
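A sketch of a multi-artifact configuration (the artifact identifiers and paths are illustrative; each identifier must also have a matching secondary artifact entry in the project configuration):

```yaml
artifacts:
  files:
    - 'dist/**/*'               # primary artifact: the application bundle
  base-directory: .
  secondary-artifacts:
    api_docs:                   # identifier referenced in the project config
      files:
        - 'docs/api/**/*'
      base-directory: .
    lambda_bundle:
      files:
        - 'lambda.zip'
      discard-paths: yes        # upload flat, no directory structure
```

One build, three outputs, each routable to a different S3 destination for different downstream consumers.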

Caching Strategies

Caching. The single most effective optimization for reducing CodeBuild costs and build times. Without it, every build starts from a clean filesystem. Downloads all dependencies. Pulls all Docker base layers. Regenerates every intermediate artifact. With effective caching, builds skip the redundant work and finish in a fraction of the time.

Two caching backends: local and S3. Different scope, persistence, and performance characteristics for each.

Local caching stores cached data on the build host's local disk. If a subsequent build lands on the same host, it reuses the cached data with zero network transfer. Three flavors: Docker layer cache (reuses image layers from previous builds), source cache (reuses the Git checkout), and custom cache (arbitrary directories you specify, like node_modules or .m2). The limitation? It is host-scoped. If your next build provisions on a different host (common with on-demand compute), cache miss. You get nothing.

S3 caching stores cached data in an S3 bucket. Every build uploads its cache to S3 after completion and downloads the cache before starting. Persistent across all builds regardless of host. The trade-off is network transfer overhead. Downloading a 500 MB node_modules cache from S3 takes 5-15 seconds depending on the build environment. Still way faster than a fresh npm ci.
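Cache backends are configured at the project level rather than in the buildspec. A CloudFormation-style sketch of the two configurations (bucket and prefix are illustrative; a project takes a single Cache block):

```yaml
# Local caching with all three modes enabled:
Cache:
  Type: LOCAL
  Modes:
    - LOCAL_DOCKER_LAYER_CACHE
    - LOCAL_SOURCE_CACHE
    - LOCAL_CUSTOM_CACHE

# Alternatively, S3 caching (the buildspec cache.paths section
# then controls which directories are saved and restored):
# Cache:
#   Type: S3
#   Location: my-build-cache-bucket/myproject
```

With S3 caching, the paths listed under cache in the buildspec are what actually get uploaded and restored.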

| Cache Type | Scope | Persistence | Speed | Best Use Case |
|---|---|---|---|---|
| Local: Docker Layer | Same host only | Evicted under memory pressure | Fastest (no network transfer) | Docker image builds with large base layers |
| Local: Source | Same host only | Evicted under memory pressure | Fast (incremental Git fetch vs full clone) | Large repositories with frequent builds |
| Local: Custom | Same host only | Evicted under memory pressure | Fast (local disk read) | Build dependencies (node_modules, .m2, pip cache) |
| S3 | All builds in the project | Persistent until TTL or manual deletion | Moderate (S3 download adds 5-30s depending on size) | Cross-build dependency caching, teams with many build projects |

In practice, the choice is per project, because a CodeBuild project is configured with a single cache type at a time. Docker-heavy projects get local layer and custom caching: when it hits, it is essentially free. Dependency-heavy projects whose builds routinely land on cold hosts get S3 caching: slower than a local hit, but still far faster than no cache at all.

What about cache invalidation? S3 caches are keyed by a hash of the cache paths you specify. Change the paths, cache key changes, fresh cache. No automatic TTL. S3 caches persist until you manually delete them or change the configuration. Here is the subtle problem: stale caches can actually slow builds down. If cached dependencies are outdated, they trigger extensive update operations that take longer than a clean install. For Node.js projects, I key the cache on a combination of node_modules and the package-lock.json hash. Lock file changes, cache invalidates, dependencies install fresh.

VPC Integration

By default, CodeBuild containers run in AWS-managed VPC infrastructure. Full internet access. No access to your private VPC resources. For most builds (compiling code, running unit tests, pushing Docker images), that is exactly what you want. But what about when your build needs to hit resources inside your VPC? An RDS database for integration tests. An ElastiCache cluster for cache warming. An internal API for contract testing. A private artifact repository. That is when you need VPC mode.

Enable VPC mode and CodeBuild attaches an Elastic Network Interface (ENI) to the build container in one of the private subnets you specify. The build container then has the same network access as any other resource in that subnet. Security groups and route table rules govern what it can reach.
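Enabling VPC mode is a single block on the project. A CloudFormation-style sketch (all IDs are placeholders):

```yaml
VpcConfig:
  VpcId: vpc-0abc1234567890def
  Subnets:
    - subnet-0priv1aaa          # private subnets; the build ENI attaches here
    - subnet-0priv2bbb          # two AZs for capacity resilience
  SecurityGroupIds:
    - sg-0build123              # least-privilege group on the build ENI
```

The service role also needs the EC2 network-interface permissions listed in the IAM matrix later in this article, or provisioning fails before the buildspec ever runs.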

flowchart TD
  subgraph VPC["Your VPC"]
    subgraph PrivSubnet["Private Subnet"]
      CB["CodeBuild Container (ENI attached)"]
      RDS["RDS Database"]
      EC["ElastiCache Cluster"]
      API["Internal API"]
    end
    subgraph PubSubnet["Public Subnet"]
      NAT["NAT Gateway"]
    end
    CB -->|Integration tests| RDS
    CB -->|Cache operations| EC
    CB -->|Contract tests| API
    CB -->|Internet access via route table| NAT
  end
  NAT -->|Internet| IGW["Internet Gateway"]
  IGW --> NPM["npm Registry / Docker Hub / GitHub"]
  CB -.->|VPC Endpoint| S3["S3 (Artifacts)"]
  CB -.->|VPC Endpoint| CW["CloudWatch Logs"]
VPC-enabled CodeBuild architecture

Here is what catches teams: the NAT Gateway. Run CodeBuild in a private subnet and it loses direct internet access. Need to pull dependencies from npm, PyPI, Maven Central, Docker Hub? Traffic must route through a NAT Gateway. Without one, dependency downloads fail with timeout errors. And the error messages are terrible. You see ETIMEDOUT from npm. Not "you forgot to configure a NAT Gateway." I have seen engineers burn hours on this one.

For AWS service access from VPC-enabled builds, use VPC endpoints (PrivateLink). Why route S3, CloudWatch Logs, ECR, and Secrets Manager traffic through the NAT Gateway at $0.045/GB when VPC endpoints give you private, direct connectivity at lower cost and lower latency? At minimum, I configure VPC endpoints for S3 (gateway endpoint, free) and CloudWatch Logs (interface endpoint, required for build log delivery in VPC mode).

The performance hit from VPC mode is real. ENI attachment adds 20-40 seconds to the PROVISIONING phase. Every build. Unavoidable. The ENI must be created, attached, and confirmed operational before the build starts. For builds that run frequently and are time-sensitive, this overhead argues strongly for using VPC mode only when you genuinely need private resource access. Do not enable it as a default security posture. You are paying for it on every single build.

Security groups should follow least-privilege. The security group on CodeBuild's ENI should allow outbound traffic only to the specific resources the build needs. RDS on port 5432. ElastiCache on port 6379. HTTPS on 443 for external dependencies. No inbound traffic. Ever. There is zero reason for anything to initiate a connection to a build container.
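A least-privilege group for the build ENI can be sketched in CloudFormation like this (VPC ID, CIDRs, and referenced group IDs are placeholders):

```yaml
BuildSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: CodeBuild ENI - outbound only, scoped destinations
    VpcId: vpc-0abc1234567890def
    SecurityGroupEgress:
      - IpProtocol: tcp
        FromPort: 5432
        ToPort: 5432
        DestinationSecurityGroupId: sg-0rds456    # RDS for integration tests
      - IpProtocol: tcp
        FromPort: 6379
        ToPort: 6379
        DestinationSecurityGroupId: sg-0cache789  # ElastiCache
      - IpProtocol: tcp
        FromPort: 443
        ToPort: 443
        CidrIp: 0.0.0.0/0                         # HTTPS for external dependencies
    # Deliberately no SecurityGroupIngress: nothing should ever
    # initiate a connection to a build container.
```

Note that CloudFormation adds a default allow-all egress rule if you omit SecurityGroupEgress entirely; declaring the list explicitly is what makes the scoping real.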

Batch Builds and Build Graphs

Batch builds let a single trigger execute multiple builds. In parallel, in sequence, or in a dependency graph. This is how CodeBuild handles the monorepo problem. One repository, multiple services, libraries, or deployment targets. A batch build compiles, tests, and deploys each component independently while respecting dependencies between them.

Fan-out builds run multiple independent tasks in parallel. Typical use case: linting, unit tests, and security scanning all at once.

flowchart TD
  SRC["Source Checkout"] --> LINT["Lint"]
  SRC --> UT["Unit Tests"]
  SRC --> SEC["Security Scan"]
  LINT --> INT["Integration Tests"]
  UT --> INT
  SEC --> INT
  INT --> PKG["Package & Deploy"]
Batch build graph with fan-out and fan-in

Lint, unit tests, and security scan run in parallel after source checkout. Integration tests wait until all three succeed. Package runs only after integration tests pass. Upstream failure? Downstream tasks get skipped automatically.

Build graphs define these dependency relationships declaratively in the buildspec:

batch:
  fast-fail: true
  build-graph:
    - identifier: lint
      buildspec: buildspec-lint.yml
      env:
        compute-type: BUILD_LAMBDA_1GB
    - identifier: unit_test
      buildspec: buildspec-test.yml
      env:
        compute-type: BUILD_GENERAL1_MEDIUM
    - identifier: security_scan
      buildspec: buildspec-security.yml
      env:
        compute-type: BUILD_LAMBDA_2GB
    - identifier: integration_test
      buildspec: buildspec-integration.yml
      depend-on:
        - lint
        - unit_test
        - security_scan
    - identifier: package
      buildspec: buildspec-package.yml
      depend-on:
        - integration_test

| Configuration Option | Description | Default |
|---|---|---|
| fast-fail | Stop all builds if any build in the batch fails | true |
| max-parallel | Maximum number of builds running concurrently in the batch | No limit (account concurrent build limit applies) |
| build-graph | Define dependency relationships between build tasks | None (all tasks run in parallel) |
| build-list | Run multiple independent builds with different configurations | None |
| build-matrix | Generate builds from combinations of environment variables and images | None |
| batch-report-mode | How batch build status is reported (REPORT_INDIVIDUAL_BUILDS or REPORT_AGGREGATED_BATCH) | REPORT_AGGREGATED_BATCH |

build-matrix is great for cross-platform builds. Define a matrix of operating systems, language versions, and architectures. CodeBuild generates a build for each combination. Three OS images times four language versions equals 12 builds, all running in parallel with their own compute and buildspec context.
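A sketch of a matrix batch (images come from the managed-image table earlier; the variable values are illustrative). CodeBuild expands one build per combination of the dynamic values:

```yaml
batch:
  build-matrix:
    static:
      ignore-failure: false
    dynamic:
      env:
        image:                  # 2 images x 3 Node versions = 6 builds
          - aws/codebuild/standard:7.0
          - aws/codebuild/amazonlinux2-aarch64-standard:3.0
        variables:
          NODE_VERSION:
            - "16"
            - "18"
            - "20"
```

Each generated build sees its own NODE_VERSION value as an ordinary environment variable, so a single buildspec can install and test against the matching runtime.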

Security Architecture

Three pillars to CodeBuild's security model: IAM-based access control, ephemeral build isolation, and integration with AWS secrets management services.

IAM Service Role

Every CodeBuild project needs an IAM service role. The build environment assumes it during execution. This role determines what AWS resources the build can touch. Least privilege is critical here. A build that only pushes a Docker image to ECR and uploads artifacts to S3 should not have permissions to modify IAM roles, delete DynamoDB tables, or access production databases. Obviously.

And yet. A common anti-pattern: granting AdministratorAccess to the CodeBuild service role "because the build needs access to a lot of things." Dangerous. A compromised dependency (a malicious npm package, a typo-squatted Python library) executes with whatever permissions the build role has. If that role has admin access, a supply-chain attack through a build dependency owns your entire AWS account.
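What least privilege looks like in practice is a policy scoped to the exact resources the build touches. A sketch for a build that pushes one ECR repository and writes artifacts to one bucket (account ID, region, and names are placeholders):

```yaml
PolicyDocument:
  Version: "2012-10-17"
  Statement:
    - Sid: EcrAuth
      Effect: Allow
      Action: ecr:GetAuthorizationToken
      Resource: "*"             # this action does not support resource scoping
    - Sid: EcrPushOneRepo
      Effect: Allow
      Action:
        - ecr:BatchCheckLayerAvailability
        - ecr:PutImage
        - ecr:InitiateLayerUpload
        - ecr:UploadLayerPart
        - ecr:CompleteLayerUpload
      Resource: arn:aws:ecr:us-east-1:123456789012:repository/myapp
    - Sid: ArtifactsOneBucket
      Effect: Allow
      Action: s3:PutObject
      Resource: arn:aws:s3:::my-artifact-bucket/*
    - Sid: Logs
      Effect: Allow
      Action:
        - logs:CreateLogGroup
        - logs:CreateLogStream
        - logs:PutLogEvents
      Resource: arn:aws:logs:us-east-1:123456789012:log-group:/aws/codebuild/myapp:*
```

A compromised dependency running under this role can push a bad image to one repository. It cannot touch IAM, DynamoDB, or anything in production.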

Secrets Management

Environment variables can reference Parameter Store and Secrets Manager values. The approach you choose has real security implications.

| Secret Source | Encryption | Rotation | Audit Trail | Cost | Best For |
|---|---|---|---|---|---|
| Plaintext env vars | None: visible in project config and build logs | Manual | CloudTrail (project config changes only) | Free | Non-sensitive configuration only |
| Parameter Store (SecureString) | KMS encryption at rest, decrypted at build time | Manual | CloudTrail API calls | Free (standard) / $0.05 per 10K API calls (advanced) | Database passwords, API endpoints, feature flags |
| Secrets Manager | KMS encryption at rest, decrypted at build time | Automatic rotation supported | CloudTrail API calls | $0.40/secret/month + $0.05 per 10K API calls | API keys, OAuth tokens, credentials requiring rotation |

My approach: Secrets Manager for all credentials that could be rotated (database passwords, API keys, service account tokens). Parameter Store for sensitive config that does not need rotation (internal service endpoints, environment-specific settings). Plaintext environment variables? Strictly for non-sensitive stuff. Build mode flags. Log levels. That is it.

Build Container Isolation

Each build runs in a dedicated container. Provisioned fresh. Destroyed after completion. No persistent filesystem shared between builds. No ability for one build to inspect another build's data. No network connectivity between concurrent builds. So even if a build gets compromised (malicious dependency, for example), the blast radius is limited to that single execution. The compromised build cannot access secrets from other builds, modify source code from other projects, or persist a backdoor for future builds. The ephemeral model makes containment automatic.

IAM Permissions Matrix

| Relationship | Required Permissions | Purpose |
|---|---|---|
| CodeBuild to S3 | s3:GetObject, s3:PutObject, s3:GetBucketAcl, s3:GetBucketLocation | Source download, artifact upload, cache read/write |
| CodeBuild to ECR | ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, ecr:BatchGetImage, ecr:PutImage, ecr:InitiateLayerUpload, ecr:UploadLayerPart, ecr:CompleteLayerUpload | Pull custom build images, push built Docker images |
| CodeBuild to CloudWatch Logs | logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents | Build log delivery (required for all builds) |
| CodeBuild to Secrets Manager | secretsmanager:GetSecretValue | Resolve secret references in buildspec environment variables |
| CodeBuild to VPC | ec2:CreateNetworkInterface, ec2:DescribeNetworkInterfaces, ec2:DeleteNetworkInterface, ec2:DescribeSubnets, ec2:DescribeSecurityGroups, ec2:DescribeDhcpOptions, ec2:DescribeVpcs, ec2:CreateNetworkInterfacePermission | Attach ENI for VPC-enabled builds |
| CodeBuild to KMS | kms:Decrypt, kms:DescribeKey, kms:GenerateDataKey | Decrypt secrets, encrypt/decrypt artifacts and cache |

Pricing and Cost Optimization

Pricing is simple. Pay per build minute. One-minute minimum. No charge for QUEUED or PROVISIONING phases. Billing starts at DOWNLOAD_SOURCE and ends when FINALIZING completes. So the 20-40 seconds of VPC ENI attachment overhead? Free. The 5-15 seconds of source download? Billed. Good to know where the meter starts.

Per-Minute Pricing

| Compute Type | On-Demand (per minute) | Reserved Capacity (per minute, estimated) |
|---|---|---|
| build.general1.small | $0.005 | ~$0.003 (40% savings) |
| build.general1.medium | $0.010 | ~$0.006 (40% savings) |
| build.general1.large | $0.020 | ~$0.012 (40% savings) |
| build.general1.xlarge | $0.040 | ~$0.024 (40% savings) |
| build.general1.2xlarge | $0.080 | ~$0.048 (40% savings) |
| build.lambda.1gb | $0.002 | N/A |
| build.lambda.2gb | $0.004 | N/A |
| build.lambda.4gb | $0.007 | N/A |
| build.lambda.8gb | $0.013 | N/A |
| build.lambda.10gb | $0.017 | N/A |

Monthly Cost Estimates

Assuming a 5-minute average build on build.general1.medium ($0.010/min):

| Builds per Month | Build Minutes | On-Demand Monthly Cost | Reserved Capacity Monthly Cost | Savings |
|---|---|---|---|---|
| 100 | 500 | $5.00 | ~$3.00 | $2.00 |
| 500 | 2,500 | $25.00 | ~$15.00 | $10.00 |
| 2,000 | 10,000 | $100.00 | ~$60.00 | $40.00 |
| 5,000 | 25,000 | $250.00 | ~$150.00 | $100.00 |

These numbers are pure build compute. They do not include data transfer, S3 storage for artifacts and cache, CloudWatch Logs ingestion, NAT Gateway charges for VPC-enabled builds, or Secrets Manager API calls. Watch out for VPC-enabled builds with heavy dependency downloads. NAT Gateway data processing at $0.045/GB can easily exceed the CodeBuild compute cost itself. I have seen NAT Gateway bills that were 3x the CodeBuild bill on the same project.

Cost Optimization Strategies

Right-size your compute. Most builds do not need xlarge or 2xlarge. Start with small or medium. Upgrade only when build times show a CPU or memory bottleneck. Check MemoryUtilized and CPUUtilized in CloudWatch. A build using 2 GB of memory on a 15 GB large instance? Wasting 87% of the memory you are paying for.

Always enable caching. S3 caching alone cuts build times 30-60% for dependency-heavy projects. Local Docker layer caching can reduce Docker builds from minutes to seconds when only the application layer changes. S3 storage for caches costs $0.023/GB/month. Negligible compared to the build minutes saved.

Use Lambda compute for lightweight builds. Linting, formatting, static analysis, simple deployments. These often complete in under a minute. Lambda at $0.002/min costs less than half the smallest on-demand instance's $0.005/min. Provisions 10x faster, too.

Reserved capacity for predictable workloads. Running more than 500 builds per day consistently? Reserved capacity fleets save roughly 40% over on-demand. The break-even depends on your build volume distribution. Fleet utilization needs to stay above 60% for the math to work.

Minimize the INSTALL phase. Every minute installing dependencies is a minute you are paying for. Bake frequently-used tools into a custom Docker image. Available immediately when the build starts. If your INSTALL phase takes 3 minutes and you run 1,000 builds per month, that is 3,000 minutes. $30 on medium compute. Reinstalling the same dependencies every single time.

Integration Patterns

CodeBuild gets more useful when you wire it into other AWS services. The integrations are where the real value lives.

With CodePipeline

CodePipeline is the natural orchestration layer. A pipeline stage can contain one or more CodeBuild actions, sequential or parallel. CodePipeline handles the source checkout, passes artifacts between stages, manages manual approval gates, and provides pipeline-level visibility. The standard pattern for AWS-native CI/CD: CodePipeline triggers on source change, CodeBuild compiles and tests, CodeDeploy deploys. See AWS CodePipeline: An Architecture Deep-Dive for more on pipeline orchestration.

With EventBridge

CodeBuild emits build state change events to EventBridge automatically. Every phase transition (started, succeeded, failed) generates an event. Match it with EventBridge rules and route to any target. SNS for notifications. Lambda for custom processing. Step Functions for complex post-build workflows. Much more flexible than polling the CodeBuild API. This is the right approach for build notifications, automated deployments triggered by successful builds, and pipeline orchestration outside of CodePipeline.
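A minimal EventBridge event pattern for routing failed builds to a notification target; the project name is a hypothetical placeholder:

```json
{
  "source": ["aws.codebuild"],
  "detail-type": ["CodeBuild Build State Change"],
  "detail": {
    "build-status": ["FAILED"],
    "project-name": ["my-service-build"]
  }
}
```

Drop the `project-name` filter to match failures across all projects, or widen `build-status` to `["SUCCEEDED", "FAILED"]` for deployment triggers.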

With S3

S3 plays three roles in CodeBuild architectures. Source provider: CodeBuild can pull a source zip directly from S3 (pair it with EventBridge to trigger builds on object uploads). Artifact store: build outputs upload to S3. Cache backend: S3 caching persists build caches across builds. Teams that package source as zip files or use S3 in a deployment pipeline get a simple integration point. No CodeCommit or GitHub required.

With ECR

The CodeBuild-ECR integration goes both directions. CodeBuild pulls custom build images from ECR at the start of every build (when configured). Builds commonly push Docker images to ECR as output. One gotcha here: the ECR pull credential for the build image itself is automatic when the service role has the right permissions. No manual docker login needed. But pushing images to ECR within the build phase? That requires an explicit aws ecr get-login-password in PRE_BUILD. Easy to forget.
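A hedged sketch of that PRE_BUILD login; AWS_ACCOUNT_ID and the my-app repository name are assumptions you would set as project environment variables, while AWS_DEFAULT_REGION is provided by CodeBuild automatically:

```yaml
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate Docker against ECR before the build phase pushes images.
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION |
        docker login --username AWS --password-stdin
        $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
  build:
    commands:
      - docker build -t $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/my-app:latest .
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/my-app:latest
```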

Common Failure Modes

Production CodeBuild environments hit a predictable set of failure modes. Know them in advance. Build mitigations into your configurations. Saves hours of debugging.

| Failure Mode | Symptom | Root Cause | Mitigation |
| --- | --- | --- | --- |
| Build timeout | Build status FAILED, phase shows TIMED_OUT at 60 minutes (default) | Build commands hang or take longer than the timeout allows | Set explicit timeoutInMinutes (max 480). Investigate hanging commands, often a test waiting for an unreachable network resource. |
| Out of memory (OOM) | Build fails with exit code 137 or COMMAND_EXECUTION_ERROR with no clear error | Build process exceeds available container memory | Upgrade to a larger compute type. Monitor memory usage during builds. Java builds are the most common offender: set explicit -Xmx heap limits. |
| VPC ENI limit exhausted | Build stuck in PROVISIONING for an extended period, then fails | Account or subnet has reached its ENI limit, preventing CodeBuild from attaching a network interface | Request an ENI limit increase via Service Quotas. Use larger subnets with more available IPs. Spread builds across multiple subnets. |
| Docker pull rate limit | toomanyrequests error during PROVISIONING or INSTALL | Docker Hub rate limits: 100 pulls/6h (anonymous), 200 pulls/6h (authenticated) | Mirror images to ECR. Authenticate Docker Hub pulls. Use AWS-managed images, which pull from internal registries. |
| S3 cache miss after TTL | Build time suddenly increases; INSTALL/PRE_BUILD phases take much longer | Cache object was deleted or expired in S3, or the cache key changed | Monitor cache hit rates. Set appropriate S3 lifecycle policies. Include the dependency lock file hash in the cache key. |
| Secrets Manager access denied | AccessDeniedException during environment variable resolution, before INSTALL | CodeBuild service role lacks secretsmanager:GetSecretValue permission for the specific secret ARN | Grant least-privilege Secrets Manager permissions on the specific secret ARN, not *. Check the KMS key policy if using CMK encryption. |
| Buildspec syntax error | Build fails immediately at DOWNLOAD_SOURCE or INSTALL with a YAML parse error | Invalid YAML syntax, incorrect indentation, or unsupported buildspec version | Validate buildspec YAML locally before committing. Use version: 0.2 (current version). Test with aws codebuild start-build using an inline buildspec. |
| Artifact upload failure | Build phases succeed but the build fails at UPLOAD_ARTIFACTS | S3 bucket does not exist, permissions are missing, or artifact path patterns match no files | Verify the S3 bucket exists and the service role has PutObject permission. Test artifact file patterns locally. Use discard-paths: no to preserve directory structure. |
| Concurrent build limit | Builds stuck in QUEUED for extended periods | Account concurrent build limit (default 60) reached during peak activity | Request a limit increase via Service Quotas. Stagger builds. Use batch builds to consolidate related builds. Prioritize builds using the queued timeout. |

OOM failures deserve a closer look. They are common and hard to diagnose. When the Linux OOM killer terminates a process, the build log often shows nothing. No error message. The process just stops. The build times out or reports exit code 137. Most common causes: Java builds with unbounded heap sizes, Node.js builds with huge dependency trees, and Docker builds caching large intermediate layers in memory. Set explicit memory limits. Always. -Xmx for Java. --max-old-space-size for Node.js. And pick a compute type with at least 50% headroom above your expected peak memory usage. You will thank yourself later.
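One way to bake those limits in is through buildspec environment variables. A sketch; the 4 GB values are illustrative assumptions sized for a 7 GB build.general1.medium instance:

```yaml
version: 0.2
env:
  variables:
    # Cap heap sizes well below container memory so the OOM killer
    # never fires silently. Values assume ~7 GB of container memory.
    JAVA_TOOL_OPTIONS: "-Xmx4g"
    NODE_OPTIONS: "--max-old-space-size=4096"
```

Both variables are honored by their runtimes without touching individual build commands, so the limits apply to every Java or Node.js process the build spawns.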

Key Architectural Recommendations

What I have found most valuable when running CodeBuild in production:

  1. Right-size compute from the start. Begin every new project on build.general1.small or build.general1.medium. Let metrics guide upgrades. Most builds are not CPU-bound. They spend the majority of time downloading dependencies and uploading artifacts. More vCPUs do not help. Review CloudWatch metrics after the first week of production builds. Adjust only if CPU or memory utilization consistently exceeds 70%.
  2. Always configure caching. Both local and S3. Enable local Docker layer cache and local custom cache on every project. Add S3 caching for the dependency directories specific to your build tool (node_modules, .m2/repository, .pip/cache, vendor/). Local plus S3 gives you fast hits in the common case and reliable availability on cold starts. Cache configuration takes five minutes. The time savings compound across every build for the life of the project.
  3. Use Secrets Manager for credentials. Not plaintext environment variables. Non-negotiable. Plaintext env vars are visible in the project configuration, in CloudTrail logs, and potentially in build logs. One leaked API key or database password can compromise an entire environment. Secrets Manager provides encryption, access auditing, and automatic rotation. $0.40/secret/month. Cheap insurance.
  4. Enable VPC mode only when you need private resource access. VPC mode adds 20-40 seconds of provisioning overhead to every build. Requires a NAT Gateway for internet access ($0.045/GB plus hourly charge). Consumes ENIs from your subnet. If your build only compiles code and pushes to S3 or ECR, skip it. Default mode is faster, cheaper, simpler. Reserve VPC mode for integration tests against RDS, ElastiCache, or internal APIs.
  5. Use batch builds for monorepos. One repository, multiple services or components. Batch builds with a build graph let each component build and test independently while respecting inter-component dependencies. Faster than serial builds (independent components run in parallel). More reliable than a monolithic build (a failure in one component does not block unrelated components).
  6. Lambda compute for linting, formatting, and lightweight checks. Provisions in seconds. Costs a fraction of on-demand EC2. More than sufficient for tasks that do not need Docker or heavy compilation. Moving lint and format checks to Lambda can cut the cost and latency of your pre-merge CI pipeline by 50% or more.
  7. Consider reserved capacity fleets above 500 builds per day. On-demand provisioning overhead (20-90 seconds per build) adds up fast at scale. Reserved capacity fleets eliminate provisioning time entirely. Containers are pre-warmed and waiting. The roughly 40% savings over on-demand pricing offset the commitment of paying for idle capacity during off-peak hours. Run the numbers for your specific volume before committing.
  8. Standardize buildspec files with shared templates. Twenty-plus CodeBuild projects, each with its own buildspec file, become a configuration management nightmare. Define a standard structure: consistent phase naming, common caching configuration, shared artifact patterns. Use buildspec overrides or shared files in S3 to cut duplication. Makes it easier to enforce security policies, operational standards, and cost controls across the board.
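The build graph from recommendation 5 is a buildspec feature. A sketch for a hypothetical monorepo; the identifiers and paths are illustrative:

```yaml
version: 0.2
batch:
  fast-fail: false
  build-graph:
    - identifier: shared_lib
      buildspec: libs/shared/buildspec.yml
    - identifier: api_service
      buildspec: services/api/buildspec.yml
      depend-on:
        - shared_lib
    - identifier: worker_service
      buildspec: services/worker/buildspec.yml
      depend-on:
        - shared_lib
```

Here api_service and worker_service build in parallel once shared_lib succeeds, and with fast-fail disabled a failure in one does not cancel the other.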

Additional Resources

  • AWS CodeBuild User Guide: comprehensive reference for project configuration, build environment settings, and operational guidance
  • AWS CodeBuild buildspec reference: complete specification of all buildspec sections, phases, environment variable sources, artifact configurations, and cache settings
  • AWS CodeBuild VPC support documentation: detailed guidance on VPC configuration, subnet requirements, security group rules, and NAT Gateway setup for VPC-enabled builds
  • AWS CodeBuild batch builds and build graphs: configuration reference for fan-out builds, build matrices, and dependency graphs
  • AWS CodeBuild pricing page: current per-minute pricing for all compute types, regions, and reserved capacity options
  • AWS CodeBuild Docker sample: step-by-step guide for building and pushing Docker images using CodeBuild with ECR integration
  • AWS CodePipeline documentation: orchestration layer for multi-stage CI/CD pipelines using CodeBuild as the build provider
  • AWS Well-Architected DevOps Lens: architectural guidance for CI/CD systems including build infrastructure best practices
  • AWS CodeBuild quotas and service limits: current limits for concurrent builds, build timeout, artifact sizes, and API throttling rates
  • AWS Security Blog, CI/CD pipeline security best practices: guidance on securing build environments, managing secrets, and implementing least-privilege IAM policies

Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.