
AWS DynamoDB: An Architecture Deep-Dive

Tags: AWS, Architecture, Databases, Serverless

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

DynamoDB sits at the center of more AWS architectures than any other database service. I've used it for everything from mobile backends handling millions of daily active users to event-sourced systems processing tens of thousands of writes per second. Most teams treat it as a simple key-value store, plug it in, and move on. That works until they hit a hot partition at 3 AM, discover their GSI is throttling independently of the base table, or realize their on-demand table costs three times what provisioned capacity would have. After years of running DynamoDB at scale, I've accumulated enough operational scars to fill this reference. Patterns, trade-offs, cost traps, and the internal mechanics that explain why DynamoDB behaves the way it does.

This is an architecture reference for engineers who already know the basics. If you need a getting-started tutorial, AWS has plenty. What follows covers how DynamoDB actually works under the hood, how to design for its strengths, and which failure modes will find you in production.

How DynamoDB Works Internally

Understanding DynamoDB's internals changes how you design for it. AWS has published enough about the architecture (particularly through the 2022 USENIX paper and re:Invent talks) to build a solid mental model.

Request Routing

Every DynamoDB API call hits a request router first. The router authenticates the request, resolves which partition holds the target data by hashing the partition key, and forwards the request to the correct storage node. The router maintains a partition map that tells it which storage node group owns which key range. When partitions split, the router's map updates accordingly.

This architecture means the partition key is the single most important design decision you make. Every read and every write goes through a hash function that determines the physical location. A poorly chosen partition key funnels traffic to a single storage node group, and no amount of provisioned capacity fixes that.

Storage and Replication

Each partition stores data in a B-tree structure on SSD-backed storage. Every partition maintains three replicas spread across three Availability Zones within the region. One replica serves as the leader; the other two are followers. DynamoDB uses a Multi-Paxos consensus protocol to coordinate writes.

The write path works as follows: the leader generates a write-ahead log (WAL) entry, sends it to both followers, and acknowledges the write once two of three replicas (a quorum) persist the log record. This gives you durability across two AZs before the client receives a success response. The leader then applies the write to its B-tree.

Strongly consistent reads go to the leader replica. Eventually consistent reads can go to any replica, which is why they cost half as much (the load spreads across all three replicas instead of concentrating on the leader).

Control Plane vs. Data Plane

DynamoDB maintains a clean separation between control plane and data plane:

| Component | Responsibility | Operations | Availability Impact |
|---|---|---|---|
| Control plane | Table management, configuration | CreateTable, UpdateTable, DeleteTable, DescribeTable | Configuration changes only |
| Data plane | Read/write traffic | GetItem, PutItem, Query, Scan, BatchWrite | Direct application impact |
| Auto admin | Partition management, splitting, health monitoring | Automatic (no API) | Background operations |

The data plane operates independently of the control plane. Once a table is configured, it keeps serving traffic even if the control plane is degraded. This is the same pattern you see across AWS services (AWS Elastic Load Balancing: An Architecture Deep-Dive covers the same separation for ELB). During a control plane outage, your existing tables keep working; you just cannot create new tables or modify existing ones.

Auto Admin and Partition Management

The auto-admin subsystem handles partition health monitoring, splitting, and rebalancing. It runs continuously, watching for partitions that are approaching size limits (10 GB per partition) or throughput limits. When a partition needs to split, auto-admin selects a split point, creates new partitions, migrates data, and updates the request router's partition map.

This process is transparent, but understanding it explains several DynamoDB behaviors that surprise teams in production. Partition splits take time. During splits, the old partition continues serving traffic while data migrates. If you see brief latency spikes during sustained high-throughput writes, partition splits are often the cause.

Capacity Modes and Throughput

DynamoDB offers two capacity modes. Picking the wrong one is the most common cost mistake I see.

On-Demand Mode

On-demand mode charges per request. No capacity planning required. DynamoDB automatically scales to handle your traffic. The service tracks your recent peak and can instantly handle double that peak. If sustained traffic exceeds the doubled peak, the new level becomes the baseline for future scaling.

| Metric | On-Demand Behavior |
|---|---|
| Scaling | Automatic, based on recent traffic peaks |
| Instant capacity | 2x the previous traffic peak |
| New table default | 4,000 WRU/s and 12,000 RRU/s |
| Throttling risk | Possible if traffic spikes beyond 2x the previous peak instantly |
| Pricing (US East, Standard) | $1.25 per million WRU; $0.25 per million RRU |

On-demand tables can still throttle. If your traffic jumps from near-zero to 50,000 writes per second without any ramp-up, DynamoDB has not observed enough traffic history to pre-allocate capacity. The 2x scaling rule requires a previous peak to double. For launch-day scenarios or scheduled batch jobs that spike from zero, pre-warm by gradually ramping traffic or temporarily switching to provisioned mode.

Provisioned Mode

Provisioned mode lets you specify read capacity units (RCUs) and write capacity units (WCUs). You pay for the capacity you provision, whether you use it or not.

| Metric | Provisioned Behavior |
|---|---|
| Scaling | Manual or auto-scaling (CloudWatch-driven) |
| Cost (US East, Standard) | $0.00065/WCU/hour; $0.00013/RCU/hour |
| Reserved capacity | Up to 77% savings with 1-year or 3-year commitments |
| Burst capacity | 300 seconds of unused capacity banked |
| Decrease limits | 4 per day initially, up to 27 per 24-hour period |

Auto-scaling with provisioned mode is the cost-optimal choice for workloads with predictable patterns. Set a target utilization of 70%, configure reasonable min/max bounds, and let CloudWatch Alarms trigger scaling actions. The catch: auto-scaling reacts to CloudWatch metrics, which have a 1-2 minute delay. If your traffic spikes faster than that, you will see throttling before auto-scaling catches up.

Choosing Between Modes

```mermaid
flowchart TD
    A[New DynamoDB Table] --> B{Traffic pattern predictable?}
    B -->|Yes| C{Steady baseline with known peaks?}
    B -->|No| D[On-Demand Mode]
    C -->|Yes| E[Provisioned + Auto-Scaling]
    C -->|No| F{Budget constrained?}
    F -->|Yes| G[Provisioned + Reserved Capacity]
    F -->|No| D
    E --> H{Cost optimization priority?}
    H -->|Yes| I[Add Reserved Capacity for baseline]
    H -->|No| J[Provisioned with Auto-Scaling only]
```
Capacity mode decision tree

The rule I follow: start with on-demand for new workloads where you do not know the traffic pattern. After 2-4 weeks of production data, evaluate whether provisioned mode with auto-scaling would cost less. For most steady-state workloads, provisioned mode saves 40-60% compared to on-demand.
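The break-even evaluation above can be sketched as simple arithmetic. This is a rough estimator, not a billing tool: it uses the US East list prices quoted in this article, assumes a perfectly steady workload, and sizes provisioned capacity at the 70% auto-scaling target utilization discussed earlier.

```python
# Back-of-the-envelope comparison of the two capacity modes for a steady
# workload, using the US East list prices quoted in this article.

OD_WRITE = 1.25 / 1_000_000     # $ per write request unit (on-demand)
OD_READ = 0.25 / 1_000_000      # $ per read request unit (on-demand)
PROV_WCU_MONTH = 0.00065 * 730  # $ per provisioned WCU per month (~$0.47)
PROV_RCU_MONTH = 0.00013 * 730  # $ per provisioned RCU per month (~$0.09)

def monthly_cost(writes_per_sec: float, reads_per_sec: float,
                 mode: str, utilization: float = 0.7) -> float:
    """Approximate monthly cost for a perfectly steady workload.

    For provisioned mode, capacity is sized so the workload runs at the
    given target utilization (the auto-scaling setpoint).
    """
    seconds = 730 * 3600  # hours per month * seconds per hour
    if mode == "on_demand":
        return (writes_per_sec * seconds * OD_WRITE
                + reads_per_sec * seconds * OD_READ)
    wcu = writes_per_sec / utilization
    rcu = reads_per_sec / utilization
    return wcu * PROV_WCU_MONTH + rcu * PROV_RCU_MONTH

# Example: a steady 500 writes/s and 2,000 reads/s.
od = monthly_cost(500, 2000, "on_demand")
prov = monthly_cost(500, 2000, "provisioned")
print(f"on-demand ~${od:,.0f}/mo, provisioned ~${prov:,.0f}/mo")
```

Real traffic is bursty, so the actual gap is smaller than the steady-state math suggests; that is why I wait for a few weeks of production data before switching.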

Read and Write Unit Mechanics

Understanding how RCUs and WCUs map to actual operations prevents surprise throttling:

| Operation | Unit Cost | Size Unit |
|---|---|---|
| Strongly consistent read | 1 RCU | per 4 KB |
| Eventually consistent read | 0.5 RCU | per 4 KB |
| Transactional read | 2 RCU | per 4 KB |
| Standard write | 1 WCU | per 1 KB |
| Transactional write | 2 WCU | per 1 KB |

Items larger than these thresholds consume proportionally more units, rounded up. A 6 KB strongly consistent read costs 2 RCUs. A 3.5 KB write costs 4 WCUs. Keeping items small directly reduces throughput consumption.
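The rounding rules are easy to get wrong in capacity planning spreadsheets, so here they are as code. This follows the table above directly: reads bill per 4 KB, writes per 1 KB, rounded up, with per-operation multipliers.

```python
import math

# Capacity-unit arithmetic from the table above: reads are billed per 4 KB,
# writes per 1 KB, rounded up, with multipliers per operation type.

READ_UNIT_KB = 4
WRITE_UNIT_KB = 1

def rcus(item_kb: float, consistency: str = "eventual") -> float:
    """RCUs consumed by reading one item of the given size."""
    units = math.ceil(item_kb / READ_UNIT_KB)
    multiplier = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return units * multiplier

def wcus(item_kb: float, transactional: bool = False) -> int:
    """WCUs consumed by writing one item of the given size."""
    units = math.ceil(item_kb / WRITE_UNIT_KB)
    return units * (2 if transactional else 1)

# A 6 KB strongly consistent read: ceil(6/4) = 2 RCUs.
# A 3.5 KB standard write: ceil(3.5/1) = 4 WCUs.
```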

Partition Mechanics and Scaling

Partitions are the fundamental scaling unit. Every performance problem I have debugged in DynamoDB traces back to partition behavior.

Partition Throughput Limits

Each partition supports a fixed maximum throughput:

| Resource | Limit per Partition |
|---|---|
| Read throughput | 3,000 RCU |
| Write throughput | 1,000 WCU |
| Storage | 10 GB |

Your table's total throughput is the sum of all partition throughputs. A table with 10 partitions supports up to 30,000 RCUs and 10,000 WCUs in aggregate, but only if traffic distributes evenly across partitions. If 80% of your reads target items in a single partition, that partition's 3,000 RCU limit becomes your effective ceiling regardless of how much capacity you provisioned at the table level.

Split for Heat

When DynamoDB detects a partition receiving sustained high throughput, it automatically splits that partition into two. The split point is chosen based on recent traffic patterns to distribute load evenly between the new partitions. This doubles the available throughput for that key range at no additional cost.

Split for heat cannot help in every scenario:

| Scenario | Split for Heat Effective? | Explanation |
|---|---|---|
| Many items, distributed hot keys | Yes | Split distributes items across new partitions |
| Single hot item | No | The item lives in one partition; splitting does not help |
| LSI present on table | Limited | Cannot split within an item collection |
| Ever-increasing sort key | Limited | New writes always target the latest partition |

The single hot item problem is the most common production issue I encounter. A counter, a leaderboard, a "latest" record: any pattern where the majority of writes target one partition key cannot benefit from split for heat. The solution is application-level sharding: append a random suffix to the partition key and aggregate on read.
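The shard-and-aggregate pattern is simple enough to sketch in a few lines. To keep the example self-contained, the "table" here is a plain dict standing in for UpdateItem increments and a scatter-gather BatchGetItem; only the key-construction logic is the real pattern.

```python
import random

# Application-level write sharding for a single hot key: spread writes across
# N suffixed partition keys, then scatter-gather and aggregate on read.
# The dict stands in for the real table to keep the sketch runnable.

SHARDS = 10

def shard_key(base_key: str) -> str:
    """Pick one of N suffixed partition keys at random for a write."""
    return f"{base_key}#{random.randrange(SHARDS)}"

def all_shard_keys(base_key: str) -> list[str]:
    """Every suffixed key, for the scatter-gather read."""
    return [f"{base_key}#{i}" for i in range(SHARDS)]

# Simulated hot counter: writes fan out across shards...
table: dict[str, int] = {}
for _ in range(1000):
    k = shard_key("page-views#home")
    table[k] = table.get(k, 0) + 1

# ...and the read aggregates across all shards.
total = sum(table.get(k, 0) for k in all_shard_keys("page-views#home"))
print(total)
```

Pick N based on how far above a single partition's 1,000 WCU ceiling your write rate goes; reads now cost N fetches, so do not shard further than you need.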

Adaptive Capacity

Adaptive capacity complements split for heat. It automatically reallocates unused throughput from cold partitions to hot partitions. If partition A uses 200 WCUs of its 1,000 WCU allocation while partition B needs 1,500 WCUs, adaptive capacity shifts some of partition A's unused capacity to partition B.

This happens automatically with no configuration required. Adaptive capacity reduces, but does not eliminate, throttling from uneven access patterns. It works best when the overall table has enough provisioned capacity; it simply redistributes it more effectively.

Partition Key Design

Good partition key design is the single highest-leverage architecture decision for DynamoDB:

| Pattern | Example | Distribution |
|---|---|---|
| High cardinality, uniform access | UserID, OrderID, SessionID | Excellent |
| High cardinality, skewed access | CustomerID (some customers 1,000x more active) | Needs write sharding |
| Low cardinality | Status (ACTIVE/INACTIVE), Region (5 values) | Poor; hot partitions guaranteed |
| Time-based | Date, Hour | Poor; all writes target the current period |

For skewed access patterns, use write sharding: append a random number (0-N) to the partition key and scatter-gather on reads. For time-series data, combine the timestamp with a high-cardinality attribute (device ID, sensor ID) as the partition key.

Secondary Indexes

DynamoDB provides two types of secondary indexes, and choosing wrong creates problems that are expensive to fix.

Global Secondary Indexes (GSI)

A GSI creates a separate, fully independent partition structure with its own partition key and sort key. DynamoDB asynchronously replicates items from the base table to each GSI.

Key architecture details:

  • GSIs have their own throughput capacity, separate from the base table
  • GSI writes are eventually consistent (the replication is asynchronous)
  • A throttled GSI back-pressures writes to the base table
  • Maximum 20 GSIs per table (adjustable)
  • GSIs can project a subset of attributes, reducing storage and throughput costs
Note: GSI throttling is the most common surprise in production DynamoDB. If your GSI cannot keep up with base table write throughput, DynamoDB throttles writes to the base table itself. Always provision GSI write capacity at or above the base table's write capacity.
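An alarm on per-GSI WriteThrottleEvents is the cheapest insurance against this. The sketch below only builds the keyword arguments; the table, index, and SNS topic names are placeholders, and you would pass the dict to a boto3 CloudWatch client via `cloudwatch.put_metric_alarm(**kwargs)`.

```python
# Build put_metric_alarm kwargs that fire on any write throttle recorded for a
# specific GSI. DynamoDB emits per-index metrics under the AWS/DynamoDB
# namespace with the GlobalSecondaryIndexName dimension.

def gsi_throttle_alarm(table_name: str, index_name: str, topic_arn: str) -> dict:
    return {
        "AlarmName": f"{table_name}-{index_name}-write-throttles",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "GlobalSecondaryIndexName", "Value": index_name},
        ],
        "Statistic": "Sum",
        "Period": 60,               # one-minute buckets
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # any throttle fires it
        "AlarmActions": [topic_arn],
    }

kwargs = gsi_throttle_alarm("orders", "by-customer", "arn:aws:sns:...:alerts")
# cloudwatch.put_metric_alarm(**kwargs)
```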

Local Secondary Indexes (LSI)

An LSI shares the partition key with the base table but uses a different sort key. All items with the same partition key (across the base table and all LSIs) form an item collection.

| Characteristic | GSI | LSI |
|---|---|---|
| Partition key | Any attribute | Same as base table |
| Sort key | Any attribute | Different from base table |
| Throughput | Independent capacity | Shares base table capacity |
| Consistency | Eventually consistent only | Eventually or strongly consistent |
| When to create | Anytime | Table creation only |
| Item collection limit | None | 10 GB per partition key value |
| Maximum per table | 20 | 5 |

The 10 GB item collection limit on LSIs is a hard constraint. If any partition key's total data (base table items plus all LSI entries) exceeds 10 GB, writes for that partition key fail. I have seen this kill production systems when a high-volume entity (a busy tenant in a multi-tenant system) crosses the threshold with no warning. If you use LSIs, watch for ItemCollectionSizeLimitExceededException errors and request ReturnItemCollectionMetrics on writes so you can alert before any collection reaches the limit.

Single-Table Design

Single-table design stores multiple entity types in one DynamoDB table, using composite partition keys and sort keys to model relationships. The pattern gained popularity through Rick Houlihan's re:Invent talks and Alex DeBrie's "The DynamoDB Book."

Advantages: all related entities co-located in the same partitions, enabling efficient queries across entity types with a single Query operation. No joins needed.

Drawbacks: complex key design, harder to reason about, GSI overloading can make the table opaque to new team members. With the November 2025 launch of multi-attribute composite keys for GSIs (up to four attributes per key), some of the synthetic key concatenation complexity has been reduced.

My recommendation: use single-table design for access-pattern-heavy workloads where you know all query patterns upfront. Use multi-table design when your access patterns evolve frequently or when different entity types have vastly different throughput characteristics.
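To make the pattern concrete, here is a minimal single-table key sketch for a hypothetical customer/orders model: both entity types share one table, and a customer's orders live in the same item collection as the customer record, so one Query retrieves everything. The entity names and key shapes are illustrative, not prescriptive.

```python
# Single-table design sketch: composite PK/SK values encode entity type and
# relationships, so related items share a partition.

def customer_item(customer_id: str, name: str) -> dict:
    return {"PK": f"CUST#{customer_id}", "SK": "PROFILE", "name": name}

def order_item(customer_id: str, order_id: str, total: int) -> dict:
    # Orders share the customer's partition key; the SK prefix groups them
    # and sorts them by order ID within the item collection.
    return {"PK": f"CUST#{customer_id}", "SK": f"ORDER#{order_id}", "total": total}

# One Query against the table fetches the customer and all orders:
#   KeyConditionExpression="PK = :pk",
#   ExpressionAttributeValues={":pk": "CUST#42"}
items = [
    customer_item("42", "Ada"),
    order_item("42", "2025-001", 1200),
    order_item("42", "2025-002", 80),
]
collection = [i for i in items if i["PK"] == "CUST#42"]
```

A `begins_with(SK, "ORDER#")` condition on the same Query narrows the result to orders only, which is the whole point: relationship traversal without joins.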

Global Tables

Global tables replicate a DynamoDB table across multiple AWS regions with sub-second replication latency.

Replication Architecture

Each region maintains a full, independent replica. Writes to any replica propagate to all other replicas asynchronously. DynamoDB uses last-writer-wins conflict resolution based on timestamps for concurrent writes to the same item in different regions.

Consistency Models

Global tables now support two consistency models:

| Model | Abbreviation | Behavior | Use Case |
|---|---|---|---|
| Multi-Region Eventually Consistent | MREC | Writes replicate asynchronously; brief inconsistency window | Most applications; highest availability |
| Multi-Region Strongly Consistent | MRSC | Reads guaranteed to reflect all prior writes globally | Financial transactions, inventory systems |

MRSC is a significant addition (launched at re:Invent 2024). Previously, global tables only supported eventual consistency, which made them unsuitable for workloads requiring guaranteed read-after-write consistency across regions. MRSC uses a coordination protocol across regions, which increases write latency (cross-region round trip) but guarantees consistency.

Multi-Account Global Tables

As of 2025, DynamoDB supports multi-account global tables. You can replicate table data across different AWS accounts and regions, adding account-level isolation. This is valuable for organizations using separate accounts for production, staging, and disaster recovery, or for regulated industries requiring strict account boundaries.

Global Tables and DAX

A critical operational gotcha: writes that arrive at a replica via global table replication bypass DAX. The DAX cache does not update when a replication write occurs. Your cache will serve stale data until the TTL expires. If you use both global tables and DAX, set aggressive TTLs on DAX and accept that reads may lag behind cross-region writes.

DynamoDB Streams and Change Data Capture

DynamoDB Streams captures a time-ordered sequence of item-level modifications. Every write (put, update, delete) generates a stream record.

Stream Architecture

Stream records are organized into shards (similar to Kinesis shards). Each shard has a parent-child relationship that reflects partition splits. Stream records are available for 24 hours. You configure stream view type at the table level:

| View Type | Contents | Use Case |
|---|---|---|
| KEYS_ONLY | Partition key and sort key only | Triggering downstream by key |
| NEW_IMAGE | Complete item after modification | Replication, search index updates |
| OLD_IMAGE | Complete item before modification | Audit trails, rollback |
| NEW_AND_OLD_IMAGES | Both before and after | Change comparison, CDC pipelines |

Integration Patterns

DynamoDB Streams integrates directly with Lambda for event-driven architectures. Each stream shard supports up to 2 simultaneous readers (use 1 reader for global tables to avoid throttling). Common patterns include cross-region replication, keeping search indexes in sync, building audit trails, and feeding CDC pipelines.
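A stream-triggered Lambda receives records in the shape sketched below. This skeleton assumes the NEW_AND_OLD_IMAGES view type; note that images arrive in DynamoDB's attribute-value encoding (`{"S": ...}`, `{"N": ...}`), not plain Python values.

```python
# Skeleton Lambda consumer for a DynamoDB stream configured with
# NEW_AND_OLD_IMAGES: each record carries the keys plus before/after images.

def handler(event, context=None):
    changes = []
    for record in event["Records"]:
        action = record["eventName"]   # INSERT, MODIFY, or REMOVE
        ddb = record["dynamodb"]
        old = ddb.get("OldImage")      # absent for INSERT
        new = ddb.get("NewImage")      # absent for REMOVE
        changes.append((action, old, new))
    return changes

# Synthetic event in the shape Lambda delivers:
event = {"Records": [{
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"PK": {"S": "CUST#42"}},
        "OldImage": {"PK": {"S": "CUST#42"}, "status": {"S": "ACTIVE"}},
        "NewImage": {"PK": {"S": "CUST#42"}, "status": {"S": "INACTIVE"}},
    },
}]}
```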

Kinesis Data Streams Integration

As an alternative to DynamoDB Streams, you can route change data capture records to a Kinesis Data Stream. This gives you longer retention (up to 365 days vs. 24 hours), more consumers per shard, and integration with the broader Kinesis ecosystem. The trade-off is additional cost for the Kinesis stream.

DynamoDB Accelerator (DAX)

DAX is a fully managed, in-memory cache that sits in front of DynamoDB. It provides microsecond read latency for cached items.

DAX Architecture

A DAX cluster runs within your VPC with one primary node and up to 10 read replica nodes. The primary handles writes; read replicas serve read traffic. DAX maintains two caches:

| Cache | Stores | Populated By | Default TTL |
|---|---|---|---|
| Item cache | Individual items by primary key | GetItem, BatchGetItem | 5 minutes |
| Query cache | Full result sets | Query, Scan | 5 minutes |

DAX is a write-through cache: writes go through DAX to DynamoDB, and the item cache updates immediately. The query cache does not invalidate on writes; it relies purely on TTL expiration.

When to Use DAX (and When Not To)

| Scenario | DAX Recommended? | Reason |
|---|---|---|
| Read-heavy, repeated key access | Yes | Microsecond latency, reduced RCU consumption |
| Write-heavy workloads | No | DAX adds latency to writes; minimal benefit |
| Strongly consistent reads required | No | DAX serves eventually consistent data only |
| Infrequent, unique key access | No | Cache miss rate too high; adds latency and cost |
| Global tables | Use with caution | Replication writes bypass DAX; stale cache risk |

DAX Pricing

DAX instance pricing varies by node type:

| Instance Type | vCPUs | Memory | Cost/Hour (US East) |
|---|---|---|---|
| dax.t3.small | 2 | 2 GB | ~$0.04 |
| dax.r5.large | 2 | 16 GB | ~$0.29 |
| dax.r5.xlarge | 4 | 32 GB | ~$0.58 |
| dax.r5.8xlarge | 32 | 256 GB | ~$4.64 |

A production DAX cluster (3 nodes across AZs using r5.large) costs approximately $630/month. Compare that against the RCU cost it replaces to determine ROI.

Pricing and Cost Optimization

DynamoDB pricing catches teams off guard because the cost model is fundamentally different from traditional databases. You pay for throughput, storage, and features independently.

Cost Breakdown

| Component | On-Demand (US East) | Provisioned (US East) |
|---|---|---|
| Writes | $1.25/million WRU | $0.00065/WCU/hour (~$0.47/WCU/month) |
| Reads | $0.25/million RRU | $0.00013/RCU/hour (~$0.09/RCU/month) |
| Storage (Standard) | $0.25/GB/month | $0.25/GB/month |
| Storage (Standard-IA) | $0.10/GB/month | $0.10/GB/month |
| Backups (warm) | $0.10/GB/month | $0.10/GB/month |
| Backups (cold) | $0.03/GB/month | $0.03/GB/month |
| Streams reads | $0.02/100K read requests | $0.02/100K read requests |
| Global table replicated writes | $1.875/million rWRU | N/A (billed as rWCUs) |

Cost Optimization Strategies

1. Right-size capacity mode. On-demand costs 5-7x more per unit than provisioned capacity for steady workloads. Run on-demand for the first month to establish baselines, then switch to provisioned with auto-scaling.

2. Use reserved capacity. For predictable baseline throughput, reserved capacity (1-year or 3-year term) saves up to 77% compared to on-demand pricing.

3. Database Savings Plans. Launched December 2025, these plans offer committed-use discounts across DynamoDB (including on-demand tables), RDS, and other managed databases. Unlike reserved capacity, they apply automatically across accounts and regions.

4. Standard-IA table class. For tables with infrequent access (archival data, configuration stores), Standard-IA cuts storage costs by 60% ($0.10/GB vs. $0.25/GB). Read and write unit costs are approximately 25% higher, so this only saves money when storage dominates your bill.

5. Minimize item size. Compress large attribute values. Store large blobs in S3 and keep only a reference in DynamoDB. Every KB matters because it directly multiplies your RCU/WCU consumption.
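Strategy 5 in practice: gzip a large text attribute before writing it as a DynamoDB Binary value. A 2x-5x reduction on text-heavy attributes directly cuts the WCUs each write consumes. The helper names here are my own, not a DynamoDB API.

```python
import gzip
import json

# Compress a large JSON attribute for storage as a DynamoDB Binary value,
# and decompress it on read. Trades a little CPU for smaller items.

def pack(value: dict) -> bytes:
    """Serialize and compress an attribute value."""
    return gzip.compress(json.dumps(value).encode("utf-8"))

def unpack(blob: bytes) -> dict:
    """Reverse of pack()."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))

doc = {"description": "lorem ipsum " * 500}
blob = pack(doc)
assert unpack(blob) == doc
print(len(json.dumps(doc)), "->", len(blob), "bytes")
```

Once a compressed attribute approaches a few hundred KB, move it to S3 entirely and store only the object key, per the same strategy.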

6. Project only needed attributes in GSIs. Full attribute projection on GSIs doubles storage and write costs. Use KEYS_ONLY or INCLUDE with specific attributes.

7. Use eventually consistent reads. If your application tolerates it, eventually consistent reads cost half of strongly consistent reads. For many use cases (product catalogs, user profiles displayed on dashboards), eventual consistency is perfectly acceptable.

Failure Modes and Operational Lessons

Throttling

Throttling is the most common operational issue. DynamoDB returns ProvisionedThroughputExceededException when a partition exceeds its throughput limit. The AWS SDKs retry throttled requests with exponential backoff and jitter by default, but sustained throttling still degrades application performance because retries only trade errors for latency.
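For intuition, the backoff-with-jitter behavior the SDKs apply can be sketched in plain Python. This is the "full jitter" variant: each retry waits a random interval between zero and an exponentially growing cap, which spreads retries out instead of synchronizing a thundering herd. The base and cap values here are illustrative, not the SDK defaults.

```python
import random

# Full-jitter exponential backoff: delay grows exponentially with the retry
# attempt, but each delay is drawn uniformly from [0, cap], so concurrent
# clients do not retry in lockstep.

BASE = 0.05   # seconds; illustrative, not an SDK default
CAP = 20.0    # ceiling on any single delay

def backoff_delay(attempt: int) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    return random.uniform(0, min(CAP, BASE * 2 ** attempt))
```

In boto3 you get this behavior by configuring the client's retry mode rather than writing your own loop; hand-rolled backoff is mainly useful around batch operations that return UnprocessedItems, which the SDK does not retry for you.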

Root causes I see most frequently:

| Cause | Symptom | Fix |
|---|---|---|
| Hot partition key | Throttling despite low table-level utilization | Redesign partition key; add write sharding |
| GSI throttling | Base table writes rejected | Increase GSI capacity; review GSI key design |
| On-demand cold start | Throttling on a new or idle table receiving sudden traffic | Pre-warm with a gradual ramp; consider provisioned mode |
| Under-provisioned auto-scaling | Brief throttling during rapid scale-up | Lower target utilization; increase minimum capacity |
| Scan operations | Broad throttling across partitions | Replace Scans with Queries; use parallel scan with rate limiting |

The October 2025 Outage

On October 19-20, 2025, a race condition in an internal DynamoDB microservice that manages DNS records for regional cells caused a roughly 3-hour DynamoDB outage in US-EAST-1. The failure cascaded: because EC2 instance creation depends on DynamoDB for metadata, EC2 could not launch new instances for an additional 12 hours. Users filed over 17 million outage reports across services including Snapchat, Roblox, Reddit, and Venmo.

Lessons from that incident:

  • Multi-region is real DR. Single-region DynamoDB (even with multi-AZ replication) does not protect against regional failures. Global tables provide genuine regional independence.
  • Understand cascading dependencies. Your application depends on DynamoDB. Other AWS services also depend on DynamoDB internally. A DynamoDB outage affects services you did not realize were coupled.
  • Pre-provision, do not scale-on-demand during recovery. After the DynamoDB outage resolved, EC2 could not scale because instance creation was still impaired. If your recovery plan involves launching new compute, and the outage affects compute provisioning, your plan fails.

Item Size Limits

The 400 KB item size limit is hard. DynamoDB rejects any write that would create an item larger than 400 KB. This includes all attribute names and values. Long attribute names consume your item size budget. Use short, abbreviated attribute names for high-volume tables (store the human-readable mapping in your application code).
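Because attribute names count toward the limit, it is worth estimating item size before a write rejects. The sketch below is an approximation for common scalar types (DynamoDB's exact number encoding differs slightly); the helper names are my own.

```python
# Rough item-size estimator for the 400 KB check: DynamoDB counts UTF-8
# attribute names plus encoded values toward the limit. Scalar types only;
# the number encoding (~2 digits per byte) is an approximation.

MAX_ITEM_BYTES = 400 * 1024

def attr_size(value) -> int:
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    if isinstance(value, bytes):
        return len(value)
    if isinstance(value, bool):       # bool before int: bool subclasses int
        return 1
    if isinstance(value, (int, float)):
        return len(str(value)) // 2 + 1
    raise TypeError(f"unhandled type: {type(value)}")

def item_size(item: dict) -> int:
    """Approximate stored size: attribute names plus values, in bytes."""
    return sum(len(name.encode("utf-8")) + attr_size(v)
               for name, v in item.items())

item = {"PK": "CUST#42", "payload": "x" * 1024}
assert item_size(item) < MAX_ITEM_BYTES
```

Running this over a sample of production items also shows how much of your size budget goes to attribute names alone, which motivates the short-name advice above.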

Transaction Limits

DynamoDB transactions support up to 100 items per transaction, with a total transaction size limit of 4 MB. Transactions cost 2x the standard read/write units. Design your data model to minimize transaction scope; if you regularly need to update more than 100 items atomically, DynamoDB may not be the right fit for that specific operation.
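The classic two-item transaction (insert an order, decrement inventory, both or neither) looks like the request below. Only the request structure is shown, as plain dicts; table names, keys, and attributes are hypothetical, and the commented-out line is where a boto3 client call would go.

```python
# Illustrative TransactWriteItems request: an idempotent order insert plus a
# conditional inventory decrement, committed atomically or not at all.

def build_order_transaction(order_id: str, sku: str, qty: int) -> list[dict]:
    return [
        {"Put": {
            "TableName": "orders",
            "Item": {"PK": {"S": f"ORDER#{order_id}"}, "sku": {"S": sku}},
            "ConditionExpression": "attribute_not_exists(PK)",  # no double-insert
        }},
        {"Update": {
            "TableName": "inventory",
            "Key": {"PK": {"S": f"SKU#{sku}"}},
            "UpdateExpression": "SET stock = stock - :q",
            "ConditionExpression": "stock >= :q",  # never oversell
            "ExpressionAttributeValues": {":q": {"N": str(qty)}},
        }},
    ]

items = build_order_transaction("2025-001", "WIDGET-9", 2)
assert len(items) <= 100  # the hard per-transaction limit
# dynamodb.transact_write_items(TransactItems=items)
```

If either condition fails, the whole transaction is canceled and the response names the failing item, which makes these conditions a clean way to enforce invariants without application-side locking.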

Service Quotas Quick Reference

| Quota | Default | Adjustable |
|---|---|---|
| Tables per account per region | 2,500 | Yes (max 10,000) |
| Item size | 400 KB | No |
| Partition key length | 2,048 bytes | No |
| Sort key length | 1,024 bytes | No |
| GSIs per table | 20 | Yes |
| LSIs per table | 5 | No |
| LSI item collection size | 10 GB | No |
| Partition throughput (read) | 3,000 RCU | No |
| Partition throughput (write) | 1,000 WCU | No |
| Partition storage | 10 GB | No |
| Account read throughput per region | 80,000 RCU | Yes |
| Account write throughput per region | 80,000 WCU | Yes |
| Table throughput (on-demand) | 40,000 RRU/WRU | Yes |
| Projected attributes across all indexes | 100 | No |
| Concurrent backups | 50 | Yes |
| Stream readers per shard | 2 | No |
| Transaction items | 100 | No |
| Transaction size | 4 MB | No |

DynamoDB vs. Alternatives

Choosing DynamoDB requires understanding where it fits and where it does not.

```mermaid
flowchart TD
    A[Data Storage Decision] --> B{Need relational queries, joins, complex SQL?}
    B -->|Yes| C[Aurora / RDS]
    B -->|No| D{Access patterns known and stable?}
    D -->|Yes| E{Need single-digit ms latency at any scale?}
    D -->|No| F{Need flexible queries on document data?}
    E -->|Yes| G[DynamoDB]
    E -->|No| H{Need full-text search or analytics?}
    F -->|Yes| I[DocumentDB / MongoDB]
    F -->|No| G
    H -->|Yes| J[OpenSearch]
    H -->|No| G
```
Database selection decision tree
| Criteria | DynamoDB | Aurora (MySQL/PostgreSQL) | Cassandra (Self-Managed) |
|---|---|---|---|
| Operational burden | Zero (fully managed) | Low (managed, some tuning) | High (cluster ops, compaction, repairs) |
| Scaling model | Automatic partitioning | Vertical + read replicas | Horizontal (add nodes) |
| Consistency | Eventual or strong (per-read) | Strong (ACID) | Tunable per query |
| Query flexibility | Partition key + sort key + GSIs | Full SQL | CQL (SQL-like, limited joins) |
| Cost at scale | High for write-heavy workloads | Moderate (compute-based) | Low (commodity hardware) |
| Multi-region | Global tables (managed) | Aurora Global Database | Built-in (manual config) |
| Vendor lock-in | High (proprietary API) | Moderate (standard SQL) | None (open source) |

DynamoDB excels when you need a fully managed, infinitely scalable database with predictable single-digit millisecond latency and you can design your access patterns around partition keys. It struggles when you need ad-hoc queries, complex aggregations, or when write volumes make the per-request cost model prohibitive.

Key Patterns

After years of building on DynamoDB, these are the patterns that consistently matter:

Design for partitions first. Every performance characteristic flows from partition key design. Invest time upfront modeling your access patterns and validating that your partition key distributes traffic evenly. Fixing a partition key in production means migrating data.

Start on-demand, migrate to provisioned. On-demand removes the risk of under-provisioning during early development and launch. After you have production traffic data, provisioned mode with auto-scaling and reserved capacity typically saves 50-70%.

Monitor GSI throttling independently. GSI throughput is separate from base table throughput. A throttled GSI throttles your base table writes. Set CloudWatch alarms on WriteThrottleEvents for every GSI.

Use eventually consistent reads by default. Strongly consistent reads cost twice as much and concentrate load on the partition leader. Only use strong consistency when your application genuinely requires read-after-write guarantees.

Keep items small. Every additional KB in item size multiplies your RCU and WCU consumption. Use short attribute names for high-throughput tables. Store large values in S3.

Plan for the 400 KB limit. Applications that store growing lists or embedded arrays in a single item will hit the 400 KB wall. Design your data model with item growth in mind; use a one-to-many pattern with separate items instead of unbounded lists within a single item.

Test your DR story with global tables. The October 2025 US-EAST-1 outage proved that single-region deployments have a regional blast radius. If DynamoDB availability matters to your business, deploy global tables and validate that your application actually fails over correctly.

Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.