
AWS DynamoDB: An Architecture Deep-Dive

Tags: AWS, Architecture, Databases, Serverless

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

DynamoDB sits at the center of more AWS architectures than any other database service. I've used it for everything from mobile backends handling millions of daily active users to event-sourced systems processing tens of thousands of writes per second. Most teams treat it as a simple key-value store, plug it in, and move on. That works until they hit a hot partition at 3 AM, discover their GSI is throttling independently of the base table, or realize their on-demand table costs three times what provisioned capacity would have. After years of running DynamoDB at scale, I've accumulated enough operational scars to fill this reference. Patterns, trade-offs, cost traps, and the internal mechanics that explain why DynamoDB behaves the way it does.

This is an architecture reference for engineers who already know the basics. If you need a getting-started tutorial, AWS has plenty. What follows covers how DynamoDB actually works under the hood, how to design for its strengths, and which failure modes will find you in production.

How DynamoDB Works Internally

Understanding DynamoDB's internals changes how you design for it. AWS has published enough about the architecture (particularly through the 2022 USENIX paper and re:Invent talks) to build a solid mental model.

Request Routing

Every DynamoDB API call hits a request router first. The router authenticates the request, resolves which partition holds the target data by hashing the partition key, and forwards the request to the correct storage node. The router maintains a partition map that tells it which storage node group owns which key range. When partitions split, the router's map updates accordingly.

This architecture means the partition key is the single most important design decision you make. Every read and every write goes through a hash function that determines the physical location. A poorly chosen partition key funnels traffic to a single storage node group, and no amount of provisioned capacity fixes that.

Storage and Replication

Each partition stores data in a B-tree structure on SSD-backed storage. Every partition maintains three replicas spread across three Availability Zones within the region. One replica serves as the leader; the other two are followers. DynamoDB uses a Multi-Paxos consensus protocol to coordinate writes.

The write path works as follows: the leader generates a write-ahead log (WAL) entry, sends it to both followers, and acknowledges the write once two of three replicas (a quorum) persist the log record. This gives you durability across two AZs before the client receives a success response. The leader then applies the write to its B-tree.

Strongly consistent reads go to the leader replica. Eventually consistent reads can go to any replica, which is why they cost half as much (the load spreads across all three replicas instead of concentrating on the leader).

Control Plane vs. Data Plane

DynamoDB maintains a clean separation between control plane and data plane:

| Component | Responsibility | Operations | Availability Impact |
|---|---|---|---|
| Control plane | Table management, configuration | CreateTable, UpdateTable, DeleteTable, DescribeTable | Configuration changes only |
| Data plane | Read/write traffic | GetItem, PutItem, Query, Scan, BatchWrite | Direct application impact |
| Auto admin | Partition management, splitting, health monitoring | Automatic (no API) | Background operations |

The data plane operates independently of the control plane. Once a table is configured, it keeps serving traffic even if the control plane is degraded. This is the same pattern you see across AWS services (AWS Elastic Load Balancing: An Architecture Deep-Dive covers the same separation for ELB). During a control plane outage, your existing tables keep working; you just cannot create new tables or modify existing ones.

Auto Admin and Partition Management

The auto-admin subsystem handles partition health monitoring, splitting, and rebalancing. It runs continuously, watching for partitions that are approaching size limits (10 GB per partition) or throughput limits. When a partition needs to split, auto-admin selects a split point, creates new partitions, migrates data, and updates the request router's partition map.

This process is transparent, but understanding it explains several DynamoDB behaviors that surprise teams in production. Partition splits take time. During splits, the old partition continues serving traffic while data migrates. If you see brief latency spikes during sustained high-throughput writes, partition splits are often the cause.

Capacity Modes and Throughput

DynamoDB offers two capacity modes. Picking the wrong one is the most common cost mistake I see.

On-Demand Mode

On-demand mode charges per request. No capacity planning required. DynamoDB automatically scales to handle your traffic. The service tracks your recent peak and can instantly handle double that peak. If sustained traffic exceeds the doubled peak, the new level becomes the baseline for future scaling.

| Metric | On-Demand Behavior |
|---|---|
| Scaling | Automatic, based on recent traffic peaks |
| Instant capacity | 2x the previous traffic peak |
| New table default | 4,000 WRU/s and 12,000 RRU/s |
| Throttling risk | Possible if traffic spikes beyond 2x the previous peak instantly |
| Pricing (US East, Standard) | $1.25 per million WRU; $0.25 per million RRU |

On-demand tables can still throttle. If your traffic jumps from near-zero to 50,000 writes per second without any ramp-up, DynamoDB has not observed enough traffic history to pre-allocate capacity. The 2x scaling rule requires a previous peak to double. For launch-day scenarios or scheduled batch jobs that spike from zero, pre-warm by gradually ramping traffic or temporarily switching to provisioned mode.

Provisioned Mode

Provisioned mode lets you specify read capacity units (RCUs) and write capacity units (WCUs). You pay for the capacity you provision, whether you use it or not.

| Metric | Provisioned Behavior |
|---|---|
| Scaling | Manual or auto-scaling (CloudWatch-driven) |
| Cost (US East, Standard) | $0.00065/WCU/hour; $0.00013/RCU/hour |
| Reserved capacity | Up to 77% savings with 1-year or 3-year commitments |
| Burst capacity | 300 seconds of unused capacity banked |
| Decrease limits | 4 per day initially, up to 27 per 24-hour period |

Auto-scaling with provisioned mode is the cost-optimal choice for workloads with predictable patterns. Set a target utilization of 70%, configure reasonable min/max bounds, and let CloudWatch Alarms trigger scaling actions. The catch: auto-scaling reacts to CloudWatch metrics, which have a 1-2 minute delay. If your traffic spikes faster than that, you will see throttling before auto-scaling catches up.

Choosing Between Modes

```mermaid
flowchart TD
    A[New DynamoDB Table] --> B{Traffic pattern predictable?}
    B -->|Yes| C{Steady baseline with known peaks?}
    B -->|No| D[On-Demand Mode]
    C -->|Yes| E[Provisioned + Auto-Scaling]
    C -->|No| F{Budget constrained?}
    F -->|Yes| G[Provisioned + Reserved Capacity]
    F -->|No| D
    E --> H{Cost optimization priority?}
    H -->|Yes| I[Add Reserved Capacity for baseline]
    H -->|No| J[Provisioned with Auto-Scaling only]
```
Capacity mode decision tree

The rule I follow: start with on-demand for new workloads where you do not know the traffic pattern. After 2-4 weeks of production data, evaluate whether provisioned mode with auto-scaling would cost less. For most steady-state workloads, provisioned mode saves 40-60% compared to on-demand.
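The break-even evaluation above can be sketched as simple arithmetic. This is a rough estimator, not a billing tool: it uses the US East list prices quoted in this article, assumes a perfectly steady workload, and sizes provisioned capacity at the 70% auto-scaling target utilization discussed earlier.

```python
# Back-of-the-envelope comparison of the two capacity modes for a steady
# workload, using the US East list prices quoted in this article.

OD_WRITE = 1.25 / 1_000_000     # $ per write request unit (on-demand)
OD_READ = 0.25 / 1_000_000      # $ per read request unit (on-demand)
PROV_WCU_MONTH = 0.00065 * 730  # $ per provisioned WCU per month (~$0.47)
PROV_RCU_MONTH = 0.00013 * 730  # $ per provisioned RCU per month (~$0.09)

def monthly_cost(writes_per_sec: float, reads_per_sec: float,
                 mode: str, utilization: float = 0.7) -> float:
    """Approximate monthly cost for a perfectly steady workload.

    For provisioned mode, capacity is sized so the workload runs at the
    given target utilization (the auto-scaling setpoint).
    """
    seconds = 730 * 3600  # hours per month * seconds per hour
    if mode == "on_demand":
        return (writes_per_sec * seconds * OD_WRITE
                + reads_per_sec * seconds * OD_READ)
    wcu = writes_per_sec / utilization
    rcu = reads_per_sec / utilization
    return wcu * PROV_WCU_MONTH + rcu * PROV_RCU_MONTH

# Example: a steady 500 writes/s and 2,000 reads/s.
od = monthly_cost(500, 2000, "on_demand")
prov = monthly_cost(500, 2000, "provisioned")
print(f"on-demand ~${od:,.0f}/mo, provisioned ~${prov:,.0f}/mo")
```

Real traffic is bursty, so the actual gap is smaller than the steady-state math suggests; that is why I wait for a few weeks of production data before switching.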

Read and Write Unit Mechanics

Understanding how RCUs and WCUs map to actual operations prevents surprise throttling:

| Operation | Unit Cost | Size Unit |
|---|---|---|
| Strongly consistent read | 1 RCU | per 4 KB |
| Eventually consistent read | 0.5 RCU | per 4 KB |
| Transactional read | 2 RCU | per 4 KB |
| Standard write | 1 WCU | per 1 KB |
| Transactional write | 2 WCU | per 1 KB |

Items larger than these thresholds consume proportionally more units, rounded up. A 6 KB strongly consistent read costs 2 RCUs. A 3.5 KB write costs 4 WCUs. Keeping items small directly reduces throughput consumption.
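The rounding rules are easy to get wrong in capacity planning spreadsheets, so here they are as code. This follows the table above directly: reads bill per 4 KB, writes per 1 KB, rounded up, with per-operation multipliers.

```python
import math

# Capacity-unit arithmetic from the table above: reads are billed per 4 KB,
# writes per 1 KB, rounded up, with multipliers per operation type.

READ_UNIT_KB = 4
WRITE_UNIT_KB = 1

def rcus(item_kb: float, consistency: str = "eventual") -> float:
    """RCUs consumed by reading one item of the given size."""
    units = math.ceil(item_kb / READ_UNIT_KB)
    multiplier = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return units * multiplier

def wcus(item_kb: float, transactional: bool = False) -> int:
    """WCUs consumed by writing one item of the given size."""
    units = math.ceil(item_kb / WRITE_UNIT_KB)
    return units * (2 if transactional else 1)

# A 6 KB strongly consistent read: ceil(6/4) = 2 RCUs.
# A 3.5 KB standard write: ceil(3.5/1) = 4 WCUs.
```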

Partition Mechanics and Scaling

Partitions are the fundamental scaling unit. Every performance problem I have debugged in DynamoDB traces back to partition behavior.

Partition Throughput Limits

Each partition supports a fixed maximum throughput:

| Resource | Limit per Partition |
|---|---|
| Read throughput | 3,000 RCU |
| Write throughput | 1,000 WCU |
| Storage | 10 GB |

Your table's total throughput is the sum of all partition throughputs. A table with 10 partitions supports up to 30,000 RCUs and 10,000 WCUs in aggregate, but only if traffic distributes evenly across partitions. If 80% of your reads target items in a single partition, that partition's 3,000 RCU limit becomes your effective ceiling regardless of how much capacity you provisioned at the table level.

Split for Heat

When DynamoDB detects a partition receiving sustained high throughput, it automatically splits that partition into two. The split point is chosen based on recent traffic patterns to distribute load evenly between the new partitions. This doubles the available throughput for that key range at no additional cost.

Split for heat cannot help in every scenario:

| Scenario | Split for Heat Effective? | Explanation |
|---|---|---|
| Many items, distributed hot keys | Yes | Split distributes items across new partitions |
| Single hot item | No | The item lives in one partition; splitting does not help |
| LSI present on table | Limited | Cannot split within an item collection |
| Ever-increasing sort key | Limited | New writes always target the latest partition |

The single hot item problem is the most common production issue I encounter. A counter, a leaderboard, a "latest" record: any pattern where the majority of writes target one partition key cannot benefit from split for heat. The solution is application-level sharding: append a random suffix to the partition key and aggregate on read.
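The shard-and-aggregate pattern is simple enough to sketch in a few lines. To keep the example self-contained, the "table" here is a plain dict standing in for UpdateItem increments and a scatter-gather BatchGetItem; only the key-construction logic is the real pattern.

```python
import random

# Application-level write sharding for a single hot key: spread writes across
# N suffixed partition keys, then scatter-gather and aggregate on read.
# The dict stands in for the real table to keep the sketch runnable.

SHARDS = 10

def shard_key(base_key: str) -> str:
    """Pick one of N suffixed partition keys at random for a write."""
    return f"{base_key}#{random.randrange(SHARDS)}"

def all_shard_keys(base_key: str) -> list[str]:
    """Every suffixed key, for the scatter-gather read."""
    return [f"{base_key}#{i}" for i in range(SHARDS)]

# Simulated hot counter: writes fan out across shards...
table: dict[str, int] = {}
for _ in range(1000):
    k = shard_key("page-views#home")
    table[k] = table.get(k, 0) + 1

# ...and the read aggregates across all shards.
total = sum(table.get(k, 0) for k in all_shard_keys("page-views#home"))
print(total)
```

Pick N based on how far above a single partition's 1,000 WCU ceiling your write rate goes; reads now cost N fetches, so do not shard further than you need.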

Adaptive Capacity

Adaptive capacity complements split for heat. It automatically reallocates unused throughput from cold partitions to hot partitions. If partition A uses 200 WCUs of its 1,000 WCU allocation while partition B needs 1,500 WCUs, adaptive capacity shifts some of partition A's unused capacity to partition B.

This happens automatically with no configuration required. Adaptive capacity reduces, but does not eliminate, throttling from uneven access patterns. It works best when the overall table has enough provisioned capacity; it simply redistributes it more effectively.

Partition Key Design

Good partition key design is the single highest-leverage architecture decision for DynamoDB:

| Pattern | Example | Distribution |
|---|---|---|
| High cardinality, uniform access | UserID, OrderID, SessionID | Excellent |
| High cardinality, skewed access | CustomerID (some customers 1,000x more active) | Needs write sharding |
| Low cardinality | Status (ACTIVE/INACTIVE), Region (5 values) | Poor; hot partitions guaranteed |
| Time-based | Date, Hour | Poor; all writes target the current period |

For skewed access patterns, use write sharding: append a random number (0-N) to the partition key and scatter-gather on reads. For time-series data, combine the timestamp with a high-cardinality attribute (device ID, sensor ID) as the partition key.

Secondary Indexes

DynamoDB provides two types of secondary indexes, and choosing wrong creates problems that are expensive to fix.

Global Secondary Indexes (GSI)

A GSI creates a separate, fully independent partition structure with its own partition key and sort key. DynamoDB asynchronously replicates items from the base table to each GSI.

Key architecture details:

  • GSIs have their own throughput capacity, separate from the base table
  • GSI writes are eventually consistent (the replication is asynchronous)
  • A throttled GSI back-pressures writes to the base table
  • Maximum 20 GSIs per table (adjustable)
  • GSIs can project a subset of attributes, reducing storage and throughput costs
Note: GSI throttling is the most common surprise in production DynamoDB. If your GSI cannot keep up with base table write throughput, DynamoDB throttles writes to the base table itself. Always provision GSI write capacity at or above the base table's write capacity.
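An alarm on per-GSI WriteThrottleEvents is the cheapest insurance against this. The sketch below only builds the keyword arguments; the table, index, and SNS topic names are placeholders, and you would pass the dict to a boto3 CloudWatch client via `cloudwatch.put_metric_alarm(**kwargs)`.

```python
# Build put_metric_alarm kwargs that fire on any write throttle recorded for a
# specific GSI. DynamoDB emits per-index metrics under the AWS/DynamoDB
# namespace with the GlobalSecondaryIndexName dimension.

def gsi_throttle_alarm(table_name: str, index_name: str, topic_arn: str) -> dict:
    return {
        "AlarmName": f"{table_name}-{index_name}-write-throttles",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "GlobalSecondaryIndexName", "Value": index_name},
        ],
        "Statistic": "Sum",
        "Period": 60,               # one-minute buckets
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # any throttle fires it
        "AlarmActions": [topic_arn],
    }

kwargs = gsi_throttle_alarm("orders", "by-customer", "arn:aws:sns:...:alerts")
# cloudwatch.put_metric_alarm(**kwargs)
```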

Local Secondary Indexes (LSI)

An LSI shares the partition key with the base table but uses a different sort key. All items with the same partition key (across the base table and all LSIs) form an item collection.

| Characteristic | GSI | LSI |
|---|---|---|
| Partition key | Any attribute | Same as base table |
| Sort key | Any attribute | Different from base table |
| Throughput | Independent capacity | Shares base table capacity |
| Consistency | Eventually consistent only | Eventually or strongly consistent |
| When to create | Anytime | Table creation only |
| Item collection limit | None | 10 GB per partition key value |
| Maximum per table | 20 | 5 |

The 10 GB item collection limit on LSIs is a hard constraint. If any partition key's total data (base table items plus all LSI entries) exceeds 10 GB, writes for that partition key fail. I have seen this kill production systems when a high-volume entity (a busy tenant in a multi-tenant system) crosses the threshold with no warning. If you use LSIs, watch for ItemCollectionSizeLimitExceededException errors and request ReturnItemCollectionMetrics on writes so you can alert before any collection reaches the limit.

Single-Table Design

Single-table design stores multiple entity types in one DynamoDB table, using composite partition keys and sort keys to model relationships. The pattern gained popularity through Rick Houlihan's re:Invent talks and Alex DeBrie's "The DynamoDB Book."

Advantages: all related entities co-located in the same partitions, enabling efficient queries across entity types with a single Query operation. No joins needed.

Drawbacks: complex key design, harder to reason about, GSI overloading can make the table opaque to new team members. With the November 2025 launch of multi-attribute composite keys for GSIs (up to four attributes per key), some of the synthetic key concatenation complexity has been reduced.

My recommendation: use single-table design for access-pattern-heavy workloads where you know all query patterns upfront. Use multi-table design when your access patterns evolve frequently or when different entity types have vastly different throughput characteristics.
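To make the pattern concrete, here is a minimal single-table key sketch for a hypothetical customer/orders model: both entity types share one table, and a customer's orders live in the same item collection as the customer record, so one Query retrieves everything. The entity names and key shapes are illustrative, not prescriptive.

```python
# Single-table design sketch: composite PK/SK values encode entity type and
# relationships, so related items share a partition.

def customer_item(customer_id: str, name: str) -> dict:
    return {"PK": f"CUST#{customer_id}", "SK": "PROFILE", "name": name}

def order_item(customer_id: str, order_id: str, total: int) -> dict:
    # Orders share the customer's partition key; the SK prefix groups them
    # and sorts them by order ID within the item collection.
    return {"PK": f"CUST#{customer_id}", "SK": f"ORDER#{order_id}", "total": total}

# One Query against the table fetches the customer and all orders:
#   KeyConditionExpression="PK = :pk",
#   ExpressionAttributeValues={":pk": "CUST#42"}
items = [
    customer_item("42", "Ada"),
    order_item("42", "2025-001", 1200),
    order_item("42", "2025-002", 80),
]
collection = [i for i in items if i["PK"] == "CUST#42"]
```

A `begins_with(SK, "ORDER#")` condition on the same Query narrows the result to orders only, which is the whole point: relationship traversal without joins.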

Global Tables

Global tables replicate a DynamoDB table across multiple AWS regions with sub-second replication latency.

Replication Architecture

Each region maintains a full, independent replica. Writes to any replica propagate to all other replicas asynchronously. DynamoDB uses last-writer-wins conflict resolution based on timestamps for concurrent writes to the same item in different regions.

Consistency Models

Global tables now support two consistency models:

| Model | Abbreviation | Behavior | Use Case |
|---|---|---|---|
| Multi-Region Eventually Consistent | MREC | Writes replicate asynchronously; brief inconsistency window | Most applications; highest availability |
| Multi-Region Strongly Consistent | MRSC | Reads guaranteed to reflect all prior writes globally | Financial transactions, inventory systems |

MRSC is a significant addition (launched at re:Invent 2024). Previously, global tables only supported eventual consistency, which made them unsuitable for workloads requiring guaranteed read-after-write consistency across regions. MRSC uses a coordination protocol across regions, which increases write latency (cross-region round trip) but guarantees consistency.

Multi-Account Global Tables

As of 2025, DynamoDB supports multi-account global tables. You can replicate table data across different AWS accounts and regions, adding account-level isolation. This is valuable for organizations using separate accounts for production, staging, and disaster recovery, or for regulated industries requiring strict account boundaries.

Global Tables and DAX

A critical operational gotcha: writes that arrive at a replica via global table replication bypass DAX. The DAX cache does not update when a replication write occurs. Your cache will serve stale data until the TTL expires. If you use both global tables and DAX, set aggressive TTLs on DAX and accept that reads may lag behind cross-region writes.

DynamoDB Streams and Change Data Capture

DynamoDB Streams captures a time-ordered sequence of item-level modifications. Every write (put, update, delete) generates a stream record.

Stream Architecture

Stream records are organized into shards (similar to Kinesis shards). Each shard has a parent-child relationship that reflects partition splits. Stream records are available for 24 hours. You configure stream view type at the table level:

| View Type | Contents | Use Case |
|---|---|---|
| KEYS_ONLY | Partition key and sort key only | Triggering downstream by key |
| NEW_IMAGE | Complete item after modification | Replication, search index updates |
| OLD_IMAGE | Complete item before modification | Audit trails, rollback |
| NEW_AND_OLD_IMAGES | Both before and after | Change comparison, CDC pipelines |

Integration Patterns

DynamoDB Streams integrates directly with Lambda for event-driven architectures. Each stream shard supports up to 2 simultaneous readers (use 1 reader for global tables to avoid throttling). Common patterns include cross-region replication, keeping search indexes in sync, building audit trails, and feeding CDC pipelines.
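A stream-triggered Lambda receives records in the shape sketched below. This skeleton assumes the NEW_AND_OLD_IMAGES view type; note that images arrive in DynamoDB's attribute-value encoding (`{"S": ...}`, `{"N": ...}`), not plain Python values.

```python
# Skeleton Lambda consumer for a DynamoDB stream configured with
# NEW_AND_OLD_IMAGES: each record carries the keys plus before/after images.

def handler(event, context=None):
    changes = []
    for record in event["Records"]:
        action = record["eventName"]   # INSERT, MODIFY, or REMOVE
        ddb = record["dynamodb"]
        old = ddb.get("OldImage")      # absent for INSERT
        new = ddb.get("NewImage")      # absent for REMOVE
        changes.append((action, old, new))
    return changes

# Synthetic event in the shape Lambda delivers:
event = {"Records": [{
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"PK": {"S": "CUST#42"}},
        "OldImage": {"PK": {"S": "CUST#42"}, "status": {"S": "ACTIVE"}},
        "NewImage": {"PK": {"S": "CUST#42"}, "status": {"S": "INACTIVE"}},
    },
}]}
```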

Kinesis Data Streams Integration

As an alternative to DynamoDB Streams, you can route change data capture records to a Kinesis Data Stream. This gives you longer retention (up to 365 days vs. 24 hours), more consumers per shard, and integration with the broader Kinesis ecosystem. The trade-off is additional cost for the Kinesis stream.

DynamoDB Accelerator (DAX)

DAX is a fully managed, in-memory cache that sits in front of DynamoDB. It provides microsecond read latency for cached items.

DAX Architecture

A DAX cluster runs within your VPC with one primary node and up to 10 read replica nodes. The primary handles writes; read replicas serve read traffic. DAX maintains two caches:

| Cache | Stores | Populated By | Default TTL |
|---|---|---|---|
| Item cache | Individual items by primary key | GetItem, BatchGetItem | 5 minutes |
| Query cache | Full result sets | Query, Scan | 5 minutes |

DAX is a write-through cache: writes go through DAX to DynamoDB, and the item cache updates immediately. The query cache does not invalidate on writes; it relies purely on TTL expiration.

When to Use DAX (and When Not To)

| Scenario | DAX Recommended? | Reason |
|---|---|---|
| Read-heavy, repeated key access | Yes | Microsecond latency, reduced RCU consumption |
| Write-heavy workloads | No | DAX adds latency to writes; minimal benefit |
| Strongly consistent reads required | No | DAX serves eventually consistent data only |
| Infrequent, unique key access | No | Cache miss rate too high; adds latency and cost |
| Global tables | Use with caution | Replication writes bypass DAX; stale cache risk |

DAX Pricing

DAX instance pricing varies by node type:

| Instance Type | vCPUs | Memory | Cost/Hour (US East) |
|---|---|---|---|
| dax.t3.small | 2 | 2 GB | ~$0.04 |
| dax.r5.large | 2 | 16 GB | ~$0.29 |
| dax.r5.xlarge | 4 | 32 GB | ~$0.58 |
| dax.r5.8xlarge | 32 | 256 GB | ~$4.64 |

A production DAX cluster (3 nodes across AZs using r5.large) costs approximately $630/month. Compare that against the RCU cost it replaces to determine ROI.

Pricing and Cost Optimization

DynamoDB pricing catches teams off guard because the cost model is fundamentally different from traditional databases. You pay for throughput, storage, and features independently.

Cost Breakdown

| Component | On-Demand (US East) | Provisioned (US East) |
|---|---|---|
| Writes | $1.25/million WRU | $0.00065/WCU/hour (~$0.47/WCU/month) |
| Reads | $0.25/million RRU | $0.00013/RCU/hour (~$0.09/RCU/month) |
| Storage (Standard) | $0.25/GB/month | $0.25/GB/month |
| Storage (Standard-IA) | $0.10/GB/month | $0.10/GB/month |
| Backups (warm) | $0.10/GB/month | $0.10/GB/month |
| Backups (cold) | $0.03/GB/month | $0.03/GB/month |
| Streams reads | $0.02/100K read requests | $0.02/100K read requests |
| Global table replicated writes | $1.875/million rWRU | N/A (billed as rWCUs) |

Cost Optimization Strategies

1. Right-size capacity mode. On-demand costs 5-7x more per unit than provisioned capacity for steady workloads. Run on-demand for the first month to establish baselines, then switch to provisioned with auto-scaling.

2. Use reserved capacity. For predictable baseline throughput, reserved capacity (1-year or 3-year term) saves up to 77% compared to on-demand pricing.

3. Database Savings Plans. Launched December 2025, these plans offer committed-use discounts across DynamoDB (including on-demand tables), RDS, and other managed databases. Unlike reserved capacity, they apply automatically across accounts and regions.

4. Standard-IA table class. For tables with infrequent access (archival data, configuration stores), Standard-IA cuts storage costs by 60% ($0.10/GB vs. $0.25/GB). Read and write unit costs are approximately 25% higher, so this only saves money when storage dominates your bill.

5. Minimize item size. Compress large attribute values. Store large blobs in S3 and keep only a reference in DynamoDB. Every KB matters because it directly multiplies your RCU/WCU consumption.
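Strategy 5 in practice: gzip a large text attribute before writing it as a DynamoDB Binary value. A 2x-5x reduction on text-heavy attributes directly cuts the WCUs each write consumes. The helper names here are my own, not a DynamoDB API.

```python
import gzip
import json

# Compress a large JSON attribute for storage as a DynamoDB Binary value,
# and decompress it on read. Trades a little CPU for smaller items.

def pack(value: dict) -> bytes:
    """Serialize and compress an attribute value."""
    return gzip.compress(json.dumps(value).encode("utf-8"))

def unpack(blob: bytes) -> dict:
    """Reverse of pack()."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))

doc = {"description": "lorem ipsum " * 500}
blob = pack(doc)
assert unpack(blob) == doc
print(len(json.dumps(doc)), "->", len(blob), "bytes")
```

Once a compressed attribute approaches a few hundred KB, move it to S3 entirely and store only the object key, per the same strategy.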

6. Project only needed attributes in GSIs. Full attribute projection on GSIs doubles storage and write costs. Use KEYS_ONLY or INCLUDE with specific attributes.

7. Use eventually consistent reads. If your application tolerates it, eventually consistent reads cost half of strongly consistent reads. For many use cases (product catalogs, user profiles displayed on dashboards), eventual consistency is perfectly acceptable.

Failure Modes and Operational Lessons

Throttling

Throttling is the most common operational issue. DynamoDB returns ProvisionedThroughputExceededException when a partition exceeds its throughput limit. The AWS SDKs retry throttled requests with exponential backoff and jitter by default, but sustained throttling still degrades application performance because retries only trade errors for latency.
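For intuition, the backoff-with-jitter behavior the SDKs apply can be sketched in plain Python. This is the "full jitter" variant: each retry waits a random interval between zero and an exponentially growing cap, which spreads retries out instead of synchronizing a thundering herd. The base and cap values here are illustrative, not the SDK defaults.

```python
import random

# Full-jitter exponential backoff: delay grows exponentially with the retry
# attempt, but each delay is drawn uniformly from [0, cap], so concurrent
# clients do not retry in lockstep.

BASE = 0.05   # seconds; illustrative, not an SDK default
CAP = 20.0    # ceiling on any single delay

def backoff_delay(attempt: int) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    return random.uniform(0, min(CAP, BASE * 2 ** attempt))
```

In boto3 you get this behavior by configuring the client's retry mode rather than writing your own loop; hand-rolled backoff is mainly useful around batch operations that return UnprocessedItems, which the SDK does not retry for you.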

Root causes I see most frequently:

| Cause | Symptom | Fix |
|---|---|---|
| Hot partition key | Throttling despite low table-level utilization | Redesign partition key; add write sharding |
| GSI throttling | Base table writes rejected | Increase GSI capacity; review GSI key design |
| On-demand cold start | Throttling on a new or idle table receiving sudden traffic | Pre-warm with a gradual ramp; consider provisioned mode |
| Under-provisioned auto-scaling | Brief throttling during rapid scale-up | Lower target utilization; increase minimum capacity |
| Scan operations | Broad throttling across partitions | Replace Scans with Queries; use parallel scan with rate limiting |

The October 2025 Outage

On October 19-20, 2025, a race condition in an internal DynamoDB microservice that manages DNS records for regional cells caused a roughly 3-hour DynamoDB outage in US-EAST-1. The failure cascaded: because EC2 instance creation depends on DynamoDB for metadata, EC2 could not launch new instances for an additional 12 hours. Users filed over 17 million outage reports across services including Snapchat, Roblox, Reddit, and Venmo.

Lessons from that incident:

  • Multi-region is real DR. Single-region DynamoDB (even with multi-AZ replication) does not protect against regional failures. Global tables provide genuine regional independence.
  • Understand cascading dependencies. Your application depends on DynamoDB. Other AWS services also depend on DynamoDB internally. A DynamoDB outage affects services you did not realize were coupled.
  • Pre-provision, do not scale-on-demand during recovery. After the DynamoDB outage resolved, EC2 could not scale because instance creation was still impaired. If your recovery plan involves launching new compute, and the outage affects compute provisioning, your plan fails.

Item Size Limits

The 400 KB item size limit is hard. DynamoDB rejects any write that would create an item larger than 400 KB. This includes all attribute names and values. Long attribute names consume your item size budget. Use short, abbreviated attribute names for high-volume tables (store the human-readable mapping in your application code).
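Because attribute names count toward the limit, it is worth estimating item size before a write rejects. The sketch below is an approximation for common scalar types (DynamoDB's exact number encoding differs slightly); the helper names are my own.

```python
# Rough item-size estimator for the 400 KB check: DynamoDB counts UTF-8
# attribute names plus encoded values toward the limit. Scalar types only;
# the number encoding (~2 digits per byte) is an approximation.

MAX_ITEM_BYTES = 400 * 1024

def attr_size(value) -> int:
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    if isinstance(value, bytes):
        return len(value)
    if isinstance(value, bool):       # bool before int: bool subclasses int
        return 1
    if isinstance(value, (int, float)):
        return len(str(value)) // 2 + 1
    raise TypeError(f"unhandled type: {type(value)}")

def item_size(item: dict) -> int:
    """Approximate stored size: attribute names plus values, in bytes."""
    return sum(len(name.encode("utf-8")) + attr_size(v)
               for name, v in item.items())

item = {"PK": "CUST#42", "payload": "x" * 1024}
assert item_size(item) < MAX_ITEM_BYTES
```

Running this over a sample of production items also shows how much of your size budget goes to attribute names alone, which motivates the short-name advice above.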

Transaction Limits

DynamoDB transactions support up to 100 items per transaction, with a total transaction size limit of 4 MB. Transactions cost 2x the standard read/write units. Design your data model to minimize transaction scope; if you regularly need to update more than 100 items atomically, DynamoDB may not be the right fit for that specific operation.
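The classic two-item transaction (insert an order, decrement inventory, both or neither) looks like the request below. Only the request structure is shown, as plain dicts; table names, keys, and attributes are hypothetical, and the commented-out line is where a boto3 client call would go.

```python
# Illustrative TransactWriteItems request: an idempotent order insert plus a
# conditional inventory decrement, committed atomically or not at all.

def build_order_transaction(order_id: str, sku: str, qty: int) -> list[dict]:
    return [
        {"Put": {
            "TableName": "orders",
            "Item": {"PK": {"S": f"ORDER#{order_id}"}, "sku": {"S": sku}},
            "ConditionExpression": "attribute_not_exists(PK)",  # no double-insert
        }},
        {"Update": {
            "TableName": "inventory",
            "Key": {"PK": {"S": f"SKU#{sku}"}},
            "UpdateExpression": "SET stock = stock - :q",
            "ConditionExpression": "stock >= :q",  # never oversell
            "ExpressionAttributeValues": {":q": {"N": str(qty)}},
        }},
    ]

items = build_order_transaction("2025-001", "WIDGET-9", 2)
assert len(items) <= 100  # the hard per-transaction limit
# dynamodb.transact_write_items(TransactItems=items)
```

If either condition fails, the whole transaction is canceled and the response names the failing item, which makes these conditions a clean way to enforce invariants without application-side locking.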

Service Quotas Quick Reference

| Quota | Default | Adjustable |
|---|---|---|
| Tables per account per region | 2,500 | Yes (max 10,000) |
| Item size | 400 KB | No |
| Partition key length | 2,048 bytes | No |
| Sort key length | 1,024 bytes | No |
| GSIs per table | 20 | Yes |
| LSIs per table | 5 | No |
| LSI item collection size | 10 GB | No |
| Partition throughput (read) | 3,000 RCU | No |
| Partition throughput (write) | 1,000 WCU | No |
| Partition storage | 10 GB | No |
| Account read throughput per region | 80,000 RCU | Yes |
| Account write throughput per region | 80,000 WCU | Yes |
| Table throughput (on-demand) | 40,000 RRU/WRU | Yes |
| Projected attributes across all indexes | 100 | No |
| Concurrent backups | 50 | Yes |
| Stream readers per shard | 2 | No |
| Transaction items | 100 | No |
| Transaction size | 4 MB | No |

DynamoDB vs. Alternatives

Choosing DynamoDB requires understanding where it fits and where it does not.

```mermaid
flowchart TD
    A[Data Storage Decision] --> B{Need relational queries, joins, complex SQL?}
    B -->|Yes| C[Aurora / RDS]
    B -->|No| D{Access patterns known and stable?}
    D -->|Yes| E{Need single-digit ms latency at any scale?}
    D -->|No| F{Need flexible queries on document data?}
    E -->|Yes| G[DynamoDB]
    E -->|No| H{Need full-text search or analytics?}
    F -->|Yes| I[DocumentDB / MongoDB]
    F -->|No| G
    H -->|Yes| J[OpenSearch]
    H -->|No| G
```
Database selection decision tree
| Criteria | DynamoDB | Aurora (MySQL/PostgreSQL) | Cassandra (Self-Managed) |
|---|---|---|---|
| Operational burden | Zero (fully managed) | Low (managed, some tuning) | High (cluster ops, compaction, repairs) |
| Scaling model | Automatic partitioning | Vertical + read replicas | Horizontal (add nodes) |
| Consistency | Eventual or strong (per-read) | Strong (ACID) | Tunable per query |
| Query flexibility | Partition key + sort key + GSIs | Full SQL | CQL (SQL-like, limited joins) |
| Cost at scale | High for write-heavy workloads | Moderate (compute-based) | Low (commodity hardware) |
| Multi-region | Global tables (managed) | Aurora Global Database | Built-in (manual config) |
| Vendor lock-in | High (proprietary API) | Moderate (standard SQL) | None (open source) |

DynamoDB excels when you need a fully managed, infinitely scalable database with predictable single-digit millisecond latency and you can design your access patterns around partition keys. It struggles when you need ad-hoc queries, complex aggregations, or when write volumes make the per-request cost model prohibitive.

Key Patterns

After years of building on DynamoDB, these are the patterns that consistently matter:

Design for partitions first. Every performance characteristic flows from partition key design. Invest time upfront modeling your access patterns and validating that your partition key distributes traffic evenly. Fixing a partition key in production means migrating data.

Start on-demand, migrate to provisioned. On-demand removes the risk of under-provisioning during early development and launch. After you have production traffic data, provisioned mode with auto-scaling and reserved capacity typically saves 50-70%.

Monitor GSI throttling independently. GSI throughput is separate from base table throughput. A throttled GSI throttles your base table writes. Set CloudWatch alarms on WriteThrottleEvents for every GSI.

Use eventually consistent reads by default. Strongly consistent reads cost twice as much and concentrate load on the partition leader. Only use strong consistency when your application genuinely requires read-after-write guarantees.

Keep items small. Every additional KB in item size multiplies your RCU and WCU consumption. Use short attribute names for high-throughput tables. Store large values in S3.

Plan for the 400 KB limit. Applications that store growing lists or embedded arrays in a single item will hit the 400 KB wall. Design your data model with item growth in mind; use a one-to-many pattern with separate items instead of unbounded lists within a single item.

Test your DR story with global tables. The October 2025 US-EAST-1 outage proved that single-region deployments have a regional blast radius. If DynamoDB availability matters to your business, deploy global tables and validate that your application actually fails over correctly.

Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.