
MySQL vs. PostgreSQL on Aurora: An Architecture Deep Dive

AWS · Architecture · Databases

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

Every relational database argument eventually turns into a religious debate. I have no interest in that. What I care about is how these engines behave under load, where they break, and what happens when you deploy them on Aurora's distributed storage layer. After running both MySQL and PostgreSQL in production on Aurora across dozens of services, the differences that actually matter have little to do with SQL syntax preferences. They live in storage engine internals, MVCC implementation, connection handling, and the operational failure modes that surface at 3 AM when your on-call phone goes off.

This is a reference for engineers choosing between Aurora MySQL and Aurora PostgreSQL, or evaluating whether to migrate from one to the other. I will cover the architectural differences between the two engines, how Aurora's storage layer changes the calculus, and where each one earns its place.

Storage Engine Architecture

The storage engine is where MySQL and PostgreSQL diverge most fundamentally. Everything downstream, from MVCC behavior to vacuum requirements to replication mechanics, traces back to how each engine organizes data on disk.

InnoDB: Clustered Index Storage

MySQL's InnoDB engine uses a clustered index architecture. The primary key index IS the table. Row data is physically stored in primary key order within B+ tree leaf pages. Every secondary index stores the primary key value as its pointer back to the row data, meaning secondary index lookups require two traversals: one through the secondary index to find the primary key, then one through the clustered index to find the row.

This design has real consequences. Choosing the wrong primary key (a UUID, a random string, anything that causes page splits on insert) degrades write performance because InnoDB must physically reorganize pages to maintain sorted order. I have seen tables with UUID primary keys run 3-4x slower on bulk inserts compared to auto-increment integers. The fix is simple: use sequential primary keys, or at least UUIDv7 which is time-ordered.
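A common pattern, sketched below with a hypothetical `orders` table, is to keep a sequential clustered key internally and expose a UUID only at the API boundary:

```sql
-- Illustrative MySQL DDL (table and column names are assumptions, not from the article).
CREATE TABLE orders (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,          -- append-only inserts in the clustered B+ tree
  public_id  BINARY(16)      NOT NULL,                         -- UUID exposed to clients
  total      DECIMAL(12,2)   NOT NULL,
  created_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),                                            -- clustered index stays insert-ordered
  UNIQUE KEY uq_orders_public_id (public_id)                   -- secondary index absorbs the randomness
) ENGINE=InnoDB;
```

The random values still land in the secondary index, but a page split in a narrow secondary B-tree is far cheaper than reorganizing clustered leaf pages that carry full row data.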

InnoDB uses 16 KB pages by default. Each page holds multiple rows, and the buffer pool caches pages in memory. The buffer pool is InnoDB's single most important configuration parameter. Size it wrong and everything suffers.

PostgreSQL: Heap Storage

PostgreSQL takes the opposite approach. Tables are stored as unordered heaps. Rows land wherever there is space. The primary key is just another index, no different from any secondary index. All indexes point to a physical tuple identifier (ctid) that locates the row on disk.

The advantage: inserts are fast because PostgreSQL appends rows without reorganizing anything. Primary key choice does not affect insert performance the way it does in InnoDB. The disadvantage: every index lookup is a two-step process (index scan plus heap fetch), and even an index-only scan on the primary key must consult the visibility map, falling back to heap fetches for any page not marked all-visible. PostgreSQL 13 added B-tree deduplication to mitigate some of the index overhead, but the fundamental architecture remains heap-based.
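One way to observe the heap-fetch cost directly (the `users` table here is hypothetical) is to check the `Heap Fetches` counter in an index-only scan plan:

```sql
-- Illustrative PostgreSQL session: index-only scans depend on the visibility map.
CREATE INDEX idx_users_email ON users (email);
VACUUM users;                      -- refreshes all-visible bits so heap fetches can be skipped
EXPLAIN (ANALYZE)
SELECT email FROM users WHERE email BETWEEN 'a' AND 'b';
-- An "Index Only Scan" with "Heap Fetches: 0" never touched the heap;
-- a high heap-fetch count means the visibility map is stale.
```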

PostgreSQL uses 8 KB pages by default, half the size of InnoDB's. Smaller pages mean less wasted space for small rows, but more I/O operations for large sequential scans. On Aurora, this page size difference affects how much data each read I/O request returns.

Storage Architecture Comparison

| Characteristic | MySQL (InnoDB) | PostgreSQL |
| --- | --- | --- |
| Table storage | Clustered index (B+ tree) | Heap (unordered) |
| Default page size | 16 KB | 8 KB |
| Primary key storage | Data stored in PK leaf pages | Separate heap; PK is a regular index |
| Secondary index pointer | Stores primary key value | Stores physical ctid |
| Insert performance sensitivity | High (depends on PK ordering) | Low (append anywhere) |
| Index-only scans on PK | Native (data is in the index) | Requires visibility map check |
| Row overhead per tuple | ~20 bytes | ~23 bytes (xmin, xmax, ctid, etc.) |

MVCC: Two Fundamentally Different Approaches

Multi-Version Concurrency Control lets readers and writers operate without blocking each other. Both engines implement MVCC. The mechanisms could not be more different.

MySQL: Undo Log Versioning

InnoDB maintains a single current version of each row in the clustered index. When a transaction updates a row, InnoDB copies the old version to a separate undo log (stored in the system tablespace or dedicated undo tablespaces) and writes the new version in place. The undo log forms a linked list of previous versions. Readers needing an older snapshot follow the chain backward from the current version.

This means the main table stays compact. Old versions live in a separate structure that InnoDB's purge threads clean up automatically once no active transaction references them. The trade-off: read queries that need old snapshots pay a reconstruction cost. Each hop backward through the undo chain requires reading and applying an undo record to reconstruct the row as it appeared at the requested point in time. Long-running read transactions with heavy concurrent writes can get expensive.
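A useful health check (a sketch, not from the article) is InnoDB's history list length, which counts undo records still awaiting purge:

```sql
-- Illustrative monitoring query: undo records awaiting purge in InnoDB.
SELECT count
FROM information_schema.innodb_metrics
WHERE name = 'trx_rseg_history_len';
-- A steadily growing value usually points at a long-running transaction
-- pinning old row versions; find the culprit in information_schema.innodb_trx.
```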

PostgreSQL: In-Place Tuple Versioning

PostgreSQL does something conceptually simpler and operationally messier. When a transaction updates a row, PostgreSQL writes a completely new copy of the entire row into the same table. The old tuple stays in place with its xmax field set to the updating transaction's ID. The new tuple gets the updating transaction's ID in its xmin field. Both versions coexist in the heap until cleanup occurs.

Readers determine which version to see by checking xmin and xmax against the current transaction's snapshot. No reconstruction needed; both versions are complete rows sitting in the table. Reads are fast. The problem: dead tuples accumulate in the table itself, bloating it over time.
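Dead-tuple accumulation is visible in the statistics views. A quick bloat survey (an illustrative query, not the author's) looks like this:

```sql
-- Illustrative PostgreSQL bloat check: dead tuples per table.
SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

Tables with a persistently high `dead_pct` are the first candidates for per-table autovacuum tuning.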

Cleanup: VACUUM vs. Purge

| Aspect | MySQL (InnoDB Purge) | PostgreSQL (VACUUM) |
| --- | --- | --- |
| What gets cleaned | Undo log entries | Dead tuples in the heap |
| Where bloat accumulates | Separate undo tablespace | Main table and indexes |
| Cleanup mechanism | Background purge threads | VACUUM process (auto or manual) |
| Table size impact | Minimal; table stays compact | Significant; table grows without VACUUM |
| Index maintenance | Indexes unaffected | Dead index entries must be cleaned too |
| Failure mode | Undo tablespace grows if purge falls behind | Table bloat spirals; queries slow down |
| Recovery from neglect | Purge catches up once long transactions end | VACUUM FULL required; locks the table |

I have seen PostgreSQL tables balloon to 10x their expected size because autovacuum could not keep up with the write rate. The fix usually involves tuning autovacuum_vacuum_cost_delay and autovacuum_vacuum_scale_factor, or running manual VACUUM during maintenance windows. On Aurora PostgreSQL, this problem persists because Aurora uses the same PostgreSQL MVCC engine. The distributed storage layer does not save you from vacuum debt.
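Per-table overrides are usually more effective than cluster-wide changes. A sketch, with a hypothetical hot table and starting-point values rather than universal answers:

```sql
-- Illustrative per-table autovacuum tuning for a write-heavy table.
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead tuples instead of the 20% default
  autovacuum_vacuum_cost_delay   = 0      -- let vacuum run at full speed on this table
);
```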

Concurrency and Connection Handling

Threads vs. Processes

MySQL and PostgreSQL use fundamentally different process models, and this difference drives operational decisions around connection management.

MySQL spawns a thread per connection within a single process. Threads share memory, making context switches cheap. A well-tuned MySQL instance handles thousands of concurrent connections without connection pooling. I have run Aurora MySQL instances with 2,000+ connections directly from application servers without issues.

PostgreSQL forks a new operating system process for each connection. Each process gets its own memory space. At 500+ connections, the overhead from context switching and memory consumption becomes measurable. At 1,000+, performance degrades visibly. Connection pooling through PgBouncer or a similar tool is mandatory for any PostgreSQL deployment that handles meaningful traffic.

| Characteristic | MySQL | PostgreSQL |
| --- | --- | --- |
| Connection model | Thread per connection | Process per connection |
| Memory overhead per connection | ~256 KB - 1 MB | ~5-10 MB |
| Practical limit without pooler | 3,000-5,000 | 300-500 |
| Connection pooling | Optional (helpful at scale) | Mandatory for production |
| Recommended pooler | ProxySQL, MySQL Router | PgBouncer, Pgpool-II |
| Aurora integration | RDS Proxy available | RDS Proxy available |
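Before reaching for a pooler, it helps to see where PostgreSQL connections are actually going. An illustrative check:

```sql
-- Illustrative PostgreSQL query: connection counts by state.
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;
-- A large "idle" count is the classic sign that a pooler
-- (PgBouncer in transaction mode, or RDS Proxy) would reclaim memory.
```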

Deadlock Behavior

Both databases detect and resolve deadlocks, but with different characteristics. MySQL's InnoDB performs immediate deadlock detection for most scenarios, rolling back the transaction with the fewest row locks. PostgreSQL checks for deadlocks periodically (every deadlock_timeout, default 1 second), which means short-lived deadlocks can resolve faster in MySQL.

In practice, I see more application-level deadlock issues with MySQL because InnoDB's gap locking (used in REPEATABLE READ isolation) locks ranges of the index rather than individual rows. PostgreSQL's MVCC avoids gap locks entirely, which makes concurrent inserts into the same index range less likely to conflict. If your workload involves heavy concurrent inserts into tables with secondary indexes, PostgreSQL handles contention more gracefully.

Replication Architecture

MySQL: Binary Log Replication

MySQL replication centers on the binary log (binlog). The primary records every data-modifying operation in the binlog, and replicas read and replay those events. Three binlog formats exist:

| Format | What Gets Logged | Pros | Cons |
| --- | --- | --- | --- |
| Statement-based (SBR) | SQL statements | Compact log, human-readable | Non-deterministic functions break replication |
| Row-based (RBR) | Actual row changes | Deterministic, always correct | Larger log volume |
| Mixed | Statements when safe, rows otherwise | Best of both worlds | Complexity in debugging |

Aurora MySQL uses the same binlog infrastructure but adds its own replication through the shared storage layer. Within a single Aurora cluster, replicas read from the same distributed storage volume. No binlog replay needed. Replication lag within a cluster is typically under 20 milliseconds.

PostgreSQL: WAL-Based Replication

PostgreSQL replication uses the Write-Ahead Log (WAL), which records physical changes at the disk block level. Physical (streaming) replication ships raw WAL bytes to standby servers, which replay them to produce an identical copy.

PostgreSQL also supports logical replication (introduced in version 10), which decodes WAL into logical change events. Logical replication enables cross-version replication, selective table replication, and data transformation during replication. MySQL's binlog replication is inherently logical, so MySQL had this capability first, but PostgreSQL's logical decoding infrastructure has matured into a more flexible system.

Aurora PostgreSQL uses the shared storage layer for intra-cluster replication (same as Aurora MySQL). For cross-region replication, Aurora Global Database ships storage-level changes, which is faster than WAL shipping.

Query Processing and Extensibility

Query Optimizer

MySQL's optimizer is simpler and more predictable. It uses a cost-based optimizer that evaluates a limited set of join strategies (primarily nested loop joins with index lookups). This works well for OLTP queries with selective indexes. It struggles with complex analytical queries involving many joins, subqueries, or window functions.

PostgreSQL's optimizer is more sophisticated. It supports hash joins, merge joins, nested loop joins, and can evaluate a much larger plan space. For analytical queries with multi-way joins, PostgreSQL consistently produces better execution plans. The trade-off: plan time is longer, and the optimizer occasionally makes poor choices on table statistics that are stale. Running ANALYZE regularly matters more in PostgreSQL.
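The stale-statistics failure mode is easy to reproduce and easy to fix. A sketch (the `customers` and `orders` tables are hypothetical):

```sql
-- Illustrative: refresh planner statistics after a bulk load, then inspect the join strategy.
ANALYZE orders;
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.region, sum(o.total)
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.region;
-- With fresh statistics, a large join like this typically plans as a Hash Join;
-- stale statistics can push the planner into a slow Nested Loop.
```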

Data Types and Extensions

This is where PostgreSQL pulls ahead decisively.

| Feature | MySQL | PostgreSQL |
| --- | --- | --- |
| JSON support | JSON type since 5.7; limited operators | JSONB since 9.4; GIN-indexable, rich operators |
| Full-text search | Basic FULLTEXT indexes | tsvector/tsquery with ranking, stemming, dictionaries |
| Geospatial | Limited spatial types | PostGIS extension (industry standard) |
| Array types | No native support | Native array columns with operators |
| Range types | No native support | Native range types (int, timestamp, etc.) |
| Custom types | No | CREATE TYPE with full operator support |
| Extension ecosystem | Plugins (limited) | 1,000+ extensions via PGXN |
| Partial indexes | No | CREATE INDEX ... WHERE condition |
| Expression indexes | Functional indexes since 8.0.13 | CREATE INDEX on any expression |

On Aurora, both engines support the same extensions they support on standard RDS. Aurora PostgreSQL supports PostGIS, pg_stat_statements, pgvector (for embeddings), and the full extension catalog. Aurora MySQL supports the standard MySQL plugin set.

If your application needs JSONB querying, geospatial analysis, full-text search without Elasticsearch, or vector similarity search, PostgreSQL is the clear choice. For straightforward OLTP with simple data types, MySQL's simpler architecture often translates to lower operational overhead.
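A few of these PostgreSQL capabilities in one sketch (the schema is hypothetical, not from the article):

```sql
-- JSONB with a GIN index: containment queries become index scans.
CREATE INDEX idx_events_payload ON events USING GIN (payload jsonb_path_ops);
SELECT * FROM events WHERE payload @> '{"type": "login"}';

-- Partial index: only the rows the hot query actually touches.
CREATE INDEX idx_users_active_email ON users (email)
WHERE deleted_at IS NULL;

-- Expression index: case-insensitive lookups without a generated column.
CREATE INDEX idx_users_email_lower ON users (lower(email));
```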

Aurora: The Cloud-Native Storage Layer

Aurora replaces the local storage engine's I/O path with a distributed, log-structured storage service. Understanding this architecture is critical because it changes several assumptions about how MySQL and PostgreSQL behave.

Distributed Storage and Quorum Writes

Aurora's storage volume spans three Availability Zones. Data is divided into 10 GB segments called Protection Groups. Each Protection Group is replicated six ways: two copies per AZ.

Writes use a 4-of-6 quorum. Aurora sends redo log records (not full data pages) to all six storage nodes and considers the write durable once four acknowledge. Reads use a 3-of-6 quorum, which guarantees at least one node has the latest committed data.

```mermaid
flowchart TB
  W[Aurora Writer Instance] -->|Redo log records| S1["Storage Node 1<br/>AZ-a"]
  W -->|Redo log records| S2["Storage Node 2<br/>AZ-a"]
  W -->|Redo log records| S3["Storage Node 3<br/>AZ-b"]
  W -->|Redo log records| S4["Storage Node 4<br/>AZ-b"]
  W -->|Redo log records| S5["Storage Node 5<br/>AZ-c"]
  W -->|Redo log records| S6["Storage Node 6<br/>AZ-c"]
  S1 -.->|4-of-6 quorum| ACK[Write Acknowledged]
  S3 -.->|4-of-6 quorum| ACK
  S4 -.->|4-of-6 quorum| ACK
  S5 -.->|4-of-6 quorum| ACK
```

Aurora distributed storage with quorum writes

The critical insight: Aurora only ships redo log records to storage, not full data pages. This dramatically reduces network I/O compared to traditional MySQL or PostgreSQL, where the database writes full 16 KB or 8 KB pages over the network to EBS. Aurora's redo records are typically tens to hundreds of bytes. The storage nodes reconstruct data pages from the redo stream.

Aurora MySQL vs. Aurora PostgreSQL Internals

Aurora uses the same distributed storage layer for both engines, but the engines interact with it differently because of their underlying architectures.

| Aspect | Aurora MySQL | Aurora PostgreSQL |
| --- | --- | --- |
| Page size sent to storage | 16 KB (InnoDB) | 8 KB (PostgreSQL) |
| Redo log format | InnoDB redo log records | PostgreSQL WAL records |
| MVCC cleanup | Purge threads (undo log) | Autovacuum (dead tuples) |
| Buffer pool management | InnoDB buffer pool | PostgreSQL shared_buffers |
| Connection model | Threaded | Process-based |
| Max table size | 64 TiB | 32 TiB |
| Max storage per cluster | 128 TiB | 128 TiB |
| Replication lag (intra-cluster) | < 20 ms typical | < 20 ms typical |
| Limitless Database support | No | Yes (horizontal write scaling) |

Aurora PostgreSQL has one significant advantage: Limitless Database. This feature horizontally shards data across multiple storage groups, enabling write throughput beyond what a single instance can provide. Aurora MySQL has no equivalent. For write-heavy workloads that will exceed a single writer's capacity, Aurora PostgreSQL with Limitless is the only Aurora-native option. See AWS Aurora: Getting Close to Multi-Region Active/Active for the full multi-region picture.

Aurora Deployment Patterns

Provisioned vs. Serverless v2

```mermaid
flowchart TD
  A[New Aurora Cluster] --> B{"Workload pattern?"}
  B -->|Steady, predictable| C{"Budget for Reserved Instances?"}
  C -->|Yes| D["Provisioned + RI<br/>Up to 72% savings"]
  C -->|No| E[Provisioned On-Demand]
  B -->|Variable or spiky| F{"Idle periods longer than 15 min?"}
  F -->|Yes| G["Serverless v2<br/>Scales to 0.5 ACU"]
  F -->|No| H{"Peak exceeds steady by 3x+?"}
  H -->|Yes| G
  H -->|No| E
  B -->|Dev/test or infrequent use| I["Serverless v2<br/>Min 0.5 ACU"]
```

Aurora deployment decision tree

| Deployment Model | Best For | Cost Model | Scaling | Min Cost/Hour |
| --- | --- | --- | --- | --- |
| Provisioned On-Demand | Steady workloads, no commitment | Per-instance-hour | Manual (change instance class) | ~$0.073 (db.r6g.large) |
| Provisioned Reserved | Steady workloads, 1-3 year commitment | Upfront + reduced hourly | Manual | ~$0.021 (db.r6g.large, 3yr all-upfront) |
| Serverless v2 | Variable/spiky workloads | Per ACU-hour ($0.12) | Auto (0.5-128 ACU, milliseconds) | ~$0.06 (0.5 ACU) |

The break-even point between Serverless v2 and provisioned depends on utilization. At sustained 70%+ utilization, provisioned with Reserved Instances wins. Below 40% average utilization, Serverless v2 is cheaper. The gap between 40-70% depends on how spiky the workload is.

One gotcha I hit in production: RDS Proxy paired with Aurora Serverless v2 has a minimum charge of 8 ACUs regardless of actual database load. If your Serverless v2 cluster scales down to 0.5 ACU but you are running RDS Proxy, you are paying for 8 ACUs of proxy capacity. For low-traffic applications, this negates most of the Serverless v2 cost savings.

Read Replica Architecture

Both Aurora MySQL and Aurora PostgreSQL support up to 15 read replicas within a cluster. All replicas share the same storage volume, so adding a replica does not require data copying. Replica lag is typically 10-20 milliseconds because replicas read directly from the shared storage layer rather than replaying replication logs.

Aurora auto-scales read replicas based on CPU or connection metrics through Aurora Auto Scaling. For read-heavy workloads, this provides elastic read capacity without manual intervention.

Global Database

Aurora Global Database replicates across regions using storage-level replication with typical lag under 1 second. Write forwarding (available for both engines) lets read replicas in secondary regions forward writes to the primary region, enabling single-writer, multi-reader global architectures. For the full analysis of multi-region patterns, see AWS Aurora: Getting Close to Multi-Region Active/Active.

Cost Architecture

Aurora pricing has multiple dimensions, and choosing the wrong configuration can double your bill.

Pricing Components

| Component | Standard | I/O-Optimized |
| --- | --- | --- |
| Instance/ACU hours | Base rate | ~30% higher base rate |
| I/O operations | $0.20 per million reads, $0.20 per million writes | Included (no I/O charges) |
| Storage | $0.10/GB-month | $0.225/GB-month |
| Backup storage | Free up to cluster size; $0.021/GB-month beyond | Same |
| Data transfer | Standard AWS rates | Standard AWS rates |

I/O-Optimized pricing eliminates per-I/O charges in exchange for higher base rates. The break-even: if I/O costs exceed 25% of your total Aurora bill, I/O-Optimized saves money. Most write-heavy OLTP workloads benefit from I/O-Optimized. Read-heavy workloads with good cache hit ratios often stay cheaper on Standard.

Engine-Specific Cost Differences

The per-ACU and per-instance rates are identical for Aurora MySQL and Aurora PostgreSQL. The cost difference comes from workload behavior:

- PostgreSQL's MVCC generates more I/O than MySQL's because dead tuples in the heap and indexes require VACUUM scans, which are read I/O, plus the writes to reclaim space.
- PostgreSQL's process-per-connection model consumes more memory, potentially requiring a larger instance class.
- MySQL's clustered index can cause write amplification with random primary keys, generating more I/O on inserts.

In my experience, Aurora PostgreSQL clusters run 10-20% higher total cost than equivalent Aurora MySQL clusters for the same OLTP workload, primarily due to VACUUM I/O. Workloads that leverage PostgreSQL-specific features (JSONB queries, PostGIS, full-text search) easily justify the premium by eliminating external services like Elasticsearch or dedicated geospatial databases.

Failure Modes and Operational Lessons

PostgreSQL: Vacuum Stalls and Transaction ID Wraparound

PostgreSQL's most dangerous failure mode is transaction ID (XID) wraparound. PostgreSQL uses 32-bit transaction IDs. After roughly 2 billion transactions, IDs wrap around, and the database must freeze old tuples to prevent data loss. If autovacuum cannot keep up, PostgreSQL enters "emergency autovacuum" mode and eventually refuses new write transactions entirely to protect data integrity.

I have seen this happen exactly once in production. An Aurora PostgreSQL cluster with a long-running analytics query prevented autovacuum from cleaning old tuples. The cluster hit the wraparound warning threshold and started logging increasingly urgent warnings before the on-call engineer noticed. The fix: kill the long-running query, let autovacuum catch up, and add monitoring for the age(datfrozenxid) metric.

Prevention: monitor age(datfrozenxid) across pg_database and alert when it exceeds 500 million. Set statement_timeout for read replicas used by analytics queries. Never let a transaction run for hours on a production cluster.
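The monitoring query itself is short. An illustrative version:

```sql
-- Illustrative wraparound check: oldest unfrozen XID age per database.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
-- Alert well before the ~2 billion hard limit; aggressive anti-wraparound
-- vacuuming begins once a table's age passes autovacuum_freeze_max_age
-- (default 200 million).
```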

MySQL: Replication Lag and Gap Lock Deadlocks

MySQL's primary operational headache on Aurora is gap lock contention. In REPEATABLE READ isolation (the InnoDB default), range queries lock gaps between index records, not just the matching rows. Concurrent transactions inserting into the same gap deadlock each other. The fix: use READ COMMITTED isolation if your application can tolerate it, or restructure queries to use unique lookups instead of range scans.
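The isolation-level fix is a one-liner per session, or a parameter-group change cluster-wide. A sketch:

```sql
-- Illustrative: reduce gap-lock deadlocks by dropping to READ COMMITTED.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Cluster-wide on Aurora MySQL, set in the DB cluster parameter group:
--   transaction_isolation = READ-COMMITTED
-- READ COMMITTED disables most gap locking; verify the application does not
-- depend on REPEATABLE READ snapshot semantics before switching.
```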

Binary log replication to external replicas (outside the Aurora cluster) can lag significantly under heavy write loads because the external replica must replay binlog events sequentially. Within the Aurora cluster, this is not an issue because replicas use shared storage.

Aurora-Specific Failure Modes

| Failure Mode | Impact | Mitigation |
| --- | --- | --- |
| Storage node failure (single) | None; quorum maintains reads and writes | Automatic; AWS replaces the node |
| AZ failure | Write quorum maintained (4-of-6 with 2 nodes in failed AZ) | Automatic failover of writer to healthy AZ |
| AZ failure + additional node | Read quorum still possible (3-of-6); writes blocked until repair | Fast repair; Aurora copies 10 GB segments |
| Writer instance failure | 15-30 second failover to replica | Use cluster endpoint; set short TCP keepalives |
| Serverless v2 cold scaling | 5-15 second latency spike when scaling from minimum | Set minimum ACU above cold-start threshold |
| Storage full (128 TiB) | Writes fail | Monitor VolumeBytesUsed; archive or partition old data |

Decision Framework

When to Choose Aurora MySQL

| Scenario | Why MySQL |
| --- | --- |
| High-connection OLTP with simple queries | Thread-per-connection handles thousands of connections without pooling |
| Team already experienced with MySQL | Operational familiarity reduces incidents |
| Migrating from existing MySQL/MariaDB | Wire-protocol compatible; minimal application changes |
| Write-heavy workloads with sequential keys | Clustered index delivers efficient ordered inserts |
| Latency-sensitive applications | Simpler query planner; more predictable latency |

When to Choose Aurora PostgreSQL

| Scenario | Why PostgreSQL |
| --- | --- |
| Complex queries with many joins | Superior optimizer with hash/merge join strategies |
| JSONB, geospatial, or full-text search needs | Native JSONB, PostGIS, tsvector; no external service needed |
| Need horizontal write scaling | Limitless Database (Aurora PostgreSQL only) |
| Vector similarity search (ML/AI) | pgvector extension for embeddings |
| Analytics mixed with OLTP | Better window functions, CTEs, and partitioning |
| Strict data integrity requirements | More complete SQL standard support, CHECK constraints always enforced |
| Schema evolution flexibility | Transactional DDL; ALTER TABLE is safer |

My Default Recommendation

For new projects on Aurora: start with PostgreSQL unless you have a specific reason not to. The extension ecosystem, query optimizer, and Limitless Database support make it the more capable foundation. The VACUUM overhead and connection pooling requirement are manageable operational costs that you learn to handle.

Switch to MySQL when: your workload is exclusively high-throughput, simple-query OLTP with thousands of direct connections, your team has deep MySQL operational experience, or you are migrating an existing MySQL application.

For DynamoDB as an alternative for key-value workloads, see AWS DynamoDB: An Architecture Deep-Dive. For caching layers in front of either engine, see Amazon ElastiCache: An Architecture Deep-Dive.

Key Takeaways

MySQL and PostgreSQL are both mature, production-grade databases. The architectural differences that matter most on Aurora:

Storage engines determine everything downstream. InnoDB's clustered index optimizes ordered range scans and keeps tables compact. PostgreSQL's heap storage optimizes insert performance and eliminates primary key sensitivity, at the cost of VACUUM overhead.

MVCC cleanup is the single biggest operational difference. MySQL handles it automatically through background purge threads. PostgreSQL requires autovacuum tuning and monitoring. On Aurora, the distributed storage layer does not change this equation.

Connection handling drives infrastructure decisions. MySQL's threaded model tolerates thousands of connections natively. PostgreSQL's process model requires connection pooling (PgBouncer or RDS Proxy) for anything beyond a few hundred connections.

Aurora's storage layer benefits both engines equally: 4-of-6 quorum writes, automatic replication across three AZs, shared storage for read replicas, and sub-second Global Database replication. The engine choice should be driven by application requirements, not Aurora-specific considerations.

PostgreSQL wins on capabilities. JSONB, PostGIS, full-text search, pgvector, Limitless Database, transactional DDL, and a superior query optimizer.

MySQL wins on simplicity. Fewer operational knobs, predictable performance, and a connection model that requires less infrastructure.


Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.