Databases

Database Interview Guide

Database selection, RDS vs Aurora vs DynamoDB, caching, and migration β€” asked in every SA interview.

5 Topics · 5 Diagrams
Diagram: AWS Database Services — Decision Framework
Advanced

Database Selection Framework

Diagram: data requirement → service. SQL/ACID with joins → Aurora PostgreSQL/MySQL; NoSQL at scale → DynamoDB (serverless key-value); caching with sub-ms latency → ElastiCache (Redis/Memcached); data warehouse → Redshift (petabyte analytics).

The #1 SA interview question: "Which database would you use?" β€” here's the decision matrix:

| Requirement | Best Choice | Why |
| --- | --- | --- |
| Relational + complex queries | Aurora (PostgreSQL/MySQL) | 5x throughput of standard MySQL, auto-scaling storage, multi-AZ |
| Key-value, <10ms, any scale | DynamoDB | Serverless, single-digit ms latency, unlimited scale, on-demand pricing |
| Sub-millisecond caching | ElastiCache (Redis) | In-memory, supports complex data structures, pub/sub |
| Data warehouse / analytics | Redshift | Columnar storage, MPP, optimized for OLAP queries on petabytes |
| Document / flexible schema | DocumentDB | MongoDB-compatible, managed, scales to millions of reads/writes |
| Graph relationships | Neptune | Social networks, fraud detection, knowledge graphs |
| Time-series / IoT | Timestream | Purpose-built for time-series with built-in analytics |
| Ledger / immutable | Aurora PostgreSQL + pgcrypto | Verifiable transaction log (QLDB deprecated July 2025 — migrate to Aurora with audit tables) |
| In-memory + durability | MemoryDB for Redis | Redis-compatible with multi-AZ durable storage |

The Decision Tree (Interview Framework)

  1. Need SQL joins/transactions? β†’ Yes: Aurora. No: continue.
  2. Simple key-value access patterns? β†’ Yes: DynamoDB. No: continue.
  3. Need caching layer? β†’ Yes: ElastiCache Redis. No: continue.
  4. Analytics on large datasets? β†’ Yes: Redshift or Athena (serverless). No: continue.
  5. Graph relationships? β†’ Yes: Neptune. No: check remaining options.
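The five steps above can be sketched as a first-match-wins function. This is an illustrative study aid, not an AWS API; the parameter names are made up for the sketch:

```python
# Hypothetical sketch of the decision tree above; first matching need wins.
def pick_database(needs_sql_joins=False, simple_key_value=False,
                  needs_cache=False, large_scale_analytics=False,
                  graph_relationships=False):
    """Walk the interview decision tree in order and return a service name."""
    if needs_sql_joins:
        return "Aurora"
    if simple_key_value:
        return "DynamoDB"
    if needs_cache:
        return "ElastiCache Redis"
    if large_scale_analytics:
        return "Redshift or Athena"
    if graph_relationships:
        return "Neptune"
    return "check remaining purpose-built options"

print(pick_database(needs_sql_joins=True))   # Aurora
print(pick_database(simple_key_value=True))  # DynamoDB
```

The ordering matters: a workload that needs SQL joins short-circuits before the key-value question is ever asked, which mirrors how you would narrate the tree in an interview.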

🎯 Key Takeaway

Interview tip: "I start with the access pattern: if I need complex SQL joins and ACID transactions, Aurora. If simple get/put at massive scale, DynamoDB. For analytics, Redshift. I always add ElastiCache in front for hot data; even a simple cache can absorb the large majority of reads, often 80-90% of read traffic for read-heavy workloads, before they ever reach RDS."

Intermediate

RDS vs Aurora Deep Dive

Diagram: Aurora shared storage architecture — one primary (read/write) and up to three readers (<10ms lag) all attached to the Aurora distributed storage layer (auto-scaling, self-healing), which keeps 6 copies across 3 Availability Zones.
| Feature | RDS | Aurora | Aurora Serverless v2 |
| --- | --- | --- | --- |
| Engines | MySQL, PostgreSQL, MariaDB, Oracle, SQL Server | MySQL, PostgreSQL only | MySQL, PostgreSQL only |
| Storage | EBS (gp3/io1), manual scaling | Auto-scales 10GB → 128TB, 6 copies across 3 AZs | Same as Aurora |
| Read Replicas | Up to 15 (async) | Up to 15 (shared storage, ~10ms lag) | Same as Aurora |
| Failover | ~60-120 seconds | ~30 seconds (faster with a reader) | ~30 seconds |
| Scaling | Manual instance resize (downtime) | Add readers, instance resize | Auto-scales 0.5 → 128 ACUs with load |
| Cost | Lower starting point | ~20% more than RDS | Pay per ACU-hour (can scale to zero) |
| Global | Cross-region read replicas (minutes of lag) | Global Database (<1 second lag, up to 5 secondary regions) | Supported (since 2024) |
| Best For | Small/medium workloads, Oracle/SQL Server | Production MySQL/PostgreSQL | Variable/unpredictable workloads |

Aurora Storage Architecture (Know This)

Aurora stores data in a shared, distributed storage layer: 6 copies across 3 AZs. It can lose 2 copies and still write, lose 3 and still read. Storage auto-heals β€” bad segments are automatically repaired. This is why Aurora has sub-30-second failover.
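The "lose 2 and still write, lose 3 and still read" property falls out of Aurora's documented quorums: writes need 4 of 6 copies to acknowledge, reads need 3 of 6 (and 4 + 3 > 6 guarantees every read quorum overlaps every write quorum). A minimal sketch of that arithmetic:

```python
# Aurora's quorum model: 6 storage copies, write quorum 4/6, read quorum 3/6.
TOTAL_COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3

def can_write(lost_copies):
    """Writes succeed while at least WRITE_QUORUM copies survive."""
    return TOTAL_COPIES - lost_copies >= WRITE_QUORUM

def can_read(lost_copies):
    """Reads succeed while at least READ_QUORUM copies survive."""
    return TOTAL_COPIES - lost_copies >= READ_QUORUM

print(can_write(2))  # True  -> lose 2 copies (a whole AZ), still writable
print(can_write(3))  # False -> writes pause until storage self-heals
print(can_read(3))   # True  -> lose 3 copies, still readable
```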

Multi-AZ vs Read Replicas

  • Multi-AZ: Standby instance for HA (automatic failover). Cannot serve reads. Purpose: availability.
  • Read Replica: Separate instance that serves read queries. Can be promoted to primary. Purpose: read scaling.
  • Aurora: Every reader is also a failover target. Multi-AZ + read scaling in one.

🎯 Key Takeaway

Interview tip: "I always recommend Aurora over standard RDS for production MySQL/PostgreSQL. Aurora's storage layer (6 copies, 3 AZs, auto-healing) gives better durability and 30-second failover. For unpredictable workloads, Aurora Serverless v2 auto-scales ACUs. For multi-region, Aurora Global Database provides <1 second replication."

Advanced

DynamoDB Architect Patterns

Diagram: DynamoDB event-driven CDC pipeline — Application (PutItem/UpdateItem) → DynamoDB Orders table → DynamoDB Streams (time-ordered) → Lambda enrichment → EventBridge fan-out to targets. Use case: when an order is created in DynamoDB, instantly notify Billing, Shipping, and Analytics, decoupled from the main database transaction.

DynamoDB is THE go-to for serverless and high-scale architectures:

Key Concepts

| Concept | What | Why It Matters |
| --- | --- | --- |
| Partition Key (PK) | Required. Determines which partition stores the item. | Bad PK = hot partition = throttling. Use high-cardinality keys. |
| Sort Key (SK) | Optional. Enables range queries within a partition. | Design SK to support your query patterns (e.g., ORDER#timestamp). |
| GSI | Global Secondary Index — different PK/SK | Enables queries on non-key attributes. Has its own throughput. |
| LSI | Local Secondary Index — same PK, different SK | Must be created at table creation. Shares table throughput. |
| Single-Table Design | Store multiple entity types in one table | Reduces costs, enables transactional operations across entities. |
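A minimal sketch of what single-table key design looks like in practice, assuming a hypothetical customers-and-orders table (the `CUSTOMER#`/`ORDER#` prefixes and attribute names are illustrative conventions, not an AWS requirement):

```python
# Hypothetical single-table design: customers and their orders share a table,
# with the entity type encoded in key prefixes.
def customer_keys(customer_id):
    return {"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE"}

def order_keys(customer_id, order_date, order_id):
    # Sort keys compare lexicographically, so ISO-8601 dates in the SK
    # give chronological ordering and cheap date-range queries.
    return {"PK": f"CUSTOMER#{customer_id}",
            "SK": f"ORDER#{order_date}#{order_id}"}

print(customer_keys("123"))
print(order_keys("123", "2025-01-15", "A9"))
```

With these keys, a single Query on `PK = "CUSTOMER#123"` with a sort-key condition of `begins_with("ORDER#2025-01")` would return all of that customer's January orders in one request, which is the "access-pattern-first" payoff.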

Capacity Modes

| Mode | Best For | Cost | Scaling |
| --- | --- | --- | --- |
| On-Demand | Unpredictable, new apps, spiky traffic | ~6x provisioned price per request | Instant, automatic |
| Provisioned | Predictable, steady workloads | Lowest (with auto-scaling) | Auto-scaling with target utilization |
| Reserved Capacity | Committed steady workload | Up to 77% savings | Fixed capacity |
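A back-of-envelope comparison of the two modes at steady traffic. The prices below are illustrative placeholders, not current AWS list prices (always check the DynamoDB pricing page), but the shape of the result holds: steady load favors provisioned, spiky load favors on-demand:

```python
# Illustrative unit prices -- placeholders, NOT current AWS pricing.
ON_DEMAND_PRICE_PER_MILLION_WRITES = 1.25   # assumed $/1M write requests
PROVISIONED_WCU_PRICE_PER_HOUR = 0.00065    # assumed $/WCU-hour

def on_demand_monthly(writes_per_second):
    """Cost if every write is billed per-request."""
    writes = writes_per_second * 3600 * 24 * 30
    return writes / 1_000_000 * ON_DEMAND_PRICE_PER_MILLION_WRITES

def provisioned_monthly(writes_per_second, utilization=0.7):
    """Cost if capacity is provisioned with headroom above the average."""
    wcu = writes_per_second / utilization
    return wcu * PROVISIONED_WCU_PRICE_PER_HOUR * 24 * 30

steady = 100  # writes/sec, perfectly steady
print(f"on-demand:   ${on_demand_monthly(steady):.2f}/mo")
print(f"provisioned: ${provisioned_monthly(steady):.2f}/mo")
```

This is why the interview answer is "start On-Demand, switch to Provisioned once traffic is predictable": the crossover depends entirely on how steady the load is.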

DynamoDB Streams + EventBridge

Pattern: DynamoDB Streams captures every item change (INSERT, MODIFY, DELETE) β†’ triggers Lambda β†’ fan-out via EventBridge to downstream services. This is the CDC (Change Data Capture) pattern for real-time event-driven architectures.
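A minimal sketch of the Lambda side of this pattern. The record shape follows DynamoDB's documented stream event format; the EventBridge publish is stubbed out (a real handler would call `put_events` via boto3) so the example runs locally:

```python
import json

def publish_to_eventbridge(detail_type, detail):
    # Stub: a real handler would call the EventBridge put_events API here.
    print(f"would publish {detail_type}: {json.dumps(detail)}")

def handler(event, context=None):
    """Process DynamoDB Streams records and fan out INSERTs as events."""
    published = 0
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            # NewImage holds the item in DynamoDB's typed-JSON format,
            # e.g. {"order_id": {"S": "ORD-1001"}}
            new_image = record["dynamodb"]["NewImage"]
            order_id = new_image["order_id"]["S"]
            publish_to_eventbridge("OrderCreated", {"order_id": order_id})
            published += 1
    return {"published": published}

# Synthetic stream event for local testing
sample_event = {"Records": [{
    "eventName": "INSERT",
    "dynamodb": {"NewImage": {"order_id": {"S": "ORD-1001"}}},
}]}
print(handler(sample_event))  # {'published': 1}
```

Filtering on `eventName` is the key move: the same stream also delivers MODIFY and DELETE records, so each consumer picks out only the changes it cares about.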

🎯 Key Takeaway

Interview tip: "I design DynamoDB tables access-pattern-first: identify all queries before choosing PK/SK. I use single-table design to colocate related entities, On-Demand mode for new apps, then switch to Provisioned with auto-scaling once traffic is predictable. DynamoDB Streams + Lambda powers real-time event processing."

Intermediate

Caching β€” ElastiCache & DAX

Caching Strategies Diagram
| Service | Engine | Latency | Best For |
| --- | --- | --- | --- |
| ElastiCache Redis | Redis (in-memory) | Sub-millisecond | Session store, leaderboards, pub/sub, complex data types |
| ElastiCache Memcached | Memcached (in-memory) | Sub-millisecond | Simple key-value caching, multi-threaded |
| DAX | DynamoDB-native cache | Microseconds | DynamoDB read-heavy workloads (drop-in, no code change) |
| CloudFront | CDN edge cache | Edge-level | Static assets, API responses at the edge |

Caching Strategies

| Strategy | How | Use When | Trade-off |
| --- | --- | --- | --- |
| Lazy Loading (Cache-Aside) | App checks cache → miss → read DB → write to cache | Most cases, default strategy | First read is always a miss (cold start) |
| Write-Through | Write to cache AND DB on every write | Data must always be in cache | Higher write latency, cache may hold unused data |
| Write-Behind | Write to cache → async write to DB later | High write throughput | Risk of data loss if cache crashes |
| TTL | Set expiration time on cached items | Always — prevents stale data | Must balance freshness vs hit rate |
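The default strategy above, lazy loading with TTL, can be sketched in a few lines. This is an in-process stand-in (a dict plays the role of ElastiCache Redis, and the loader function plays the database) just to make the read path concrete:

```python
import time

class CacheAside:
    """Lazy loading (cache-aside) with per-item TTL expiry."""

    def __init__(self, loader, ttl_seconds):
        self.loader = loader      # function that reads from the database
        self.ttl = ttl_seconds
        self._store = {}          # key -> (value, expires_at)

    def get(self, key):
        hit = self._store.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]                  # fresh cache hit
        value = self.loader(key)           # miss or expired: hit the DB
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

db_reads = []
def load_from_db(key):
    db_reads.append(key)          # count trips to the "database"
    return f"row-for-{key}"

cache = CacheAside(load_from_db, ttl_seconds=60)
cache.get("user:1")    # miss -> one DB read
cache.get("user:1")    # hit  -> served from cache
print(len(db_reads))   # 1
```

Note the trade-off from the table playing out: the first `get` is always a miss, and the TTL bounds how stale a cached row can be.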

🎯 Key Takeaway

Interview tip: "I use lazy loading with TTL as my default caching strategy β€” it handles cache misses gracefully and TTL prevents stale data. For DynamoDB reads, DAX gives microsecond latency with zero code changes. Redis is my choice over Memcached because it supports pub/sub, sorted sets, and persistence."

Intermediate

Database Migration (DMS & SCT)

Database Migration Diagram
| Tool | Purpose | Migration Type |
| --- | --- | --- |
| DMS (Database Migration Service) | Migrate data between databases, including while the source stays live | Homogeneous (MySQL → MySQL) or heterogeneous (Oracle → Aurora) |
| SCT (Schema Conversion Tool) | Convert schema + stored procedures between engines | Heterogeneous only (e.g., Oracle → PostgreSQL) |

DMS Migration Modes

  • Full Load β€” One-time migration of all existing data
  • CDC (Change Data Capture) β€” Continuous replication of changes after full load
  • Full Load + CDC β€” Most common: migrate everything, then keep syncing until cutover

Migration Strategy

  1. Assessment: Run SCT to analyze schema compatibility and identify conversion issues
  2. Schema conversion: SCT converts schema, stored procedures, triggers
  3. Full load: DMS migrates all existing data to the target
  4. CDC replication: DMS continuously replicates changes while source is live
  5. Cutover: Stop writes to source β†’ verify target β†’ switch application connection string
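The cutover gate in step 5 can be sketched as a polling loop: do not switch the application until replication lag reaches zero. The lag readings below are synthetic; with real DMS you would poll the CDC latency metrics that DMS publishes to CloudWatch rather than a hard-coded list:

```python
def wait_for_zero_lag(lag_samples):
    """Return the number of polls until lag hits 0, or None if it never does.

    lag_samples stands in for successive replication-lag readings (seconds);
    a real implementation would fetch these from DMS task metrics.
    """
    for polls, lag_seconds in enumerate(lag_samples, start=1):
        if lag_seconds == 0:
            return polls   # safe to stop source writes and cut over
    return None            # lag never converged: do NOT cut over

lag_readings = [45, 12, 3, 0]       # synthetic: lag draining after full load
print(wait_for_zero_lag(lag_readings))  # 4
```

The `None` branch encodes the rollback discipline from the interview tip: if lag never reaches zero, the source keeps serving traffic and nothing is lost.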

🎯 Key Takeaway

Interview tip: "For database migration, I use DMS with Full Load + CDC for near-zero downtime. For heterogeneous (Oracle β†’ Aurora), I run SCT first to convert the schema, then DMS for data migration. During CDC, the source stays live β€” we cut over only when the replication lag is zero."

Advanced

Interview Questions β€” Databases

Database selection and design is a core architect competency. These questions test your ability to choose the right database and handle data at scale.

  1. "Why would you recommend Aurora over standard RDS PostgreSQL for a production workload?"
    Answer Guide: Aurora — 5x throughput over standard PostgreSQL, 6 copies across 3 AZs, auto-scaling storage to 128TB, up to 15 read replicas. The storage architecture (shared distributed volume) keeps replica lag minimal, typically under 10ms.
  2. "Design a DynamoDB table for customer orders, queryable by customer and by order ID."
    Answer Guide: PK = customer_id, SK = order_date#order_id (enables range queries). GSI1: PK = order_id (single lookup). Consider single-table design vs multiple tables. Discuss hot partition risk if one customer has millions of orders.
  3. "Your DynamoDB table is throttling because of a hot partition. How do you fix it?"
    Answer Guide: 1) Write sharding — add a random suffix to the partition key (e.g., customer_123#1, customer_123#2). 2) Composite keys — add a time-based prefix. 3) Caching — DAX in front of DynamoDB for read-heavy patterns. Also: re-evaluate the data model itself.
  4. "When would you use DAX versus ElastiCache?"
    Answer Guide: DAX is specifically for DynamoDB — microsecond read latency, API-compatible (drop-in). ElastiCache Redis is general-purpose — supports complex data structures (sorted sets, pub/sub, Lua scripting), works with any backend. Use DAX for DynamoDB-only caching, Redis for everything else.
  5. "How would you migrate an on-premises Oracle database to Aurora PostgreSQL with minimal downtime?"
    Answer Guide: 1) SCT to convert schema. 2) Fix incompatibilities manually (Oracle-specific PL/SQL, sequences, synonyms). 3) DMS Full Load to seed data. 4) DMS CDC for ongoing replication. 5) Validate data consistency. 6) Cut over when replication lag = 0. Test rollback plan before cutover.
  6. "Design the data layer for a real-time gaming leaderboard."
    Answer Guide: Write-behind (write to cache immediately, async persist to DB) for real-time updates. Redis Sorted Sets are perfect for leaderboards (ZADD, ZRANGE). Lazy loading risks stale data. Write-through adds latency on every write. Discuss TTL as a safety net.
  7. "Design a database layer for an application that is active in three regions."
    Answer Guide: DynamoDB Global Tables — active-active in all 3 regions with eventual consistency (typically sub-second replication). For SQL: Aurora Global Database gives low-latency reads in secondary regions, but writes go to one primary region. Trade-off: DynamoDB has eventual consistency, Aurora has a single writer.
  8. "Your Lambda functions are exhausting the database connections on RDS. What do you do?"
    Answer Guide: RDS Proxy — connection pooling and multiplexing. Lambda creates a new connection per invocation; RDS Proxy pools them (e.g., 1,000 Lambda connections → 100 DB connections). Also supports IAM authentication and automatic failover. Without RDS Proxy, you'll exhaust max_connections.
  9. "Your DynamoDB On-Demand bill is growing fast. How do you reduce costs?"
    Answer Guide: Switch to Provisioned capacity with auto-scaling (20-40% cheaper for predictable traffic). Add DAX for read-heavy patterns (reduces consumed RCUs dramatically). Enable DynamoDB Reserved Capacity for the baseline. Consider the DynamoDB Infrequent Access table class for cold data.
  10. "Explain DynamoDB's eventually consistent versus strongly consistent reads. When do you need strong consistency?"
    Answer Guide: Eventually consistent (default) — may return stale data (replication takes ~ms). Strongly consistent — always returns the latest write, but at 2x the cost, and reads only go to the leader node. Use strong consistency for: financial transactions, inventory counts, anything where reading stale data causes business errors.
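The write-sharding fix from the hot-partition answer can be sketched concretely. The shard count and key format below are illustrative choices, not AWS requirements; the point is the trade: writes spread across N physical keys, reads must fan out across all N and merge:

```python
import random

SHARDS = 10  # illustrative shard count; tune to the observed write rate

def sharded_pk(customer_id):
    """Writes pick a random shard, e.g. 'customer_123#7'."""
    return f"customer_{customer_id}#{random.randrange(SHARDS)}"

def all_shard_pks(customer_id):
    """Reads must query every shard and merge the results client-side."""
    return [f"customer_{customer_id}#{i}" for i in range(SHARDS)]

print(sharded_pk("123"))
print(len(all_shard_pks("123")))  # 10
```

This is a classic interview trade-off to name out loud: sharding multiplies write headroom by N but turns one Query into N Queries on the read path, so it suits write-hot, read-light keys.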

Preparation Strategy

Database questions test your decision framework: SQL vs NoSQL, consistency vs availability, cost vs performance. For every answer, explain the trade-off: "I chose Aurora over DynamoDB because [requirement] outweighs [trade-off]." Interviewers want to see you think through options, not jump to a favorite.