Cost Optimization Interview Guide

Pricing models, right-sizing, governance, and cost-efficient architectures: the topics that let you show business awareness in interviews.

Intermediate

AWS Pricing Models

Model	Discount	Commitment	Flexibility	Best For
On-Demand	0% (baseline)	None	Start/stop anytime	Short-term, unpredictable, testing
Savings Plans (Compute)	Up to 66%	1 or 3 year $/hour	Any instance family, size, OS, region	Steady compute (EC2, Lambda, Fargate)
Savings Plans (EC2 Instance)	Up to 72%	1 or 3 year $/hour	Specific instance family + region	Known instance family, highest savings
Reserved Instances	Up to 72%	1 or 3 year	Specific instance type, scoped to a Region or a single AZ (less flexible)	Databases (RDS RI), legacy
Spot Instances	Up to 90%	None (can be interrupted)	Any available capacity	Batch, CI/CD, ML training, stateless

Savings Plans vs Reserved Instances

Aspect	Savings Plans	Reserved Instances
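Flexibility	Compute SP: change instance family, size, OS, region. EC2 Instance SP: change size and OS within one family and region	Locked to a specific instance type (regional Linux RIs flex across sizes within a family)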
Applies To	EC2, Fargate, Lambda	EC2, RDS, ElastiCache, Redshift, OpenSearch
Recommendation	✅ Preferred for EC2 compute	✅ Preferred for RDS, ElastiCache

Spot Instance Strategies

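Diversify instance types — Request multiple instance families to reduce interruption risk (e.g., m5, m5a, m5n, m6i)
Use Spot Fleet or EC2 Fleet — Requests capacity across many pools automatically. The price-capacity-optimized allocation strategy balances low price against interruption risk; lowest-price alone tends to be interrupted more often.
Handle interruptions — Spot gives a 2-minute interruption notice. Checkpoint state to S3 and keep jobs in an SQS queue so another worker can pick them up.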
Best for: EKS node pools, EMR clusters, CI/CD builds, batch processing
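The interruption handling described above can be sketched in a few lines. This is a minimal example, assuming the worker runs on an EC2 Spot instance with IMDSv2 enabled; the function names `parse_instance_action` and `poll_once` are illustrative, not part of any AWS SDK. The metadata path returns HTTP 404 until an interruption is scheduled.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def parse_instance_action(status, body):
    """Interpret a response from the spot/instance-action metadata path.

    Returns None when no interruption is scheduled (HTTP 404), otherwise
    an (action, time) tuple taken from the JSON body.
    """
    if status == 404:
        return None
    notice = json.loads(body)
    return notice["action"], notice["time"]


def poll_once(timeout=1.0):
    """Query IMDSv2 once. Only works when run on an EC2 instance."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_instance_action(resp.status, resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption notice yet
            return None
        raise
```

A worker loop would call `poll_once()` every few seconds and, on a non-None result, stop pulling from SQS and checkpoint to S3 within the 2-minute window.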

🎯 Key Takeaway

Interview tip: "I use a mix: Compute Savings Plans for baseline steady-state (up to 66% savings), Spot for stateless/fault-tolerant workloads like EKS worker nodes (up to 90% savings), and On-Demand for spikes. For RDS/ElastiCache, I use Reserved Instances. This typically saves 40-60% vs all On-Demand."
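To see how a mix like this lands in the 40-60% range, here is a rough blended-savings sketch. The workload split and the per-model discounts are hypothetical values chosen for illustration (real discounts depend on term, payment option, and Spot market conditions), and `blended_savings` is an illustrative helper.

```python
def blended_savings(mix):
    """Return overall fractional savings versus running everything On-Demand.

    mix maps a label to (share, discount): share is the fraction of baseline
    On-Demand spend covered by that model, discount is its fractional saving.
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return sum(share * discount for share, discount in mix.values())


# Hypothetical split; discounts are illustrative, not quoted prices.
example_mix = {
    "compute_savings_plan": (0.60, 0.50),
    "spot": (0.25, 0.70),
    "on_demand": (0.15, 0.00),
}

print(f"Blended savings: {blended_savings(example_mix):.1%}")
```

With these assumptions the blend comes out a little under 50%, which is why the headline "up to" figures should never be quoted as the expected overall saving.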

Intermediate

Right-Sizing & Monitoring

Right-sizing is often the highest-impact cost optimization lever, because many instances are provisioned larger than their workload needs:

Tool	What It Does	Actionable Output
AWS Cost Explorer	Visualize spend over time, forecast	RI/SP purchase recommendations
AWS Compute Optimizer	ML-based right-sizing recommendations	"Downsize m5.xlarge → m5.large" with impact analysis
AWS Trusted Advisor	Best practice checks across categories such as cost optimization, security, performance, and fault tolerance	Idle instances, underutilized EBS, unused EIPs
AWS Budgets	Set cost/usage budgets with alerts	Email/SNS alert at 80% and 100% of budget
Cost Anomaly Detection	ML-based detection of unexpected spend	"Your EC2 spend increased 300% — investigate"

Top 5 Quick Wins

Delete unused resources — Unattached EBS volumes, unused EIPs, idle load balancers
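Right-size EC2 — If average CPU stays below 20% and memory and network also have headroom, downsize. Compute Optimizer generates these recommendations.
Use gp3 over gp2 — About 20% cheaper per GB, with a 3,000 IOPS / 125 MB/s baseline that does not depend on volume size. Check throughput needs on large gp2 volumes before migrating.
S3 lifecycle policies — Transition old data to Glacier. Much S3 data is rarely read after its first 30 days.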
NAT Gateway optimization — Use S3/DynamoDB Gateway Endpoints (free) to avoid NAT data charges.
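The first and third quick wins are easy to script. The sketch below works on records shaped like the EC2 DescribeVolumes response (`VolumeId`, `State`, `VolumeType`, `Size`); the volume IDs are made up, and the per-GB prices are assumed us-east-1 list prices that should be checked against current pricing.

```python
GP2_PER_GB_MONTH = 0.10  # assumed us-east-1 list price
GP3_PER_GB_MONTH = 0.08  # assumed us-east-1 list price


def find_ebs_quick_wins(volumes):
    """Flag unattached volumes and estimate savings from gp2 -> gp3."""
    unattached = [v["VolumeId"] for v in volumes if v["State"] == "available"]
    gp2_in_use = [
        v for v in volumes
        if v["VolumeType"] == "gp2" and v["State"] == "in-use"
    ]
    gp2_gb = sum(v["Size"] for v in gp2_in_use)
    return {
        "unattached": unattached,
        "gp2_candidates": [v["VolumeId"] for v in gp2_in_use],
        "gp3_monthly_saving": round(
            gp2_gb * (GP2_PER_GB_MONTH - GP3_PER_GB_MONTH), 2
        ),
    }


# Hypothetical inventory for illustration.
sample_volumes = [
    {"VolumeId": "vol-a", "State": "in-use", "VolumeType": "gp2", "Size": 500},
    {"VolumeId": "vol-b", "State": "available", "VolumeType": "gp2", "Size": 100},
    {"VolumeId": "vol-c", "State": "in-use", "VolumeType": "gp3", "Size": 200},
    {"VolumeId": "vol-d", "State": "in-use", "VolumeType": "gp2", "Size": 1000},
]

report = find_ebs_quick_wins(sample_volumes)
print(report)
```

In practice the input would come from an inventory export or an SDK call; the estimate covers storage price only and ignores any extra IOPS or throughput you might provision on gp3.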

🎯 Key Takeaway

Interview tip: "My cost optimization process: 1) Enable Cost Explorer and set Budgets first, 2) Run Compute Optimizer for right-sizing, 3) Buy Savings Plans for steady-state, 4) Add Spot for fault-tolerant workloads, 5) Lifecycle policies on all S3 buckets. This typically cuts bills by 30-50%."

Advanced

Cost Governance & Allocation

Enterprise-scale cost management. Covering it in interviews demonstrates business awareness:

Cost Allocation Strategy

Tagging — Tag every resource with Team, Environment, Project, CostCenter. Enforce with SCPs (deny creation without the required tags) and AWS Config rules (flag existing untagged resources).
Separate Accounts — One AWS account per environment (dev, staging, prod) per team. Consolidated billing via Organizations.
AWS Budgets — Per-account and per-tag budgets. Alert at 80% threshold. Budget actions can stop EC2 or RDS instances, or apply an IAM policy or SCP.
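As one possible shape for the tagging guardrail, the sketch below builds an SCP that denies `ec2:RunInstances` when a required tag is missing from the request. It follows the common `Null` condition pattern on `aws:RequestTag/<key>`; the helper names are illustrative, and a real policy would usually cover more resource types and exempt specific roles.

```python
import json


def require_tag_statement(tag_key):
    """Deny launching an EC2 instance when tag_key is absent from the request."""
    return {
        "Sid": f"DenyRunInstancesWithout{tag_key}",
        "Effect": "Deny",
        "Action": "ec2:RunInstances",
        "Resource": "arn:aws:ec2:*:*:instance/*",
        "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
    }


def build_tagging_scp(required_tags):
    """Assemble one deny statement per required tag key."""
    return {
        "Version": "2012-10-17",
        "Statement": [require_tag_statement(tag) for tag in required_tags],
    }


scp = build_tagging_scp(["Team", "Environment", "Project", "CostCenter"])
print(json.dumps(scp, indent=2))
```

An SCP only blocks new launches, which is why the Config rule is still needed to surface resources that already exist without tags.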

FinOps Framework

Phase	Activities	AWS Tools
Inform	Visibility, allocation, reporting	Cost Explorer, Cost & Usage Report, QuickSight
Optimize	Right-size, commitments, architectural changes	Compute Optimizer, Savings Plans, Spot
Operate	Continuous monitoring, governance, culture	Budgets, Anomaly Detection, SCPs, Config rules

🎯 Key Takeaway

Interview tip: "I implement FinOps across three phases: Inform (visibility via CUR + Cost Explorer), Optimize (Savings Plans, right-sizing, Spot), Operate (Budgets with auto-actions, tagging enforcement via SCPs). Cost tags are mandatory — I enforce them with AWS Config rules that flag untagged resources."

Advanced

Cost-Efficient Architecture Patterns

Pattern	Expensive Way	Cost-Optimized Way	Savings
API Backend	ALB + EC2 (always running)	API Gateway + Lambda (pay per request)	80%+ for low traffic
Database	RDS Multi-AZ provisioned	Aurora Serverless v2 (scales with load; can pause to 0 ACUs when idle on supported versions)	60%+ for variable workloads
Batch Processing	EC2 On-Demand fleet	Spot Fleet + SQS job queue	Up to 90%
Static Hosting	EC2 + Nginx	S3 + CloudFront	90%+ (no compute)
Container Workloads	EKS with On-Demand nodes	EKS with Karpenter + Spot + Graviton	60-80%
Data Lake Queries	Redshift always-on cluster	S3 + Athena (pay per query, $5/TB scanned)	80%+ for ad-hoc

The Hidden Costs (Interview Differentiator)

NAT Gateway — $0.045/GB data processing, on top of the hourly charge. Use gateway VPC endpoints for S3/DynamoDB to avoid it.
Data Transfer — Inter-AZ: $0.01/GB in each direction. Inter-Region: from about $0.02/GB, depending on the region pair. Use VPC endpoints + same-AZ where possible.
CloudWatch Logs — $0.50/GB ingestion. Use log levels wisely. Set retention policies.
Elastic IP — Since Feb 2024, ALL public IPv4 addresses cost $0.005/hr whether attached or not. Release unused EIPs and audit all public IPs.
EBS Snapshots — $0.05/GB-month. Old snapshots accumulate silently. Use lifecycle policies.
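To get a feel for how these line items add up, here is a small estimator using the rates quoted above. The usage volumes in the example are hypothetical, the rates are treated as us-east-1 list prices, and inter-Region transfer is left out because it varies by region pair.

```python
HOURS_PER_MONTH = 730

# Rates as quoted in the list above (assumed us-east-1 list prices).
RATES = {
    "nat_processing_per_gb": 0.045,
    "inter_az_per_gb": 0.01,
    "logs_ingest_per_gb": 0.50,
    "public_ipv4_per_hour": 0.005,
    "snapshot_per_gb_month": 0.05,
}


def hidden_costs(nat_gb, inter_az_gb, log_gb, public_ips, snapshot_gb):
    """Return a per-item monthly estimate in dollars."""
    return {
        "nat_gateway": nat_gb * RATES["nat_processing_per_gb"],
        "inter_az": inter_az_gb * RATES["inter_az_per_gb"],
        "cloudwatch_logs": log_gb * RATES["logs_ingest_per_gb"],
        "public_ipv4": public_ips * RATES["public_ipv4_per_hour"] * HOURS_PER_MONTH,
        "ebs_snapshots": snapshot_gb * RATES["snapshot_per_gb_month"],
    }


# Hypothetical monthly usage for a mid-sized environment.
estimate = hidden_costs(
    nat_gb=10_000, inter_az_gb=20_000, log_gb=1_000,
    public_ips=50, snapshot_gb=5_000,
)
total = sum(estimate.values())
for item, cost in estimate.items():
    print(f"{item:<16}${cost:>9,.2f}")
print(f"{'total':<16}${total:>9,.2f}")
```

Note that `inter_az_gb` is billed per direction, so a chatty request/response pattern across AZs should count traffic both ways.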

🎯 Key Takeaway

Interview tip: "I architect for cost from day one: serverless for variable workloads (pay-per-use), Spot for fault-tolerant compute, Graviton for roughly 20% savings. The hidden costs most teams miss are NAT Gateway data processing, inter-AZ transfer, and CloudWatch log ingestion. I set up VPC endpoints for S3/DynamoDB and log retention policies as standard practice."

Advanced

Interview Questions — Cost Optimization

Cost optimization shows business acumen — a trait that distinguishes senior architects. These questions test your ability to think about money, not just technology.

Your company's AWS bill jumped from $100K to $180K last month. Walk me through your investigation process to find and fix the cost increase.

Answer Guide

Start in Cost Explorer: group by service, filter by tag, and compare month-over-month to see which services account for the $80K increase. Check: new services launched, data transfer spikes, NAT Gateway processing, DynamoDB on-demand scaling, unattached EBS volumes/EIPs. Then set up Cost Anomaly Detection for proactive alerts, and tag everything for attribution.
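The month-over-month comparison is simple enough to sketch. The per-service figures below are invented to match the $100K to $180K scenario, and the service labels are illustrative rather than Cost Explorer's exact names; in practice the two dictionaries would come from a Cost Explorer export grouped by service.

```python
def rank_cost_increases(previous, current, top_n=5):
    """Rank services by month-over-month increase, largest first."""
    services = set(previous) | set(current)
    deltas = {s: current.get(s, 0) - previous.get(s, 0) for s in services}
    # Sort by delta descending, then by name so ties are deterministic.
    ranked = sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical spend per service, in dollars.
last_month = {
    "EC2": 55_000, "RDS": 20_000, "S3": 10_000,
    "NAT Gateway": 5_000, "DynamoDB": 10_000,
}
this_month = {
    "EC2": 60_000, "RDS": 21_000, "S3": 11_000,
    "NAT Gateway": 48_000, "DynamoDB": 35_000, "CloudWatch": 5_000,
}

ranked = rank_cost_increases(last_month, this_month, top_n=6)
for service, delta in ranked:
    print(f"{service:<12}+${delta:,}")
```

In this made-up data two services explain most of the jump, which is the usual pattern: a bill spike is rarely spread evenly, so ranking deltas tells you where to dig first.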
A team runs 200 m5.xlarge instances 24/7. They're all on-demand. Design a purchasing strategy to reduce costs by 50% or more.

Answer Guide

Analyze steady-state vs variable usage. Base load → Compute Savings Plans (a 1-year term typically saves roughly 30%; 3-year terms or EC2 Instance Savings Plans save more). Variable → Spot for fault-tolerant workloads. Consider Graviton (m7g.xlarge) for roughly 15-20% lower instance cost if the software runs on Arm. Dev/test → scheduled scaling (stopping nights and weekends cuts running hours by about 65%). Target: 50-60% total savings.
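A quick way to sanity-check a strategy like this is to price it out. The sketch below uses the $0.192/hr m5.xlarge rate that appears elsewhere in this guide; the split of the 200 instances and the discount for each bucket are assumptions made for illustration, not quoted AWS prices.

```python
HOURS_PER_MONTH = 730
M5_XLARGE_ON_DEMAND = 0.192  # $/hr, the rate used elsewhere in this guide


def monthly_cost(instances, discount=0.0, running_fraction=1.0):
    """Monthly cost for a group of m5.xlarge instances."""
    hourly = M5_XLARGE_ON_DEMAND * (1 - discount)
    return instances * hourly * HOURS_PER_MONTH * running_fraction


baseline = monthly_cost(200)  # everything On-Demand, 24/7

# Hypothetical split of the fleet; discounts are assumptions.
plan = {
    "savings_plan": monthly_cost(120, discount=0.30),
    "spot": monthly_cost(50, discount=0.70),
    # Dev/test runs 12 hours x 5 days = 60 of 168 hours per week.
    "dev_test_scheduled": monthly_cost(30, running_fraction=60 / 168),
}

optimized = sum(plan.values())
savings = 1 - optimized / baseline
print(f"Baseline:  ${baseline:,.0f}/month")
print(f"Optimized: ${optimized:,.0f}/month")
print(f"Savings:   {savings:.1%}")
```

Under these assumptions the plan lands in the mid-40s percent, so clearing 50% would take a longer commitment term, a larger Spot share, or a Graviton migration on top. Saying that out loud is a stronger answer than asserting the target.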
Explain the difference between Compute Savings Plans, EC2 Instance Savings Plans, and Reserved Instances. When would you choose each?

Answer Guide

Compute SP — most flexible (any instance family, size, OS, region, including Fargate/Lambda), up to 66% savings. EC2 Instance SP — locked to instance family + region, up to 72% savings. RI — most restrictive (specific instance type, scoped to a region or an AZ), similar savings to EC2 Instance SP, but the right choice for RDS/ElastiCache where SPs don't apply.
Your data transfer bill is $25,000/month. Where is the money going and what architectural changes would you make?

Answer Guide

Common culprits: NAT Gateway processing ($0.045/GB), inter-AZ traffic ($0.01/GB in each direction), inter-region replication, internet egress from EC2, load balancers, or S3. Fixes: S3/DynamoDB gateway endpoints (free), VPC endpoints for other services, AZ-aware routing, CloudFront caching (reduce origin fetches), compress data before transfer.
Design a FinOps practice for an organization with 50 AWS accounts and 10 engineering teams. How do you implement cost accountability?

Answer Guide

AWS Organizations with consolidated billing. Mandatory tagging policy (team, project, environment) enforced by SCPs. RI and Savings Plans discounts shared across accounts. Per-team cost dashboards using CUR + Athena/QuickSight. Monthly cost review meetings. Budgets with alerts per team. Chargeback or showback model so each team sees, or pays for, its own spend.
A startup is choosing between serverless (Lambda + DynamoDB) and container-based (EKS + RDS) for their new application. Expected traffic: 10,000 requests/day initially, growing to 1M/day in 12 months. Which is more cost-effective?

Answer Guide

At 10K/day: serverless wins dramatically (pay-per-use, near-zero idle cost). At 1M/day (about 12 requests/second on average): serverless is usually still cheaper for short requests, though the answer depends on request duration and memory. The crossover with containers typically comes only at sustained high load, on the order of millions of requests per hour rather than per day. Start serverless, and re-evaluate when the bill exceeds the cost of the container alternative.
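The comparison is easier to defend with numbers. This sketch estimates the Lambda portion only, assuming 256 MB functions that run for 200 ms, 30-day months, and us-east-1 list prices for x86 Lambda ($0.20 per million requests, $0.0000166667 per GB-second); the free tier, API Gateway, and DynamoDB are ignored. The EKS figure is the control-plane fee alone at an assumed $0.10/hr, before any worker nodes or RDS.

```python
REQUEST_PRICE_PER_MILLION = 0.20   # assumed us-east-1 list price
GB_SECOND_PRICE = 0.0000166667     # assumed us-east-1 list price (x86)
EKS_CONTROL_PLANE_MONTHLY = 0.10 * 730  # assumed $0.10/hr, no nodes included


def lambda_monthly_cost(requests_per_day, memory_gb=0.25, duration_s=0.2, days=30):
    """Estimate monthly Lambda cost (requests + compute), ignoring free tier."""
    requests = requests_per_day * days
    request_cost = requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    compute_cost = requests * duration_s * memory_gb * GB_SECOND_PRICE
    return request_cost + compute_cost


for per_day in (10_000, 1_000_000):
    cost = lambda_monthly_cost(per_day)
    print(f"{per_day:>9,} req/day -> ${cost:,.2f}/month on Lambda")
print(f"EKS control plane alone: ${EKS_CONTROL_PLANE_MONTHLY:,.2f}/month")
```

With these assumptions, Lambda at 1M requests/day still costs less than the EKS control plane by itself, which supports the "start serverless" recommendation. Longer or more memory-hungry requests shift the crossover, so rerun the numbers with your own duration and memory.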
Your team uses DynamoDB on-demand capacity. The monthly bill is $20,000. Traffic is predictable (peaks at 10AM-6PM, low overnight). How do you optimize?

Answer Guide

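Switch to provisioned capacity with auto-scaling (set min/max based on traffic patterns). Provisioned is often cited as 20-40% cheaper for predictable workloads, though the gap depends on utilization and current on-demand rates. Use the DynamoDB Standard-Infrequent Access table class for cold data. Consider reserved capacity for the provisioned baseline. For read-heavy tables, consider DAX caching to reduce consumed RCUs, keeping in mind that DAX nodes have their own cost.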
A CTO asks: "We spend $500K/year on AWS. Should we negotiate an Enterprise Discount Program (EDP)?" What factors would you consider?

Answer Guide

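EDP requires a spend commitment (typically $500K-$1M+ per year, usually over multiple years). Benefits: a blanket discount across services (typically 5-15%) and sometimes credits; a dedicated TAM comes with Enterprise Support rather than from the EDP itself. Considerations: growth trajectory (don't overcommit), existing SPs/RIs (EDP stacks with them), flexibility to reduce spend, negotiation leverage based on workload migration plans. At $500K/year the company sits at the low end of the typical threshold, so expect limited leverage.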
Estimate the monthly cost of running a three-tier web application: ALB, 4 EC2 instances (m5.xlarge), Aurora PostgreSQL (db.r5.xlarge with 1 read replica), and 500GB S3. Show your math.

Answer Guide

ALB: ~$16/month fixed ($0.0225/hr × 730 hrs) + LCU charges, so budget ~$25. EC2: 4 × m5.xlarge on-demand × $0.192/hr × 730 hrs = ~$560. Aurora: 2 × db.r5.xlarge × ~$0.48/hr × 730 hrs = ~$700 + I/O + storage. S3: 500 GB × $0.023/GB-month = ~$11.50 + requests. Approximate total: ~$1,300-1,500/month. Interviewers want to see that you can do napkin math, not recite exact numbers.
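The same napkin math, written out so each line item is checkable. The hourly rates are the approximate figures from the answer above, not authoritative prices, and the ALB is taken as a flat ~$25 estimate.

```python
HOURS_PER_MONTH = 730

# Rates are the approximate figures used in the answer above.
line_items = {
    "alb": 25.0,                          # flat estimate, fixed fee + some LCUs
    "ec2": 4 * 0.192 * HOURS_PER_MONTH,   # 4 x m5.xlarge On-Demand
    "aurora": 2 * 0.48 * HOURS_PER_MONTH, # writer + 1 read replica
    "s3": 500 * 0.023,                    # 500 GB standard storage
}

total = sum(line_items.values())
for name, cost in line_items.items():
    print(f"{name:<8}${cost:>9,.2f}")
print(f"{'total':<8}${total:>9,.2f}")
```

The itemized sum comes out just under $1,300. Aurora I/O and storage, extra LCUs, S3 requests, and data transfer are what push the realistic range toward $1,500.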
Your Kubernetes cluster has 30% average CPU utilization across nodes, meaning 70% of compute capacity is wasted. How would you improve utilization without impacting reliability?

Answer Guide

Karpenter for right-sized node provisioning (picks optimal instance type per pod spec). Pod resource requests/limits tuning with VPA recommendations. Consolidation policy (Karpenter bin-packs pods onto fewer nodes). Spot instances for non-critical workloads. Use Kubecost for per-namespace cost visibility. Target: 60-70% utilization.
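A rough way to quantify the opportunity: if the same work were packed at the target utilization, how many nodes would remain? The sketch assumes identical nodes and a CPU-bound workload, which real clusters rarely match exactly, and the 100-node cluster size is hypothetical.

```python
import math


def nodes_needed(current_nodes, current_util, target_util):
    """Nodes required to carry the same CPU load at a higher utilization."""
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    used_capacity = current_nodes * current_util  # in node-equivalents
    return math.ceil(used_capacity / target_util)


current = 100  # hypothetical cluster size
for target in (0.65, 0.70):
    needed = nodes_needed(current, 0.30, target)
    print(f"target {target:.0%}: {needed} nodes ({1 - needed / current:.0%} fewer)")
```

Even the lower end of the 60-70% target roughly halves the node count here, which is why consolidation and request tuning tend to pay off before any pricing-model change does.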

Preparation Strategy

Cost questions test business awareness. Practice doing quick mental math: hourly EC2 pricing × 730 hours/month, data transfer volumes, and break-even calculations between pricing models. The ability to estimate costs on a whiteboard is a powerful differentiator in architect interviews.