Interview tip: "I use a mix: Compute Savings Plans for
baseline steady-state (up to 66% savings), Spot for stateless/fault-tolerant
workloads like EKS worker nodes (up to 90% savings), and On-Demand for spikes. For
RDS/ElastiCache, I use Reserved Instances. This typically saves 40-60% vs
all On-Demand."
Intermediate
Right-Sizing & Monitoring
Right-sizing is the #1 cost optimization lever; most instances are oversized:
Tool | What It Does | Actionable Output
AWS Cost Explorer | Visualize spend over time, forecast | RI/SP purchase recommendations
AWS Compute Optimizer | ML-based right-sizing recommendations | "Downsize m5.xlarge → m5.large" with impact analysis
Right-size EC2: If average CPU is below 20%, downsize. Compute
Optimizer automates the analysis.
Use gp3 over gp2: Same baseline performance, about 20% cheaper per GB.
Immediate savings.
S3 lifecycle policies: Transition old data to Glacier. Most S3
data is rarely accessed after its first 30 days.
NAT Gateway optimization: Use S3/DynamoDB Gateway Endpoints
(free) to avoid NAT data charges.
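The "average CPU below 20%" rule of thumb can be sketched in a few lines. This is a minimal illustration, not how Compute Optimizer works internally: the instance IDs and CPU samples are hypothetical, and a real check would also look at peak CPU, memory, and network before downsizing.

```python
from statistics import mean

CPU_DOWNSIZE_THRESHOLD = 20.0  # percent, the rule of thumb from the list above


def should_downsize(cpu_samples, threshold=CPU_DOWNSIZE_THRESHOLD):
    """Return True when the average CPU utilization is below the threshold."""
    if not cpu_samples:
        raise ValueError("need at least one CPU sample")
    return mean(cpu_samples) < threshold


# Hypothetical daily average CPU (percent) over two weeks for two instances.
fleet = {
    "i-web-01": [12, 9, 15, 11, 14, 10, 8, 13, 12, 9, 16, 11, 10, 12],
    "i-batch-01": [55, 62, 48, 71, 66, 59, 60, 52, 68, 64, 57, 61, 63, 58],
}

# Collect the instances whose average sits under the threshold.
candidates = [name for name, samples in fleet.items() if should_downsize(samples)]
print(candidates)
```

Only the lightly loaded web instance is flagged here; the batch instance averages well above the threshold and is left alone.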
🎯 Key Takeaway
Interview tip: "My cost optimization process: 1) Enable Cost
Explorer and set Budgets first, 2) Run Compute Optimizer for right-sizing,
3) Buy Savings Plans for steady-state, 4) Add Spot for fault-tolerant
workloads, 5) Lifecycle policies on all S3 buckets. This typically cuts
bills by 30-50%."
Advanced
Cost Governance & Allocation
Enterprise cost management. Showing this in interviews demonstrates business
awareness:
Cost Allocation Strategy
Tagging: Tag every resource with Team, Environment, Project,
CostCenter. Enforce via SCP + AWS Config rules.
Separate Accounts: One AWS account per environment (dev,
staging, prod) per team. Consolidated billing via Organizations.
AWS Budgets: Per-account and per-tag budgets. Alert at 80%
threshold. Auto-actions: stop EC2, apply SCP.
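The tagging rule above amounts to a simple compliance check. The sketch below shows the logic only; it does not call AWS Config or any AWS API, and the resource IDs and tag values are hypothetical.

```python
REQUIRED_TAGS = {"Team", "Environment", "Project", "CostCenter"}


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the sorted required tag keys that are absent or blank."""
    return sorted(
        key for key in required
        if not str(resource_tags.get(key, "")).strip()
    )


# Hypothetical inventory: resource ID -> tags.
resources = {
    "i-0abc": {
        "Team": "payments",
        "Environment": "prod",
        "Project": "checkout",
        "CostCenter": "1042",
    },
    "vol-0def": {"Team": "payments", "Environment": ""},
}

for resource_id, tags in resources.items():
    gaps = missing_tags(tags)
    print(resource_id, "OK" if not gaps else f"missing: {gaps}")
```

Treating a blank value as missing is a design choice here: an empty `Environment` tag cannot be used for cost attribution any more than an absent one.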
FinOps Framework
Phase | Activities | AWS Tools
Inform | Visibility, allocation, reporting | Cost Explorer, Cost & Usage Report, QuickSight
Optimize | Right-size, commitments, architectural changes | Compute Optimizer, Savings Plans, Spot
Operate | Continuous monitoring, governance, culture | Budgets, Anomaly Detection, SCPs, Config rules
🎯 Key Takeaway
Interview tip: "I implement FinOps across three phases:
Inform (visibility via CUR + Cost Explorer), Optimize (Savings Plans,
right-sizing, Spot), Operate (Budgets with auto-actions, tagging enforcement
via SCPs). Cost tags are mandatory; I enforce them with AWS Config rules
that flag untagged resources."
Advanced
Cost-Efficient Architecture Patterns
Pattern | Expensive Way | Cost-Optimized Way | Savings
API Backend | ALB + EC2 (always running) | API Gateway + Lambda (pay per request) | 80%+ for low traffic
Database | RDS Multi-AZ provisioned | Aurora Serverless v2 (scales to zero ACUs) | 60%+ for variable workloads
Batch Processing | EC2 On-Demand fleet | Spot Fleet + SQS job queue | Up to 90%
Static Hosting | EC2 + Nginx | S3 + CloudFront | 90%+ (no compute)
Container Workloads | EKS with On-Demand nodes | EKS with Karpenter + Spot + Graviton | 60-80%
Data Lake Queries | Redshift always-on cluster | S3 + Athena (pay per query, $5/TB scanned) | 80%+ for ad-hoc
The Hidden Costs (Interview Differentiator)
NAT Gateway: $0.045/GB data processing. Use VPC endpoints for
S3/DynamoDB to avoid it.
Data Transfer: Inter-AZ: $0.01/GB in each direction. Inter-Region: $0.02/GB. Use
VPC endpoints + same-AZ where possible.
CloudWatch Logs: $0.50/GB ingestion. Use log levels wisely.
Set retention policies.
Elastic IP: Since Feb 2024, ALL public IPv4 addresses cost $0.005/hr whether attached or not. Release unused EIPs and audit all public IPs.
EBS Snapshots: $0.05/GB-month. Old snapshots accumulate silently.
Use lifecycle policies.
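To see how these line items add up, a quick estimator helps. The sketch below uses the per-unit rates quoted in the list above (which vary by Region, so verify them) and a purely hypothetical monthly usage profile.

```python
HOURS_PER_MONTH = 730

# Per-unit rates in USD as quoted in the list above; verify for your Region.
RATES = {
    "nat_gb": 0.045,
    "inter_az_gb": 0.01,
    "inter_region_gb": 0.02,
    "log_ingest_gb": 0.50,
    "public_ipv4_hour": 0.005,
    "snapshot_gb": 0.05,
}


def estimate_hidden_costs(nat_gb, inter_az_gb, inter_region_gb,
                          log_ingest_gb, public_ipv4_count, snapshot_gb):
    """Return estimated monthly charges in USD, keyed by line item."""
    return {
        "NAT Gateway processing": nat_gb * RATES["nat_gb"],
        "Inter-AZ transfer": inter_az_gb * RATES["inter_az_gb"],
        "Inter-Region transfer": inter_region_gb * RATES["inter_region_gb"],
        "CloudWatch Logs ingestion": log_ingest_gb * RATES["log_ingest_gb"],
        "Public IPv4 addresses": (
            public_ipv4_count * RATES["public_ipv4_hour"] * HOURS_PER_MONTH
        ),
        "EBS snapshots": snapshot_gb * RATES["snapshot_gb"],
    }


# Hypothetical monthly usage for a mid-sized environment.
costs = estimate_hidden_costs(
    nat_gb=10_000, inter_az_gb=20_000, inter_region_gb=5_000,
    log_ingest_gb=2_000, public_ipv4_count=50, snapshot_gb=8_000,
)
# Print the largest line items first, then the total.
for item, usd in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:28s} ${usd:>9,.2f}")
print(f"{'Total':28s} ${sum(costs.values()):>9,.2f}")
```

With this made-up profile, log ingestion is the largest item, which is why log levels and retention policies are worth setting early.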
🎯 Key Takeaway
Interview tip: "I architect for cost from day one:
serverless for variable workloads (pay-per-use), Spot for fault-tolerant
compute, Graviton for roughly 20% savings. The hidden costs most teams miss are NAT
Gateway data processing, inter-AZ transfer, and CloudWatch log ingestion. I
set up VPC endpoints for S3/DynamoDB and log retention policies as standard
practice."
Advanced
Interview Questions: Cost Optimization
Cost optimization shows business acumen, a trait that distinguishes senior architects. These questions test your ability to think about money, not just technology.
Your company's AWS bill jumped from $100K to $180K last month. Walk me through your investigation process to find and fix the cost increase.
Answer Guide
Cost Explorer (group by service, filter by tag, compare month-over-month). Check: new services launched, data transfer spikes, NAT Gateway processing, DynamoDB on-demand scaling, unattached EBS volumes/EIPs. Set up Cost Anomaly Detection for proactive alerts. Tag everything for attribution.
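The month-over-month comparison is just a ranked diff by service. The sketch below works on two plain dictionaries such as you might export from Cost Explorer; the service totals are invented to match the $100K to $180K scenario and are not real billing data.

```python
def biggest_increases(previous, current, top_n=3):
    """Rank services by absolute month-over-month cost increase (USD)."""
    services = set(previous) | set(current)
    deltas = {s: current.get(s, 0) - previous.get(s, 0) for s in services}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Hypothetical per-service totals for the two months.
previous = {"EC2": 55_000, "RDS": 20_000, "NAT Gateway": 5_000,
            "S3": 8_000, "DynamoDB": 7_000, "Other": 5_000}
current = {"EC2": 60_000, "RDS": 21_000, "NAT Gateway": 40_000,
           "S3": 9_000, "DynamoDB": 41_000, "Other": 9_000}

for service, delta in biggest_increases(previous, current):
    print(f"{service:12s} +${delta:,}")
```

In this invented example, NAT Gateway and DynamoDB account for most of the $80K jump, so those are the first two places to dig.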
A team runs 200 m5.xlarge instances 24/7. They're all on-demand. Design a purchasing strategy to reduce costs by 50% or more.
Answer Guide
Analyze steady-state vs variable usage. Base load → Compute Savings Plans (1yr partial upfront, ~40% savings). Variable → Spot for fault-tolerant workloads. Consider Graviton (m7g.xlarge) for additional 20% savings. Dev/test → scheduled scaling (stop nights/weekends = ~65% time reduction). Target: 50-60% total savings.
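A quick check that such a plan can reach the 50% target is sketched below. The split of the 200 instances (120 baseline, 50 Spot, 30 dev/test), the 70% Spot discount, and the 12-hours-a-day, 5-days-a-week dev schedule are all assumptions for illustration. The $0.192/hr rate is the m5.xlarge On-Demand figure used elsewhere in this guide.

```python
HOURS_PER_MONTH = 730
HOURS_PER_WEEK = 168
M5_XLARGE_HOURLY = 0.192  # USD On-Demand rate used elsewhere in this guide


def monthly_cost(instances, discount=0.0, running_fraction=1.0):
    """Monthly USD cost for a group of instances.

    discount: fractional price reduction vs. On-Demand.
    running_fraction: share of the month the instances are running.
    """
    return (instances * M5_XLARGE_HOURLY * HOURS_PER_MONTH
            * (1.0 - discount) * running_fraction)


# Assumed dev/test schedule: 12 hours a day, 5 days a week (about 36% of the week).
dev_running_fraction = (12 * 5) / HOURS_PER_WEEK

baseline = monthly_cost(200)
optimized = (
    monthly_cost(120, discount=0.40)    # Savings Plans, ~40% per the answer above
    + monthly_cost(50, discount=0.70)   # Spot, assumed average discount
    + monthly_cost(30, running_fraction=dev_running_fraction)  # scheduled dev/test
)
savings = 1.0 - optimized / baseline
print(f"baseline ${baseline:,.0f}/mo, optimized ${optimized:,.0f}/mo, "
      f"savings {savings:.0%}")
```

Under these assumptions the blend lands at roughly 51%, just inside the target range; a larger Spot share or a Graviton migration would push it higher.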
Explain the difference between Compute Savings Plans, EC2 Instance Savings Plans, and Reserved Instances. When would you choose each?
Answer Guide
Compute SP: most flexible (any instance family, size, OS, region, including Fargate/Lambda), up to 66% savings. EC2 Instance SP: locked to instance family + region, up to 72% savings. RI: most restrictive (specific instance type, scoped to a region or AZ), similar savings to EC2 SP, but better for RDS/ElastiCache where SPs don't apply.
Your data transfer bill is $25,000/month. Where is the money going and what architectural changes would you make?
Answer Guide
Common culprits: NAT Gateway processing ($0.045/GB), inter-AZ traffic ($0.01/GB), inter-region replication, CloudFront to origin. Fixes: S3/DynamoDB gateway endpoints (free), VPC endpoints for other services, AZ-aware routing, CloudFront caching (reduce origin fetches), compress data before transfer.
Design a FinOps practice for an organization with 50 AWS accounts and 10 engineering teams. How do you implement cost accountability?
Answer Guide
AWS Organizations with consolidated billing. Mandatory tagging policy (team, project, environment) enforced by SCPs. Shared Reserved Capacity across accounts. Per-team cost dashboards using CUR + Athena/QuickSight. Monthly cost review meetings. Set budgets with alerts per team. Chargeback or showback model.
A startup is choosing between serverless (Lambda + DynamoDB) and container-based (EKS + RDS) for their new application. Expected traffic: 10,000 requests/day initially, growing to 1M/day in 12 months. Which is more cost-effective?
Answer Guide
At 10K/day: serverless wins dramatically (pay-per-use, near-zero idle cost). At 1M/day: serverless is usually still cheaper, though the margin depends on request duration and compute needs. Lambda typically breaks even with containers only at a sustained load of around 1-2M requests per hour (not per day). Start serverless, and re-evaluate when costs exceed the container alternative.
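The break-even claim can be checked with napkin math. The sketch below covers Lambda compute only and ignores API Gateway, DynamoDB, RDS, and the free tier. The Lambda rates are the commonly quoted x86 list prices ($0.20 per million requests, $0.0000166667 per GB-second) and should be verified for your Region. The 100 ms duration, 512 MB memory, and the $1,000/month container platform cost are hypothetical.

```python
LAMBDA_PER_MILLION_REQUESTS = 0.20   # USD, verify for your Region
LAMBDA_PER_GB_SECOND = 0.0000166667  # USD, x86, verify for your Region
DAYS_PER_MONTH = 30
CONTAINER_MONTHLY_USD = 1000.0       # hypothetical EKS control plane + nodes


def lambda_cost_per_request(avg_duration_ms, memory_gb):
    """USD per invocation: request fee plus duration-based compute fee."""
    request_fee = LAMBDA_PER_MILLION_REQUESTS / 1_000_000
    compute_fee = (avg_duration_ms / 1000) * memory_gb * LAMBDA_PER_GB_SECOND
    return request_fee + compute_fee


def lambda_monthly_cost(requests_per_day, avg_duration_ms=100, memory_gb=0.5):
    """Monthly Lambda compute cost in USD for a steady daily request volume."""
    per_request = lambda_cost_per_request(avg_duration_ms, memory_gb)
    return requests_per_day * DAYS_PER_MONTH * per_request


def break_even_requests_per_hour(container_monthly_usd,
                                 avg_duration_ms=100, memory_gb=0.5):
    """Sustained hourly request rate at which Lambda matches the container cost."""
    per_request = lambda_cost_per_request(avg_duration_ms, memory_gb)
    monthly_requests = container_monthly_usd / per_request
    return monthly_requests / (DAYS_PER_MONTH * 24)


print(f"10K/day: ${lambda_monthly_cost(10_000):.2f}/mo")
print(f"1M/day:  ${lambda_monthly_cost(1_000_000):.2f}/mo")
print(f"break-even: {break_even_requests_per_hour(CONTAINER_MONTHLY_USD):,.0f} req/hour")
```

With these inputs, Lambda costs well under a dollar a month at 10K requests/day and about $31 at 1M/day, and the break-even sits a little above one million requests per hour. Longer durations or more memory move the crossover down quickly.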
Your team uses DynamoDB on-demand capacity. The monthly bill is $20,000. Traffic is predictable (peaks at 10AM-6PM, low overnight). How do you optimize?
Answer Guide
Switch to provisioned capacity with auto-scaling (set min/max based on traffic patterns). Provisioned is 20-40% cheaper for predictable workloads. Use DynamoDB Infrequent Access table class for cold data. Consider Reserved Capacity for baseline. Enable DAX caching to reduce consumed RCUs.
A CTO asks: "We spend $500K/year on AWS. Should we negotiate an Enterprise Discount Program (EDP)?" What factors would you consider?
Answer Guide
EDP requires a commit (typically $500K-$1M+ annual). Benefits: blanket discount on all services (typically 5-15%), credits, dedicated TAM. Considerations: growth trajectory (don't overcommit), existing SPs/RIs (EDP stacks with them), flexibility to reduce spend, negotiation leverage based on workload migration plans.
Estimate the monthly cost of running a three-tier web application: ALB, 4 EC2 instances (m5.xlarge), Aurora PostgreSQL (db.r5.xlarge with 1 read replica), and 500GB S3. Show your math.
Answer Guide
ALB: ~$25/month (fixed) + LCU charges. EC2: 4 × m5.xlarge on-demand × $0.192/hr × 730 hrs = ~$560. Aurora: 2 × db.r5.xlarge × $0.48/hr × 730 = ~$700 + I/O + storage. S3: 500GB × $0.023 = ~$11.50 + requests. Approximate total: ~$1,300-1,500/month. Interviewers want to see you can do napkin math, not exact numbers.
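The same arithmetic as a small script, using only the rates quoted in the answer above. The variable charges (LCUs, Aurora I/O and storage, S3 requests) are left out, which is why the result sits at the bottom of the quoted range.

```python
HOURS_PER_MONTH = 730


def three_tier_estimate():
    """Return the fixed monthly line items (USD) from the answer above."""
    return {
        "ALB (fixed portion, LCUs excluded)": 25.00,
        "EC2: 4 x m5.xlarge": 4 * 0.192 * HOURS_PER_MONTH,
        "Aurora: 2 x db.r5.xlarge (I/O, storage excluded)": 2 * 0.48 * HOURS_PER_MONTH,
        "S3: 500 GB (requests excluded)": 500 * 0.023,
    }


items = three_tier_estimate()
for name, usd in items.items():
    print(f"{name:50s} ${usd:>8,.2f}")
print(f"{'Total before variable charges':50s} ${sum(items.values()):>8,.2f}")
```

The fixed items come to just under $1,300; the variable charges account for the rest of the $1,300-1,500 range.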
Your Kubernetes cluster has 30% average CPU utilization across nodes, meaning 70% of compute capacity is wasted. How would you improve utilization without impacting reliability?
Answer Guide
Karpenter for right-sized node provisioning (picks optimal instance type per pod spec). Pod resource requests/limits tuning with VPA recommendations. Consolidation policy (Karpenter bin-packs pods onto fewer nodes). Spot instances for non-critical workloads. Use Kubecost for per-namespace cost visibility. Target: 60-70% utilization.
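To put a number on the utilization target, the sketch below estimates how many equally sized nodes a cluster would need after consolidation. It assumes workloads can be packed perfectly, which real bin-packing never quite achieves, and the 100-node cluster is hypothetical.

```python
import math


def nodes_needed(current_nodes, current_utilization, target_utilization):
    """Nodes required to carry the same load at a higher target utilization.

    Utilization values are fractions between 0 and 1.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    used_capacity = current_nodes * current_utilization
    return math.ceil(used_capacity / target_utilization)


# Hypothetical cluster: 100 nodes at 30% average CPU, consolidated to 65%.
before = 100
after = nodes_needed(before, 0.30, 0.65)
print(f"{before} nodes -> {after} nodes ({1 - after / before:.0%} fewer)")
```

Moving from 30% to 65% utilization roughly halves the node count in this idealized case, which is the kind of result Karpenter consolidation aims for.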
Preparation Strategy
Cost questions test business awareness. Practice doing quick mental math: hourly EC2 pricing × 730 hours/month, data transfer volumes, and break-even calculations between pricing models. The ability to estimate costs on a whiteboard is a powerful differentiator in architect interviews.