Cost Optimization Interview Guide

Pricing models, right-sizing, governance, and cost-efficient architectures: topics that show business awareness in interviews.

Intermediate

AWS Pricing Models

| Model | Discount | Commitment | Flexibility | Best For |
| --- | --- | --- | --- | --- |
| On-Demand | 0% (baseline) | None | Start/stop anytime | Short-term, unpredictable, testing |
| Savings Plans (Compute) | Up to 66% | 1 or 3 years, $/hour | Any instance family, size, OS, region | Steady compute (EC2, Lambda, Fargate) |
| Savings Plans (EC2 Instance) | Up to 72% | 1 or 3 years, $/hour | Specific instance family + region | Known instance family, highest savings |
| Reserved Instances | Up to 72% | 1 or 3 years | Specific instance type, with regional or zonal (AZ) scope (least flexible) | Databases (RDS RI), legacy |
| Spot Instances | Up to 90% | None (can be interrupted) | Any available capacity | Batch, CI/CD, ML training, stateless |

Savings Plans vs Reserved Instances

| Aspect | Savings Plans | Reserved Instances |
| --- | --- | --- |
| Flexibility | Change instance family, size, OS, region | Locked to specific instance type |
| Applies To | EC2, Fargate, Lambda | EC2, RDS, ElastiCache, Redshift, OpenSearch |
| Recommendation | ✅ Preferred for EC2 compute | ✅ Preferred for RDS, ElastiCache |

Spot Instance Strategies

  • Diversify instance types: Request several instance types and families to reduce interruption risk (e.g., m5, m5a, m5n, m6i)
  • Use Spot Fleet: Requests capacity across pools according to an allocation strategy (lowest price, or price-capacity-optimized to also reduce interruptions)
  • Handle interruptions: Spot gives a 2-minute warning. Checkpoint state to S3 and use SQS for job queues so work can be retried
  • Best for: EKS node pools, EMR clusters, CI/CD builds, batch processing
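The 2-minute warning can be turned into a checkpoint decision with a few lines of code. The sketch below assumes the notice is the JSON document that the instance metadata path `spot/instance-action` returns (an `action` plus a UTC `time`). The function names and the 30-second checkpoint budget are hypothetical, and fetching the notice itself is left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds remaining before the action time in a Spot notice."""
    notice = json.loads(notice_json)
    # The notice time is UTC in ISO 8601 form with a trailing "Z".
    action_time = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (action_time - now).total_seconds()


def can_checkpoint(notice_json: str, now: datetime, needed_seconds: float = 30.0) -> bool:
    """True if there is still enough time to save state (e.g. to S3)."""
    return seconds_until_interruption(notice_json, now) >= needed_seconds


# Illustrative payload and clock; both values are made up for the example.
sample_notice = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
sample_now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(sample_notice, sample_now))  # 120.0
```

In a real worker this check would run in a polling loop, and unfinished jobs would go back to the SQS queue rather than being lost.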

🎯 Key Takeaway

Interview tip: "I use a mix: Compute Savings Plans for baseline steady-state (up to 66% savings), Spot for stateless/fault-tolerant workloads like EKS worker nodes (up to 90% savings), and On-Demand for spikes. For RDS/ElastiCache, I use Reserved Instances. This typically saves 40-60% vs all On-Demand."
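A blended figure such as "40-60% vs all On-Demand" is a weighted average of the discount on each slice of usage. The sketch below shows that arithmetic. The usage shares and realized discounts are assumptions chosen for illustration (deliberately below the "up to" ceilings), not measured data.

```python
def blended_savings(mix: dict) -> float:
    """Weighted-average discount versus an all-On-Demand bill.

    mix maps a label to (share_of_usage, discount_vs_on_demand).
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("usage shares must sum to 1.0")
    return sum(share * discount for share, discount in mix.values())


# Hypothetical mix: 60% steady-state, 25% fault-tolerant, 15% spiky.
example_mix = {
    "compute_savings_plan": (0.60, 0.40),  # assumed realized discount
    "spot": (0.25, 0.70),                  # assumed realized discount
    "on_demand": (0.15, 0.00),             # baseline, no discount
}

print(f"{blended_savings(example_mix):.1%}")  # 41.5%
```

Moving more usage from On-Demand into the discounted slices is what pushes the blended number toward the upper end of the range.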

Intermediate

Right-Sizing & Monitoring

Right-sizing is often the biggest cost optimization lever, because many instances are oversized:

| Tool | What It Does | Actionable Output |
| --- | --- | --- |
| AWS Cost Explorer | Visualize spend over time, forecast | RI/SP purchase recommendations |
| AWS Compute Optimizer | ML-based right-sizing recommendations | "Downsize m5.xlarge → m5.large" with impact analysis |
| AWS Trusted Advisor | Best practice checks across categories such as cost optimization, security, fault tolerance, performance, and service limits | Idle instances, underutilized EBS, unused EIPs |
| AWS Budgets | Set cost/usage budgets with alerts | Email/SNS alert when 80%, 100% of budget hit |
| Cost Anomaly Detection | ML-based detection of unexpected spend | "Your EC2 spend increased 300%, investigate" |

Top 5 Quick Wins

  1. Delete unused resources: Unattached EBS volumes, unused EIPs, idle load balancers
  2. Right-size EC2: If average CPU stays below 20%, consider downsizing. Compute Optimizer produces these recommendations.
  3. Use gp3 over gp2: Up to 20% cheaper per GB, with a 3,000 IOPS baseline that does not depend on volume size. Check large gp2 volumes that rely on a higher size-based baseline before migrating.
  4. S3 lifecycle policies: Transition old data to Glacier. In many workloads, most S3 data is rarely read after its first 30 days.
  5. NAT Gateway optimization: Use S3/DynamoDB Gateway Endpoints (free) to avoid NAT data charges.
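The right-sizing rule in item 2 is simple enough to express directly. The sketch below applies the 20% average-CPU threshold to a list of utilization samples. The function name and sample data are hypothetical, and a real review would also look at peaks, memory, and network before downsizing.

```python
from statistics import mean


def rightsizing_hint(cpu_samples, threshold_pct=20.0):
    """Flag an instance as a downsize candidate when average CPU is low.

    cpu_samples: CPU utilization percentages, e.g. hourly CloudWatch averages.
    """
    if not cpu_samples:
        raise ValueError("need at least one CPU sample")
    average = mean(cpu_samples)
    return "downsize candidate" if average < threshold_pct else "keep"


# Hypothetical samples for two instances.
print(rightsizing_hint([5, 10, 15]))  # downsize candidate
print(rightsizing_hint([50, 60]))     # keep
```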

🎯 Key Takeaway

Interview tip: "My cost optimization process: 1) Enable Cost Explorer and set Budgets first, 2) Run Compute Optimizer for right-sizing, 3) Buy Savings Plans for steady-state, 4) Add Spot for fault-tolerant workloads, 5) Lifecycle policies on all S3 buckets. This typically cuts bills by 30-50%."

Advanced

Cost Governance & Allocation

Enterprise cost management: showing this in interviews demonstrates business awareness.

Cost Allocation Strategy

  • Tagging: Tag every resource with Team, Environment, Project, CostCenter. Enforce via SCP + AWS Config rules.
  • Separate Accounts: One AWS account per environment (dev, staging, prod) per team. Consolidated billing via Organizations.
  • AWS Budgets: Per-account and per-tag budgets. Alert at 80% threshold. Auto-actions: stop EC2, apply SCP.
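The tagging rule above amounts to a compliance check over a resource inventory. The sketch below shows that check in plain Python, assuming the inventory has already been exported as a mapping of resource ID to tags. The resource IDs and helper names are hypothetical, and in practice AWS Config or an SCP would do the enforcing.

```python
REQUIRED_TAGS = ("Team", "Environment", "Project", "CostCenter")


def missing_tags(resource_tags: dict) -> list:
    """Return the required tag keys that are absent or empty."""
    return [key for key in REQUIRED_TAGS if not resource_tags.get(key)]


def untagged_resources(inventory: dict) -> dict:
    """Map each non-compliant resource ID to the tag keys it is missing."""
    report = {}
    for resource_id, tags in inventory.items():
        gaps = missing_tags(tags)
        if gaps:
            report[resource_id] = gaps
    return report


# Hypothetical inventory: one compliant instance, one poorly tagged volume.
inventory = {
    "i-0abc": {"Team": "payments", "Environment": "prod",
               "Project": "checkout", "CostCenter": "1234"},
    "vol-0def": {"Team": "payments"},
}
print(untagged_resources(inventory))
```

A report like this is also a useful input for showback, since spend on resources with missing tags cannot be attributed to a team.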

FinOps Framework

| Phase | Activities | AWS Tools |
| --- | --- | --- |
| Inform | Visibility, allocation, reporting | Cost Explorer, Cost & Usage Report, QuickSight |
| Optimize | Right-size, commitments, architectural changes | Compute Optimizer, Savings Plans, Spot |
| Operate | Continuous monitoring, governance, culture | Budgets, Anomaly Detection, SCPs, Config rules |

🎯 Key Takeaway

Interview tip: "I implement FinOps across three phases: Inform (visibility via CUR + Cost Explorer), Optimize (Savings Plans, right-sizing, Spot), Operate (Budgets with auto-actions, tagging enforcement via SCPs). Cost tags are mandatory: I enforce them with AWS Config rules that flag untagged resources."

Advanced

Cost-Efficient Architecture Patterns

| Pattern | Expensive Way | Cost-Optimized Way | Savings |
| --- | --- | --- | --- |
| API Backend | ALB + EC2 (always running) | API Gateway + Lambda (pay per request) | 80%+ for low traffic |
| Database | RDS Multi-AZ provisioned | Aurora Serverless v2 (scales to zero ACUs) | 60%+ for variable workloads |
| Batch Processing | EC2 On-Demand fleet | Spot Fleet + SQS job queue | Up to 90% |
| Static Hosting | EC2 + Nginx | S3 + CloudFront | 90%+ (no compute) |
| Container Workloads | EKS with On-Demand nodes | EKS with Karpenter + Spot + Graviton | 60-80% |
| Data Lake Queries | Redshift always-on cluster | S3 + Athena (pay per query, $5/TB scanned) | 80%+ for ad-hoc |
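The "80%+ for ad-hoc" claim in the Data Lake row depends on how much data is scanned. The sketch below uses the $5/TB Athena rate from the table to find the monthly scan volume at which pay-per-query costs as much as an always-on cluster. The $1.00/hour cluster price is a made-up placeholder, not a Redshift list price.

```python
HOURS_PER_MONTH = 730
ATHENA_USD_PER_TB = 5.0  # rate quoted in the table above


def athena_monthly_cost(tb_scanned_per_month: float) -> float:
    """Pay-per-query cost for a month of scans."""
    return tb_scanned_per_month * ATHENA_USD_PER_TB


def break_even_tb_per_month(cluster_hourly_usd: float) -> float:
    """Scan volume at which Athena costs the same as an always-on cluster."""
    return cluster_hourly_usd * HOURS_PER_MONTH / ATHENA_USD_PER_TB


# Hypothetical cluster at $1.00/hour, i.e. $730/month when always on.
print(break_even_tb_per_month(1.0))  # 146.0
print(athena_monthly_cost(10))       # 50.0
```

Below the break-even volume, ad-hoc querying favors Athena. Partitioning and columnar formats lower the scanned bytes further, so they lower the bill as well.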

The Hidden Costs (Interview Differentiator)

  • NAT Gateway: $0.045/GB data processing, on top of the hourly charge. Use gateway VPC endpoints for S3/DynamoDB to avoid it.
  • Data Transfer: Inter-AZ is $0.01/GB in each direction. Inter-Region is about $0.02/GB, depending on the Region pair. Use VPC endpoints and keep traffic in the same AZ where possible.
  • CloudWatch Logs: $0.50/GB ingestion. Use log levels wisely. Set retention policies.
  • Elastic IP: Since Feb 2024, all public IPv4 addresses cost $0.005/hr whether attached or not. Release unused EIPs and audit all public IPs.
  • EBS Snapshots: $0.05/GB-month. Old snapshots accumulate silently. Use lifecycle policies.
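These line items are easy to underestimate until they are multiplied out. The sketch below applies the rates listed above to a month of usage. The usage volumes in the example are invented for illustration, and the rates are taken from this list rather than from a live price sheet.

```python
HOURS_PER_MONTH = 730


def estimate_hidden_costs(nat_gb, inter_az_gb, log_gb, public_ipv4_count, snapshot_gb):
    """Return a per-item monthly estimate in USD using the rates listed above."""
    return {
        "nat_gateway_processing": nat_gb * 0.045,
        "inter_az_transfer": inter_az_gb * 0.01,  # GB as billed, per direction
        "cloudwatch_log_ingestion": log_gb * 0.50,
        "public_ipv4": public_ipv4_count * 0.005 * HOURS_PER_MONTH,
        "ebs_snapshots": snapshot_gb * 0.05,
    }


# Hypothetical month for a mid-sized environment.
costs = estimate_hidden_costs(
    nat_gb=1000, inter_az_gb=2000, log_gb=200,
    public_ipv4_count=10, snapshot_gb=5000,
)
for item, usd in costs.items():
    print(f"{item}: ${usd:,.2f}")
print(f"total: ${sum(costs.values()):,.2f}")  # total: $451.50
```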

🎯 Key Takeaway

Interview tip: "I architect for cost from day one: serverless for variable workloads (pay-per-use), Spot for fault-tolerant compute, Graviton for up to roughly 20% savings. The hidden costs most teams miss are NAT Gateway data processing, inter-AZ transfer, and CloudWatch log ingestion. I set up VPC endpoints for S3/DynamoDB and log retention policies as standard practice."

Advanced

Interview Questions: Cost Optimization

Cost optimization shows business acumen, a trait that distinguishes senior architects. These questions test your ability to think about money, not just technology.

  1. Answer Guide
    Cost Explorer (group by service, filter by tag, compare month-over-month). Check: new services launched, data transfer spikes, NAT Gateway processing, DynamoDB on-demand scaling, unattached EBS volumes/EIPs. Set up Cost Anomaly Detection for proactive alerts. Tag everything for attribution.
  2. Answer Guide
    Analyze steady-state vs variable usage. Base load → Compute Savings Plans (1yr partial upfront, ~40% savings). Variable → Spot for fault-tolerant workloads. Consider Graviton (m7g.xlarge) for additional 20% savings. Dev/test → scheduled scaling (stopping nights and weekends cuts running hours by roughly 65%). Target: 50-60% total savings.
  3. Answer Guide
    Compute SP: most flexible (any instance family, size, OS, region, including Fargate/Lambda), up to 66% savings. EC2 Instance SP: locked to instance family + region, up to 72% savings. RI: most restrictive (specific instance type, with regional or zonal scope), similar savings to EC2 Instance SP, but the right choice for RDS/ElastiCache where SPs don't apply.
  4. Answer Guide
    Common culprits: NAT Gateway processing ($0.045/GB), inter-AZ traffic ($0.01/GB), inter-region replication, CloudFront to origin. Fixes: S3/DynamoDB gateway endpoints (free), VPC endpoints for other services, AZ-aware routing, CloudFront caching (reduce origin fetches), compress data before transfer.
  5. Answer Guide
    AWS Organizations with consolidated billing. Mandatory tagging policy (team, project, environment) enforced by SCPs. Shared Reserved Capacity across accounts. Per-team cost dashboards using CUR + Athena/QuickSight. Monthly cost review meetings. Set budgets with alerts per team. Chargeback or showback model.
  6. Answer Guide
    At 10K/day: serverless wins dramatically (pay-per-use, near-zero idle cost). At 1M/day: serverless is usually still cheaper, though the answer depends on request duration and compute needs. As a rough rule of thumb, Lambda breaks even with containers at a sustained 1-2M requests per hour, not per day. Start serverless, re-evaluate when costs exceed the container alternative.
  7. Answer Guide
    Switch to provisioned capacity with auto-scaling (set min/max based on traffic patterns). Provisioned is 20-40% cheaper for predictable workloads. Use DynamoDB Infrequent Access table class for cold data. Consider Reserved Capacity for baseline. Enable DAX caching to reduce consumed RCUs.
  8. Answer Guide
    EDP requires a commit (typically $500K-$1M+ annual). Benefits: blanket discount on all services (typically 5-15%), credits, dedicated TAM. Considerations: growth trajectory (don't overcommit), existing SPs/RIs (EDP stacks with them), flexibility to reduce spend, negotiation leverage based on workload migration plans.
  9. Answer Guide
    ALB: ~$25/month (fixed) + LCU charges. EC2: 4 × m5.xlarge on-demand × $0.192/hr × 730 hrs = ~$560. Aurora: 2 × db.r5.xlarge × $0.48/hr × 730 = ~$700 + I/O + storage. S3: 500GB × $0.023 = ~$11.50 + requests. Approximate total: ~$1,300-1,500/month. Interviewers want to see you can do napkin math, not exact numbers.
  10. Answer Guide
    Karpenter for right-sized node provisioning (picks optimal instance type per pod spec). Pod resource requests/limits tuning with VPA recommendations. Consolidation policy (Karpenter bin-packs pods onto fewer nodes). Spot instances for non-critical workloads. Use Kubecost for per-namespace cost visibility. Target: 60-70% utilization.
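The napkin math in answer 9 can be reproduced in a few lines, which is a handy way to check the arithmetic while practicing. The sketch below uses only the quantities and hourly rates quoted in that answer. Usage-based items (LCU charges, Aurora I/O and storage, S3 requests) are left out, which is why the result sits at the low end of the quoted range.

```python
HOURS_PER_MONTH = 730


def napkin_estimate() -> dict:
    """Fixed monthly line items for the ALB + EC2 + Aurora + S3 example."""
    return {
        "alb_fixed": 25.0,                        # flat figure from the answer
        "ec2": 4 * 0.192 * HOURS_PER_MONTH,       # 4 x m5.xlarge on-demand
        "aurora": 2 * 0.48 * HOURS_PER_MONTH,     # 2 x db.r5.xlarge
        "s3_storage": 500 * 0.023,                # 500 GB of standard storage
    }


items = napkin_estimate()
for name, usd in items.items():
    print(f"{name}: ${usd:,.2f}")
print(f"total: ${sum(items.values()):,.2f}")  # total: $1,297.94
```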

Preparation Strategy

Cost questions test business awareness. Practice doing quick mental math: hourly EC2 pricing × 730 hours/month, data transfer volumes, and break-even calculations between pricing models. The ability to estimate costs on a whiteboard is a powerful differentiator in architect interviews.
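One break-even worth memorizing: a commitment billed for every hour at discount d costs the same as On-Demand once the workload runs for a fraction 1 - d of the month. The sketch below works that out. The 40% discount is an illustrative assumption, the $0.192/hour rate is the m5.xlarge figure used earlier in this guide, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730


def break_even_hours(discount: float, hours_per_month: int = HOURS_PER_MONTH) -> float:
    """Running hours per month above which a commitment beats On-Demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hours_per_month * (1.0 - discount)


def monthly_costs(on_demand_rate: float, discount: float, hours_used: float) -> tuple:
    """Return (on_demand_cost, committed_cost) for one instance-month."""
    on_demand_cost = on_demand_rate * hours_used
    # A commitment is paid for every hour, whether or not the instance runs.
    committed_cost = on_demand_rate * (1.0 - discount) * HOURS_PER_MONTH
    return on_demand_cost, committed_cost


print(break_even_hours(0.40))  # about 438 hours, i.e. 60% of the month
print(monthly_costs(0.192, 0.40, hours_used=300))
```

In the second call the instance runs 300 hours, which is below the break-even point, so On-Demand is the cheaper option for that usage pattern.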