Kubernetes & EKS

Kubernetes & EKS Interview Guide

EKS scaling, Karpenter, container lifecycle, and Kubernetes tech stack.

15 Topics · 11 Diagrams
Kubernetes Architecture & EKS Overview
Beginner

Kubernetes Tools Landscape


Kubernetes Tools Ecosystem with AWS Cloud Implementation:

  • Observability: Prometheus, Grafana, Fluent Bit, Jaeger, ADOT, CloudWatch, X-Ray
  • Scaling: Karpenter, AWS Auto Scaling
  • Delivery/Automation: Argo, Terraform, Jenkins, GitHub Actions, GitLab CI/CD
  • Security: Gatekeeper, Trivy, ECR Scanning, GuardDuty, kube-bench, Secrets Manager, Istio
  • Cost Optimization: CloudWatch Container Insights, Cost and Usage Report (Split Cost Allocation), Kubecost
Intermediate

Scaling EC2 Vs Lambda Vs EKS

| Aspect | EC2 + Auto Scaling | Lambda | EKS (Kubernetes) |
|---|---|---|---|
| Scaling Unit | VM instance | Function invocation | Pod (container) |
| Scaling Speed | Minutes (launch new EC2) | Milliseconds (concurrent executions) | Seconds (new pods) to minutes (new nodes) |
| Scaling Mechanism | Auto Scaling Group (target tracking, step, scheduled) | Automatic concurrency scaling | HPA (Horizontal Pod Autoscaler) for pods; Karpenter / Cluster Autoscaler for nodes |
| Max Duration | Unlimited | 15 minutes | Unlimited |
| Best For | Long-running, stateful workloads | Short, event-driven, bursty workloads | Containerized microservices at scale |
| Operational Overhead | Medium (patching, AMI management) | Lowest (fully managed) | Highest (cluster management, networking) |
Intermediate

Kubernetes (EKS) Scaling


EKS scaling operates at two levels:

Pod-Level Scaling

  • Horizontal Pod Autoscaler (HPA) — Adds or removes pod replicas based on CPU, memory, or custom metrics. Reacts in seconds.
  • Vertical Pod Autoscaler (VPA) — Adjusts CPU/memory requests and limits for individual pods. Requires a pod restart.
  • KEDA (Kubernetes Event-driven Autoscaling) — Scales pods based on external event sources (SQS queue depth, Kafka lag, etc.).
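
Pod-level scaling is configured declaratively. A minimal HPA sketch for a hypothetical checkout Deployment (the name, replica bounds, and 70% CPU target are illustrative, not from this guide):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout           # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add/remove replicas around 70% average CPU
```

The HPA controller periodically compares observed CPU against the target and adjusts the Deployment's replica count between the min and max bounds.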

Node-Level Scaling

  • Cluster Autoscaler — Watches for pods that can't be scheduled due to insufficient resources, then adds nodes from pre-configured Auto Scaling Groups. Limited to predefined instance types.
  • Karpenter — Next-generation node provisioner. Directly provisions the right-sized EC2 instance for pending pods. No need for pre-configured node groups. Faster, more efficient, and supports consolidation.
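
A minimal Karpenter NodePool sketch, assuming the Karpenter v1 schema and a pre-existing EC2NodeClass named default (field names differ in the older v1beta1/alpha APIs):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        # Broad constraints only; Karpenter picks the cheapest matching instance
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # assumed EC2NodeClass with subnet/SG discovery
  limits:
    cpu: "100"                 # cap total provisioned CPU for this pool
```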
Intermediate

EKS Upgrade With Karpenter


Karpenter simplifies EKS upgrades by managing node lifecycle automatically:

  1. Update the EKS control plane — Upgrade the Kubernetes version of the EKS cluster (managed by AWS).
  2. Update the Karpenter AMI selection — Point the amiSelectorTerms (or amiFamily) in the EC2NodeClass referenced by the Karpenter NodePool at the AMI for the new Kubernetes version.
  3. Karpenter handles the rest — Karpenter's drift detection notices nodes running outdated AMIs. It automatically cordons old nodes, gracefully drains pods, launches new nodes with the updated AMI, and schedules pods onto them.
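
Step 2 might look like the following EC2NodeClass fragment, assuming Karpenter v1 (where AMI selection lives on the EC2NodeClass referenced by the NodePool); the IAM role and discovery tags are placeholders:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest       # resolves the AMI for the cluster's K8s version
  role: KarpenterNodeRole-demo   # placeholder node IAM role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: demo-cluster   # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: demo-cluster
```

With the al2023@latest alias, upgrading the control plane alone is enough to trigger drift and node replacement; pin a specific AMI version alias instead if you want a controlled rollout.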

Key advantage over Cluster Autoscaler: No need to manually manage multiple managed node groups, roll out updates ASG-by-ASG, or deal with PodDisruptionBudgets manually. Karpenter handles disruption budgets, pod rescheduling, and node replacement in one workflow.

Advanced

EKS Upgrade With Karpenter - Advanced

Disruption Controls

  • Consolidation — Karpenter continuously watches for underutilized nodes. It deletes or replaces nodes when pods can be rescheduled on fewer/smaller instances, reducing cost.
  • Expiration (TTL) — Set expireAfter on the NodePool to force node recycling after a time period, ensuring nodes stay fresh and up-to-date.
  • Drift Detection — Karpenter detects when a node's configuration no longer matches the NodePool spec (AMI, instance type, labels) and replaces it.
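
These controls live on the NodePool. A hedged fragment using the Karpenter v1 field layout (in v1beta1 the policy was named WhenUnderutilized and expireAfter sat under disruption):

```yaml
# NodePool fragment; nodeClassRef and requirements omitted for brevity
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # "WhenUnderutilized" in v1beta1
    consolidateAfter: 5m       # wait before consolidating underutilized nodes
  template:
    spec:
      expireAfter: 720h        # recycle every node after 30 days
```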

NodePool Configuration

  • Multiple NodePools — Create separate NodePools for different workload types (e.g., GPU workloads, spot-tolerant batch jobs, on-demand critical services).
  • Instance Type Flexibility — Specify broad requirements (CPU, memory, architecture) and let Karpenter choose the cheapest matching instance.
  • Spot & On-Demand Mix — Use the karpenter.sh/capacity-type requirement to allow both Spot and On-Demand in a single NodePool (Karpenter prefers Spot when available), or weight separate NodePools to express priorities.
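
A requirements fragment sketching the instance flexibility and capacity-type mix described above (the values are illustrative):

```yaml
# Fragment of a NodePool's .spec.template.spec
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]   # Karpenter prefers Spot when capacity exists
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]         # broad families; the cheapest match wins
  - key: karpenter.k8s.aws/instance-cpu
    operator: In
    values: ["4", "8", "16"]
```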
Beginner

Container Lifecycle - Local to Cloud


The journey of a containerized application from a developer's laptop to production in the cloud:

  1. Write Code + Dockerfile — Developer writes application code and a Dockerfile that defines the runtime environment, dependencies, and startup command.
  2. Build Container Image — Run docker build locally to create a container image from the Dockerfile.
  3. Test Locally — Run docker run to test the container on the local machine. Verify the application works as expected.
  4. Push to Registry — Push the image to Amazon ECR (Elastic Container Registry) so it's available in the cloud.
  5. Deploy to Orchestrator — Deploy the container image to Amazon EKS (Kubernetes), ECS (Elastic Container Service), or Fargate (serverless containers) using manifest YAMLs or task definitions.
  6. Run in Production — The orchestrator manages scheduling, scaling, health checks, rolling updates, and self-healing across multiple nodes and AZs.
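
Steps 4 and 5 end with a manifest like the following Deployment sketch; the account ID, region, repository, port, and /healthz path are all placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          # Placeholder ECR image URI pushed in step 4
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.0.0
          ports:
            - containerPort: 8080
          readinessProbe:          # lets the orchestrator gate traffic (step 6)
            httpGet:
              path: /healthz       # assumed health endpoint
              port: 8080
```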
Beginner

Kubernetes Node Pod Container Relationship


Understanding the hierarchy in Kubernetes:

  • Cluster — The top-level unit. A cluster is a set of nodes managed by the Kubernetes control plane.
  • Node — A worker machine (an EC2 instance in EKS). Nodes provide compute resources (CPU, memory) for running pods.
  • Pod — The smallest deployable unit in Kubernetes. A pod wraps one or more containers that share the same network namespace (IP address) and storage volumes.
  • Container — The actual running application process. A container runs from a container image (e.g., from ECR).

Key Relationships

  • One Node runs many Pods
  • One Pod typically runs one Container (but can run multiple — see sidecar pattern)
  • All containers in a pod share the same IP address, ports, and volumes
  • Pods are ephemeral — they can be killed and rescheduled on different nodes at any time
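
A minimal two-container pod illustrating the shared network namespace and volume (the images and the probe loop are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-demo
spec:
  volumes:
    - name: shared-data
      emptyDir: {}           # scratch volume visible to both containers
  containers:
    - name: web
      image: nginx:1.27
      ports:
        - containerPort: 80
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: helper
      image: busybox:1.36
      # Reaches nginx over localhost because both containers share the pod IP
      command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 > /dev/null; sleep 60; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /data   # same volume, different mount path
```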
Intermediate

How Karpenter Saves Money


Karpenter reduces costs through intelligent node provisioning and continuous optimization:

  1. Right-Sizing — Karpenter selects the optimal EC2 instance type for the pending pod's resource requirements. Instead of using a pre-configured m5.xlarge for a pod that needs 0.5 vCPU, Karpenter might launch a smaller t3.medium.
  2. Bin Packing — Karpenter packs pods tightly onto nodes to maximize utilization. Less wasted CPU/memory per node means fewer nodes needed overall.
  3. Consolidation — Karpenter continuously monitors running nodes. If pods can be reshuffled onto fewer or cheaper nodes, it automatically migrates them and terminates the excess nodes.
  4. Spot Instance Integration — Karpenter natively supports Spot instances and can fall back to On-Demand when Spot capacity is unavailable. Spot can save up to 90% vs On-Demand pricing.
  5. No Idle Node Groups — Unlike Cluster Autoscaler (which requires pre-configured ASGs that might sit idle), Karpenter provisions nodes on-demand and removes them when empty.
Intermediate

EDA with Kubernetes

Running event-driven architectures on Kubernetes combines the decoupling benefits of EDA with the orchestration power of K8s:

Pattern

Event Producer → Message Broker (SQS/SNS/Kafka) → Kubernetes Consumer Pods

Key Components

  • KEDA (Kubernetes Event-driven Autoscaler) — Scales consumer pods to zero when there are no messages, and scales up based on queue depth or event lag. Prevents idle resource costs.
  • Consumer Pods — Long-running pods that poll SQS, subscribe to SNS, or consume from Kafka topics. Each pod processes events independently.
  • DLQ (Dead Letter Queue) — Messages that fail processing after max retries are sent to a DLQ for investigation and replay.
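
As a sketch, a KEDA ScaledObject wiring a hypothetical orders-consumer Deployment to SQS queue depth (the queue URL, replica bounds, and TriggerAuthentication name are assumptions):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer-scaler
spec:
  scaleTargetRef:
    name: orders-consumer        # hypothetical consumer Deployment
  minReplicaCount: 0             # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/orders  # placeholder
        queueLength: "5"         # target messages per replica
        awsRegion: us-east-1
      authenticationRef:
        name: keda-aws-auth      # assumed TriggerAuthentication (e.g., IRSA-based)
```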

Why Kubernetes for EDA?

  • Scale consumer pods independently per event type
  • Use different resource profiles for different event processors
  • Built-in health checks and self-healing for consumer reliability
Advanced

EDA with SNS, SQS, Kubernetes


A common event-driven pattern combining AWS messaging with Kubernetes consumers:

Architecture Flow

  1. Producer publishes events to an SNS Topic.
  2. SNS fans out to multiple SQS Queues (one per consumer type), with message filtering to route events to the right queue.
  3. Kubernetes pods on EKS poll their respective SQS queues and process messages.
  4. KEDA monitors each SQS queue's ApproximateNumberOfMessages and scales the consumer pods accordingly — scaling to zero when the queue is empty.

Why SNS + SQS (not just SQS)?

  • SNS provides fan-out — one event goes to multiple subscribers
  • SQS provides buffering — messages are retained even if the consumer is down, with built-in retry and DLQ
  • Together they decouple producers from consumers and enable independent scaling
Intermediate

EKS Auto Mode (re:Invent 2024)


EKS Auto Mode, announced at re:Invent 2024, is a fully managed experience for Amazon EKS that automates cluster infrastructure management:

What EKS Auto Manages

  • Compute — Automatically provisions and right-sizes EC2 instances (powered by Karpenter). No need to configure node groups or instance types.
  • Networking — Automatically configures the VPC CNI, load balancers, and DNS for services.
  • Storage — Automatically provisions EBS volumes via CSI drivers.
  • Upgrades — Automatically upgrades the Kubernetes version, AMIs, and add-ons with minimal disruption.

Key Benefit

EKS Auto eliminates the operational burden of managing Kubernetes infrastructure. You focus on deploying applications; AWS handles the cluster. It's the "serverless experience" for Kubernetes — you just deploy pods and EKS Auto handles everything underneath.

Beginner

Docker Vs Kubernetes


Docker and Kubernetes solve different problems and work together, not against each other:

| Aspect | Docker | Kubernetes |
|---|---|---|
| What It Is | Container runtime — builds and runs containers | Container orchestrator — manages containers at scale |
| Scope | Single host (one machine) | Multi-host cluster (many machines) |
| Scaling | Manual (docker run more containers) | Automatic (HPA, Karpenter) |
| Self-Healing | None by default — a crashed container stays down unless a restart policy is set | Automatic restart, reschedule to healthy nodes |
| Load Balancing | Not built-in | Built-in Service + Ingress |
| Rolling Updates | Manual | Built-in Deployment strategies (rolling, blue-green) |
| Use Case | Development, testing, single-server apps | Production workloads at scale |

Think of it this way: Docker is like a shipping container. Kubernetes is the port that manages thousands of shipping containers — scheduling, routing, stacking, and replacing them automatically.

Intermediate

Pod Container Sidecar


The sidecar pattern runs a secondary container alongside the main application container within the same pod. Both containers share the same network and storage.

Common Sidecar Use Cases

  • Logging / Log Forwarding — A sidecar container (e.g., Fluent Bit) reads log files written by the main container and ships them to CloudWatch, Elasticsearch, or Splunk.
  • Service Mesh Proxy — An Envoy proxy sidecar (used by Istio, App Mesh) handles mTLS, traffic routing, retries, and observability without changing application code.
  • Monitoring Agent — A sidecar collects metrics and traces (e.g., ADOT, Datadog agent) and exports them to monitoring backends.
  • Secret Injection — A sidecar fetches secrets from AWS Secrets Manager or Vault and mounts them as files for the main container.
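
A log-forwarding sidecar can be sketched as a pod whose two containers share an emptyDir volume; the application image and log path are placeholders, and the Fluent Bit container would additionally need a config telling it which files to tail:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  volumes:
    - name: app-logs
      emptyDir: {}
  containers:
    - name: app
      image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1  # placeholder
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app      # app writes its log files here
    - name: log-forwarder
      image: fluent/fluent-bit:3.1
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app      # sidecar tails the same files
          readOnly: true
```

On Kubernetes 1.28+, the sidecar can instead be declared as an init container with restartPolicy: Always, which guarantees it starts before and stops after the main container.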

Why Sidecar Instead of Building It In?

  • Separation of concerns — Application code stays clean; cross-cutting concerns live in the sidecar.
  • Reusability — The same sidecar image works across all microservices.
  • Independent updates — Update the sidecar without redeploying the main application.
Advanced

Karpenter Bin Pack Granular Control


Karpenter's bin packing decides how tightly pods are packed onto nodes to maximize resource utilization:

Consolidation Policies

  • WhenUnderutilized (renamed WhenEmptyOrUnderutilized in Karpenter v1) — Karpenter replaces nodes when it detects pods can fit on fewer or cheaper nodes. Aggressive cost savings but more pod disruption.
  • WhenEmpty — Karpenter only removes nodes that are empty of (non-DaemonSet) pods. Safest and least disruptive, but less cost-efficient.

Granular Controls

  • Pod resource requests — Set accurate CPU/memory requests so Karpenter can calculate true utilization and choose the right instance size.
  • Topology spread constraints — Force pods to spread across AZs or nodes for high availability, even if bin packing would prefer a single node.
  • Node affinity / anti-affinity — Control which pods can or cannot share the same node.
  • Do-not-disrupt annotation — Mark critical pods with karpenter.sh/do-not-disrupt: "true" to prevent Karpenter from evicting them during consolidation.

Best practice: Set accurate resource requests, use topology constraints for HA, and let Karpenter handle the rest. Over-requesting resources defeats the purpose of bin packing.
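
Putting the granular controls together, a hedged Deployment sketch (the image, replica count, and resource numbers are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
      annotations:
        karpenter.sh/do-not-disrupt: "true"    # shield pods from consolidation
    spec:
      containers:
        - name: api
          image: public.ecr.aws/nginx/nginx:1.27   # stand-in image
          resources:
            requests:              # accurate requests drive bin packing
              cpu: 500m
              memory: 512Mi
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread across AZs for HA
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
```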

Intermediate

Kubernetes Tech Stack


A typical production-ready Kubernetes tech stack on AWS EKS:

| Layer | Purpose | Tools |
|---|---|---|
| Ingress | External traffic routing | AWS Load Balancer Controller, Nginx Ingress, Traefik |
| Service Mesh | Service-to-service communication, mTLS, traffic management | Istio, Linkerd, AWS App Mesh |
| CI/CD | Build and deploy pipelines | ArgoCD (GitOps), Jenkins, GitHub Actions, Flux |
| Observability | Metrics, logs, traces | Prometheus + Grafana, Fluent Bit + CloudWatch, Jaeger / X-Ray (ADOT) |
| Scaling | Pod and node autoscaling | HPA, KEDA, Karpenter |
| Security | Policy enforcement, scanning | OPA Gatekeeper, Trivy, kube-bench, Falco |
| Secrets | Secret management | AWS Secrets Manager CSI Driver, External Secrets Operator |
| Storage | Persistent volumes | EBS CSI Driver, EFS CSI Driver |
| Networking | Pod networking | AWS VPC CNI, Calico (network policies) |
Advanced

Interview Questions β€” Kubernetes & EKS

Kubernetes questions go deep on architecture internals, EKS operations, and production-readiness. These cover what interviewers actually ask.

  1. Walk through what happens when you run kubectl apply.
    Answer Guide: kubectl → API Server (auth/authz/admission) → etcd (stores desired state) → Scheduler (picks a node based on resources/affinity) → kubelet on the node (pulls image, creates container via CRI) → Pod running. Discuss the control loop pattern.
  2. Your EKS cluster is running out of pod IPs. How do you fix it?
    Answer Guide: AWS VPC CNI assigns real VPC IPs to pods. Each instance type has a max ENI × IPs-per-ENI limit. Solutions: enable prefix delegation (16 IPs per slot instead of 1), use larger instance types, add a secondary CIDR to the VPC, or consider custom networking with separate pod subnets.
  3. How does Karpenter differ from Cluster Autoscaler?
    Answer Guide: Karpenter: group-less (no ASGs), provisions the optimal instance type per pod spec, faster scaling (seconds vs minutes), supports consolidation (bin packing to reduce waste), drift detection for automated upgrades. CA: uses ASGs, requires node groups per instance type, slower scheduling.
  4. How would you set up access control for multiple teams sharing one EKS cluster?
    Answer Guide: Create namespaces (platform, backend, frontend). Role + RoleBinding per namespace for team-specific access. ClusterRole + ClusterRoleBinding for the platform team. Map IAM roles to K8s groups using the aws-auth ConfigMap or EKS access entries. Use IRSA for pod-level AWS permissions.
  5. Compare the Kubernetes Service types. How would you expose services in production?
    Answer Guide: ClusterIP (internal only), NodePort (exposes a port on every node; testing), LoadBalancer (creates an AWS NLB or classic ELB per service — expensive if many services), Ingress (single ALB with path/host-based routing to multiple services — preferred for production). Discuss the AWS Load Balancer Controller.
  6. A pod with a 512Mi memory limit keeps getting OOMKilled. What's happening, and how do you fix it?
    Answer Guide: The application exceeds the 512Mi limit under load → the kernel OOM-kills the container. Fix: increase the limit to match actual peak usage (e.g., 768Mi). Discuss requests vs limits: requests guarantee scheduling, limits cap usage. VPA can auto-tune these based on actual usage.
  7. Would you run stateful workloads such as databases on EKS?
    Answer Guide: Stateless services — fine on EKS. Stateful databases — use with caution. PersistentVolumeClaims with the EBS CSI driver (gp3), StorageClass for dynamic provisioning. Consider: node affinity for data locality, pod disruption budgets, backup strategies. For critical databases, managed services (RDS) are often better than self-managed on K8s.
  8. How do you deploy a change that includes a database schema migration with zero downtime?
    Answer Guide: Rolling update with readiness probes. Database: backward-compatible migrations only (add columns, not rename/delete). Deploy new code that handles both old and new schema → migrate schema → remove old-schema code in the next release. Discuss blue/green deployments with Argo Rollouts for safer rollbacks.
  9. How do you manage secrets securely in Kubernetes?
    Answer Guide: K8s Secrets are base64-encoded (not encrypted by default), stored in etcd, and visible to anyone with RBAC read on the namespace. Better: AWS Secrets Manager + External Secrets Operator (syncs secrets from AWS to K8s). Enable EKS envelope encryption for etcd. Use IRSA to limit which pods access which secrets.
  10. How would you build observability for an EKS cluster?
    Answer Guide: Metrics: Prometheus + Grafana (or Amazon Managed Prometheus/Grafana), plus Karpenter and HPA metrics. Logs: Fluent Bit → CloudWatch Logs or OpenSearch. Traces: OpenTelemetry → X-Ray or Jaeger. Key metrics: pod CPU/memory, request latency (p50/p95/p99), error rates, node utilization, pending pods count.
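
The multi-team access answer above can be made concrete with a namespace-scoped Role and RoleBinding sketch; the namespace, rule set, and group name are illustrative, with the group assumed to be mapped from an IAM role via EKS access entries or the aws-auth ConfigMap:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backend-dev
  namespace: backend
rules:
  - apiGroups: ["", "apps"]     # "" = core API group (pods, services, configmaps)
    resources: ["pods", "deployments", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backend-dev-binding
  namespace: backend
subjects:
  - kind: Group
    name: backend-team          # K8s group mapped from an IAM role
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: backend-dev
  apiGroup: rbac.authorization.k8s.io
```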

Preparation Strategy

Kubernetes interviews test both conceptual understanding and operational depth. Be ready to explain the control plane architecture, troubleshoot pod scheduling issues, and discuss production concerns (security, observability, upgrades). Hands-on experience with kubectl troubleshooting commands is essential.