Streaming Interview Guide

Data streaming with Kinesis and MSK, the messaging decision framework, real-time analytics pipelines, and video streaming architectures.

6 Topics
Intermediate

SQS vs SNS vs Kinesis vs EventBridge

The single most-asked messaging question in architect interviews is knowing when to use which service:

| Aspect | SQS | SNS | Kinesis Data Streams | EventBridge |
| --- | --- | --- | --- | --- |
| Pattern | Queue (point-to-point) | Pub/sub (fan-out) | Ordered streaming | Event routing |
| Ordering | FIFO queues only (~3,000 msg/s with batching) | No | Yes (per shard) | No |
| Retention | Up to 14 days | None (fire-and-forget) | 1–365 days | Archive (retention configurable, up to indefinite) |
| Replay | No | No | Yes (from any position) | Yes (archive replay) |
| Consumers | Single consumer (or fan-out via one SQS queue per consumer) | Multiple subscribers | Multiple consumers (KCL, Lambda, Flink) | Multiple targets per rule |
| Throughput | Nearly unlimited (standard queues) | Millions/sec | 1 MB/s write, 2 MB/s read per shard | 2,400 events/sec default (soft limit, varies by region) |
| Best For | Work queues, decoupling, buffering | Simple fan-out, notifications | Real-time analytics, clickstream, IoT, log aggregation | Microservice events, SaaS integration, AWS service events |

Decision Framework

  • "I need to decouple two services" β†’ SQS
  • "One event, multiple consumers" β†’ SNS (simple) or EventBridge (with content filtering)
  • "I need ordering + replay + high throughput" β†’ Kinesis Data Streams
  • "I need smart event routing with filtering" β†’ EventBridge
  • "I need the Kafka ecosystem" β†’ Amazon MSK

🎯 Key Takeaway

Interview tip: "I choose the messaging service based on the pattern: SQS for work queues and decoupling, SNS for simple fan-out, Kinesis for ordered high-throughput streaming with replay, and EventBridge for smart event routing between microservices. The key differentiator is whether you need ordering (Kinesis), content-based filtering (EventBridge), or simple queue semantics (SQS)."

Intermediate

Kinesis Data Streams & Firehose

Kinesis is the core data streaming service on AWS. Understand the difference between Streams and Firehose:

Kinesis Data Streams vs Firehose

| Aspect | Kinesis Data Streams | Kinesis Data Firehose |
| --- | --- | --- |
| Purpose | Real-time data streaming with custom consumers | Managed data delivery to destinations |
| Latency | ~200 ms (real-time) | 60-second minimum buffer (near-real-time) |
| Consumers | Custom (KCL, Lambda, Flink); you write the processing logic | Auto-delivers to S3, Redshift, OpenSearch, Splunk |
| Scaling | Manual shard management (or on-demand mode) | Fully automatic |
| Replay | Yes (1–365 day retention, seek to any position) | No replay capability |
| Transform | You build consumers | Built-in Lambda transform, format conversion (JSON→Parquet), compression |
| Pricing | Per shard-hour ($0.015/hr) plus PUT payload units | Per GB ingested ($0.029/GB) |
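The "Transform" row is worth being able to sketch. A Firehose transformation Lambda receives base64-encoded records and must return each recordId with a result of Ok, Dropped, or ProcessingFailed. A minimal, locally runnable sketch; the log-level filter is an invented example:

```python
import base64
import json

def lambda_handler(event, context):
    """Firehose transformation Lambda: decode each record, keep only
    ERROR-level log lines, and append the newline delimiter that
    Firehose does not add between JSON records."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        if payload.get("level") == "ERROR":
            data = (json.dumps(payload) + "\n").encode()
            output.append({"recordId": record["recordId"],
                           "result": "Ok",
                           "data": base64.b64encode(data).decode()})
        else:
            # Dropped records must still be acknowledged by recordId.
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
    return {"records": output}

# Local smoke test with a hand-built event shape (no AWS needed).
fake_event = {"records": [
    {"recordId": "1", "data": base64.b64encode(
        json.dumps({"level": "ERROR", "msg": "boom"}).encode()).decode()},
    {"recordId": "2", "data": base64.b64encode(
        json.dumps({"level": "INFO", "msg": "ok"}).encode()).decode()},
]}
result = lambda_handler(fake_event, None)
print([r["result"] for r in result["records"]])  # ['Ok', 'Dropped']
```

Returning "Dropped" (rather than omitting the record) is what tells Firehose the record was filtered intentionally instead of lost.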

Shard Architecture

  • Each shard: 1 MB/s write; 2 MB/s read, capped at 5 read transactions/sec (shared across all consumers)
  • Enhanced Fan-Out: 2 MB/s per consumer per shard (dedicated throughput). Use when you have 3+ consumers.
  • Partition Key: Determines which shard gets the record. Use high-cardinality keys (user_id) to avoid hot shards.
  • On-Demand Mode (2021): Auto-scales shards, no capacity planning. Pay per GB. Best for unpredictable traffic.
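Kinesis routes a record by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash-key range contains it. A sketch of that mapping, assuming the shards evenly split the hash space:

```python
import hashlib

NUM_HASH_KEYS = 2 ** 128  # Kinesis hashes partition keys into a 128-bit space

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming the shards evenly
    split the 128-bit hash-key space (as after a uniform initial split)."""
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return h * shard_count // NUM_HASH_KEYS

# High-cardinality keys (user_id) spread load evenly; a constant key
# would send every record to one shard (a hot shard).
counts = [0] * 4
for user_id in range(10_000):
    counts[shard_for_key(f"user-{user_id}", 4)] += 1
print(counts)  # roughly 2,500 per shard
```

This also explains why a low-cardinality key creates a hot shard: every record with the same key lands on the same shard, no matter how many shards exist.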

🎯 Key Takeaway

Interview tip: "Kinesis Data Streams and Firehose solve different problems. Streams is for real-time processing where I need custom consumers, ordering, and replay β€” like processing clickstream data with Flink. Firehose is for managed delivery β€” like landing log data in S3 as compressed Parquet files with automatic format conversion. I often use both: Streams for real-time processing, feeding into Firehose for the batch/archive path."

Advanced

Amazon MSK (Managed Kafka)

Amazon MSK is managed Apache Kafka on AWS. Choose it when you need the Kafka ecosystem:

MSK vs Kinesis: Decision Framework

| Factor | Choose MSK When | Choose Kinesis When |
| --- | --- | --- |
| Existing expertise | Team already knows Kafka | Team prefers AWS-native, simpler APIs |
| Ecosystem | Need Kafka Connect, Kafka Streams, ksqlDB, Schema Registry | Need tight Lambda/Firehose integration |
| Retention | Unlimited (tiered storage to S3) | 1–365 days |
| Operations | More operational control (broker config, partition tuning) | Serverless, minimal operations |
| Multi-cloud | Kafka API is portable to any cloud | AWS-only (lock-in) |
| Consumer model | Consumer groups with offset management | KCL checkpointing or Lambda event source mappings |

MSK Serverless vs Provisioned

  • MSK Serverless: Auto-scales, no broker management, pay per data. Best for variable workloads and teams that don't want to manage Kafka infrastructure.
  • MSK Provisioned: Choose broker instances and config. Better price at sustained high throughput. Needed for advanced Kafka features.
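The consumer-group offset model in the table above is worth being able to explain concretely. A toy in-memory model (not the Kafka client API) showing why committing offsets only after processing yields at-least-once delivery; KCL checkpointing follows the same logic:

```python
# Simplified model of per-partition offset management. This is an
# illustration of the semantics, not real Kafka client code.
class PartitionConsumer:
    def __init__(self, log):
        self.log = log       # the partition's ordered record list
        self.committed = 0   # last committed offset

    def poll_and_process(self, process, crash_before_commit=False):
        """Process records from the committed offset onward."""
        for offset in range(self.committed, len(self.log)):
            process(self.log[offset])
            if crash_before_commit:
                return                     # crash: offset never committed
            self.committed = offset + 1    # commit after successful processing

seen = []
consumer = PartitionConsumer(["a", "b", "c"])
consumer.poll_and_process(seen.append, crash_before_commit=True)  # crashes after "a"
consumer.poll_and_process(seen.append)  # restart resumes from committed offset 0
print(seen)  # ['a', 'a', 'b', 'c']: "a" is reprocessed (at-least-once)
```

Committing before processing would flip the failure mode to at-most-once (the crashed record is skipped on restart), which is why consumers are expected to be idempotent.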

🎯 Key Takeaway

Interview tip: "I choose MSK over Kinesis when the team has Kafka expertise and needs the ecosystem β€” Kafka Connect for CDC from databases, Kafka Streams for stateful stream processing, or Schema Registry for event contracts. For teams without Kafka experience who want a fully managed serverless experience, Kinesis Data Streams with Lambda consumers is simpler and cheaper to operate."

Advanced

Stream Processing Patterns

Architecture patterns for processing streaming data at scale:

Lambda Architecture vs Kappa Architecture

| Aspect | Lambda Architecture | Kappa Architecture |
| --- | --- | --- |
| Paths | Two: speed layer (real-time) + batch layer (historical) | One: stream processing only |
| Speed layer | Kinesis → Flink → Redis | Kinesis → Flink → store |
| Batch layer | S3 → Glue/EMR → data warehouse | Replay the stream to reprocess |
| Complexity | Higher (two codepaths to maintain) | Lower (single processing path) |
| Best for | Batch and real-time views need different computation | The same processing logic works for both real-time and historical |

Processing Options on AWS

| Service | When to Use | Throughput |
| --- | --- | --- |
| Lambda (event source mapping) | Simple per-record transforms, enrichment, filtering | Up to 10 concurrent batches per shard |
| Managed Apache Flink | Windowed aggregations, complex event processing, stream joins | Parallel processing with managed scaling |
| ECS/EKS + KCL | Custom consumer logic, container-oriented teams, long-running stateful processors | One KCL worker lease per shard, custom scaling |
| Firehose + Lambda | Simple transform before delivery to S3/Redshift/OpenSearch | Auto-scaling, 60–900 s buffer |
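For the Lambda row, the event source mapping delivers batches of base64-encoded Kinesis records. A minimal handler, runnable locally against a hand-built event; the field names inside the payload are invented for the example:

```python
import base64
import json

def lambda_handler(event, context):
    """Lambda consumer behind a Kinesis event source mapping: decode
    each record and apply a toy per-record transform."""
    out = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        payload["page"] = payload["page"].upper()  # toy enrichment step
        out.append(payload)
    # Within one shard, records arrive in order; returning without an
    # exception advances the checkpoint past the whole batch.
    return out

fake_event = {"Records": [
    {"kinesis": {"partitionKey": "user-1",
                 "data": base64.b64encode(
                     json.dumps({"page": "/home"}).encode()).decode()}},
]}
print(lambda_handler(fake_event, None))  # [{'page': '/HOME'}]
```

An unhandled exception makes the event source mapping retry the batch, which is why these handlers should be idempotent or use bisect-on-error/partial batch responses.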

Windowing Strategies (For Flink)

  • Tumbling Window: Fixed size, non-overlapping (e.g., "count per 5-minute block"). Simplest.
  • Sliding Window: Overlapping (e.g., "trending in last 15 min, updated every 1 min"). Most common for real-time dashboards.
  • Session Window: Activity-based (e.g., "user session ends after 30 min of inactivity"). Used for user behavior analytics.
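Flink implements these windows for you, but the semantics are easy to demonstrate in plain Python. A sketch of tumbling vs sliding counts over (timestamp, key) events:

```python
from collections import defaultdict

def tumbling_counts(events, window_sec):
    """Count (timestamp_sec, key) events per fixed, non-overlapping window."""
    counts = defaultdict(int)
    for ts, key in events:
        counts[(ts // window_sec * window_sec, key)] += 1
    return dict(counts)

def sliding_counts(events, size_sec, slide_sec, horizon_sec):
    """Count events per overlapping window of size_sec, advancing every
    slide_sec. Written for clarity, not efficiency."""
    counts = defaultdict(int)
    for start in range(0, horizon_sec, slide_sec):
        for ts, key in events:
            if start <= ts < start + size_sec:
                counts[(start, key)] += 1  # event falls in this window
    return dict(counts)

clicks = [(1, "home"), (4, "home"), (6, "cart"), (11, "home")]
print(tumbling_counts(clicks, 5))
# {(0, 'home'): 2, (5, 'cart'): 1, (10, 'home'): 1}
print(sliding_counts(clicks, size_sec=10, slide_sec=5, horizon_sec=15))
```

Note that each event lands in exactly one tumbling window but in multiple sliding windows, which is why sliding windows cost more state and compute.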

🎯 Key Takeaway

Interview tip: "For stream processing, I match the tool to the complexity. Lambda for simple per-event transforms (filter, enrich, format). Managed Apache Flink for stateful processing β€” windowed aggregations, stream joins, complex event processing. For delivery to S3 or data warehouses, Kinesis Firehose with a Lambda transform handles batching, compression, and format conversion automatically."

Intermediate

Live Streaming

AWS architecture for live video streaming at scale; understanding this shows you can design media-heavy systems:

Architecture Flow (5 Steps)

  1. Ingest: A live video source (camera, encoder) sends an RTMP/SRT stream to AWS Elemental MediaLive.
  2. Transcode: MediaLive transcodes the stream into multiple bitrates and resolutions (ABR, adaptive bitrate), e.g., 1080p, 720p, 480p, 240p for different devices and network conditions.
  3. Package: AWS Elemental MediaPackage packages the transcoded stream into HLS/DASH formats and provides origin endpoints with DVR/time-shift/catch-up TV capabilities.
  4. Deliver: Amazon CloudFront (CDN) distributes the live stream globally to millions of viewers with low latency via 450+ edge locations.
  5. Play: Viewers watch on web, mobile, or smart TV using a video player that switches quality dynamically based on bandwidth (ABR).
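The ABR ladder from step 2 surfaces in the HLS master playlist that the packager serves; the player picks the highest rendition its measured bandwidth can sustain. An illustrative generator, where the bitrates, resolutions, and URIs are example values rather than actual MediaPackage output:

```python
# Build an HLS master playlist advertising an ABR ladder.
def master_playlist(renditions):
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for bandwidth, resolution, uri in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(uri)  # each variant points at its own media playlist
    return "\n".join(lines)

ladder = [
    (5_000_000, "1920x1080", "1080p/index.m3u8"),
    (3_000_000, "1280x720", "720p/index.m3u8"),
    (1_200_000, "854x480", "480p/index.m3u8"),
    (400_000, "426x240", "240p/index.m3u8"),
]
print(master_playlist(ladder))
```

BANDWIDTH is what the player compares against its throughput estimate when switching renditions mid-stream.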

MediaLive vs IVS: When to Use Which

| Aspect | MediaLive + MediaPackage + CloudFront | Amazon IVS |
| --- | --- | --- |
| Complexity | Full control, multiple services to configure | Fully managed, single API call |
| Latency | ~10–30 seconds (standard HLS) | ~2–5 seconds (low-latency) |
| Customization | Full control over transcoding, packaging, DRM | Limited, opinionated defaults |
| Best for | Broadcast TV, large-scale OTT (Netflix-like) | Interactive streams (Twitch-like), quick prototypes |
| DRM support | Yes (Widevine, FairPlay, PlayReady via MediaPackage) | No native DRM |
| Cost | Higher: pay per channel-hour plus egress | Lower: pay per input-hour plus viewer-hours |

Streaming Protocols (Quick Reference)

  • HLS (HTTP Live Streaming): Apple's standard. Works everywhere. 10–30 s latency. Use for broad compatibility.
  • DASH (Dynamic Adaptive Streaming over HTTP): Open standard, similar to HLS. Used by YouTube and Netflix.
  • LL-HLS (Low-Latency HLS): Apple's low-latency extension. 2–4 seconds. Use for near-real-time.
  • WebRTC: Sub-second latency. Use for video conferencing, not broadcast.

🎯 Key Takeaway

Interview tip: Know the full pipeline and when to simplify. Say: "For a broadcast-quality OTT platform, I'd use MediaLive → MediaPackage → CloudFront with ABR transcoding. For a quick interactive streaming feature (like in-app live video), Amazon IVS gives you 2-5 second latency with a single API call and no infrastructure to manage."

Advanced

Live Streaming with Ads

Server-side ad insertion (SSAI) is the enterprise approach to monetizing live streams while defeating ad blockers:

SSAI Architecture Flow

  1. Live stream + ad markers: MediaLive inserts SCTE-35 markers (cue points) into the video stream at designated ad-break positions.
  2. Ad decision server (ADS): When a viewer requests the stream, AWS Elemental MediaTailor detects the SCTE-35 markers and calls the ad decision server to fetch personalized ads for that specific viewer.
  3. Server-side stitching: MediaTailor transcodes the ad to match the stream's bitrate/format and stitches it directly into the video stream server-side. The viewer receives a single, seamless stream.
  4. Content delivery: CloudFront delivers the personalized stream (with stitched ads) to each viewer; each viewer may see different ads.

SSAI vs Client-Side Ads (CSAI)

| Aspect | SSAI (Server-Side) | CSAI (Client-Side) |
| --- | --- | --- |
| Ad blockers | ✅ Immune: ads are part of the video stream | ❌ Easily blocked |
| User experience | Seamless, no buffering between content and ads | May see loading and quality changes |
| Personalization | Per-viewer personalized ads | Per-viewer personalized ads |
| Measurement | Server-side tracking (more accurate) | Client-side pixels (can be blocked) |
| Complexity | Higher (MediaTailor + ADS integration) | Lower (JS SDK in the player) |
| Use case | Premium OTT, broadcast, sports | Web-only, short-form content |

SCTE-35 Markers: What They Are

SCTE-35 is the industry standard for signaling ad breaks in video streams. MediaLive can insert them automatically on a timer, or your automation system can trigger them via the MediaLive API at specific moments (e.g., between game quarters).
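At the manifest level, SCTE-35 signals commonly surface as CUE-OUT/CUE-IN decorations, and the stitcher swaps the content between them for ad segments. A toy sketch of that idea; real SSAI (MediaTailor) additionally handles per-viewer ad decisions and transcoding, and the marker strings here are simplified:

```python
# Replace the segments between a CUE-OUT and CUE-IN marker with ad
# segments, the core move of server-side ad stitching.
def stitch_ads(manifest_lines, ad_segments):
    out, in_break = [], False
    for line in manifest_lines:
        if line.startswith("#EXT-X-CUE-OUT"):
            in_break = True
            out.extend(ad_segments)  # splice ads in place of slate content
        elif line.startswith("#EXT-X-CUE-IN"):
            in_break = False         # ad break over, resume content
        elif not in_break:
            out.append(line)         # normal content passes through
    return out

manifest = ["seg1.ts", "#EXT-X-CUE-OUT:DURATION=30", "slate1.ts",
            "slate2.ts", "#EXT-X-CUE-IN", "seg2.ts"]
print(stitch_ads(manifest, ["ad1.ts", "ad2.ts"]))
# ['seg1.ts', 'ad1.ts', 'ad2.ts', 'seg2.ts']
```

Because the swap happens in the manifest the player fetches, the ad segments are indistinguishable from content segments, which is what defeats ad blockers.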

🎯 Key Takeaway

Interview tip: Say: "For ad monetization of live streams, I'd use SSAI with MediaTailor because ads are stitched server-side, making them ad-blocker resistant. MediaLive inserts SCTE-35 markers, MediaTailor detects them and fetches personalized ads from the ADS, then stitches them into the stream before CloudFront delivers it. Each viewer gets a unique, seamless experience."

Advanced

Interview Questions: Streaming & Analytics

Streaming questions cover both data streaming (Kinesis, MSK) and video streaming architecture. These test real-time data pipeline design.

  1. Answer Guide
    Kinesis Data Streams: ordered within a shard (by partition key), handles massive throughput, retention configurable from 24 hours (default) up to 365 days. SQS has no ordering (unless FIFO, which caps at ~3,000 msg/s with batching). SNS is pub/sub (no retention). EventBridge is for event routing, not high-throughput data streaming.
  2. Answer Guide
    MSK if: existing Kafka expertise, need Kafka ecosystem (Kafka Connect, Kafka Streams, ksqlDB), complex consumer group patterns, longer retention (unlimited). Kinesis if: fully serverless, tight AWS integration (Lambda, Firehose), simpler operations. Migration path: MSK preserves Kafka APIs, no code changes.
  3. Answer Guide
    Clickstream → Kinesis Data Streams (partitioned by user_id) → Managed Apache Flink (sliding-window aggregation over 15 min) → DynamoDB/ElastiCache for trending products → API Gateway + Lambda for the dashboard API. Alternative: Kinesis → Firehose → S3 for batch analytics with Athena.
  4. Answer Guide
    Use a composite partition key (product_id + random_suffix) to distribute across shards. Or use a hash-based partition key. Trade-off: you lose strict ordering for that product_id. Alternative: re-shard (split the hot shard), but this is temporary if the key distribution doesn't change.
  5. Answer Guide
    Kinesis provides at-least-once delivery. For exactly-once: use KCL (Kinesis Client Library) with checkpointing + DynamoDB for deduplication (idempotency key per record). Use enhanced fan-out for dedicated throughput per consumer. Discuss: Kafka with transactions can provide exactly-once semantics natively, a trade-off worth raising.
  6. Answer Guide
    IVS for simplicity (managed, sub-3s latency out-of-the-box, WebRTC-based). MediaLive stack for control (custom encoding profiles, DVR, DRM, SSAI ads, multi-CDN). At 100K viewers, IVS is simpler but less customizable. If you need ad insertion or premium features, go MediaLive stack.
  7. Answer Guide
    CloudWatch Agent on instances → CloudWatch Logs → subscription filter → Kinesis Data Firehose → OpenSearch (near-real-time indexing, ~30 s). Alternative: Fluent Bit → Kinesis Data Streams → Lambda/Flink → OpenSearch. Discuss: Firehose buffers (60–900 seconds), so tune the buffer interval down to 60 s for near-real-time.
  8. Answer Guide
    They solve different problems. Streams: real-time processing, custom consumers, replay, ordering. Firehose: fully managed delivery to S3/Redshift/OpenSearch with built-in transformation (Lambda), batching, compression, and format conversion (Parquet). Use Streams for real-time processing, Firehose for data delivery/ETL.
  9. Answer Guide
    Kinesis retains data from 24 hours (default) up to 365 days. Use TRIM_HORIZON or AT_TIMESTAMP to replay from before the outage. Idempotent consumers (dedup using DynamoDB conditional writes) ensure no duplicates. For Firehose: check S3 for already-delivered files. For Kafka/MSK: reset the consumer group offset to the outage timestamp.
  10. Answer Guide
    SSAI (MediaTailor) stitches ads into the video stream server-side: ad blockers can't detect them, quality matches seamlessly, and insertion is frame-accurate. CSAI relies on the client player to fetch ads, which ad blockers stop and which can buffer during transitions. SSAI uses SCTE-35 markers in the manifest to identify ad-break positions.
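The hot-partition fix from answer 4 can be demonstrated: salt the partition key with a bounded random suffix so one hot product_id spreads across several shards. Readers must merge the salt buckets back together, and strict per-key ordering is lost. The shard mapping below assumes shards evenly split the 128-bit hash space:

```python
import hashlib
import random

NUM_HASH_KEYS = 2 ** 128

def shard_for(key: str, shard_count: int) -> int:
    """Shard index for a partition key, assuming evenly split shards."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
    return h * shard_count // NUM_HASH_KEYS

def salted_key(product_id: str, salt_buckets: int = 8) -> str:
    """Composite key: spread one hot key over up to salt_buckets shards."""
    return f"{product_id}#{random.randrange(salt_buckets)}"

random.seed(42)  # deterministic for the demo
hot = "product-123"
plain_shards = {shard_for(hot, 8) for _ in range(1000)}
salted_shards = {shard_for(salted_key(hot), 8) for _ in range(1000)}
print(len(plain_shards), len(salted_shards))  # 1 shard vs several shards
```

The trade-off is exactly the one in the answer guide: writes fan out, but any consumer that needs all events for product-123 now has to read from every salt bucket.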

Preparation Strategy

Streaming questions test real-time data architecture. Know the decision framework: SQS (decoupling) vs SNS (fan-out) vs Kinesis (ordered streaming) vs EventBridge (event routing). For each, know throughput limits, retention periods, and ordering guarantees; interviewers will probe these specifics.