Gen AI Layers
Generative AI systems are built in three layers:
- Infrastructure Layer – The compute foundation that trains and runs models. Includes GPU/accelerator instances (e.g., EC2 P5 with NVIDIA GPUs, Trn1 with AWS Trainium, Inf2 with AWS Inferentia), storage (S3 for training data), and networking. AWS services: Amazon SageMaker, EC2 GPU instances, AWS Trainium/Inferentia chips.
- Model Layer – The foundation models (FMs) themselves. These are large pre-trained models such as Anthropic Claude, Meta Llama, Amazon Titan, and Stability AI's image models. Access them via Amazon Bedrock (managed, serverless access to multiple FMs) or self-host on SageMaker.
- Application Layer – The end-user-facing applications built on top of models. Includes prompt engineering, RAG pipelines, agents, chatbots, and code generators. AWS services: Amazon Q (AI assistant), PartyRock (no-code AI app builder), Bedrock Agents.
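Most Application Layer work reduces to calling a Model Layer FM through an API. A minimal sketch of that pattern using Bedrock's Converse API via boto3 (the model ID, prompt, and inference settings here are illustrative; the commented-out call requires AWS credentials and Bedrock model access in your account):

```python
def build_converse_request(model_id: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the keyword arguments for bedrock-runtime's converse()."""
    return {
        "modelId": model_id,
        "messages": [
            # Converse API messages: role + a list of content blocks
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",  # example Bedrock model ID
    "Summarize the three layers of a generative AI stack.",
)

# Actual invocation (needs AWS credentials and Bedrock access):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```

Because Bedrock is serverless, this is the entire integration: no endpoint to provision, and you are billed per input/output token on the call.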
Bedrock vs SageMaker – When to Use Which
| Aspect | Amazon Bedrock | Amazon SageMaker |
|---|---|---|
| Model Type | Pre-built FMs (Claude, Llama, Titan) | Custom models or fine-tuned FMs |
| Infrastructure | Fully serverless – no instances | You manage instances (ml.p4d, etc.) |
| Customization | Prompt engineering + RAG + light fine-tuning | Full training, custom architectures |
| Pricing | Pay per token (input/output) | Pay per instance hour |
| Best For | Application teams consuming AI | ML teams building/training models |
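The pricing rows drive most of the decision: pay-per-token scales to zero when idle, while a SageMaker endpoint bills for every hour it runs. A back-of-the-envelope comparison (all prices below are illustrative placeholders, not current AWS list prices):

```python
def bedrock_monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                         price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Pay per token: cost scales with usage, zero when idle."""
    per_request = (in_tokens / 1000 * price_in_per_1k
                   + out_tokens / 1000 * price_out_per_1k)
    return requests * per_request

def sagemaker_monthly_cost(instances: int, hourly_rate: float,
                           hours: float = 730.0) -> float:
    """Pay per instance-hour: flat cost whether the endpoint is busy or idle."""
    return instances * hourly_rate * hours

# 100k requests/month, ~1k tokens in / 500 out, placeholder token prices
api_cost = bedrock_monthly_cost(100_000, 1000, 500, 0.00025, 0.00125)
# one always-on GPU endpoint at a placeholder $5/hour
endpoint_cost = sagemaker_monthly_cost(1, 5.0)
```

At low or spiky traffic the token-metered path is far cheaper; the per-instance model only wins at sustained high utilization, which matches the "application teams vs ML teams" split in the table.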
🎯 Key Takeaway
Interview tip: Most companies operate at the Application Layer – consuming pre-trained models via APIs (Bedrock) rather than training their own. For SA interviews, know Bedrock (serverless FM access), Knowledge Bases (managed RAG), and Agents (autonomous task execution). Say: "Start with Bedrock for speed and cost. Move to SageMaker only when you need custom model training or specialized ML pipelines."
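The "managed RAG" point is worth being able to sketch: with a Knowledge Base, retrieval and answer generation collapse into a single API call. A minimal sketch, assuming a pre-built Knowledge Base (the knowledge base ID, model ARN, and question are placeholders; the commented-out call needs AWS credentials and an existing Knowledge Base):

```python
def build_rag_request(kb_id: str, model_arn: str, question: str) -> dict:
    """Assemble kwargs for bedrock-agent-runtime's retrieve_and_generate()."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,  # placeholder Knowledge Base ID
                "modelArn": model_arn,     # FM that composes the grounded answer
            },
        },
    }

request = build_rag_request(
    "KB12345678",  # placeholder
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
    "What does our architecture doc say about failover?",
)

# Actual invocation (needs AWS credentials and an existing Knowledge Base):
# import boto3
# client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
# resp = client.retrieve_and_generate(**request)
# print(resp["output"]["text"])
```

Contrast this with self-managed RAG on SageMaker, where you would run the embedding model, vector store, retrieval, and prompt assembly yourself.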