This lab is currently in Beta; content may be updated as we refine the material.
LAB · ADVANCED

IDP Observability and Operations

Build production-grade observability for your Internal Developer Platform using Prometheus, Grafana, Loki, OpenTelemetry, and SLO-based monitoring.

90 minutes

Lab Overview

🛠 Lab from the Platform Engineering Bootcamp. Used in Weeks 20 and 21.
Bootcamp landing page: https://academy.tekanaid.com/bootcamps/platform-engineering-bootcamp

Parent course(s):

  • Week 20: Building the Internal Developer Platform: Capstone Part 1 (slug: building-internal-developer-platform)
  • Week 21: Platform Hardening & Production Readiness: Capstone Part 2 (slug: platform-hardening-production-readiness)

🟡 Beta bootcamp lab. Hands-on instructions, check scripts, and solve scripts are in place. The lab is part of the running TaskFlow project, which grows across all 21 weeks of the bootcamp.

Implement comprehensive observability for a Kubernetes-based Internal Developer Platform. You'll deploy a full observability stack — metrics, logs, and traces — then layer on structured alerting, operational dashboards, and SLO tracking with error budgets.

Working on a real Minikube cluster, you'll install kube-prometheus-stack for metrics and alerting, Loki and Promtail for log aggregation, and the OpenTelemetry Collector with Jaeger for distributed tracing. You'll define PrometheusRules for IDP health alerts, build Grafana dashboards, write SLO recording rules, and simulate SLO burn events — skills that map directly to day-2 platform operations.
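
To give a feel for what a PrometheusRule for IDP health might look like, here is a minimal sketch that builds one manifest as a Python dictionary and prints it as JSON. It is not taken from the lab's own solve scripts. The rule name, group name, alert name, the `taskflow` and `monitoring` namespaces, and the `release` label value are all assumptions chosen for illustration; the metric `kube_deployment_status_replicas_available` comes from kube-state-metrics, which kube-prometheus-stack typically deploys.

```python
import json


def build_prometheus_rule(namespace: str, app_namespace: str) -> dict:
    """Build a PrometheusRule manifest containing one availability alert."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": "idp-health-rules",  # hypothetical name
            "namespace": namespace,
            # Assumption: the Prometheus instance selects rules by this label;
            # check the ruleSelector of your own installation.
            "labels": {"release": "kube-prometheus-stack"},
        },
        "spec": {
            "groups": [
                {
                    "name": "idp.health",  # hypothetical group name
                    "rules": [
                        {
                            "alert": "IDPDeploymentUnavailable",
                            # Fires when a deployment in the application
                            # namespace has no available replicas.
                            "expr": (
                                "kube_deployment_status_replicas_available"
                                f'{{namespace="{app_namespace}"}} < 1'
                            ),
                            "for": "5m",
                            "labels": {"severity": "critical"},
                            "annotations": {
                                "summary": "A deployment has no available replicas",
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Namespaces are placeholders; adjust them to match your cluster.
    print(json.dumps(build_prometheus_rule("monitoring", "taskflow"), indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` on a cluster where the Prometheus Operator CRDs are installed.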

Key Learning Objectives:

  • Deploy kube-prometheus-stack (Prometheus, Grafana, Alertmanager) with Helm
  • Aggregate platform logs using Loki and Promtail
  • Collect distributed traces with OpenTelemetry Collector and Jaeger
  • Write PrometheusRules and configure Alertmanager routing
  • Build operational Grafana dashboards for IDP health
  • Define SLOs with recording rules and visualize error budgets
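
The arithmetic behind the SLO and error budget objective can be sketched in a few lines of Python. This is an illustrative calculation only, not the lab's recording rules: the `SloWindow` type, the function names, and the request counts are hypothetical, and the SLO target of 99.9% is an assumed example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed over one SLO window (hypothetical numbers)."""
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * window.total_requests
    if allowed_failures <= 0:
        raise ValueError("no error budget: check the SLO target and request count")
    return 1.0 - window.failed_requests / allowed_failures


def burn_rate(window: SloWindow, slo_target: float) -> float:
    """Return the observed error rate divided by the error rate the SLO allows."""
    allowed_error_rate = 1.0 - slo_target
    if allowed_error_rate <= 0:
        raise ValueError("an SLO target of 100% allows no errors")
    if window.total_requests == 0:
        return 0.0
    observed_error_rate = window.failed_requests / window.total_requests
    return observed_error_rate / allowed_error_rate


if __name__ == "__main__":
    # Assumed example: 1,000,000 requests, 250 failures, 99.9% availability SLO.
    window = SloWindow(total_requests=1_000_000, failed_requests=250)
    print(f"budget remaining: {error_budget_remaining(window, 0.999):.2%}")
    print(f"burn rate: {burn_rate(window, 0.999):.2f}")
```

A burn rate of 1 means the budget would be used up exactly at the end of the window; values above 1 mean it runs out early, which is the condition an SLO burn alert is usually built to catch.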

What You'll Learn

  • Deploy kube-prometheus-stack using Helm with pinned chart versions
  • Query platform metrics using PromQL in Grafana
  • Aggregate Kubernetes logs centrally with Loki and Promtail
  • Collect and visualize distributed traces with OpenTelemetry and Jaeger
  • Write PrometheusRule manifests for IDP health alerting
  • Configure Alertmanager routing for on-call notification workflows
  • Build multi-panel Grafana dashboards for operational visibility
  • Define SLOs using Prometheus recording rules and visualize error budgets
