LAB | INTERMEDIATE

Document Ingestion Pipeline

Build a document ingestion pipeline for Markdown and PDF documents with chunking strategies, metadata extraction, and vector storage.

60 minutes
ai-infrastructure/rag

Lab Overview

This hands-on lab teaches you to build a complete document ingestion pipeline.

You'll learn to:

  • Create document loaders for Markdown and text files
  • Implement recursive and markdown-aware chunking strategies
  • Extract metadata from document structure
  • Embed chunks and store them in a Chroma vector database
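As a rough illustration of the chunking and metadata steps listed above, here is a minimal sketch using only the Python standard library. It is not the lab's own code: the names `Chunk`, `split_recursive`, and `chunk_markdown`, the `max_chars` limit, and the metadata keys (`source`, `heading_path`, `chunk_index`) are all assumptions made for this example. The sketch splits a Markdown document at headings, records the heading trail as metadata, and falls back to a recursive split (paragraphs, then lines, then words) when a section is too long.

```python
import re
from dataclasses import dataclass, field

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def split_recursive(text: str, max_chars: int,
                    separators: tuple = ("\n\n", "\n", " ")) -> list:
    """Split text on the coarsest separator that keeps pieces within max_chars."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        return split_recursive(text, max_chars, finer)
    pieces, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current.strip():
            pieces.append(current)
        if len(part) > max_chars:
            # This part alone is too long: retry with finer separators.
            pieces.extend(split_recursive(part, max_chars, finer))
            current = ""
        else:
            current = part
    if current.strip():
        pieces.append(current)
    return pieces


def chunk_markdown(text: str, source: str, max_chars: int = 500) -> list:
    """Split Markdown at headings and attach the heading trail as metadata."""
    chunks = []
    heading_path = []  # stack of (level, title) for the current section
    buffer = []
    in_fence = False

    def flush():
        body = "\n".join(buffer).strip()
        buffer.clear()
        if not body:
            return
        path = " > ".join(title for _, title in heading_path)
        for piece in split_recursive(body, max_chars):
            chunks.append(Chunk(piece.strip(), {
                "source": source,
                "heading_path": path,
                "chunk_index": len(chunks),
            }))

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines starting with '#' inside a code fence are not headings.
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while heading_path and heading_path[-1][0] >= level:
                heading_path.pop()
            heading_path.append((level, match.group(2).strip()))
        else:
            buffer.append(line)
    flush()
    return chunks
```

Calling `chunk_markdown(open("guide.md").read(), "guide.md")` would return a list of `Chunk` objects whose `heading_path` metadata looks like `Guide > Install`. Keeping the heading trail with each chunk is one common way to preserve context that is otherwise lost when a section is cut into pieces.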

This lab connects to the vector database you deployed previously.
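For orientation, the storage step might look roughly like the sketch below. It assumes the previously deployed database is a Chroma server reachable over HTTP. The host, port, collection name, ID scheme, and the helper names `build_records`, `store_chunks`, and `connect` are hypothetical and not taken from the lab. The sketch relies on `chromadb.HttpClient`, `get_or_create_collection`, and `Collection.upsert` from the `chromadb` Python package, and on the collection's default embedding function to embed the documents when no embeddings are passed.

```python
import hashlib


def build_records(chunks, source):
    """Turn (text, metadata) pairs into the parallel lists Chroma expects."""
    ids, documents, metadatas = [], [], []
    for index, (text, metadata) in enumerate(chunks):
        # Deterministic ID (an assumed scheme): re-ingesting the same file
        # overwrites existing entries instead of duplicating them.
        raw = f"{source}:{index}:{text}".encode("utf-8")
        digest = hashlib.sha256(raw).hexdigest()[:16]
        ids.append(f"{source}-{index}-{digest}")
        documents.append(text)
        metadatas.append({**metadata, "source": source, "chunk_index": index})
    return {"ids": ids, "documents": documents, "metadatas": metadatas}


def store_chunks(collection, chunks, source, batch_size=100):
    """Upsert chunks in batches; returns the number of chunks written."""
    records = build_records(chunks, source)
    total = len(records["ids"])
    for start in range(0, total, batch_size):
        end = start + batch_size
        collection.upsert(
            ids=records["ids"][start:end],
            documents=records["documents"][start:end],
            metadatas=records["metadatas"][start:end],
        )
    return total


def connect(host="localhost", port=8000, name="docs"):
    """Open a collection on a running Chroma server (assumed host/port/name)."""
    import chromadb  # third-party: pip install chromadb

    client = chromadb.HttpClient(host=host, port=port)
    return client.get_or_create_collection(name=name)
```

A typical call would be `store_chunks(connect(), [("Run the installer.", {"heading_path": "Guide > Install"})], "guide.md")`. Because `store_chunks` only needs an object with an `upsert` method, it can be exercised against a stub collection before pointing it at the real database.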

Prerequisites

vector-database-deployment

chunking-strategies-lesson

Technologies Covered

document-processing, chunking, embeddings, chroma, python, rag

Part of a Course

This lab is part of the RAG Architectures and Vector Databases course

