LAB · INTERMEDIATE
Document Ingestion Pipeline
Build a document ingestion pipeline for Markdown and PDF documents with chunking strategies, metadata extraction, and vector storage.
60 minutes
ai-infrastructure/rag

Lab Overview
This hands-on lab teaches you to build a complete document ingestion pipeline.
You'll learn to:
- Create document loaders for Markdown and text files
- Implement recursive and markdown-aware chunking strategies
- Extract metadata from document structure
- Embed chunks and store them in a Chroma vector database
This lab connects to the vector database you deployed previously.
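To give a feel for the chunking and metadata steps listed above, here is a minimal sketch using only the Python standard library. It is not the lab's own code: the names `Chunk`, `split_markdown_sections`, `recursive_split`, and `chunk_markdown`, the 500-character default, and the separator order are all assumptions chosen for illustration. The sketch splits a Markdown document at its headings, then recursively splits oversized sections on progressively finer separators, and attaches the heading path to each chunk as metadata. Embedding and storage are left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical container for one chunk and its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


# ATX-style Markdown heading: one to six '#' characters, then a title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(markdown):
    """Yield (heading_path, body) for each heading-delimited section."""
    path = []        # (level, title) pairs for the current heading trail
    body = []        # lines collected for the current section
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '#' lines inside code fences
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            if any(l.strip() for l in body):
                yield [t for _, t in path], "\n".join(body).strip()
            body = []
            level = len(match.group(1))
            # Keep only ancestors that are shallower than the new heading.
            path = [(lv, t) for lv, t in path if lv < level]
            path.append((level, match.group(2)))
        else:
            body.append(line)
    if any(l.strip() for l in body):
        yield [t for _, t in path], "\n".join(body).strip()


def recursive_split(text, max_chars, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most max_chars, coarse separators first."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    pieces, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            pieces.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # This part alone is too long: retry with finer separators.
            pieces.extend(recursive_split(part, max_chars, finer))
    if current:
        pieces.append(current)
    return [p for p in pieces if p.strip()]


def chunk_markdown(markdown, source, max_chars=500):
    """Markdown-aware chunking: split by heading, then recursively by size."""
    chunks = []
    for heading_path, body in split_markdown_sections(markdown):
        for piece in recursive_split(body, max_chars):
            chunks.append(Chunk(
                text=piece,
                metadata={
                    "source": source,
                    "section": " > ".join(heading_path),
                    "chunk_index": len(chunks),
                },
            ))
    return chunks


if __name__ == "__main__":
    sample = ("# Guide\n\nIntro text.\n\n## Install\n\nRun the installer.\n\n"
              "## Usage\n\n" + "word " * 50)
    for chunk in chunk_markdown(sample, source="guide.md", max_chars=80):
        print(chunk.metadata["section"], "|", len(chunk.text))
```

The metadata values are deliberately flat strings and integers, which tends to be the easiest shape to pass along when chunks are later embedded and written to a vector store. Chunk overlap is omitted to keep the sketch short.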
Prerequisites
vector-database-deployment
chunking-strategies-lesson
Technologies Covered
document-processing, chunking, embeddings, chroma, python, rag
Part of a Course
This lab is part of the RAG Architectures and Vector Databases course