Description
Automated context-based document chunking and embedding for enhanced retrieval in RAG pipelines, powered by AI
Say goodbye to rigid splitting: this workflow segments documents into context-preserving chunks and stores them in Pinecone for semantically rich search and Retrieval-Augmented Generation (RAG).
What Problem Does It Solve?
Standard chunking in RAG pipelines often cuts text without regard to context, which hurts retrieval. This workflow automates context-aware chunking by combining section-based splitting logic with AI-generated context, so each chunk retains document-level meaning and the LLM receives more relevant passages at retrieval time.
It provides semantic-ready embeddings with full context on:
Document sections
Cross-referenced metadata
Meaning-preserving chunking
Enhanced semantic embeddings
How It Works
Pulls a structured document from Google Drive
Extracts text and detects section boundaries
Splits text into context-aware chunks using code logic
Loops through each chunk for individual processing
Uses GPT-4o-mini via OpenRouter to generate a succinct context summary for each chunk
Prepends the AI-generated context to each chunk
Embeds the enriched chunks using Google Gemini (text-embedding-004)
Stores the embeddings in a Pinecone vector store with metadata
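The enrichment steps above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual node code: `build_context_prompt`, `enrich_chunk`, and the `generate_context` callable are hypothetical names, the prompt wording is an assumption, and the LLM call is replaced by a stub so the sketch runs without API credentials.

```python
from typing import Callable


def build_context_prompt(document: str, chunk: str) -> str:
    """Build a prompt asking the LLM to situate a chunk within its document.

    The wording is an assumption; the workflow's real prompt may differ.
    """
    return (
        "<document>\n" + document + "\n</document>\n"
        "<chunk>\n" + chunk + "\n</chunk>\n"
        "Write one or two sentences that situate this chunk within the "
        "document, to improve search retrieval of the chunk."
    )


def enrich_chunk(document: str, chunk: str,
                 generate_context: Callable[[str], str]) -> str:
    """Prepend AI-generated context to a chunk before it is embedded."""
    context = generate_context(build_context_prompt(document, chunk)).strip()
    return f"{context}\n\n{chunk}"


def fake_llm(prompt: str) -> str:
    # Stand-in for the OpenRouter chat call (hypothetical stub).
    return "This chunk comes from the refund policy section."


document = "Refund policy. Refunds are issued within 14 days."
chunk = "Refunds are issued within 14 days."
enriched = enrich_chunk(document, chunk, fake_llm)
print(enriched)
```

Passing the model call in as a function keeps the prepend logic testable on its own; in a real run the stub would be swapped for whichever chat-completion client is configured.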
Key Features
Automatically fetches documents from Google Drive
Uses GPT-4o-mini via OpenRouter to generate contextual metadata
Prepends context to boost semantic relevance
Improves retrieval accuracy in RAG workflows
Creates AI-enriched vector representations with Google Gemini
Stores structured embeddings in Pinecone with traceable metadata
Scales across document types and projects
Built-in error handling and modular design
What You Need
Google Drive file with structured sections
Pinecone account and index
OpenRouter or OpenAI API access (GPT-4o-mini)
Google Gemini API key (for embeddings)
n8n setup for automation
(Optional) YouTube link for demo & visualization
Setup Instructions
Connect the workflow to your source folder in Google Drive
Add your OpenAI/OpenRouter and Gemini API credentials
Use structured text markers such as [SECTIONEND] for clean chunking
Loop through the sections and enrich each one with AI-generated context
Generate embeddings using Gemini's text-embedding-004 model
Store the final vectors in Pinecone, including both the original text and the enriched context
Test with a small document before scaling
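As a rough sketch of the marker-based splitting and of the data stored alongside each vector, assuming sections are separated by the literal `[SECTIONEND]` marker: the helper names, the `embed` callable, and the metadata field names below are hypothetical. The record layout (`id`, `values`, `metadata`) follows the general shape of a Pinecone vector record, and no network calls are made.

```python
from typing import Callable, Dict, List

SECTION_MARKER = "[SECTIONEND]"


def split_sections(text: str) -> List[str]:
    """Split on the marker and drop empty or whitespace-only pieces."""
    return [part.strip() for part in text.split(SECTION_MARKER) if part.strip()]


def build_record(doc_id: str, index: int, original: str, enriched: str,
                 embed: Callable[[str], List[float]]) -> Dict:
    """Assemble one vector record; the enriched text is what gets embedded."""
    return {
        "id": f"{doc_id}-{index}",
        "values": embed(enriched),
        "metadata": {
            "doc_id": doc_id,
            "chunk_index": index,
            "original_text": original,
            "enriched_text": enriched,
        },
    }


def fake_embed(text: str) -> List[float]:
    # Placeholder for the Gemini embedding call (hypothetical stub).
    return [float(len(text)), 0.0]


text = "Intro text.[SECTIONEND]\nPricing details.[SECTIONEND]\n"
sections = split_sections(text)
records = [
    build_record("doc1", i, s, "Context. " + s, fake_embed)
    for i, s in enumerate(sections)
]
print(sections)
print(records[0]["id"])
```

Keeping the original text in the metadata next to the enriched text makes it possible to show users the unmodified passage at query time while still searching against the context-enriched embedding.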
Integrations
Google Drive
OpenRouter (GPT-4o-mini)
Google Gemini
Pinecone Vector Store
n8n