Description
Run A/B tests on your AI agent's prompts in production with this smart, database-driven workflow. Randomly assign chat sessions to either a baseline or an experimental prompt, track results, and compare outcomes, all from within n8n.
Perfect for product teams, prompt engineers, or researchers optimizing LLM responses for quality, engagement, or conversions.
🧠 How It Works:
⚙️ New message arrives in chat → 🔍 Check if the session ID already exists in Supabase → 🎲 If new, randomly assign a prompt (baseline or alternative) → 🗄️ Store the session ID and assigned prompt in Supabase → 💬 Generate a response using the assigned prompt via OpenAI (or a compatible model) → 📈 Track performance and compare results across sessions
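The assign-or-reuse step is the heart of the workflow. Here is a minimal standalone sketch of that logic using the supabase-js client; in the template itself the same steps are performed by n8n's Supabase and IF nodes. The table name `ab_sessions`, its columns, and the two prompt strings are illustrative assumptions, not part of the template.

```typescript
// Minimal sketch of assign-or-reuse session logic (names are illustrative).
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!, // assumed env vars; use n8n credentials in practice
  process.env.SUPABASE_KEY!
);

// Hypothetical prompt variants; swap in your real baseline/experimental prompts.
const PROMPTS: Record<string, string> = {
  baseline: "You are a helpful assistant.",
  alternative: "You are a concise, upbeat assistant.",
};

// Look up the session; if unseen, flip a coin, persist the choice, and
// return it. Every later message in the session reuses the stored variant.
async function getPromptForSession(sessionId: string): Promise<string> {
  const { data, error } = await supabase
    .from("ab_sessions") // assumed table: session_id text PK, prompt_variant text
    .select("prompt_variant")
    .eq("session_id", sessionId)
    .maybeSingle();
  if (error) throw error;

  let variant = data?.prompt_variant;
  if (!variant) {
    variant = Math.random() < 0.5 ? "baseline" : "alternative"; // 🎲 random assignment
    const { error: insertError } = await supabase
      .from("ab_sessions")
      .insert({ session_id: sessionId, prompt_variant: variant });
    if (insertError) throw insertError;
  }
  return PROMPTS[variant];
}
```

The returned string is wired into the LLM node as the system prompt, so a session keeps the same variant for its entire lifetime.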
🔁 It Automates:
✅ Random assignment of new chat sessions to control/test prompts
✅ Consistent prompt use throughout a session
✅ Database-backed session tracking (no cookies or external state needed)
✅ Structured prompt experimentation within your live agent pipeline
✅ Simple scaling to multiple variants (A/B/C...) with minimal changes (see the sketch below)
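Scaling past two variants only touches the assignment step. A minimal sketch, assuming illustrative variant names:

```typescript
// Going from A/B to A/B/C/... is a one-line change to the assignment:
// pick uniformly from however many variants you define.
const VARIANTS = ["baseline", "alt_tone", "alt_structure"]; // illustrative names

function assignVariant(): string {
  return VARIANTS[Math.floor(Math.random() * VARIANTS.length)];
}
```

Everything downstream (the Supabase lookup, the prompt injection) stays unchanged, since it only reads whichever variant name was stored.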
💡 Why Choose This Workflow:
✅ Run prompt experiments without writing backend code
✅ Persistently associate sessions with prompt variants
✅ Compare model behavior across subtle prompt changes
✅ Improve your AI agent iteratively, based on real-world usage
✅ Easy to extend with logging, metrics collection, or user feedback
🤖 Who Is This For:
✅ AI teams optimizing prompt design
✅ UX researchers testing conversational tone or style
✅ Product managers experimenting with feature wording
✅ Developers comparing OpenAI parameters such as temperature or system prompts
✅ Educators or researchers running controlled LLM experiments
🔌 Integrations:
✅ Supabase (stores session-prompt assignments)
✅ OpenAI / Anthropic / Ollama (handles LLM responses)
✅ n8n Chat UI (for internal testing or embedded chat)
✅ Optional logging or analytics tools (PostHog, Segment, etc.; a minimal logging sketch follows below)
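To compare variants you need outcome data per session. Here is a hedged sketch of a logging helper, assuming a hypothetical `ab_events` table (session_id text, event text, created_at timestamptz default now()); the template leaves this layer up to you.

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Record one outcome event (e.g. "thumbs_up", "converted") for a session.
async function logEvent(sessionId: string, event: string): Promise<void> {
  const { error } = await supabase
    .from("ab_events") // assumed table; join to ab_sessions on session_id for analysis
    .insert({ session_id: sessionId, event });
  if (error) throw error;
}
```

Counting events per prompt_variant (joined through `ab_sessions`) gives a simple win rate per prompt; the same events can also be forwarded to PostHog or Segment.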
🧪 Run Clean Prompt Experiments, No Guesswork
With persistent A/B prompt testing inside your workflow, you can stop guessing and start optimizing what your AI says, how it says it, and what performs best.
Link: https://lovable.dev/projects/7026e079-73eb-43a6-bf3b-b256d6b9c271