Description
Build a smart crawler with agentic capabilities that extracts structured data from any website, all powered by n8n, OpenAI, and Supabase.
Perfect for anyone who needs targeted information (social links, contact pages, and so on) from websites without manual digging.
How It Works:
1. Input URL – Provide any website to start from
2. Link Collection – The AI agent fetches all available links using the URLs Tool
3. Text Analysis – It then scrapes and analyzes page text using the Text Tool (both tools are sketched below)
4. AI Agent Logic – The OpenAI-powered agent interprets the structure and finds the info you need (e.g., social links, emails, contact details)
5. Store Output – Results are saved in Supabase (or your preferred DB) for later use
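The two tools in steps 2 and 3 boil down to simple HTTP fetches. Here is a rough TypeScript sketch only; the collectLinks/extractText names, the example URL, and the regex-based HTML handling are illustrative simplifications, not the template's actual nodes.

```typescript
// Hypothetical sketch of what the URLs Tool and Text Tool do under the hood.
// Uses only the built-in fetch API (Node 18+); error handling kept minimal.

async function collectLinks(url: string): Promise<string[]> {
  // URLs Tool: fetch the page and pull every href out of the raw HTML.
  const html = await (await fetch(url)).text();
  const hrefs = [...html.matchAll(/href="([^"#]+)"/g)].map((m) => m[1]);
  // Resolve relative paths against the page URL, keep http(s) links, de-duplicate.
  const absolute = hrefs
    .map((href) => { try { return new URL(href, url).toString(); } catch { return null; } })
    .filter((u): u is string => u !== null && u.startsWith("http"));
  return [...new Set(absolute)];
}

async function extractText(url: string): Promise<string> {
  // Text Tool: fetch the page and strip tags so the agent sees plain text.
  const html = await (await fetch(url)).text();
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Example usage (hypothetical URL):
// const links = await collectLinks("https://example.com");
// const text  = await extractText(links[0]);
```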
Use Cases:
Social Media Link Extraction – Pull all available social handles from websites
Contact Info Mining – Auto-fetch emails, phone numbers, and contact forms
Lead Enrichment – Feed company URLs and get back useful business info
Content Harvesting – Extract text, page titles, or custom schema-based content
Niche Site Crawling – Explore multiple subpages to locate specific information
Why It's Smart:
Agentic Navigation – The AI crawls subpages like a human would
Modular Tools – Customize the schema, prompt, and tools to extract exactly what you need
Self-Looping Logic – It can iterate across linked pages autonomously
LLM Reasoning – Understands context to find relevant data, not just keywords
Works with Supabase – Easy to extend into full-stack workflows with SQL or RPC triggers (a minimal storage sketch follows this list)
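To make the Supabase piece concrete, here is a minimal storage sketch using the official @supabase/supabase-js client. The crawl_results table and its url/data columns are assumptions, so adapt them to whatever schema your project uses.

```typescript
// Minimal sketch, assuming a Supabase table "crawl_results" with
// url (text) and data (jsonb) columns; names are illustrative.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,      // your project URL
  process.env.SUPABASE_ANON_KEY!  // or a service-role key for server-side workflows
);

export async function storeResult(url: string, data: Record<string, unknown>) {
  // Persist whatever structured object the agent returned for this URL.
  const { error } = await supabase.from("crawl_results").insert({ url, data });
  if (error) throw error;
}
```

A jsonb column keeps storage flexible while you iterate on the extraction schema, and a Postgres trigger or RPC on the same table can kick off downstream steps.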
Level of Effort:
Moderate – Basic setup takes ~15–20 minutes; advanced configuration (like a custom JSON schema) adds depth
You'll Need:
OpenAI API Key (for the agent's reasoning and schema-based extraction)
Supabase Project (to store input URLs and extracted results)
Prompt + JSON Schema (to tell the agent what data to return; a sample schema follows this list)
Website URLs (entered manually or pulled from a trigger-based source like Airtable, Notion, or a CSV)
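To make the "Prompt + JSON Schema" requirement concrete, one possible extraction schema and call looks like this, using OpenAI's structured-output (json_schema) response format; the field names, model choice, and the extract helper are illustrative, not prescribed by the template.

```typescript
// Illustrative JSON Schema for social-link / contact extraction; adjust fields as needed.
import OpenAI from "openai";

const extractionSchema = {
  name: "site_contact_info",
  schema: {
    type: "object",
    properties: {
      emails:        { type: "array", items: { type: "string" } },
      phone_numbers: { type: "array", items: { type: "string" } },
      social_links:  { type: "array", items: { type: "string" } },
      contact_page:  { type: ["string", "null"] },
    },
    required: ["emails", "phone_numbers", "social_links", "contact_page"],
    additionalProperties: false,
  },
  strict: true,
};

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function extract(pageText: string) {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini", // any model that supports structured outputs
    messages: [
      { role: "system", content: "Extract contact details from the supplied page text." },
      { role: "user", content: pageText },
    ],
    response_format: { type: "json_schema", json_schema: extractionSchema },
  });
  return JSON.parse(res.choices[0].message.content ?? "{}");
}
```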
Customization Tip:
Want to extract contact info and blog post metadata? Duplicate the crawler and tweak its schema (see the variant sketch below). You can even use the outputs to auto-populate CRMs, send emails, or track site changes via alerts.
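For example, a duplicated crawler aimed at blog metadata could simply swap in a variant schema like this (field names again purely illustrative):

```typescript
// Variant schema for a duplicated crawler that targets blog post metadata.
const blogMetadataSchema = {
  name: "blog_post_metadata",
  schema: {
    type: "object",
    properties: {
      title:        { type: "string" },
      author:       { type: ["string", "null"] },
      published_at: { type: ["string", "null"] }, // ISO date string when available
      tags:         { type: "array", items: { type: "string" } },
    },
    required: ["title", "author", "published_at", "tags"],
    additionalProperties: false,
  },
  strict: true,
};
```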
Give Your Bots a Brain: Crawl Smarter, Not Harder
From static pages to structured insights: let autonomous AI agents crawl the web for you.