
Find anything in a picture.


Delivery Time 1-3 Days
Response Time 1 Day
English Level Basic level

Description

This n8n template shows how to use Google Gemini 2.0's prompt-based bounding box detection to find objects in images using natural language.

Instead of pre-trained object classes, you can now ask things like:
🔍 “Draw a box around adults holding children”
🚗 “Detect all improperly parked cars”
🐰 “Find all bunnies in this image”

It’s contextual, flexible, and requires zero model training.

⚙️ How It Works
🖼️ Image is fetched via HTTP request
📏 Width & height are extracted using the “Edit Image” node
🧠 Image + prompt sent to Gemini 2.0 Vision API
🧮 Bounding box coordinates returned based on prompt
📐 Coordinates re-scaled to match original image size
🖊️ Boxes drawn back onto image using “Edit Image” for visualization
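
The re-scaling step above can be sketched as follows. This is a minimal illustration, not the template's actual code: it assumes Gemini returns each box as [ymin, xmin, ymax, xmax] normalised to a 0-1000 range (the convention described in Gemini's documentation), and the names `PixelBox` and `rescale_box` are hypothetical.

```python
from typing import NamedTuple, Sequence


class PixelBox(NamedTuple):
    """A bounding box in pixel coordinates of the original image."""
    left: int
    top: int
    right: int
    bottom: int


def rescale_box(box_2d: Sequence[float], width: int, height: int,
                scale: int = 1000) -> PixelBox:
    """Map a normalised box onto an image of the given pixel size.

    Assumption: box_2d is ordered [ymin, xmin, ymax, xmax] and each value
    lies in the range 0..scale.
    """
    ymin, xmin, ymax, xmax = box_2d
    return PixelBox(
        left=round(xmin * width / scale),    # x values scale with the width
        top=round(ymin * height / scale),    # y values scale with the height
        right=round(xmax * width / scale),
        bottom=round(ymax * height / scale),
    )
```

For a 1920x1080 image, `rescale_box([100, 200, 500, 800], 1920, 1080)` gives left=384, top=108, right=1536, bottom=540. The width and height come from the “Edit Image” step earlier in the flow.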

💡 Use Case Ideas
🛡️ Smart content moderation (e.g. detect @s, alcohol, etc.)
🛒 Retail shelf monitoring (e.g. highlight missing products)
📸 Security footage analysis (e.g. detect crowd clusters)
🖥️ UX feedback review (e.g. locate logos/UI issues in screenshots)

🔧 Requirements
✅ Google Gemini 2.0 with Vision API access
✅ Any image input (via URL or upload)
✅ n8n instance with HTTP + Edit Image nodes

🧪 Customization Tips
This is a demo of an experimental Gemini 2.0 feature, so tweak it as you like:
✍️ Change prompts (e.g. “highlight dogs wearing clothes”)
📤 Swap image sources (uploads, Telegram, forms, etc.)
🗂️ Customize output (upload to Drive, send via email/Telegram)
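
To show why changing the prompt or the image source is a small change, here is a rough sketch of the request body that an HTTP node might send to Gemini. It is an assumption-laden illustration, not the template's actual configuration: the field names follow the public Gemini REST examples and should be checked against the current API reference, and `build_request` is a hypothetical helper.

```python
import base64


def build_request(prompt: str, image_bytes: bytes,
                  mime_type: str = "image/jpeg") -> dict:
    """Build a JSON-serialisable body carrying one prompt and one image.

    Assumption: the endpoint accepts inline base64 image data next to the
    text prompt and can be asked to reply with JSON.
    """
    return {
        "contents": [{
            "parts": [
                # The natural-language instruction, e.g. a detection prompt.
                {"text": prompt},
                # The image itself, base64-encoded for transport in JSON.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }],
        # Ask for a JSON reply so the boxes can be parsed downstream.
        "generationConfig": {"response_mime_type": "application/json"},
    }
```

Only the `prompt` argument changes when you try a new detection prompt, and only `image_bytes` changes when the image arrives from an upload, Telegram, or a form instead of a URL.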

⚠️ Note: Production use not advised until Gemini 2.0 Vision becomes stable

Project Link: https://preview--gemini-vision-prompt-boxes.lovable.app/