Description
Turn voice messages into summarized, translated text or images, all through your Telegram bot.
🎯 No more transcribing or translating manually. This automation uses OpenAI Whisper and GPT-4 to process voice messages and send back summaries, translations, or generated media.
🧠 What Problem Does It Solve?
Listening to and managing voice messages is inefficient and hard to organize. This workflow automates the process by transcribing, summarizing, translating, and replying within seconds.
It handles:
🔹 Voice-to-text transcription
🔹 English + Hindi translation
🔹 Summarization using GPT-4
🔹 Optional HTML or image generation
⚙️ How It Works
🎤 Detects voice messages sent to your Telegram Bot
📥 Downloads .oga voice files from Telegram
🔁 Converts .oga to .mp3 using FFmpeg
🧾 Transcribes audio to text using OpenAI Whisper
🧠 Sends transcription to GPT-4 for summarization and translation
🌐 Detects commands like “generate image” or “make HTML”
🖼️ Uses DALL·E to generate visuals if requested
📤 Sends the reply back to the Telegram chat with summaries, translations, or media
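The command-detection step above can be sketched as a small routing function. This is a minimal Python sketch, not the workflow's actual node logic: the function name detect_intent, the route labels, and the regular expressions are assumptions for illustration. Only the phrases “generate image” and “make HTML” come from this description. In n8n, the same check would more likely live in an IF or Switch node.

```python
import re

# Hypothetical patterns built around the two phrases named in the description.
IMAGE_PATTERN = re.compile(r"\bgenerate\s+(an\s+)?image\b", re.IGNORECASE)
HTML_PATTERN = re.compile(r"\bmake\s+(an\s+)?html\b", re.IGNORECASE)


def detect_intent(transcript: str) -> str:
    """Return 'image', 'html', or 'summary' for a transcribed voice message."""
    if IMAGE_PATTERN.search(transcript):
        return "image"
    if HTML_PATTERN.search(transcript):
        return "html"
    # Default branch: plain summarization and translation.
    return "summary"
```

A transcript such as "Please generate image of a sunset" would take the image branch, while anything without a recognized command falls through to summarization.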
✨ Key Features
💬 Works with all voice messages under 1 minute
🧠 Uses GPT-4 to analyze, translate, and summarize speech
🗂️ Multi-language support (Hindi + English)
🖼️ Supports DALL·E for AI-generated image responses
📩 Full conversation happens inside Telegram
🛠️ Error handling + fallback in case of failed transcription
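The length limit and the transcription fallback could look roughly like the following. This is a sketch under stated assumptions: the function names, the reason strings, and the fallback wording are hypothetical and not taken from the workflow. The only grounded parts are the under-one-minute limit from this description and the duration field (in seconds) that Telegram includes in a message's voice object.

```python
from typing import Optional, Tuple

MAX_DURATION_SECONDS = 60  # "under 1 minute", per this description

# Hypothetical fallback wording; the workflow's actual message may differ.
FALLBACK_TEXT = "Sorry, I could not transcribe that message. Please try again."


def check_voice(message: dict) -> Tuple[bool, str]:
    """Decide whether a Telegram message should be processed as voice input."""
    voice = message.get("voice")
    if voice is None:
        return False, "not a voice message"
    # Telegram reports voice.duration in seconds.
    if voice.get("duration", 0) >= MAX_DURATION_SECONDS:
        return False, "too long"
    return True, "ok"


def reply_text(transcript: Optional[str]) -> str:
    """Return the transcript, or the fallback when transcription failed."""
    if not transcript or not transcript.strip():
        return FALLBACK_TEXT
    return transcript.strip()
```

Checking the duration before downloading the file avoids spending a transcription call on audio the workflow would reject anyway.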
🧰 What You Need
✅ Telegram Bot token
✅ n8n instance (self-hosted or cloud)
✅ FFmpeg installed in your environment
✅ OpenAI API key with Whisper + GPT-4 access
✅ (Optional) DALL·E access for image generation
🧪 Setup Instructions
🔗 Connect your Telegram Bot to n8n
🛠️ Add your OpenAI credentials to the HTTP Request nodes
📂 Ensure FFmpeg is accessible to n8n (for .oga → .mp3)
📌 Add a Telegram Trigger node to listen for voice messages
🧠 Add Whisper transcription + GPT-4 summarization logic
🖼️ Optional: Add conditional nodes for HTML/image generation
🧪 Test with short audio messages (<60 seconds)
📤 Reply using the Telegram Send Message node
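Before testing, it can help to confirm that FFmpeg is reachable and to see what the .oga → .mp3 conversion command looks like. The sketch below is illustrative only: the helper names and file names are assumptions, and the workflow itself may invoke FFmpeg differently (for example, from a command-execution node in n8n). The flags used are standard FFmpeg options: -y overwrites an existing output file, -i names the input, and -vn drops any video stream.

```python
import shutil
from typing import List


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def build_convert_command(src: str, dst: str) -> List[str]:
    """Build the argument list for converting a voice file to MP3.

    FFmpeg infers the output format from the destination extension.
    """
    return ["ffmpeg", "-y", "-i", src, "-vn", dst]
```

The returned list can be passed to subprocess.run(cmd, check=True), for example with build_convert_command("voice.oga", "voice.mp3"). Passing arguments as a list rather than a single shell string avoids quoting problems with unusual file names.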
🔌 Integrations
Telegram Bot
n8n
OpenAI Whisper
OpenAI GPT-4
FFmpeg
(Optional) DALL·E