[ElevenLabs](https://elevenlabs.io/) provides AI-powered speech-to-text and text-to-speech APIs for creating natural-sounding voice applications with advanced features like voice cloning and streaming synthesis.
This cookbook demonstrates how to build a low-latency voice assistant by combining ElevenLabs' speech processing with Claude's intelligent responses, progressively optimizing for real-time performance.
Low Latency Voice Assistant Notebook - An interactive tutorial that walks you through building a voice assistant step-by-step, demonstrating various optimization techniques to minimize latency through streaming.
WebSocket Streaming Script - A production-ready conversational voice assistant featuring continuous microphone input, gapless audio playback, and the lowest possible latency using WebSocket streaming.
We recommend following this sequence to get the most out of this cookbook:
Create a virtual environment:
# Navigate to the ElevenLabs directory
cd /path/to/claude-cookbooks/third_party/ElevenLabs
# Create virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # On macOS/Linux
# OR
venv\Scripts\activate # On Windows
Get your API keys:
ElevenLabs API key: elevenlabs.io/app/developers/api-keys
When creating your API key, ensure it has the following minimum permissions:
Anthropic API key: console.anthropic.com/settings/keys
Configure your environment:
cp .env.example .env
Edit .env and add your API keys:
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
ANTHROPIC_API_KEY=sk-ant-api03-...
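As a rough illustration of what this configuration is meant to guarantee, the sketch below checks that both keys are present before any API call is made. The helper name `require_api_keys` is hypothetical and not taken from the cookbook's code. It also assumes the values from .env have already been loaded into the process environment (for example by a loader such as python-dotenv).

```python
import os
from typing import Dict, Mapping

# The two variable names used in .env above.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "ANTHROPIC_API_KEY")


def require_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required keys, or raise AssertionError naming the first missing one.

    Hypothetical helper for illustration; the cookbook's own check may differ.
    """
    keys = {}
    for name in REQUIRED_KEYS:
        value = env.get(name, "").strip()
        if not value:
            raise AssertionError(f"{name} is not set")
        keys[name] = value
    return keys
```

Calling `require_api_keys()` at startup fails fast with a message that names the missing variable, rather than failing later inside an API request.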
Install dependencies:
# With venv activated
pip install -r requirements.txt
Start with the Low Latency Voice Assistant Notebook. This interactive guide will teach you:
The notebook includes performance metrics and comparisons at each step, helping you understand the impact of each optimization.
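To make the idea of per-step latency metrics concrete, here is a minimal sketch of one way to time a streamed response: it records the time until the first chunk arrives and the total time to drain the stream. The names `measure_stream` and `fake_stream` are illustrative assumptions, not functions from the notebook, and the fake stream only simulates network delay with `time.sleep`.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Consume a stream and return (seconds_to_first_chunk, total_seconds, total_bytes)."""
    start = time.perf_counter()
    first_chunk_at = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    if first_chunk_at is None:
        # Empty stream: no first chunk ever arrived.
        first_chunk_at = total
    return first_chunk_at, total, total_bytes


def fake_stream(n_chunks: int = 3, delay: float = 0.01) -> Iterator[bytes]:
    """Stand-in for a streaming API response (assumption: 1 KiB chunks)."""
    for _ in range(n_chunks):
        time.sleep(delay)
        yield b"\x00" * 1024


if __name__ == "__main__":
    first, total, size = measure_stream(fake_stream())
    print(f"first chunk: {first:.3f}s, total: {total:.3f}s, bytes: {size}")
```

With streaming, the time to first chunk is usually the number that determines how responsive the assistant feels, since playback can begin before the full response has arrived.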
After understanding the concepts from the notebook, run the WebSocket Streaming Script to experience a fully functional voice assistant:
python stream_voice_assistant_websocket.py
How it works:
The script demonstrates production-ready implementations of:
Symptom: You may occasionally hear brief pops, clicks, or audio dropouts during playback.
Explanation:
This occurs because the script uses MP3 format audio, which is required for the ElevenLabs free tier. When streaming MP3 data in real-time chunks, FFmpeg occasionally receives incomplete frames that cannot be decoded. This typically happens:
The script automatically handles these failed chunks by skipping them (using a try-except pattern in the audio decoding logic), which prevents errors from appearing in the console but may result in brief audio gaps that manifest as pops or clicks.
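The skip-on-failure pattern described above can be sketched roughly as follows. This is not the script's actual decoding code: `decode_skipping_failures` and `toy_decode` are hypothetical names, and the toy decoder simply treats any chunk without an `OK` prefix as an incomplete frame so the example runs without FFmpeg.

```python
from typing import Callable, Iterable, Iterator


def decode_skipping_failures(
    chunks: Iterable[bytes],
    decode: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Yield decoded audio for each chunk, silently dropping chunks that fail to decode."""
    for chunk in chunks:
        try:
            pcm = decode(chunk)
        except Exception:
            # A dropped chunk becomes a short gap in playback (heard as a pop or click).
            # Real code should catch the decoder's specific exception type instead.
            continue
        yield pcm


def toy_decode(chunk: bytes) -> bytes:
    """Stand-in decoder (assumption): chunks must start with b'OK' to be 'complete'."""
    if not chunk.startswith(b"OK"):
        raise ValueError("incomplete frame")
    return chunk[2:]


if __name__ == "__main__":
    stream = [b"OKaaaa", b"trunc", b"OKbbbb"]
    print(list(decode_skipping_failures(stream, toy_decode)))
```

The trade-off is the one the explanation describes: playback never stops on a bad chunk, but the audio for that chunk is lost.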
Impact:
Solution:
This is expected behavior when using MP3 format on the free tier. If you want to eliminate audio popping entirely:
Use the pcm_44100 format instead of MP3
Symptom: AssertionError: ELEVENLABS_API_KEY is not set or AssertionError: ANTHROPIC_API_KEY is not set
Solution:
Copy .env.example to .env: cp .env.example .env
Edit .env and ensure both API keys are set correctly
Symptom: Errors like ImportError: PortAudio library not found or audio playback failures
Solution:
macOS:
brew install portaudio ffmpeg
Ubuntu/Debian:
sudo apt-get install portaudio19-dev ffmpeg
Windows:
Then reinstall Python dependencies:
pip install -r requirements.txt
Symptom: OSError: [Errno -9999] Unanticipated host error or microphone not accessible
Solution:
On Linux, add your user to the audio group: sudo usermod -a -G audio $USER (then log out and back in)
Test your microphone setup:
python -c "import sounddevice as sd; print(sd.query_devices())"
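The device list printed by that command can be long. As a small sketch, the helper below narrows it to devices that can record, using the `max_input_channels` field that `sounddevice` reports for each device. The function name `input_devices` is hypothetical, and the sample data in the demo is made up so the example runs without audio hardware.

```python
from typing import Iterable, List, Mapping


def input_devices(devices: Iterable[Mapping]) -> List[str]:
    """Return the names of devices that expose at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


if __name__ == "__main__":
    # Made-up sample entries shaped like sounddevice's device dictionaries.
    sample = [
        {"name": "Built-in Microphone", "max_input_channels": 2, "max_output_channels": 0},
        {"name": "Built-in Output", "max_input_channels": 0, "max_output_channels": 2},
    ]
    print(input_devices(sample))

    # With sounddevice installed, the real device list can be passed instead:
    #   import sounddevice as sd
    #   print(input_devices(sd.query_devices()))
```

If the returned list is empty, the operating system is not exposing any microphone to Python, which points to a permissions or driver problem rather than an issue in the script.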
Symptom: Connection errors, timeouts, or stream interruptions
Solution:
If you continue to experience issues, check ElevenLabs Status for service updates.
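For transient connection errors and timeouts, one common mitigation is to retry with exponential backoff. The sketch below is a generic illustration, not something the script is known to do: `retry_with_backoff` is a hypothetical helper, and the choice of `ConnectionError` and `TimeoutError` as retryable exceptions is an assumption that should be adapted to whatever exceptions your client library actually raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the given exceptions with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1.0s, 2.0s, ...
    raise ValueError("attempts must be at least 1")
```

Wrapping only the connection step (for example, opening the stream) keeps retries from replaying audio that has already been played.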
Once you're comfortable with the voice assistant, here are some inspiring projects you can build:
Meeting Note-Taker - Record and transcribe meetings in real-time, then use Claude to generate summaries, action items, and key takeaways from the conversation.
Language Learning Tutor - Practice conversations in any language with real-time feedback. Claude can correct pronunciation, suggest better phrasing, and adapt difficulty to your skill level.
Interactive Storyteller - Create choose-your-own-adventure games where Claude narrates the story and responds to your spoken choices, with different voice characters for each role.
Hands-Free Coding Assistant - Describe code changes, bugs, or features verbally while keeping your hands on the keyboard. Perfect for rubber duck debugging or pair programming solo.
Voice-Activated Smart Home - Build natural conversation interfaces for controlling home devices. Ask complex questions like "Is it cold enough to turn on the heater?" instead of simple on/off commands.
Personal Voice Journal - Keep a daily journal by speaking your thoughts. Claude can organize entries by theme, track your mood over time, and surface relevant past entries when you need them.
Here are some helpful resources to deepen your understanding: