ElevenLabs <> Claude Cookbooks

ElevenLabs provides AI-powered speech-to-text and text-to-speech APIs for creating natural-sounding voice applications with advanced features like voice cloning and streaming synthesis.

This cookbook demonstrates how to build a low-latency voice assistant by combining ElevenLabs' speech processing with Claude's intelligent responses, progressively optimizing for real-time performance.

What's Included

Low Latency Voice Assistant Notebook - An interactive tutorial that walks you through building a voice assistant step-by-step, demonstrating various optimization techniques to minimize latency through streaming.
WebSocket Streaming Script - A production-ready conversational voice assistant featuring continuous microphone input, gapless audio playback, and the lowest possible latency using WebSocket streaming.

How to Use This Cookbook

We recommend following this sequence to get the most out of this cookbook:

Step 1: Set Up Your Environment

Create a virtual environment:

# Navigate to the ElevenLabs directory
cd /path/to/claude-cookbooks/third_party/ElevenLabs

# Create virtual environment
python -m venv venv

# Activate it
source venv/bin/activate  # On macOS/Linux
# OR
venv\Scripts\activate     # On Windows

Get your API keys:
- ElevenLabs API key: elevenlabs.io/app/developers/api-keys
  
  When creating your API key, ensure it has the following minimum permissions:
  - Text to speech
  - Speech to text
  - Read access on voices
  - Read access on models
- Anthropic API key: console.anthropic.com/settings/keys

Configure your environment:

cp .env.example .env

Edit .env and add your API keys:

ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
ANTHROPIC_API_KEY=sk-ant-api03-...

Install dependencies:

# With venv activated
pip install -r requirements.txt

Step 2: Work Through the Notebook

Start with the Low Latency Voice Assistant Notebook. This interactive guide will teach you:

How to use ElevenLabs for speech-to-text transcription
How to generate Claude responses and measure latency
How streaming reduces time-to-first-token
How to stream text-to-speech for faster audio playback
The tradeoffs between different streaming approaches
Why WebSocket streaming provides the best balance of latency and quality

The notebook includes performance metrics and comparisons at each step, helping you understand the impact of each optimization.

Step 3: Try the Production Script

After understanding the concepts from the notebook, run the WebSocket Streaming Script to experience a fully functional voice assistant:

python stream_voice_assistant_websocket.py

How it works:

Press Enter to start recording
Speak your question into the microphone
Press Enter to stop recording
The assistant will respond with natural speech
Repeat or press Ctrl+C to exit

The script demonstrates production-ready implementations of:

Real-time microphone recording with sounddevice
Continuous conversation with context retention
WebSocket-based streaming for minimal latency
Custom audio queue for seamless playback

Troubleshooting

Audio Popping or Crackling

Symptom: You may occasionally hear brief pops, clicks, or audio dropouts during playback.

Explanation:

This occurs because the script uses MP3 format audio, which is required for the ElevenLabs free tier. When streaming MP3 data in real-time chunks, FFmpeg occasionally receives incomplete frames that cannot be decoded. This typically happens:

At the start of streaming (first chunk may be too small)
During brief network delays
At the end of audio generation (final chunk may be partial)

The script automatically handles these failed chunks by skipping them (using a try-except pattern in the audio decoding logic), which prevents errors from appearing in the console but may result in brief audio gaps that manifest as pops or clicks.

Impact:

Audio playback continues normally
Brief pops or clicks are usually imperceptible or minor
The WebSocket connection remains stable
No functionality is lost

Solution:

This is expected behavior when using MP3 format on the free tier. If you want to eliminate audio popping entirely:

Upgrade to a paid ElevenLabs tier
Modify the script to use pcm_44100 format instead of MP3
PCM format provides cleaner streaming without decoding issues

API Key Issues

Symptom: AssertionError: ELEVENLABS_API_KEY is not set or AssertionError: ANTHROPIC_API_KEY is not set

Solution:

Verify you've copied .env.example to .env: cp .env.example .env
Edit .env and ensure both API keys are set correctly
Check for typos or extra spaces in your API keys
Confirm your ElevenLabs key has the required permissions (see Step 1)

Dependency Issues

Symptom: Errors like ImportError: PortAudio library not found or audio playback failures

Solution:

macOS:

brew install portaudio ffmpeg

Ubuntu/Debian:

sudo apt-get install portaudio19-dev ffmpeg

Windows:

Install FFmpeg from ffmpeg.org
Add FFmpeg to your system PATH
PortAudio typically installs automatically with sounddevice on Windows

Then reinstall Python dependencies:

pip install -r requirements.txt

Microphone Permissions

Symptom: OSError: [Errno -9999] Unanticipated host error or microphone not accessible

Solution:

macOS: Go to System Preferences → Security & Privacy → Privacy → Microphone, and enable Terminal (or your Python IDE)
Windows: Go to Settings → Privacy → Microphone, and enable microphone access for Python/Terminal
Linux: Check your user is in the audio group: sudo usermod -a -G audio $USER (then log out and back in)

Test your microphone setup:

python -c "import sounddevice as sd; print(sd.query_devices())"

WebSocket Connection Failures

Symptom: Connection errors, timeouts, or stream interruptions

Solution:

Check your internet connection is stable
Verify firewall isn't blocking WebSocket connections (port 443)
Try disabling VPN or proxy temporarily
Ensure you're not exceeding API rate limits (see ElevenLabs dashboard for usage)

If you continue to experience issues, check ElevenLabs Status for service updates.

Project Ideas

Once you're comfortable with the voice assistant, here are some inspiring projects you can build:

Meeting Note-Taker - Record and transcribe meetings in real-time, then use Claude to generate summaries, action items, and key takeaways from the conversation.
Language Learning Tutor - Practice conversations in any language with real-time feedback. Claude can correct pronunciation, suggest better phrasing, and adapt difficulty to your skill level.
Interactive Storyteller - Create choose-your-own-adventure games where Claude narrates the story and responds to your spoken choices, with different voice characters for each role.
Hands-Free Coding Assistant - Describe code changes, bugs, or features verbally while keeping your hands on the keyboard. Perfect for rubber duck debugging or pair programming solo.
Voice-Activated Smart Home - Build natural conversation interfaces for controlling home devices. Ask complex questions like "Is it cold enough to turn on the heater?" instead of simple on/off commands.
Personal Voice Journal - Keep a daily journal by speaking your thoughts. Claude can organize entries by theme, track your mood over time, and surface relevant past entries when you need them.

More About ElevenLabs

Here are some helpful resources to deepen your understanding:

ElevenLabs Platform - Official website
API Documentation - Complete API reference
Voice Library - Explore available voices
API Playground - Test voices interactively
Python SDK - Official Python SDK

ElevenLabs Claude Cookbooks

Install

Description

Trust Signals

Links

ElevenLabs <> Claude Cookbooks

What's Included

How to Use This Cookbook

Step 1: Set Up Your Environment

Step 2: Work Through the Notebook

Step 3: Try the Production Script

Troubleshooting

Audio Popping or Crackling

API Key Issues

Dependency Issues

Microphone Permissions

WebSocket Connection Failures

Project Ideas

More About ElevenLabs