Your AI agent doesn't just listen — it talks back.
Voice layer runs fully local. Open source. Zero cloud fees.
Other voice tools are a monologue — you dictate, the machine types. HeyVox is a conversation. Your agent reads the code, does the work, and tells you what happened — like a colleague sitting next to you.
You choose the detail level: full response, a concise summary, or just the key facts. If something sounds off, pull up the full diff. If it sounds right, keep going — hands-free.
The voice layer is completely free — no API keys, no per-minute billing. STT and TTS run locally on your Mac. You bring your own AI agent.
Claude writes code, runs tests — then Herald reads you a summary of what changed.
Full response, short summary, or one-liner — configurable per message or globally.
Say your next instruction. No context switching, no reading walls of text.
Pull up the full response anytime. You stay in control without losing flow.
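The detail levels above can be pictured with a small sketch. It is illustrative only: the function names are hypothetical, and trimming by sentence count is a naive stand-in for whatever summarization HeyVox actually performs. It shows how a per-message setting might override the global default.

```python
import re

LEVELS = ("full", "summary", "one-liner")


def resolve_level(global_level, message_level=None):
    """Pick the verbosity for one message: a per-message setting wins over the global one."""
    level = message_level or global_level
    if level not in LEVELS:
        raise ValueError(f"unknown verbosity level: {level!r}")
    return level


def shape_for_speech(text, level, summary_sentences=3):
    """Trim a response for speech. Assumption: sentence counting stands in for real summarization."""
    text = text.strip()
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if level == "full":
        return text
    if level == "summary":
        return " ".join(sentences[:summary_sentences])
    # "one-liner": keep only the first sentence.
    return sentences[0] if sentences else ""
```

For example, `shape_for_speech(reply, resolve_level("summary", "one-liner"))` would speak only the first sentence of `reply`, even though the global default is a summary.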
Voice in, voice out, and a HUD to see what's happening — all running on your Mac, zero cloud.
Your agent finishes a task and speaks the result via Kokoro TTS. Choose verbosity: full response, summary, or one-liner. Emotional voice switching adapts tone to context. Hush auto-pauses YouTube & Spotify while it speaks.
Say the wake word (or hold push-to-talk) → HeyVox transcribes locally via MLX Whisper or sherpa-onnx → text is pasted directly into your agent. Works with any app.
Menu bar icon shows state at a glance (idle, recording, transcribing, speaking). Frosted-glass overlay appears during recording with live waveform bars. Recent transcript history in the dropdown.
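The four states shown in the menu bar (idle, recording, transcribing, speaking) suggest a small state machine. The sketch below is one way to model it. The event names and the transition table are assumptions made for illustration, not HeyVox's actual internals.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    SPEAKING = "speaking"


# Hypothetical events and transitions; only the state names come from the page.
TRANSITIONS = {
    (State.IDLE, "wake_word"): State.RECORDING,
    (State.IDLE, "ptt_down"): State.RECORDING,
    (State.RECORDING, "ptt_up"): State.TRANSCRIBING,
    (State.RECORDING, "silence"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "text_pasted"): State.IDLE,
    (State.IDLE, "tts_start"): State.SPEAKING,
    (State.SPEAKING, "tts_done"): State.IDLE,
}


def step(state, event):
    """Return the next state; events with no entry in the table are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

A full voice-in round trip, `run(["wake_word", "silence", "text_pasted"])`, ends back in `State.IDLE`.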
Voice IN works with any app: HeyVox pastes transcribed text into whatever window has focus. Voice OUT via Herald hooks works automatically with Claude Code. Other agents can use MCP (voice_speak) to speak back.
Kokoro TTS with multi-part streaming, audio ducking, and workspace-aware queue. Your agent speaks, you listen.
Full response, summary, or one-liner. Per-message or global. Hear what matters, skip what doesn't.
Detects mood in text (alert, cheerful, thoughtful) and picks the right voice. Auto-switches languages too.
Chrome extension pauses YouTube & Spotify during TTS. Falls back to MediaRemote for native apps.
Powered by openwakeword. Always listening locally; audio is never sent to the cloud.
MLX Whisper on Apple Silicon. sherpa-onnx fallback on Intel Macs. Fast, accurate, offline.
Configurable key binding. Hold to record, release to transcribe.
Menu bar icon with state indicator. Frosted-glass pill overlay during recording. Transcript history in dropdown.
4 MCP tools for any agent. Herald hooks for automatic Claude Code TTS. One heyvox setup wires everything.
No API keys for STT or TTS. No per-minute voice billing. Open source, runs on your hardware.
Runs as a macOS launch agent. Always ready when you open your Mac.
Dead mic recovery in 30s. Memory watchdog auto-restarts at 1 GB. STT timeout prevents hangs.
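As a rough idea of how an STT timeout can prevent hangs, here is a minimal sketch using only the Python standard library. The function name and the `transcribe` callable are hypothetical placeholders; the real timeout handling in HeyVox may look quite different.

```python
import threading


def transcribe_with_timeout(transcribe, audio, timeout_s):
    """Run transcribe(audio) in a worker thread; return None if it exceeds timeout_s.

    `transcribe` is any callable taking audio and returning text (an assumption
    for this sketch). Errors raised by it are re-raised in the caller.
    """
    result = {}

    def worker():
        try:
            result["text"] = transcribe(audio)
        except Exception as exc:  # hand the error back to the calling thread
            result["error"] = exc

    # Daemon thread: a stuck transcription cannot keep the process alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return None  # timed out; the caller can reset the engine and carry on
    if "error" in result:
        raise result["error"]
    return result["text"]
```

The caller treats `None` as "give up on this utterance", which keeps one stalled transcription from blocking the next wake word.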
Privacy
All processing runs on your machine. No audio ever leaves your Mac.
HeyVox was designed privacy-first from day one. Zero telemetry. Zero cloud APIs for voice. No per-minute TTS or STT billing. Your voice data never leaves your machine.
Your microphone has a bigger impact on accuracy than the STT model.
USB dongle headsets (Logitech, Jabra, EPOS) use their own USB audio device. Full-quality mic channel at all times, no OS audio switching, no echo issues.
Wired headsets, the built-in Mac microphone, and built-in speakers all work reliably. Echo suppression mutes the mic during TTS when using speakers without a headset.
Use Bluetooth headphones for TTS playback (A2DP, full quality) while using the built-in Mac mic for voice input. Best of both worlds.
When a Bluetooth headset's mic is active, macOS switches to HFP mode, dropping audio to 8–16 kHz mono. Use the built-in mic instead.
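The echo suppression mentioned above for speaker setups (muting the mic while TTS plays) can be sketched as a simple gate. The class and method names are hypothetical, and the sketch assumes audio arrives as discrete frames. It only illustrates the idea and is not HeyVox's implementation.

```python
import threading
from contextlib import contextmanager


class EchoGate:
    """Drop microphone frames while TTS is playing through speakers."""

    def __init__(self, using_headset=False):
        # With a headset the mic cannot hear the TTS, so nothing is gated.
        self.using_headset = using_headset
        self._speaking = threading.Event()

    @contextmanager
    def speaking(self):
        """Wrap TTS playback; the gate stays closed for the duration."""
        self._speaking.set()
        try:
            yield
        finally:
            self._speaking.clear()

    def filter(self, frame):
        """Return the frame unchanged, or None if the mic is muted right now."""
        if not self.using_headset and self._speaking.is_set():
            return None
        return frame
```

Playback code would run inside `with gate.speaking():`, while the capture loop passes every frame through `gate.filter(frame)` and skips the `None` results.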
HeyVox is in early access. We’re onboarding testers who use AI coding agents daily and want a real voice workflow. Tell us what agent you use and we’ll get you set up.
Request Access