macOS · Fully Local · Open Source · Early Access

Voice Coding,
not Vibe Coding

Your AI agent doesn't just listen — it talks back.
Voice layer runs fully local. Open source. Zero cloud fees.

It codes. It talks. You decide.

Other voice tools are a monologue — you dictate, the machine types. HeyVox is a conversation. Your agent reads the code, does the work, and tells you what happened — like a colleague sitting next to you.

You choose the detail level: full response, a concise summary, or just the key facts. If something sounds off, pull up the full diff. If it sounds right, keep going — hands-free.

The voice layer is completely free — no API keys, no per-minute billing. STT and TTS run locally on your Mac. You bring your own AI agent.

1

Agent works, then speaks

Claude writes code, runs tests — then Herald reads you a summary of what changed.

2

You choose the verbosity

Full response, short summary, or one-liner — configurable per message or globally.

3

Sounds good? Keep going

Say your next instruction. No context switching, no reading walls of text.

4

Something off? Review the details

Pull up the full response anytime. You stay in control without losing flow.

Three layers, fully local

Voice in, voice out, and a HUD to see what's happening — all running on your Mac, zero cloud.

🔊

Voice OUT — Herald

Your agent finishes a task and speaks the result via Kokoro TTS. Choose verbosity: full response, summary, or one-liner. Emotional voice switching adapts tone to context. Hush auto-pauses YouTube & Spotify while it speaks.

🎙

Voice IN

Say the wake word (or hold push-to-talk) → HeyVox transcribes locally via MLX Whisper or sherpa-onnx → text is pasted directly into your agent. Works with any app.

🖥

HUD & Menu Bar

Menu bar icon shows state at a glance (idle, recording, transcribing, speaking). Frosted-glass overlay appears during recording with live waveform bars. Recent transcript history in the dropdown.
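The four states named above can be pictured as a small state machine. This is a minimal sketch only: the state names come from the text, but the `VoiceState` enum, the `ALLOWED` table, and the specific transitions are assumptions for illustration, not HeyVox's actual code.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    SPEAKING = "speaking"


# Hypothetical transition table: which states may follow which.
ALLOWED = {
    VoiceState.IDLE: {VoiceState.RECORDING, VoiceState.SPEAKING},
    VoiceState.RECORDING: {VoiceState.TRANSCRIBING, VoiceState.IDLE},
    VoiceState.TRANSCRIBING: {VoiceState.IDLE},
    VoiceState.SPEAKING: {VoiceState.IDLE, VoiceState.RECORDING},
}


def transition(current: VoiceState, target: VoiceState) -> VoiceState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


# Wake word heard, recording ends, transcription finishes.
state = VoiceState.IDLE
for step in (VoiceState.RECORDING, VoiceState.TRANSCRIBING, VoiceState.IDLE):
    state = transition(state, step)
```

A menu bar icon would then only need to map the current state to an image.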

Works where you work

⚡ Claude Code
💫 Cursor
🌪 Windsurf
🔁 Continue.dev
➕ Any app with a text field

Voice IN works with any app: HeyVox pastes transcribed text into whatever window has focus. Voice OUT works automatically with Claude Code through Herald hooks. Other agents can speak back by calling the voice_speak MCP tool.

Everything you need, nothing you don't

🔊

Herald TTS Orchestration

Kokoro TTS with multi-part streaming, audio ducking, and workspace-aware queue. Your agent speaks, you listen.

📊

Configurable Verbosity

Full response, summary, or one-liner. Per-message or global. Hear what matters, skip what doesn't.
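One way to picture the three levels and the per-message override is the sketch below. It assumes a naive sentence split stands in for summarization; how Herald actually shortens a response is not described here, and `Verbosity`, `GLOBAL_DEFAULT`, and `text_to_speak` are hypothetical names.

```python
import re
from enum import Enum
from typing import Optional


class Verbosity(Enum):
    FULL = "full"
    SUMMARY = "summary"
    ONE_LINER = "one_liner"


# Hypothetical global setting; a per-message override takes priority.
GLOBAL_DEFAULT = Verbosity.SUMMARY


def split_sentences(text: str) -> list[str]:
    """Naive split on ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def text_to_speak(response: str,
                  override: Optional[Verbosity] = None,
                  summary_sentences: int = 3) -> str:
    """Pick what gets spoken for one agent response."""
    level = override if override is not None else GLOBAL_DEFAULT
    if level is Verbosity.FULL:
        return response.strip()
    sentences = split_sentences(response)
    if level is Verbosity.ONE_LINER:
        return sentences[0] if sentences else ""
    # SUMMARY: assumption, the first few sentences stand in for a summary.
    return " ".join(sentences[:summary_sentences])


response = ("Fixed the bug. Added two tests. "
            "All tests pass. Refactored the parser.")
one_liner = text_to_speak(response, Verbosity.ONE_LINER)
```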

🎭

Emotional Voice Switching

Detects mood in text (alert, cheerful, thoughtful) and picks the right voice. Auto-switches languages too.
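As a rough illustration of mood-based voice selection, the sketch below scores text against keyword lists. The three moods come from the text; the keyword lists, the voice identifiers, and the scoring rule are all assumptions, and the real detector may work quite differently.

```python
from typing import Optional

# Hypothetical keyword lists for each mood.
MOOD_KEYWORDS = {
    "alert": ("error", "failed", "warning", "crash"),
    "cheerful": ("passed", "success", "done", "fixed"),
    "thoughtful": ("consider", "trade-off", "might", "alternatively"),
}

# Placeholder voice identifiers, not real Kokoro voice names.
VOICE_FOR_MOOD = {
    "alert": "voice_alert",
    "cheerful": "voice_cheerful",
    "thoughtful": "voice_thoughtful",
}
DEFAULT_VOICE = "voice_neutral"


def detect_mood(text: str) -> Optional[str]:
    """Return the mood with the most keyword hits, or None if none match."""
    lowered = text.lower()
    scores = {
        mood: sum(lowered.count(word) for word in words)
        for mood, words in MOOD_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def pick_voice(text: str) -> str:
    """Map the detected mood to a voice, falling back to a neutral one."""
    mood = detect_mood(text)
    return VOICE_FOR_MOOD[mood] if mood is not None else DEFAULT_VOICE
```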

Hush Media Control

Chrome extension pauses YouTube & Spotify during TTS. Falls back to MediaRemote for native apps.

👁

Wake Word Detection

Powered by openwakeword. Listens on-device at all times; audio is never sent to the cloud.

🗣

Local STT

MLX Whisper on Apple Silicon. sherpa-onnx fallback on Intel Macs. Fast, accurate, offline.

🎮

Push-to-Talk

Configurable key binding. Hold to record, release to transcribe.

HUD & Menu Bar

Menu bar icon with state indicator. Frosted-glass pill overlay during recording. Transcript history in dropdown.

🔧

MCP + Claude Hooks

4 MCP tools for any agent. Herald hooks for automatic Claude Code TTS. One heyvox setup wires everything.

💰

Free Voice Layer

No API keys for STT or TTS. No per-minute voice billing. Open source, runs on your hardware.

🚀

Auto-Start via launchd

Runs as a macOS launch agent. Always ready when you open your Mac.
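For readers unfamiliar with launch agents, the sketch below builds a minimal launchd property list with Python's standard `plistlib`. The keys (`Label`, `ProgramArguments`, `RunAtLoad`, `KeepAlive`) are standard launchd keys, but the label `com.example.heyvox`, the binary path, and the `start` argument are placeholders; the file HeyVox actually installs may differ.

```python
import plistlib


def build_launch_agent(label: str, program_args: list[str]) -> bytes:
    """Serialize a minimal launchd agent definition as XML plist bytes."""
    agent = {
        "Label": label,                      # unique job identifier
        "ProgramArguments": list(program_args),
        "RunAtLoad": True,                   # start when the agent is loaded
        "KeepAlive": True,                   # relaunch if the process exits
    }
    return plistlib.dumps(agent)


# Placeholder label and command, for illustration only.
plist_bytes = build_launch_agent(
    "com.example.heyvox",
    ["/usr/local/bin/heyvox", "start"],
)
```

Per-user launch agents are conventionally stored in ~/Library/LaunchAgents.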

🛡

Self-Healing

Dead mic recovery in 30s. Memory watchdog auto-restarts at 1 GB. STT timeout prevents hangs.
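The thresholds above suggest a periodic health check along these lines. This is a sketch under assumptions: it reads "dead mic" as no audio frames for 30 seconds, and `HealthSnapshot`, `recovery_actions`, and the action names are hypothetical.

```python
from dataclasses import dataclass

MEMORY_LIMIT_BYTES = 1 * 1024**3   # 1 GB, from the text
MIC_DEAD_AFTER_SECONDS = 30.0      # 30 s, from the text


@dataclass
class HealthSnapshot:
    rss_bytes: int               # current resident memory of the process
    last_audio_frame_at: float   # monotonic time of the last mic frame
    now: float                   # monotonic time of this check


def recovery_actions(snapshot: HealthSnapshot) -> list[str]:
    """Return the recovery steps this snapshot calls for, in order."""
    actions = []
    silent_for = snapshot.now - snapshot.last_audio_frame_at
    if silent_for >= MIC_DEAD_AFTER_SECONDS:
        actions.append("reopen_mic")
    if snapshot.rss_bytes >= MEMORY_LIMIT_BYTES:
        actions.append("restart_process")
    return actions


healthy = HealthSnapshot(rss_bytes=300 * 1024**2,
                         last_audio_frame_at=100.0, now=101.0)
stuck = HealthSnapshot(rss_bytes=2 * 1024**3,
                       last_audio_frame_at=100.0, now=140.0)
```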

All voice processing runs on your machine. No audio ever leaves it.

HeyVox was designed privacy-first from day one. Zero telemetry. Zero cloud APIs for voice. No per-minute TTS or STT billing. Your voice data never leaves your machine.

openwakeword
On-device wake word detection, runs in real time
MLX Whisper
Apple Silicon speech recognition, fully offline
Kokoro TTS
Local neural text-to-speech, no API keys needed
sherpa-onnx
Intel fallback STT engine, also fully on-device

Choosing the right mic

Your microphone has a bigger impact on accuracy than the STT model.

Best

2.4 GHz USB Wireless Headsets

Headsets with a USB dongle (Logitech, Jabra, EPOS) register as their own USB audio device. The mic channel stays at full quality at all times, with no OS audio switching and no echo issues.

Good

Wired Headset / Built-in Mic & Speakers

Wired headsets, the built-in Mac microphone, and built-in speakers all work reliably. Echo suppression mutes the mic during TTS when using speakers without a headset.

Good

Bluetooth for Playback + Built-in Mic

Use Bluetooth headphones for TTS playback (A2DP, full quality) while using the built-in Mac mic for voice input. Best of both worlds.

Not Recommended

Bluetooth Mic Mode

When a Bluetooth headset's mic is active, macOS switches the headset to the Hands-Free Profile (HFP), which drops audio to 8–16 kHz mono. Use the built-in mic instead.

Join the beta

HeyVox is in early access. We’re onboarding testers who use AI coding agents daily and want a real voice workflow. Tell us which agent you use and we’ll get you set up.

Request Access