HeyVox — Voice Coding, not Vibe Coding

Pair programming, out loud

It codes. It talks. You decide.

Other voice tools are a monologue — you dictate, the machine types. HeyVox is a conversation. Your agent reads the code, does the work, and tells you what happened — like a colleague sitting next to you.

You choose the detail level: full response, a concise summary, or just the key facts. If something sounds off, pull up the full diff. If it sounds right, keep going — hands-free.

The voice layer is completely free — no API keys, no per-minute billing. STT and TTS run locally on your Mac. You bring your own AI agent.

Agent works, then speaks

Claude writes code, runs tests — then Herald reads you a summary of what changed.

You choose the verbosity

Full response, short summary, or one-liner — configurable per message or globally.

Sounds good? Keep going

Say your next instruction. No context switching, no reading walls of text.

Something off? Review the details

Pull up the full response anytime. You stay in control without losing flow.

How it works

Three layers, fully local

Voice in, voice out, and a HUD to see what's happening — all running on your Mac, zero cloud.

🔊

Voice OUT — Herald

Your agent finishes a task and speaks the result via Kokoro TTS. Choose verbosity: full response, summary, or one-liner. Emotional voice switching adapts tone to context. Hush auto-pauses YouTube & Spotify while it speaks.

🎙

Voice IN

Say the wake word (or hold push-to-talk) → HeyVox transcribes locally via MLX Whisper or sherpa-onnx → text is pasted directly into your agent. Works with any app.

🖥

HUD & Menu Bar

Menu bar icon shows state at a glance (idle, recording, transcribing, speaking). Frosted-glass overlay appears during recording with live waveform bars. Recent transcript history in the dropdown.
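
The four states named above can be modeled as a small state machine. This is only an illustrative sketch: the `State` enum, the `TRANSITIONS` table, and `advance` are hypothetical names, and the allowed transitions are an assumption rather than HeyVox's actual logic.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    SPEAKING = "speaking"


# Assumed transition table: which states may follow which.
TRANSITIONS = {
    State.IDLE: {State.RECORDING, State.SPEAKING},
    State.RECORDING: {State.TRANSCRIBING, State.IDLE},
    State.TRANSCRIBING: {State.IDLE},
    State.SPEAKING: {State.IDLE, State.RECORDING},
}


def advance(current: State, target: State) -> State:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    state = State.IDLE
    for nxt in (State.RECORDING, State.TRANSCRIBING, State.IDLE):
        state = advance(state, nxt)
        print(state.value)
```

A menu bar icon could then simply render whatever `State` is current.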

Supported agents

Works where you work

⚡ Claude Code

💫 Cursor

🌪 Windsurf

🔁 Continue.dev

➕ Any app with a text field

Voice IN works with any app — HeyVox pastes transcribed text into whatever window has focus. Voice OUT via Herald hooks works automatically with Claude Code. Other agents can use MCP (voice_speak) to speak back.
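
For agents that speak back over MCP, a tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for `voice_speak`. The `text` argument name and the helper `build_voice_speak_request` are assumptions for illustration; the tool's real input schema should be taken from the server's tool listing.

```python
import json


def build_voice_speak_request(text: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request for a tool named voice_speak.

    The "text" argument name is an assumption, not a documented parameter.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "voice_speak",
            "arguments": {"text": text},
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_voice_speak_request("Tests passed, three files changed."))
```

In practice an MCP client library would send this for you; the raw message is shown only to make the shape of the call visible.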

Features

Everything you need, nothing you don't

🔊

Herald TTS Orchestration

Kokoro TTS on Metal GPU via mlx-audio. Multi-part streaming, audio ducking, and workspace-aware queue. Your agent speaks, you listen.

📊

Configurable Verbosity

Full response, summary, or one-liner. Per-message or global. Hear what matters, skip what doesn't.
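
One way to picture the three levels is as progressively shorter cuts of the same response. The sketch below uses naive sentence truncation; that is an assumption made for brevity, not a description of how Herald produces its summaries, and `Verbosity` and `to_speech` are hypothetical names.

```python
import re
from enum import Enum


class Verbosity(Enum):
    FULL = "full"
    SUMMARY = "summary"
    ONE_LINER = "one-liner"


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def to_speech(text: str, level: Verbosity, summary_sentences: int = 3) -> str:
    """Return the portion of `text` to speak at the given verbosity level."""
    sentences = split_sentences(text)
    if level is Verbosity.FULL or not sentences:
        return text.strip()
    if level is Verbosity.SUMMARY:
        return " ".join(sentences[:summary_sentences])
    return sentences[0]  # ONE_LINER: first sentence only


if __name__ == "__main__":
    reply = "Fixed the bug. Ran tests. All passed. Also updated docs."
    for level in Verbosity:
        print(level.value, "->", to_speech(reply, level))
```

A global default plus an optional per-message override would cover both configuration modes mentioned above.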

🎭

Emotional Voice Switching

Detects mood in text (alert, cheerful, thoughtful) and picks the right voice. Auto-switches languages too.
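
To make the idea concrete, here is a deliberately simple keyword-count sketch of mood detection. The keyword lists, the `neutral` fallback, and the voice names are all placeholders invented for illustration; they are not real Kokoro voice IDs, and HeyVox's actual detection may work quite differently.

```python
# Hypothetical keyword lists per mood.
MOOD_KEYWORDS = {
    "alert": ("error", "failed", "warning", "critical"),
    "cheerful": ("passed", "success", "done", "great"),
    "thoughtful": ("consider", "maybe", "trade-off", "however"),
}

# Placeholder voice names, not real voice identifiers.
VOICE_FOR_MOOD = {
    "alert": "voice_alert",
    "cheerful": "voice_cheerful",
    "thoughtful": "voice_thoughtful",
    "neutral": "voice_default",
}


def detect_mood(text: str) -> str:
    """Pick the mood whose keywords occur most often, or 'neutral' if none do."""
    lowered = text.lower()
    scores = {
        mood: sum(lowered.count(word) for word in words)
        for mood, words in MOOD_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def pick_voice(text: str) -> str:
    return VOICE_FOR_MOOD[detect_mood(text)]


if __name__ == "__main__":
    print(pick_voice("Build failed with an error"))
```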

⏸

Hush Media Control

Chrome extension pauses YouTube & Spotify during TTS. Falls back to MediaRemote for native apps.

👁

Wake Word Detection

On-device wake word detection via openwakeword, running in real time.

🗣

Local STT

MLX Whisper on Apple Silicon. sherpa-onnx fallback on Intel Macs. Fast, accurate, offline.

🎮

Push-to-Talk

Configurable key binding. Hold to record, release to transcribe.

✨

HUD & Menu Bar

Menu bar icon with state indicator. Frosted-glass pill overlay during recording. Transcript history in dropdown.

🔧

MCP + Claude Hooks

Four MCP tools for any agent. Herald hooks for automatic Claude Code TTS. A single run of heyvox setup wires everything.

💰

Free Voice Layer

No API keys for STT or TTS. No per-minute voice billing. Open source, runs on your hardware.

🚀

Auto-Start via launchd

Runs as a macOS launch agent. Always ready when you open your Mac.
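
A launch agent is defined by a property list. The sketch below generates a minimal one with Python's standard `plistlib`. The label `dev.heyvox.example` and the program path are placeholders, and the plist that `heyvox setup` actually writes may use different keys and values.

```python
import plistlib


def build_launch_agent(label: str, program_args: list[str]) -> bytes:
    """Return an XML property list for a launchd agent that starts at load."""
    agent = {
        "Label": label,                    # unique job identifier
        "ProgramArguments": program_args,  # command and its arguments
        "RunAtLoad": True,                 # start when the agent is loaded
        "KeepAlive": True,                 # relaunch if the process exits
    }
    return plistlib.dumps(agent)


if __name__ == "__main__":
    # Both values below are placeholders, not HeyVox's real label or path.
    xml = build_launch_agent("dev.heyvox.example", ["/path/to/heyvox"])
    print(xml.decode("utf-8"))
```

User-level agents conventionally live in `~/Library/LaunchAgents/`, which is where a file like this would be saved.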

🛡

Self-Healing

Dead mic recovery in 10s with exponential backoff. Bluetooth device filtering via CoreAudio. Memory watchdog auto-restarts at 1 GB.
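
The two numbers above can be read as a retry budget and a memory ceiling. The sketch below assumes the 10 s figure is the total time spent retrying and that delays start at 0.5 s and double each attempt; both are assumptions, as are the helper names `backoff_delays` and `should_restart`.

```python
from collections.abc import Iterator

MEMORY_LIMIT_BYTES = 1024 ** 3  # 1 GB ceiling, matching the watchdog figure


def backoff_delays(
    base: float = 0.5, factor: float = 2.0, budget: float = 10.0
) -> Iterator[float]:
    """Yield exponentially growing delays whose sum never exceeds `budget`."""
    elapsed = 0.0
    delay = base
    while elapsed < budget:
        step = min(delay, budget - elapsed)  # clip the final wait to the budget
        yield step
        elapsed += step
        delay *= factor


def should_restart(rss_bytes: int, limit: int = MEMORY_LIMIT_BYTES) -> bool:
    """Return True once resident memory reaches the limit."""
    return rss_bytes >= limit


if __name__ == "__main__":
    print(list(backoff_delays()))
    print(should_restart(512 * 1024 ** 2))
```

Clipping the last delay keeps the whole recovery attempt inside the budget instead of overshooting it.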

Install (beta testers)

Up and running in minutes

For early access testers. Requires macOS 14+ and Python 3.12+. Apple Silicon is recommended; Intel Macs are supported via the sherpa-onnx fallback.

# 1. Install system dependency
brew install portaudio

# 2. Clone and install
git clone https://github.com/heyvox-dev/heyvox.git
cd heyvox
pip install -e ".[apple-silicon,chrome]"

# 3. Run setup wizard
heyvox setup

The setup wizard walks you through permissions (Microphone, Accessibility), model download, mic test, config generation, launchd service, Herald TTS hooks for Claude Code, and MCP server registration — all in one guided flow.

Privacy

All processing runs on your machine. No audio ever leaves it.

HeyVox was designed privacy-first from day one. Zero telemetry. Zero cloud APIs for voice. No per-minute TTS or STT billing. Your voice data never leaves your machine.

openwakeword

On-device wake word detection, runs in real time

MLX Whisper

Apple Silicon speech recognition, fully offline

Kokoro TTS

Local neural text-to-speech, no API keys needed

sherpa-onnx

Intel fallback STT engine, also fully on-device

Audio devices

Choosing the right mic

Your microphone has a bigger impact on accuracy than the STT model.

Best

2.4 GHz USB Wireless Headsets

USB dongle headsets (Logitech, Jabra, EPOS) appear to macOS as their own USB audio device. Full-quality mic channel at all times, no OS audio switching, no echo issues.

Good

Wired Headset / Built-in Mic & Speakers

Wired headsets, the built-in Mac microphone, and built-in speakers all work reliably. Echo suppression mutes the mic during TTS when using speakers without a headset.
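
The echo suppression rule described above reduces to a single condition: mute the mic only while TTS is playing through open speakers. A minimal sketch, where `AudioState` and `mic_should_be_muted` are hypothetical names rather than HeyVox's API:

```python
from dataclasses import dataclass


@dataclass
class AudioState:
    tts_playing: bool        # TTS audio is currently being played
    headset_connected: bool  # output goes to a headset, not open speakers


def mic_should_be_muted(state: AudioState) -> bool:
    """Mute only when TTS plays through speakers, where the mic would hear it."""
    return state.tts_playing and not state.headset_connected


if __name__ == "__main__":
    print(mic_should_be_muted(AudioState(tts_playing=True, headset_connected=False)))
```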

Good

Bluetooth for Playback + Built-in Mic

Use Bluetooth headphones for TTS playback (A2DP, full quality) while using the built-in Mac mic for voice input. Best of both worlds.

Works

Bluetooth Mic Mode

Bluetooth mic activates HFP mode (reduced quality), but HeyVox handles it: dead device filtering via CoreAudio, silent mic auto-recovery, and echo suppression. USB dongles are still better, but Bluetooth works.

Join the beta
