
Voice mode

Talk to a Claude pane over a real-time voice call backed by OpenAI's Realtime API.

Voice mode lets you have a spoken conversation with a Claude pane — push-to-talk, captions, mute. It's a live channel into the same session you'd otherwise be typing into.

Starting a call

The voice overlay is opened by the Phone / Call button in a Claude pane header, not by a global shortcut. Click it and the overlay slides in, connects to the OpenAI Realtime backend, and starts streaming.

Cmd+M / Ctrl+M does not open or close the call overlay. It only toggles mute while the overlay is open and focused; it is not a global toggle. To open a call, use the Phone button in a Claude pane header.

The Call button lives next to the other pane actions on the Claude pane (and Claude worker pane) header. It's disabled when the pane has no transcript yet (start a conversation first) or when another pane is already on a call.

One call at a time globally

Only one call can be active across Agent Grid at any time. If a call is live on one Claude pane, the Call button on every other pane is disabled with a "Another pane is on a call" tooltip. Hang up first to start a new call elsewhere.
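The two disabling rules above (no transcript yet, or a call live on another pane) can be sketched as a small pure function. This is an illustrative sketch only: `PaneState`, `CallButtonState`, `callButtonState`, and the "Start a conversation first" tooltip text are hypothetical names and wording, not Agent Grid's actual API. Only the "Another pane is on a call" tooltip comes from the behavior described above.

```typescript
// Hypothetical shapes for illustration; not Agent Grid's real types.
interface PaneState {
  id: string;
  hasTranscript: boolean;
}

interface CallButtonState {
  disabled: boolean;
  tooltip?: string;
}

// activeCallPaneId is the pane currently on a call, or null when no call is live.
function callButtonState(
  pane: PaneState,
  activeCallPaneId: string | null,
): CallButtonState {
  // Only one call may be live across the whole app.
  if (activeCallPaneId !== null && activeCallPaneId !== pane.id) {
    return { disabled: true, tooltip: "Another pane is on a call" };
  }
  // A pane with no transcript has nothing to talk about yet.
  // (Tooltip wording here is a placeholder, not taken from the product.)
  if (!pane.hasTranscript) {
    return { disabled: true, tooltip: "Start a conversation first" };
  }
  // How the button looks on the pane that is itself on the call
  // is outside the scope of this sketch.
  return { disabled: false };
}
```

Keeping the check as a pure function of pane state plus a single global "active call" value is one way to make sure every pane header reaches the same answer.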

Push-to-talk

The default push-to-talk binding is Left Alt on its own: hold it to transmit, release to stop. It's an in-app key binding, so Agent Grid must be the focused window for it to fire.

You can rebind push-to-talk to another key from Settings:

  • Any single key works — letters, digits, F-keys, modifiers, punctuation, navigation cluster, numpad.
  • Left and right modifier variants are distinguished (e.g. Left Alt vs Right Alt).
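As a rough illustration of hold-to-transmit with left/right modifiers told apart, the sketch below keys off the standard `KeyboardEvent.code` value, which reports `"AltLeft"` and `"AltRight"` separately. The `PushToTalk` class, its method names, and the suggestion to release on window blur are assumptions made for this example, not a description of how Agent Grid implements the binding.

```typescript
// Minimal shape of the fields this sketch reads from a DOM KeyboardEvent.
interface KeyLike {
  code: string; // physical key, e.g. "AltLeft" vs "AltRight"
  repeat?: boolean; // true for auto-repeat keydown events while held
}

// Hypothetical helper: tracks whether the bound key is currently held.
class PushToTalk {
  private binding: string;
  private onChange: (transmitting: boolean) => void;
  private transmitting = false;

  constructor(
    binding: string = "AltLeft",
    onChange: (transmitting: boolean) => void = () => {},
  ) {
    this.binding = binding;
    this.onChange = onChange;
  }

  // Switch to a different single key; stops transmitting first.
  rebind(code: string): void {
    this.release();
    this.binding = code;
  }

  keyDown(e: KeyLike): void {
    // Ignore other keys, auto-repeat, and duplicate keydowns.
    if (e.code !== this.binding || e.repeat || this.transmitting) return;
    this.transmitting = true;
    this.onChange(true);
  }

  keyUp(e: KeyLike): void {
    if (e.code === this.binding) this.release();
  }

  // Assumption: calling this on window blur as well avoids a stuck mic
  // if the keyup event is never delivered.
  release(): void {
    if (!this.transmitting) return;
    this.transmitting = false;
    this.onChange(false);
  }

  get isTransmitting(): boolean {
    return this.transmitting;
  }
}
```

In a browser-based window this could be wired up with `window.addEventListener("keydown", (e) => ptt.keyDown(e))` and the matching `keyup` listener. Because these are window-level listeners, they only fire while the app has focus, which matches the in-app scope described above.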

On Wayland sessions, push-to-talk may miss events when focus is on a native-Wayland window. If you notice your hold isn't transmitting, click the dictation indicator manually as a fallback — or move focus into an Agent Grid window before holding the key.

In-overlay controls

Once the overlay is open and the call is active:

Key              Action
Cmd+M / Ctrl+M   Toggle mute (mic on/off). Scoped to overlay focus; not a global shortcut.
Esc              Dismiss the overlay (hang up).

The overlay also shows rolling captions, a mic-level waveform, a mute pill, and a hangup button. You can copy a transcript line straight from the captions panel.
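The overlay-scoped shortcuts can be pictured as a handler that does nothing unless the overlay has focus. The sketch below is an assumption-laden illustration: `OverlayKeyEvent`, `OverlayAction`, and `overlayAction` are hypothetical names, and accepting either Cmd or Ctrl on every platform is a simplification. The `key`, `metaKey`, and `ctrlKey` fields mirror the standard DOM `KeyboardEvent` properties.

```typescript
// Subset of DOM KeyboardEvent fields used by this sketch.
interface OverlayKeyEvent {
  key: string; // e.g. "Escape", "m"
  metaKey: boolean; // Cmd on macOS
  ctrlKey: boolean;
}

type OverlayAction = "toggle-mute" | "hang-up" | null;

// Maps a key press to an overlay action, or null if the press is not handled.
function overlayAction(
  e: OverlayKeyEvent,
  overlayFocused: boolean,
): OverlayAction {
  // Scoped to the overlay: without focus, the shortcuts are left alone.
  if (!overlayFocused) return null;
  if (e.key === "Escape") return "hang-up";
  if (e.key.toLowerCase() === "m" && (e.metaKey || e.ctrlKey)) {
    return "toggle-mute";
  }
  return null;
}
```

Returning `null` for unfocused or unrelated key presses leaves Cmd+M / Ctrl+M free for whatever it normally does elsewhere in the app.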

What the agent hears

When you start a call, Agent Grid resolves the pane's transcript on demand right before connecting, so the agent's context reflects the latest turn. If the transcript fetch fails, the call still starts; without that context, the agent greets you and asks what you need. The conversation is bound to that specific pane: you're talking to the same Claude session you'd otherwise be typing into.
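The "fetch the transcript, but never block the call on it" behavior might look roughly like the following. This is a minimal sketch under stated assumptions: `prepareCallContext`, `CallContext`, and the `fetchTranscript` callback are hypothetical and stand in for whatever Agent Grid does internally. No real Agent Grid or OpenAI Realtime call is shown.

```typescript
// Hypothetical callback that resolves a pane's latest transcript.
type FetchTranscript = (paneId: string) => Promise<string>;

interface CallContext {
  paneId: string; // the call stays bound to this pane
  transcript: string | null; // null means "start without context"
}

// Resolve context right before connecting; a failed fetch degrades
// to an empty context instead of aborting the call.
async function prepareCallContext(
  paneId: string,
  fetchTranscript: FetchTranscript,
): Promise<CallContext> {
  try {
    const transcript = await fetchTranscript(paneId);
    return { paneId, transcript };
  } catch {
    // The call still starts; the agent would open with a greeting.
    return { paneId, transcript: null };
  }
}
```

Fetching at call time rather than caching earlier is what keeps the context aligned with the latest turn, at the cost of one extra round trip before connecting.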
