xAI just launched Voice Agent Builder in beta, and it is the most direct signal yet that the company is coming for the enterprise voice market. This is not another API wrapper around Grok Voice. It is a full platform: telephony, knowledge retrieval, tools, guardrails, MCP support, and observability, all in one interface.
Benji Taylor, who leads design at SpaceX AI, posted about it this morning on X. The official announcement went live on x.ai shortly after.
What It Does
Voice Agent Builder is a no-code platform for configuring production voice agents on Grok Voice. You write a plain-language description of how calls should flow, upload your documents, attach tools and guardrails, and you have a working voice agent in about two minutes.
The agent gets a free phone number out of the box. It can call, receive, transfer to a human, or end calls cleanly. Every call is recorded and transcribed.
One Voice Stack, Not Three
Most voice agents today are stitched together from three separate APIs: one for speech-to-text, one for the language model, and one for text-to-speech. Each hop adds cost, latency, and a new surface for things to go wrong.
Voice Agent Builder runs on Grok Voice’s speech-to-speech path. One interface, one pricing meter. The model handles the full voice pipeline natively.
Pricing That Actually Makes Sense
This is where xAI is being unusually clear. Voice Agent Builder is billed at $0.05 per minute of audio, with voices included and no separate platform fee. Telephony on a free provisioned number adds $0.01 per minute.
Compared to the typical voice stack, where you pay for STT, the model, and TTS separately (each with its own per-minute or per-token meter), the single-meter approach is refreshing. One per-minute rate. Multiply by total call minutes. Done.
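To make that arithmetic concrete, here is a minimal sketch of a monthly cost estimate using the two rates quoted above. The function name, the call volume, and the average call length are illustrative assumptions, not anything xAI has published. The sketch also assumes billing is a plain multiplication of minutes by rate with no per-call rounding, and it makes no claim about what a bring-your-own SIP number costs on the carrier side.

```python
from decimal import Decimal

# Rates quoted in the article, in USD per minute of audio.
AGENT_RATE = Decimal("0.05")      # Voice Agent Builder, voices included
TELEPHONY_RATE = Decimal("0.01")  # telephony on a free provisioned number


def estimate_monthly_cost(
    calls_per_month: int,
    avg_minutes_per_call: Decimal,
    use_provisioned_number: bool = True,
) -> Decimal:
    """Hypothetical helper: total minutes times the combined per-minute rate.

    Assumes no per-call rounding or minimums. When use_provisioned_number
    is False, only the agent rate is counted; any carrier charges for a
    bring-your-own SIP number are outside this estimate.
    """
    total_minutes = Decimal(calls_per_month) * avg_minutes_per_call
    rate = AGENT_RATE
    if use_provisioned_number:
        rate += TELEPHONY_RATE
    # Round to whole cents for display.
    return (total_minutes * rate).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Illustrative volume: 10,000 calls a month averaging 3 minutes each.
    print(estimate_monthly_cost(10_000, Decimal("3")))         # 1800.00
    print(estimate_monthly_cost(10_000, Decimal("3"), False))  # 1500.00
```

At that illustrative volume, the provisioned number works out to $0.06 per minute all in, or $1,800 a month for 30,000 minutes of calls.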
The Benchmark Numbers
xAI published τ-voice Bench results comparing Grok Voice Think Fast 1.0 against the two main competitors in the real-time voice space:
- Grok Voice Think Fast 1.0: 67.3%
- Gemini 3.1 Flash Live: 43.8%
- GPT Realtime 1.5: 35.3%
That is a wide gap. Grok Voice leads Gemini by 23.5 points and GPT Realtime by 32. The benchmark is designed to test real-world call conditions: low-quality telephony audio, background noise, strong accents, interruptions, and callers changing their mind mid-sentence.
What Voice Agent Builder Ships With
The platform ships with a surprising amount of infrastructure for a beta:
- Knowledge base with document upload (text, Markdown, Word, PowerPoint, Excel, HTML, JSON)
- Collections to organize documents and share across agents
- Tools and connectors for Google Calendar, Outlook, email, web search, X search, Linear, Notion, Google Drive, and OneDrive
- MCP support for custom tool integration
- 80+ built-in voices plus voice cloning from a 2-minute recording
- Guardrails to prevent the agent from saying or doing things it should not
- SIP support to bring your own phone number
- WebSocket for custom client integration
The MCP support is particularly interesting given that X, xAI's sister platform, just launched hosted MCP servers last week. Because Voice Agent Builder accepts MCP servers, you can wire the same tool ecosystem into voice calls.
Who This Is For
xAI is positioning this at operators and developers who want high-volume production voice agents without building the surrounding stack from scratch. Customer support, sales calls, booking lines, and phone-based workflows that currently require a developer team and a Twilio subscription are the obvious beachhead. The platform builds on the same Grok 4.5 infrastructure SpaceX and Tesla are already testing in private beta.
The no-code prompt interface means a support operations manager could build and deploy a voice agent without engineering. The tools, SIP, and WebSocket support mean a developer can extend it however they want.
The Bottom Line
The Voice Agent Builder is the most complete voice agent platform xAI has shipped. The pricing is transparent, the feature set is broad for a beta, and the τ-voice Bench numbers give it a credible claim to being the best real-time voice model on the market.
Whether that holds up under real call center traffic is another question, but the platform itself is well-designed. Two minutes to a working phone agent with no code is a strong demo.




