OpenAI's ChatGPT Bidi 1 Brings Bidirectional Voice Mode

OpenAI is preparing GPT-Bidi-1, a bidirectional voice model built to let ChatGPT listen and speak simultaneously instead of waiting for users to finish before responding. References to the model and its UI elements surfaced in the ChatGPT app starting June 16, 2026, according to TestingCatalog, and early user tests shared on June 23 show the model handling interruptions, switching tasks mid-sentence, and maintaining context across longer conversations.

OpenAI has not officially announced the model. But signs across both web and mobile platforms suggest a consumer rollout could begin within days, with a subset of ChatGPT app users already receiving access.

The name “Bidi” stands for bidirectional, referring to the model’s ability to process incoming and outgoing audio at the same time. ChatGPT’s current voice sessions run on GPT-4o, which operates on a half-duplex design: audio flows in only one direction at a time, so the system stops speaking completely the moment a user begins to talk.

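Bidi 1 changes that. Based on early testing reported by Android Authority and TestingCatalog, the model keeps listening while it speaks, handles interruptions, switches tasks mid-sentence, and maintains context across longer conversations.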

A visual indicator accompanies the change. The voice mode bubble turns yellow when Bidi 1 is active, replacing the current blue interface. Internal code describes the model as a “major leap in intelligence” and “the next generation of Voice.”

The bidirectional audio itself is significant, but the more important shift may be the three intelligence tiers that come with Bidi 1. These are labeled High, Medium, and Instant.

OpenAI already uses a similar tiered approach on its text side, where users pick between faster but lighter models and slower but more capable ones. Extending that logic to voice means users could select Instant for quick answers during a commute and switch to High when they need deeper reasoning in a hands-free work session.

No competing voice assistant currently offers this kind of selectable depth within a single voice interface. Google’s Gemini Live, which already supports bidirectional conversation, runs without tiered intelligence options on the consumer side. This suggests OpenAI is positioning Bidi 1 not just as a voice quality upgrade but as a flexible tool that adapts to different use cases within the same conversation.

The practical impact depends on how each tier performs. If High delivers reasoning comparable to GPT-5.5-level text capabilities while maintaining real-time audio, it would represent a meaningful step beyond what any voice assistant offers today. If the tiers mostly affect response latency with minimal quality difference, the feature becomes less compelling. OpenAI has not disclosed technical details about what powers each tier.

Bidi 1 does not exist in isolation. It arrives during OpenAI’s largest ChatGPT overhaul since launch, a redesign that transforms the platform into a super app combining Codex coding tools, AI agents, image generation, and third-party integrations ahead of a planned 2026 IPO.

The Financial Times reported in early June that OpenAI views voice as the dominant interface for how users will interact with AI in the future. That framing explains why the company is investing in a purpose-built bidirectional model rather than continuing to adapt GPT-4o for voice.

OpenAI’s text models raced ahead to the GPT-5.5 generation while the voice layer stayed on older architecture. Bidi 1 appears designed to close that gap. For ChatGPT’s 900 million weekly active users, this upgrade could change how a significant portion of them interact with the app. Voice that feels natural enough for extended use, combined with agents that complete tasks and coding tools that execute in the background, moves ChatGPT closer to the always-on assistant OpenAI has described in product roadmap discussions.

There is also a hardware angle. OpenAI is reportedly developing audio-first hardware products, and any device where speech is the primary interface would need a voice layer substantially better than what GPT-4o currently provides.

Google’s Gemini Live has supported bidirectional voice conversations since its native audio models rolled out in late 2025. It handles interruptions, maintains conversational flow, and works natively across Android devices and the Gemini app.

This means Bidi 1 is OpenAI closing a gap rather than opening one. The bidirectional capability itself is not new to the market. Google’s developer documentation already describes its Live API as enabling “low-latency bidirectional voice and video interactions” with features including interruption handling, multi-turn context, and proactive audio.

What may differentiate Bidi 1 is the combination of selectable intelligence tiers, deeper integration with ChatGPT’s expanding tool ecosystem, and the reported improvements to long-conversation context retention. Gemini Live connects well with Google’s app ecosystem but does not offer users a way to choose reasoning depth per query.

On the other side, Anthropic’s Claude also has voice capabilities but currently operates on a turn-based system without bidirectional audio. If OpenAI ships Bidi 1 successfully, it would match Gemini Live’s core capability while adding features that neither Google nor Anthropic currently offer in voice mode.

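OpenAI has not confirmed a launch date, and the final model name may change before public release. However, several signals point to an imminent rollout: a subset of ChatGPT app users has already received access, and web release preparations were spotted on June 23, 2026.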

A gradual, opt-in release across platforms appears most likely, with Bidi 1 sitting alongside the existing Advanced Voice Mode rather than replacing it immediately. Users in the European Economic Area may face a longer wait, though this has not been confirmed.

Bidi 1 represents OpenAI’s clearest signal yet that voice will become a primary interface for ChatGPT, not a secondary feature bolted onto text. Whether the intelligence tiers deliver a genuine difference in reasoning quality or mostly adjust speed will determine if this is a competitive leap or just a necessary catch-up to Google.

Bidi is short for bidirectional, meaning the model processes audio in both directions simultaneously. It listens to the user while speaking its response, unlike the current voice mode, which pauses entirely when it detects user input. The model was first identified by its internal label, GPT-Bidi-1.
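The difference between the two designs can be illustrated with a toy simulation. This is only a sketch of the general half-duplex versus full-duplex idea, not OpenAI's implementation, which has not been disclosed. The names `Tick`, `half_duplex`, `full_duplex`, and `overlap` are hypothetical and invented for illustration, and the session is modeled as a handful of discrete time slices rather than real audio.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    """One time slice of a voice session (hypothetical model)."""
    user_speaking: bool      # the user is producing audio in this slice
    response_pending: bool   # the assistant still has response audio queued


def half_duplex(ticks):
    """Audio flows one way at a time: user speech halts the response."""
    log = []
    halted = False
    for tick in ticks:
        if tick.user_speaking:
            halted = True  # the response is dropped once the user talks
        assistant_speaking = tick.response_pending and not halted
        log.append((tick.user_speaking, assistant_speaking))
    return log


def full_duplex(ticks):
    """Audio flows both ways: the assistant can listen while it speaks."""
    return [(tick.user_speaking, tick.response_pending) for tick in ticks]


def overlap(log):
    """Count slices in which listening and speaking happen together."""
    return sum(1 for heard, spoke in log if heard and spoke)


# The user interjects during the second of three slices of a response.
session = [Tick(False, True), Tick(True, True), Tick(False, True)]

print(overlap(half_duplex(session)))
print(overlap(full_duplex(session)))
```

In this toy session the half-duplex version never listens and speaks in the same slice, and it abandons the rest of its response after the interjection. The full-duplex version overlaps with the user for one slice and keeps talking, which is the behavior the "bidirectional" label describes.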

No official release date has been announced, but Bidi 1 has already reached a subset of app users and web release preparations were spotted on June 23, 2026. Multiple signs indicate a gradual opt-in rollout across platforms could begin within days.

Bidi 1 offers three settings called High, Medium, and Instant. High prioritizes deeper reasoning, Instant prioritizes response speed, and Medium balances both. This lets users choose voice response quality based on their task, mirroring how ChatGPT’s text models already work.

Google already offers bidirectional voice. It launched Gemini Live with full bidirectional audio support using its native audio models in late 2025. Bidi 1 matches that core capability but adds selectable intelligence tiers and deeper ChatGPT ecosystem integration that Gemini Live does not currently offer.

Read the original article on Memeburn