Feat(ai-gemini): Gemini Realtime Adapter #405
nikas-belogolov wants to merge 8 commits into TanStack:main from
Conversation
📝 Walkthrough

Adds Gemini realtime support: new ai-gemini realtime token/adapter/types, exports, client-side integration in examples, updates to shared realtime APIs (connect signature, session config), and small package.json and openai/elevenlabs adapter adjustments to accept session config.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Browser Client
    participant Server as App Server
    participant GeminiAuth as Google GenAI Auth
    participant GeminiLive as Google GenAI Live
    participant AudioHW as Microphone/Audio
    Client->>Server: request realtime token (provider=gemini, options)
    Server->>GeminiAuth: client.authTokens.create(...) (model, expireTime, modalities)
    GeminiAuth-->>Server: ephemeral token (name, expireTime)
    Server-->>Client: RealtimeToken (token, expiresAt, config)
    Client->>GeminiLive: ai.live.connect(token, RealtimeSessionConfig)
    GeminiLive-->>Client: session open / status events
    AudioHW->>Client: capture PCM frames (AudioWorklet)
    Client->>GeminiLive: sendRealtimeInput({ audio: { data, mimeType } })
    GeminiLive-->>Client: transcript / tool_call / message parts (events)
    Client->>Client: assemble message, emit message_complete
    GeminiLive-->>Client: binary audio chunks
    Client->>AudioHW: decode & schedule playback (AudioBufferSourceNode)
    Client->>GeminiLive: sendText / sendImage / sendToolResult / interrupt
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
There should be some discussion around how to implement the websocket connection: it could be a direct client-server connection (straight to the realtime API), or client-server-server, with the web server (e.g. TanStack Start) acting as a proxy.
| Command | Status | Duration | Result |
|---|---|---|---|
| `nx affected --targets=test:sherif,test:knip,tes...` | ❌ Failed | 2m 47s | View ↗ |
| `nx run-many --targets=build --exclude=examples/**` | ❌ Failed | 48s | View ↗ |
☁️ Nx Cloud last updated this comment at 2026-03-29 10:28:30 UTC
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-client/src/realtime-client.ts (1)
532-540: ⚠️ Potential issue | 🟡 Minor

Inconsistent tool mapping: `applySessionConfig` omits `outputSchema`. The tool mapping in `connect()` (lines 104-114) includes both `inputSchema` and `outputSchema`, but `applySessionConfig()` only includes `inputSchema`. This inconsistency could lead to missing output schemas when the session is updated after initial connection.

🐛 Proposed fix for consistency
```diff
 const toolsConfig = tools
   ? Array.from(this.clientTools.values()).map((t) => ({
       name: t.name,
       description: t.description,
       inputSchema: t.inputSchema
         ? convertSchemaToJsonSchema(t.inputSchema)
         : undefined,
+      outputSchema: t.outputSchema
+        ? convertSchemaToJsonSchema(t.outputSchema)
+        : undefined,
     }))
   : undefined
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 532 - 540, The applySessionConfig mapping for tools omits outputSchema, causing inconsistency with connect(); update applySessionConfig (the code building toolsConfig from this.clientTools.values()) to include outputSchema the same way inputSchema is handled by calling convertSchemaToJsonSchema on t.outputSchema when present, so toolsConfig contains both inputSchema and outputSchema (mirroring the mapping in connect()).
🧹 Nitpick comments (3)
packages/typescript/ai-client/src/realtime-client.ts (1)
99-102: Remove commented-out code. This dead code should be removed to keep the codebase clean.
🧹 Proposed removal
```diff
-    // const toolsList =
-    //   this.clientTools.size > 0
-    //     ? Array.from(this.clientTools.values())
-    //     : undefined
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 99 - 102, Remove the dead/commented-out block that defines toolsList in realtime-client.ts (the lines referencing toolsList and this.clientTools) — delete the commented code entirely so no leftover commented declarations remain; ensure there are no other references to the removed snippet (search for toolsList and usages of this.clientTools in the surrounding code) and run lint/format to keep the file clean.

packages/typescript/ai-gemini/src/index.ts (1)
86-90: Consider exporting Gemini realtime types from the main entry point. The `./realtime/index` module exports types (`GeminiRealtimeModel`, `GeminiRealtimeTokenOptions`, `GeminiRealtimeOptions`) that aren't re-exported here. Other adapters (text, summarize, image, tts) export their configuration types from the main entry point for consumer convenience.

♻️ Proposed addition for type exports
```diff
 // Realtime adapter
 export {
   geminiRealtime,
   geminiRealtimeToken,
 } from './realtime/index'
+export type {
+  GeminiRealtimeModel,
+  GeminiRealtimeTokenOptions,
+  GeminiRealtimeOptions,
+} from './realtime/index'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/index.ts` around lines 86 - 90, Export the realtime types from the main entry point so consumers can import them alongside geminiRealtime and geminiRealtimeToken; add a type-only re-export like "export type { GeminiRealtimeModel, GeminiRealtimeTokenOptions, GeminiRealtimeOptions } from './realtime/index'" in the same file that currently exports geminiRealtime and geminiRealtimeToken. Ensure you use a type-only export to avoid runtime bundle changes and reference the exact type names GeminiRealtimeModel, GeminiRealtimeTokenOptions, and GeminiRealtimeOptions so IDEs and consumers can import them directly.

examples/ts-react-chat/src/routes/realtime.tsx (1)
279-293: Consider enabling additional configuration options for Gemini. The tools indicator now correctly appears for Gemini. However, other session configuration options (output mode, temperature, semantic eagerness) remain OpenAI-only in the UI, but the underlying `RealtimeSessionConfig` supports these for all providers. Consider enabling some of these controls for Gemini if the Gemini Live API supports them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/realtime.tsx` around lines 279 - 293, The UI currently gates several session configuration controls to provider === 'openai'; update those conditionals to also include 'gemini' so that controls for output mode, temperature, and semanticEagerness are shown when provider === 'gemini' (same as how you added the tools indicator for Gemini). Locate the conditional checks and UI blocks in the realtime component that reference provider (and the controls bound to RealtimeSessionConfig such as outputMode, temperature, semanticEagerness) and extend their logic to allow 'gemini'; ensure the form bindings still map to RealtimeSessionConfig fields so changes propagate to the session payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/ts-react-chat/src/lib/use-realtime.ts`:
- Around line 65-69: The Gemini branch is using shared OpenAI defaults (voice:
'alloy' and unconstrained responseModalities) which Gemini Live rejects; update
the geminiRealtime() path so it supplies Gemini-compatible defaults: set voice
to a valid Gemini prebuilt name (e.g., 'Puck' or another supported name) instead
of 'alloy', and set responseModalities to an explicit single-element array
(e.g., [Modality.AUDIO]) for the session; ensure geminiRealtime (or its caller
in useRealtime) overrides the shared defaults and enforces only one modality so
the Gemini connection will succeed.
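The defaults the prompt above describes can be sketched as a small helper. All names and shapes here are hypothetical stand-ins, not the actual use-realtime.ts API:

```typescript
// Sketch of Gemini-compatible session defaults. Gemini Live rejects the
// OpenAI-style shared defaults ('alloy' voice, unconstrained modalities),
// so the gemini branch supplies its own.
type SessionDefaults = {
  voice: string
  responseModalities: Array<'AUDIO' | 'TEXT'>
}

function geminiSessionDefaults(
  overrides: Partial<SessionDefaults> = {},
): SessionDefaults {
  return {
    // 'Puck' is one of Gemini's prebuilt voices; 'alloy' is OpenAI-only.
    voice: overrides.voice ?? 'Puck',
    // Gemini Live accepts exactly one response modality per session,
    // so clamp any caller-provided list to a single entry.
    responseModalities: overrides.responseModalities?.slice(0, 1) ?? ['AUDIO'],
  }
}
```

The clamp to one modality encodes the review's constraint directly, so a caller passing two modalities still produces a valid session payload.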
In `@packages/typescript/ai-client/src/realtime-types.ts`:
- Around line 28-34: The ElevenLabs realtime adapter's connect function still
uses the old signature with clientToolDefs?: ReadonlyArray<AnyClientTool>, which
no longer matches the RealtimeAdapter.connect signature (token: RealtimeToken,
config: RealtimeSessionConfig) and causes type errors; update the connect method
in packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (the function named
connect or the class implementing RealtimeAdapter) to accept (token:
RealtimeToken, config: RealtimeSessionConfig) and remove the legacy
clientToolDefs parameter, update any internal usages/calls within that adapter
to read configuration from the provided RealtimeSessionConfig, and ensure the
exported adapter type still satisfies RealtimeAdapter.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 96-112: liveConfig is missing transcription flags so Gemini won't
populate inputTranscription/outputTranscription; update the LiveConnectConfig
object built in the liveConfig variable to include inputAudioTranscription: {}
and outputAudioTranscription: {} (use the same config.providerOptions merge
pattern so you don't overwrite existing keys) so the API will emit transcription
data that the adapter reads when handling inputTranscription/outputTranscription
events.
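The merge pattern the prompt above asks for can be sketched as follows, assuming providerOptions is a plain object (the real `LiveConnectConfig` type is richer):

```typescript
// Sketch: enable transcription flags without clobbering any keys the
// caller already set via providerOptions.
type LiveConnectConfigSketch = Record<string, unknown>

function withTranscription(
  providerOptions: LiveConnectConfigSketch = {},
): LiveConnectConfigSketch {
  return {
    inputAudioTranscription: {},
    outputAudioTranscription: {},
    // Spread last so caller-supplied keys win, matching the merge pattern
    // the review describes.
    ...providerOptions,
  }
}
```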
- Around line 466-473: The sendImage function currently calls
session.sendRealtimeInput with a non-supported media property; update sendImage
to pass the image under the modality-specific video field instead of media or
deprecated mediaChunks (e.g., call session.sendRealtimeInput({ video: { /*
include the image bytes and mimeType as the frame payload */ } })). Locate
sendImage and replace the media object with a video object containing the image
data and mimeType in the API's expected frame/payload shape so the
session.sendRealtimeInput call uses video rather than media or mediaChunks.
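The payload shape the prompt above describes can be sketched like this; the exact frame fields accepted under the `video` input are an assumption, not a confirmed SDK contract:

```typescript
// Hypothetical sketch of the suggested sendImage payload: the image rides
// in the modality-specific `video` field as a single frame, instead of the
// unsupported `media` property or deprecated `mediaChunks`.
function buildImageFrame(base64Data: string, mimeType: string) {
  return {
    video: {
      data: base64Data, // base64-encoded image bytes
      mimeType, // e.g. 'image/jpeg'
    },
  }
}
```

The adapter would then pass this object to `session.sendRealtimeInput(...)` in place of the current `media`-based payload.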
- Around line 179-185: The convertFloat32ToInt16 function currently returns
buf.toString() (comma-separated integers); change it to produce a base64-encoded
string of the raw 16-bit PCM bytes suitable for the Gemini Live API audio.data
field (e.g., MIME audio/pcm;rate=16000). Convert the Float32Array to an
Int16Array (clamping and scaling as done now), then create a Uint8Array view
over the Int16Array's buffer (ensuring correct endianness), encode that byte
array to base64, and return the base64 string so audio.data contains raw PCM
bytes rather than CSV integers.
- Around line 188-190: The GoogleGenAI client is being constructed with an
ephemeral token (token.token) but missing the v1alpha API version; update the
initialization of GoogleGenAI (the ai instance) to include httpOptions: {
apiVersion: 'v1alpha' } when using ephemeral tokens so Live API requests use the
v1alpha endpoint, e.g., detect where new GoogleGenAI({ apiKey: token.token }) is
created and add the httpOptions.apiVersion flag.
In `@packages/typescript/ai-gemini/src/realtime/token.ts`:
- Around line 41-42: The default model string is invalid for the
GeminiRealtimeModel union: change the default assigned to the local variable
model (currently set via options.model ?? 'gemini-live-2.5-flash-native-audio')
to one of the allowed union values (e.g., 'gemini-3.1-flash-live-preview' or
'gemini-2.5-flash-native-audio-preview-12-2025') so the inferred type matches
GeminiRealtimeModel; update the expression using options.model to fall back to a
valid GeminiRealtimeModel literal.
---
Outside diff comments:
In `@packages/typescript/ai-client/src/realtime-client.ts`:
- Around line 532-540: The applySessionConfig mapping for tools omits
outputSchema, causing inconsistency with connect(); update applySessionConfig
(the code building toolsConfig from this.clientTools.values()) to include
outputSchema the same way inputSchema is handled by calling
convertSchemaToJsonSchema on t.outputSchema when present, so toolsConfig
contains both inputSchema and outputSchema (mirroring the mapping in connect()).
---
Nitpick comments:
In `@examples/ts-react-chat/src/routes/realtime.tsx`:
- Around line 279-293: The UI currently gates several session configuration
controls to provider === 'openai'; update those conditionals to also include
'gemini' so that controls for output mode, temperature, and semanticEagerness
are shown when provider === 'gemini' (same as how you added the tools indicator
for Gemini). Locate the conditional checks and UI blocks in the realtime
component that reference provider (and the controls bound to
RealtimeSessionConfig such as outputMode, temperature, semanticEagerness) and
extend their logic to allow 'gemini'; ensure the form bindings still map to
RealtimeSessionConfig fields so changes propagate to the session payload.
In `@packages/typescript/ai-client/src/realtime-client.ts`:
- Around line 99-102: Remove the dead/commented-out block that defines toolsList
in realtime-client.ts (the lines referencing toolsList and this.clientTools) —
delete the commented code entirely so no leftover commented declarations remain;
ensure there are no other references to the removed snippet (search for
toolsList and usages of this.clientTools in the surrounding code) and run
lint/format to keep the file clean.
In `@packages/typescript/ai-gemini/src/index.ts`:
- Around line 86-90: Export the realtime types from the main entry point so
consumers can import them alongside geminiRealtime and geminiRealtimeToken; add
a type-only re-export like "export type { GeminiRealtimeModel,
GeminiRealtimeTokenOptions, GeminiRealtimeOptions } from './realtime/index'" in
the same file that currently exports geminiRealtime and geminiRealtimeToken.
Ensure you use a type-only export to avoid runtime bundle changes and reference
the exact type names GeminiRealtimeModel, GeminiRealtimeTokenOptions, and
GeminiRealtimeOptions so IDEs and consumers can import them directly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 26ce7bc3-6e10-47ce-adc7-516c0d88c602
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (13)
- .changeset/huge-lizards-admire.md
- examples/ts-react-chat/src/lib/use-realtime.ts
- examples/ts-react-chat/src/routes/realtime.tsx
- packages/typescript/ai-client/src/realtime-client.ts
- packages/typescript/ai-client/src/realtime-types.ts
- packages/typescript/ai-gemini/package.json
- packages/typescript/ai-gemini/src/index.ts
- packages/typescript/ai-gemini/src/realtime/adapter.ts
- packages/typescript/ai-gemini/src/realtime/index.ts
- packages/typescript/ai-gemini/src/realtime/token.ts
- packages/typescript/ai-gemini/src/realtime/types.ts
- packages/typescript/ai-openai/src/realtime/adapter.ts
- packages/typescript/ai/src/realtime/types.ts
```ts
function convertFloat32ToInt16(buffer: Float32Array) {
  let l = buffer.length;
  const buf = new Int16Array(l);
  while (l--) {
    buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
  }
  return buf.toString();
}
```
🧩 Analysis chain
🌐 Web query:
@google/genai session.sendRealtimeInput audio.data expected format for Gemini Live JavaScript
💡 Result:
The expected format for `audio.data` in `session.sendRealtimeInput({ audio: { data: ..., mimeType: ... } })` for the @google/genai JavaScript SDK (Gemini Live) is a base64-encoded string of raw 16-bit PCM audio data (little-endian, typically 16kHz sample rate). Usage example:

```ts
session.sendRealtimeInput({
  audio: {
    data: audioChunk.toString('base64'), // audioChunk is a Buffer/ArrayBuffer/Uint8Array of raw PCM bytes
    mimeType: 'audio/pcm;rate=16000'
  }
});
```

Key details:
- Audio format: Raw PCM, 16-bit signed integers, little-endian, mono.
- Sample rate: 16kHz native (API resamples if needed).
- mimeType: Must include rate, e.g., 'audio/pcm;rate=16000'.
- Send in small chunks for real-time streaming.

This matches official Google AI for Developers documentation and SDK source patterns.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/live-guide
- 2: https://ai.google.dev/gemini-api/docs/live-api/capabilities
- 3: https://github.com/googleapis/js-genai/blob/89400098/src/live.ts
- 4: https://tessl.io/registry/tessl/npm-google--genai/1.30.0/docs/live.md
🏁 Script executed:
```shell
# First, let's check if the file exists and read the relevant section
cat -n packages/typescript/ai-gemini/src/realtime/adapter.ts | sed -n '170,200p'
```

Repository: TanStack/ai
Length of output: 981
🏁 Script executed:
```shell
# Search for usages of convertFloat32ToInt16 in the codebase
rg "convertFloat32ToInt16" --type ts --type tsx -B 2 -A 5
```

Repository: TanStack/ai
Length of output: 82
🏁 Script executed:
```shell
# Check for any other audio-related functions in the adapter file
rg "audio|pcm|Audio|PCM" packages/typescript/ai-gemini/src/realtime/adapter.ts -n
```

Repository: TanStack/ai
Length of output: 3431
Encode microphone PCM as base64-encoded raw bytes, not comma-separated integers.
Int16Array.prototype.toString() returns a comma-separated decimal string like "12,-44,9", but the Gemini Live API expects the audio.data field to contain a base64-encoded string of raw 16-bit PCM bytes with a MIME type like audio/pcm;rate=16000. Passing the current comma-separated format will cause the audio payload to be rejected or misinterpreted server-side, preventing microphone input from decoding correctly.
🐛 Proposed fix
function convertFloat32ToInt16(buffer: Float32Array) {
- let l = buffer.length;
- const buf = new Int16Array(l);
- while (l--) {
- buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
- }
- return buf.toString();
+ const pcm = new Int16Array(buffer.length)
+ for (let i = 0; i < buffer.length; i++) {
+ pcm[i] = Math.min(1, Math.max(-1, buffer[i]!)) * 0x7fff
+ }
+
+ const bytes = new Uint8Array(pcm.buffer)
+ let binary = ''
+ for (const byte of bytes) {
+ binary += String.fromCharCode(byte)
+ }
+ return btoa(binary)
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 179 -
185, The convertFloat32ToInt16 function currently returns buf.toString()
(comma-separated integers); change it to produce a base64-encoded string of the
raw 16-bit PCM bytes suitable for the Gemini Live API audio.data field (e.g.,
MIME audio/pcm;rate=16000). Convert the Float32Array to an Int16Array (clamping
and scaling as done now), then create a Uint8Array view over the Int16Array's
buffer (ensuring correct endianness), encode that byte array to base64, and
return the base64 string so audio.data contains raw PCM bytes rather than CSV
integers.
```ts
const ai = new GoogleGenAI({
  apiKey: token.token
});
```
🧩 Analysis chain
🌐 Web query:
Does @google/genai require httpOptions.apiVersion = 'v1alpha' when connecting to Gemini Live with an ephemeral token?
💡 Result:
Yes, @google/genai requires httpOptions.apiVersion = 'v1alpha' when connecting to Gemini Live with an ephemeral token. Ephemeral tokens are exclusively compatible with the Live API and mandate the v1alpha API version. Official documentation explicitly states that when using an ephemeral token as the apiKey for client initialization, the http_options must be set to {'api_version': 'v1alpha'}. This is also required for Live API preview features like affective dialog and proactive audio, and for creating the ephemeral tokens themselves.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/live-api/capabilities
- 2: https://ai.google.dev/gemini-api/docs/live-api/ephemeral-tokens
- 3: https://ai.google.dev/gemini-api/docs/ephemeral-tokens
- 4: https://googleapis-python-genai-70.mintlify.app/guides/http-options
- 5: https://ai.google.dev/gemini-api/docs/api-versions
🏁 Script executed:
```shell
# First, let's check the file structure and the specific lines mentioned
head -n 200 packages/typescript/ai-gemini/src/realtime/adapter.ts | tail -n 20
```

Repository: TanStack/ai
Length of output: 520
🏁 Script executed:
```shell
# Let's also search for other GoogleGenAI instantiations in the codebase
rg "new GoogleGenAI" -A 3 packages/typescript/ai-gemini/
```

Repository: TanStack/ai
Length of output: 873
🏁 Script executed:
```shell
# Check the `@google/genai` version in the project to verify SDK compatibility
cat packages/typescript/ai-gemini/package.json | grep -A 5 -B 5 "@google/genai"
```

Repository: TanStack/ai
Length of output: 264
🏁 Script executed:
```shell
# Check more context around the GoogleGenAI initialization to verify it's for Live API
sed -n '180,210p' packages/typescript/ai-gemini/src/realtime/adapter.ts
```

Repository: TanStack/ai
Length of output: 907
🏁 Script executed:
```shell
# Also check the token.ts file to understand the different use case
cat -n packages/typescript/ai-gemini/src/realtime/token.ts | head -n 50
```

Repository: TanStack/ai
Length of output: 1966
Add v1alpha API version when initializing GoogleGenAI with ephemeral token.
Ephemeral tokens for Gemini Live API are v1alpha-only. When using an ephemeral token as the API key, the client must be initialized with httpOptions: { apiVersion: 'v1alpha' }. Without this configuration, Live API requests will route to the wrong API version and fail.
Proposed fix
```diff
 const ai = new GoogleGenAI({
-  apiKey: token.token
+  apiKey: token.token,
+  httpOptions: {
+    apiVersion: 'v1alpha',
+  },
 });
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const ai = new GoogleGenAI({
  apiKey: token.token,
  httpOptions: {
    apiVersion: 'v1alpha',
  },
});
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 188 -
190, The GoogleGenAI client is being constructed with an ephemeral token
(token.token) but missing the v1alpha API version; update the initialization of
GoogleGenAI (the ai instance) to include httpOptions: { apiVersion: 'v1alpha' }
when using ephemeral tokens so Live API requests use the v1alpha endpoint, e.g.,
detect where new GoogleGenAI({ apiKey: token.token }) is created and add the
httpOptions.apiVersion flag.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (1)
35-47: ⚠️ Potential issue | 🟠 Major

Wire `options` and `config` parameters to session initialization. The `connect()` method accepts both `options` (from `elevenlabsRealtime()`) and `_config` (`RealtimeSessionConfig`) but passes neither to `createElevenLabsConnection()`, causing caller-provided configuration to be silently dropped. While `ElevenLabsRealtimeOptions` includes meaningful fields (`connectionMode`, `debug`), and the Gemini adapter demonstrates proper config wiring, the ElevenLabs implementation ignores both parameters. At minimum, rename `_config` to `config` to reflect its intended use and thread it through to session creation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-elevenlabs/src/realtime/adapter.ts` around lines 35 - 47, The connect implementation of elevenlabsRealtime is dropping the caller-provided ElevenLabsRealtimeOptions and RealtimeSessionConfig; rename the unused parameter _config to config in the connect signature and thread both options (from elevenlabsRealtime) and config (RealtimeSessionConfig) into createElevenLabsConnection so session initialization receives connectionMode/debug and session config; update the call sites inside elevenlabsRealtime.connect to pass (token, config, options, clientToolDefs) or the equivalent parameter order expected by createElevenLabsConnection and adjust createElevenLabsConnection invocation accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/typescript/ai-elevenlabs/src/realtime/adapter.ts`:
- Around line 35-47: The connect implementation of elevenlabsRealtime is
dropping the caller-provided ElevenLabsRealtimeOptions and
RealtimeSessionConfig; rename the unused parameter _config to config in the
connect signature and thread both options (from elevenlabsRealtime) and config
(RealtimeSessionConfig) into createElevenLabsConnection so session
initialization receives connectionMode/debug and session config; update the call
sites inside elevenlabsRealtime.connect to pass (token, config, options,
clientToolDefs) or the equivalent parameter order expected by
createElevenLabsConnection and adjust createElevenLabsConnection invocation
accordingly.
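The rewiring the prompt above describes can be sketched with placeholder types; every name below except the idea of `connect` forwarding its arguments is a hypothetical stand-in for the adapter's real signatures:

```typescript
// Sketch of the fix: `_config` renamed to `config`, and both `options`
// and `config` are threaded through to connection creation instead of
// being silently dropped.
type Token = { token: string }
type SessionConfig = { instructions?: string }
type AdapterOptions = { connectionMode?: 'webrtc' | 'websocket'; debug?: boolean }

function connect(
  token: Token,
  config: SessionConfig, // previously `_config`, ignored
  options: AdapterOptions,
  createConnection: (t: Token, c: SessionConfig, o: AdapterOptions) => string,
): string {
  // Forward everything the caller provided to session creation.
  return createConnection(token, config, options)
}
```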
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9ecd1b6-aed5-4bb5-81bc-c287adfd2e67
📒 Files selected for processing (5)
- packages/typescript/ai-client/src/realtime-client.ts
- packages/typescript/ai-client/src/realtime-types.ts
- packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
- packages/typescript/ai-gemini/src/realtime/adapter.ts
- packages/typescript/ai-openai/src/realtime/adapter.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/typescript/ai-client/src/realtime-client.ts
- packages/typescript/ai-client/src/realtime-types.ts
- packages/typescript/ai-openai/src/realtime/adapter.ts
Actionable comments posted: 4
♻️ Duplicate comments (2)
packages/typescript/ai-gemini/src/realtime/adapter.ts (2)
218-220: ⚠️ Potential issue | 🟠 Major

Initialize `GoogleGenAI` with `v1alpha` for ephemeral tokens. Gemini’s ephemeral tokens are Live-only and the official docs require the `v1alpha` API version when using them. Without that, the SDK can hit the wrong websocket/API path and fail to connect. (ai.google.dev)

🔧 Proposed fix
```diff
 const ai = new GoogleGenAI({
-  apiKey: token.token
-});
+  apiKey: token.token,
+  httpOptions: {
+    apiVersion: 'v1alpha',
+  },
+})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 218 - 220, The GoogleGenAI client is initialized without the API version so ephemeral tokens can target the wrong websocket/API path; update the GoogleGenAI instantiation (the new GoogleGenAI({ apiKey: token.token }) call) to include the v1alpha API version required for ephemeral tokens (e.g., add the apiVersion/version option set to "v1alpha") so the SDK uses the correct live-only endpoint when using token.token.
209-215: ⚠️ Potential issue | 🔴 Critical

Encode PCM as base64 bytes, not CSV. `buf.toString()` produces comma-separated integers, but Gemini Live expects `audio.data` to be base64-encoded raw 16-bit PCM bytes. In the current form microphone audio will be rejected or decoded as garbage. (ai.google.dev)

🐛 Proposed fix
```diff
 function convertFloat32ToInt16(buffer: Float32Array) {
-  let l = buffer.length;
-  const buf = new Int16Array(l);
-  while (l--) {
-    buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
-  }
-  return buf.toString();
+  const pcm = new Int16Array(buffer.length)
+  for (let i = 0; i < buffer.length; i++) {
+    pcm[i] = Math.min(1, Math.max(-1, buffer[i]!)) * 0x7fff
+  }
+
+  const bytes = new Uint8Array(pcm.buffer)
+  let binary = ''
+  for (const byte of bytes) {
+    binary += String.fromCharCode(byte)
+  }
+  return btoa(binary)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 209 - 215, convertFloat32ToInt16 currently returns CSV via buf.toString(); update it to produce base64-encoded raw 16-bit PCM bytes instead: convert the Float32Array samples into an Int16Array (clamped to -1..1 and scaled by 0x7fff) as done in convertFloat32ToInt16, then create a Uint8Array view over the Int16Array.buffer (ensure little-endian PCM ordering) and return a base64 string of those raw bytes (e.g. Buffer.from(uint8Array).toString('base64') in Node or equivalent in browsers); ensure the function still accepts a Float32Array and returns the base64 audio.data string expected by Gemini Live.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 107-113: The code destructures provider-specific fields from
config.providerOptions without guarding for it; update the destructuring in
adapter.ts so it handles an undefined RealtimeSessionConfig.providerOptions by
using a safe fallback (e.g., null-coalescing or an early guard) before
extracting languageCode, contextWindowCompression, proactivity,
enableAffectiveDialog, and thinkingConfig; ensure the change references
config.providerOptions and GeminiRealtimeProviderOptions and preserves types
while providing sensible defaults or returning early when providerOptions is
absent.
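The guard the prompt above asks for can be sketched as a pure function, with a trimmed, hypothetical stand-in for `GeminiRealtimeProviderOptions`:

```typescript
// Sketch: `providerOptions` is optional on the session config, so fall
// back to an empty object before destructuring instead of throwing on
// the default path.
interface GeminiProviderOptionsSketch {
  languageCode?: string
  enableAffectiveDialog?: boolean
}

function readProviderOptions(config: {
  providerOptions?: GeminiProviderOptionsSketch
}) {
  // `?? {}` keeps the destructure safe when providerOptions is absent.
  const { languageCode, enableAffectiveDialog } = config.providerOptions ?? {}
  return { languageCode, enableAffectiveDialog }
}
```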
- Around line 254-281: The transcription guard incorrectly checks
inputTranscription.finished/outputTranscription.finished (Gemini never sends
`finished`), causing transcript events to be skipped; update the checks to only
verify presence of text (inputTranscription.text/outputTranscription.text) and
use response.serverContent?.turnComplete to determine finality (pass that value
as isFinal), and preserve the currentMode switch to 'thinking' and
emit('mode_change') when appropriate; locate this logic around
inputTranscription/outputTranscription and emit('transcript') to apply the
change.
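The corrected guard logic can be sketched as a pure function; the event shapes below are trimmed to the fields named in the review and are not the full Gemini server-content type:

```typescript
// Sketch: emit a transcript event whenever text is present, and derive
// finality from turnComplete rather than a `finished` flag that Gemini
// never sends.
type ServerContentSketch = {
  inputTranscription?: { text?: string }
  outputTranscription?: { text?: string }
  turnComplete?: boolean
}

type TranscriptEvent = {
  role: 'user' | 'assistant'
  text: string
  isFinal: boolean
}

function transcriptEvents(
  serverContent: ServerContentSketch | undefined,
): Array<TranscriptEvent> {
  const events: Array<TranscriptEvent> = []
  const isFinal = serverContent?.turnComplete ?? false

  const input = serverContent?.inputTranscription?.text
  if (input) events.push({ role: 'user', text: input, isFinal })

  const output = serverContent?.outputTranscription?.text
  if (output) events.push({ role: 'assistant', text: output, isFinal })

  return events
}
```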
- Around line 512-520: The sendToolResult implementation in sendToolResult is
sending functionResponses as a single object but the Gemini Live API expects an
array of FunctionResponse objects; update the call to session.sendToolResponse
so that the functionResponses property is an array (e.g., [ { id: callId,
response: { result } } ]) and adjust typing if necessary to match the
FunctionResponse[] shape used elsewhere.
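The corrected payload shape described above can be sketched as follows, assuming the `FunctionResponse` fields named in the review:

```typescript
// Sketch: the Gemini Live API expects `functionResponses` to be an array
// of FunctionResponse objects, not a single object.
function buildToolResponse(callId: string, result: unknown) {
  return {
    functionResponses: [{ id: callId, response: { result } }],
  }
}
```

The adapter would then pass this object to `session.sendToolResponse(...)` in place of the single-object payload.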
- Around line 242-247: The code is incorrectly encoding response.data as UTF-8;
instead extract the base64 PCM from
response.serverContent.modelTurn.parts[].inlineData.data, decode it to binary
and pass the resulting ArrayBuffer to playIncomingAudioChunk; implement a helper
like base64ToArrayBuffer(base64: string) that uses atob and Uint8Array to
produce an ArrayBuffer, and replace the textEncoder.encode(response.data).buffer
call in the adapter (where playIncomingAudioChunk is invoked) with
base64ToArrayBuffer(response.serverContent.modelTurn.parts[i].inlineData.data)
(ensuring you handle the correct part index and null checks).
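A browser-oriented sketch of the helper the prompt above asks for (`atob` is assumed available, as it is in browsers and recent Node):

```typescript
// Sketch: decode the base64 `inlineData.data` payload into an ArrayBuffer
// suitable for audio playback, replacing the incorrect UTF-8 encoding of
// response.data.
function base64ToArrayBuffer(base64: string): ArrayBuffer {
  const binary = atob(base64) // one character per decoded byte
  const buffer = new ArrayBuffer(binary.length)
  const bytes = new Uint8Array(buffer)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return buffer
}
```

The adapter would call this with the part's `inlineData.data` (after null checks) and hand the resulting buffer to `playIncomingAudioChunk`.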
---
Duplicate comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 218-220: The GoogleGenAI client is initialized without the API
version so ephemeral tokens can target the wrong websocket/API path; update the
GoogleGenAI instantiation (the new GoogleGenAI({ apiKey: token.token }) call) to
include the v1alpha API version required for ephemeral tokens (e.g., add the
apiVersion/version option set to "v1alpha") so the SDK uses the correct
live-only endpoint when using token.token.
- Around line 209-215: convertFloat32ToInt16 currently returns a
comma-separated string via buf.toString(); update it to produce base64-encoded
raw 16-bit PCM bytes instead: convert the Float32Array samples into an
Int16Array (clamped to -1..1 and scaled by 0x7fff), then create a Uint8Array
view over the Int16Array's buffer (ensuring little-endian PCM ordering) and
return a base64 string of those raw bytes (e.g.
Buffer.from(uint8Array).toString('base64') in Node or equivalent in browsers);
the function should still accept a Float32Array and return the base64
audio.data string expected by Gemini Live.
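For the API-version duplicate comment, a minimal sketch of the client options is below. The `httpOptions.apiVersion` knob is an assumption based on the @google/genai SDK surface, not taken from this PR; verify against the installed SDK version before relying on it.

```typescript
// Sketch only: builds the options object for a GoogleGenAI client that uses an
// ephemeral token. `httpOptions.apiVersion: 'v1alpha'` is assumed to be the SDK
// switch for the live-only endpoint; confirm against the @google/genai docs.
function liveClientOptions(ephemeralToken: string) {
  return {
    apiKey: ephemeralToken,
    httpOptions: { apiVersion: 'v1alpha' },
  }
}
```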
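For the PCM-encoding duplicate comment, the steps above can be sketched as a self-contained conversion. This is a sketch, not the PR's implementation; it uses Node's Buffer for base64 (a browser build would assemble a binary string and use btoa instead).

```typescript
// Convert Float32 samples (-1..1) into base64-encoded raw little-endian
// 16-bit PCM, the shape expected in Gemini Live's audio.data field.
function float32ToBase64Pcm16(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to the valid range, then scale to signed 16-bit.
    const s = Math.max(-1, Math.min(1, samples[i]))
    pcm[i] = Math.round(s * 0x7fff)
  }
  // Typed-array buffers are little-endian on all mainstream JS engines.
  return Buffer.from(new Uint8Array(pcm.buffer)).toString('base64')
}
```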
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90dd861d-bc24-41c4-b3c2-d27c79ea301b
📒 Files selected for processing (7)
examples/ts-react-chat/src/lib/use-realtime.ts
packages/typescript/ai-gemini/src/index.ts
packages/typescript/ai-gemini/src/realtime/adapter.ts
packages/typescript/ai-gemini/src/realtime/index.ts
packages/typescript/ai-gemini/src/realtime/token.ts
packages/typescript/ai-gemini/src/realtime/types.ts
packages/typescript/ai/src/realtime/types.ts
✅ Files skipped from review due to trivial changes (1)
- packages/typescript/ai-gemini/src/realtime/index.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/typescript/ai-gemini/src/index.ts
- examples/ts-react-chat/src/lib/use-realtime.ts
- packages/typescript/ai-gemini/src/realtime/token.ts
- packages/typescript/ai-gemini/src/realtime/types.ts
const {
  languageCode,
  contextWindowCompression,
  proactivity,
  enableAffectiveDialog,
  thinkingConfig
} = config.providerOptions as GeminiRealtimeProviderOptions
Guard optional providerOptions before destructuring.
RealtimeSessionConfig.providerOptions is optional, so this throws on the default path before the socket is even opened.
🐛 Proposed fix
const {
languageCode,
contextWindowCompression,
proactivity,
enableAffectiveDialog,
thinkingConfig
- } = config.providerOptions as GeminiRealtimeProviderOptions
+ } = (config.providerOptions ?? {}) as GeminiRealtimeProviderOptions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 107 -
113, The code destructures provider-specific fields from config.providerOptions
without guarding for it; update the destructuring in adapter.ts so it handles an
undefined RealtimeSessionConfig.providerOptions by using a safe fallback (e.g.,
null-coalescing or an early guard) before extracting languageCode,
contextWindowCompression, proactivity, enableAffectiveDialog, and
thinkingConfig; ensure the change references config.providerOptions and
GeminiRealtimeProviderOptions and preserves types while providing sensible
defaults or returning early when providerOptions is absent.
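The guarded destructuring can be exercised in isolation. The interface below is a hypothetical stand-in that mirrors a couple of field names from the diff; it is not the package's real GeminiRealtimeProviderOptions type.

```typescript
// Hypothetical stand-in for GeminiRealtimeProviderOptions; illustrative only.
interface ProviderOptionsSketch {
  languageCode?: string
  enableAffectiveDialog?: boolean
}

// Falls back to an empty object so the default path (no providerOptions set)
// cannot throw before the socket is opened.
function readProviderOptions(config: { providerOptions?: unknown }) {
  const { languageCode, enableAffectiveDialog } =
    (config.providerOptions ?? {}) as ProviderOptionsSketch
  return { languageCode, enableAffectiveDialog }
}
```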
if (response.data) {
  // TODO: Decode chunk and play using an `AudioWorklet` or
  // buffer them into an AudioContext

  playIncomingAudioChunk(textEncoder.encode(response.data).buffer)
Fix audio decoding to read from response.serverContent.modelTurn.parts[].inlineData, not response.data.
The Google Gemini Live API delivers streamed audio as base64-encoded PCM in response.serverContent.modelTurn.parts[].inlineData.data, not in response.data. The current code UTF-8 encodes the base64 string instead of decoding the PCM bytes, breaking audio playback.
🔊 Directional fix
- if (response.data) {
- // TODO: Decode chunk and play using an `AudioWorklet` or
- // buffer them into an AudioContext
-
- playIncomingAudioChunk(textEncoder.encode(response.data).buffer)
-
- if (currentMode !== 'speaking') {
- currentMode = 'speaking'
- emit('mode_change', { mode: 'speaking' })
- }
- }
+ for (const part of response.serverContent?.modelTurn?.parts ?? []) {
+ if (!part.inlineData?.data) continue
+ playIncomingAudioChunk(base64ToArrayBuffer(part.inlineData.data))
+
+ if (currentMode !== 'speaking') {
+ currentMode = 'speaking'
+ emit('mode_change', { mode: 'speaking' })
+ }
+ }

Add a helper to decode base64 to ArrayBuffer:

function base64ToArrayBuffer(base64: string): ArrayBuffer {
  const binaryStr = atob(base64)
  const bytes = new Uint8Array(binaryStr.length)
  for (let i = 0; i < binaryStr.length; i++) {
    bytes[i] = binaryStr.charCodeAt(i)
  }
  return bytes.buffer
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 242 -
247, The code is incorrectly encoding response.data as UTF-8; instead extract
the base64 PCM from response.serverContent.modelTurn.parts[].inlineData.data,
decode it to binary and pass the resulting ArrayBuffer to
playIncomingAudioChunk; implement a helper like base64ToArrayBuffer(base64:
string) that uses atob and Uint8Array to produce an ArrayBuffer, and replace the
textEncoder.encode(response.data).buffer call in the adapter (where
playIncomingAudioChunk is invoked) with
base64ToArrayBuffer(response.serverContent.modelTurn.parts[i].inlineData.data)
(ensuring you handle the correct part index and null checks).
if (
  inputTranscription &&
  inputTranscription.text != undefined &&
  inputTranscription.finished != undefined
) {
  if (inputTranscription.finished && currentMode !== 'thinking') {
    currentMode = 'thinking'
    emit('mode_change', { mode: 'thinking' })
  }

  emit('transcript', {
    isFinal: inputTranscription.finished,
    transcript: inputTranscription.text,
    role: 'user',
  })
}

if (
  outputTranscription &&
  outputTranscription.text != undefined &&
  outputTranscription.finished != undefined
) {
  emit('transcript', {
    isFinal: outputTranscription.finished,
    transcript: outputTranscription.text,
    role: 'assistant',
  })
}
Remove the finished field checks—Gemini never sends this field in transcription payloads.
The server's BidiGenerateContentTranscription message contains only a text field. The finished field does not exist in the official Google GenAI API schema, so the guards at lines 254–281 always fail and transcript events never fire.
Use response.serverContent?.turnComplete to signal transcription completion instead, as already done elsewhere in the code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 254 -
281, The transcription guard incorrectly checks
inputTranscription.finished/outputTranscription.finished (Gemini never sends
`finished`), causing transcript events to be skipped; update the checks to only
verify presence of text (inputTranscription.text/outputTranscription.text) and
use response.serverContent?.turnComplete to determine finality (pass that value
as isFinal), and preserve the currentMode switch to 'thinking' and
emit('mode_change') when appropriate; locate this logic around
inputTranscription/outputTranscription and emit('transcript') to apply the
change.
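The suggested rework can be sketched as a pure function that derives finality from turnComplete instead of the non-existent finished field. The payload shapes here are simplified stand-ins; the real types come from @google/genai.

```typescript
// Simplified stand-ins for the Live API payload shapes; illustrative only.
interface Transcription { text?: string }
interface ServerContent {
  turnComplete?: boolean
  inputTranscription?: Transcription
  outputTranscription?: Transcription
}

type TranscriptEvent = { role: 'user' | 'assistant'; transcript: string; isFinal: boolean }

// Emit a transcript whenever text is present; finality comes from turnComplete.
function extractTranscripts(content: ServerContent): Array<TranscriptEvent> {
  const isFinal = content.turnComplete === true
  const events: Array<TranscriptEvent> = []
  if (content.inputTranscription?.text !== undefined) {
    events.push({ role: 'user', transcript: content.inputTranscription.text, isFinal })
  }
  if (content.outputTranscription?.text !== undefined) {
    events.push({ role: 'assistant', transcript: content.outputTranscription.text, isFinal })
  }
  return events
}
```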
sendToolResult(callId: string, result: string) {
  session.sendToolResponse({
    functionResponses: {
      id: callId,
      response: {
        result
      }
    }
  })
Wrap functionResponses in an array.
The Gemini Live API expects functionResponses as an array of FunctionResponse objects, not a single object. The current code sends the wrong payload structure, which will fail server-side tool result matching.
🛠️ Proposed fix
sendToolResult(callId: string, result: string) {
session.sendToolResponse({
- functionResponses: {
- id: callId,
- response: {
- result
- }
- }
+ functionResponses: [
+ {
+ id: callId,
+ response: {
+ result,
+ },
+ },
+ ],
})
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 512 -
520, The sendToolResult implementation in sendToolResult is sending
functionResponses as a single object but the Gemini Live API expects an array of
FunctionResponse objects; update the call to session.sendToolResponse so that
the functionResponses property is an array (e.g., [ { id: callId, response: {
result } } ]) and adjust typing if necessary to match the FunctionResponse[]
shape used elsewhere.
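The corrected payload shape from the proposed fix can be captured in a small builder for reference. This is a sketch; the actual adapter passes the object straight to session.sendToolResponse.

```typescript
// Builds the tool-response payload with functionResponses as an array,
// matching the FunctionResponse[] shape the Live API expects.
function buildToolResponse(callId: string, result: string) {
  return {
    functionResponses: [
      {
        id: callId,
        response: { result },
      },
    ],
  }
}
```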

🎯 Changes
✅ Checklist
pnpm run test:pr.
🚀 Release Impact
Summary by CodeRabbit