@valora-ai/voice

Auto-generated from JSDoc in packages/voice/src/mod.ts. Do not edit by hand; run `npm run docs:api` to regenerate.

createLocalRealtimeSession

function

function createLocalRealtimeSession(agent: VoiceAgent): LocalRealtimeSession

createSpeaker

function

function createSpeaker(deps: SpeakerDeps): Speaker

createStore

function

function createStore<T>(initial: T): Store<T>

createTurns

function

function createTurns(now: () => number): Turns

createVoiceAgent

function

function createVoiceAgent(opts: VoiceAgentOptions): VoiceAgent

BrowserPlayer

class

class BrowserPlayer

`level(): number`

Instantaneous output amplitude 0–1 (RMS of the playing buffer), lightly shaped.

`resume(): Promise<void>`

Resume a suspended context (autoplay policy). Best-effort; needs a user gesture.

`play(pcm: Float32Array, sampleRate: number): Promise<void>`

alwaysEndOfTurn

variable

const alwaysEndOfTurn: TurnDetector

heuristicTurnDetector

variable

const heuristicTurnDetector: TurnDetector

Heuristic: not end-of-turn if the transcript trails off mid-clause.

Loadable

interface

interface Loadable<E>

A loadable engine: the uniform `create(repo?, opts?)` factory shared by the STT, LLM, and TTS adapters. `E` is the engine interface produced (TranscriptionModel, LanguageModel, or SpeechModel). At least two adapters implement each engine interface, so the abstraction is exercised in practice rather than hypothetical.

`create(repo?: string, opts?: LoadOpts): Promise<E>`

LocalRealtimeSession

interface

interface LocalRealtimeSession

`connect(): Promise<void>`

`disconnect(): Promise<void>`

`startAudioCapture(): Promise<void>`

`stopAudioCapture(): Promise<void>`

`sendText(text: string): void`

`interrupt(): void`

`mute(muted: boolean): void`

`unlock(): void`

`subscribe(onChange: () => void): () => void`

`getSnapshot(): LocalRealtimeSnapshot`

LocalRealtimeSnapshot

interface

interface LocalRealtimeSnapshot

`status: LocalRealtimeStatus`

`messages: Segment[]`

PlayerEngine

interface

interface PlayerEngine

Audio playback is an engine too — keeps the core free of AudioContext.

`play(pcm: Float32Array, sampleRate: number): Promise<void>`

Resolve when the clip finishes (or is stopped).

`stop(): void`

`resume(): void | Promise<void>`

Optional: unlock/resume output (e.g. a suspended AudioContext).

`level(): number`

Optional: instantaneous output amplitude 0–1 (RMS of what's currently playing). Lets the UI react to the agent's OWN voice while speaking (vs the mic level).
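The `level()` description can be made concrete with a short calculation. The following is a minimal sketch of an RMS level over one PCM buffer, not the library's implementation; `rmsLevel` is a hypothetical helper name, and clamping to 1 is an assumption based on the documented 0–1 range.

```typescript
// Hypothetical helper: root-mean-square amplitude of a PCM buffer, clamped to 0..1.
// Assumes samples are floats in the range -1..1, as is usual for Float32Array audio.
function rmsLevel(pcm: Float32Array): number {
  if (pcm.length === 0) return 0; // nothing playing, so report silence
  const sumOfSquares = pcm.reduce((acc, sample) => acc + sample * sample, 0);
  return Math.min(1, Math.sqrt(sumOfSquares / pcm.length));
}
```

A `PlayerEngine` could call something like this on the buffer it is currently playing and return the result from `level()`.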

Segment

interface

interface Segment

A streamed transcript segment (cf. ReceivedTranscriptionSegment).

`id: string`

`text: string`

`final: boolean`

`at: number`

`role: "user" | "agent"`

Speaker

interface

interface Speaker

`feed(chunk: string, token: number): void`

Feed streamed LLM text; complete sentences are queued for synth+playback.

`flush(token: number): void`

Queue whatever partial sentence remains (call when the LLM stream ends).

`drain(): Promise<void>`

Resolve once everything queued has finished playing, then restore sensitivity.

`stop(): void`

Abandon all queued/playing audio now.
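The `feed`/`flush` contract above can be illustrated with a small sentence buffer. This is a sketch under stated assumptions, not the library's `Speaker`: `createSentenceBuffer` and `onSentence` are hypothetical names, the boundary rule (`.`, `!` or `?` followed by whitespace) is a guess, and the `token` argument is left out for brevity.

```typescript
// Hypothetical sketch: accumulate streamed text and emit complete sentences.
function createSentenceBuffer(onSentence: (sentence: string) => void) {
  let pending = "";
  return {
    // Append a streamed chunk and emit every sentence that is now complete.
    feed(chunk: string): void {
      pending += chunk;
      // Assumed boundary: terminal punctuation followed by whitespace.
      const boundary = /[.!?]+\s+/g;
      let start = 0;
      let m: RegExpExecArray | null;
      while ((m = boundary.exec(pending)) !== null) {
        const end = m.index + m[0].length;
        const sentence = pending.slice(start, end).trim();
        if (sentence) onSentence(sentence);
        start = end;
      }
      // Keep the unfinished tail for the next chunk.
      pending = pending.slice(start);
    },
    // Emit whatever partial sentence remains (the stream has ended).
    flush(): void {
      const rest = pending.trim();
      pending = "";
      if (rest) onSentence(rest);
    },
  };
}
```

In a real speaker the `onSentence` callback would hand each sentence to synthesis and playback; here it is only a hook for observing the splits.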

Store

interface

interface Store<T>

`get(): T`

`set(next: T | ((prev: T) => T)): void`

`subscribe(onChange: () => void): () => void`

Subscribe to changes; returns an unsubscribe fn. (useSyncExternalStore shape.)
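To show what the useSyncExternalStore shape implies, here is a minimal sketch of a store that satisfies the three documented members. It is not the source of `createStore`; `createSketchStore` is a hypothetical name, and notifying on every `set` (even when the value is unchanged) is an assumption.

```typescript
// Local copy of the documented interface so the sketch is self-contained.
interface Store<T> {
  get(): T;
  set(next: T | ((prev: T) => T)): void;
  subscribe(onChange: () => void): () => void;
}

// Hypothetical implementation: a value plus a set of change listeners.
function createSketchStore<T>(initial: T): Store<T> {
  let value = initial;
  const listeners = new Set<() => void>();
  return {
    get: () => value,
    set(next) {
      // Accept either a new value or an updater function of the previous value.
      value =
        typeof next === "function"
          ? (next as (prev: T) => T)(value)
          : (next as T);
      // Copy first so listeners may unsubscribe while being notified.
      for (const listener of Array.from(listeners)) listener();
    },
    subscribe(onChange) {
      listeners.add(onChange);
      return () => {
        listeners.delete(onChange);
      };
    },
  };
}
```

Because `subscribe` returns its own unsubscribe function and `get` is a plain getter, the pair can be passed as the first two arguments of React's `useSyncExternalStore`.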

Turn

interface

interface Turn

`token: number`

`active: boolean`

True while this is the live turn; false once a newer turn began or it was abandoned.

`metrics: VoiceMetrics`

Current metrics for this turn ({ firstAudioMs, lastTurnMs }).

`markFirstAudio(): void`

Stamp first-audio latency (now - turn start). Idempotent: safe to call more than once.

`end(): VoiceMetrics`

Stamp turn end; returns the final metrics for this turn.

TurnDetector

interface

interface TurnDetector

Decides whether a transcribed utterance ends the user's turn, or whether they're mid-thought and we should keep listening (the "semantic turn detection" idea). Default is always-true (pure VAD-silence turn-taking). Swap in a model-backed implementation for fewer mid-pause interruptions.

`isEndOfTurn(transcript: string): boolean | Promise<boolean>`
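A custom detector only has to implement `isEndOfTurn`. The sketch below is one possible rule-based version, written for illustration and not taken from `heuristicTurnDetector`; the word list and the trailing-punctuation rule are assumptions.

```typescript
// Local copy of the documented interface so the sketch is self-contained.
interface TurnDetector {
  isEndOfTurn(transcript: string): boolean | Promise<boolean>;
}

// Hypothetical list of words that suggest the speaker is mid-clause.
const TRAILING_WORDS = new Set(["and", "but", "or", "so", "because", "the", "a", "to", "of"]);

const trailingWordDetector: TurnDetector = {
  isEndOfTurn(transcript: string): boolean {
    const trimmed = transcript.trim();
    if (trimmed === "") return false; // nothing was said yet
    if (/(,|\.\.\.)$/.test(trimmed)) return false; // trails off on a comma or ellipsis
    // Compare the last word, ignoring final sentence punctuation.
    const words = trimmed.toLowerCase().replace(/[.!?]+$/, "").split(/\s+/);
    const last = words[words.length - 1] ?? "";
    return !TRAILING_WORDS.has(last);
  },
};
```

A model-backed detector would keep the same shape and return a `Promise<boolean>` instead.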

Turns

interface

interface Turns

`current: Turn | null`

The current live turn handle, or null if none active.

`begin(): Turn`

Begin a new turn (invalidates any previous/current). Returns its handle.

`abandon(): void`

Bump the token WITHOUT starting a turn — abandons the current one (barge-in/interrupt/stop).

`isActive(token: number): boolean`

Is this token the current live turn?
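The token scheme described above can be sketched as a counter that is bumped by both `begin()` and `abandon()`. This is an illustrative reading of the documented members, not the source of `createTurns`: `createSketchTurns` is a hypothetical name, the `number | null` metric types are assumed, and clearing `current` in `end()` is an assumption.

```typescript
// Assumed shape: the reference only names the two fields, not their types.
interface VoiceMetrics { firstAudioMs: number | null; lastTurnMs: number | null }

interface Turn {
  readonly token: number;
  readonly active: boolean;
  readonly metrics: VoiceMetrics;
  markFirstAudio(): void;
  end(): VoiceMetrics;
}

interface Turns {
  readonly current: Turn | null;
  begin(): Turn;
  abandon(): void;
  isActive(token: number): boolean;
}

function createSketchTurns(now: () => number): Turns {
  let token = 0;
  let current: Turn | null = null;
  const isActive = (t: number): boolean => current !== null && t === token;
  return {
    get current() { return current; },
    begin() {
      const mine = ++token; // a new token invalidates every older turn
      const startedAt = now();
      const metrics: VoiceMetrics = { firstAudioMs: null, lastTurnMs: null };
      const turn: Turn = {
        token: mine,
        get active() { return isActive(mine); },
        get metrics() { return { ...metrics }; },
        markFirstAudio() {
          // Only the first call stamps the latency (assumed behaviour).
          if (metrics.firstAudioMs === null) metrics.firstAudioMs = now() - startedAt;
        },
        end() {
          metrics.lastTurnMs = now() - startedAt;
          if (isActive(mine)) current = null;
          return { ...metrics };
        },
      };
      current = turn;
      return turn;
    },
    abandon() { token += 1; current = null; }, // bump without starting a turn
    isActive,
  };
}
```

Async work started during a turn can capture `turn.token` and check `isActive(token)` before producing output, which is how stale results from an interrupted turn get dropped.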

VoiceAction

interface

interface VoiceAction

`id: string`

`description?: string`

`match(text: string): boolean | Promise<boolean>`

`execute(ctx: VoiceActionContext): VoiceActionResult | Promise<VoiceActionResult>`

VoiceActionContext

interface

interface VoiceActionContext

`text: string`

`pendingText: string`

VoiceActionResult

interface

interface VoiceActionResult

`handled: boolean`

`reply?: string`

`data?: unknown`
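As an illustration of how the three action interfaces fit together, here is a sketch of a single action. The timer example, its regular expression, and the reply wording are all hypothetical; how actions are registered with the agent and what `pendingText` contains are not covered on this page, so the sketch does not rely on either.

```typescript
// Local copies of the documented interfaces so the sketch is self-contained.
interface VoiceActionContext { text: string; pendingText: string }
interface VoiceActionResult { handled: boolean; reply?: string; data?: unknown }
interface VoiceAction {
  id: string;
  description?: string;
  match(text: string): boolean | Promise<boolean>;
  execute(ctx: VoiceActionContext): VoiceActionResult | Promise<VoiceActionResult>;
}

// Hypothetical pattern for phrases like "set a timer for 5 minutes".
const TIMER_PATTERN = /\btimer for (\d+) minutes?\b/i;

const setTimerAction: VoiceAction = {
  id: "set-timer",
  description: "Starts a countdown from phrases like 'set a timer for 5 minutes'.",
  match: (text) => TIMER_PATTERN.test(text),
  execute(ctx) {
    const m = TIMER_PATTERN.exec(ctx.text);
    if (!m) return { handled: false }; // let the normal turn handling proceed
    const minutes = Number(m[1]);
    return {
      handled: true,
      reply: `Timer set for ${minutes} minute${minutes === 1 ? "" : "s"}.`,
      data: { minutes },
    };
  },
};
```

Both `match` and `execute` may also return promises, so an action can call out to a service before deciding whether it handled the utterance.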

VoiceAgent

interface

interface VoiceAgent

`state: VoiceState`

`subscribe(onChange: () => void): () => void`

useSyncExternalStore-compatible reactive snapshot — the sole notification surface.

`getSnapshot(): VoiceSnapshot`

`start(): Promise<void>`

`stop(): Promise<void>`

`dispose(): Promise<void>`

`interrupt(): void`

`mute(muted: boolean): void`

`sendText(text: string): void`

Inject a typed user turn — runs the same lifecycle as a spoken one (mixed modality).

`speak(text: string): Promise<void>`

Make the agent speak arbitrary text now, outside a turn. Resolves when playback ends.

`unlock(): void`

Unlock audio output — call from a user gesture (resumes a suspended AudioContext).

`reset(): void`

Abandon any turn, clear the transcript + metrics + buffered text; keep engines/state idle.

VoiceAgentOptions

interface

interface VoiceAgentOptions

VoiceEngines

interface

interface VoiceEngines

VoiceSnapshot

interface

interface VoiceSnapshot

Reactive snapshot: always-current value for framework binding (useSyncExternalStore).

LocalRealtimeStatus

type alias

type LocalRealtimeStatus = "disconnected" | "connecting" | "connected"

VADEngine

type alias

type VADEngine = VadModel

VoiceActionEvent

type alias

type VoiceActionEvent = { type: "action-start"; at: number; id: string; text: string } | { type: "action-end"; at: number; id: string; handled: boolean; data?: unknown } | { type: "action-error"; at: number; id: string; error: Error }

VoiceEvent

type alias

type VoiceEvent = { type: "state"; at: number; state: VoiceState; previousState: VoiceState } | { type: "speech-start"; at: number } | { type: "speech-end"; at: number } | { type: "turn-start"; at: number; token: number } | { type: "turn-end"; at: number; token: number; metrics: VoiceMetrics } | { type: "segment"; at: number; segment: Segment } | { type: "first-audio"; at: number; token: number; metrics: VoiceMetrics } | { type: "barge-in"; at: number } | { type: "interrupt"; at: number } | { type: "error"; at: number; error: Error } | VoiceActionEvent
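Because every variant carries a `type` discriminant, a `switch` narrows each case to its own fields. The sketch below handles a three-variant subset (`SketchEvent`, a hypothetical name) and treats the `type` values as string literals; how events are delivered to application code is not described on this page and is not assumed here.

```typescript
type VoiceState = "loading" | "idle" | "listening" | "thinking" | "speaking";

// Subset of the documented union, kept small for the sketch.
type SketchEvent =
  | { type: "state"; at: number; state: VoiceState; previousState: VoiceState }
  | { type: "barge-in"; at: number }
  | { type: "error"; at: number; error: Error };

// Hypothetical formatter: each case sees only the fields of its own variant.
function describeEvent(e: SketchEvent): string {
  switch (e.type) {
    case "state":
      return `${e.at}: ${e.previousState} -> ${e.state}`;
    case "barge-in":
      return `${e.at}: user barged in`;
    case "error":
      return `${e.at}: error: ${e.error.message}`;
  }
}
```

The same pattern extends to the full union; adding a variant without a matching case then shows up as a compile-time error on the function's return type under strict settings.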

VoiceState

type alias

type VoiceState = "loading" | "idle" | "listening" | "thinking" | "speaking"

The single state enum that drives the whole UI (the agent state).
