
@valora-ai/voice

Framework-agnostic voice-agent runtime — the DOM-free core (mod.ts).

Auto-generated from JSDoc in packages/voice/src/mod.ts. Do not edit — run npm run docs:api.

createLocalRealtimeSession

function
function createLocalRealtimeSession(agent: VoiceAgent): LocalRealtimeSession

createSpeaker

function
function createSpeaker(deps: SpeakerDeps): Speaker

createStore

function
function createStore<T>(initial: T): Store<T>

createTurns

function
function createTurns(now: () => number): Turns

createVoiceAgent

function
function createVoiceAgent(opts: VoiceAgentOptions): VoiceAgent

BrowserPlayer

class
class BrowserPlayer

level(): number

Instantaneous output amplitude 0–1 (RMS of the playing buffer), lightly shaped.
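As a rough sketch of the RMS measurement mentioned above, the level of a PCM buffer could be computed as follows. The "light shaping" is not specified on this page and is left out, and the function name `rmsLevel` is hypothetical, not part of the package.

```typescript
// Hypothetical helper: root-mean-square of a PCM buffer, clamped to 0..1.
// Any additional shaping applied by BrowserPlayer is NOT reproduced here.
function rmsLevel(pcm: Float32Array): number {
  if (pcm.length === 0) return 0; // silence when there is nothing to measure
  let sumSquares = 0;
  pcm.forEach((sample) => {
    sumSquares += sample * sample;
  });
  return Math.min(1, Math.sqrt(sumSquares / pcm.length));
}

// A square wave at half amplitude has an RMS of 0.5.
const halfScaleLevel = rmsLevel(new Float32Array([0.5, -0.5, 0.5, -0.5]));
console.log(halfScaleLevel);
```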

resume(): Promise<void>

Resume a suspended context (autoplay policy). Best-effort; needs a user gesture.

play(pcm: Float32Array, sampleRate: number): Promise<void>

stop(): void

alwaysEndOfTurn

variable
const alwaysEndOfTurn: TurnDetector

heuristicTurnDetector

variable
const heuristicTurnDetector: TurnDetector

Heuristic: not end-of-turn if the transcript trails off mid-clause.
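The exact rules behind heuristicTurnDetector are not documented on this page. Purely to illustrate the idea, a detector of this kind might treat a trailing comma, ellipsis, conjunction, or filler word as a sign that the speaker is mid-clause. The word list and the name `sketchIsEndOfTurn` are assumptions.

```typescript
// Hypothetical word list: a transcript ending on one of these is treated as unfinished.
const TRAILING_WORDS = new Set([
  "and", "but", "or", "so", "because", "to", "the", "a", "of", "with", "um", "uh",
]);

// Illustrative heuristic only; not the library's implementation.
function sketchIsEndOfTurn(transcript: string): boolean {
  const text = transcript.trim().toLowerCase();
  if (text === "") return false; // nothing said yet
  if (/(,|\.\.\.|…)$/.test(text)) return false; // trails off on a comma or ellipsis
  const words = text.replace(/[^a-z'\s]/g, " ").trim().split(/\s+/);
  const last = words[words.length - 1] ?? "";
  return !TRAILING_WORDS.has(last);
}

console.log(sketchIsEndOfTurn("I want to book a flight to"));
console.log(sketchIsEndOfTurn("Book a flight to Paris."));
```

Because isEndOfTurn may return a boolean directly, a synchronous function like this could be wrapped as `{ isEndOfTurn: sketchIsEndOfTurn }` wherever a TurnDetector is expected.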

Loadable

interface
interface Loadable<E>

A loadable engine: the uniform create(repo?, opts?) factory that the STT, LLM, and TTS adapters share. E is the engine interface produced (TranscriptionModel, LanguageModel, or SpeechModel). At least two adapters implement each engine interface, so the abstraction is exercised in practice rather than hypothetical.

create(repo?: string, opts?: LoadOpts): Promise<E>

LocalRealtimeSession

interface
interface LocalRealtimeSession

connect(): Promise<void>

disconnect(): Promise<void>

startAudioCapture(): Promise<void>

stopAudioCapture(): Promise<void>

sendText(text: string): void

interrupt(): void

mute(muted: boolean): void

unlock(): void

subscribe(onChange: () => void): () => void

getSnapshot(): LocalRealtimeSnapshot

LocalRealtimeSnapshot

interface
interface LocalRealtimeSnapshot

status: LocalRealtimeStatus

messages: Segment[]

PlayerEngine

interface
interface PlayerEngine

Audio playback is an engine too — keeps the core free of AudioContext.

play(pcm: Float32Array, sampleRate: number): Promise<void>

Resolve when the clip finishes (or is stopped).

stop(): void

resume(): void | Promise<void>

Optional: unlock/resume output (e.g. a suspended AudioContext).

level(): number

Optional: instantaneous output amplitude 0–1 (RMS of what's currently playing). Lets the UI react to the agent's OWN voice while speaking (vs the mic level).
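Since playback sits behind this interface, a test could swap in a player that never touches AudioContext. The sketch below is one possible test double. The local `PlayerEngine` declaration mirrors the members listed above (marking resume and level with `?` is an assumption based on their "Optional" descriptions), and `RecordingPlayer` is a hypothetical name.

```typescript
// Local mirror of the documented shape, for illustration only.
interface PlayerEngine {
  play(pcm: Float32Array, sampleRate: number): Promise<void>;
  stop(): void;
  resume?(): void | Promise<void>;
  level?(): number;
}

// Hypothetical test double: records what was played and "finishes" a clip on stop().
class RecordingPlayer implements PlayerEngine {
  readonly clips: { samples: number; sampleRate: number }[] = [];
  private finishCurrent: (() => void) | null = null;

  play(pcm: Float32Array, sampleRate: number): Promise<void> {
    this.stop(); // Assumption: a new clip replaces the one in flight.
    this.clips.push({ samples: pcm.length, sampleRate });
    return new Promise<void>((resolve) => {
      this.finishCurrent = () => resolve();
    });
  }

  stop(): void {
    const finish = this.finishCurrent;
    this.finishCurrent = null;
    if (finish) finish(); // resolves the pending play() promise
  }

  get playing(): boolean {
    return this.finishCurrent !== null;
  }
}

const demoPlayer = new RecordingPlayer();
void demoPlayer.play(new Float32Array(480), 24000);
demoPlayer.stop(); // the promise returned by play() resolves here
```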

Segment

interface
interface Segment

A streamed transcript segment (cf. ReceivedTranscriptionSegment).

id: string

text: string

final: boolean

at: number

role: "user" | "agent"

Speaker

interface
interface Speaker

feed(chunk: string, token: number): void

Feed streamed LLM text; complete sentences are queued for synth+playback.

flush(token: number): void

Queue whatever partial sentence remains (call when the LLM stream ends).

drain(): Promise<void>

Resolve once everything queued has finished playing, then restore sensitivity.

stop(): void

Abandon all queued/playing audio now.
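The feed/flush split can be pictured as a sentence buffer: streamed chunks accumulate until a sentence boundary appears, and whatever is left over is queued when the stream ends. The sketch below assumes a sentence ends at `.`, `!`, or `?` followed by whitespace. It leaves out the `token` argument, synthesis, and playback, and `SentenceBuffer` is a hypothetical name rather than the library's implementation.

```typescript
// Hypothetical sentence buffer illustrating feed() and flush().
class SentenceBuffer {
  private pending = "";
  readonly queued: string[] = [];

  // Append streamed text and queue every complete sentence found so far.
  feed(chunk: string): void {
    this.pending += chunk;
    const boundary = /[.!?]+\s+/g; // assumption: punctuation followed by whitespace
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      this.queued.push(this.pending.slice(start, end).trim());
      start = end;
    }
    this.pending = this.pending.slice(start); // keep the unfinished tail
  }

  // Queue the remaining partial sentence, if any.
  flush(): void {
    const rest = this.pending.trim();
    if (rest !== "") this.queued.push(rest);
    this.pending = "";
  }
}

const demoBuffer = new SentenceBuffer();
demoBuffer.feed("Hello the");
demoBuffer.feed("re. How are");
demoBuffer.feed(" you?");
demoBuffer.flush();
console.log(demoBuffer.queued); // [ 'Hello there.', 'How are you?' ]
```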

Store

interface
interface Store<T>

get(): T

set(next: T | ((prev: T) => T)): void

subscribe(onChange: () => void): () => void

Subscribe to changes; returns an unsubscribe fn. (useSyncExternalStore shape.)
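A store with this shape can be sketched in a few lines. The version below assumes listeners are notified synchronously on every set, which the real createStore may not do (for example, it might skip notifications when the value is unchanged). `createSketchStore` is a hypothetical name.

```typescript
// Local mirror of the documented Store shape, for illustration only.
interface Store<T> {
  get(): T;
  set(next: T | ((prev: T) => T)): void;
  subscribe(onChange: () => void): () => void;
}

// Hypothetical implementation; not the library's createStore.
function createSketchStore<T>(initial: T): Store<T> {
  let value = initial;
  const listeners = new Set<() => void>();
  return {
    get: () => value,
    set(next) {
      // Assumption: a function argument is an updater. If T is itself a
      // function type, this check would be ambiguous.
      value = typeof next === "function" ? (next as (prev: T) => T)(value) : (next as T);
      Array.from(listeners).forEach((listener) => listener());
    },
    subscribe(onChange) {
      listeners.add(onChange);
      return () => {
        listeners.delete(onChange);
      };
    },
  };
}

const counterStore = createSketchStore(0);
let notifications = 0;
const unsubscribeCounter = counterStore.subscribe(() => {
  notifications += 1;
});
counterStore.set(1); // direct value
counterStore.set((n) => n + 1); // updater function
unsubscribeCounter();
counterStore.set(10); // no longer observed
```

The subscribe/get pair is what React's useSyncExternalStore expects as its subscribe and getSnapshot arguments, which is presumably why the description calls out that shape.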

Turn

interface
interface Turn

token: number

active: boolean

True while this is the live turn; false once a newer turn began or it was abandoned.

metrics: VoiceMetrics

Current metrics for this turn ({ firstAudioMs, lastTurnMs }).

markFirstAudio(): void

Stamp first-audio latency (now - turn start). Idempotent; call it once per turn.

end(): VoiceMetrics

Stamp turn end; returns the final metrics for this turn.
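The two stamps can be illustrated with an injected clock, in the same spirit as the `now` argument of createTurns. The sketch assumes both metrics start as `null` and that only the first markFirstAudio call records a value. Neither detail is confirmed on this page, and `startTurnTimer` and `SketchMetrics` are hypothetical names.

```typescript
// Assumed metric shape; the real VoiceMetrics type is not shown on this page.
interface SketchMetrics {
  firstAudioMs: number | null;
  lastTurnMs: number | null;
}

// Hypothetical per-turn timer driven by an injected clock.
function startTurnTimer(now: () => number) {
  const startedAt = now();
  const metrics: SketchMetrics = { firstAudioMs: null, lastTurnMs: null };
  return {
    metrics,
    markFirstAudio(): void {
      // Assumption: only the first call stamps the latency; later calls are ignored.
      if (metrics.firstAudioMs === null) metrics.firstAudioMs = now() - startedAt;
    },
    end(): SketchMetrics {
      metrics.lastTurnMs = now() - startedAt;
      return { firstAudioMs: metrics.firstAudioMs, lastTurnMs: metrics.lastTurnMs };
    },
  };
}

// A fake clock keeps the example deterministic.
let fakeClockMs = 1000;
const demoTimer = startTurnTimer(() => fakeClockMs);
fakeClockMs = 1250;
demoTimer.markFirstAudio(); // firstAudioMs = 250
fakeClockMs = 2000;
const demoFinalMetrics = demoTimer.end(); // lastTurnMs = 1000
```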

TurnDetector

interface
interface TurnDetector

Decides whether a transcribed utterance ends the user's turn, or whether they're mid-thought and we should keep listening (the "semantic turn detection" idea). Default is always-true (pure VAD-silence turn-taking). Swap in a model-backed implementation for fewer mid-pause interruptions.

isEndOfTurn(transcript: string): boolean | Promise<boolean>

Turns

interface
interface Turns

current: Turn | null

The current live turn handle, or null if none active.

begin(): Turn

Begin a new turn (invalidates any previous/current). Returns its handle.

abandon(): void

Bump the token WITHOUT starting a turn — abandons the current one (barge-in/interrupt/stop).

isActive(token: number): boolean

Is this token the current live turn?
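The token scheme described above can be sketched as a counter that is bumped by both begin() and abandon(), so any handle holding an older token reads as inactive. This is a minimal illustration that omits metrics and the `now` clock, and `createSketchTurns` and `SketchTurn` are hypothetical names.

```typescript
interface SketchTurn {
  readonly token: number;
  readonly active: boolean;
}

// Hypothetical token-based turn tracker; not the library's createTurns.
function createSketchTurns() {
  let currentToken = 0; // bumped on every begin() and abandon()
  let live = false; // false after abandon() until the next begin()

  const isActive = (token: number): boolean => live && token === currentToken;

  return {
    begin(): SketchTurn {
      currentToken += 1;
      live = true;
      const token = currentToken;
      return {
        token,
        get active() {
          return isActive(token);
        },
      };
    },
    abandon(): void {
      currentToken += 1; // invalidates the current handle without starting a turn
      live = false;
    },
    isActive,
  };
}

const demoTurns = createSketchTurns();
const firstTurn = demoTurns.begin();
const secondTurn = demoTurns.begin(); // firstTurn.active is now false
demoTurns.abandon(); // secondTurn.active is now false
```

Asynchronous work started during a turn can capture the token and check isActive(token) before producing output, which is one way stale replies could be dropped after a barge-in.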

VoiceAction

interface
interface VoiceAction

id: string

description?: string

match(text: string): boolean | Promise<boolean>

execute(ctx: VoiceActionContext): VoiceActionResult | Promise<VoiceActionResult>

VoiceActionContext

interface
interface VoiceActionContext

text: string

pendingText: string

VoiceActionResult

interface
interface VoiceActionResult

handled: boolean

reply?: string

data?: unknown
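To show how the three action interfaces fit together, here is a sketch of an action that answers a question about the time. The interfaces are copied locally from the signatures above. How the agent uses `handled`, `reply`, and `pendingText` is not described on this page, so the values returned here are assumptions, and `createTimeAction` is a hypothetical helper.

```typescript
// Local mirrors of the documented shapes, for illustration only.
interface VoiceActionContext {
  text: string;
  pendingText: string;
}
interface VoiceActionResult {
  handled: boolean;
  reply?: string;
  data?: unknown;
}
interface VoiceAction {
  id: string;
  description?: string;
  match(text: string): boolean | Promise<boolean>;
  execute(ctx: VoiceActionContext): VoiceActionResult | Promise<VoiceActionResult>;
}

// Hypothetical action; the clock is injected so the behaviour is testable.
function createTimeAction(now: () => Date): VoiceAction {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return {
    id: "tell-time",
    description: "Replies with the current local time.",
    match: (text) => /\bwhat time is it\b/i.test(text),
    execute: () => {
      const d = now();
      const hh = pad(d.getHours());
      const mm = pad(d.getMinutes());
      // Assumption: handled=true signals that no further processing is needed.
      return { handled: true, reply: `It is ${hh}:${mm}.`, data: { hh, mm } };
    },
  };
}

const demoTimeAction = createTimeAction(() => new Date());
console.log(demoTimeAction.match("Hey, what time is it?"));
```

Such an object would presumably be passed in the `actions` array of VoiceAgentOptions.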

VoiceAgent

interface
interface VoiceAgent

state: VoiceState

subscribe(onChange: () => void): () => void

useSyncExternalStore-compatible reactive snapshot — the sole notification surface.

getSnapshot(): VoiceSnapshot

start(): Promise<void>

stop(): Promise<void>

dispose(): Promise<void>

interrupt(): void

mute(muted: boolean): void

sendText(text: string): void

Inject a typed user turn — runs the same lifecycle as a spoken one (mixed modality).

speak(text: string): Promise<void>

Make the agent speak arbitrary text now, outside a turn. Resolves when playback ends.

unlock(): void

Unlock audio output — call from a user gesture (resumes a suspended AudioContext).

reset(): void

Abandon any turn, clear the transcript + metrics + buffered text; keep engines/state idle.

VoiceAgentOptions

interface
interface VoiceAgentOptions

systemPrompt?: string

System prompt for the LLM's single-turn replies. Default preserves the current voice assistant behavior.

listenThreshold?: number

speakThreshold?: number

bargeInGraceMs?: number

maxSegments?: number

maxTurnWaitMs?: number

Commit a buffered (mid-thought) turn after this much silence regardless of the detector — bounds the turn-detection wait. 0/Infinity disables. Default 2500ms.
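The bound described above amounts to a small predicate. The sketch below assumes the comparison is inclusive (`>=`) and that any non-finite value disables the bound. `shouldForceCommit` is a hypothetical name, not part of the package.

```typescript
// Hypothetical predicate: should a buffered turn be committed regardless of the detector?
function shouldForceCommit(silenceMs: number, maxTurnWaitMs: number = 2500): boolean {
  // 0 and Infinity both disable the bound, per the option description.
  if (maxTurnWaitMs === 0 || !isFinite(maxTurnWaitMs)) return false;
  return silenceMs >= maxTurnWaitMs; // assumption: inclusive comparison
}

console.log(shouldForceCommit(3000)); // true with the default of 2500 ms
console.log(shouldForceCommit(3000, Infinity)); // false: bound disabled
```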

actions?: VoiceAction[]

onEvent?: (event: VoiceEvent) => void

onError?: (e: Error) => void

VoiceEngines

interface
interface VoiceEngines

vad: VADEngine

stt: TranscriptionModel

llm?: LanguageModel

tts?: SpeechModel

player?: PlayerEngine

turnDetector?: TurnDetector

streamingStt?: StreamingSTT

VoiceSnapshot

interface
interface VoiceSnapshot

Reactive snapshot — always-current value for framework binding (useSyncExternalStore).

state: VoiceState

level: number

segments: Segment[]

muted: boolean

metrics: VoiceMetrics

LocalRealtimeStatus

type alias
type LocalRealtimeStatus = "disconnected" | "connecting" | "connected"

VADEngine

type alias
type VADEngine = VadModel

VoiceActionEvent

type alias
type VoiceActionEvent =
  | { type: "action-start"; at: number; id: string; text: string }
  | { type: "action-end"; at: number; id: string; handled: boolean; data?: unknown }
  | { type: "action-error"; at: number; id: string; error: Error }

VoiceEvent

type alias
type VoiceEvent =
  | { type: "state"; at: number; state: VoiceState; previousState: VoiceState }
  | { type: "speech-start"; at: number }
  | { type: "speech-end"; at: number }
  | { type: "turn-start"; at: number; token: number }
  | { type: "turn-end"; at: number; token: number; metrics: VoiceMetrics }
  | { type: "segment"; at: number; segment: Segment }
  | { type: "first-audio"; at: number; token: number; metrics: VoiceMetrics }
  | { type: "barge-in"; at: number }
  | { type: "interrupt"; at: number }
  | { type: "error"; at: number; error: Error }
  | VoiceActionEvent
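Because every variant carries a `type` tag, an onEvent handler can narrow the union with a switch. The sketch below works on a locally declared subset of the events (`SketchEvent` is a hypothetical type and the string-literal tags are assumed), with an exhaustiveness check that would flag an unhandled variant at compile time.

```typescript
type VoiceState = "loading" | "idle" | "listening" | "thinking" | "speaking";

// Subset of the documented union, redeclared locally for illustration.
type SketchEvent =
  | { type: "state"; at: number; state: VoiceState; previousState: VoiceState }
  | { type: "speech-start"; at: number }
  | { type: "speech-end"; at: number }
  | { type: "barge-in"; at: number }
  | { type: "error"; at: number; error: Error };

// Hypothetical formatter: turns an event into a short log line.
function describeEvent(event: SketchEvent): string {
  switch (event.type) {
    case "state":
      return `${event.previousState} -> ${event.state}`;
    case "speech-start":
    case "speech-end":
    case "barge-in":
      return event.type;
    case "error":
      return `error: ${event.error.message}`;
    default: {
      // Compile-time exhaustiveness check: reached only if a variant is unhandled.
      const unreachable: never = event;
      return unreachable;
    }
  }
}

const demoEventLog: string[] = [];
const demoOnEvent = (event: SketchEvent): void => {
  demoEventLog.push(`${event.at} ${describeEvent(event)}`);
};
demoOnEvent({ type: "state", at: 0, state: "listening", previousState: "idle" });
demoOnEvent({ type: "speech-start", at: 120 });
```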

VoiceState

type alias
type VoiceState = "loading" | "idle" | "listening" | "thinking" | "speaking"

The single state enum that drives the whole UI (the agent state).


Valora is local-first

No API key, no server — everything in this doc runs on-device.
