@valora-ai/voice
Framework-agnostic voice-agent runtime — the DOM-free core (mod.ts).
Auto-generated from JSDoc in packages/voice/src/mod.ts. Do not edit; run npm run docs:api.
createLocalRealtimeSession
function
function createLocalRealtimeSession(agent: VoiceAgent): LocalRealtimeSession
createSpeaker
function
function createSpeaker(deps: SpeakerDeps): Speaker
createStore
function
function createStore(initial: T): Store<T>
createTurns
function
function createTurns(now: () => number): Turns
createVoiceAgent
function
function createVoiceAgent(opts: VoiceAgentOptions): VoiceAgent
BrowserPlayer
class
class BrowserPlayer
level(): number
Instantaneous output amplitude 0–1 (RMS of the playing buffer), lightly shaped.
resume(): Promise<void>
Resume a suspended context (autoplay policy). Best-effort; needs a user gesture.
play(pcm: Float32Array, sampleRate: number): Promise<void>
stop(): void
alwaysEndOfTurn
variable
const alwaysEndOfTurn: TurnDetector
heuristicTurnDetector
variable
const heuristicTurnDetector: TurnDetector
Heuristic: not end-of-turn if the transcript trails off mid-clause.
Loadable
interface
interface Loadable
A loadable engine: the uniform create(repo?, opts?) factory that the STT/LLM/TTS adapters share. E is the engine interface produced (TranscriptionModel, LanguageModel, SpeechModel). At least two adapters satisfy each, so the seam is real, not hypothetical.
create(repo?: string, opts?: LoadOpts): Promise<E>
LocalRealtimeSession
interface
interface LocalRealtimeSession
connect(): Promise<void>
disconnect(): Promise<void>
startAudioCapture(): Promise<void>
stopAudioCapture(): Promise<void>
sendText(text: string): void
interrupt(): void
mute(muted: boolean): void
unlock(): void
subscribe(onChange: () => void): () => void
getSnapshot(): LocalRealtimeSnapshot
LocalRealtimeSnapshot
interface
interface LocalRealtimeSnapshot
status: LocalRealtimeStatus
messages: Segment[]
PlayerEngine
interface
interface PlayerEngine
Audio playback is an engine too, which keeps the core free of AudioContext.
play(pcm: Float32Array, sampleRate: number): Promise<void>
Resolve when the clip finishes (or is stopped).
stop(): void
resume(): void | Promise<void>
Optional: unlock/resume output (e.g. a suspended AudioContext).
level(): number
Optional: instantaneous output amplitude 0–1 (RMS of what's currently playing). Lets the UI react to the agent's OWN voice while speaking (vs the mic level).
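The level described above is an RMS amplitude in the range 0 to 1. As a rough illustration of that computation only, the sketch below assumes a mono Float32Array buffer with samples in [-1, 1]. The name rmsLevel is hypothetical and not part of this package, and any shaping a real player applies on top is not reproduced.

```typescript
// Hypothetical helper (not exported by the package): root-mean-square
// amplitude of a PCM buffer, clamped to the 0..1 range.
function rmsLevel(pcm: Float32Array): number {
  if (pcm.length === 0) return 0; // silence for an empty buffer
  let sumSquares = 0;
  pcm.forEach((sample) => {
    sumSquares += sample * sample;
  });
  const rms = Math.sqrt(sumSquares / pcm.length);
  return Math.min(1, Math.max(0, rms));
}
```

A UI could poll a value like this on each animation frame to drive a speaking indicator.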
Segment
interface
interface Segment
A streamed transcript segment (cf. ReceivedTranscriptionSegment).
id: string
text: string
final: boolean
at: number
role: "user" | "agent"
Speaker
interface
interface Speaker
feed(chunk: string, token: number): void
Feed streamed LLM text; complete sentences are queued for synth+playback.
flush(token: number): void
Queue whatever partial sentence remains (call when the LLM stream ends).
drain(): Promise<void>
Resolve once everything queued has finished playing, then restore sensitivity.
stop(): void
Abandon all queued/playing audio now.
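The feed/flush pair above implies a buffer that releases complete sentences as text streams in and hands over the remainder at the end. The sketch below shows only that buffering idea. It assumes a sentence ends at ".", "!" or "?" followed by whitespace, which may not match the package's actual rule; createSentenceBuffer is a hypothetical name, and token handling, synthesis and playback are left out.

```typescript
// Hypothetical sentence buffer: illustrates the buffering idea only,
// not the package's Speaker implementation.
interface SentenceBuffer {
  feed(chunk: string): string[]; // sentences completed by this chunk
  flush(): string[];             // trailing partial sentence, if any
}

function createSentenceBuffer(): SentenceBuffer {
  let pending = "";
  // Assumed boundary: ., ! or ? followed by whitespace.
  const boundary = /[.!?]\s+/;
  return {
    feed(chunk) {
      pending += chunk;
      const done: string[] = [];
      let match = boundary.exec(pending);
      while (match !== null) {
        const end = match.index + 1; // keep the punctuation mark
        done.push(pending.slice(0, end).trim());
        pending = pending.slice(match.index + match[0].length);
        match = boundary.exec(pending);
      }
      return done;
    },
    flush() {
      const rest = pending.trim();
      pending = "";
      return rest.length > 0 ? [rest] : [];
    },
  };
}
```

Waiting for whitespace after the punctuation means a sentence is released only once the next one has started, which avoids splitting on a period that is still mid-token in the stream.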
Store
interface
interface Store
get(): T
set(next: T | ((prev: T) => T)): void
subscribe(onChange: () => void): () => void
Subscribe to changes; returns an unsubscribe fn. (useSyncExternalStore shape.)
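To make the get/set/subscribe shape concrete, here is a minimal stand-in with the same three members. It is a sketch, not the package's createStore: makeStore and SimpleStore are hypothetical names, and notifying listeners synchronously on every set is an assumption.

```typescript
// Hypothetical stand-in mirroring the documented Store members.
interface SimpleStore<T> {
  get(): T;
  set(next: T | ((prev: T) => T)): void;
  subscribe(onChange: () => void): () => void;
}

function makeStore<T>(initial: T): SimpleStore<T> {
  let value = initial;
  const listeners = new Set<() => void>();
  return {
    get: () => value,
    set(next) {
      // Accept either a new value or an updater of the previous value.
      // (If T were itself a function type, the two cases would be ambiguous.)
      value =
        typeof next === "function"
          ? (next as (prev: T) => T)(value)
          : (next as T);
      // Assumption: listeners are notified synchronously on every set.
      listeners.forEach((notify) => notify());
    },
    subscribe(onChange) {
      listeners.add(onChange);
      return () => {
        listeners.delete(onChange);
      };
    },
  };
}
```

The returned unsubscribe function is what lets a framework binding clean up when a component unmounts.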
Turn
interface
interface Turn
token: number
active: boolean
True while this is the live turn; false once a newer turn began or it was abandoned.
metrics: VoiceMetrics
Current metrics for this turn ({ firstAudioMs, lastTurnMs }).
markFirstAudio(): void
Stamp first-audio latency (now - turn start). Idempotent-safe to call once.
end(): VoiceMetrics
Stamp turn end; returns the final metrics for this turn.
TurnDetector
interface
interface TurnDetector
Decides whether a transcribed utterance ends the user's turn, or whether they're mid-thought and we should keep listening (the "semantic turn detection" idea). The default always returns true (pure VAD-silence turn-taking). Swap in a model-backed implementation for fewer mid-pause interruptions.
isEndOfTurn(transcript: string): boolean | Promise<boolean>
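A custom detector only has to provide isEndOfTurn. The sketch below is one possible rule-based detector, offered as an illustration rather than the package's heuristicTurnDetector: the word list, the trailing-comma and ellipsis checks, and the names TurnDetectorLike and trailingWordDetector are all assumptions.

```typescript
// Local mirror of the documented interface so the sketch is self-contained.
interface TurnDetectorLike {
  isEndOfTurn(transcript: string): boolean | Promise<boolean>;
}

// Assumed list of words that suggest the speaker is still mid-clause.
const TRAILING_WORDS = new Set([
  "and", "but", "or", "so", "because", "the", "a", "to", "um", "uh",
]);

const trailingWordDetector: TurnDetectorLike = {
  isEndOfTurn(transcript) {
    const text = transcript.trim().toLowerCase();
    if (text.length === 0) return false;        // nothing said yet
    if (/(,|\.\.\.)$/.test(text)) return false; // trailing comma or ellipsis
    const words = text.split(/\s+/);
    const last = words[words.length - 1] ?? "";
    // Strip punctuation before checking the final word.
    return !TRAILING_WORDS.has(last.replace(/[^a-z']/g, ""));
  },
};
```

Because the signature also allows a Promise, a model-backed detector could return the result of an asynchronous classification call from the same method.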
Turns
interface
interface Turns
current: Turn | null
The current live turn handle, or null if none active.
begin(): Turn
Begin a new turn (invalidates any previous/current). Returns its handle.
abandon(): void
Bump the token WITHOUT starting a turn — abandons the current one (barge-in/interrupt/stop).
isActive(token: number): boolean
Is this token the current live turn?
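The begin/abandon/isActive trio describes a token scheme: every new or abandoned turn bumps a counter, so asynchronous work holding an older token can detect that it is stale. A minimal sketch of that idea follows, assuming integer tokens. makeTurnTokens is a hypothetical name, and the Turn handle and metrics are omitted.

```typescript
// Hypothetical token tracker: shows the invalidation idea only.
interface TurnTokens {
  begin(): number;                   // start a turn, return its token
  abandon(): void;                   // invalidate without starting a turn
  isActive(token: number): boolean;  // is this token the live turn?
}

function makeTurnTokens(): TurnTokens {
  let counter = 0;              // bumped by both begin() and abandon()
  let live: number | null = null;
  return {
    begin() {
      counter += 1;
      live = counter;
      return counter;
    },
    abandon() {
      counter += 1;             // any outstanding token is now stale
      live = null;
    },
    isActive: (token) => live !== null && token === live,
  };
}
```

A pipeline stage would typically capture the token when the turn starts and check isActive(token) before emitting output, so audio from an interrupted turn is dropped instead of played.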
VoiceAction
interface
interface VoiceAction
id: string
description?: string
match(text: string): boolean | Promise<boolean>
execute(ctx: VoiceActionContext): VoiceActionResult | Promise<VoiceActionResult>
VoiceActionContext
interface
interface VoiceActionContext
text: string
pendingText: string
VoiceActionResult
interface
interface VoiceActionResult
handled: boolean
reply?: string
data?: unknown
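As an illustration of how the three action interfaces fit together, the sketch below defines one action against local mirrors of the documented shapes. The "stop-timer" action, its regular expression and its reply text are hypothetical, and the reading that handled: true means the action consumed the utterance is an assumption this reference does not state.

```typescript
// Local mirrors of the documented shapes, so the sketch is self-contained.
interface ActionContext { text: string; pendingText: string; }
interface ActionResult { handled: boolean; reply?: string; data?: unknown; }
interface Action {
  id: string;
  description?: string;
  match(text: string): boolean | Promise<boolean>;
  execute(ctx: ActionContext): ActionResult | Promise<ActionResult>;
}

// Hypothetical action: recognizes "stop timer" / "stop the timer".
const stopTimerAction: Action = {
  id: "stop-timer",
  description: "Stops a (hypothetical) running timer when the user asks.",
  match: (text) => /\bstop (the )?timer\b/i.test(text),
  execute: (ctx) => ({
    handled: true,              // assumed to mean: the utterance was consumed
    reply: "Timer stopped.",
    data: { heard: ctx.text },
  }),
};
```

An action like this would be passed through the actions option of VoiceAgentOptions.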
VoiceAgent
interface
interface VoiceAgent
state: VoiceState
subscribe(onChange: () => void): () => void
useSyncExternalStore-compatible reactive snapshot — the sole notification surface.
getSnapshot(): VoiceSnapshot
start(): Promise<void>
stop(): Promise<void>
dispose(): Promise<void>
interrupt(): void
mute(muted: boolean): void
sendText(text: string): void
Inject a typed user turn — runs the same lifecycle as a spoken one (mixed modality).
speak(text: string): Promise<void>
Make the agent speak arbitrary text now, outside a turn. Resolves when playback ends.
unlock(): void
Unlock audio output — call from a user gesture (resumes a suspended AudioContext).
reset(): void
Abandon any turn, clear the transcript + metrics + buffered text; keep engines/state idle.
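Outside a framework, the subscribe/getSnapshot pair can be consumed directly. The sketch below reacts only when the state field changes. It is written against a local structural type rather than the real VoiceAgent, and onStateChange is a hypothetical helper, not something the package exports.

```typescript
type AgentState = "loading" | "idle" | "listening" | "thinking" | "speaking";

// Structural subset of the documented agent surface.
interface SnapshotSource {
  subscribe(onChange: () => void): () => void;
  getSnapshot(): { state: AgentState };
}

// Hypothetical helper: call `handler` only when the state field changes.
function onStateChange(
  source: SnapshotSource,
  handler: (state: AgentState) => void,
): () => void {
  let last = source.getSnapshot().state;
  return source.subscribe(() => {
    const next = source.getSnapshot().state;
    if (next !== last) {
      last = next;
      handler(next);
    }
  });
}
```

In React, the same pair would normally be handed to useSyncExternalStore(subscribe, getSnapshot), assuming the agent's methods do not depend on `this`.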
VoiceAgentOptions
interface
interface VoiceAgentOptions
systemPrompt?: string
System prompt for the LLM's single-turn replies. Default preserves the current voice assistant behavior.
listenThreshold?: number
speakThreshold?: number
bargeInGraceMs?: number
maxSegments?: number
maxTurnWaitMs?: number
Commit a buffered (mid-thought) turn after this much silence regardless of the detector — bounds the turn-detection wait. 0/Infinity disables. Default 2500ms.
actions?: VoiceAction[]
onEvent?: (event: VoiceEvent) => void
onError?: (e: Error) => void
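The events delivered to onEvent form a union discriminated by type, so a handler can switch on that field and get narrowed payloads. The sketch below uses a local subset of the documented variants to stay self-contained. EventSubset and describeEvent are hypothetical names, and the string-literal form of the type values is assumed.

```typescript
// Local subset of the documented VoiceEvent variants (assumed string literals).
type EventSubset =
  | { type: "state"; at: number; state: string; previousState: string }
  | { type: "turn-start"; at: number; token: number }
  | { type: "barge-in"; at: number }
  | { type: "error"; at: number; error: Error };

// Hypothetical formatter: one log line per event.
function describeEvent(event: EventSubset): string {
  switch (event.type) {
    case "state":
      return `state ${event.previousState} -> ${event.state}`;
    case "turn-start":
      return `turn ${event.token} started`;
    case "barge-in":
      return "user barged in";
    case "error":
      return `error: ${event.error.message}`;
  }
}
```

A handler for the full union would need to cover the remaining variants or add a default branch.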
VoiceEngines
interface
interface VoiceEngines
vad: VADEngine
stt: TranscriptionModel
llm?: LanguageModel
tts?: SpeechModel
player?: PlayerEngine
turnDetector?: TurnDetector
streamingStt?: StreamingSTT
VoiceSnapshot
interface
interface VoiceSnapshot
Reactive snapshot: an always-current value for framework binding (useSyncExternalStore).
state: VoiceState
level: number
segments: Segment[]
muted: boolean
metrics: VoiceMetrics
LocalRealtimeStatus
type alias
type LocalRealtimeStatus = "disconnected" | "connecting" | "connected"
VADEngine
type alias
type VADEngine = VadModel
VoiceActionEvent
type alias
type VoiceActionEvent =
  | { type: "action-start"; at: number; id: string; text: string }
  | { type: "action-end"; at: number; id: string; handled: boolean; data?: unknown }
  | { type: "action-error"; at: number; id: string; error: Error }
VoiceEvent
type alias
type VoiceEvent =
  | { type: "state"; at: number; state: VoiceState; previousState: VoiceState }
  | { type: "speech-start"; at: number }
  | { type: "speech-end"; at: number }
  | { type: "turn-start"; at: number; token: number }
  | { type: "turn-end"; at: number; token: number; metrics: VoiceMetrics }
  | { type: "segment"; at: number; segment: Segment }
  | { type: "first-audio"; at: number; token: number; metrics: VoiceMetrics }
  | { type: "barge-in"; at: number }
  | { type: "interrupt"; at: number }
  | { type: "error"; at: number; error: Error }
  | VoiceActionEvent
VoiceState
type alias
type VoiceState = "loading" | "idle" | "listening" | "thinking" | "speaking"
The single state enum that drives the whole UI (the agent state).