Local runtime APIs

On-device LLM / TTS / STT / VAD over WebGPU — engine, AI SDK provider, CLI, and React hooks.

Runs the hand-written WebGPU compute kernels from the webml-community Hugging Face Spaces (LFM2.5, Gemma 4) under Bun's headless WebGPU (Metal on macOS) or in the browser. The kernels are the same and unmodified; there is just no server in front of them.

bun add @valora-ai/ai-sdk @valora-ai/react @valora-ai/gemma
# Add provider packages as you use them:
bun add @valora-ai/lfm2 @valora-ai/kokoro @valora-ai/moonshine-stt @valora-ai/silero-vad

Models

id	engine	repo
`lfm2.5-230m`	lfm2	LiquidAI/LFM2.5-230M-GGUF
`lfm2.5-350m`	lfm2	LiquidAI/LFM2.5-350M-GGUF
`lfm2.5-1.2b`	lfm2	LiquidAI/LFM2.5-1.2B-Instruct-GGUF
`lfm2.5-1.2b-thinking`	lfm2	LiquidAI/LFM2.5-1.2B-Thinking-GGUF
`gemma-4-e2b`	gemma	google/gemma-4-E2B-it-qat-mobile-transformers

Speech: Kokoro-82M (TTS, via transformers.js) and Whisper / LFM2.5-Audio (STT). VAD is Silero.

Valora keeps model families in separate provider packages. Install only the providers your app imports.

Requires

Bun ≥ 1.3.14 for the repo CLI.
A WebGPU-capable GPU.
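To check the GPU requirement before loading a model, a minimal sketch can feature-detect WebGPU through the standard navigator.gpu.requestAdapter() call. It assumes a runtime that exposes navigator.gpu (browsers do; whether a given Bun build does is an assumption to verify). hasWebGPU is a hypothetical helper, not part of Valora; the optional parameter exists only so the check can be exercised with a stub.

```typescript
// Minimal structural types so this compiles without @webgpu/types installed.
interface GPULike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorWithGPU {
  gpu?: GPULike;
}

// Hypothetical helper (not a Valora API): resolves true only when the runtime
// exposes navigator.gpu AND returns a usable adapter.
async function hasWebGPU(
  nav: NavigatorWithGPU | undefined = (
    globalThis as unknown as { navigator?: NavigatorWithGPU }
  ).navigator,
): Promise<boolean> {
  if (!nav?.gpu) return false; // no WebGPU entry point at all
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false; // treat adapter errors as "no usable WebGPU"
  }
}
```

Calling `await hasWebGPU()` before mounting the hooks or starting the CLI lets an app show a clear message instead of failing mid-load.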

CLI

Run the WebGPU kernels under Bun's headless WebGPU:

bun run list                 # list models
bun run chat                 # pick a model, chat in terminal
bun run chat lfm2.5-350m     # chat with a specific model
bun run serve lfm2.5-350m    # OpenAI-compatible endpoint on :8080

Point any OpenAI client at http://localhost:8080/v1. For gated or private Hugging Face repos, set HF_TOKEN.
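For a quick check of the served endpoint without an SDK, a sketch using the built-in fetch might look like this. It assumes the server implements the standard OpenAI POST /v1/chat/completions route, accepts the id passed to bun run serve as the model field, and returns the usual choices[0].message.content shape; these follow from "OpenAI-compatible" but are not documented here. buildChatRequest and chat are hypothetical helpers.

```typescript
// Shape of an OpenAI-style chat message.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper: builds the fetch() arguments for an OpenAI-style
// chat completion against a local `bun run serve` instance.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  baseURL = "http://localhost:8080/v1",
): { url: string; init: RequestInit } {
  return {
    url: `${baseURL}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Sends one user prompt and returns the first choice's text.
// Assumes the standard OpenAI response shape.
async function chat(model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(model, [{ role: "user", content: prompt }]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

With `bun run serve lfm2.5-350m` running, `await chat('lfm2.5-350m', 'Count to 5.')` should return the model's reply if the endpoint behaves as assumed.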

AI SDK provider (in-process, no server)

A native AI SDK 7 (LanguageModelV3) provider runs the engines in-process, with no HTTP hop, not even to localhost:

import { generateText, streamText } from 'ai';
import { valora } from '@valora-ai/ai-sdk';

const { text } = await generateText({ model: valora('gemma-4-e2b'), prompt: 'hi' });

const result = streamText({ model: valora('lfm2.5-350m'), prompt: 'Count to 5.' });
for await (const chunk of result.textStream) console.log(chunk);

valora('id') resolves built-in model ids through the default registry. When a browser app needs its own model ids, private or gated Hugging Face repos, or a single model configuration shared with voice, create a configured provider and adapt it for the AI SDK:

import { generateText } from 'ai';
import { createValora } from '@valora-ai/ai-sdk';
import { createLfm2Provider } from '@valora-ai/lfm2/provider';

// hfToken: a Hugging Face access token, needed for gated or private repos.
const lfm2 = createLfm2Provider({
  auth: { type: 'bearer', token: hfToken },
  models: {
    'team-chat': {
      source: {
        type: 'huggingface',
        repo: 'LiquidAI/LFM2.5-350M-GGUF',
        revision: 'main',
        file: 'LFM2.5-350M-Q4_0.gguf',
      },
    },
  },
});

const configuredValora = createValora({ languageModels: [lfm2] });
await generateText({ model: configuredValora('team-chat'), prompt: 'hi' });

The same lfm2 object also exposes native Valora engines through lfm2.languageModel('team-chat'), which @valora-ai/voice consumes directly. Put bearer auth in the provider or load configuration. accessToken still works as a compatibility alias, but do not store secrets on ModelCard.meta.

@valora-ai/ai-sdk also exposes explicit audio adapters:

import {
  createValoraSpeechModel,
  createValoraTranscriptionModel,
} from '@valora-ai/ai-sdk';

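// speechEngine and sttEngine are local engine instances your app has already created;
// each adapter takes a model id plus an async loader that returns the engine.
const speech = createValoraSpeechModel('kokoro', async () => speechEngine);
const transcription = createValoraTranscriptionModel('whisper', async () => sttEngine);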

The speech adapter returns WAV bytes. For now, the transcription adapter accepts only mono 16-bit PCM WAV: it decodes the audio to a Float32Array and passes it to the local TranscribeEngine.
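The format conversion behind these adapters is small enough to sketch. The code below is not Valora's implementation; it is a minimal illustration of encoding a Float32Array to mono 16-bit PCM WAV (the kind of bytes a speech adapter returns) and decoding such a file back to a Float32Array (the step the transcription adapter performs). encodeWav and decodeWav are hypothetical names; the decoder walks RIFF chunks so extra chunks such as LIST are skipped.

```typescript
// Hypothetical helpers: mono 16-bit PCM WAV <-> Float32Array (not Valora code).
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const dataBytes = samples.length * 2; // 16-bit = 2 bytes per sample
  const view = new DataView(new ArrayBuffer(44 + dataBytes));
  const writeTag = (offset: number, tag: string) => {
    for (let i = 0; i < 4; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };
  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // RIFF size = file size - 8
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, 1, true); // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true); // block align = channels * 2
  view.setUint16(34, 16, true); // bits per sample
  writeTag(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale into the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}

function decodeWav(bytes: Uint8Array): { samples: Float32Array; sampleRate: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number) =>
    String.fromCharCode(view.getUint8(o), view.getUint8(o + 1), view.getUint8(o + 2), view.getUint8(o + 3));
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") throw new Error("not a RIFF/WAVE file");

  let sampleRate = 0;
  let offset = 12;
  // Walk chunks: "fmt " describes the format, "data" holds the samples.
  while (offset + 8 <= view.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === "fmt ") {
      const format = view.getUint16(body, true);
      const channels = view.getUint16(body + 2, true);
      const bits = view.getUint16(body + 14, true);
      if (format !== 1 || channels !== 1 || bits !== 16) {
        throw new Error("only mono 16-bit integer PCM is supported");
      }
      sampleRate = view.getUint32(body + 4, true);
    } else if (id === "data") {
      if (sampleRate === 0) throw new Error("data chunk before fmt chunk");
      const count = Math.floor(Math.min(size, view.byteLength - body) / 2);
      const samples = new Float32Array(count);
      for (let i = 0; i < count; i++) {
        samples[i] = view.getInt16(body + i * 2, true) / 0x8000; // into [-1, 1)
      }
      return { samples, sampleRate };
    }
    offset = body + size + (size % 2); // chunks are padded to even sizes
  }
  throw new Error("no data chunk found");
}
```

Because both directions quantize to 16 bits, a round trip changes each sample by at most about 1/16384, which is inaudible for speech.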

React hooks (in-browser, WebGPU)

import { createGemmaProvider } from '@valora-ai/gemma/provider';
import {
  createValoraRuntime,
  ValoraRuntimeProvider,
  useLocalModel,
  useLocalRealtime,
  useLocalTTS,
  useTranscribe,
  useVAD,
} from '@valora-ai/react/local';

const runtime = createValoraRuntime({
  languageModels: [createGemmaProvider()],
});

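function Voice() {
  const { status, send } = useLocalModel('gemma-4-e2b');
  const { speak } = useLocalTTS();
  const { transcribe } = useTranscribe();
  // onSpeechEnd receives each detected utterance as Float32 samples at 16 kHz.
  useVAD({ onSpeechEnd: (audio) => transcribe(audio) });
  return null; // UI omitted; status, send, and speak would drive it
}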

export default function App() {
  return (
    <ValoraRuntimeProvider runtime={runtime}>
      <Voice />
    </ValoraRuntimeProvider>
  );
}

Without a mounted runtime provider, useLocalModel('gemma-4-e2b') still works through the legacy static browser loader. Mount a ValoraRuntimeProvider when the app needs configured model sources, bearer auth, custom ids, or one provider setup shared with local voice.

The hooks require a WebGPU-capable browser and a bundler (Vite, or a Next.js client component). useLocalModel, useLocalTTS, useTranscribe, and useVAD expose the same native engine contracts that createVoiceAgent in @valora-ai/voice expects.

useLocalRealtime(agent) is the hook for an assembled VoiceAgent: it returns the session snapshot plus connect, disconnect, sendText, interrupt, mute, and unlock, while keeping capture and inference local.

See the local API reference.
