Local runtime APIs
On-device LLM / TTS / STT / VAD over WebGPU — engine, AI SDK provider, CLI, and React hooks.
Runs the hand-written WebGPU compute kernels from the webml-community HF Spaces (LFM2.5, Gemma 4) under Bun's headless WebGPU (Metal on macOS) or in the browser. Same kernels, unmodified — just no server in front of them.
bun add @valora-ai/ai-sdk @valora-ai/react @valora-ai/gemma
# Add provider packages as you use them:
bun add @valora-ai/lfm2 @valora-ai/kokoro @valora-ai/moonshine-stt @valora-ai/silero-vad
Models
| id | engine | repo |
|---|---|---|
| lfm2.5-230m | lfm2 | LiquidAI/LFM2.5-230M-GGUF |
| lfm2.5-350m | lfm2 | LiquidAI/LFM2.5-350M-GGUF |
| lfm2.5-1.2b | lfm2 | LiquidAI/LFM2.5-1.2B-Instruct-GGUF |
| lfm2.5-1.2b-thinking | lfm2 | LiquidAI/LFM2.5-1.2B-Thinking-GGUF |
| gemma-4-e2b | gemma | google/gemma-4-E2B-it-qat-mobile-transformers |
Speech: Kokoro-82M (TTS, via transformers.js) and Whisper / LFM2.5-Audio (STT). VAD is Silero.
Valora keeps model families in separate provider packages. Install only the providers your app imports.
Requires
- Bun ≥ 1.3.14 for the repo CLI.
- A WebGPU-capable GPU.
CLI
Run the WebGPU kernels under Bun's headless WebGPU:
bun run list # list models
bun run chat # pick a model, chat in terminal
bun run chat lfm2.5-350m # chat with a specific model
bun run serve lfm2.5-350m # OpenAI-compatible endpoint on :8080
Point any OpenAI client at http://localhost:8080/v1. For gated or private HF repos, set HF_TOKEN.
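As a rough illustration of talking to the served endpoint without an SDK, the sketch below builds a plain chat request with `fetch`. It assumes the server implements the standard OpenAI `/v1/chat/completions` route and response shape; `buildChatRequest` and `chat` are hypothetical helper names, not part of Valora.

```typescript
// Minimal sketch: call the local OpenAI-compatible server with fetch.
// Assumption: the server follows the standard OpenAI chat completions
// route and response shape. Helper names here are illustrative only.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(baseUrl: string, model: string, messages: ChatMessage[]): ChatRequest {
  // Strip trailing slashes so '/v1' and '/v1/' produce the same URL.
  const url = `${baseUrl.replace(/\/+$/, '')}/chat/completions`;
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages }),
    },
  };
}

async function chat(prompt: string): Promise<string> {
  const { url, init } = buildChatRequest('http://localhost:8080/v1', 'lfm2.5-350m', [
    { role: 'user', content: prompt },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

With `bun run serve lfm2.5-350m` running, `await chat('Count to 5.')` should resolve to the model's reply under those assumptions.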
AI SDK provider (in-process, no server)
A native AI SDK 7 (LanguageModelV3) provider runs the engines directly —
no HTTP hop, even locally:
import { generateText, streamText } from 'ai';
import { valora } from '@valora-ai/ai-sdk';
const { text } = await generateText({ model: valora('gemma-4-e2b'), prompt: 'hi' });
const result = streamText({ model: valora('lfm2.5-350m'), prompt: 'Count to 5.' });
for await (const chunk of result.textStream) console.log(chunk);
valora('id') is the default registry path for built-in model ids. For browser
apps that need app-owned ids, private/gated Hugging Face repos, or one model
configuration shared with voice, create a configured provider and adapt it into
AI SDK:
import { generateText } from 'ai';
import { createValora } from '@valora-ai/ai-sdk';
import { createLfm2Provider } from '@valora-ai/lfm2/provider';
const lfm2 = createLfm2Provider({
  auth: { type: 'bearer', token: hfToken },
  models: {
    'team-chat': {
      source: {
        type: 'huggingface',
        repo: 'LiquidAI/LFM2.5-350M-GGUF',
        revision: 'main',
        file: 'LFM2.5-350M-Q4_0.gguf',
      },
    },
  },
});
const configuredValora = createValora({ languageModels: [lfm2] });
await generateText({ model: configuredValora('team-chat'), prompt: 'hi' });
The same lfm2 object exposes native Valora engines through
lfm2.languageModel('team-chat'); @valora-ai/voice consumes that directly.
Bearer auth belongs in provider/load configuration. accessToken still works as
a compatibility alias, but secrets should not be stored on ModelCard.meta.
The same package exposes explicit audio adapters:
import {
  createValoraSpeechModel,
  createValoraTranscriptionModel,
} from '@valora-ai/ai-sdk';
const speech = createValoraSpeechModel('kokoro', async () => speechEngine);
const transcription = createValoraTranscriptionModel('whisper', async () => sttEngine);
Speech returns WAV bytes. Transcription accepts mono 16-bit PCM WAV in this
first pass, decodes it to Float32Array, and calls the local TranscribeEngine.
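To make the WAV-to-Float32Array step concrete, here is a minimal sketch of decoding a mono 16-bit PCM WAV file by walking its RIFF chunks. It is an independent illustration, not the package's own decoder; `decodePcm16Wav` is a hypothetical name.

```typescript
// Minimal sketch: decode mono 16-bit PCM WAV bytes into Float32 samples in [-1, 1).
// Illustrative only; this is not the decoder used inside @valora-ai/ai-sdk.
function decodePcm16Wav(bytes: Uint8Array): { sampleRate: number; samples: Float32Array } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Read a four-character chunk tag at the given byte offset.
  const tag = (off: number): string =>
    String.fromCharCode(
      view.getUint8(off),
      view.getUint8(off + 1),
      view.getUint8(off + 2),
      view.getUint8(off + 3),
    );
  if (bytes.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  let sampleRate = 0;
  let fmtSeen = false;
  let offset = 12; // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === 'fmt ') {
      const format = view.getUint16(body, true); // 1 = integer PCM
      const channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      const bitsPerSample = view.getUint16(body + 14, true);
      if (format !== 1 || channels !== 1 || bitsPerSample !== 16) {
        throw new Error('Expected mono 16-bit PCM');
      }
      fmtSeen = true;
    } else if (id === 'data') {
      if (!fmtSeen) throw new Error('data chunk before fmt chunk');
      const end = Math.min(body + size, bytes.byteLength);
      const count = Math.floor((end - body) / 2);
      const samples = new Float32Array(count);
      for (let i = 0; i < count; i++) {
        // Little-endian signed 16-bit, scaled so -32768 maps to -1.
        samples[i] = view.getInt16(body + 2 * i, true) / 32768;
      }
      return { sampleRate, samples };
    }
    offset = body + size + (size & 1); // chunks are padded to even length
  }
  throw new Error('No data chunk found');
}
```

The function rejects anything other than mono 16-bit integer PCM, mirroring the constraint stated above, and returns the sample rate so a caller can check it against what the engine expects.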
React hooks (in-browser, WebGPU)
import { createGemmaProvider } from '@valora-ai/gemma/provider';
import {
  createValoraRuntime,
  ValoraRuntimeProvider,
  useLocalModel,
  useLocalRealtime,
  useLocalTTS,
  useTranscribe,
  useVAD,
} from '@valora-ai/react/local';
const runtime = createValoraRuntime({
  languageModels: [createGemmaProvider()],
});
function Voice() {
  const { status, send } = useLocalModel('gemma-4-e2b');
  const { speak } = useLocalTTS();
  const { transcribe } = useTranscribe();
  useVAD({ onSpeechEnd: (audio) => transcribe(audio) }); // Float32 @ 16kHz
}
export default function App() {
  return (
    <ValoraRuntimeProvider runtime={runtime}>
      <Voice />
    </ValoraRuntimeProvider>
  );
}
useLocalModel('gemma-4-e2b') can still run through the legacy static browser
loader when no runtime provider is mounted. Mount a ValoraRuntimeProvider
when the app needs configured model sources, bearer auth, custom ids, or one
provider setup shared with local voice. Hooks need a WebGPU browser and a
bundler (Vite / Next client component). These four hooks expose the same native
engine contracts that @valora-ai/voice's createVoiceAgent expects.
useLocalRealtime(agent) is the hook for an assembled VoiceAgent: it returns
the session snapshot plus connect, disconnect, sendText, interrupt,
mute, and unlock, while keeping capture and inference local.
See the local API reference.