Valora

Local runtime APIs

On-device LLM / TTS / STT / VAD over WebGPU — engine, AI SDK provider, CLI, and React hooks.

Valora runs the hand-written WebGPU compute kernels from the webml-community Hugging Face Spaces (LFM2.5, Gemma 4), either under Bun's headless WebGPU (Metal on macOS) or in the browser. The kernels are unmodified; there is simply no server in front of them.

bun add @valora-ai/ai-sdk @valora-ai/react @valora-ai/gemma
# Add provider packages as you use them:
bun add @valora-ai/lfm2 @valora-ai/kokoro @valora-ai/moonshine-stt @valora-ai/silero-vad

Models

id                      engine  repo
lfm2.5-230m             lfm2    LiquidAI/LFM2.5-230M-GGUF
lfm2.5-350m             lfm2    LiquidAI/LFM2.5-350M-GGUF
lfm2.5-1.2b             lfm2    LiquidAI/LFM2.5-1.2B-Instruct-GGUF
lfm2.5-1.2b-thinking    lfm2    LiquidAI/LFM2.5-1.2B-Thinking-GGUF
gemma-4-e2b             gemma   google/gemma-4-E2B-it-qat-mobile-transformers

Speech: Kokoro-82M (TTS, via transformers.js) and Whisper / LFM2.5-Audio (STT). VAD is Silero.

Valora keeps model families in separate provider packages. Install only the providers your app imports.

Requires

  • Bun ≥ 1.3.14 for the repo CLI.
  • A WebGPU-capable GPU.
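To check at runtime whether the current environment actually exposes WebGPU, a minimal sketch using the standard WebGPU entry point (navigator.gpu.requestAdapter()) might look like the following. hasWebGPU and the small NavigatorLike / GpuLike types are hypothetical stand-ins (real projects would typically use @webgpu/types), and whether Bun's headless WebGPU exposes navigator.gpu is an assumption we have not verified. The navigator is passed in as a parameter so the check can be exercised with fakes.

```typescript
// Hypothetical minimal shapes for the parts of WebGPU this check touches.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Returns true only if navigator.gpu exists AND an adapter can be obtained.
// navigator.gpu alone is not enough: requestAdapter() resolves to null
// when no suitable GPU is available.
async function hasWebGPU(
  nav: NavigatorLike | undefined = (globalThis as unknown as { navigator?: NavigatorLike }).navigator,
): Promise<boolean> {
  if (!nav?.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter != null;
  } catch {
    // Treat a failed adapter request the same as "no WebGPU".
    return false;
  }
}
```

In a browser you would call await hasWebGPU() with no argument before mounting any hooks.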

CLI

Run the WebGPU kernels under Bun's headless WebGPU:

bun run list                 # list models
bun run chat                 # pick a model, chat in terminal
bun run chat lfm2.5-350m     # chat with a specific model
bun run serve lfm2.5-350m    # OpenAI-compatible endpoint on :8080

Point any OpenAI client at http://localhost:8080/v1. For gated or private Hugging Face repos, set HF_TOKEN.
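As a sketch of what pointing a client at the server can look like without an SDK, the following assumes bun run serve implements the standard OpenAI Chat Completions route (POST /v1/chat/completions) with the usual choices[0].message.content response shape. buildChatRequest and chat are hypothetical helpers, not part of Valora.

```typescript
// Assumption: the local server speaks the standard OpenAI Chat Completions API.
const BASE_URL = "http://localhost:8080/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the fetch() arguments for a non-streaming chat completion.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  baseUrl: string = BASE_URL,
): [string, RequestInit] {
  return [
    `${baseUrl}/chat/completions`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  ];
}

// Sends a single user prompt and returns the first choice's text.
async function chat(model: string, prompt: string): Promise<string> {
  const [url, init] = buildChatRequest(model, [{ role: "user", content: prompt }]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

With bun run serve lfm2.5-350m running, await chat('lfm2.5-350m', 'Count to 5.') should resolve to the model's reply. Keeping buildChatRequest separate makes the request shape checkable without a running server.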

AI SDK provider (in-process, no server)

A native AI SDK 7 (LanguageModelV3) provider runs the engines directly — no HTTP hop, even locally:

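import { generateText, streamText } from 'ai';
import { valora } from '@valora-ai/ai-sdk';

// One-shot generation: resolves once the full reply is ready.
const { text } = await generateText({ model: valora('gemma-4-e2b'), prompt: 'hi' });
console.log(text);

// Streaming: streamText returns immediately; consume text chunks as they arrive.
const result = streamText({ model: valora('lfm2.5-350m'), prompt: 'Count to 5.' });
for await (const chunk of result.textStream) console.log(chunk);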

valora('id') is the default registry path for built-in model ids. For browser apps that need app-owned ids, private or gated Hugging Face repos, or a single model configuration shared with voice, create a configured provider and adapt it for the AI SDK with createValora:

import { generateText } from 'ai';
import { createValora } from '@valora-ai/ai-sdk';
import { createLfm2Provider } from '@valora-ai/lfm2/provider';

// hfToken: your Hugging Face access token, supplied by the app at runtime.
const lfm2 = createLfm2Provider({
  auth: { type: 'bearer', token: hfToken },
  models: {
    'team-chat': {
      source: {
        type: 'huggingface',
        repo: 'LiquidAI/LFM2.5-350M-GGUF',
        revision: 'main',
        file: 'LFM2.5-350M-Q4_0.gguf',
      },
    },
  },
});

const configuredValora = createValora({ languageModels: [lfm2] });
await generateText({ model: configuredValora('team-chat'), prompt: 'hi' });

The same lfm2 provider also exposes native Valora engines through lfm2.languageModel('team-chat'), which @valora-ai/voice consumes directly. Bearer auth belongs in provider or load configuration. accessToken still works as a compatibility alias, but do not store secrets on ModelCard.meta.
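To illustrate why the token belongs in provider or load configuration rather than on the model card, here is a minimal sketch of how a huggingface source plus bearer auth could become a download request. It assumes files are fetched through Hugging Face's public https://huggingface.co/{repo}/resolve/{revision}/{file} endpoint, which we have not confirmed is what Valora does internally. HfSource, BearerAuth, and buildHfDownload are hypothetical names that mirror the config above.

```typescript
// Hypothetical shapes mirroring the `source` and `auth` config above.
interface HfSource {
  type: "huggingface";
  repo: string; // e.g. "LiquidAI/LFM2.5-350M-GGUF"
  revision: string; // branch, tag, or commit hash
  file: string; // path of the file inside the repo
}
interface BearerAuth {
  type: "bearer";
  token: string;
}

// Hypothetical helper: builds the URL and headers for downloading source.file
// via Hugging Face's /resolve/ endpoint.
function buildHfDownload(
  source: HfSource,
  auth?: BearerAuth,
): { url: string; headers: Record<string, string> } {
  // Encode each path segment but keep the "/" separators.
  const path = source.file.split("/").map(encodeURIComponent).join("/");
  const url = `https://huggingface.co/${source.repo}/resolve/${encodeURIComponent(source.revision)}/${path}`;
  const headers: Record<string, string> = {};
  // The token is attached only to the outgoing request, never to model metadata.
  if (auth) headers["Authorization"] = `Bearer ${auth.token}`;
  return { url, headers };
}
```

In this sketch the token only ever appears in request headers, so nothing secret ends up in metadata that might be logged or cached alongside the model.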

@valora-ai/ai-sdk also exposes explicit speech and transcription adapters:

import {
  createValoraSpeechModel,
  createValoraTranscriptionModel,
} from '@valora-ai/ai-sdk';

// speechEngine / sttEngine: local TTS and STT engine instances created by your app.
const speech = createValoraSpeechModel('kokoro', async () => speechEngine);
const transcription = createValoraTranscriptionModel('whisper', async () => sttEngine);
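The adapters take an async factory (async () => speechEngine) rather than an engine instance, which suggests engine loading can be deferred until first use. If you construct the engine inside that factory, you probably want the construction to run only once, even when the factory is called repeatedly or concurrently. A minimal, generic sketch follows; once is a hypothetical helper, not part of @valora-ai/ai-sdk, and we have not confirmed how often the adapters invoke the factory.

```typescript
// Wraps an async factory so the underlying work runs at most once at a time.
// Concurrent callers share the same in-flight promise; a failed load is
// forgotten so the next call can retry.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = factory().catch((err: unknown) => {
        pending = undefined; // allow a retry after failure
        throw err;
      });
    }
    return pending;
  };
}
```

For example, createValoraSpeechModel('kokoro', once(() => loadSpeechEngine())), where loadSpeechEngine is whatever hypothetical async function your app uses to build the TTS engine.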

Speech returns WAV bytes. In this first pass, transcription accepts only mono 16-bit PCM WAV: the adapter decodes it to a Float32Array and passes the samples to the local TranscribeEngine.
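To make that input contract concrete, here is a minimal sketch of decoding mono 16-bit PCM WAV into a Float32Array in [-1, 1), plus the inverse encoder, which could wrap Float32 audio such as useVAD's 16 kHz output into WAV bytes. Both helpers (encodeWavPcm16, decodeWavPcm16) are hypothetical and not Valora's implementation; the encoder writes the canonical 44-byte RIFF header, and the decoder walks the chunk list to find fmt and data.

```typescript
interface PcmAudio {
  sampleRate: number;
  samples: Float32Array; // mono, roughly [-1, 1)
}

// Encodes mono float samples as 16-bit PCM WAV with a 44-byte header.
function encodeWavPcm16({ sampleRate, samples }: PcmAudio): Uint8Array {
  const dataBytes = samples.length * 2;
  const v = new DataView(new ArrayBuffer(44 + dataBytes));
  const ascii = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i));
  };
  ascii(0, "RIFF"); v.setUint32(4, 36 + dataBytes, true); ascii(8, "WAVE");
  ascii(12, "fmt "); v.setUint32(16, 16, true); // fmt chunk size
  v.setUint16(20, 1, true); // audio format 1 = PCM
  v.setUint16(22, 1, true); // channels: mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true); // byte rate
  v.setUint16(32, 2, true); // block align
  v.setUint16(34, 16, true); // bits per sample
  ascii(36, "data"); v.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp before scaling
    v.setInt16(44 + i * 2, Math.round(s < 0 ? s * 32768 : s * 32767), true);
  }
  return new Uint8Array(v.buffer);
}

// Decodes mono 16-bit PCM WAV bytes; throws on any other format.
function decodeWavPcm16(bytes: Uint8Array): PcmAudio {
  const v = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (off: number) =>
    String.fromCharCode(v.getUint8(off), v.getUint8(off + 1), v.getUint8(off + 2), v.getUint8(off + 3));
  if (v.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") throw new Error("not a RIFF/WAVE file");
  let sampleRate = 0;
  let off = 12;
  while (off + 8 <= v.byteLength) {
    const id = tag(off);
    const size = v.getUint32(off + 4, true);
    const body = off + 8;
    if (id === "fmt ") {
      const format = v.getUint16(body, true);
      const channels = v.getUint16(body + 2, true);
      const bits = v.getUint16(body + 14, true);
      if (format !== 1 || channels !== 1 || bits !== 16) throw new Error("expected mono 16-bit PCM");
      sampleRate = v.getUint32(body + 4, true);
    } else if (id === "data") {
      if (!sampleRate) throw new Error("data chunk before fmt chunk");
      const n = Math.floor(Math.min(size, v.byteLength - body) / 2);
      const samples = new Float32Array(n);
      for (let i = 0; i < n; i++) samples[i] = v.getInt16(body + i * 2, true) / 32768;
      return { sampleRate, samples };
    }
    off = body + size + (size & 1); // RIFF chunks are padded to an even length
  }
  throw new Error("no data chunk");
}
```

Scaling negative samples by 32768 and positive ones by 32767 uses the full int16 range while keeping +1.0 from overflowing.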

React hooks (in-browser, WebGPU)

import { createGemmaProvider } from '@valora-ai/gemma/provider';
import {
  createValoraRuntime,
  ValoraRuntimeProvider,
  useLocalModel,
  useLocalRealtime,
  useLocalTTS,
  useTranscribe,
  useVAD,
} from '@valora-ai/react/local';

const runtime = createValoraRuntime({
  languageModels: [createGemmaProvider()],
});

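function Voice() {
  const { status, send } = useLocalModel('gemma-4-e2b'); // local LLM: load status + send()
  const { speak } = useLocalTTS(); // local text-to-speech
  const { transcribe } = useTranscribe(); // local speech-to-text
  // VAD hands each finished utterance to STT as Float32 samples at 16 kHz.
  useVAD({ onSpeechEnd: (audio) => transcribe(audio) });
  return null; // headless demo: wire transcript -> send() -> speak() as your app needs
}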

export default function App() {
  return (
    <ValoraRuntimeProvider runtime={runtime}>
      <Voice />
    </ValoraRuntimeProvider>
  );
}

useLocalModel('gemma-4-e2b') still works when no runtime provider is mounted: it falls back to the legacy static browser loader. Mount a ValoraRuntimeProvider when the app needs configured model sources, bearer auth, custom ids, or one provider setup shared with local voice.

The hooks need a WebGPU-capable browser and a bundler (Vite, or a Next.js client component). The engines behind useLocalModel, useLocalTTS, useTranscribe, and useVAD follow the same native engine contracts that createVoiceAgent in @valora-ai/voice expects.

useLocalRealtime(agent) is the hook for an assembled VoiceAgent: it returns the session snapshot plus connect, disconnect, sendText, interrupt, mute, and unlock, while keeping capture and inference local.

See the local API reference.


Valora is local-first

No API key, no server — everything in this doc runs on-device.
