Speech to Text
<kai-voice-input> is a mic button that records audio and hands the Blob to your transcription backend. Wire it in, supply an async transcriber, and the transcribed text fires back as a kai-transcription event.
How it works
The element exposes two output channels:
- kai-audio-captured: fires immediately when the user stops recording, with the raw Blob. Use this if you handle transcription yourself and don’t set transcribe.
- kai-transcription: fires after your transcribe function resolves, with detail.text. Use this to insert the transcribed text into your input.
transcribe is a function, so you assign it in JavaScript — a function can’t be an HTML attribute.
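Because transcribe is an ordinary async function, you can build it with a small factory and reuse it across elements. The sketch below is one possible shape, not part of the library: createTranscriber, the injectable fetchImpl parameter, and the { text } response format are assumptions for illustration. It also checks the HTTP status, which the shorter samples on this page leave out.

```javascript
// Sketch only: createTranscriber, the endpoint, and the { text } response
// shape are assumptions for illustration, not part of the library.
function createTranscriber(endpoint, fetchImpl = fetch) {
  return async function transcribe(blob) {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');

    const res = await fetchImpl(endpoint, { method: 'POST', body: form });
    if (!res.ok) {
      // Surface HTTP failures instead of trying to parse an error page as JSON
      throw new Error(`Transcription request failed with status ${res.status}`);
    }

    const { text } = await res.json();
    if (typeof text !== 'string') {
      throw new Error('Transcription response did not contain a text string');
    }
    return text;
  };
}

// Usage in the browser: assign the function as a property, not an attribute.
// document.getElementById('voice').transcribe = createTranscriber('/api/transcribe');
```

How the element reacts when the promise rejects is not described on this page, so you may prefer to catch and report errors inside the function yourself.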
Basic setup
Register the web components once, then wire the element:
<kai-voice-input id="voice"></kai-voice-input>

<script type="module">
  import '@kitn.ai/ui/elements';

  const voice = document.getElementById('voice');

  // Send the recorded audio to the backend and resolve with the transcribed text
  voice.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');

    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };

  // Assumes an input or textarea with id="prompt" exists elsewhere on the page
  voice.addEventListener('kai-transcription', (e) => {
    document.getElementById('prompt').value = e.detail.text;
  });
</script>

Composing with kai-chat
kai-chat has a built-in voice attribute that renders a mic button in the toolbar and fires a kai-voice event when clicked. Use that event to open <kai-voice-input> or trigger recording programmatically.
Alternatively, place <kai-voice-input> beside the chat element and push the transcribed text into the chat’s value:
<div style="display: flex; flex-direction: column; height: 100dvh;">
  <kai-chat id="chat" style="flex: 1;"></kai-chat>
  <kai-voice-input id="voice" style="margin: 8px;"></kai-voice-input>
</div>

<script type="module">
  import '@kitn.ai/ui/elements';

  const chat = document.getElementById('chat');
  const voice = document.getElementById('voice');

  voice.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');
    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };

  // Inject the transcription into the chat input
  voice.addEventListener('kai-transcription', (e) => {
    chat.value = e.detail.text;
  });
</script>

Framework usage
React:

import { VoiceInput } from '@kitn.ai/ui/react';

export function VoiceButton({ onTranscription }: { onTranscription: (text: string) => void }) {
  return (
    <VoiceInput
      transcribe={async (blob) => {
        const form = new FormData();
        form.append('audio', blob, 'recording.webm');
        const res = await fetch('/api/transcribe', { method: 'POST', body: form });
        const { text } = await res.json();
        return text;
      }}
      onTranscription={(e) => onTranscription(e.detail.text)}
    />
  );
}

Vue:

<template>
  <kai-voice-input ref="voiceRef" @kai-transcription="onTranscription" />
</template>

<script setup>
import '@kitn.ai/ui/elements';
import { onMounted, ref } from 'vue';

const voiceRef = ref(null);
const emit = defineEmits(['transcription']);

onMounted(() => {
  voiceRef.value.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');
    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };
});

const onTranscription = (e) => emit('transcription', e.detail.text);
</script>

Svelte:

<script>
  import '@kitn.ai/ui/elements';

  let voiceEl;

  function onMount(el) {
    voiceEl = el;
    el.transcribe = async (blob) => {
      const form = new FormData();
      form.append('audio', blob, 'recording.webm');
      const res = await fetch('/api/transcribe', { method: 'POST', body: form });
      const { text } = await res.json();
      return text;
    };
  }

  function onTranscription(e) {
    console.log(e.detail.text);
  }
</script>

<kai-voice-input use:onMount on:kai-transcription={onTranscription}></kai-voice-input>

Solid:

import { onMount } from 'solid-js';
import { VoiceInput } from '@kitn.ai/ui';

export function VoiceButton(props: { onTranscription: (text: string) => void }) {
  return (
    <VoiceInput
      onTranscribe={async (blob) => {
        const form = new FormData();
        form.append('audio', blob, 'recording.webm');
        const res = await fetch('/api/transcribe', { method: 'POST', body: form });
        const { text } = await res.json();
        return text;
      }}
      onTranscription={(text) => props.onTranscription(text)}
    />
  );
}

Handling the raw audio yourself
If you want full control (for example, to stream audio, show a waveform, or batch multiple recordings), listen to kai-audio-captured instead of setting transcribe. The element fires it with the raw Blob as soon as the user stops recording:
voice.addEventListener('kai-audio-captured', async (e) => {
  const { blob } = e.detail; // audio/webm;codecs=opus

  // Send to your pipeline, update UI, etc.
  // uploadAudio is a placeholder for your own upload function
  await uploadAudio(blob);
});

Disabled state
Set the disabled attribute to make the mic button non-interactive, for example while the assistant is generating a reply:
<kai-voice-input id="voice" disabled></kai-voice-input>

// Toggle based on loading state
chat.addEventListener('kai-submit', () => { voice.disabled = true; });

// Re-enable after the reply is complete
function onReplyDone() { voice.disabled = false; }

API reference
| Prop | Type | Default | Description |
|---|---|---|---|
| transcribe | (audio: Blob) => Promise<string> | (none) | Async transcriber, assigned in JavaScript. When set, the resolved text fires kai-transcription. |
| disabled | boolean | false | Disables the mic button. |
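
The two props can be combined so the mic button is unavailable while a transcription request is still in flight. This is a minimal sketch under stated assumptions: withBusyState is a hypothetical helper, not a library export, and it assumes the element tolerates having disabled toggled from inside its own transcribe callback.

```javascript
// Sketch only: withBusyState is a hypothetical helper, not a library export.
// It relies solely on the two props listed above: transcribe and disabled.
function withBusyState(voiceEl, transcriber) {
  voiceEl.transcribe = async (blob) => {
    voiceEl.disabled = true; // block new recordings while the request is in flight
    try {
      return await transcriber(blob);
    } finally {
      voiceEl.disabled = false; // re-enable whether the request succeeded or failed
    }
  };
  return voiceEl;
}

// Usage in the browser, where myTranscriber is your own async (blob) => text function:
// withBusyState(document.getElementById('voice'), myTranscriber);
```

Note that the wrapper always re-enables the button when the request settles, so it would override a disabled state set elsewhere, such as the kai-submit handler shown earlier. Track both conditions together if you use both.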
Events
| Event | Detail | Description |
|---|---|---|
| kai-audio-captured | { blob: Blob } | Raw audio captured as soon as recording stops; fires before transcription. |
| kai-transcription | { text: string } | Transcription resolved. Only fires when transcribe is set. |
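
If you prefer awaiting a single recording over registering long-lived listeners, the events can be wrapped in promises. The helper below is a sketch built on the standard EventTarget API. onceEvent is a hypothetical name, and only the event names and detail shapes come from the table above.

```javascript
// Sketch only: onceEvent is a hypothetical helper built on the standard
// EventTarget API; only the event names and detail shapes come from the table.
function onceEvent(target, type) {
  return new Promise((resolve) => {
    // { once: true } removes the listener after the first matching event
    target.addEventListener(type, (e) => resolve(e.detail), { once: true });
  });
}

// Usage in the browser (inside a module or async function). Register both
// listeners up front so neither event can be missed:
// const captured = onceEvent(voice, 'kai-audio-captured');
// const transcribed = onceEvent(voice, 'kai-transcription');
// const { blob } = await captured;
// const { text } = await transcribed;
```

Because kai-transcription only fires when transcribe is set, the second promise never settles on an element without a transcriber, so await it only in that configuration.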
What’s next
- Installation: register the elements and add the theme.
- Text to Speech: the reverse direction, streaming synthesized audio from the assistant.