kitn AI/UI

Speech to Text

<kai-voice-input> is a mic button that records audio and passes the resulting Blob to a transcribe function you supply, which typically forwards it to your transcription backend. Assign an async transcriber, and the element fires a kai-transcription event carrying the transcribed text.

The element exposes two output channels:

  • kai-audio-captured — fires immediately when the user stops recording, with the raw Blob. Use this if you handle transcription yourself and don’t set transcribe.
  • kai-transcription — fires after your transcribe function resolves, with detail.text. Use this to insert the transcribed text into your input.

transcribe is a function, so you assign it in JavaScript — a function can’t be an HTML attribute.

Register the web components once, then wire the element:

<!-- Example target for the transcribed text -->
<textarea id="prompt"></textarea>
<kai-voice-input id="voice"></kai-voice-input>
<script type="module">
  import '@kitn.ai/ui/elements';

  const voice = document.getElementById('voice');

  // Assigned as a property: send the recording to your backend and resolve with the text.
  voice.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');
    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };

  // Fires once transcribe resolves.
  voice.addEventListener('kai-transcription', (e) => {
    document.getElementById('prompt').value = e.detail.text;
  });
</script>
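
The same transcribe body appears in each example on this page, so it can be factored into a small helper. The sketch below is one possible shape, not part of the library: createTranscriber and its fetchImpl parameter are hypothetical names, and the { text } response shape is taken from the example above. It also adds a status check that the shorter examples leave out. How the element reacts when transcribe rejects is not described here, so treat the thrown errors as something your own code should handle.

```javascript
// Hypothetical helper (not part of @kitn.ai/ui): builds a transcribe function.
// `endpoint` and the `{ text }` response shape follow the example above.
// `fetchImpl` is injectable so the helper can be exercised without a network.
function createTranscriber(endpoint, fetchImpl = globalThis.fetch) {
  return async function transcribe(blob) {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');

    const res = await fetchImpl(endpoint, { method: 'POST', body: form });
    if (!res.ok) {
      // Surface HTTP failures instead of trying to parse an error page as JSON.
      throw new Error(`Transcription request failed with status ${res.status}`);
    }

    const data = await res.json();
    if (typeof data.text !== 'string') {
      throw new Error('Transcription response did not include a "text" string');
    }
    return data.text;
  };
}
```

With this in place, wiring the element reduces to voice.transcribe = createTranscriber('/api/transcribe').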

kai-chat has a built-in voice attribute that renders a mic button in the toolbar and fires a kai-voice event when clicked. Use that event to open <kai-voice-input> or trigger recording programmatically.
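
One way to use that event is to reveal a separate recorder on demand. The sketch below assumes the recorder starts with the standard hidden attribute and that the kai-voice event needs no detail payload; linkVoiceButton is a hypothetical helper name, and the page does not document a method for starting a recording programmatically, so none is called here.

```javascript
// Connects the chat toolbar's mic button to a separate <kai-voice-input>.
// Assumptions: the recorder starts hidden, and the kai-voice event's detail
// payload (if any) is not needed.
function linkVoiceButton(chat, voice) {
  const onVoice = () => {
    // Toggle visibility each time the toolbar mic button is clicked.
    voice.hidden = !voice.hidden;
  };
  chat.addEventListener('kai-voice', onVoice);

  // Return a cleanup function so the listener can be removed later.
  return () => chat.removeEventListener('kai-voice', onVoice);
}
```

Call it once with the two elements, for example linkVoiceButton(document.getElementById('chat'), document.getElementById('voice')). If the component's own styles override the hidden attribute, toggling a CSS class would be the safer choice.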

Alternatively, place <kai-voice-input> beside the chat element and push the transcribed text into the chat’s value:

<div style="display: flex; flex-direction: column; height: 100dvh;">
  <kai-chat id="chat" style="flex: 1;"></kai-chat>
  <kai-voice-input id="voice" style="margin: 8px;"></kai-voice-input>
</div>
<script type="module">
  import '@kitn.ai/ui/elements';

  const chat = document.getElementById('chat');
  const voice = document.getElementById('voice');

  voice.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');
    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };

  // Inject the transcription into the chat input
  voice.addEventListener('kai-transcription', (e) => {
    chat.value = e.detail.text;
  });
</script>
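
Assigning chat.value directly replaces whatever the user has already typed. If dictation should add to a draft instead, a small pure function can do the joining. This is a sketch of that design choice: appendTranscription is a hypothetical name, and it assumes the chat's value can be read as well as written, which the example above does not confirm.

```javascript
// Joins dictated text onto whatever is already in the input.
// Returns the current value unchanged when the transcription is empty.
function appendTranscription(current, text) {
  const existing = current ?? '';
  const addition = (text ?? '').trim();
  if (!addition) return existing;

  const base = existing.trimEnd();
  // Separate the draft and the new text with a single space.
  return base ? `${base} ${addition}` : addition;
}
```

Inside the kai-transcription listener this would read chat.value = appendTranscription(chat.value, e.detail.text).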

In React, import the VoiceInput wrapper and pass transcribe and onTranscription as props:

import { VoiceInput } from '@kitn.ai/ui/react';

export function VoiceButton({ onTranscription }: { onTranscription: (text: string) => void }) {
  return (
    <VoiceInput
      transcribe={async (blob) => {
        const form = new FormData();
        form.append('audio', blob, 'recording.webm');
        const res = await fetch('/api/transcribe', { method: 'POST', body: form });
        const { text } = await res.json();
        return text;
      }}
      onTranscription={(e) => onTranscription(e.detail.text)}
    />
  );
}

If you want full control — for example, to stream audio, show a waveform, or batch multiple recordings — listen to kai-audio-captured instead of setting transcribe. The element fires it with the raw Blob as soon as the user stops recording:

voice.addEventListener('kai-audio-captured', async (e) => {
  const { blob } = e.detail; // audio/webm;codecs=opus
  // Send to your pipeline, update UI, etc.
  // uploadAudio is a placeholder for your own upload function.
  await uploadAudio(blob);
});
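
For the batching case mentioned above, captured Blobs can be collected and sent together in one multipart request. The following is a minimal sketch under stated assumptions: createRecordingBatch is a hypothetical helper, and the audio field name and the recording-N.webm file names are placeholders that should match whatever your server expects.

```javascript
// Hypothetical batching helper: collects captured recordings and packs them
// into a single multipart body. Field and file names are assumptions.
function createRecordingBatch() {
  const blobs = [];
  return {
    // Store one recording and report how many are queued.
    add(blob) {
      blobs.push(blob);
      return blobs.length;
    },
    get size() {
      return blobs.length;
    },
    // Build a FormData holding every queued recording, then empty the batch.
    flush() {
      const form = new FormData();
      blobs.forEach((blob, i) => {
        form.append('audio', blob, `recording-${i + 1}.webm`);
      });
      blobs.length = 0;
      return form;
    },
  };
}
```

A kai-audio-captured listener would call batch.add(e.detail.blob), and a later step would POST batch.flush() to an endpoint of your choosing.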

Set the disabled attribute to make the mic button non-interactive — for example, while the assistant is generating a reply:

<kai-voice-input id="voice" disabled></kai-voice-input>

// Toggle based on loading state
chat.addEventListener('kai-submit', () => { voice.disabled = true; });

// Re-enable after the reply is complete
function onReplyDone() { voice.disabled = false; }
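
When more than one condition can disable the button (for example, a reply being generated and an upload in flight), setting voice.disabled = false in one place may re-enable it too early. One possible approach is to track the reasons and keep the button disabled while any remain. The helper below is a sketch; createDisableGate and the reason strings are hypothetical and not part of the library.

```javascript
// Keeps an element disabled while at least one named reason is active.
// `voice` only needs a writable `disabled` property.
function createDisableGate(voice) {
  const reasons = new Set();
  const sync = () => {
    voice.disabled = reasons.size > 0;
  };
  return {
    // Mark a reason as active and disable the element.
    hold(reason) {
      reasons.add(reason);
      sync();
    },
    // Clear a reason; the element re-enables only when none are left.
    release(reason) {
      reasons.delete(reason);
      sync();
    },
  };
}
```

With const gate = createDisableGate(voice), the kai-submit listener would call gate.hold('reply') and onReplyDone would call gate.release('reply').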

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| transcribe | (audio: Blob) => Promise<string> | | Async transcriber, assigned in JavaScript. When set, the resolved text fires kai-transcription. |
| disabled | boolean | false | Disables the mic button. |

| Event | Detail | Description |
| --- | --- | --- |
| kai-audio-captured | { blob: Blob } | Raw audio, emitted as soon as recording stops and before transcription. |
| kai-transcription | { text: string } | Transcription resolved. Only fires when transcribe is set. |

  • Installation — register the elements and add the theme.
  • Text to Speech — the reverse direction: stream synthesized audio from the assistant.