Speech to Text

<kai-voice-input> is a mic button that records audio and hands the Blob to your transcription backend. Wire it in, supply an async transcriber, and the element emits the transcribed text in a kai-transcription event.

How it works

The element exposes two output channels:

kai-audio-captured — fires immediately when the user stops recording, with the raw Blob. Use this if you handle transcription yourself and don’t set transcribe.
kai-transcription — fires after your transcribe function resolves, with detail.text. Use this to insert the transcribed text into your input.

transcribe is a function, so you assign it in JavaScript — a function can’t be an HTML attribute.

Basic setup

<textarea id="prompt"></textarea>
<kai-voice-input id="voice"></kai-voice-input>

<script type="module">
  import '@kitn.ai/ui/elements';

  const voice = document.getElementById('voice');

  voice.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');

    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };

  voice.addEventListener('kai-transcription', (e) => {
    document.getElementById('prompt').value = e.detail.text;
  });
</script>
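
The transcriber above assumes the request always succeeds. A minimal sketch of a stricter version follows, assuming the same /api/transcribe route and { text } response shape. The createTranscriber helper and its fetchImpl parameter are hypothetical names introduced here for illustration, not part of the library. How the element reacts when transcribe rejects is not covered on this page, so treat the thrown errors as something your own code should catch or report.

```javascript
// Hypothetical helper (not part of @kitn.ai/ui): builds a transcribe function
// for a given endpoint. `fetchImpl` is an assumption added so the helper can
// be exercised without a real network call.
function createTranscriber(endpoint, fetchImpl = fetch) {
  return async function transcribe(blob) {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');

    const res = await fetchImpl(endpoint, { method: 'POST', body: form });
    if (!res.ok) {
      // Surface HTTP failures instead of trying to parse an error page as JSON.
      throw new Error(`Transcription request failed with status ${res.status}`);
    }

    const data = await res.json();
    if (typeof data.text !== 'string') {
      throw new Error('Transcription response did not include a text string');
    }
    return data.text;
  };
}
```

With this in place, the assignment becomes voice.transcribe = createTranscriber('/api/transcribe').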

Composing with kai-chat

kai-chat has a built-in voice attribute that renders a mic button in the toolbar and fires a kai-voice event when clicked. Use that event to open <kai-voice-input> or trigger recording programmatically.

Alternatively, place <kai-voice-input> beside the chat element and push the transcribed text into the chat’s value:

<div style="display: flex; flex-direction: column; height: 100dvh;">
  <kai-chat id="chat" style="flex: 1;"></kai-chat>
  <kai-voice-input id="voice" style="margin: 8px;"></kai-voice-input>
</div>

<script type="module">
  import '@kitn.ai/ui/elements';

  const chat = document.getElementById('chat');
  const voice = document.getElementById('voice');

  voice.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');
    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };

  // Inject the transcription into the chat input
  voice.addEventListener('kai-transcription', (e) => {
    chat.value = e.detail.text;
  });
</script>
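
Assigning chat.value directly replaces whatever the user has already typed. If you would rather append, a small merge helper is one option. This is a sketch only: appendTranscription is a hypothetical name, and it assumes chat.value can be read as well as written, which this page does not confirm.

```javascript
// Hypothetical helper: merges a new transcription into existing input text,
// separated by a single space.
function appendTranscription(current, transcribed) {
  const addition = (transcribed ?? '').trim();
  if (!addition) return current ?? ''; // nothing to add, keep the input as-is

  const existing = (current ?? '').trimEnd();
  if (!existing) return addition;

  return `${existing} ${addition}`;
}
```

Inside the kai-transcription listener, the assignment would then read chat.value = appendTranscription(chat.value, e.detail.text).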

Framework usage

React

import { VoiceInput } from '@kitn.ai/ui/react';

export function VoiceButton({ onTranscription }: { onTranscription: (text: string) => void }) {
  return (
    <VoiceInput
      transcribe={async (blob) => {
        const form = new FormData();
        form.append('audio', blob, 'recording.webm');
        const res = await fetch('/api/transcribe', { method: 'POST', body: form });
        const { text } = await res.json();
        return text;
      }}
      onTranscription={(e) => onTranscription(e.detail.text)}
    />
  );
}

Vue

<template>
  <kai-voice-input
    ref="voiceRef"
    @kai-transcription="onTranscription"
  />
</template>

<script setup>
import '@kitn.ai/ui/elements';
import { onMounted, ref } from 'vue';

const voiceRef = ref(null);
const emit = defineEmits(['transcription']);

onMounted(() => {
  voiceRef.value.transcribe = async (blob) => {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm');
    const res = await fetch('/api/transcribe', { method: 'POST', body: form });
    const { text } = await res.json();
    return text;
  };
});

const onTranscription = (e) => emit('transcription', e.detail.text);
</script>

Svelte

<script>
  import '@kitn.ai/ui/elements';

  let voiceEl;

  function onMount(el) {
    voiceEl = el;
    el.transcribe = async (blob) => {
      const form = new FormData();
      form.append('audio', blob, 'recording.webm');
      const res = await fetch('/api/transcribe', { method: 'POST', body: form });
      const { text } = await res.json();
      return text;
    };
  }

  function onTranscription(e) {
    console.log(e.detail.text);
  }
</script>

<kai-voice-input
  use:onMount
  on:kai-transcription={onTranscription}
></kai-voice-input>

Solid

import { VoiceInput } from '@kitn.ai/ui';

export function VoiceButton(props: { onTranscription: (text: string) => void }) {
  return (
    <VoiceInput
      onTranscribe={async (blob) => {
        const form = new FormData();
        form.append('audio', blob, 'recording.webm');
        const res = await fetch('/api/transcribe', { method: 'POST', body: form });
        const { text } = await res.json();
        return text;
      }}
      onTranscription={(text) => props.onTranscription(text)}
    />
  );
}

Handling the raw audio yourself

If you want full control — for example, to stream audio, show a waveform, or batch multiple recordings — listen to kai-audio-captured instead of setting transcribe. The element fires it with the raw Blob as soon as the user stops recording:

voice.addEventListener('kai-audio-captured', async (e) => {
  const { blob } = e.detail; // audio/webm;codecs=opus

  // Send to your pipeline, update UI, etc.
  await uploadAudio(blob);
});
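
For the batching case mentioned above, one possible shape is a small collector that holds captured blobs and hands them off in groups. This is a sketch under assumptions: createBatcher and the flush callback are hypothetical names, and what flush does with the batch (upload, merge, queue) is up to your pipeline.

```javascript
// Hypothetical helper: collects captured recordings and flushes them in
// groups of `size`. Returns an async `add` function that resolves to true
// when the call triggered a flush, and false otherwise.
function createBatcher(size, flush) {
  let pending = [];

  return async function add(blob) {
    pending.push(blob);
    if (pending.length < size) return false;

    // Swap the buffer before awaiting so recordings captured during the
    // flush start a fresh batch instead of being lost or sent twice.
    const batch = pending;
    pending = [];
    await flush(batch);
    return true;
  };
}
```

You would create the collector once, for example const addRecording = createBatcher(3, uploadBatch), where uploadBatch is your own function, and call addRecording(e.detail.blob) from the kai-audio-captured listener.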

Disabled state

Set the disabled attribute to make the mic button non-interactive — for example, while the assistant is generating a reply:

<kai-voice-input id="voice" disabled></kai-voice-input>

// Toggle based on loading state
chat.addEventListener('kai-submit', () => { voice.disabled = true; });
// Re-enable after the reply is complete
function onReplyDone() { voice.disabled = false; }
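
Toggling disabled by hand makes it easy to leave the button stuck if the reply fails. One way to guard against that is to wrap the work in a helper that always restores the property. This is a minimal sketch: withDisabled is a hypothetical name, and it relies only on the disabled property described above.

```javascript
// Hypothetical helper: disables an element for the duration of an async task
// and re-enables it afterwards, whether the task succeeds or throws.
async function withDisabled(el, task) {
  el.disabled = true;
  try {
    return await task();
  } finally {
    el.disabled = false;
  }
}
```

For example, withDisabled(voice, () => sendMessage(text)) would keep the mic button inactive until your own sendMessage function settles.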

API reference

Props

Prop	Type	Default	Description
`transcribe`	`(audio: Blob) => Promise<string>`	—	Async transcriber, assigned in JavaScript. When set, the resolved text fires `kai-transcription`.
`disabled`	`boolean`	`false`	Disables the mic button.

Events

Event	Detail	Description
`kai-audio-captured`	`{ blob: Blob }`	Raw audio captured as soon as recording stops — fires before transcription.
`kai-transcription`	`{ text: string }`	Transcription resolved. Only fires when `transcribe` is set.

What’s next

Installation — register the elements and add the theme.
Text to Speech — the reverse direction: stream synthesized audio from the assistant.