Voice assistant

A hands-free assistant: the mic captures a question, the transcript becomes a chat message, and the streamed reply is read aloud. In this demo the speech-to-text step is simulated with a canned transcript, but the spoken reply is real and comes from your browser’s Web Speech API. Tap the mic in the toolbar to try it.

How it works

<kai-chat> supplies the chat UI. Its built-in voice attribute adds a mic button to the input toolbar and fires a kai-voice event when the button is tapped. What happens next is up to you: capture audio, transcribe it, push the text into the thread, then speak the finished reply with the Web Speech API.

<kai-chat id="chat" chat-title="Voice assistant" voice></kai-chat>

<script type="module">
  import '@kitn.ai/ui/elements';

  const chat = document.getElementById('chat');
  chat.messages = [];

  // Read the finished reply aloud — guarded so unsupported browsers stay silent.
  function speak(text) {
    if (!('speechSynthesis' in window) || !text) return;
    const utter = new SpeechSynthesisUtterance(text);
    utter.lang = 'en-US';
    speechSynthesis.cancel(); // stop any previous utterance
    speechSynthesis.speak(utter);
  }

  // Run one turn: append the question, stream the reply, then speak it.
  async function ask(question) {
    const assistantId = crypto.randomUUID();
    chat.messages = [
      ...chat.messages,
      { id: crypto.randomUUID(), role: 'user', content: question },
      { id: assistantId, role: 'assistant', content: '' },
    ];
    chat.loading = true;

    let answer = '';
    try {
      // streamAnswer() is your own async generator of reply tokens.
      for await (const token of streamAnswer(question)) {
        answer += token;
        chat.messages = chat.messages.map((m) =>
          m.id === assistantId ? { ...m, content: answer } : m,
        );
      }
    } finally {
      chat.loading = false; // reset even if the stream throws
    }

    speak(answer); // hear the reply once it finishes streaming
  }

  // Mic tapped → record, transcribe, then run the turn with the transcript.
  chat.addEventListener('kai-voice', async () => {
    const transcript = await transcribeFromMic(); // your STT pipeline
    if (transcript) ask(transcript);
  });

  // Typed messages follow the same path.
  chat.addEventListener('kai-submit', (e) => {
    const text = e.detail.value.trim();
    if (text) ask(text);
  });
</script>
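
The script above calls streamAnswer() without defining it, since that part depends on your backend. As one possible shape, the sketch below assumes a hypothetical /api/chat endpoint that accepts a JSON body of { question } and streams the reply as plain UTF-8 text. The endpoint, the request shape, and the helper name readTextStream are illustrative assumptions, not part of @kitn.ai/ui.

```javascript
// Decode a byte stream (for example, fetch's response.body) into text chunks.
async function* readTextStream(body) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true buffers multi-byte characters split across chunks.
      const text = decoder.decode(value, { stream: true });
      if (text) yield text;
    }
    const tail = decoder.decode(); // flush anything still buffered
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}

// Hypothetical endpoint and request shape; adapt both to your backend.
async function* streamAnswer(question) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Chat request failed with status ${res.status}`);
  }
  yield* readTextStream(res.body);
}
```

Any async iterable of strings works with the for await loop in ask(), so a backend that streams server-sent events or JSON lines would only change the parsing inside this function.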

Mic capture needs the MediaRecorder API and a transcription backend, which can’t run in this sandbox, so the demo above replaces transcribeFromMic() with a canned transcript and streamAnswer() with a scripted stream to keep the flow visible end to end. In production, use <kai-voice-input> inside transcribeFromMic() to record audio, then return the text from your speech-to-text provider.
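
For local experiments, a canned transcript and a scripted stream could look something like the following. This is a minimal sketch, not the demo's actual source: the transcript text, the reply text, and the delays are made up for illustration.

```javascript
// Hypothetical stand-ins for the two functions the snippet leaves to you.
const CANNED_TRANSCRIPT = 'What is the weather like today?';
const SCRIPTED_REPLY = 'It is sunny and mild, with a light breeze this afternoon.';

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pretend to record and transcribe: wait briefly, then return fixed text.
async function transcribeFromMic() {
  await sleep(50);
  return CANNED_TRANSCRIPT;
}

// Pretend to stream a reply: yield the scripted text one word at a time.
// The question is ignored here; a real implementation would send it to a model.
async function* streamAnswer(question, delayMs = 20) {
  // Keep trailing whitespace on each token so the joined tokens equal the reply.
  const tokens = SCRIPTED_REPLY.match(/\S+\s*/g) ?? [];
  for (const token of tokens) {
    await sleep(delayMs);
    yield token;
  }
}
```

Because both stubs have the same shape as the real functions (a promise of text, and an async iterable of tokens), the rest of the script runs unchanged when you swap in a real pipeline.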

Next steps

Speech to Text recipe — wire real mic capture with <kai-voice-input> and a transcription backend, the piece this demo stubs out.
Text to Speech recipe — choosing a voice, stopping playback, and swapping the browser-native path for higher-quality cloud TTS.
kai-chat reference — the voice attribute, the kai-voice event, and the full streaming loop.
Drop-in chat — the baseline streaming loop this example layers voice on top of.