kitn AI/UI

Voice assistant

A hands-free assistant: the mic captures a question, the transcript becomes a chat message, and the streamed reply is read back aloud. This demo simulates the speech-to-text step with a canned transcript — the spoken reply is real, powered by your browser’s Web Speech API. Tap the mic in the toolbar to try it.

<kai-chat> provides the interface for both ends of the exchange. Its built-in voice attribute adds a mic button to the input toolbar, and tapping that button fires a kai-voice event. You own what happens next: capture audio, transcribe it, push the text into the thread, then speak the finished reply with the Web Speech API.

<kai-chat id="chat" chat-title="Voice assistant" voice></kai-chat>
<script type="module">
  import '@kitn.ai/ui/elements';

  // streamAnswer() and transcribeFromMic() are not defined in this listing;
  // supply your own model stream and speech-to-text pipeline.
  const chat = document.getElementById('chat');
  chat.messages = [];

  // Read the finished reply aloud; guarded so unsupported browsers stay silent.
  function speak(text) {
    if (!('speechSynthesis' in window) || !text) return;
    const utter = new SpeechSynthesisUtterance(text);
    utter.lang = 'en-US';
    speechSynthesis.cancel(); // stop any previous utterance
    speechSynthesis.speak(utter);
  }

  // Run one turn: append the question, stream the reply, then speak it.
  async function ask(question) {
    const assistantId = crypto.randomUUID();
    chat.messages = [
      ...chat.messages,
      { id: crypto.randomUUID(), role: 'user', content: question },
      { id: assistantId, role: 'assistant', content: '' },
    ];
    chat.loading = true;
    let answer = '';
    try {
      for await (const token of streamAnswer(question)) {
        answer += token;
        chat.messages = chat.messages.map((m) =>
          m.id === assistantId ? { ...m, content: answer } : m,
        );
      }
    } finally {
      chat.loading = false; // clear the loading state even if the stream fails
    }
    speak(answer); // hear the reply once it finishes streaming
  }

  // Mic tapped → record, transcribe, then run the turn with the transcript.
  chat.addEventListener('kai-voice', async () => {
    try {
      const transcript = await transcribeFromMic(); // your STT pipeline
      if (transcript) await ask(transcript);
    } catch (err) {
      console.error('Voice turn failed:', err);
    }
  });

  // Typed messages follow the same path.
  chat.addEventListener('kai-submit', (e) => {
    const text = e.detail.value.trim();
    if (text) ask(text).catch((err) => console.error('Turn failed:', err));
  });
</script>
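The listing calls streamAnswer() and transcribeFromMic() without defining them. A minimal sketch of stand-ins is shown here, assuming a fixed transcript and a reply that is yielded word by word. The sample sentence, the reply text, the sleep() helper, and the delayMs option are all invented for illustration and are not taken from the demo's source or from @kitn.ai/ui.

```javascript
// Hypothetical stand-ins for local experiments; not part of @kitn.ai/ui.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pretend the user spoke this sentence (invented sample text).
async function transcribeFromMic() {
  await sleep(50); // mimic recording and transcription latency
  return 'What is the weather like today?';
}

// Scripted reply, yielded one word at a time to mimic token streaming.
async function* streamAnswer(question, { delayMs = 40 } = {}) {
  const reply = `You asked: "${question}". This is a scripted reply.`;
  // Each token is a word plus its trailing whitespace, so joining the
  // tokens reproduces the reply exactly.
  for (const token of reply.match(/\S+\s*/g) ?? []) {
    await sleep(delayMs);
    yield token;
  }
}
```

Because streamAnswer() is an async generator, it works directly with the for await loop in ask(); replacing it with a real model stream should not require changes to the rest of the listing, provided the replacement also yields text chunks.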

Mic capture needs the MediaRecorder API and a transcription backend, which can’t run in this sandbox, so the demo above replaces transcribeFromMic() with a canned transcript and streams a scripted reply to keep the flow visible end to end. In production, have transcribeFromMic() use <kai-voice-input> to record audio and return the text from your speech-to-text provider.
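For orientation, here is a rough sketch of what a real transcribeFromMic() could look like using the browser's MediaRecorder API directly. It does not use <kai-voice-input>, whose API is not shown on this page. The /api/transcribe endpoint, its { text } JSON response, and the fixed maxMs recording window are assumptions made for the example; substitute whatever your speech-to-text provider expects.

```javascript
// Hypothetical endpoint and response shape: adjust to your STT provider.
const STT_URL = '/api/transcribe';

async function transcribeFromMic({ maxMs = 5000 } = {}) {
  // Return an empty transcript where mic capture is unavailable.
  const media = globalThis.navigator?.mediaDevices;
  if (!media?.getUserMedia || typeof MediaRecorder === 'undefined') return '';

  // Prompts for microphone permission; rejects if the user declines.
  const stream = await media.getUserMedia({ audio: true });
  try {
    const recorder = new MediaRecorder(stream);
    const chunks = [];
    recorder.addEventListener('dataavailable', (e) => {
      if (e.data.size > 0) chunks.push(e.data);
    });
    const stopped = new Promise((resolve) =>
      recorder.addEventListener('stop', resolve, { once: true }),
    );

    recorder.start();
    // Assumption: record for a fixed window instead of detecting silence.
    setTimeout(() => {
      if (recorder.state !== 'inactive') recorder.stop();
    }, maxMs);
    await stopped;

    // Upload the recording and read the transcript from the JSON response.
    const audio = new Blob(chunks, { type: recorder.mimeType });
    const res = await fetch(STT_URL, { method: 'POST', body: audio });
    if (!res.ok) throw new Error(`Transcription failed: ${res.status}`);
    const { text } = await res.json();
    return (text ?? '').trim();
  } finally {
    // Release the microphone so the browser's recording indicator turns off.
    stream.getTracks().forEach((track) => track.stop());
  }
}
```

The function returns an empty string when capture is unsupported, which the kai-voice handler already treats as "nothing to ask". Permission and network errors are left to propagate so the caller can decide how to surface them.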

  • Speech to Text recipe — wire real mic capture with <kai-voice-input> and a transcription backend, the piece this demo stubs out.
  • Text to Speech recipe — choosing a voice, stopping playback, and swapping the browser-native path for higher-quality cloud TTS.
  • kai-chat reference — the voice attribute, the kai-voice event, and the full streaming loop.
  • Drop-in chat — the baseline streaming loop this example layers voice on top of.