Feature Request - Add streaming support for on-device speech transcription (SpeechAnalyzer) #21

@Kae7in

Creating this issue as suggested in the docs.

Problem

Currently, @react-native-ai/apple only supports batch transcription via NativeAppleTranscription.transcribe(), which processes the entire audio buffer after recording completes. This means users don't see any transcription text until
they've finished speaking, resulting in a suboptimal UX compared to cloud-based solutions like AssemblyAI, Deepgram, or OpenAI's Whisper API.

Use Case

I'm building an AI voice assistant for iOS 26+ that needs to:

  • Show real-time transcription as the user speaks (partial results)
  • Provide visual feedback that the app is "listening and understanding"
  • Match the UX expectations set by Siri and other voice assistants

Current Behavior

  // Only get results AFTER recording stops
  const result = await NativeAppleTranscription.transcribe(audioBuffer, "en-US");
  const finalText = result.segments.map(seg => seg.text).join(" ");
  // No way to get partial results during recording

Desired Behavior

Apple's SpeechAnalyzer framework (iOS 26+) natively supports streaming transcription with partial results. Ideally, the library would expose this capability:

  // Proposed API (similar to LLM streaming)
  const stream = await NativeAppleTranscription.transcribeStream(audioStream, "en-US");

  stream.on('partial', (text) => {
    // Update UI with partial transcript while user is speaking
    console.log('Partial:', text);
  });

  stream.on('final', (result) => {
    // Get final transcript with timestamps
    console.log('Final:', result);
  });

Or using event emitters like you do for LLM streams:

  const streamId = NativeAppleTranscription.startStreamingTranscription("en-US");

  NativeAppleTranscription.onPartialTranscript((data) => {
    if (data.streamId === streamId) {
      updateUI(data.partialText);
    }
  });

  // Feed audio chunks as they're recorded
  audioRecorder.on('chunk', (chunk) => {
    NativeAppleTranscription.feedAudioChunk(streamId, chunk);
  });

Why This Matters

  • Better UX: Real-time feedback is essential for voice interfaces
  • Competitive: On-device + real-time would be a killer combination (privacy + UX)
  • Apple's Intent: Apple built streaming support into SpeechAnalyzer for this exact reason

Technical Notes

  • Apple's SpeechAnalyzer already supports this via the SpeechTranscriber module
  • You've already implemented streaming for LLM (doStream() in AppleLLMChatLanguageModel)
  • The pattern could be similar to your LLM streaming implementation (a rough native-side sketch follows below)
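
For context, here's a very rough, untested sketch of what the native side could look like, based on the SpeechAnalyzer / SpeechTranscriber flow Apple showed at WWDC25. The StreamingTranscriber wrapper, its callbacks, and the finish() flow are just illustrative assumptions about how it could back the feedAudioChunk / onPartialTranscript API proposed above; exact initializer labels and method names would need to be verified against Apple's current documentation:

  import AVFoundation
  import Speech

  // Illustrative only: a hypothetical wrapper the native module could own.
  // Exact SpeechAnalyzer / SpeechTranscriber signatures should be checked
  // against Apple's iOS 26 docs and the WWDC25 sample code.
  @available(iOS 26.0, *)
  final class StreamingTranscriber {
    private var analyzer: SpeechAnalyzer?
    private var inputBuilder: AsyncStream<AnalyzerInput>.Continuation?
    private var resultsTask: Task<Void, Error>?

    // The React Native module would forward these as onPartialTranscript /
    // onFinalTranscript events to JS.
    var onPartial: ((String) -> Void)?
    var onFinal: ((String) -> Void)?

    func start(locale: Locale) async throws {
      // .volatileResults asks SpeechTranscriber to report in-progress
      // (partial) hypotheses in addition to finalized text.
      let transcriber = SpeechTranscriber(
        locale: locale,
        transcriptionOptions: [],
        reportingOptions: [.volatileResults],
        attributeOptions: []
      )
      let analyzer = SpeechAnalyzer(modules: [transcriber])
      self.analyzer = analyzer

      let (inputSequence, inputBuilder) = AsyncStream<AnalyzerInput>.makeStream()
      self.inputBuilder = inputBuilder

      // Consume results as they arrive: volatile results are partials,
      // finalized results supersede them.
      resultsTask = Task {
        for try await result in transcriber.results {
          let text = String(result.text.characters)
          if result.isFinal {
            self.onFinal?(text)
          } else {
            self.onPartial?(text)
          }
        }
      }

      try await analyzer.start(inputSequence: inputSequence)
    }

    // Called from feedAudioChunk(streamId, chunk) with each recorded buffer.
    func feed(_ buffer: AVAudioPCMBuffer) {
      inputBuilder?.yield(AnalyzerInput(buffer: buffer))
    }

    func finish() async throws {
      inputBuilder?.finish()
      // Finalize any pending results and end the session (method name as in
      // the WWDC25 sample; verify against current docs).
      try await analyzer?.finalizeAndFinishThroughEndOfInput()
      resultsTask?.cancel()
    }
  }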

Workaround

We're currently using AssemblyAI's real-time WebSocket API for streaming transcription, but we'd love to move fully to on-device processing for privacy and offline support.


Optional: Would you accept a PR?

Happy to contribute this feature if you can provide guidance on the native Swift implementation approach! 🙌


Additional Context

  • iOS Version: 26.0+
  • Library Version: @react-native-ai/[email protected]
  • React Native: 0.79.5
