Description
Creating this issue as suggested in the docs here.
Problem
Currently, @react-native-ai/apple only supports batch transcription via NativeAppleTranscription.transcribe(), which processes the entire audio buffer after recording completes. This means users don't see any transcription text until
they've finished speaking, resulting in a suboptimal UX compared to cloud-based solutions like AssemblyAI, Deepgram, or OpenAI's Whisper API.
Use Case
I'm building an AI voice assistant for iOS 26+ that needs to:
- Show real-time transcription as the user speaks (partial results)
- Provide visual feedback that the app is "listening and understanding"
- Match the UX expectations set by Siri and other voice assistants
Current Behavior
```js
// Only get results AFTER recording stops
const result = await NativeAppleTranscription.transcribe(audioBuffer, "en-US");
const finalText = result.segments.map(seg => seg.text).join(" ");
// No way to get partial results during recording
```
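For context, the full flow today is record first, transcribe after: nothing can be shown until the whole buffer is handed over. A minimal sketch, assuming a placeholder recorder object (not part of this library) and an assumed import path for `NativeAppleTranscription`:

```ts
import { NativeAppleTranscription } from "@react-native-ai/apple"; // import path assumed

// Hypothetical end-to-end flow with the current batch API. `recorder` is a
// stand-in for whatever audio recording module the app uses, and the buffer
// type is assumed; only `transcribe()` comes from the library.
async function recordThenTranscribe(recorder: { stop(): Promise<ArrayBuffer> }): Promise<string> {
  // The user speaks for the entire recording; the UI shows no transcript yet.
  const audioBuffer = await recorder.stop();

  // Only now, after recording has ended, does any transcript text appear.
  const result = await NativeAppleTranscription.transcribe(audioBuffer, "en-US");
  return result.segments.map((seg) => seg.text).join(" ");
}
```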
Desired Behavior
Apple's SpeechAnalyzer framework (iOS 26+) natively supports streaming transcription with partial results. Ideally, the library would expose this capability:
```js
// Proposed API (similar to LLM streaming)
const stream = await NativeAppleTranscription.transcribeStream(audioStream, "en-US");

stream.on('partial', (text) => {
  // Update UI with partial transcript while user is speaking
  console.log('Partial:', text);
});

stream.on('final', (result) => {
  // Get final transcript with timestamps
  console.log('Final:', result);
});
```
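To make the desired UX concrete, here is a hypothetical React hook built on the proposed `transcribeStream()` shape above. Every name in it belongs to this proposal (or is purely illustrative, like `useLiveTranscript`), not to the current library surface:

```ts
import { useEffect, useState } from "react";
import { NativeAppleTranscription } from "@react-native-ai/apple"; // import path assumed

// Hypothetical hook over the proposed streaming API.
function useLiveTranscript(audioStream: unknown, locale: string = "en-US") {
  const [partial, setPartial] = useState("");
  const [finalText, setFinalText] = useState<string | null>(null);

  useEffect(() => {
    let cancelled = false;

    (async () => {
      const stream = await NativeAppleTranscription.transcribeStream(audioStream, locale);

      stream.on("partial", (text: string) => {
        if (!cancelled) setPartial(text); // live feedback while the user is speaking
      });

      stream.on("final", (result: { segments: { text: string }[] }) => {
        if (!cancelled) setFinalText(result.segments.map((s) => s.text).join(" "));
      });
    })();

    return () => {
      cancelled = true;
    };
  }, [audioStream, locale]);

  return { partial, finalText };
}
```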
Or using event emitters like you do for LLM streams:
```js
const streamId = NativeAppleTranscription.startStreamingTranscription("en-US");

NativeAppleTranscription.onPartialTranscript((data) => {
  if (data.streamId === streamId) {
    updateUI(data.partialText);
  }
});

// Feed audio chunks as they're recorded
audioRecorder.on('chunk', (chunk) => {
  NativeAppleTranscription.feedAudioChunk(streamId, chunk);
});
```
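One thing the snippet above leaves open is how a stream ends. Presumably there would also need to be a stop/finalize call plus listener cleanup; here is a sketch of that lifecycle, with every method, event, and helper name invented for illustration (import of `NativeAppleTranscription` omitted as above):

```ts
// Lifecycle sketch for the event-emitter variant – all names are hypothetical.
declare function saveTranscript(result: unknown): void;

const streamId = NativeAppleTranscription.startStreamingTranscription("en-US");

// A terminal event carrying the final segments with timestamps.
const finalSub = NativeAppleTranscription.onFinalTranscript((data) => {
  if (data.streamId === streamId) {
    saveTranscript(data.result);
  }
});

// When the user stops talking: tell the native side no more audio is coming,
// let it emit the final result, then release native resources and JS listeners.
function stopListening() {
  NativeAppleTranscription.stopStreamingTranscription(streamId);
  finalSub.remove();
}
```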
Why This Matters
- Better UX: Real-time feedback is essential for voice interfaces
- Competitive: On-device + real-time would be a killer combination (privacy + UX)
- Apple's Intent: Apple built streaming support into SpeechAnalyzer for this exact reason
Technical Notes
- Apple's SpeechAnalyzer already supports this via the SpeechTranscriber module
- You've already implemented streaming for LLM (doStream() in AppleLLMChatLanguageModel)
- The pattern could be similar to your LLM streaming implementation (a rough JS-side sketch follows below)
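If it helps, one way the JS surface could loosely mirror that streaming shape is an async iterable of typed parts. This is only a sketch of a possible contract for this feature request, not a description of how `doStream()` is actually implemented:

```ts
// Possible JS-side contract for streaming transcription results.
// Nothing in this block exists in the library yet.
type TranscriptionStreamPart =
  | { type: "partial"; text: string }
  | { type: "final"; segments: { text: string; start: number; end: number }[] };

// Example consumer: render partial text as it arrives, then the final transcript.
async function renderLiveTranscript(parts: AsyncIterable<TranscriptionStreamPart>) {
  for await (const part of parts) {
    if (part.type === "partial") {
      console.log("Partial:", part.text); // update UI incrementally
    } else {
      console.log("Final:", part.segments.map((s) => s.text).join(" "));
    }
  }
}
```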
References
- https://developer.apple.com/documentation/speech/speechanalyzer
- https://www.callstack.com/blog/on-device-speech-transcription-with-apple-speechanalyzer (mentions streaming capabilities)
Workaround
Currently using AssemblyAI's real-time WebSocket API for streaming transcription, but we'd love to migrate fully to on-device processing for privacy and offline support.
Optional: Would you accept a PR?
Happy to contribute this feature if you can provide guidance on the native Swift implementation approach! 🙌
Additional Context
- iOS Version: 26.0+
- Library Version: @react-native-ai/[email protected]
- React Native: 0.79.5