Reply to SpeechTranscriber not supported
The 16-core Neural Engine theory lines up with what I have seen in practice on Mac hardware as well. A Mac mini M4 (16-core NE) runs SpeechTranscriber and SpeechAnalyzer without issues, and M1 devices (also 16-core NE) work too.

For the Simulator issue: this is expected, unfortunately. SpeechTranscriber relies on the Neural Engine for on-device inference, and the Simulator does not emulate the ANE. The isAvailable check returns false because the underlying model cannot run there.

Practical workaround for development: use a conditional compilation check and fall back to SFSpeechRecognizer (the older API) in Simulator builds. SFSpeechRecognizer still works on the Simulator and gives you a close-enough approximation for UI development and integration testing. You only need a real device for final accuracy testing.

Regarding the 8-core vs. 16-core cutoff: my guess is that SpeechTranscriber uses a model size that requires the throughput of a 16-core Neural Engine to meet real-time latency requirements. The 8-core NE in A13 devices might be able to run the model, but not fast enough for streaming transcription.
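A minimal sketch of that conditional fallback (the enum and factory are my own illustrative scaffolding; SFSpeechRecognizer and targetEnvironment(simulator) are the real APIs):

```swift
import Speech

// Sketch: pick a recognition backend per build target.
// SpeechTranscriber needs the Neural Engine, which the Simulator lacks.
enum TranscriptionBackend {
    case modern                      // SpeechTranscriber / SpeechAnalyzer (device only)
    case legacy(SFSpeechRecognizer)  // older API, works on the Simulator

    static func make() -> TranscriptionBackend {
        #if targetEnvironment(simulator)
        // Simulator build: fall back to SFSpeechRecognizer, which runs without the ANE.
        guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")) else {
            fatalError("SFSpeechRecognizer unavailable for this locale")
        }
        return .legacy(recognizer)
        #else
        return .modern
        #endif
    }
}
```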
Topic: Media Technologies SubTopic: Audio Tags:
11h
Reply to Video Audio + Speech To Text
This is actually possible, though it requires a different approach than the typical single-AVAudioEngine setup. The key insight is that iOS allows multiple AVCaptureSession instances to coexist under certain conditions, so you can configure two separate audio routes:

- Use AVCaptureSession with the AirPods as the input device for your speech recognition pipeline. Set the audio session category to .playAndRecord with the .allowBluetooth option.
- For video recording with the built-in mic, use a second AVCaptureSession (or the camera API you are already using). The built-in mic can be explicitly selected as the audio input for this session.

The catch is that you need to manage the audio session category carefully. The .mixWithOthers option is essential here; without it, one session will interrupt the other.

Another approach avoids the dual-session complexity entirely: use a single AVCaptureSession that captures from the built-in mic for video, and run SFSpeechRecognizer (or the new SpeechAnalyzer on macOS 26 / iOS 26) on the same audio buffer. Speech recognition does not need a dedicated audio route; it can process any audio buffer you feed it, including one that is simultaneously being written to a video file. So the architecture becomes:

- One AVCaptureSession capturing video plus built-in mic audio.
- Fork the audio buffers in the captureOutput delegate: one copy goes to the video writer, the other feeds SFSpeechRecognizer.
- Voice commands ("CAPTURE", "STOP") are detected from the speech recognition results.

This avoids the Bluetooth routing problem entirely and is much more reliable in practice.
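The fork in the captureOutput delegate can be sketched like this (a minimal sketch assuming an AVCaptureAudioDataOutput delegate; appendAudioSampleBuffer(_:) is the real SFSpeechAudioBufferRecognitionRequest method, and the writer/request wiring is illustrative):

```swift
import AVFoundation
import Speech

final class AudioForkDelegate: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    let writerInput: AVAssetWriterInput                       // audio track of the movie file
    let recognitionRequest: SFSpeechAudioBufferRecognitionRequest

    init(writerInput: AVAssetWriterInput,
         recognitionRequest: SFSpeechAudioBufferRecognitionRequest) {
        self.writerInput = writerInput
        self.recognitionRequest = recognitionRequest
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Copy 1: append the sample buffer to the video file's audio track.
        if writerInput.isReadyForMoreMediaData {
            writerInput.append(sampleBuffer)
        }
        // Copy 2: feed the same audio to speech recognition.
        recognitionRequest.appendAudioSampleBuffer(sampleBuffer)
    }
}
```

The recognition task watching this request can then scan each partial result's bestTranscription for the "CAPTURE" / "STOP" keywords.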
Topic: Media Technologies SubTopic: Audio Tags:
12h
Reply to AVAudioEngine fails to start during FaceTime call (error 2003329396)
I hit a very similar issue while building ambient-voice, a real-time speech-to-text macOS app using SpeechAnalyzer. AVAudioEngine.inputNode.installTap() worked fine with built-in mics but silently failed with Bluetooth devices (the tap callback never fired). The root cause is similar to yours: audio session resource conflicts.

Our fix was switching from AVAudioEngine to AVCaptureSession. The captureOutput(_:didOutput:from:) delegate fires reliably regardless of audio device state or competing audio sessions. The tradeoff is that you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step, but it is straightforward.

For your FaceTime case specifically, AVCaptureSession with the .mixWithOthers category option should let you capture mic input without conflicting with the active call's audio session.

We documented all the audio pitfalls we hit on macOS 26 in our forum post: https://developer.apple.com/forums/thread/819525
The project is open source: https://github.com/Marvinngg/ambient-voice
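The conversion step can be sketched roughly like this (a minimal sketch using the real Core Media / AVFAudio calls; error handling beyond the status check is omitted):

```swift
import AVFoundation
import CoreMedia

// Sketch: convert a CMSampleBuffer from captureOutput into an
// AVAudioPCMBuffer suitable for speech APIs.
func pcmBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    guard let desc = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(desc),
          let format = AVAudioFormat(streamDescription: asbd) else { return nil }

    let frames = AVAudioFrameCount(CMSampleBufferGetNumSamples(sampleBuffer))
    guard let pcm = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frames) else { return nil }
    pcm.frameLength = frames

    // Copy the audio payload into the PCM buffer's AudioBufferList.
    let status = CMSampleBufferCopyPCMDataIntoAudioBufferList(
        sampleBuffer, at: 0, frameCount: Int32(frames),
        into: pcm.mutableAudioBufferList)
    return status == noErr ? pcm : nil
}
```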
Topic: Media Technologies SubTopic: General Tags:
12h
Reply to CGSetDisplayTransferByTable is broken on macOS Tahoe 26.4 RC (and 26.3.1) with MacBook M5 Pro, Max and Neo
Thanks for the thorough write-up and reproduction steps. This is a critical issue for display calibration workflows — tools like DisplayCAL and hardware colorimeters depend on CGSetDisplayTransferByTable for the final LUT upload. The fact that CGGetDisplayTransferByTable reads back correctly but the display pipeline ignores it suggests the disconnect is in the GPU driver or display controller firmware layer, not CoreGraphics itself. For anyone affected and needing a workaround in the interim: check if setting the ColorSync profile directly via ColorSyncDeviceSetCustomProfiles produces visible changes — it uses a different path to the display pipeline and might bypass whatever is broken in the gamma table application.
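For anyone trying the ColorSync route, a rough sketch of the call. Treat this as a starting point, not a verified recipe: the exact Swift bridging of the ColorSync constants (plain CFString vs. Unmanaged) varies by SDK, and profileURL is a placeholder for your calibrated .icc file:

```swift
import ColorSync
import CoreGraphics

// Sketch: assign a custom default ICC profile to the main display via
// ColorSync's device API, which reaches the display pipeline through a
// different path than the gamma-table upload.
let profileURL = URL(fileURLWithPath: "/path/to/calibrated.icc")  // placeholder
let displayID = CGMainDisplayID()

if let uuid = CGDisplayCreateUUIDFromDisplayID(displayID)?.takeRetainedValue() {
    let profiles: [CFString: Any] = [
        kColorSyncDeviceDefaultProfileID.takeUnretainedValue(): profileURL as CFURL
    ]
    let applied = ColorSyncDeviceSetCustomProfiles(
        kColorSyncDisplayDeviceClass.takeUnretainedValue(),
        uuid,
        profiles as CFDictionary)
    print(applied ? "Profile applied" : "ColorSync rejected the profile")
}
```

If this produces a visible change while CGSetDisplayTransferByTable does not, that would further localize the regression to the gamma-table path.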
Topic: Graphics & Games SubTopic: General Tags:
2d
Reply to Implementation of Screen Recording permissions for background OCR utility
One thing worth considering: even if the Broadcast Extension technically works in the background, the UX friction will be significant. Users see the persistent red recording indicator in the status bar, which creates a "surveillance" perception regardless of your actual intent. For the text suggestion use case, you might want to explore an alternative approach — an accessibility-based solution using the Accessibility API (if targeting macOS) or a keyboard extension that analyzes context within the text field directly (iOS). The keyboard extension route avoids screen capture entirely and might align better with both user expectations and App Review guidelines.
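For the iOS keyboard-extension route, a minimal sketch. textDocumentProxy and textDidChange(_:) are the real UIInputViewController API; the suggest(for:) helper is a hypothetical placeholder for whatever suggestion logic you plug in:

```swift
import UIKit

// Sketch: a custom keyboard reads surrounding text via textDocumentProxy,
// so no screen capture (and no red recording indicator) is involved.
final class SuggestionKeyboard: UIInputViewController {

    override func textDidChange(_ textInput: UITextInput?) {
        // Context immediately before the insertion point (may be nil or partial;
        // the proxy only exposes a window of text around the cursor).
        let before = textDocumentProxy.documentContextBeforeInput ?? ""
        let suggestion = suggest(for: before)
        // Surface `suggestion` in your keyboard's candidate bar here.
        _ = suggestion
    }

    private func suggest(for context: String) -> String {
        // Placeholder logic: echo the last word; replace with your real model.
        context.split(separator: " ").last.map(String.init) ?? ""
    }
}
```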
Topic: Graphics & Games SubTopic: General Tags:
2d
Reply to Ideal and Largest RDMA Burst Width
Great thread — RDMA over TB5 is one of the most exciting additions in Tahoe. For anyone looking to benchmark, the IB Verbs API with RDMA Write operations should give the lowest latency path. The ~16MB max message size likely maps to the TB5 link MTU constraints. It would be interesting to see how the latency profile compares across different burst sizes — especially whether there's a sweet spot below 16MB where you get optimal throughput-per-latency. If you end up running ib_write_bw/ib_write_lat style benchmarks, would love to see the results shared here.
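If the standard InfiniBand perftest suite runs against the Tahoe TB5 stack (an assumption; the tools may need porting), a size sweep looks like this. The `-a` flag runs all message sizes up to the maximum, which would show exactly where a sub-16MB sweet spot sits:

```shell
# On the server node (listens for the client):
ib_write_bw -a

# On the client node (replace SERVER_IP with the server's address);
# sweeps message sizes and reports bandwidth per size:
ib_write_bw SERVER_IP -a

# Same sweep for the latency profile:
ib_write_lat SERVER_IP -a
```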
Topic: Machine Learning & AI SubTopic: General Tags:
2d
Reply to CallKit lock screen UI on iOS 26: “slide to answer” text is too faint / hard to read
I've noticed the same contrast issue on iOS 26 with dark wallpapers. This appears to be a Liquid Glass regression — the frosted material doesn't adapt well to certain background luminance levels. Since the "slide to answer" text is entirely system-managed through CallKit, there's no app-side workaround. I'd recommend filing a Feedback (if you haven't already) under UIKit > System UI with a screenshot showing the low contrast scenario. Referencing the WCAG 2.1 AA contrast ratio requirement (4.5:1 for normal text) in your report might help prioritize it, since this is a core accessibility concern for incoming calls.
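For reference when writing the report, the WCAG 2.1 contrast-ratio math can be sketched as follows (the formulas are from the WCAG 2.1 definitions of relative luminance and contrast ratio; the example colors are illustrative):

```swift
import Foundation

// WCAG 2.1: linearize an sRGB channel value in [0, 1].
func srgbToLinear(_ c: Double) -> Double {
    c <= 0.03928 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4)
}

// WCAG 2.1 relative luminance of an sRGB color.
func relativeLuminance(r: Double, g: Double, b: Double) -> Double {
    0.2126 * srgbToLinear(r) + 0.7152 * srgbToLinear(g) + 0.0722 * srgbToLinear(b)
}

// Contrast ratio between two luminances; AA requires >= 4.5 for normal text.
func contrastRatio(_ l1: Double, _ l2: Double) -> Double {
    (max(l1, l2) + 0.05) / (min(l1, l2) + 0.05)
}

// White on black is the 21:1 maximum; faint white-ish text over a mid-gray
// glass background can easily fall below the 4.5:1 AA threshold.
let white = relativeLuminance(r: 1, g: 1, b: 1)
let black = relativeLuminance(r: 0, g: 0, b: 0)
print(contrastRatio(white, black))   // 21.0
```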
Topic: Design SubTopic: General Tags:
2d
Reply to SpeechAnalyzer.start(inputSequence:) fails with _GenericObjCError nilError, while the same WAV succeeds with start(inputAudioFile:)
I've been working with SpeechAnalyzer.start(inputSequence:) on macOS 26 and got streaming transcription working. A few things that might help:

- Make sure the AVAudioFormat you use to create AnalyzerInput buffers exactly matches what bestAvailableAudioFormat() returns. Even subtle mismatches (e.g., interleaved vs. non-interleaved, different channel layouts) can cause the nilError without a descriptive message.
- I found that feeding buffers that are too small (< 4096 frames) occasionally triggers this error. Try using larger chunks; I settled on 8192 frames per buffer.
- The bufferStartTime parameter needs to be monotonically increasing and consistent with the actual audio duration. If there are gaps or overlaps in the timestamps, the stream mode can fail silently or throw nilError.
- Instead of replaying a WAV file as chunked buffers, I'd suggest testing with live audio from AVCaptureSession first. In my experience, live capture into AnalyzerInput works more reliably than simulated streaming from a file, possibly because the timing is naturally correct.

Worth noting that DictationTranscriber handles streaming input differently from SpeechTranscriber. If your use case allows it, try switching to DictationTranscriber; it also supports AnalysisContext for contextual vocabulary biasing (which SpeechTranscriber currently does not, per an Apple engineer's response in a separate thread).

The macOS 26 Speech framework is still quite new and under-documented. Filing the Feedback Assistant report was the right call.
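A rough sketch of the timestamp bookkeeping, using the names discussed in this thread (AnalyzerInput and bufferStartTime; exact initializer signatures may differ on your SDK, so verify against your headers):

```swift
import AVFoundation
import Speech

// Sketch: yield format-matched chunks with monotonically increasing,
// gap-free bufferStartTime values derived from the running frame count.
func feed(chunks: [AVAudioPCMBuffer],
          format: AVAudioFormat,
          continuation: AsyncStream<AnalyzerInput>.Continuation) {
    var framesSoFar: Int64 = 0
    let timescale = CMTimeScale(format.sampleRate)

    for chunk in chunks {
        // Every buffer must use the format bestAvailableAudioFormat() returned.
        precondition(chunk.format == format, "format must match analyzer's best format")

        // Start time = frames already delivered, in sample-rate units.
        let start = CMTime(value: framesSoFar, timescale: timescale)
        continuation.yield(AnalyzerInput(buffer: chunk, bufferStartTime: start))

        framesSoFar += Int64(chunk.frameLength)   // no gaps, no overlaps
    }
    continuation.finish()
}
```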
Topic: Media Technologies SubTopic: Audio Tags:
2d