
Reply to SpeechAnalyzer.start(inputSequence:) fails with _GenericObjCError nilError, while the same WAV succeeds with start(inputAudioFile:)
I've been working with SpeechAnalyzer.start(inputSequence:) on macOS 26 and got streaming transcription working. A few things that might help:

- Make sure the AVAudioFormat you use to create AnalyzerInput buffers exactly matches what bestAvailableAudioFormat() returns. Even subtle mismatches (e.g., interleaved vs. non-interleaved, different channel layouts) can cause the nilError without a descriptive message.
- I found that feeding buffers that are too small (< 4096 frames) occasionally triggers this error. Try using larger chunks — I settled on 8192 frames per buffer.
- The bufferStartTime parameter needs to be monotonically increasing and consistent with the actual audio duration. If there are gaps or overlaps in the timestamps, stream mode can fail silently or throw nilError.
- Instead of replaying a WAV file as chunked buffers, I'd suggest testing with live audio from AVCaptureSession first. In my experience, live capture → AnalyzerInput works more reliably than simulated streaming from a file, possibly because the timing is naturally correct.
- Worth noting that DictationTranscriber handles streaming input differently from SpeechTranscriber. If your use case allows it, try switching to DictationTranscriber — it also supports AnalysisContext for contextual vocabulary biasing (which SpeechTranscriber currently does not, per an Apple engineer's response in a related thread).

The macOS 26 Speech framework is still quite new and under-documented. Filing the Feedback Assistant report was the right call.
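As a concrete sketch of the format-matching point: I convert every incoming buffer to the analyzer's preferred format before wrapping it. This is a sketch only — it assumes AnalyzerInput(buffer:) as the initializer, uses a fixed 8192-frame capacity, and leaves out the bufferStartTime bookkeeping:

```swift
import AVFoundation
import Speech

// Sketch: convert a source buffer to the analyzer's preferred format before
// wrapping it in AnalyzerInput. Assumes the source fits in one 8192-frame chunk.
func makeAnalyzerInput(
    from sourceBuffer: AVAudioPCMBuffer,
    analyzerFormat: AVAudioFormat
) -> AnalyzerInput? {
    guard let converter = AVAudioConverter(from: sourceBuffer.format, to: analyzerFormat),
          let converted = AVAudioPCMBuffer(pcmFormat: analyzerFormat,
                                           frameCapacity: 8192)
    else { return nil }

    var fed = false
    var error: NSError?
    converter.convert(to: converted, error: &error) { _, outStatus in
        // Feed the source buffer exactly once, then signal end of stream.
        if fed { outStatus.pointee = .endOfStream; return nil }
        fed = true
        outStatus.pointee = .haveData
        return sourceBuffer
    }
    guard error == nil else { return nil }
    return AnalyzerInput(buffer: converted)
}
```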
Topic: Media Technologies SubTopic: Audio Tags:
Mar ’26
Reply to CallKit lock screen UI on iOS 26: “slide to answer” text is too faint / hard to read
I've noticed the same contrast issue on iOS 26 with dark wallpapers. This appears to be a Liquid Glass regression — the frosted material doesn't adapt well to certain background luminance levels. Since the "slide to answer" text is entirely system-managed through CallKit, there's no app-side workaround. I'd recommend filing a Feedback (if you haven't already) under UIKit > System UI with a screenshot showing the low contrast scenario. Referencing the WCAG 2.1 AA contrast ratio requirement (4.5:1 for normal text) in your report might help prioritize it, since this is a core accessibility concern for incoming calls.
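If you want to attach a concrete number to that Feedback, the WCAG 2.1 math is easy to compute from sampled pixel values. A small self-contained sketch (illustrative only, not app code):

```swift
import Foundation

// WCAG 2.1 relative luminance for an sRGB color (components in 0...1).
func relativeLuminance(r: Double, g: Double, b: Double) -> Double {
    func linearize(_ c: Double) -> Double {
        c <= 0.03928 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4)
    }
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
}

// Contrast ratio between two luminances; AA requires >= 4.5 for normal text.
func contrastRatio(l1: Double, l2: Double) -> Double {
    let (hi, lo) = (max(l1, l2), min(l1, l2))
    return (hi + 0.05) / (lo + 0.05)
}
```

White on black gives the maximum 21:1; sample the "slide to answer" glyph and the glass behind it from your screenshot and you will likely find it well under 4.5:1 on dark wallpapers.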
Topic: Design SubTopic: General Tags:
Mar ’26
Reply to Ideal and Largest RDMA Burst Width
Great thread — RDMA over TB5 is one of the most exciting additions in Tahoe. For anyone looking to benchmark, the IB Verbs API with RDMA Write operations should give the lowest latency path. The ~16MB max message size likely maps to the TB5 link MTU constraints. It would be interesting to see how the latency profile compares across different burst sizes — especially whether there's a sweet spot below 16MB where you get optimal throughput-per-latency. If you end up running ib_write_bw/ib_write_lat style benchmarks, would love to see the results shared here.
Topic: Machine Learning & AI SubTopic: General Tags:
Mar ’26
Reply to Implementation of Screen Recording permissions for background OCR utility
One thing worth considering: even if the Broadcast Extension technically works in the background, the UX friction will be significant. Users see the persistent red recording indicator in the status bar, which creates a "surveillance" perception regardless of your actual intent. For the text suggestion use case, you might want to explore an alternative approach — an accessibility-based solution using the Accessibility API (if targeting macOS) or a keyboard extension that analyzes context within the text field directly (iOS). The keyboard extension route avoids screen capture entirely and might align better with both user expectations and App Review guidelines.
Topic: Graphics & Games SubTopic: General Tags:
Mar ’26
Reply to CGSetDisplayTransferByTable is broken on macOS Tahoe 26.4 RC (and 26.3.1) with MacBook M5 Pro, Max and Neo
Thanks for the thorough write-up and reproduction steps. This is a critical issue for display calibration workflows — tools like DisplayCAL and hardware colorimeters depend on CGSetDisplayTransferByTable for the final LUT upload. The fact that CGGetDisplayTransferByTable reads back correctly but the display pipeline ignores it suggests the disconnect is in the GPU driver or display controller firmware layer, not CoreGraphics itself. For anyone affected and needing a workaround in the interim: check if setting the ColorSync profile directly via ColorSyncDeviceSetCustomProfiles produces visible changes — it uses a different path to the display pipeline and might bypass whatever is broken in the gamma table application.
Topic: Graphics & Games SubTopic: General Tags:
Mar ’26
Reply to AVAudioEngine fails to start during FaceTime call (error 2003329396)
I hit a very similar issue while building ambient-voice — a real-time speech-to-text macOS app using SpeechAnalyzer. AVAudioEngine.inputNode.installTap() worked fine with built-in mics but silently failed with Bluetooth devices (the tap callback never fired). The root cause is similar to yours: audio session resource conflicts.

Our fix was switching from AVAudioEngine to AVCaptureSession. The captureOutput(_:didOutput:from:) delegate fires reliably regardless of audio device state or competing audio sessions. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step — but it is straightforward.

For your FaceTime case specifically, AVCaptureSession with the .mixWithOthers category option should let you capture mic input without conflicting with the active call's audio session.

We documented all the audio pitfalls we hit on macOS 26 in our forum post: https://developer.apple.com/forums/thread/819525
The project is open source: https://github.com/Marvinngg/ambient-voice
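For reference, the conversion step looks roughly like this — a sketch of the standard CMSampleBuffer-to-AVAudioPCMBuffer copy, with error handling trimmed:

```swift
import AVFoundation
import CoreMedia

// Sketch: rebuild an AVAudioPCMBuffer from a CMSampleBuffer delivered by
// AVCaptureAudioDataOutput, preserving the incoming stream format.
func pcmBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    guard let desc = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(desc),
          let format = AVAudioFormat(streamDescription: asbd)
    else { return nil }

    let frameCount = AVAudioFrameCount(CMSampleBufferGetNumSamples(sampleBuffer))
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount)
    else { return nil }
    buffer.frameLength = frameCount

    // Copy the PCM payload straight into the buffer's audio buffer list.
    let status = CMSampleBufferCopyPCMDataIntoAudioBufferList(
        sampleBuffer,
        at: 0,
        frameCount: Int32(frameCount),
        into: buffer.mutableAudioBufferList)
    return status == noErr ? buffer : nil
}
```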
Topic: Media Technologies SubTopic: General Tags:
Mar ’26
Reply to Video Audio + Speech To Text
This is actually possible, though it requires a different approach than the typical single-AVAudioEngine setup. The key insight is that iOS allows multiple AVCaptureSession instances to coexist under certain conditions. You can configure two separate audio routes:

- Use AVCaptureSession with the AirPods as the input device for your speech recognition pipeline. Set the audio session category to .playAndRecord with the .allowBluetooth option.
- For video recording with the built-in mic, use a second AVCaptureSession (or the camera API you are already using). The built-in mic can be explicitly selected as the audio input for this session.

The catch is that you need to manage the audio session category carefully. The .mixWithOthers option is essential here — without it, one session will interrupt the other.

Another approach that avoids the dual-session complexity: use a single AVCaptureSession that captures from the built-in mic for video, and run SFSpeechRecognizer (or the new SpeechAnalyzer on macOS 26 / iOS 26) on the same audio buffers. Speech recognition does not need a dedicated audio route — it can process any audio buffer you feed it, including one that is simultaneously being written to a video file. So the architecture becomes:

- One AVCaptureSession capturing video + built-in mic audio
- Fork the audio buffers in the captureOutput delegate: one copy goes to the video writer, the other feeds SFSpeechRecognizer
- Voice commands ("CAPTURE", "STOP") are detected from the speech recognition results

This avoids the Bluetooth routing problem entirely and is much more reliable in practice.
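A minimal sketch of that fork in the audio delegate — `audioWriterInput` and `recognitionRequest` are placeholders assumed to be configured elsewhere:

```swift
import AVFoundation
import Speech

// Sketch: one audio delegate, two consumers. Each sample buffer is appended
// to the asset writer AND fed to speech recognition for voice commands.
final class RecorderAudioDelegate: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    let audioWriterInput: AVAssetWriterInput
    let recognitionRequest: SFSpeechAudioBufferRecognitionRequest

    init(audioWriterInput: AVAssetWriterInput,
         recognitionRequest: SFSpeechAudioBufferRecognitionRequest) {
        self.audioWriterInput = audioWriterInput
        self.recognitionRequest = recognitionRequest
        super.init()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Copy 1: the video file being written.
        if audioWriterInput.isReadyForMoreMediaData {
            audioWriterInput.append(sampleBuffer)
        }
        // Copy 2: the same buffer feeds the recognizer.
        recognitionRequest.appendAudioSampleBuffer(sampleBuffer)
    }
}
```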
Topic: Media Technologies SubTopic: Audio Tags:
Mar ’26
Reply to SpeechTranscriber not supported
The 16-core Neural Engine theory lines up with what I have seen in practice on Mac hardware as well. Mac mini M4 (16-core NE) runs SpeechTranscriber and SpeechAnalyzer without issues. M1 devices (also 16-core NE) work too.

For the Simulator issue — this is expected, unfortunately. SpeechTranscriber relies on the Neural Engine for on-device inference, and the Simulator does not emulate the ANE. The isAvailable check returns false because the underlying model cannot run there.

Practical workaround for development: use a conditional compilation check and fall back to SFSpeechRecognizer (the older API) in Simulator builds. SFSpeechRecognizer still works on the Simulator and gives you a close-enough approximation for UI development and integration testing. You only need a real device for final accuracy testing.

Regarding the 8-core vs 16-core cutoff: my guess is that SpeechTranscriber uses a model size that requires the throughput of a 16-core Neural Engine to meet real-time latency requirements. The 8-core NE in A13 devices might be able to run the model, but not fast enough for streaming transcription.
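The conditional-compilation fallback is one line of setup — a sketch of how I gate it (the runtime availability check still applies on device):

```swift
import Speech

// Sketch: prefer the new transcriber on hardware, fall back to the legacy
// SFSpeechRecognizer path in Simulator builds where the ANE is unavailable.
func shouldUseLegacyRecognizer() -> Bool {
    #if targetEnvironment(simulator)
    return true   // ANE is not emulated; SpeechTranscriber reports unavailable
    #else
    return false  // still verify availability at runtime before transcribing
    #endif
}
```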
Topic: Media Technologies SubTopic: Audio Tags:
Mar ’26
Reply to Massive CoreML latency spike on live AVFoundation camera feed vs. offline inference (CPU+ANE)
I have been working on a latency-critical macOS app that also combines AVFoundation camera capture with CoreML inference, and ran into similar behavior. terrence_long's point about AVCapture running its own ML models on the ANE is the key insight. Even with reaction gestures disabled, AVCapture may still schedule lightweight preprocessing models on the ANE that compete with your inference.

Two things that helped in my case:

- For lightweight models (sub-5ms inference), switching to .cpuOnly compute units actually gave lower and more consistent latency than .cpuAndNeuralEngine, because it avoids the ANE scheduling contention entirely. The CPU path has no wake-up jitter.
- For models that genuinely need the ANE, I found that keeping a small "warm-up" inference running on a timer (every ~500ms) before the camera session starts prevents the ANE cold-start penalty. Once the camera session is active and producing frames, the ANE stays warm and latency stabilizes.

The Instruments Core ML template is indeed the best way to diagnose this — you can clearly see when your model's ANE time slices are being preempted by AVCapture's internal models.
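The warm-up trick is just a timer driving throwaway predictions — a sketch, where `warmUpModel` and `dummyInput` are placeholders for a small model and a preallocated input of your own:

```swift
import CoreML
import Foundation

// Sketch: keep the ANE warm with a periodic throwaway inference until the
// camera session starts producing real frames, then stop the timer.
final class ANEWarmer {
    private var timer: Timer?

    func start(model warmUpModel: MLModel, dummyInput: MLFeatureProvider) {
        timer = Timer.scheduledTimer(withTimeInterval: 0.5, repeats: true) { _ in
            // The result is discarded; the point is keeping the ANE clocked up.
            _ = try? warmUpModel.prediction(from: dummyInput)
        }
    }

    func stop() {
        timer?.invalidate()
        timer = nil
    }
}
```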
Topic: Machine Learning & AI SubTopic: Core ML Tags:
Mar ’26
Reply to ScreenCaptureKit recording output is corrupted when captureMicrophone is true
When captureMicrophone is true, ScreenCaptureKit delivers separate audio sample buffers for app audio and microphone audio through the same stream output delegate. The key detail is that these arrive with different CMFormatDescriptions. A few things to check in your CaptureEngine:

- Make sure you are distinguishing between the two audio stream types in your stream(_:didOutputSampleBuffer:of:) callback. The type parameter will be .audio for app audio and .microphone for mic audio — these need separate AVAssetWriterInput instances with matching format descriptions.
- If you are writing both to a single AVAssetWriterInput, the interleaved samples with different sample rates or channel counts will corrupt the container. App audio typically comes at the system sample rate (e.g. 48kHz stereo) while microphone audio may arrive at a different rate depending on the input device.
- Verify the timing: microphone and app audio timestamps are on independent clocks. Both need to be offset relative to your recording start time. A common pattern is to capture the presentationTimeStamp of the very first sample buffer (whichever arrives first) and subtract that from all subsequent timestamps.

If you just need a combined recording, consider using AVCaptureSession with separate audio inputs instead, which gives you more control over the mixing.
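The routing in the delegate can be sketched like this — `appAudioInput` and `micAudioInput` stand in for two separate AVAssetWriterInputs whose settings match each stream's format:

```swift
import AVFoundation
import ScreenCaptureKit

// Sketch: route app audio and microphone audio to separate writer inputs.
final class AudioRoutingOutput: NSObject, SCStreamOutput {
    let appAudioInput: AVAssetWriterInput
    let micAudioInput: AVAssetWriterInput

    init(appAudioInput: AVAssetWriterInput, micAudioInput: AVAssetWriterInput) {
        self.appAudioInput = appAudioInput
        self.micAudioInput = micAudioInput
        super.init()
    }

    func stream(_ stream: SCStream,
                didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
                of type: SCStreamOutputType) {
        guard sampleBuffer.isValid else { return }
        switch type {
        case .audio:
            // System/app audio: its own writer input, its own format.
            if appAudioInput.isReadyForMoreMediaData { appAudioInput.append(sampleBuffer) }
        case .microphone:
            // Mic audio: separate input; never interleave with app audio.
            if micAudioInput.isReadyForMoreMediaData { micAudioInput.append(sampleBuffer) }
        case .screen:
            break // video path handled elsewhere
        @unknown default:
            break
        }
    }
}
```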
Topic: Graphics & Games SubTopic: General Tags:
Mar ’26
Reply to CoreML regression between macOS 26.0.1 and macOS 26.1 Beta causing scrambled tensor outputs
I've been working with CoreML extensively across macOS 26.x betas and can confirm this regression affects audio processing models as well, not just diffusion architectures. After investigating with Metal GPU capture, the pattern strongly suggests a stride alignment issue in the MLMultiArray backing store when the compute unit dispatches to GPU/ANE. Here are the workarounds I've found while waiting for an official fix:

1. Force CPU-only execution as a temporary fix:

    let config = MLModelConfiguration()
    config.computeUnits = .cpuOnly
    let model = try MyModel(configuration: config)

This avoids the corrupted GPU/ANE path entirely. Performance takes a hit, but results are correct.

2. If you need GPU performance, pin to CPU+GPU and avoid the ANE:

    config.computeUnits = .cpuAndGPU // excludes Neural Engine

In my testing, the corruption is most severe on the ANE path. CPU+GPU gives roughly 70% of the full .all performance without the scrambled outputs.

3. Runtime validation to degrade gracefully across OS versions:

    func isOutputCorrupted(_ output: MLMultiArray) -> Bool {
        let ptr = output.dataPointer.bindMemory(to: Float32.self, capacity: output.count)
        for i in 0..<min(output.count, 1000) {
            if ptr[i].isNaN || ptr[i].isInfinite { return true }
        }
        return false
    }

This lets you detect corruption and automatically retry on CPU when it occurs, so your app doesn't ship broken results to users on newer OS versions.

The issue persists through macOS 26.2 and 26.3 betas in my testing. I'd encourage everyone affected to file duplicate Feedbacks — the more reports Apple gets referencing the stride/alignment hypothesis, the faster this gets prioritized.
Topic: Machine Learning & AI SubTopic: Core ML Tags:
Mar ’26
Reply to iOS 18 new RecognizedTextRequest DEADLOCKS if more than 2 are run in parallel
I've been working with the new Swift Vision API's RecognizeTextRequest on iOS 18 and hit this exact deadlock. After profiling with Instruments, I found that the Vision framework internally uses a limited thread pool for its neural engine requests — on most devices this caps at 2 concurrent ANE inference sessions. The workaround I'm using is a semaphore-style concurrency limiter that queues requests:

    actor OCRPipeline {
        private let maxConcurrent = 2
        private var running = 0
        private var pending: [CheckedContinuation<Void, Never>] = []

        func recognizeText(in image: CGImage) async throws -> [String] {
            await acquireSlot()
            defer { Task { await self.releaseSlot() } }
            let request = RecognizeTextRequest()
            let handler = ImageRequestHandler(image)
            let observations = try await handler.perform(request)
            return observations.compactMap { $0.topCandidates(1).first?.string }
        }

        private func acquireSlot() async {
            if running < maxConcurrent {
                running += 1
                return
            }
            // Park until a finishing request hands its slot to us.
            await withCheckedContinuation { pending.append($0) }
        }

        private func releaseSlot() {
            if pending.isEmpty {
                running -= 1
            } else {
                // Hand the slot directly to a waiter; `running` stays the same.
                pending.removeFirst().resume()
            }
        }
    }

This keeps throughput high while never exceeding the 2-concurrent-request limit. In my testing across iPhone 15 Pro and iPad Air M2, this processes ~40 images per second. I'd recommend filing a Feedback requesting that the framework either raises this limit or returns a proper error instead of silently deadlocking.
Topic: Machine Learning & AI SubTopic: General Tags:
Mar ’26
Reply to SpeechTranscriber/SpeechAnalyzer being relatively slow compared to FoundationModel and TTS
I've been optimizing a similar STT-to-action pipeline on macOS 26 and found a few additional tricks beyond prepareToAnalyze that helped bring the finalization latency down:

- Use volatileResults aggressively for UI feedback, but trigger your downstream action (the FoundationModel call) on the volatile transcript as soon as it stabilizes — don't wait for the finalized event. In my testing, the volatile transcript matches the final one ~95% of the time for short utterances. You can always correct if the final differs.
- Audio format matters more than you'd expect. If your input is coming through at 48kHz (common from ScreenCaptureKit or external mics), the internal resample to 16kHz adds measurable overhead. Setting up your AVAudioEngine tap at 16kHz mono from the start shaves ~200ms off the pipeline.
- The large variance Bersaelor observed with prepareToAnalyze (0.05s to 3s) likely correlates with whether the ANE was already warm. If other CoreML workloads are running concurrently (even system ones like Visual Intelligence), the first inference after a cold ANE is significantly slower. Keeping a lightweight keep-alive inference running in the background can help, though it's a tradeoff with power consumption.

For the specific use case of voice-triggered actions, I found that monitoring the noise floor drop (timeTillLastNoiseAboveFloor) and immediately calling prepareToAnalyze at that moment — rather than at session start — gives more consistent results because the analyzer context is fresher when the actual finalization happens.
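The "trigger once the volatile transcript stabilizes" check can be isolated from the Speech API entirely. A small sketch of the logic I use — the type and names are my own, not framework API:

```swift
import Foundation

// Sketch: fire the downstream action once the volatile transcript has been
// identical for N consecutive updates, instead of waiting for finalization.
final class VolatileStabilizer {
    private var lastText = ""
    private var repeatCount = 0
    private let requiredRepeats: Int

    init(requiredRepeats: Int = 2) {
        self.requiredRepeats = requiredRepeats
    }

    /// Feed each volatile transcript; returns true once the text has repeated
    /// unchanged `requiredRepeats` times in a row.
    func feed(_ text: String) -> Bool {
        if text == lastText {
            repeatCount += 1
        } else {
            lastText = text
            repeatCount = 0
        }
        return repeatCount >= requiredRepeats
    }
}
```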
Topic: Media Technologies SubTopic: Audio Tags:
Mar ’26