I ran into exactly this problem when building an audio pipeline that mixes system audio (via ScreenCaptureKit) with microphone input for real-time speech processing.
The core issue is that mainMixerNode is connected to outputNode by default, which routes everything to the speakers. You have two approaches:
Option 1: manual rendering mode. In this mode AVAudioEngine does not play back to hardware; you pull rendered buffers on your own schedule. Enable manual rendering, attach a player node for your SCK audio, connect it to the main mixer, then call renderOffline(_:to:) to pull mixed audio on demand.
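A minimal sketch of that offline flow. The 48 kHz stereo format and 4096-frame render size are illustrative assumptions, not requirements:

```swift
import AVFoundation

let engine = AVAudioEngine()
let sckPlayer = AVAudioPlayerNode()
let format = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!

engine.attach(sckPlayer)
engine.connect(sckPlayer, to: engine.mainMixerNode, format: format)

// Put the engine into offline manual rendering mode before starting it.
try engine.enableManualRenderingMode(.offline, format: format, maximumFrameCount: 4096)
try engine.start()
sckPlayer.play()

// Schedule SCK buffers as they arrive from your SCStreamOutput callback:
// sckPlayer.scheduleBuffer(sckBuffer, completionHandler: nil)

// Pull mixed audio on your own schedule.
let outBuffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                 frameCapacity: engine.manualRenderingMaximumFrameCount)!
let status = try engine.renderOffline(engine.manualRenderingMaximumFrameCount, to: outBuffer)
if status == .success {
    // Hand outBuffer to your speech-processing stage here.
}
```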
The catch: inputNode does not work in offline mode on macOS. The workaround is to capture mic samples separately (via AVCaptureSession or a tap on a separate realtime engine), then schedule those buffers into a second AVAudioPlayerNode attached to the offline engine.
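Continuing the sketch above, the tap-on-a-separate-engine variant of that workaround looks roughly like this (the 1024-frame tap size is arbitrary, and the mic buffers arrive in realtime, so you still decide how fast to call renderOffline):

```swift
import AVFoundation

// A second, realtime engine taps the mic; each buffer is forwarded into a
// second player node on the offline (manual-rendering) engine above.
// In practice, attach and connect micPlayer before calling engine.start().
let micEngine = AVAudioEngine()
let micFormat = micEngine.inputNode.outputFormat(forBus: 0)   // e.g. 44.1 kHz

let micPlayer = AVAudioPlayerNode()
engine.attach(micPlayer)
engine.connect(micPlayer, to: engine.mainMixerNode, format: micFormat)
micPlayer.play()

micEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: micFormat) { buffer, _ in
    // Forward realtime mic buffers into the offline mix as they arrive.
    micPlayer.scheduleBuffer(buffer, completionHandler: nil)
}
try micEngine.start()
```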
Option 2: stay in realtime mode. Keep the engine running normally but prevent playback by setting mainMixerNode.outputVolume = 0, then install a tap on mainMixerNode to capture the mixed audio without it ever reaching the speakers.
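A rough sketch of that realtime approach; the 48 kHz SCK format and the 4096-frame tap size are assumptions and should match whatever you configured on your SCStream:

```swift
import AVFoundation

let engine = AVAudioEngine()
let sckPlayer = AVAudioPlayerNode()
let sckFormat = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!

engine.attach(sckPlayer)
engine.connect(sckPlayer, to: engine.mainMixerNode, format: sckFormat)
engine.connect(engine.inputNode, to: engine.mainMixerNode,
               format: engine.inputNode.outputFormat(forBus: 0))

// Mute playback; the output node keeps pulling, so the tap still fires.
engine.mainMixerNode.outputVolume = 0

let mixFormat = engine.mainMixerNode.outputFormat(forBus: 0)
engine.mainMixerNode.installTap(onBus: 0, bufferSize: 4096, format: mixFormat) { buffer, _ in
    // Mixed system + mic audio; feed this to the speech-processing stage.
}

try engine.start()
sckPlayer.play()
// Schedule SCK buffers from your SCStreamOutput callback:
// sckPlayer.scheduleBuffer(buffer, completionHandler: nil)
```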
I tried disconnecting mainMixerNode from outputNode entirely, but on some macOS versions (13.x specifically) this causes the engine to stop pulling audio from its inputs. Setting volume to 0 is more reliable across macOS 13–15.
For the sample rate mismatch between SCK output (typically 48kHz) and mic input (sometimes 44.1kHz), let the mixer handle the conversion — connect each source in its native format and set the mixer output format to your target rate.
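For the realtime variant, that format wiring might look something like this, reusing the engine and sckPlayer names from the sketches above; the 48 kHz target is illustrative:

```swift
import AVFoundation

let sckNative = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!   // SCK stream format (assumed)
let micNative = engine.inputNode.outputFormat(forBus: 0)                             // often 44.1 kHz
let target    = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!

// Each source connects in its native format; the mixer does the resampling.
engine.connect(sckPlayer, to: engine.mainMixerNode, format: sckNative)
engine.connect(engine.inputNode, to: engine.mainMixerNode, format: micNative)

// The mixer-to-output connection pins the mixer's output format, so a tap on the
// mixer sees a single 48 kHz stream. (In manual rendering mode, the format passed
// to enableManualRenderingMode plays this role instead.)
engine.connect(engine.mainMixerNode, to: engine.outputNode, format: target)
```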