VoIP PKPushKit notifications not delivered when powerd assertion policy 3 hits before apsd completes APNs reconnection

Question

Created Jun ’26

Replies 7

Participants 2

We are seeing a reproducible scenario on iOS 26 where incoming VoIP push notifications are never delivered when the device has been idle and screen-locked for 30+ minutes. During one incident, the same failure was observed simultaneously in WhatsApp, Microsoft Teams, and our app on the same device, which points to a platform-level issue rather than something specific to our implementation.

We have captured full system logs across three separate incidents. Below are the exact log sequences.

Incident — All VoIP apps fail simultaneously (Our app, WhatsApp, Teams)

Device: iPhone 17 Pro · iOS: 26.x · Network: 5G NSA (kNRNSA)

The device had been idle with the screen locked for approximately 31 minutes. An LTE cell handover caused apsd to begin an APNs reconnection. powerd entered policy 3 before apsd reached channel-flow viable, defuncting the app.

17:45:59.562  symptomsd  New RRC 0 when previous 1 from pdp_ip0
              ↑ Radio drops to RRC_Idle. Device has been idle since 17:14:56 (31 min).

17:46:01.206  CommCenter  #I Mapping the registration state to kRegisteredHome
              ↑ LTE cell handover triggers RRC reconnect.

17:46:01.330  apsd  [C138 IPv4#b71cac13:5223 ready parent-flow
                    (satisfied (Path is satisfied), interface: pdp_ip0[lte],
                    scoped, ipv4, ipv6, dns, expensive, uses cell, LQM: good)]
                    event: path:satisfied_change @594.391s
              ↑ APNs path re-satisfied. Reconnection begins.
                channel-flow viable NOT yet reached — TLS handshake still in progress.

17:48:08.057  apsd  Powerd has requested assertion activity update
              ↑ Warning: powerd about to change policy.

              ── 2 minutes 40 seconds after APNs reconnect started ──

17:48:41.248  powerd  Sending com.apple.powerd.assertionpolicy 3
17:48:41.250  apsd    Update assertion policy 3
17:48:41.250  powerd  Activity changes from 0x1 to 0x0. UseActiveState:0
17:48:41.250  powerd  hidActive:0 displayOff:1 assertionActivityValid:0
              ↑ Screen off, device locked. OS enters restricted idle.
                apsd restricted. APNs reconnection abandoned.

17:48:42.669  kernel  necp_process_defunct_list: necp_update_client abort
                      nexus error (2) for pid 1518 Comera
              ↑ Kernel terminates Comera's network stack via NECP.
                No API available to prevent this.
                WhatsApp and Teams remain suspended — no DEFUNCT,
                but apsd in policy 3 means no push delivery for them either.

              ── Dead zone: VoIP pushes for all 3 apps undeliverable ──

17:50:04.028  powerd  Process CommCenter.104 Created SystemIsActive
                      "com.apple.ipTelephony.sipIncoming.cell"
              ↑ Incoming cellular PSTN call forces system wake.

17:50:04.494  powerd  Sending com.apple.powerd.assertionpolicy 0
17:50:04.598  apsd    Update assertion policy 0
              ↑ Full wake. Queued VoIP pushes from Comera, WhatsApp,
                and Teams are delivered simultaneously.
Gap between the point where channel-flow viable was needed (path re-satisfied at 17:46:01.330) and actual delivery (policy 0 at 17:50:04.598): 4 minutes 3 seconds. Recovery trigger: an external cellular call from the carrier, not any app action.
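For anyone re-checking the intervals quoted in the annotations, here is a minimal sketch that recomputes them from the three timestamps in the excerpt above. The helper names are made up for illustration, and it assumes all timestamps fall on the same day.

```swift
import Foundation

/// Converts a log timestamp of the form "HH:mm:ss.SSS" into seconds since midnight.
/// Returns nil if the string does not consist of exactly three numeric fields.
func secondsSinceMidnight(_ stamp: String) -> Double? {
    let parts = stamp.split(separator: ":").map { Double(String($0)) }
    guard parts.count == 3,
          let hours = parts[0], let minutes = parts[1], let seconds = parts[2] else {
        return nil
    }
    return hours * 3600 + minutes * 60 + seconds
}

/// Gap in seconds between two same-day timestamps (no midnight rollover handling).
func gap(from start: String, to end: String) -> Double? {
    guard let a = secondsSinceMidnight(start), let b = secondsSinceMidnight(end) else {
        return nil
    }
    return b - a
}

// Timestamps copied from the log excerpt above.
let pathSatisfied = "17:46:01.330"   // apsd path:satisfied_change
let policy3 = "17:48:41.248"         // powerd sends assertionpolicy 3
let policy0 = "17:50:04.598"         // apsd updates to assertion policy 0

let reconnectToPolicy3 = gap(from: pathSatisfied, to: policy3)!
let reconnectToDelivery = gap(from: pathSatisfied, to: policy0)!
let deadZone = gap(from: policy3, to: policy0)!

print(String(format: "path satisfied -> policy 3: %.3f s", reconnectToPolicy3))
print(String(format: "path satisfied -> policy 0: %.3f s", reconnectToDelivery))
print(String(format: "policy 3 -> policy 0:       %.3f s", deadZone))
```

The first two values correspond to the "2 minutes 40 seconds" and "4 minutes 3 seconds" figures in the annotations; the third is the time spent in policy 3 before the cellular call woke the device.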

Working case (same test, different conditions)

Device: iPhone 17 Pro · iOS: 26.5.1 · Screen unlocked, no hotspot

19:2x:xx  apsd  policy state {downgradeWhenLocked: NO,
                               isSystemLocked: NO,
                               isConnectedOnUltraConstrainedInterface: NO}
          ↑ Device unlocked. No policy 3. Comera NOT defuncted.
            Push delivered. Call rings normally.

Our implementation

PKPushRegistry is held strongly and re-registered on every applicationWillEnterForeground
reportNewIncomingCall(with:update:completion:) is called synchronously within pushRegistry(_:didReceiveIncomingPushWith:)
VoIP background mode entitlement is present
App has com.apple.developer.pushkit.voip entitlement

Questions

Is there any entitlement or API to prevent NECP from defuncting a process holding an active PKPushRegistry? The VoIP push entitlement exists for exactly this background delivery scenario.

Is pushDisallowed being applied to apps with VoIP push entitlements when InternetSharingActive == 1 intentional? Should VoIP entitlements exempt an app from the Internet Sharing Policy gate in dasd?

Is there a documented way to know when apsd has fully completed APNs reconnection (i.e. channel-flow viable) so a server can time push retries more accurately within a call validity window?

What is the recommended apns-expiration value for VoIP pushes to survive brief APNs reconnection windows without exceeding a 60-second call validity period?

Full log stream captures available for all incidents.

Answer 1

DTS Engineer OP

Apple

Jun ’26

Part 1:

The device had been idle with the screen locked for approximately 31 minutes. An LTE cell handover caused APSD to begin an APNs reconnection. Powerd entered policy 3 before APSD reached channel-flow viable, defuncting the app.

So, as a general comment, I have to warn you that, in my experience, accurately inferring device-level network activity from system log activity is EXTREMELY difficult. It's generally possible to determine the basic cause of a given push failure ("the device wasn't connected"), but determining more than that is very, very tricky. Messages are often misleading or distracting, and it's very easy to assume that a given message is more relevant/meaningful than it really is. It's also easy to get so focused on micro-analysis that you totally overlook more high-level problems.

Case in point, if your analysis is correct (which I'm not sure it fully is), then the implication here:

              ↑ APNs path re-satisfied. Reconnection begins.
                channel-flow viable NOT yet reached — TLS handshake still in progress.
 
17:48:08.057  apsd  Powerd has requested assertion activity update
              ↑ Warning: Powerd about to change policy.
 
              ── 2 minutes 40 seconds after APNs reconnect started ──

...is that APNs spent 2+ minutes without being able to establish a viable connection on a theoretically functional LTE connection. I don't think there was a working LTE connection.

Within that context, it's important to understand that the "basic" cycle of any push implementation over cellular is going to end up looking something like this:

1. Cellular connectivity breaks down, breaking the APNS connection.
2. The APNS process attempts to reconnect using whatever radio connectivity it's able to maintain, keeping the device awake, as running the full network stack requires.
3. To save power, the APNS process eventually "gives up" when it's unable to establish a connection.
4. The device sleeps.
5. At some point, the device wakes up with a functioning network connection, allowing the APNS process to reconnect and resume connectivity.

The first thing I'd highlight here is that it will be fairly common for #5 to be associated with an incoming cell call. The cellular network is built around delivering calls, which means incoming calls are the primary wake "source" when circumstances mean the baseband has no other connectivity source. More to the point, this is a situation where you're only counting the successes, as the only time you'll see these "cell call wakes" is when the radio situation means that the device is capable of completing a call... since you can't "see" the calls the baseband never "got".

Secondly, the only real "variable" in the steps above is how long the device stays in #2 before transitioning to #3, with the longer wait wasting more power.
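As a way to reason about that cycle, here is a toy state machine. It is purely illustrative: the type names and the `reconnectBudget` parameter are hypothetical, and nothing here reflects how the APNS process is actually implemented. The budget stands in for the one "variable" mentioned above, namely how long step 2 lasts before step 3.

```swift
/// Toy model of the reconnect cycle described above (hypothetical names).
enum PushLinkState: Equatable {
    case connected
    case reconnecting(secondsSpent: Int)   // step 2: device kept awake
    case gaveUp                            // steps 3-4: device allowed to sleep
}

enum PushLinkEvent {
    case connectivityLost                  // step 1
    case tick(seconds: Int)                // time passes without a working network
    case networkRestored                   // step 5: wake with a functioning connection
}

/// Advances the model by one event. `reconnectBudget` is the assumed number of
/// seconds the push process keeps trying before it gives up to save power.
func step(_ state: PushLinkState, _ event: PushLinkEvent, reconnectBudget: Int) -> PushLinkState {
    switch (state, event) {
    case (.connected, .connectivityLost):
        return .reconnecting(secondsSpent: 0)
    case (.reconnecting(let spent), .tick(let seconds)):
        let total = spent + seconds
        return total >= reconnectBudget ? .gaveUp : .reconnecting(secondsSpent: total)
    case (.reconnecting, .networkRestored), (.gaveUp, .networkRestored):
        return .connected
    default:
        return state   // all other combinations leave the state unchanged
    }
}

// Walk through one full cycle with an arbitrary 120-second budget.
let events: [PushLinkEvent] = [.connectivityLost, .tick(seconds: 90), .tick(seconds: 90), .networkRestored]
let finalState = events.reduce(PushLinkState.connected) { step($0, $1, reconnectBudget: 120) }
print(finalState)
```

A larger budget keeps the model in `.reconnecting` longer (more power spent); a smaller one reaches `.gaveUp` sooner, after which only an external wake brings it back to `.connected`.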

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 2

DTS Engineer OP

Apple

Jun ’26

Part 2, Getting to your questions:

Questions

Is there any entitlement or API to prevent NECP from defuncting a process holding an active PKPushRegistry?

No, but that's because PKPushRegistry is totally irrelevant to this entire process. Part of the core design of our VoIP architecture is that your app has NO role whatsoever in the process that actually delivers VoIP notifications to your device. That is, the process used to deliver VoIP notifications to your device works EXACTLY the same if your app is:

Awake in the foreground.
Awake in the background.
Suspended in the background.
Not running at all.

The system doesn't even know if your app is running AT ALL until the very last stage where callservicesd is actually delivering the notification to your app, which means your app can't really have any role in that delivery process. PKPushRegistry is simply the mechanism the system uses to deliver notifications into your app, not part of the delivery process itself.

The VoIP push entitlement exists for exactly this background delivery scenario.

If you're talking about the old "Unrestricted PushKit" entitlement, then no, that's not what it does.

Is pushDisallowed being applied to apps with VoIP push entitlements when InternetSharingActive == 1 intentional? Should VoIP entitlements exempt an app from the Internet Sharing Policy gate in dasd?

I'm not sure what you're referring to here.

Is there a documented way to know when apsd has fully completed APNs reconnection (i.e. channel-flow viable) so a server can time push retries more accurately within a call validity window?

No, nor would it be all that helpful. One of the longstanding issues with PushKit has been its tendency to deliver pushes LONG (minutes, sometimes hours) after their intended expiration. That behavior exists because there is a significant time window where:

The device has lost connectivity and is unreachable.
The push server doesn't "know" that yet (because the connection hasn't timed out), so it queues pushes for delivery.

However, that behavior works "for" incoming calls, not just against them. That is, if you send a push to a device that's experiencing intermittent connectivity and that device reconnects, then it's VERY likely that the push will be delivered no matter what expiration time you choose.

That leads to here:

What is the recommended apns-expiration value for VoIP pushes to survive brief APNs reconnection windows without exceeding a 60-second call validity period?

So, historically my answer would have been to use "apns-expiration=0", since any value greater than 0 increased the likelihood of receiving long-expired pushes.
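For reference, a minimal sketch of how a server might assemble the APNs request headers for a VoIP push with either setting. The function name and its parameters are illustrative only, and authentication (provider token or certificate) is left out entirely.

```swift
import Foundation

/// Builds the APNs request headers for a VoIP push.
/// `bundleID` and `validityWindow` are illustrative parameters, not values from this thread.
/// Pass nil (or 0) for `validityWindow` to get apns-expiration = 0.
func voipPushHeaders(bundleID: String,
                     validityWindow: TimeInterval?,
                     now: Date = Date()) -> [String: String] {
    // apns-expiration is a UNIX epoch timestamp in seconds.
    // A value of 0 asks APNs to attempt delivery once and not store the push.
    let expiration: Int
    if let window = validityWindow, window > 0 {
        expiration = Int(now.timeIntervalSince1970 + window)
    } else {
        expiration = 0
    }
    return [
        "apns-push-type": "voip",
        "apns-topic": "\(bundleID).voip",   // VoIP topic is the bundle ID plus ".voip"
        "apns-priority": "10",
        "apns-expiration": String(expiration),
    ]
}

// Immediate-or-discard, as in the historical recommendation.
let immediate = voipPushHeaders(bundleID: "com.example.app", validityWindow: nil)
print(immediate["apns-expiration"] ?? "")
```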

However, the new delegate we introduced in iOS 26.4 changes that answer. That delegate will inform your app whether or not it's required to report a push and, critically, that determination is based on the push's "raw" delivery latency, NOT the push's own expiration. In practical terms, this means your app will not be required to report any push that didn't reach the device within a "reasonable" amount of time [1]. With the new delegate, you can basically use as large an expiration as you want, reporting new calls for the "live" calls and posting "Call Missed" notifications for the rest.

Having said that, my own intuition is that the main benefit of the new flow is simplicity, NOT increased reliability. Many developers start with the assumption that the critical factor here is the push reaching their app, so their entire focus is on "why didn't I get the push". The problem here is that, ignoring special circumstances [2], the PRIMARY reason pushes fail to reach devices... is that network conditions were extremely poor. Stating that more directly, if the network isn't reliable enough for pushes to work, then it's unlikely that VoIP calling will work.

[1] The exact time isn't documented but it's WELL below the 60s window you mentioned.

[2] Particularly on WiFi networks, there are a variety of network-level issues (for example, broken NAT implementations) which can disrupt push (as well as other things), but that's a totally different issue than this case.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 3

Aadil23 OP

Jun ’26

Thank you — this is very helpful, especially the clarification that PKPushRegistry plays no role in delivery and that the new mustReport delegate is the recommended direction. We will adopt that delegate and post a "Call Missed" notification when delivery latency is too high.
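To make the "latency is too high" check concrete, this is roughly the classification we have in mind. It is only a sketch: `sentAt` is a hypothetical payload key our server would add (UNIX epoch seconds), the 60-second default is our own call validity window, and the comparison is sensitive to clock skew between server and device. It does not replace the system's own decision about whether a push must be reported; it only decides what we show once we are permitted to skip reporting.

```swift
import Foundation

enum IncomingPushDisposition: Equatable {
    case liveCall      // report to CallKit and ring
    case missedCall    // post a "Call Missed" notification instead
}

/// Classifies a push by its delivery latency.
/// "sentAt" is a hypothetical payload key carrying the server's send time
/// as UNIX epoch seconds; it is not part of any Apple-defined payload.
func disposition(for payload: [AnyHashable: Any],
                 receivedAt: Date = Date(),
                 callValidity: TimeInterval = 60) -> IncomingPushDisposition {
    guard let sentAt = payload["sentAt"] as? Double else {
        // No timestamp available: treat as live rather than silently dropping a call.
        return .liveCall
    }
    let latency = receivedAt.timeIntervalSince1970 - sentAt
    return latency <= callValidity ? .liveCall : .missedCall
}
```

The dictionary type matches `PKPushPayload.dictionaryPayload`, so the function can be called directly from the push handler.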

Before we close this out, there is one point we'd like your read on, because it's the main reason we escalated.

This is a recent regression, not a long-standing condition. These same users, on the same devices, same carriers, and the same usage pattern, received VoIP calls reliably until approximately mid-May 2026 (about a month ago). The onset was relatively sudden and is now consistently reproducible. Nothing changed on our side in that window — same app build, same push server, same apns-expiration — yet reliability dropped sharply across multiple users at roughly the same time.

It is predominantly a cellular problem, but it is not exclusive to cellular — we have also reproduced it on Wi-Fi, just far more rarely and with much greater difficulty. Combined with the fact that the failure is cross-app (WhatsApp's own VoIP push was delivered ~3 minutes late on the same device in our tests), our working assumption is that something changed at the platform or network layer recently, rather than our implementation degrading.

Question: Were there changes in a recent OS release (or in APNs reconnection / power-management behavior) around that timeframe that could increase how long apsd stays unable to re-establish the APNs connection before the device sleeps? We're trying to understand whether this is expected new behavior or a regression worth a bug report.

For completeness, confirming our current implementation matches your guidance:

apns-expiration = 0 — every VoIP push is sent with apns-expiration = 0 (immediate-or-discard), specifically to avoid the long-delayed expired-push deliveries you described.
We report the call synchronously inside the push handler. The PKPushRegistry is created once and held strongly for the controller's lifetime (not a local var), and we call reportNewIncomingCall(with:update:completion:) synchronously within pushRegistry(_:didReceiveIncomingPushWith:for:completion:) before invoking the PushKit completion handler:

  import CallKit
  import PushKit

  final class PushController: NSObject, PKPushRegistryDelegate {

      // Held strongly for the controller's lifetime, not a local var.
      private var voipRegistry: PKPushRegistry?

      // Created eagerly at app launch and passed in, so it is never nil when a push arrives.
      private let provider: CXProvider

      init(provider: CXProvider) {
          self.provider = provider
          super.init()
      }

      func enablePushKit() {
          let registry = PKPushRegistry(queue: .main)
          registry.delegate = self
          registry.desiredPushTypes = [.voIP]
          self.voipRegistry = registry            // retained
      }

      // Required by PKPushRegistryDelegate; token upload to the push server is omitted here.
      func pushRegistry(_ registry: PKPushRegistry,
                        didUpdate pushCredentials: PKPushCredentials,
                        for type: PKPushType) {}

      func pushRegistry(_ registry: PKPushRegistry,
                        didReceiveIncomingPushWith payload: PKPushPayload,
                        for type: PKPushType,
                        completion: @escaping () -> Void) {

          let uuid = UUID()
          let update = CXCallUpdate()
          update.remoteHandle = CXHandle(type: .generic,
                                         value: payload.dictionaryPayload["caller"] as? String ?? "")
          update.hasVideo = (payload.dictionaryPayload["callType"] as? String) == "video"

          // Reported synchronously, in the same run-loop turn as the push;
          // the PushKit completion handler is only called afterwards.
          provider.reportNewIncomingCall(with: uuid, update: update) { error in
              completion()
          }
      }
  }

The CXProvider is created eagerly at app launch (before PushKit registration) so it is never nil when a push arrives.

Evidence attached — same device and build:

Cellular (frequent): apsd repeatedly logs Connection closed WWAN with isWWANUsable YES, isWiFiUsable NO (no Wi-Fi to fail over to), and no VoIP push is delivered within the call window.
Cross-app (cellular): WhatsApp's own VoIP push (net.whatsapp.WhatsApp) was delivered ~3 min 22 s and ~3 min 34 s late in two separate tests on the same device.
Wi-Fi (typical): APNs delivers the push and the call is reported in ~77 ms (apsd … Received message for enabled topic … → app launched → reportNewIncomingCallWithUUID). This is the common Wi-Fi behavior; however, we have also observed the same failure on Wi-Fi on rare occasions, which is why we don't believe it is purely a cellular-radio condition.

FAIL_17-48_cellular-idle_all-apps-comera-whatsapp-teams.txt

FAIL_13-43_cellular_whatsapp-late-3m34s.txt

FAIL_13-08_cellular_push-never-delivered.txt

Our open question is specifically why this became common only recently, given nothing changed in our app or push pipeline.

Answer 4

DTS Engineer OP

Apple

Jun ’26

First, a quick comment here:

Combined with the fact that the failure is cross-app.

This kind of "cross-app" failure is exactly what I'd expect no matter what was happening. There's only one connection, and our infrastructure doesn't really differentiate between sources in a meaningful way. Delivery failures that are app-specific are basically "always" some kind of app-level issue like mismanaging the push token or a failure in the app’s backend. They generally are not random/intermittent either.

For completeness, confirming our current implementation matches your guidance:

Yep, that all looks fine. FYI, mismanaging PKPushRegistry basically "always" causes your app to crash or be killed and very little else.

Question: Were there changes in a recent OS release (or in APNs reconnection/power-management behavior) around that timeframe that could increase how long apsd stays unable to re-establish the APNs connection before the device sleeps?

To be honest, I don't know. Bugs are certainly possible, so I can't entirely rule out the possibility. However, I'm not aware of any issues, and push failures broadly similar to what you're describing are extremely common and basically always have been. Having investigated a very large number of them, the VAST majority were caused by network issues external to the device. Even cases that most directly looked like a system-level bug have actually turned out to be network/configuration issues. For example, I spent a great deal of time investigating an issue where push delivery was failing, despite all log data indicating that apsd had a fully functional push connection. Further investigation showed that our servers did NOT have a valid connection, which eventually led to the developer discovering that the customer's misconfigured NAT router was severing the WAN side connection (to our server) while actively maintaining the LAN side connection (to the iOS device). This is why I'm reluctant to assume the failure is happening on the device; my overwhelming experience has been that the device simply isn't the component that tends to fail.

That actually leads to my biggest concern, which is around this point:

Our open question is specifically why this became common only recently, given nothing changed in our app or push pipeline.

The developer experience around VoIP apps centers around four basic components:

Your app
Your server
Our server
The local system (iOS)

Because of that, the natural instinct is to assume that any problem is caused by an issue in one of those 4 components. You've already ruled out your app and your server (as you said, nothing changed), and you have limited visibility into our server, so now you're investigating the "last" component, namely the local system.

However, the problem is that this ignores the single most complicated and unpredictable factor, namely "the world", meaning the larger network/usage environment the device and your app operate in. In an earlier message, I said:

So, as a general comment, I have to warn you that, in my experience, accurately inferring device-level network activity from system log activity is EXTREMELY difficult. It's generally possible to determine the basic cause of a given push failure ("the device wasn't connected"), but determining more than that is very, very tricky.

That actually understates the issue. My own experience is that unless one of these two situations applies:

You're able to reliably reproduce the issue in a highly controlled and well-understood environment.
You have detailed information about the EXACT conditions (location, timing, environment) a problem occurred in AND a very large number of sample failures.

...then it's basically impossible to investigate a push failure in a truly meaningful way. That is, any given failure could be caused by:

A system-level bug.
A random failure in some intermediate network component.
A systemic issue in the local environment/configuration.

All of those failures will present in the same way ("the push didn't arrive") and the device (and our servers) simply don't have any way to differentiate between them.

Moving on to a few specifics:

We're trying to understand whether this is expected new behavior or a regression worth a bug report.

The answer to "should I file a bug" is basically always "yes". In cases like this, that means filing a bug that, at a minimum, includes ALL of this data:

A sysdiagnose from each device that is experiencing the failure. Note that while the log can be triggered "a while" after the immediate failure, the device must NOT have been rebooted since the failure occurred. If it's been rebooted, then I wouldn't even bother reporting it, since the log data loss renders the log largely useless.
A detailed log describing the full timeline of every failure you're reporting, including when you sent the push, when you "expected" it to arrive, and when it actually arrived.
Any other information that might be relevant.
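One way to keep that timeline consistent is to record each push in a fixed shape on the server or test harness. The sketch below is only a suggestion; the field names and the output format are made up, and the only requirement stated above is that sent, expected, and actual arrival times are captured.

```swift
import Foundation

/// One row of the failure timeline that accompanies a sysdiagnose.
/// Field names are illustrative, not a required format.
struct PushTimelineEntry {
    let pushID: String
    let sentAt: Date
    let expectedBy: Date
    let arrivedAt: Date?      // nil when the push never arrived

    /// Seconds between sending and arrival, or nil if the push never arrived.
    var latency: TimeInterval? {
        arrivedAt.map { $0.timeIntervalSince(sentAt) }
    }

    /// Renders the entry as a single log line with ISO 8601 (UTC) timestamps.
    func line(using formatter: ISO8601DateFormatter) -> String {
        let arrival = arrivedAt.map { formatter.string(from: $0) } ?? "never"
        let delay = latency.map { String(format: "%.1f s", $0) } ?? "n/a"
        return "\(pushID) sent=\(formatter.string(from: sentAt)) "
            + "expected=\(formatter.string(from: expectedBy)) "
            + "arrived=\(arrival) latency=\(delay)"
    }
}

// Placeholder times only, to show the output shape; not real measurements.
let formatter = ISO8601DateFormatter()
let placeholderSent = Date(timeIntervalSince1970: 0)
let example = PushTimelineEntry(pushID: "example-1",
                                sentAt: placeholderSent,
                                expectedBy: placeholderSent.addingTimeInterval(5),
                                arrivedAt: nil)
print(example.line(using: formatter))
```

Writing timestamps in UTC makes it easier to line the entries up against the sysdiagnose, whatever time zone the device was in.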

Note that the log file is JUST as critical as the sysdiagnose. Speaking for myself, investigating a sysdiagnose without ANY time data is such an enormous time sink that I just can't justify doing it anymore. That leads to here:

Evidence attached — same device and build:

If you want me to look into the log side of this, please file a bug with the data above and then post the bug number back here.

Finally, a last comment here:

Our open question is specifically why this became common only recently, given nothing changed in our app or push pipeline.

I may be wrong about this, but it sounds like this is happening to some particular subset of your users. This is pretty common, and the first question I'd look closely at here is "what makes those users different".

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 5

Aadil23 OP

Jun ’26

Based on our analysis of the TCP traffic between the iPhone and APNs when the device is locked, we found that the TCP connection can remain idle for up to 10 minutes. We also discovered that the carrier CGNAT employs a 5-minute idle timeout on TCP connections, so the iPhone can be in a blackout for up to 5 minutes, as seen in the packet capture screenshot below.

Screenshot 2026-06-17 at 5.09.16 PM.png

The last packet received from the APNs server was at 13:41:37.
The iPhone sent a packet at 13:51:12 and kept retransmitting until 13:51:59.
A new TCP connection was established at 13:52:12, one minute after the first unanswered packet.
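The arithmetic behind the "up to 5 minutes" figure, plus the intervals from the capture, can be sketched as follows. The model is a simplification on our side: it assumes the NAT mapping is dropped exactly at its idle timeout and that the device only notices at its next outgoing packet. The function names are ours.

```swift
import Foundation

/// Worst-case time during which the device still believes its connection is alive
/// after a middlebox has silently dropped the mapping (simplified model).
func worstCaseBlackout(idleGap: Int, natIdleTimeout: Int) -> Int {
    max(0, idleGap - natIdleTimeout)
}

/// Converts "HH:mm:ss" into seconds since midnight; nil on malformed input.
func seconds(_ hms: String) -> Int? {
    let parts = hms.split(separator: ":").compactMap { Int($0) }
    guard parts.count == 3 else { return nil }
    return parts[0] * 3600 + parts[1] * 60 + parts[2]
}

// 10-minute idle connection behind a 5-minute CGNAT idle timeout.
let blackout = worstCaseBlackout(idleGap: 10 * 60, natIdleTimeout: 5 * 60)

// Timestamps from the packet capture above.
let lastServerPacket = seconds("13:41:37")!
let firstDevicePacket = seconds("13:51:12")!
let newConnection = seconds("13:52:12")!

let observedIdle = firstDevicePacket - lastServerPacket    // idle time before the device sent anything
let observedOutage = newConnection - lastServerPacket      // last server packet to new connection

print("worst-case blackout: \(blackout) s")
print("observed idle: \(observedIdle) s, observed outage: \(observedOutage) s")
```

In this capture the connection sat idle for 9 minutes 35 seconds, and 10 minutes 35 seconds passed between the last packet from APNs and the new connection.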

It would be helpful if you could confirm the maximum timeout/keep-alive interval for the APNs TCP connection, as a reference for the telecom provider.

Answer 6

DTS Engineer OP

Apple

Jun ’26

Based on our analysis of the TCP traffic between the iPhone and APNs when the device is locked, we found that the TCP connection can remain idle for up to 10 minutes. We also discovered that the carrier CGNAT employs a 5-minute idle timeout on TCP connections.

I'm not in a position to verify the details of what you're describing, but the basic answer here is that APNs is a critical component of our ecosystem which every carrier is expected to properly support. By extension, if a carrier's network is configured such that it can create minute-long APNs blackouts, then I'd consider that a critical defect in the carrier’s network that they need to immediately address.

Please file a bug on this with as much detail as you can provide, particularly which carrier you're seeing this on and the specifics of where you're testing. I'd also strongly recommend that you contact the carrier as well through whatever business support channel you have available. There's no reason to discuss these specifics in public, but I do think this is a critical issue that needs to be addressed.

It would be helpful if you can confirm the max timeout/keep-alive for APNs TCP connection as a reference for Telecom providers.

I'm not at all involved in this process, but telecom providers have their own support resources within Apple that specifically exist to ensure they're properly configured to support our ecosystem.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 7

Aadil23 OP

Jun ’26

I have filed a bug report with incident number: FB23374513