@DTS Engineer Hi Kevin!
I am looking to test the sample project and will get back to you on that.
Ruling an issue out, how frequently is the server emitting its keep-alive? One slightly less obvious detail here is that in a naive implementation where one device emits a keep-alive "every 5s" and the other requires a keep-alive "every 5s", you can create a situation where what you're actually depending on/measuring is the consistency of latency between the devices, NOT the reliability of the connection itself.
The server emits a keep-alive every 1s. Every time the app extension receives an event or keep-alive, it stores the current system time as the time of last communication. Then every 12s (sorry, I forgot to mention we changed this from 5s to 12s), we check "Has the extension received any communication within the previous 12s?" So I do not believe we would run into the scenario you are describing.
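To make the timing concrete, here is a minimal sketch of that check, not our actual extension code. `KeepAliveWatchdog`, `recordActivity(at:)`, and `isStale(at:)` are hypothetical names, and timestamps are passed in as plain seconds so the logic can be exercised without timers.

```swift
import Foundation

/// Hypothetical helper (not our production code): remembers when data last
/// arrived and reports whether the connection has been silent for too long.
struct KeepAliveWatchdog {
    /// Maximum silence tolerated before the connection is considered dead.
    let timeout: TimeInterval
    /// Time of the most recent event or keep-alive, in seconds.
    private(set) var lastActivity: TimeInterval

    init(timeout: TimeInterval, startedAt: TimeInterval) {
        self.timeout = timeout
        self.lastActivity = startedAt
    }

    /// Call whenever any event or keep-alive is received.
    mutating func recordActivity(at now: TimeInterval) {
        lastActivity = now
    }

    /// Call from the periodic check (every 12 s in our case).
    func isStale(at now: TimeInterval) -> Bool {
        return now - lastActivity > timeout
    }
}

// Our setup: server sends every 1 s, window is 12 s. A keep-alive that
// arrives several seconds late still lands well inside the window.
var relaxed = KeepAliveWatchdog(timeout: 12, startedAt: 0)
relaxed.recordActivity(at: 7.5)
let relaxedStale = relaxed.isStale(at: 12)

// The naive 5 s / 5 s setup: the keep-alive sent at 10 s is slightly
// delayed, so a check at 10.1 s already sees more than 5 s of silence.
var naive = KeepAliveWatchdog(timeout: 5, startedAt: 0)
naive.recordActivity(at: 5.0)
let naiveStale = naive.isStale(at: 10.1)
```

With a 1s send interval and a 12s window, roughly eleven consecutive keep-alives have to go missing before the check trips, which is why we do not think latency jitter alone explains what we see.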
What component actually sent that push? More specifically:
Is your push provider extension still running, as it's the component that sent the push?
Did some other component/mechanism send that push based on detecting that the extension was no longer running?
The push notifications are sent by our app extension, which is a class conforming to NEAppPushProvider. With the problem we are seeing, the extension is alive the entire time, as we also log when it stops for any reason.
The problem is that the extension, which holds a TCP connection to the server, stops receiving the events/keep-alives that are sent every 1s. Once it has gone more than 12s with no data, it terminates the connection and reconnects. The extension is still alive this entire time.
The strange part is that on my busy home network, behind a TP-Link router, this logic never fails. On an identical router with only the server and the phone connected, it starts failing after the phone has been locked for 1 min.
The article "Maintaining a Reliable Network Connection" explicitly states that you should be using the Network framework:
Yes, we are using the Network framework. Our implementation uses Swift-NIO Transport Services (NIOTSConnectionBootstrap), which is Apple's official bridge that provides Swift-NIO's API while using the Network framework underneath for the actual networking operations.
Specifically:
We use NIOTSConnectionBootstrap which leverages the Network framework
We directly configure Network framework parameters like NWParameters, NWProtocolTCP.Options, and NWProtocolTLS.Options
We use NWEndpoint.url() for endpoint creation
From what we understand, Swift-NIO Transport Services is Apple's recommended approach when you need Network framework's reliability and system integration but also require the advanced features that Swift-NIO provides (like our SSE parsing capabilities). So while we're using Swift-NIO's API for convenience, the Network framework is handling all the underlying transport operations as recommended.
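For reference, a minimal sketch of the kind of Network framework configuration listed above. The option values and the URL are illustrative assumptions, not our shipping settings. In our setup the options are handed to NIOTSConnectionBootstrap; that wiring is omitted here, and a plain NWConnection is used so the sketch has no third-party dependencies.

```swift
import Foundation
import Network

// Illustrative values only; these are assumptions, not our real settings.
let tcpOptions = NWProtocolTCP.Options()
tcpOptions.noDelay = true            // disable Nagle's algorithm
tcpOptions.connectionTimeout = 10    // seconds allowed for the TCP handshake

// Default TLS configuration.
let tlsOptions = NWProtocolTLS.Options()

// Combine the TLS and TCP settings into one parameter set.
let parameters = NWParameters(tls: tlsOptions, tcp: tcpOptions)

// Hypothetical server URL, wrapped in the url endpoint case.
let endpoint = NWEndpoint.url(URL(string: "https://push.example.com/events")!)

// Created but not started; start(queue:) would begin the connection attempt.
let connection = NWConnection(to: endpoint, using: parameters)
```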
And what happened? What did both sides of the connection see? And have you tried monitoring traffic on both the WiFi and Ethernet links?
When capturing on the server's Ethernet link, we can reliably see the keep-alives being sent every 1s. During the monitor mode capture, we know the keep-alives were being delivered because the app is parsing them, counting them, and logging the count every 60s. However, we were unable to find the corresponding packets in the capture. We assumed this was due to the size and/or encryption of the connection.
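The counting we log every 60s works roughly like the sketch below. `KeepAliveCounter` and its members are hypothetical names, and the log format is made up for illustration; the point is only that each window's count can be compared against the 60 keep-alives the server should have sent.

```swift
import Foundation

/// Hypothetical per-window counter, not our actual logging code.
struct KeepAliveCounter {
    /// Length of one logging window (60 s in our case).
    let windowLength: TimeInterval
    /// How often the server sends a keep-alive (1 s in our case).
    let sendInterval: TimeInterval
    /// Keep-alives received so far in the current window.
    private(set) var count: Int = 0

    init(windowLength: TimeInterval, sendInterval: TimeInterval) {
        self.windowLength = windowLength
        self.sendInterval = sendInterval
    }

    /// Number of keep-alives the server should have sent in one window.
    var expectedCount: Int {
        return Int(windowLength / sendInterval)
    }

    /// Call for every keep-alive the extension parses.
    mutating func recordKeepAlive() {
        count += 1
    }

    /// Returns the log line for the finished window and resets the count.
    mutating func finishWindow() -> String {
        let line = "keep-alives received: \(count)/\(expectedCount)"
        count = 0
        return line
    }
}

// Example: three keep-alives went missing during one 60 s window.
var counter = KeepAliveCounter(windowLength: 60, sendInterval: 1)
for _ in 0..<57 {
    counter.recordKeepAlive()
}
let summary = counter.finishWindow()
```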
While I will test the sample project, I think it is still relevant to understand:
What does the phone's Wi-Fi radio do when the device is locked? We ask because we only see this problem when the device is locked and not connected to power.
You previously mentioned that this framework tends to only work properly on "high quality networks." Could you elaborate on what makes a high quality network?