Vectorized API for UDP and Packet Tunnel network extension.

A performance bottleneck we often hit is that we are limited to issuing a single system call per packet. On platforms where vectored I/O is supported, we can unlock 5x performance gains. Although we can read arrays of packets via the Network Extension API, the memory and concurrency model of that API is not well documented, and I am not aware of any way to do vectored I/O on a UDP socket. Will we see an FFI-friendly API for vectorized networking anytime soon?

As an addendum: we are aware of sendmsg_x and recvmsg_x, but we dare not ship an iOS app that calls those functions directly.

Answered by Engineer in 891742022

Thanks for your question! You're absolutely right that being able to amortize the cost of UDP sends and receives is important for reducing CPU cost.

The recommended approach to solve this is to use Network.framework, which uses a userspace network stack and memory-maps packets to the kernel. This allows UDP processing to be much more efficient.

Specifically, you'll want to create a UDP connection (NWConnection or nw_connection_t), and when you send and receive, use the "batch" functionality. See https://developer.apple.com/documentation/network/nwconnection/batch(_:). This works in both C and Swift.
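
As a rough illustration of the batch approach described above, here is a minimal Swift sketch. The helper names (sendBatch, receiveBatch, startExample), the endpoint 192.0.2.1:9000, and the receive count of 8 are assumptions made for this example and are not part of the framework. It only shows the shape of the calls: several sends or receives are issued inside a single batch block on a UDP NWConnection.

```swift
import Foundation
import Network

// Hypothetical helper: queue several datagrams inside one batch block so the
// framework can process the sends together instead of one at a time.
func sendBatch(_ datagrams: [Data], over connection: NWConnection) {
    connection.batch {
        for datagram in datagrams {
            connection.send(content: datagram, completion: .contentProcessed { error in
                if let error = error {
                    print("send failed: \(error)")
                }
            })
        }
    }
}

// Hypothetical helper: post several receives inside one batch block.
// Each completion delivers one datagram in a framework-allocated Data value.
func receiveBatch(count: Int,
                  from connection: NWConnection,
                  handler: @escaping (Data) -> Void) {
    connection.batch {
        for _ in 0..<count {
            connection.receiveMessage { content, _, _, error in
                if let error = error {
                    print("receive failed: \(error)")
                    return
                }
                if let content = content {
                    handler(content)
                }
            }
        }
    }
}

// Example wiring. The host and port are placeholders (192.0.2.1 is a
// documentation-only address), not values from this thread.
func startExample() -> NWConnection {
    let connection = NWConnection(host: "192.0.2.1", port: 9000, using: .udp)
    connection.stateUpdateHandler = { [weak connection] state in
        guard case .ready = state, let connection = connection else { return }
        sendBatch([Data([0x01]), Data([0x02])], over: connection)
        receiveBatch(count: 8, from: connection) { datagram in
            print("received \(datagram.count) bytes")
        }
    }
    connection.start(queue: .global())
    return connection
}
```

The caller needs to keep the returned connection alive for as long as it is in use. In C, the corresponding entry point is nw_connection_batch, which takes the connection and a block in the same way.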

Sorry for just piling on more questions, but is there a C analogue for reading from and writing to a virtual tunnel device?

And is there a way to use this API where the caller is responsible for allocating the buffers? We'd very much like to reuse buffers that are passed to the completion handler of a receive call.
