We are experiencing an issue with Safari in all versions from 18.0 to 18.5 that does not occur in version 17. It affects both iPhones and Macs. And does not happen in Chrome or Windows.
The problem is impacting our customers, and our monitoring tools show a dramatic increase in error volume as more users buy/upgrade to iOS 18.
The issue relates to network connectivity that is lost randomly. I can reliably reproduce the issue online in production, as well as on my local development environment.
For example our website backoffice has a ping, that has a frequency of X seconds, or when user is doing actions like add to a cart increasing the quantity that requires backend validation with some specific frequency the issue is noticable...
To test this I ran a JS code to simulate a ping with a timer that calls a local-dev API (a probe that waits 2s to simulate "work") and delay the next HTTP requests with a dynamic value to simulate network conditions:
Note: To even make the issue more clear, I'm using GET with application/json payload to make the request not simple, and require a Pre-flight request, which doubles the issue.
(async () => {
for (let i = 0; i < 30; i++) {
try {
console.log(`Request start ${i} ${new Date().toLocaleString()}`);
const res = await fetch(`https://api.redated.com:8090/1/*****/probe?`, {
method: 'GET',
mode: "cors",
//headers: {'Content-Type': 'text/plain'},
headers: { 'Content-Type': 'application/json' },
});
console.log(`Request end ${i} ${new Date().toLocaleString()} status:`, res.status);
} catch (err) {
console.error(`Request ${i} ${new Date().toLocaleString()} error:`, err);
}
let delta = Math.floor(Math.random() * 10);
console.log("wait delta",delta);
await new Promise(r => setTimeout(r, 1000 - delta));
}
})();
For simplicity lets see a case where it fails 1 time only out of 10 requests.
(Adjusting the "delta" var on the time interval create more or less errors...)
This are the results:
The network connection was lost error, which is false, since this is on my localhost machine, but this happens many times and is very reproducible in local and production online.
The dev-tools and network tab shows empty for status error, ip, connection_id etc.. its like the request is being terminated very soon.
Later I did a detailed debugging with safari and wireshark to really nail down the network flow of the problem:
I will explain what this means:
Frame 10824 – 18:52:03.939197: new connection initiated (SYN, ACK, ECE).
Frame 10831 – 18:52:04.061531: Client sends payload (preflight request) to the server.
Frame 10959 – 18:52:09.207686: Server responds with data to (preflight response) to the client.
Frame 10960 – 18:52:09.207856: Client acknowledges (ACK) receipt of the preflight response.
Frame 10961 – 18:52:09.212188: Client sends the actual request payload after preflight OK and then server replies with ACK.
Frame 11092 – 18:52:14.332951: Server sends the final payload (main request response) to the client.
Frame 11093 – 18:52:14.333093: captures the client acknowledging the final server response, which marks the successful completion of the main request.
Frame 11146 – 18:52:15.348433: [IMPORTANT] the client attempts to send another new request just one second later, which is extremely close to the keep-alive timeout of 1 second. The last message from the server was at 18:52:14.332951, meaning the connection’s keep-alive timeout is predicted to end around 18:52:15.332951 but it does not. The new request is sent at 18:52:15.348433, just microseconds after the predicted timeout. The request leaves before the client browser knows the connection is closed, but by the time it arrives at the server, the connection is already dead.
Frame 11147 – 18:52:15.356910: Shows the server finally sending the FIN,ACK to indicate the connection is closed. This happens slightly later than the predicted time, at microsecond 356910 compared to the expected 332951. The FIN,ACK corresponds to sequence 1193 from the ACK of the last data packet in frame 11093.
Conclusions:
The root cause is related to network handling issues, when the server runs in a setting of keep-alive behavior and keep-alive timeout (in this case 1s) and network timming issue with Safari reusing a closed connection without retrying. In this situation the browser should retry the request, which is what other browsers do and what Safari did before version 18, since it did not suffer from this issue.
This behaviour must differ from previous Safari versions (however i read all the public change logs and could not related the regression change).
Also is more pronounced with HTTP/1.1 connections due to how the keep-alive is handled.
When the server is configured with a short keep-alive timeout of 1 second, and requests are sent at roughly one-second intervals, such as API pings at fixed intervals or user actions like incrementing a cart quantity that trigger backend calls where the probability of failure is high.
This effect is even more apparent when the request uses a preflight with POST because it doubles the chance, although GET requests are also affected.
This was a just a test case, but in real production our monitoring tools started to detect a big increment with this network error at scale, many requests per day... which is very disrupting, because user actions are randomly being dropped when the user actions and timming happens to be just near a previous connection, where keep alive timeout kicks-in, but because the browser is not yet notified it re-uses the same connection, but by the time it arrived the server is a dead connection. The safari just does nothing about it, does not even retry, be it a pre-flight or not, it just gives this error.
Other browsers don't have this issue.
Thanks!