This advice is for Linux-based servers.
Run modern kernels. Linux kernels since 5.0 use an Earliest Departure Time (EDT) model for scheduling TCP packets, which helps.
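A quick way to check whether a host meets the 5.0 threshold mentioned above is to parse the running kernel's release string. This is a minimal Python sketch. It assumes the usual "major.minor..." format that platform.release() reports on Linux, and the helper names are illustrative rather than taken from any library.

```python
from __future__ import annotations

import platform
import re

# 5.0 is the threshold named in the advice; counting 5.0 itself as "modern"
# is an assumption of this sketch.
MIN_KERNEL = (5, 0)


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Return (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_modern(release: str | None = None) -> bool:
    """True if `release` (default: the running kernel) is at least MIN_KERNEL."""
    if release is None:
        release = platform.release()
    return parse_kernel_version(release) >= MIN_KERNEL
```

For example, kernel_is_modern("4.19.0-25-amd64") is False, while kernel_is_modern() with no argument checks the host it runs on.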
Make sure BQL (Byte Queue Limits) is working for your Ethernet device.
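On Linux, BQL state is exposed per transmit queue under /sys/class/net/<iface>/queues/tx-N/byte_queue_limits/. The Python sketch below reads those counters. Deciding whether BQL is actually active relies on a heuristic that is our assumption, not an official test: if every queue's limit is still 0 after the interface has carried traffic, the driver is probably not reporting completions to BQL. The interface and helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_NET = Path("/sys/class/net")
# Counters (in bytes) this sketch reads from each byte_queue_limits/ directory.
BQL_FIELDS = ("limit", "limit_max", "limit_min", "inflight")


def bql_status(iface: str, sysfs_root: Path = SYSFS_NET) -> dict[str, dict[str, int]]:
    """Map each TX queue of `iface` (tx-0, tx-1, ...) to its BQL counters."""
    status: dict[str, dict[str, int]] = {}
    for txq in sorted((sysfs_root / iface / "queues").glob("tx-*")):
        bql_dir = txq / "byte_queue_limits"
        if not bql_dir.is_dir():
            continue  # no BQL directory, e.g. a kernel built without BQL
        status[txq.name] = {
            name: int((bql_dir / name).read_text().strip())
            for name in BQL_FIELDS
            if (bql_dir / name).is_file()
        }
    return status


def bql_looks_active(status: dict[str, dict[str, int]]) -> bool:
    """Heuristic (assumption): a nonzero limit on any queue suggests the driver
    feeds BQL; all zeros after real traffic suggests it does not."""
    return any(counters.get("limit", 0) > 0 for counters in status.values())
```

Run bql_status("eth0") on the server after it has carried some load, then pass the result to bql_looks_active.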
Kill pfifo_fast as your underlying qdisc everywhere. If your server workload is primarily TCP, switch the qdisc to sch_fq, which not only multiplexes up to millions of flows but also applies packet pacing, which is a huge win. However, if your server (including whatever host underlies your virtual machine) acts as a "router" for VPNs, QUIC over UDP, or tunnels of all sorts, fq_codel is presently the better choice (and nowadays often the default). Monitor sch_fq with eBPF, or watch the fq_codel statistics. If you consistently see drops or marks, the server itself is queuing too much internally, and it's time to get a server with more CPU and network bandwidth.
MEASURE. Take packet captures of your services from various vantage points, especially over slow networks. Instrument your server-side applications to monitor TCP_INFO statistics, and report on loss, marking, retransmits, and (especially) RTT; these tell you how you are doing along the network path. ALSO instrument the proxies you are using: it's not unheard of to end up with megabytes of buffering between the server application and the proxy. Set the TCP_NOTSENT_LOWAT socket option where you can, too.
Evaluate TCP BBRv1. For some workloads, notably long-running, sender-side-limited, YouTube-like traffic, it's the best thing out there; the same goes for single-flow streaming uploads or downloads. For sharded web traffic, it tends to over-compete with itself. YMMV. See item 3.
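To evaluate BBR without changing the whole host, Linux lets you choose the congestion control per socket with TCP_CONGESTION (the host-wide default is the net.ipv4.tcp_congestion_control sysctl). In mainline kernels the module named bbr implements BBRv1. The Python sketch below assumes the tcp_bbr module is loaded, so that it appears in /proc/sys/net/ipv4/tcp_available_congestion_control. It also assumes that, for unprivileged processes, bbr is listed in net.ipv4.tcp_allowed_congestion_control. The helper names are hypothetical.

```python
from __future__ import annotations

import socket
from pathlib import Path

AVAILABLE_CC = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")
TCP_CA_NAME_MAX = 16  # size of the kernel's congestion-control name buffer


def available_congestion_controls(text: str | None = None) -> set[str]:
    """Congestion control algorithms the running kernel can use right now."""
    if text is None:
        text = AVAILABLE_CC.read_text()
    return set(text.split())


def use_congestion_control(sock: socket.socket, name: str = "bbr") -> str:
    """Linux only: switch `sock` to `name` and return what the kernel reports."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, TCP_CA_NAME_MAX)
    return raw.split(b"\0", 1)[0].decode()  # name is NUL-padded
```

A server could call use_congestion_control(conn) on a fraction of accepted connections and compare their TCP_INFO RTT and retransmit figures against the rest. This is one way to find out whether your workload is one of the ones BBRv1 suits.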
Many cloud services today make no guarantees about the actual performance of the underlying network. If you have a latency-sensitive service (like videoconferencing), dedicated-CPU instances or bare metal are better. You can also easily use sch_cake to artificially limit your outgoing bandwidth to below what the cloud provider can reliably provide.
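Here is a Python sketch of that shaping step. It builds, but does not run, a tc command that puts sch_cake at the root of an interface. The bandwidth is set a fixed fraction below the rate you have measured the provider reliably delivering. The 10% headroom, the interface name, and the helper name are assumptions for illustration. Shaping slightly below the real bottleneck means the queue builds in cake on your own host, where it is managed, rather than somewhere in the provider's network.

```python
from __future__ import annotations


def cake_command(iface: str, reliable_mbit: float, headroom: float = 0.10) -> list[str]:
    """Return a tc argv that shapes egress on `iface` with sch_cake.

    reliable_mbit: egress rate (Mbit/s) you have measured the provider sustaining.
    headroom: fraction to stay below that rate (assumed default: 10%).
    """
    if reliable_mbit <= 0:
        raise ValueError("reliable_mbit must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    rate_kbit = round(reliable_mbit * 1000 * (1 - headroom))
    return ["tc", "qdisc", "replace", "dev", iface, "root",
            "cake", "bandwidth", f"{rate_kbit}kbit"]
```

Running subprocess.run(cake_command("eth0", 1000), check=True) as root would apply "cake bandwidth 900000kbit" to eth0.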
MEASURE MORE. Good tools for generating test workloads are arriving: networkQuality should be out soon for macOS, and flent and irtt are also common.
Check out the mailing lists at lists.bufferbloat.net.