Network latency is critical for almost every kind of network and for the applications running on it. One of the key challenges is finding the source of latency issues in an active network.
A team of researchers has found a solution, but it will require a new generation of networking hardware.
Latency and dropped packets can badly degrade a network’s performance, and it doesn’t end there: they can cripple applications like real-time communications, scientific computing, and high-frequency trading.
What is the Challenge?
The biggest challenge for network engineers is that these problems can be extremely difficult to diagnose. They may not appear under test conditions, and real-time monitoring of performance may require dedicated hardware or procedures that actually cut into the usable bandwidth.
A team of academic researchers has come up with what they think is a solution, one that could sample the transmission of a collection of representative packets in real time, in a manner that’s inexpensive in terms of both hardware and networking resources.
The researchers were supported by the National Science Foundation and Cisco. They presented their work at the SIGCOMM meeting on Thursday, and they have made a paper describing it in detail available. The paper describes how the system, which they term a Lossy Difference Aggregator, would operate in principle, describes some simulations of its performance, and suggests how it might be implemented. Unfortunately, it appears that it would require an extension to an IEEE standard that’s only been adopted recently, as well as dedicated processing hardware.
Ignoring implementation details, real-time monitoring is simple: mark each network packet with a timestamp when it leaves a hardware interface, and then compare that to the time at which the packet is received, or to the time at which a response to it is received. The latter measures round-trip time (RTT). The former, a one-way measurement, is the better approach, but it comes with a challenge: communicating these timestamps between the two pieces of hardware. Each timestamp has to be matched with a specific packet, which can be computationally intensive, and the two pieces of hardware have to transfer the data in order to make time comparisons. It’s possible to cut down on the work by choosing a representative sample of packets for a given time period, but coordinating the choice of packets across hardware can be a challenge.
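To make the naive approach concrete, here is a minimal Python sketch of per-packet timestamp matching. It is only an illustration: the packet identifiers, the timestamps, and the one_way_delays helper are hypothetical and do not come from the paper, and the sketch assumes the two devices already share a synchronized clock.

```python
def one_way_delays(sent, received):
    """Match timestamps by packet identifier and return per-packet delays.

    Both arguments map a packet identifier to a timestamp in seconds.
    Packets missing from `received` are treated as lost and skipped.
    """
    delays = {}
    for packet_id, t_sent in sent.items():
        t_received = received.get(packet_id)
        if t_received is not None:
            delays[packet_id] = t_received - t_sent
    return delays


# Hypothetical timestamps recorded at the sending and receiving interfaces.
sent = {"p1": 0.000, "p2": 0.010, "p3": 0.020}
received = {"p1": 0.004, "p3": 0.027}  # p2 never arrived

delays = one_way_delays(sent, received)
lost = sorted(set(sent) - set(received))

print(delays)
print(lost)
```

Every packet needs its own entry on both sides, and one side has to ship its whole table to the other. That is the cost a sampling scheme tries to avoid.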
The Lossy Difference Aggregator tries to handle this scenario well. The “lossy” part of its name refers to its way of selecting a representative subset of packets to track. As each packet comes into the router, it’s assigned a hash value. That value is then used to assign it a position in a data structure that has an arbitrary number of columns, termed “banks,” and slots within each column. Each entry contains the packet’s hash value and a timestamp.
An example makes this easier to follow. A structure limited to 1024 entries could contain a single bank with 1024 entries, or four banks with 256 entries each. The hash value is used to place the packet in a specific location in the structure. So, in the authors’ example, a hash with three leading zeros might assign a packet to bank 1, while seven leading zeros would place it in bank 2. A separate function can be used to assign it a row within the bank. Any packet that doesn’t find a place in this structure is discarded and is not considered in the calculations.
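The placement step described above can be sketched roughly as follows. This is an illustrative Python sketch and not the authors’ implementation. It rests on several assumptions: SHA-256 stands in for the packet hash, the thresholds beyond three and seven leading zeros are invented, a packet goes to the highest bank whose threshold its hash meets, and a packet that lands on an occupied slot is dropped from the sample.

```python
import hashlib

ROWS_PER_BANK = 256
# Leading-zero thresholds per bank. Only 3 and 7 come from the article's
# example; 11 and 15 are hypothetical values added to fill four banks.
BANK_THRESHOLDS = [3, 7, 11, 15]


def place(packet: bytes):
    """Return (bank, row, packet_hash) for a packet, or None if not sampled."""
    digest = hashlib.sha256(packet).digest()
    value = int.from_bytes(digest[:8], "big")
    leading_zeros = 64 - value.bit_length()
    bank = None
    for index, threshold in enumerate(BANK_THRESHOLDS):
        if leading_zeros >= threshold:
            bank = index  # index 0 corresponds to "bank 1" in the text
    if bank is None:
        return None  # too few leading zeros: packet is not tracked
    # A separate part of the digest picks the row within the bank.
    row = int.from_bytes(digest[8:12], "big") % ROWS_PER_BANK
    return bank, row, digest[:8].hex()


def new_table():
    """Create an empty structure: four banks of 256 slots each."""
    return [[None] * ROWS_PER_BANK for _ in BANK_THRESHOLDS]


def record(table, packet: bytes, timestamp: float) -> bool:
    """Store (hash, timestamp) if the packet finds a free slot."""
    slot = place(packet)
    if slot is None:
        return False
    bank, row, packet_hash = slot
    if table[bank][row] is not None:
        return False  # slot already taken: packet is discarded
    table[bank][row] = (packet_hash, timestamp)
    return True


table = new_table()
stored = sum(record(table, f"packet-{i}".encode(), i * 0.001) for i in range(5000))
print(stored, "of 5000 packets sampled")
```

Because placement depends only on the packet’s contents, a receiver running the same functions would pick the same packets and the same slots without any coordination.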
After a set sampling time, the sending hardware transmits this structure to the equipment that should be receiving the packets, which has been building a similar structure out of the same packets. At this point, extracting the actual performance data should be simple: lost packets can be identified as unfilled slots, and the timestamps can be used to calculate various latency figures. Because it’s so simple, the authors calculate that implementing it would require adding only an additional one percent to the transistor count of even the low-end ASICs currently in use. The data structure itself would require only 72 Kbits of control traffic a second.
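The comparison step might look something like the following Python sketch, which follows the description given here and is not taken from the paper. The slot keys, hashes, and timestamps are made up, and treating a mismatched hash as a loss is an assumption of this sketch.

```python
from statistics import mean, pstdev


def compare(sender, receiver):
    """Compare two slot tables and return (delays, lost_count).

    Each table maps a (bank, row) slot to a (packet_hash, timestamp) pair.
    A slot filled at the sender but empty at the receiver, or holding a
    different hash there, is counted as a lost packet (an assumption).
    """
    delays = []
    lost = 0
    for slot, (packet_hash, t_sent) in sender.items():
        entry = receiver.get(slot)
        if entry is None or entry[0] != packet_hash:
            lost += 1
        else:
            delays.append(entry[1] - t_sent)
    return delays, lost


# Hypothetical structures after one sampling interval.
sender = {
    (0, 5): ("a1", 1.000),
    (0, 9): ("b2", 1.010),
    (1, 3): ("c3", 1.020),
    (2, 7): ("d4", 1.030),
}
receiver = {
    (0, 5): ("a1", 1.004),
    (1, 3): ("c3", 1.026),
    (2, 7): ("d4", 1.035),
}

delays, lost = compare(sender, receiver)
loss_rate = lost / len(sender)
print("mean latency:", mean(delays))
print("standard deviation:", pstdev(delays))
print("loss rate:", loss_rate)
```

With these made-up numbers, one of the four sampled packets is missing at the receiver, and the remaining three give the latency estimate.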
Mathematically, the authors demonstrate that the system would provide a statistically accurate measure of both the latency and its standard deviation. They also created a simulator, which they used to demonstrate its accuracy. Even under really bad conditions, like a 20 percent packet loss rate, its estimates of latency would be accurate to within 4 percent. If you’re losing 20 percent of your packets, latency’s probably the least of your concerns.
Comparisons with a method of actively monitoring network performance showed that the Lossy Difference Aggregator provided more accurate latency measures.
Of course, network hardware will need to recognize this traffic as distinct from the packets it’s supposed to be routing. The authors suggest adding an extension to the IEEE 1588 standard, which is used for synchronizing the clocks of network equipment. Since accurate comparisons of time stamps require clock synchronization anyway, this seems like a reasonable suggestion.
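To see why synchronization matters, here is a small Python sketch of the standard offset and delay calculation used in an IEEE 1588 exchange. It illustrates the existing standard, not the proposed extension. The timestamps are invented, and the calculation assumes the path delay is the same in both directions.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way path delay from four timestamps.

    t1: master sends Sync (master clock)
    t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)
    t4: master receives Delay_Req (master clock)
    Assumes a symmetric path delay.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


# Hypothetical exchange: the slave clock runs 0.5 ms ahead of the master,
# and the one-way delay is 2 ms.
offset, delay = ptp_offset_and_delay(t1=10.0000, t2=10.0025, t3=10.0100, t4=10.0115)
print("offset:", offset)
print("delay:", delay)
```

An uncorrected offset of this size would be added directly to every one-way latency figure, which is why timestamp comparisons between devices depend on synchronized clocks.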
The remaining challenge involves actually putting an implementation into hardware. The authors, perhaps due to their interactions with their sponsors at Cisco, seem especially attuned to the realities of the networking hardware world. The power of embedded processors is starting to commoditize and featurize the networking hardware market in the same way that the power of desktop processors has transformed the PC market. Specialized real-time monitoring hardware could represent a value-added proposition for vendors. Its first likely customers, high-frequency traders and high-performance computing centers, are also among the least price-sensitive.
The authors point out that the data generated by their method can provide value well before it’s fully deployed. Putting this hardware on either side of major network bottlenecks could be extremely useful, and it might be possible to arrange the protocol so that it operates across hardware that’s separated by a number of intervening devices. As the intervening hardware is replaced, the data returned will simply become finer-grained.
If and when this methodology is used to detect and reduce latency in slow networks such as VPNs, it could prove revolutionary.