There are some age-old conundrums that may never be solved: dog or cat, Facebook or Instagram, burger or taco, or, most importantly of all, #whiteandgold or #blueandblack? And when it comes to measuring network performance, we often debate jitter versus latency. But what exactly do these two things measure, and which gives a more accurate indication of network performance? Or, more importantly, what exactly should we measure to maintain a good end user experience?
Ok, so first of all, let’s start with some definitions, and perhaps broaden the set of metrics a little to get a good understanding of the impact on network performance.
What are Latency and Delay?
Latency and delay are close relatives and are often used interchangeably. Strictly speaking, however, they are different things. Delay can be defined as the length of time it takes for a packet (well, more accurately, a bit) to move from one host to another. Factors that affect delay include processing time, queueing delay, transmission delay and propagation time.
By contrast, latency is commonly defined as round trip time (or RTT): the time taken for a packet to be sent plus the time taken for a corresponding packet to return. In this definition, latency is in effect bidirectional delay. Strictly speaking, from a network viewpoint latency should not include processing time on the server, but for practical purposes most network-based monitoring solutions include server processing time in their latency calculations.
Latency, or RTT, is a very common performance metric because it is relatively easy to calculate, especially for protocols like TCP that provide a mechanism to acknowledge packets. By contrast, one-way delay can be difficult to measure, as the receiver typically has no way of determining the precise time the packet was sent.
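To make the round trip idea concrete, here is a minimal sketch in Python that simply times a TCP handshake to a host. The host name, port and sample count are illustrative, and as noted above the result inevitably includes a sliver of kernel and server processing time as well as the network path itself.

```python
import socket
import time

def estimate_rtt(host: str, port: int = 443, samples: int = 5) -> float:
    """Rough RTT estimate: median TCP handshake time to host:port, in milliseconds."""
    # Resolve the address up front so DNS lookup time is not counted as RTT.
    family, _, _, _, addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0]
    timings = []
    for _ in range(samples):
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(3)
            start = time.perf_counter()
            s.connect(addr)  # returns once the SYN / SYN-ACK / ACK exchange completes
            timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[len(timings) // 2]

print(f"~RTT: {estimate_rtt('example.com'):.1f} ms")  # 'example.com' is a placeholder
```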
What are Packet Inter-Arrival Time and Jitter?
More recently, tools have emerged with more sophisticated performance monitoring metrics, specifically the ability to calculate Packet Inter-Arrival Time (IAT) and jitter. Packets transmitted across a network will have differing delays, even if they traverse the same path. As mentioned above, delay is affected by things such as queueing and transmission delays in routers and switches, and propagation delays in the network itself.
Because each packet will have a different delay value, the ‘gap’ between them as they arrive at the endpoint will vary. This gap, or the time between arriving packets, is called the inter-arrival time, or IAT.
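In code, IAT is nothing more than the difference between consecutive arrival timestamps. The arrival times below are invented, but they show how a single delayed packet immediately stands out:

```python
def inter_arrival_times(timestamps):
    """Gaps between consecutive packet arrivals, converted to milliseconds."""
    return [round((b - a) * 1000, 1) for a, b in zip(timestamps, timestamps[1:])]

# Hypothetical arrival times in seconds, as a capture tool might record them.
arrivals = [0.000, 0.021, 0.039, 0.062, 0.178, 0.199]
print(inter_arrival_times(arrivals))  # [21.0, 18.0, 23.0, 116.0, 21.0]
```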
Jitter is essentially a derivative of IAT: it is calculated as the inconsistency of the packet inter-arrival times. In simple terms, if the packet IAT is consistent, jitter can be considered low; conversely, if there is a wide variance in IAT values, jitter is high. Many applications, especially real-time communication applications like VoIP and video, are very sensitive to jitter.
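Given that definition, one simple (and deliberately unsophisticated) way to put a number on jitter is the average deviation of the IATs from their mean; RTP’s RFC 3550 uses a smoothed variant of the same idea. The sample values below are illustrative milliseconds:

```python
from statistics import mean

def jitter(iats_ms):
    """Mean absolute deviation of the inter-arrival times, in milliseconds."""
    avg = mean(iats_ms)
    return mean(abs(iat - avg) for iat in iats_ms)

print(jitter([20.0, 21.0, 19.0, 20.0]))         # steady arrivals -> ~0.5 ms
print(jitter([20.0, 21.0, 116.0, 20.0, 19.0]))  # one late packet -> ~30.7 ms
```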
What Affects Jitter?
If packets take the same path across the network, we would expect IAT to be reasonably consistent; after all, they are taking the same ‘route’ at pretty much the same time. It turns out that this is not the case, and IAT, and consequently jitter, can be greatly impacted by two significant network conditions:
- Packet Loss. If packets are lost across the network, the sending host needs to resend them. Reliable transport protocols such as TCP have mechanisms to identify and address this condition, but the resulting side effect is a larger IAT (and consequently higher jitter). Packet loss is a significant contributor to IAT and jitter.
- Network Congestion. Similar to packet loss, if the network is congested then sending hosts may not be able to transmit frames immediately but rather need to buffer and retry. At best this contributes to longer IAT times, and at worst to packet loss as buffers overflow and packets are lost. Again, congestion is a major contributor to IAT times as well as to overall end user experience.
There are of course many other factors that influence IAT and jitter, but for the sake of simplicity we have focused on the two major contributors: packet loss and congestion.
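A toy simulation makes the loss effect easy to see. Assume packets are sent every 20 ms with a fixed 30 ms one-way delay, and that packet 3 is dropped once and resent after a 200 ms retransmission timeout (all of these numbers are invented purely for illustration):

```python
SEND_INTERVAL = 0.020   # seconds between sends
ONE_WAY_DELAY = 0.030   # fixed network delay (toy assumption)
RTO = 0.200             # retransmission timeout for the lost packet

arrivals = []
for i in range(6):
    sent = i * SEND_INTERVAL
    if i == 3:
        sent += RTO     # packet 3 is lost once and resent after the timeout
    arrivals.append(sent + ONE_WAY_DELAY)

arrivals.sort()         # the order in which the receiver actually sees them
iats_ms = [round((b - a) * 1000, 1) for a, b in zip(arrivals, arrivals[1:])]
print(iats_ms)          # [20.0, 20.0, 40.0, 20.0, 160.0]
```

The 40 ms gap is the hole where packet 3 should have arrived, and the 160 ms gap is its late retransmission turning up; both spikes feed straight into the jitter calculation above.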
Calculating Packet Loss and Congestion
So if packet loss and congestion have such an important impact on network performance, why don’t we just measure them and be done with it? Well, the simple answer is that they are actually very difficult to measure.
It is very simple to measure things like byte and packet counts: as each packet arrives, we simply count its bytes and add them to the running total. Measuring packet loss is somewhat more difficult. When we measure packet loss we are trying to count things that don’t exist; we are trying to measure a packet that was lost.
The situation is actually worse than it seems. Not only are we trying to count a packet that doesn’t exist anymore, but transport protocols such as TCP retransmit when packets are dropped, so whilst the first one may be lost, a replica subsequently arrives in its place, albeit a little later than expected.
From a monitoring perspective it can be very difficult to distinguish a retransmitted frame from an original packet. That is, it is very difficult to accurately measure packet loss. Accurate measurement of congestion poses similar difficulties.
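To give a feel for what is involved, here is a deliberately naive sketch (using scapy, against a placeholder capture file) that flags TCP segments whose sequence number has already been seen within the same flow. It ignores SACK, keep-alives, out-of-order delivery and sequence number wrap, which is precisely why “just count the retransmissions” is easier said than done:

```python
from scapy.all import rdpcap, IP, TCP

seen = set()
suspected_retransmissions = 0

for pkt in rdpcap("trace.pcap"):            # "trace.pcap" is a placeholder capture
    if IP not in pkt or TCP not in pkt:
        continue
    key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport, pkt[TCP].seq)
    if key in seen and len(pkt[TCP].payload) > 0:
        suspected_retransmissions += 1      # same flow, same sequence number, carrying data
    seen.add(key)

print(f"suspected retransmissions: {suspected_retransmissions}")
```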
Given that IAT is directly impacted by packet loss and congestion, monitoring IAT (and jitter) gives us a very valuable and accurate picture of network congestion and packet loss without having to monitor either directly. In fact, IAT and jitter provide a much richer source of data for profiling end user experience than traditional packet loss or latency techniques.
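In practice that inference can be as simple as flagging IAT values that sit far above the recent baseline. The window size and threshold below are arbitrary illustrations, not recommendations:

```python
from statistics import median

def flag_gaps(iats_ms, window=20, factor=3.0):
    """Yield (index, iat) for gaps well above the recent median IAT."""
    for i, iat in enumerate(iats_ms):
        baseline = median(iats_ms[max(0, i - window):i] or [iat])
        if iat > factor * baseline:
            yield i, iat

sample = [20, 21, 19, 22, 20, 160, 21, 20, 95, 19]   # illustrative IATs in ms
print(list(flag_gaps(sample)))                       # [(5, 160), (8, 95)]
```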
But Isn’t Latency the Gold Standard for Network Performance?
Intuitively, it makes sense to think that measuring latency (or RTT) would be the best way to monitor network performance and end user experience. After all, latency measures round trip time, which in essence mimics the user experience of hitting ‘enter’ and having data returned to the screen. Why worry about more esoteric metrics such as IAT and jitter at all? The intuitive belief that RTT is the best way to measure end user experience has also been perpetuated by network monitoring vendors whose only ‘delay’ metric is in fact RTT, and so it is presented as the metric of choice.
It turns out that, whilst useful, RTT falls short of IAT and jitter in two main areas:
- Firstly, many applications are tolerant of long RTT but particularly intolerant of IAT variation, or jitter. RTT is simply not effective for measuring and maintaining real-time streaming applications such as VoIP and video; these applications need more sophisticated metrics such as IAT and jitter.
- Secondly, many vendors calculate RTT as the time taken to establish the initial TCP three-way handshake (SYN, SYN-ACK, ACK). That is, the RTT is measured at the start of the flow and typically never recalculated; the nature of the TCP protocol, with its sliding window algorithm among other things, makes calculating RTT mid-flow difficult. Basing RTT on the initial flow setup time is like basing your commute time on how long it takes to back out of the garage: it may give some indication, but it doesn’t take into account all the red lights on the way to the office. The sketch below shows what this handshake-only measurement looks like.
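Here is roughly what that handshake-only approach looks like in code, sketched with scapy against a placeholder capture file: each flow’s RTT is taken as the gap between the client’s SYN and the server’s SYN-ACK (one common variant; others time the full SYN to final ACK exchange), measured once at setup and never revisited.

```python
from scapy.all import rdpcap, IP, TCP

syn_times = {}   # (client, server, sport, dport) -> time the SYN was seen

for pkt in rdpcap("trace.pcap"):                      # placeholder capture file
    if IP not in pkt or TCP not in pkt:
        continue
    flags = pkt[TCP].flags
    if flags.S and not flags.A:                       # client SYN opens the flow
        syn_times[(pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)] = float(pkt.time)
    elif flags.S and flags.A:                         # server SYN-ACK answers it
        key = (pkt[IP].dst, pkt[IP].src, pkt[TCP].dport, pkt[TCP].sport)
        if key in syn_times:
            rtt_ms = (float(pkt.time) - syn_times.pop(key)) * 1000
            print(f"{key[0]} -> {key[1]}: handshake RTT ~ {rtt_ms:.1f} ms")
```

That single number never gets updated no matter what happens to the rest of the flow, which is exactly the limitation described above.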
For these reasons, we are seeing many monitoring vendors start to move away from pure RTT measurement. By incorporating delay metrics such as IAT and jitter, which are calculated for every packet and hence for the life of the flow, we can start to build a much richer picture of end user experience.
Finally, unlike RTT, more advanced metrics like IAT and jitter can infer complex network conditions such as packet loss and congestion, providing deeper insight into network performance and end user experience.