How to Measure Latency in the Cloud

When it comes to measuring application performance across our local enterprise network, we think we know what network latency is and how to calculate it. But when we move these apps into the cloud, there are a lot of subtleties that can impact latency in ways that we don’t immediately realize. Let’s examine what latency means for deploying cloud applications and how you, as a developer, can keep better track of it. The goal is to ensure the best performance of your cloud-based applications.

For years, latency has bedeviled application developers who took for granted that packets could easily traverse a local network with minimal delays. It didn’t take long to realize the folly of this assumption: when it came time to deploy these applications across a wide-area network, many apps broke down because of networking delays of tens or hundreds of milliseconds. But these lessons, learned decades ago, have been forgotten. Today we have a new generation of developers and networking engineers who have to understand a new set of latency delays across the Internet.

Many of the current generation of developers have never experienced anything other than high-speed Internet access and assume that it has always been that way. This tends to make for some sloppy coding decisions, creating unnecessary back-and-forth computer communications that introduce long latency times in running their apps. As we’ll see, now when everything is moving to the cloud, latency becomes even more important than before.
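
To put rough numbers on that back-and-forth cost, here is a minimal sketch. The round-trip times and call counts are made-up illustrative values, not measurements, and the function name is just a label for this example. It assumes each call waits for the previous one to finish.

```python
def total_network_wait_ms(round_trips: int, rtt_ms: float) -> float:
    """Time spent waiting on the network when round trips happen one after another."""
    return round_trips * rtt_ms


# Hypothetical values, chosen only for illustration.
LAN_RTT_MS = 1.0     # assumed round trip on a local network
WAN_RTT_MS = 100.0   # assumed round trip across the Internet

CHATTY_CALLS = 50    # e.g. one request per record
BATCHED_CALLS = 1    # the same records fetched in a single request

for label, rtt in (("LAN", LAN_RTT_MS), ("WAN", WAN_RTT_MS)):
    chatty = total_network_wait_ms(CHATTY_CALLS, rtt)
    batched = total_network_wait_ms(BATCHED_CALLS, rtt)
    print(f"{label}: chatty={chatty:.0f} ms, batched={batched:.0f} ms")
```

With these assumed values, the chatty design that feels instant on a local network spends seconds waiting once each round trip costs 100 ms.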

Trying to define cloud latency isn’t easy.

In the days before the ubiquitous Internet, understanding latency was relatively simple. You looked at the number of router hops between you and your application, and the delays that the packets took to get from source to destination. For the most part, your corporation owned all of the intervening routers and the network delays remained fairly consistent and predictable.

Those days seem so quaint now, like when we look at one of the original DOS-based IBM dual-floppy drive PCs. With today’s cloud applications, the latency calculations aren’t so easy.

First off, the endpoints aren’t fixed. The users of our apps can be anywhere in the world, sitting on anything ranging from a high-speed fiber line in a densely served urban area to a satellite uplink in the middle of Africa, and everywhere in between. And the apps themselves can be located pretty much anywhere too: that is the beauty and freedom of the cloud. But this freedom comes at a price. The resulting latencies can be horrific and huge.

We also need to consider the location of the ultimate end users and the networks that connect them to the destination networks. And we need to understand how the cloud infrastructure is configured: where the particular pieces of network, applications, servers, and storage fabrics are deployed and how they are connected.

It also depends on who the ultimate “owners” and “users” of our apps are. Latency can be important for the end-user experience of an enterprise’s apps. But if you are a service provider or a system integrator, you will want to control the network and deliver the appropriate service levels to your customers, and that means also controlling the expected latencies as part of these agreements.

One solution: triage your apps.

While reducing latency is desirable, not every app will need the lowest latencies. Applications such as financial services, video streaming, more complex Web/database services, backups, and 3-D engineering modeling are in the category that does. But apps such as email, analytics and some kinds of document management aren’t as demanding.
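
One way to record such a triage is a simple lookup. The sketch below reads the paragraph above as putting financial services, video streaming, complex Web/database services, backups, and 3-D modeling in the demanding group. The two-tier split, the tier labels, and the function name are assumptions made for illustration, not a standard taxonomy.

```python
# Categories come from the paragraph above; the tier names are illustrative.
LATENCY_SENSITIVE = {
    "financial services",
    "video streaming",
    "web/database services",
    "backups",
    "3-d engineering modeling",
}
LATENCY_TOLERANT = {
    "email",
    "analytics",
    "document management",
}


def triage(app_category: str) -> str:
    """Return the latency tier for an app category, ignoring case and padding."""
    key = app_category.strip().lower()
    if key in LATENCY_SENSITIVE:
        return "latency-sensitive"
    if key in LATENCY_TOLERANT:
        return "latency-tolerant"
    return "unclassified"  # anything not listed still needs a human decision


for name in ("Video Streaming", "Email", "Payroll"):
    print(f"{name}: {triage(name)}")
```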

Latency has had three traditional metrics.

In the past, latency has had three different measures: round-trip time (RTT), traceroutes, and endpoint computational speed. Each of these is important to measure in understanding the true effect of latency, and only after understanding each of these metrics can you get the full picture.

RTT measures the time it takes one packet to transit the Internet from source to destination and back to the source, or the time it takes for an initial server connection. This is useful in interactive applications, and also in examining app-to-app situations, such as measuring the way a Web server and a database server interact and exchange data.
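
As a rough illustration of the "initial server connection" measurement, the sketch below times how long a TCP connection takes to open, using only Python’s standard library. The function names are hypothetical, and the result is an approximation of RTT rather than a precise figure: it includes DNS lookup time when you pass a hostname instead of an IP address.

```python
import socket
import time


def tcp_connect_time_ms(host: str, port: int, timeout_s: float = 3.0) -> float:
    """Return how long it takes to open a TCP connection, in milliseconds.

    The TCP handshake needs about one network round trip, so this
    approximates RTT. Name resolution is included if host is a hostname.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout_s):
        elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0


def sample_connect_times(host: str, port: int, count: int = 5) -> list:
    """Take several measurements, since a single sample can be misleading."""
    return [tcp_connect_time_ms(host, port) for _ in range(count)]
```

You might call `sample_connect_times("example.com", 443)`, where the host is only a placeholder for your own server, and look at the spread of values rather than any single number.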

Traceroute is the name of a popular command that examines the individual hops or network routers that a packet takes to go from one place to another. Each hop can also introduce more or less latency. The path with the fewest and quickest hops may or may not correspond to what we would commonly think of as geographically the shortest link. For example, the lowest latency and fastest path between a computer in Singapore and one in Sydney, Australia might go through San Francisco.

Finally there is the speed of the computers at the core of the application: their configuration will determine how quickly they can process the data. While this seems simple, it can be difficult to calculate once we start using cloud-based compute servers.
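
One simple way to keep endpoint processing time separate from network time is to time the work itself on the server. This is a minimal sketch: `timed_ms` and the `summarize` workload are hypothetical stand-ins for your own code, and on a shared cloud server the result can vary from run to run.

```python
import time
from typing import Any, Callable, Tuple


def timed_ms(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return its result together with the elapsed time in milliseconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def summarize(values):
    """Hypothetical stand-in for the real processing an application does."""
    return sum(values) / len(values)


result, ms = timed_ms(summarize, list(range(1_000_000)))
print(f"processing took {ms:.2f} ms")
```

Logging this number next to the network measurements lets you tell a slow server apart from a slow path.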

First complicating factor: distributed computing.

As we said earlier, in the days when everything was contained inside an enterprise data center, it was easier to locate bottlenecks because the enterprise owned the entire infrastructure from source to destination. But with the rise of Big Data apps built using tools such as Hadoop and R (the major open source statistics language used for data analytics), the nature of applications is changing and becoming a lot more distributed. These apps employ tens or even thousands of compute servers that may be located all over the world, each with a varying degree of latency on its Internet connection. And depending on when these apps are running, the latencies can be better or worse as other Internet traffic waxes or wanes and competes for the same infrastructure and bandwidth.
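
Because these latencies shift from server to server and hour to hour, a single measurement says little. A minimal sketch of summarizing many samples follows; the sample values are invented for illustration, and the choice of median and 95th percentile is an assumption about what is useful to report, not something prescribed here.

```python
import statistics


def summarize_latency(samples_ms):
    """Summarize latency samples (at least two) by min, median, p95, and max."""
    ordered = sorted(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": cuts[94],
        "max": ordered[-1],
    }


# Invented samples in milliseconds: mostly steady, with one congested outlier.
samples = [10, 12, 11, 13, 250, 12, 11, 10, 14, 12]
print(summarize_latency(samples))
```

The median shows the typical case, while the high percentile and maximum show how bad things get when other traffic competes for the same path.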

Virtualization adds another layer of complexity, too.

Today’s data center isn’t just a bunch of rack-mounted servers but a complex web of hypervisors running dozens of virtual machines. This introduces yet another layer of complexity, since the virtualized network infrastructure can introduce its own series of packet delays before any data even leaves the rack itself!

Understand Quality of Service and what traffic is prioritized.

In the pre-cloud days, Service Level Agreements (SLAs) and Quality of Service were created to prioritize traffic and to make sure that latency-sensitive apps would have the network resources to run properly. These agreements were also put in place to ensure minimal downtime by penalizing the ISPs and other vendors who supplied the bandwidth and the computing resources.

But with the rise of more cloud and virtualized services, it isn’t so cut and dried. For one thing, the older SLAs typically didn’t differentiate between an outage in a server, a network card, a piece of the storage infrastructure, or a security exploit. But these different pieces are part and parcel of the smooth and continuous operation of any cloud infrastructure.

An example of this is a back office application that produces daily summary charts about a particular business process. If one of the many components of this app is down briefly, probably no one would notice nor really care, as long as the reports are produced eventually. We’ve put together the chart below that summarizes our thoughts on how critical particular apps are and under what circumstances they should be prioritized for particular SLAs.

This means that your SLAs need to handle a variety of situations. You don’t want to enforce (nor pay for) the same service levels on your test/dev cloud that you would on a production cloud.

Reducing latency has several dimensions.

So now that we have a better understanding of some of the complicating factors, the next step is to examine how you can reduce latencies in particular segments of your computing infrastructure. A paper for Arista Networks mentions four broad areas of focus:
• Reduce latency of each network node
• Reduce number of network nodes needed to traverse from one stage to another
• Eliminate network congestion
• Reduce transport protocol latency

Of course, they sell some of the gear that can help you reduce network switch transit times or cut network congestion, but still it is worth examining these more mundane pieces of your cloud provider’s network infrastructure (if you can) to see where you can start to apply some of these savings.
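
To see how the four areas listed above interact, here is a deliberately simplified model. It is a sketch under stated assumptions, not how any vendor computes latency: the class, its field names, and all the numbers are hypothetical, and the model ignores propagation delay over distance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PathEstimate:
    nodes: int                  # switches/routers crossed in one direction
    per_node_ms: float          # delay added by each node
    queueing_ms: float          # extra one-way delay caused by congestion
    handshake_round_trips: int  # round trips the transport needs before data flows

    def one_way_ms(self) -> float:
        return self.nodes * self.per_node_ms + self.queueing_ms

    def rtt_ms(self) -> float:
        return 2 * self.one_way_ms()

    def setup_ms(self) -> float:
        """Time spent on protocol round trips before any useful data moves."""
        return self.handshake_round_trips * self.rtt_ms()


# Hypothetical baseline, then one change per area of focus.
baseline = PathEstimate(nodes=10, per_node_ms=0.5, queueing_ms=5.0,
                        handshake_round_trips=2)
scenarios = {
    "baseline": baseline,
    "faster nodes": replace(baseline, per_node_ms=0.25),
    "fewer nodes": replace(baseline, nodes=5),
    "no congestion": replace(baseline, queueing_ms=0.0),
    "leaner protocol": replace(baseline, handshake_round_trips=1),
}
for name, path in scenarios.items():
    print(f"{name}: rtt={path.rtt_ms():.1f} ms, setup={path.setup_ms():.1f} ms")
```

Plugging in your own measured values shows which of the four levers is worth pulling first for a given path.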

Can content delivery networks (CDNs) help?

Not much. CDNs are designed mostly for delivering static content to a broad collection of distributed end users. One of the largest CDNs is Akamai, which is based on 95,000 servers installed in 1,900 ISPs around the world. But many cloud applications generate their content dynamically, and in many cases won’t get much of a latency improvement from a CDN because they aren’t using static pieces of content. Nevertheless, CDNs are expanding their capabilities and trying to help reduce latencies by caching more than just static HTML pages. Certainly, it is worth investigating whether a CDN partner can improve your particular situation.

Conclusion

As we can see, cloud latency isn’t just about doing traceroutes and reducing router hops. It has several dimensions and complicating factors. Hopefully, we have given you some food for thought and provided some direction so that you can explore some of the specific issues with measuring and reducing latencies for your own cloud apps, along with some ideas on how you can better architect your own apps and networks.

Web Informant

David Strom's musings on technology