"How NOT to Measure Latency" by Gil Tene

0:05 - 0:08

Hi everyone, I'm Gil Tene.
0:08 - 0:12

I'm going to be talking about this subject
that I call "How NOT to Measure Latency".
0:14 - 0:17

It's a subject that I've been talking
about now for 3 years or so.
0:17 - 0:20

I keep the title and change all
the slides every time.
0:21 - 0:22

A bunch of this stuff is new.
0:22 - 0:28

So if you've seen any of my previous "How NOT to",
you'll see only some things that are common.
0:28 - 0:31

A nickname for the subject is this...
0:31 - 0:36

Because I often will get that reaction
from some people in the audience.
0:37 - 0:39

Ever since I've told people that it's a
nickname,
0:39 - 0:42

they feel free to actually exclaim,
"Oh S@%#!".
0:42 - 0:44

And feel free to do that here in this talk.
0:45 - 0:47

I'll prompt you in a couple of places
where it is natural.
0:48 - 0:51

But if you just have the urge, go ahead.
0:51 - 0:53

Ummm...
0:53 - 0:55

So just a tiny bit about me.
0:55 - 0:57

I am the co-founder of Azul Systems.
0:58 - 1:00

I play around with garbage collection a lot.
1:00 - 1:03

Here is some evidence of me playing around
with garbage collection in my kitchen.
1:03 - 1:05

That's a trash compactor.
1:05 - 1:10

The compaction function wasn't working right,
so I had to fix it.
1:10 - 1:17

I thought it'd be funny to take a picture
with a book.
Not Synced

I've also built a lot of things.
Not Synced

I've been playing with computers since
the early 80's.
Not Synced

I've built hardware.
Not Synced

I've helped design chips.
Not Synced

I've built software at many
different levels.
Not Synced

Operating systems, drivers...
JVMs, obviously.
Not Synced

And lots of big systems at the system level.
Not Synced

Built our own app server in the late 90's
because WebLogic wasn't around yet.
Not Synced

So, I've made a lot of mistakes,
and I've learned from a few of them.
Not Synced

This is actually a combination of a bunch
of those mistakes looking at latency.
Not Synced

I do have this hobby of depressing people
by pulling the wool up from over your eyes,
Not Synced

and this is what this talk is about.
Not Synced

So, I need to give you a choice right here.
Not Synced

There's the door.
Not Synced

You can take the blue pill,
and you can leave.
Not Synced

Tomorrow you can keep believing whatever
it is you want to believe.
Not Synced

But if you stay here and take the red pill,
I will show you a glimpse of how
Not Synced

far down the rabbit hole goes,
and it will never be the same again.
Not Synced

Let's talk about latency.
Not Synced

And when I say latency, I'm talking about
latency, response time, any of those things
Not Synced

where you measure time from 'here to here',
and you're interested in how long it took.
Not Synced

We do this all the time, but I see a lot
of mish-mash in how people
Not Synced

treat the data, or think about it.
Not Synced

Latency is basically the time it took
something to happen once.
Not Synced

That one time, how long did it take.
Not Synced

And when we measure stuff, like we did
a million operations in the last hour,
Not Synced

we have a million latencies. Not one,
we have a million of them.
Not Synced

Our actual goal is to figure out how to
describe that million.
Not Synced

How did the million behave?
Not Synced

For example, 'they're all really good, and
they're all exactly the same', would be a
Not Synced

behavior that you will never see,
but that would be a great behavior.
Not Synced

So we need to talk about how things behave,
communicate, think, evaluate,
Not Synced

set requirements for, talk to other people,
but these are all common things around that.
Not Synced

To do that, we have to describe the
distribution, the set, the behavior,
Not Synced

but not the one.
Not Synced

For example, the behavior that says "the
common case was x" is a piece of
Not Synced

information about the behavior,
but it's a tiny sliver.
Not Synced

Usually the least relevant one.
Not Synced

Well, there's some less relevant ones,
but not a strongly relevant one,
Not Synced

and one that people often focus on.
Not Synced

To take a look at what we actually do
with this stuff, almost on a daily basis,
Not Synced

this is a snapshot from a monitoring system.
Not Synced

A small dashboard on a big screen
in a monitoring system.
Not Synced

Where you're watching the response time of
a system over time.
Not Synced

This is a two hour window.
Not Synced

These lines that are 95th percentile,
90, 75, 50, and 25th percentiles,
Not Synced

you can look at how they behave over time.
Not Synced

We're a small audience here, if you look at
this picture, what draws your eye?
Not Synced

What do you want to go investigate here
or pay attention to?
Not Synced

It's the big red spike there, right?
Not Synced

So we could look at the red spike,
'cause it's different,
Not Synced

and say, "Whoa, the 95th percentile shot up
here. And look, the 90th percentile
Not Synced

shot up at about the same time.
Not Synced

The rest of them didn't shoot up,
so maybe something happened here
Not Synced

that affected that much, I should probably
pay attention to it
Not Synced

because it's a monitoring system, and
I like things to be calm."
Not Synced

You could go investigate the why.
Not Synced

At this point, I've managed to waste
about 90 seconds of your life,
Not Synced

looking at a completely meaningless chart,
which unfortunately you do
Not Synced

every day, all the time.
Not Synced

This chart is the chart you want to show
somebody if you want to
Not Synced

hide the truth from them.
Not Synced

If you want to pull the wool
over their eyes.
Not Synced

This is the chart of the good stuff.
Not Synced

What's not on this chart?
Not Synced

The 5% worse things that happened during
this two hours.
Not Synced

They're not here.
Not Synced

This is only the good things that happened
during those two hours.
Not Synced

And to get this spike, that 5% had to be
so bad that it even pulled
Not Synced

the 95th percentile all up.
Not Synced

There is zero information here at all about
what happened bad during this two hours,
Not Synced

which makes it a bad fit for
a monitoring system.
Not Synced

It's a really good thing for
a marketing system.
Not Synced

It's a great way to get the bonus from your boss, even though you didn't do the work.
Not Synced

If you want to learn how to do that,
we can do another talk about that.
Not Synced

But this is not a good way to look at latency.
Not Synced

It's the opposite of good.
Not Synced

Unfortunately, this is one of the most
common tools used for
Not Synced

server monitoring on earth right now.
Not Synced

That's where the snapshot is from,
and this is what people look at.
Not Synced

I find this chart to be a goldmine
of information.
Not Synced

When I first showed it in another talk
like this, I had this really cool experience.
Not Synced

Somebody came up to me and said, "Hey,
as I was sitting here, I was texting one
Not Synced

of our guys, and he was saying,
Not Synced

'look, we have this issue with
our 95th percentile'."
Not Synced

And I got this chart from him!
Not Synced

So I went and said, "Hey, what does the
rest of the spectrum look like?"
Not Synced

This is the actual chart they got.
Not Synced

And when they look at the rest of the
spectrum, it looked like that.
Not Synced

That's what was hiding.
Not Synced

Notice the scales are a little different.
Not Synced

That yellow line is that yellow line.
Not Synced

So that's a much more representative number.
Not Synced

Is it? Is that good enough?
Not Synced

That's the 99th percentile.
Not Synced

We still have another 1% of really bad
stuff that's hiding above the blue line.
Not Synced

I wonder how big that is?
Not Synced

I don't know because he didn't have the data.
Not Synced

So a common problem that we have is that
we only plot what's convenient.
Not Synced

We only plot what gives us nice,
colorful graphs.
Not Synced

And often, when we have to choose between
the stuff that hides the rest of the data,
Not Synced

and the stuff that is noise, we choose
the noise to display.
Not Synced

I like to rant about latency.
Not Synced

This is from a blog that I don't write
enough in, but the format for it was simple.
Not Synced

I tweet a single tweet about latency,
latency tip of the day,
Not Synced

and then I rant about my own tweet.
Not Synced

As an example, this chart is a goldmine
of information because it has so many
Not Synced

different things that are wrong in it,
but we won't get into all of them.
Not Synced

You can read it online.
Not Synced

Anyway, this is one to take away from
what we just said.
Not Synced

If you are not measuring and showing the
maximum value, what is it you are hiding?
Not Synced

And from whom?
Not Synced

If your job is to hide the truth from
others, this is a good way to do it.
Not Synced

But if you actually are interested in what's
going on, the number one indicator
Not Synced

you should never get rid of is the
maximum value.
Not Synced

That is not noise, that is the signal.
Not Synced

The rest of it is noise.
Not Synced

Okay, let's look at this chart for some
more cool stuff.
Not Synced

I'm gonna zoom in to a small part
of the chart, and ask you what that means.
Not Synced

What does the average of the 95th percentile
over 2 hours mean?
Not Synced

What is the math that does that?
Not Synced

What does it do?
Not Synced

Let's look at that, and I'll give you
an example with another percentile.
Not Synced

The 100th percentile. The max, right?
Not Synced

Let's take a data set.
Not Synced

Suppose this was the maximum every minute
for 15 minutes.
Not Synced

What does it mean to say that the average
max over the last 15 minutes was 42?
Not Synced

I specifically chose the data to
make that happen.
Not Synced

It's a meaningless statement.
Not Synced

It's a completely meaningless statement.
Not Synced

But when you see 95th percentile,
average 184, you think that the 95th
Not Synced

percentile for the last two hours
was around 184.
Not Synced

It makes you think that.
Not Synced

Putting this on a piece of paper is not
just noise and irrelevant,
Not Synced

it's a way to mislead people.
Not Synced

It's a way to mislead yourself, because
you'll start to believe your own mistruths.
Not Synced

This is true for any percentile.
Not Synced

There is no percentile that you could do
this math on.
Not Synced

Another tip, you cannot average percentiles.
Not Synced

That math doesn't happen.
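
A minimal Python sketch of why this math fails. The per-minute maxima and the two one-minute samples below are made-up numbers (the slide data is not in the transcript), chosen so that the average of the maxima comes out to 42 as in the example above. The `percentile` helper is a plain nearest-rank calculation, an assumption of this sketch rather than what any monitoring tool does.

```python
import math
import statistics

def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of N) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-minute maximum latencies (ms) for 15 minutes.
per_minute_max = [12, 15, 11, 14, 13, 12, 16, 10, 14, 13, 12, 15, 11, 13, 449]

average_of_max = statistics.mean(per_minute_max)  # a number nobody experienced
true_max = max(per_minute_max)                    # the actual worst case

# Same effect for the 95th percentile: one quiet minute and one bad minute.
quiet_minute = [10] * 1000
bad_minute = [10] * 920 + [500] * 80   # 8% of requests are slow in this minute

average_of_p95 = statistics.mean(
    [percentile(quiet_minute, 95), percentile(bad_minute, 95)]
)
true_p95 = percentile(quiet_minute + bad_minute, 95)

print("average of per-minute max:", average_of_max, "true max:", true_max)
print("average of per-minute p95:", average_of_p95, "true p95:", true_p95)
```

In this made-up data the averaged figure matches neither the real maximum nor the real 95th percentile of the combined samples.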
Not Synced

But percentiles do matter. You really
want to know about them.
Not Synced

And a common misperception is that we want
to look at the main part of the spectrum,
Not Synced

not those outliers and perfection stuff.
Not Synced

Only people that actually bet their house
every day, or the bank on it,
Not Synced

need to know about the "five-nine's",
and all those.
Not Synced

The 99th percentile is a pretty
good number.
Not Synced

Is 99% really rare?
Not Synced

Let's look at some stuff, because we can
ask questions like, "If I were looking
Not Synced

at a webpage, what is the chance of me
hitting the 99th percentile?"
Not Synced

Of things like this: a search engine node,
or a key value store,
Not Synced

or a database, or a CDN, right?
Not Synced

Because they will report their 99th percentile.
Not Synced

They won't tell you anything above that,
but how many of the
Not Synced

webpages that we go to
actually experience this?
Not Synced

You want to say 1%, right?
Not Synced

Well, I went to some webpages and I counted
how many "http" requests were generated
Not Synced

by one click into that webpage,
and here are the numbers.
Not Synced

I did that about a year ago.
Not Synced

They've probably gone up since then.
Not Synced

Now that translates into this math.
Not Synced

This is the likelihood of one click seeing
the 99th percentile.
Not Synced

And the only page where that is less than
50% is the clean Google search page.
Not Synced

Where only a quarter will see the
99th percentile.
Not Synced

The 99th percentile is the thing that most
of your webpages will see.
Not Synced

Most of them will be there.
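
The math behind those likelihoods can be sketched in a few lines of Python. It assumes every request is independent and has a 1% chance of landing above the 99th percentile. The request counts are hypothetical, since the per-site numbers from the slide are not in the transcript.

```python
def chance_of_seeing_worse_than(percentile, requests):
    """Probability that at least one of `requests` independent requests
    is slower than the given percentile (for example 99.0)."""
    fraction_at_or_below = percentile / 100.0
    return 1.0 - fraction_at_or_below ** requests

# Hypothetical numbers of HTTP requests generated by one click.
for requests in (30, 70, 150, 300):
    chance = chance_of_seeing_worse_than(99.0, requests)
    print(f"{requests:4d} requests per click -> {chance:.1%} chance")
```

Under these assumptions a page with about 30 requests lands near the "quarter" mentioned above, and anything with roughly 70 or more requests crosses 50%.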
Not Synced

Now, we could look at other things.
Not Synced

We can pick which things to focus on.
Not Synced

Let's say I had to pick between the 95th
percentile, and the three 9's (99.9%).
Not Synced

The three 9's is way into perfection mode
for most people, or they think.
Not Synced

Which one of those represents our
community better?
Not Synced

Our population?
Not Synced

Our users?
Not Synced

Our experience?
Not Synced

Let's run a hypothetical.
Not Synced

Suppose we don't have that many pages,
and that many resources like we said before.
Not Synced

We'll be much more conservative.
Not Synced

A user session will only go through five
clicks, and each click will only bring up
Not Synced

up to 40 things.
Not Synced

A lot less, and they're all as clean
as the Google page.
Not Synced

How many of the users will not experience
something worse than the 95th percentile?
Not Synced

Because that's what the 95th percentile
is good for, the people who see that.
Not Synced

Anybody above that, is that.
Not Synced

What are the chances of not seeing it?
Not Synced

That's an interesting number.
Not Synced

So you're watching a number that is
relevant to 0.003% of your users.
Not Synced

99.997% of your users are going to
see worse than this number.
Not Synced

Why are you looking at it?
Not Synced

Why are you spending time
thinking about it?
Not Synced

In reverse, we could say how many people
are going to see something
Not Synced

worse than the three 9's (99.9%)?
Not Synced

That's going to be 18%.
Not Synced

In reverse, 82% of the people will see
the three 9's (99.9%) or better.
Not Synced

That's a slightly better representation.
Not Synced

Probably not good enough either.
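
The numbers in this hypothetical follow from 5 clicks times 40 requests, which is 200 requests per session. A small Python sketch, assuming requests are independent of each other (function and constant names are only illustrative):

```python
CLICKS_PER_SESSION = 5
REQUESTS_PER_CLICK = 40
REQUESTS_PER_SESSION = CLICKS_PER_SESSION * REQUESTS_PER_CLICK  # 200

def fraction_never_worse_than(percentile, requests):
    """Fraction of sessions in which no request is slower than the percentile."""
    return (percentile / 100.0) ** requests

never_worse_than_p95 = fraction_never_worse_than(95.0, REQUESTS_PER_SESSION)
never_worse_than_p999 = fraction_never_worse_than(99.9, REQUESTS_PER_SESSION)

print(f"never worse than the 95th percentile:   {never_worse_than_p95:.4%}")
print(f"never worse than the 99.9th percentile: {never_worse_than_p999:.0%}")
print(f"worse than the 99.9th at least once:    {1 - never_worse_than_p999:.0%}")
```

The first figure comes out at roughly three thousandths of a percent, and the last two at roughly 82% and 18%.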
Not Synced

We could look at some more math with them,
same kind of scenario.
Not Synced

What percentile of http response time
will be the thing that 95%
Not Synced

of people experience in this scenario?
Not Synced

It's the 99.97 percentile that 95%
of people see.
Not Synced

And if you want to know what 99%
of the people see,
Not Synced

that's four and a half 9's (99.995%).
Not Synced

You want to know that number from Akamai
if you want to predict what 1%
Not Synced

of your users are going to experience.
Not Synced

When you know the 99th percentile,
you kind of know a tiny bit.
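
Running the same math in reverse gives the percentile you would need to watch. This sketch assumes the same 200 independent requests per session as the hypothetical above; the function name is illustrative.

```python
REQUESTS_PER_SESSION = 200  # 5 clicks x 40 requests, as in the hypothetical

def required_percentile(user_fraction, requests):
    """Per-request percentile that `user_fraction` of sessions never exceed."""
    return 100.0 * user_fraction ** (1.0 / requests)

for user_fraction in (0.95, 0.99):
    pct = required_percentile(user_fraction, REQUESTS_PER_SESSION)
    print(f"{user_fraction:.0%} of users stay at or below the {pct:.3f}th percentile")
```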
Not Synced

So here's another tip.
Not Synced

And this is not an exaggeration,
by the way.
Not Synced

The median, which is a much smaller
percentile, has that minuscule a chance
Not Synced

of ever being the number that
anybody experiences.
Not Synced

This is the chance of getting worse
than the median.
Not Synced

Which makes the median an irrelevant
number to look at.
Not Synced

Unfortunately, it's probably the most
common one looked at.
Not Synced

When people say "the typical",
they look at the thing that
Not Synced

everything will be worse than.
Not Synced

Okay, I'm sorry about that part.
Not Synced

We'll do some other parts.
Not Synced

Now, why is it that when we look at these
monitoring systems, we don't see
Not Synced

data with a lot of 9's?
Not Synced

Why do we stop at the
90, 95, 99th percentile?
Not Synced

Why don't we look further?
Not Synced

Now, some of it is because people think,
"Well that's perfection, I don't need it."
Not Synced

The other part is that it's hard.
Not Synced

It's hard because you can't
average percentiles.
Not Synced

We already talked about that.
Not Synced

But you also can't derive your
five 9's (99.999%) out of a lot
Not Synced

of 10 second samples of percentiles.
Not Synced

And the reason for that is, "Hey, in 10
seconds, maybe I only had 1,000 things."
Not Synced

I could take all the 10 seconds in the
world, there's no way to say what the
Not Synced

hour five 9's (99.999%) were, what the
minutes five 9's were
Not Synced

if I'm collecting just this data.
Not Synced

And unfortunately, the data being collected
and reported to the back ends of monitoring
Not Synced

is usually summarized at a second,
5 seconds, 10 seconds, etc.
Not Synced

Basically throwing away all the good data,
and leaving you with absolutely no way
Not Synced

to compute large 9's for longer
periods of time.
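
A sketch of the difference between keeping percentile summaries and keeping the distribution itself. The data is made up: one hour of 10 second intervals with 1,000 requests each, and five slow requests in one interval. The exact-value `Counter` stands in for a real histogram, which would bucket values with bounded precision.

```python
import math
from collections import Counter

def percentile_from_counts(counts, pct):
    """Nearest-rank percentile from a {latency_ms: count} histogram."""
    total = sum(counts.values())
    rank = max(1, math.ceil(pct / 100.0 * total))
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value
    raise ValueError("empty histogram")

# 360 ten-second intervals; interval 123 (arbitrary) has five 2 second outliers.
intervals = []
for i in range(360):
    if i == 123:
        intervals.append(Counter({1: 995, 2000: 5}))
    else:
        intervals.append(Counter({1: 1000}))

# What a summarizing monitor keeps: one 99th percentile per interval.
per_interval_p99 = [percentile_from_counts(c, 99.0) for c in intervals]

# What you can do if you keep the histograms: add the counts, then ask anything.
merged = Counter()
for counts in intervals:
    merged.update(counts)  # Counter.update adds counts together

hour_five_nines = percentile_from_counts(merged, 99.999)

print("worst per-interval p99 (ms):", max(per_interval_p99))
print("hourly 99.999th percentile (ms):", hour_five_nines)
```

Every 10 second summary reports 1 ms, and no arithmetic on those summaries can recover the 2 second figure that the merged histogram shows at five 9's.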
Not Synced

So, this is where you might want to look
at HDR Histogram.
Not Synced

It's an open source thing I've created
a few years ago.
Not Synced

I did it in Java, and now there are
C, C#, Python, Erlang,
Not Synced

and Go ports of this that I didn't create.
Not Synced

And it lets you actually get an entire
percentile spectrum.
Not Synced

Some of you here I know are
already using it.
Not Synced

And you can look at all the percentiles.
Not Synced

Any number of 9's that's in the data, if
you just keep it right and report it right,
Not Synced

it's got a log format, you can
store things forever.
Not Synced

Well, for a long time.
Not Synced

Okay, so it lets you have nice things.
Not Synced

Enough for that advertisement.
Not Synced

Now, latency... Well, I think this is
slightly out of order.
Not Synced

Yeah, sorry.
Not Synced

This is the red/blue pill part, so I warn
you, this is your last chance.
Not Synced

There's a problem I call the
coordinated omission problem.
Not Synced

The coordinated omission problem is
basically a conspiracy.
Not Synced

It's a conspiracy that we're all part of.
Not Synced

I don't think anybody actually meant
to do it, but once I've noticed it,
Not Synced

everywhere I look, there it is.
Not Synced

Now, I've been using a specific way of
showing you numbers so far.
Not Synced

Has anybody here noticed how
I spell percentile?
Not Synced

(Audience Member): "You put lie at the
end of the percent sign."
Not Synced

Yeah, good.
Not Synced

So coordinated omission problem is the
"lie" in %lies.
Not Synced

And this is how it works.
Not Synced

One common way to do this is
to use a load generator.
Not Synced

Pretty much all load generators
have this problem.
Not Synced

There are two that I know of that don't.
Not Synced

What you do with a load generator,
is you test.
Not Synced

You issue requests, or send packets.
Not Synced

And you measure how long something took.
Not Synced

And as long as the numbers go right,
measure them, put them in a bucket,
Not Synced

study them later, and get your
percentiles from it.
Not Synced

But what if the thing that you are
measuring took longer than the time
Not Synced

it would've taken until you send
the next thing?
Not Synced

You're supposed to send something
every second,
Not Synced

but this one took a second and a half.
Not Synced

Well you've got to wait before
you send the next one.
Not Synced

You just avoided measuring something
when the system was problematic.
Not Synced

You've coordinated with it.
Not Synced

You weren't looking at it then.
Not Synced

That's common scenario A: You've backed
off, and avoided measuring when it was bad.
Not Synced

Another way, is you measure inside your code.
Not Synced

We all do this. We all have to do this,
Not Synced

where we measure time, do something,
then measure time.
Not Synced

The delta between them is how long it took.
Not Synced

We can then put it in a stats bucket,
and then do the percentiles in that.
Not Synced

Unfortunately, if the system freezes right
here, for any reason,
Not Synced

an interrupt, a context switch,
Not Synced

a cache buffer flush to disk,
Not Synced

a garbage collection,
Not Synced

a re-indexing of your database,
this is a database.
Not Synced

This is Cassandra by the way,
measuring itself.
Not Synced

In any of the above, then you will
have one bad report
Not Synced

while 10,000 things are waiting in line.
Not Synced

And when they come in, they will look
really, really good.
Not Synced

Even though each one of them has had
a really bad experience.
Not Synced

It can even get worse, where maybe the
freeze happened outside the timing,
Not Synced

and you won't even know there was a freeze.
Not Synced

Now these are examples of omitting data
that is bad on a very selective basis.
Not Synced

It's not random sampling.
Not Synced

It's, "I don't like bad data",
Not Synced

or "I couldn't handle it",
Not Synced

or "I don't know about it",
Not Synced

so we'll just talk about the good.
Not Synced

What does that do to your data?
Not Synced

Because it often makes people feel like,
Not Synced

"Okay, yeah, I understand,
but it's a little bit of noise."
Not Synced

Let's run some hypotheticals,
and I'll show you some real numbers.
Not Synced

Imagine a perfect system.
Not Synced

It's doing 100 requests a second,
at exactly a millisecond each.
Not Synced

But we go and freeze the system,
after 100 seconds of perfect operations
Not Synced

for 100 seconds, and then repeat.
Not Synced

Now, I'm going to describe how the system
behaves in terms that should mean something,
Not Synced

and then we'll measure it.
Not Synced

If we actually wanted to describe the
system,
Not Synced

on the left we have an average
of one millisecond by definition,
Not Synced

and on the right we have an
average of 50 seconds.
Not Synced

Why 50? Because if I randomly came in
in that 100 seconds,
Not Synced

I'll get anything from 0 to 100
with even distribution.
Not Synced

The overall average over 200 seconds
is 25 seconds.
Not Synced

If I just came in here and said,
"Surprise, how long did this take?"
Not Synced

On average, it will be 25.
Not Synced

I can also do the percentiles.
Not Synced

50th percentile will be really good,
and then it'll get really bad.
Not Synced

The four 9's is terrible.
Not Synced

This is a fair honest description of
this system if this is what it did.
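
This honest description can be reproduced with a short Python sketch. It assumes a request is intended every 10 ms regardless of what the system is doing, and that a request arriving during the freeze waits for the thaw and then takes 1 ms (queueing after the thaw is ignored). The names are illustrative.

```python
import math
import statistics

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

INTERVAL_MS = 10      # 100 requests per second
SERVICE_MS = 1        # each request takes 1 ms while the system runs
RUN_MS = 100_000      # 100 seconds of perfect operation
FREEZE_MS = 100_000   # followed by a 100 second freeze

latencies_ms = []
for intended_start in range(0, RUN_MS + FREEZE_MS, INTERVAL_MS):
    if intended_start < RUN_MS:
        latencies_ms.append(SERVICE_MS)
    else:
        # Assumption: wait until the freeze ends, then 1 ms of work.
        wait = RUN_MS + FREEZE_MS - intended_start
        latencies_ms.append(wait + SERVICE_MS)

print("average (ms):", statistics.mean(latencies_ms))
for pct in (50, 75, 99, 99.99):
    print(f"{pct}th percentile (ms):", percentile(latencies_ms, pct))
```

The average lands at about 25 seconds, the median stays at 1 ms, and the high percentiles approach the full 100 seconds.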
Not Synced

And you can make the system do that.
Not Synced

That's what Control-Z is good for.
Not Synced

You can make any of your systems do that.
Not Synced

Now let's go measure this system with
a load generator,
Not Synced

or with a monitoring system.
Not Synced

The common ones.
Not Synced

The ones everybody does.
Not Synced

On the left, we're going to get 10,000
results of one millisecond each.
Not Synced

Great.
Not Synced

And we're going to get one result of
100 seconds.
Not Synced

Wow, really big response time.
Not Synced

This is our data.
Not Synced

This is OUR data.
Not Synced

So now you go do math with it.
Not Synced

The average of that is 10.9 milliseconds.
Not Synced

A little less than 25 seconds.
Not Synced

And here are the percentiles.
Not Synced

Your load generator monitoring system
will tell you that this system is perfect.
Not Synced

You could go to production with it.
Not Synced

You like what you see.
Not Synced

Look at that, four 9's.
Not Synced

It is lying to you.
Not Synced

To your face.
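
The same arithmetic on the data the measurer actually recorded, as a Python sketch. The sample set (10,000 results of 1 ms and one result of 100 seconds) is from the talk; the nearest-rank `percentile` helper is an assumption of this sketch.

```python
import math
import statistics

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# One 200 second cycle as seen by a measurer that waits for each response.
recorded_ms = [1] * 10_000 + [100_000]

average_ms = statistics.mean(recorded_ms)
print(f"average: {average_ms:.2f} ms")
for pct in (50, 75, 99, 99.9, 99.99):
    print(f"{pct}th percentile: {percentile(recorded_ms, pct)} ms")
print(f"max: {max(recorded_ms)} ms")
```

Only the maximum gives any hint that the system was frozen for half of the time.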
Not Synced

And you can catch it doing that with a
Control-Z test.
Not Synced

But people tend to not want to do that,
because then what are they going to do?
Not Synced

If you just do that test, and calibrate
your system, and you find it
Not Synced

telling you that, about this, the next
step should be to throw all the numbers away.
Not Synced

Don't believe anything else it says.
Not Synced

If it lies this big, what else did it do?
Not Synced

Don't waste your time on numbers
from uncalibrated systems.
Not Synced

Now the problem here was, that if you
want to measure the system,
Not Synced

you have to measure at random rates,
or same rates.
Not Synced

If you measure 10,000 things in 100 seconds,
there should be another 10,000 things here.
Not Synced

If you measure them, you would've gotten
all the right numbers.
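
One way to put the missing measurements back, sketched in Python. Whenever a recorded latency is longer than the expected interval between requests, the requests that should have been sent in the meantime are filled in with the time each would have waited. This is similar in spirit to the expected-interval correction in HDR Histogram (`recordValueWithExpectedInterval` in the Java version); the function below is an illustrative stand-in, not that library's code.

```python
import statistics

def record_with_expected_interval(samples, latency_ms, expected_interval_ms):
    """Record a latency, then back-fill the samples a stalled measurer skipped."""
    samples.append(latency_ms)
    missing = latency_ms - expected_interval_ms
    while missing >= expected_interval_ms:
        samples.append(missing)          # a request that was never sent
        missing -= expected_interval_ms

EXPECTED_INTERVAL_MS = 10                # 100 requests per second

raw_ms = [1] * 10_000 + [100_000]        # what the measurer recorded
corrected_ms = []
for latency in raw_ms:
    record_with_expected_interval(corrected_ms, latency, EXPECTED_INTERVAL_MS)

print("raw samples:", len(raw_ms), "corrected samples:", len(corrected_ms))
print("raw average (ms):", round(statistics.mean(raw_ms), 1))
print("corrected average (ms):", statistics.mean(corrected_ms))
```

With the back-filled samples the average returns to roughly 25 seconds, in line with the honest description of the system.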
Not Synced

Coordinated omission is the simple act of
erasing all that bad stuff.
Not Synced

The conspiracy here is that we all do it
without meaning to.
Not Synced

I don't know who put that in our systems,
but it happens to all of us.
Not Synced

Now, I often get people saying,
"Okay, I get it. All the numbers are wrong,
Not Synced

but at least for my job where I tune
performance, and I try to make things
Not Synced

faster, I can use the numbers to figure
out if I'm going in the right direction."
Not Synced

Is it better, or is it worse? Let me
dispel that for you for a second.
Not Synced

Suppose I went and took this system,
and improved it dramatically.
Not Synced

Rather than freezing for 100 seconds,
it will now answer every question.
Not Synced

It'll take a little longer,
5 milliseconds instead of one,
Not Synced

but it's much better than freezing, right?
Not Synced

So let's measure that system that we spent
weeks and weeks improving,
Not Synced

and see if it's better.
Not Synced

That's the data.
Not Synced

If we do the percentiles, it'll tell us
that we just really hurt the four 9's.
Not Synced

We made it go 5 times worse than before.
Not Synced

We should revert this change, go back to
that much better system we had before.
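
A sketch of that backwards comparison in Python. It assumes the improved system answers in 5 ms during the period in which it used to freeze, and that both systems are measured by the same coordinated measurer; the sample sets and helper are illustrative.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

before_ms = [1] * 10_000 + [100_000]     # freezing system, as measured
after_ms = [1] * 10_000 + [5] * 10_000   # improved system, as measured

before_four_nines = percentile(before_ms, 99.99)
after_four_nines = percentile(after_ms, 99.99)

print("99.99th percentile before (ms):", before_four_nines)
print("99.99th percentile after  (ms):", after_four_nines)
```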
Not Synced

So this is just to make sure that you
don't think that you can have
Not Synced

any intuition based on any of these numbers.
Not Synced

They go backwards sometimes.
Not Synced

You don't know which way is good or bad.
Not Synced

And you'll never know which way is good
or bad with a system that lies like that.
Not Synced

The other cool technique is
what I call "Cheating Twice".
Not Synced

You have a constant load generator,
and it needs to do 100 per second.
Not Synced

When it woke up after 200 seconds,
it says,
Not Synced

"Whoa, we're 9,999 behind.
We've got to issue those requests."
Not Synced

So it issues those requests.
Not Synced

At this point, not only did it get rid of
all the bad requests,
Not Synced

it replaced every one of them with
a perfect request.
Not Synced

Coining the four 9's (99.99%), all the way
to four and a half 9's (99.995%),
Not Synced

it's twice as wrong as dropping them.
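
A small Python sketch of "cheating twice", assuming each catch-up request completes in 1 ms. It computes the highest percentile at which the recorded data still looks perfect, which is the share of samples that are not the single bad one.

```python
dropped_ms = [1] * 10_000 + [100_000]      # bad requests simply never sent
caught_up_ms = dropped_ms + [1] * 9_999    # plus 9,999 "perfect" catch-up requests

def highest_clean_percentile(samples, good_ms=1):
    """Percentile below which every recorded sample is still `good_ms`."""
    good = sum(1 for value in samples if value <= good_ms)
    return 100.0 * good / len(samples)

clean_when_dropped = highest_clean_percentile(dropped_ms)
clean_when_caught_up = highest_clean_percentile(caught_up_ms)

print(f"dropping:  clean up to the {clean_when_dropped:.3f}th percentile")
print(f"catch-up:  clean up to the {clean_when_caught_up:.3f}th percentile")
```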
Not Synced

So these are all cool things that
happen to you.
Not Synced

I'm not going to spend much time on how
to fix those and avoid those.
Not Synced

There's a lot of other material that you
can find with me
Not Synced

talking about that, in longer talks.
Not Synced

But this is pretty bad.
Not Synced

And like I said...
Not Synced

That should've been up there before.
Not Synced

How did this repeat itself?
Not Synced

Did I create a loop in the
presentation somehow?
Not Synced

I don't know how to do that.
Not Synced

Let's see if I can get through here.
Not Synced

Hopefully editing later will take it out.
Not Synced

So we have the cheats twice.
Not Synced

There, okay.
Not Synced

So, after we look at coordinated
omission that way,
Not Synced

we should also look at response time,
and service time.
Not Synced

Coordinated omission, what it really is
achieving for you, unfortunately,
Not Synced

is that it makes something that you think
is response time, and only shows you
Not Synced

the service time component of latency.
Not Synced

This is a simple depiction of what service
time and response times are.
Not Synced

This guy is taking a certain amount of
time to take payment
Not Synced

or make a cup of coffee.
Not Synced

That's service time.
Not Synced

How long does it take to do the work?
Not Synced

This person has experienced
the response time,
Not Synced

which includes the amount of time they
have to wait before they
Not Synced

get to the person that does the work.
Not Synced

And the difference between those
two is immense.
Not Synced

The coordinated omission problem makes
something that you think is
Not Synced

response time, only measure the
service time,
Not Synced

and basically hide the fact that things
stalled, waited in line,
Not Synced

that this guy might've taken a lunch break,
Not Synced

and now we have a line around the
building three times.
Not Synced

Service time stays the same.
Not Synced

This is the backwards part...
Not Synced

Now, let's look at what it
actually looks like.
Not Synced

In a load generator that I fixed,
I measured both
Not Synced

response time and service time,
Not Synced

this happens to be Cassandra,
Not Synced

at a very low load.
Not Synced

And you can see that they're very very
similar, at a very low load.
Not Synced

Why? Because there's nobody in line.
Not Synced

This thing is really fast.
Not Synced

We're not asking for too much.
Not Synced

Cassandra's pretty fast,
so they're the same.
Not Synced

But if I increase the load, we
start seeing gaps.
Not Synced

If I increase the load a little more,
the gap grows.
Not Synced

If I increase the load a little more,
the gap grows.
Not Synced

Now this is not the failure point yet.
Not Synced

If I actually increase it all the way past
the point where the system
Not Synced

can't even do the work I want,
service time stays the same,
Not Synced

response time goes through the roof.
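
The widening gap can be reproduced with a toy queue in Python. This is not the Cassandra measurement from the talk: it assumes a single server, a fixed 1 ms service time (so capacity is 1,000 requests per second), and randomly spaced arrivals from a seeded generator. All names and numbers are illustrative.

```python
import random
import statistics

SERVICE_TIME_MS = 1.0   # hypothetical fixed cost of doing the work

def simulate(requests_per_second, count=20_000, seed=1):
    """Single server, first-in first-out line, random (exponential) arrival gaps.
    Returns (mean service time, mean response time) in milliseconds."""
    rng = random.Random(seed)
    mean_gap_ms = 1000.0 / requests_per_second
    arrival = 0.0
    server_free_at = 0.0
    response_times = []
    for _ in range(count):
        arrival += rng.expovariate(1.0 / mean_gap_ms)
        start = max(arrival, server_free_at)      # wait in line if the server is busy
        server_free_at = start + SERVICE_TIME_MS
        response_times.append(server_free_at - arrival)
    return SERVICE_TIME_MS, statistics.mean(response_times)

results = {}
for load in (100, 500, 900, 1100):
    service_ms, response_ms = simulate(load)
    results[load] = response_ms
    print(f"{load:5d} req/s  service {service_ms:.1f} ms  response {response_ms:.1f} ms")
```

Service time is identical at every load, while response time drifts away from it as load rises and runs away once the offered load passes what the server can do.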
Not Synced

This was when it was 100 and something
milliseconds, now it's 7 and a half seconds.
Not Synced

Why 7 and a half seconds?
Not Synced

'Cause you're waiting in line that long
to go around the block.
Not Synced

The guy just can't serve as many people
as are showing up in line, you fall behind.
Not Synced

This is a virtual world reaction to this.
Not Synced

I really like this slide, it's where I came
up with the notion of a blue/red pill.
Not Synced

When you actually measure reality, people
tend to have this reaction when
Not Synced

they compare the two.
Not Synced

And if we actually look at these on the
two sides of a collapse point of a system,
Not Synced

this specific system can only do 87,000
things a second.
Not Synced

No matter how hard you press it,
that's all it'll do.
Not Synced

The service time on the two sides of
the collapse looks virtually identical,
Not Synced

which it would.
Not Synced

But if you compare the response time,
you have a very different picture.
Not Synced

And I'm showing this picture so you get
a feeling for what to look at
Not Synced

on whether or not you're measuring
the right one.
Not Synced

Whenever you push, you try and push load
beyond what the system can do,
Not Synced

you are falling behind over time.
Not Synced

This is a 250 second run,
Not Synced

where at the end of it
you are waiting for 8 seconds in line.
Not Synced

Why? Because for every second
that goes by, there are
Not Synced

3,000 more things that are
added to the line.
Not Synced

The interesting thing that happens when
you cross the threshold limit,
Not Synced

or capability of the system, is that
response time grows over time linearly.
Not Synced

It doesn't happen if you're below.
Not Synced

Only if you're above.
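
The linear growth is simple arithmetic, sketched here in Python. The capacity of 87,000 per second and the 250 second run are from the talk; the offered load of 90,000 per second is inferred from the "3,000 more things" added to the line each second.

```python
CAPACITY_PER_SEC = 87_000
OFFERED_PER_SEC = 90_000   # inferred: capacity plus 3,000 extra per second

def queue_wait_seconds(elapsed_seconds,
                       offered=OFFERED_PER_SEC,
                       capacity=CAPACITY_PER_SEC):
    """Time a request arriving after `elapsed_seconds` spends waiting in line."""
    backlog = max(0, offered - capacity) * elapsed_seconds
    return backlog / capacity

for elapsed in (50, 100, 150, 200, 250):
    print(f"after {elapsed:3d} s: waiting {queue_wait_seconds(elapsed):.1f} s in line")
```

Below capacity the backlog term is zero, so the wait does not grow; above it, the wait climbs in a straight line to a bit over 8 seconds by the end of the run.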
Not Synced

It's the point where that happens, and
any load generator that doesn't show
Not Synced

that line when you try pushing harder
than you can, is lying to you.
Not Synced

It's a simple sanity check.
Not Synced

If your load generator shows you that,
it didn't push.
Not Synced

Or it pushed, but it didn't
report correctly,
Not Synced

whichever it is.
Not Synced

If we draw that to scale...
Not Synced

Just to make sure, this was not to scale,
this is the scale, I just zoomed in
Not Synced

so you could see that it was
relatively stable.
Not Synced

So... I don't know what happened to the
order of the slides.
Not Synced

It's like looping and randoming.
Not Synced

There's some conspiracy going on there.
Not Synced

Now, latency doesn't live on its own.
Not Synced

You do need to look at latency in the
context of load.
Not Synced

'Cause as I showed you, as you're nearly
idle, things are nearly perfect.
Not Synced

Even these mistakes won't show up.
Not Synced

But as you start pressing, things start
cracking or behaving differently.
Not Synced

And usually when you want to know how much
your system can handle,
Not Synced

the answer is not 87,000 things a second,
because nobody wants the
Not Synced

response time that comes with that.
Not Synced

It's how many things can I handle so
that I don't get angry phone calls.
Not Synced

So I do get my bonus, and so my
company stays above ground.
Not Synced

This is not sustainable speed.
Not Synced

Running this experiment is really
interesting with software,
Not Synced

because it actually doesn't hurt, but
spending the next 6 months of your time
Not Synced

repeating this experiment, trying to
change the shape of the bumper
Not Synced

every time you hit the thing
is a waste of your time.
Not Synced

Your goal when you're trying to figure
out sustainable speed throughput,
Not Synced

whatever it is, is to see how fast you can
go without this happening,
Not Synced

and then to try and engineer
to improve that.
Not Synced

Meaning, can I make it go faster
without this happening?
Not Synced

Measuring what happens after you
hit the pole is useless for that exercise.
Not Synced

The only thing that matters about hitting
the pole, is that you hit the pole.
Not Synced

When you go and study the behavior
of latency, at saturation,
Not Synced

you are doing this.
Not Synced

You're looking at this and saying, "That
bumper, I don't like the shape of that.
Not Synced

Let's measure it closely and do this 100
times to see if we can vary it."
Not Synced

That's what it means to look at latency
at saturation,
Not Synced

and repeat, and repeat, and change,
and tune, and see if you can do it again.
Not Synced

If you're pressing it to the wall,
it should look like this.
Not Synced

And it shouldn't be a surprise that it's
a 7 and a half second response time.
Not Synced

In fact, if it's not, something is
terribly wrong with what you're measuring.
Not Synced

You should look at that instead.
Not Synced

So don't do this.
Not Synced

Try to minimize the number of times
that you actually run red cars
Not Synced

into poles in your testing.
Not Synced

I'm not saying don't do it, but use it
to establish the end.
Not Synced

And then you need to test all the speeds,
and we need to see when you hit the pole.
Not Synced

Maybe you hit the pole at 100 mph,
but maybe you also hit the pole at 70 mph.
Not Synced

Maybe you don't hit it at 20.
Not Synced

We should find out how fast is safe.
Not Synced

When you have data, you can compare
it like this.
Not Synced

This is what I would say a recommended
way to look at it.
Not Synced

Plot requirements, that's the hitting
the pole.
Not Synced

And some things hit the pole,
and some things don't.
Not Synced

And you run different scenarios,
different loads,
Not Synced

different configurations,
Not Synced

different settings,
Not Synced

and see what works, and what doesn't.
Not Synced

Your goal is to stay here, and carry
more while staying there.
Not Synced

Usually.
Not Synced

It's very useful for figuring out how many
machines I need to carry a certain thing.
Not Synced

If you don't know this, you don't know
how many machines to deploy.
Not Synced

Okay, I'm going to run through
some comparisons of
Not Synced

latency or response time behaviors
between different configurations
Not Synced

to show you some of the places
people look, and some of the
Not Synced

intuitive and non-intuitive
things to do with them.
Not Synced

The common thing,
Not Synced

and again, this is that Cassandra thing,
Not Synced

comparing two systems, A and B.
Not Synced

I'll let you guess which one is A,
and which one is B.
Not Synced

It's two systems, and saying
which is better, what can I do with this?
Not Synced

And we're measuring here at two
throughputs, 85 and 90k.
Not Synced

As I said in here, 90k is past the
capability of the system.
Not Synced

You can sort of see it here.
Not Synced

See, 85 for both of them is here,
and 90k is here.
Not Synced

So you could look at this and say,
Not Synced

"Look, when the car hits the pole,
the blue system is better."
Not Synced

It's half as bad, but that's just
the wrong place to look.
Not Synced

They both suck.
Not Synced

You do not want to be doing this.
Not Synced

The fact that this system is better
than that system
Not Synced

doesn't make you want to use it.
Not Synced

This is the wrong place to measure.
Not Synced

This is where latency is irrelevant.
Not Synced

How they behave past this point
doesn't matter.
Not Synced

What we should be doing is saying,
Not Synced

"Well, then don't measure here.
Let's look there."
Not Synced

So if we zoom just at the 85k's on these
two systems, okay, they're different.
Not Synced

And now...
Not Synced

The red and the blue alternate here,
whatever that is.
Not Synced

And now you look at this,
and okay, it's better.
Not Synced

But we're still in the wrong place,
because we are 1.5% from hitting the pole.
Not Synced

It is not where you will be
running in production.
Not Synced

It's not the interesting place
to study latency.
Not Synced

That's the place that if you're anywhere
close to that, you should be on the phone
Not Synced

getting more servers now, rather than
trying to figure out how the latency behaves.
Not Synced

You know it's going to collapse if just
a little bit of noise happens.
Not Synced

What you should be doing is looking
far away from the need,
Not Synced

far away from that.
Not Synced

For example, let's go to half the
throughput that causes collapse,
Not Synced

and see what things happen there.
Not Synced

And here you can see,
Not Synced

okay, these are two systems, and one
of them does better.
Not Synced

You can say that this percentile is better,
Not Synced

that percentile, whatever these are.
Not Synced

It is interesting, but what
can we do with this?
Not Synced

How do we tell our boss what this means?
Not Synced

Or how do we translate this into,
how many machines do I need?
Not Synced

Now so far, I've been comparing
things at the same throughput,
Not Synced

and looking at latencies.
Not Synced

And that's good for pass/fail kind of
things, or getting quantitative things,
Not Synced

but once you get to this point,
you can start saying,
Not Synced

"Wait, what if I do it at
different throughputs?"
Not Synced

How slow do I need to make this blue thing
to make it look closer to the red thing,
Not Synced

or the other way around.
Not Synced

I don't want to move this fast to 3-L too,
I want to move this to be there.
Not Synced

For example, slow that one down by 4X,
and look,
Not Synced

the two 9's are actually starting
to look similar.
Not Synced

If you slow it by...
Not Synced

So you can make a statement like this:
Not Synced

The 99th percentile, if you had a goal
like this,
Not Synced

and now you've passed the goal,
Not Synced

You'd say, "Both of them passed the goal,
but system B does it at 4 times the load."
Not Synced

That drives a choice, right?
Not Synced

You can make a harsher goal, and say,
Not Synced

I need the three 9's to be below
10 milliseconds,
Not Synced

so you'll slow these down even further.
Not Synced

At this point, you can make this statement:
Not Synced

If you want those, one of them is
10 times better.
Not Synced

Meaning, not that the system is
10 times faster,
Not Synced

but I can carry 10 times the load
before I fail, before I hit the pole.
Not Synced

What I'm trying to demonstrate here,
is that how much more, or not,
Not Synced

you can get out of a system depends
on your requirements,
Not Synced

and whether or not you need to meet them.
Not Synced

Without setting those requirements,
looking at the percentile spectrum
Not Synced

of response time, not service time,
Not Synced

you'll never know how much you
need or not.
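
One way to turn a requirement into a "how much can each system carry" number is sketched below. It is a minimal illustration, not the method used to produce the charts in the talk: the function name, the `"p99.9"` key, and all of the latency figures are made up for the example. It walks the tested loads from low to high and stops at the first one that breaks the goal.

```python
def max_sustainable_load(results, percentile_key, limit_ms):
    """Highest tested load that meets the goal, with every lower tested load also meeting it.

    results maps load (ops/sec) -> {percentile_key: latency_ms}.
    Returns None if even the lowest tested load fails the goal.
    """
    best = None
    for load in sorted(results):
        if results[load][percentile_key] > limit_ms:
            break  # hit the pole: stop at the first failing load
        best = load
    return best


# Made-up numbers, for illustration only (not data from the talk).
SYSTEM_A = {5_000: {"p99.9": 8.0}, 10_000: {"p99.9": 14.0},
            20_000: {"p99.9": 40.0}, 50_000: {"p99.9": 120.0}}
SYSTEM_B = {5_000: {"p99.9": 2.0}, 10_000: {"p99.9": 3.0},
            20_000: {"p99.9": 5.0}, 50_000: {"p99.9": 9.0}}

if __name__ == "__main__":
    a = max_sustainable_load(SYSTEM_A, "p99.9", limit_ms=10.0)
    b = max_sustainable_load(SYSTEM_B, "p99.9", limit_ms=10.0)
    print(f"A carries {a} ops/sec, B carries {b} ops/sec within the goal")
```

With a goal of three 9's below 10 milliseconds, the answer is a load per system, which translates directly into how many machines are needed.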
Not Synced

You can do a lot of other things,
these are just demonstrations
Not Synced

of how to look at data sets.
Not Synced

You can measure at a lot of levels.
Not Synced

You can look for systemic behaviors.
Not Synced

For example, this is one system, but
at varying levels.
Not Synced

You can sort of see that as you increase
the load, the percentiles move to the left.
Not Synced

That's a good observation.
Not Synced

It's not all systems that'll do it, but
for this system it'll be that.
Not Synced

You can also see that even though
this didn't totally collapse,
Not Synced

it's completely out of whack with
the rest,
Not Synced

so that kind of tells you let's not
look there.
Not Synced

So throw away the behavior...
Not Synced

You just know not to go to 80.
Not Synced

No need to study it much.
Not Synced

Now that's the remaining set.
Not Synced

You could look at that.
Not Synced

You could look at the set from the other
system and compare them.
Not Synced

Maybe put them next to
each other like this.
Not Synced

Or if you actually can fit enough lines,
with enough colors on a chart,
Not Synced

you can try and do stuff like that.
Not Synced

These are all good ways to actually
look at latencies,
Not Synced

actually study them.
Not Synced

And notice that in all these cases,
I didn't pick a number.
Not Synced

"Oh, let's compare the 99.9 percentile,"
because I won't get
Not Synced

any feeling for the shapes if I did that.
Not Synced

You want to look at the entire spectrum.
Not Synced

And that is what an HDR histogram
is very good for.
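
As a rough sketch of what "the entire spectrum" means, the snippet below reports a latency at every level of nines instead of one chosen percentile. It is not the HdrHistogram API: it assumes all raw samples are kept in memory and uses a plain sorted list with a nearest-rank percentile, and the function names and default levels are hypothetical.

```python
import math


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending, non-empty list (pct in 0..100)."""
    # round() guards against floating-point noise such as 999.0000000000001.
    rank = math.ceil(round(pct * len(sorted_samples) / 100.0, 9))
    return sorted_samples[max(1, rank) - 1]


def percentile_spectrum(samples, levels=(50, 90, 99, 99.9, 99.99, 100)):
    """Return [(percentile, latency)] across the levels, ending with the max."""
    ordered = sorted(samples)
    return [(p, percentile(ordered, p)) for p in levels]


if __name__ == "__main__":
    # Illustrative samples only: mostly 1 ms, with a few slow outliers.
    latencies_ms = [1.0] * 9_980 + [20.0] * 15 + [500.0] * 5
    for pct, value in percentile_spectrum(latencies_ms):
        print(f"{pct:>6}%  {value:8.1f} ms")
```

Plotting those pairs for each configuration shows the shape of the curve, which a single number cannot.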
Not Synced

So, you know... You get those.
Not Synced

Now...
Not Synced

Wow, we're actually doing okay on time.
Not Synced

Now, this is one of my favorite ways to
depict things.
Not Synced

Remember I told you that if you don't plot
the max, what are you hiding?
Not Synced

It turns out that if you plot the max,
Not Synced

usually it's the number one
signal to look at over time,
Not Synced

these are just those two systems.
Not Synced

And with a simple visual, you get
a great intuition.
Not Synced

Same load, one of them's noisy, one's not.
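
A max-over-time view like this needs very little code. The sketch below assumes the samples are available as `(timestamp_seconds, latency_ms)` pairs; the function name and the one-second default interval are choices made for the example, not anything prescribed in the talk.

```python
from collections import defaultdict


def max_per_interval(samples, interval_seconds=1.0):
    """Worst latency seen in each time interval.

    samples: iterable of (timestamp_seconds, latency_ms) pairs.
    Returns [(interval_start_seconds, max_latency_ms)] sorted by time.
    Intervals with no samples are simply absent from the result.
    """
    worst = defaultdict(float)
    for timestamp, latency in samples:
        bucket = int(timestamp // interval_seconds)
        worst[bucket] = max(worst[bucket], latency)
    return [(bucket * interval_seconds, worst[bucket]) for bucket in sorted(worst)]


if __name__ == "__main__":
    # Illustrative samples only.
    demo = [(0.1, 1.0), (0.9, 5.0), (1.2, 2.0), (2.5, 40.0), (2.6, 3.0)]
    for start, worst_ms in max_per_interval(demo):
        print(f"t={start:4.1f} s  max={worst_ms:6.1f} ms")
```

Charting that series for two systems under the same load makes a noisy one stand out immediately.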
Not Synced

You can look at the response time
and service time,
Not Synced

and all of the numbers of different
samples of percentiles,
Not Synced

but if you actually want to show a CEO
something,
Not Synced

this is a pretty good thing to show them.
Not Synced

"Look what I did over the weekend."
Not Synced

"Before the weekend it looked like that,
and I fixed it."
Not Synced

"I deserve a prize."
Not Synced

With that, a simple thing to remember
is that this is your load on system A,
Not Synced

this is your load on system B.
Not Synced

Any questions?
Not Synced

This is from an anti-drug commercial
in the 80's,
Not Synced

I don't know if anybody can remember.
Not Synced

So with that, we're ready
for any questions.
Not Synced

Any questions?
Not Synced

Wow, that bad?
Not Synced

(Laughing) Dreadful.
Not Synced

Okay, I have one here, and
one back there.
Not Synced

Let's start with the back.
Not Synced

(Audience Member): You said that there are
all these tools that you could use
Not Synced

that give you reasonable numbers,
and reasonable answers as far as
Not Synced

latency is concerned, so what are those
tools that you use?
Not Synced

So the question was, there are a couple of
tools I mentioned that could give you
Not Synced

better information, and I used some
to chart here,
Not Synced

let me see, there are a lot of tools.
Not Synced

I used HDR histogram to plot
all these charts
Not Synced

with the continuous percentile curves.
Not Synced

I highly recommend you look at using it.
Not Synced

Just go to hdrhistogram.org
and read stuff.
Not Synced

Or google it.
Not Synced

There's a bunch of people using it.
Not Synced

And the basic thing it does, is that it
gives you a tool that
Not Synced

allows you a practical way to have
this kind of
Not Synced

fidelity, dynamic range, and resolution
to even look at the shapes.
Not Synced

The other way to do it is to keep
all the data.
Not Synced

You don't have to have histograms
if you kept every single result,
Not Synced

but in many places that's not practical,
or makes it harder for the system to run.
Not Synced

If you can do that, that's even better.
Not Synced

And then run it through an HDR histogram
for analysis.
Not Synced

So that's as far as viewing things.
Not Synced

If you have data viewing it.
Not Synced

Unfortunately, HDR histogram is not
going to make the data good.
Not Synced

It's just going to show you
the data you have.
Not Synced

One of the things I would highly
recommend you try to do,
Not Synced

I'm going backwards, and hopefully
I'll hit what I wanted.
Not Synced

I highly recommend you look at your
data sets, and remember
Not Synced

that in this visual,
Not Synced

one strong tip I will give you, is that
any time you see a vertical
Not Synced

rise like that, you have a 99.9% chance
of looking at coordinated omission.
Not Synced

This is what coordinated omission
looks like.
Not Synced

There's a couple of other things that
can also look like that.
Not Synced

I haven't seen them in a while, but
I can make them artificially happen,
Not Synced

so it's not conclusive that this is
coordinated omission,
Not Synced

but suspect it.
Not Synced

Suspect it hard.
Not Synced

So if you plot your data with
coordinated omission,
Not Synced

you will get a view of whether or not
you have this other problem.
Not Synced

But honestly, there's a much simpler
way to do it.
Not Synced

Run your control Z test, and see
if you have the problem.
Not Synced

This will just show you how it works.
Not Synced

A non-omitted, a sane response time
test, or latency test,
Not Synced

tends to have these more smooth humps
of curves transitioning between numbers.
Not Synced

Any vertical rise tends to
indicate omission.
Not Synced

So that's one thing there.
Not Synced

As far as the tools actually
measuring correctly,
Not Synced

remember I told you what the name of
the talk is,
Not Synced

so let me rattle off some tools.
Not Synced

Actually, let's do this.
Not Synced

You guys measure stuff here,
I assume.
Not Synced

Could you rattle off some tools
that you use?
Not Synced

What do you use for load generation
and measurement right now?
Not Synced

Volunteers?
Not Synced

JMeter?
Not Synced

Okay, JMeter.
Not Synced

Gatling.
Not Synced

Anybody else?
Not Synced

Okay, anybody with Grinder, wrk,
some of the commercial...
Not Synced

Oh, well yeah.
Not Synced

Gatling is the only tool I know of
right now, that is an actual tool
Not Synced

people use, not a demo, that has fixed
a coordinated omission
Not Synced

problem in its measurement.
Not Synced

There was actually a bug filed against it,
Not Synced

and the coordinated omission in it
was fixed.
Not Synced

It is actually possible to perfectly
fix this.
Not Synced

You don't have to correct or guess,
you can actually correctly compute
Not Synced

the exact response time in any load
generator on earth,
Not Synced

if you just do it right.
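
A minimal sketch of the difference, assuming that "doing it right" means timing each request from the moment it was supposed to be sent on the planned schedule rather than from the moment it actually went out. The single-connection model, the function name, and the numbers are all hypothetical.

```python
def run_schedule(service_times, interval):
    """Simulate a single-connection load generator meant to send every `interval` seconds.

    service_times: how long the server takes for each request, in seconds.
    Returns (service, response), where service is timed from the actual send
    and response is timed from the intended send.
    """
    service, response = [], []
    free_at = 0.0  # when the previous request finished
    for i, work in enumerate(service_times):
        intended = i * interval
        actual = max(intended, free_at)  # cannot send until the previous one is done
        done = actual + work
        service.append(done - actual)     # what an omitting tool reports
        response.append(done - intended)  # what a caller on schedule would experience
        free_at = done
    return service, response


if __name__ == "__main__":
    # Illustrative run: 200 requests at 100 per second, 1 ms each, one 1-second stall.
    times = [0.001] * 200
    times[10] = 1.0
    service, response = run_schedule(times, interval=0.01)
    print("service times over 0.5 s: ", sum(1 for s in service if s > 0.5))
    print("response times over 0.5 s:", sum(1 for r in response if r > 0.5))
```

Timed from the actual send, the stall shows up as one bad sample. Timed from the intended send, every request that was held up behind it is counted as slow too.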
Not Synced

All the other tools,
Not Synced

JMeter, Grinder, wrk,
Not Synced

the commercial tools that I won't mention,
Not Synced

they all do this wrong, unfortunately.
Not Synced

Cassandra stress, YCSB.
Not Synced

Oh, I'll take it back,
Not Synced

there's one more tool.
Not Synced

YCSB that is now on GitHub has been
fixed by my colleague,
Not Synced

and added avoidance of
coordinated omission.
Not Synced

So if you use the one on GitHub,
YCSB is actually correct now.
Not Synced

But all the previous parts are wrong.
Not Synced

Now, that's the really bad news,
Not Synced

because there are very few tools any of
you use that actually work right.
Not Synced

Somebody here mentioned wrk2.
Not Synced

wrk2 is something I built by taking wrk,
Not Synced

which is a really cool load generator and
just fixing its measurement technique,
Not Synced

and adding a rate limiter to it, so you
can actually measure without
Not Synced

hitting the pole all the time.
Not Synced

I did that as a demonstration,
Not Synced

and the tool is useful probably.
Not Synced

People actually use it, and I think that
some people have actually
Not Synced

forked it and went further with it.
Not Synced

But I'm not maintaining it.
Not Synced

I'm not paying much attention to it,
so if you want to take it over, go ahead.
Not Synced

It's not like I can tell you there's a
good tool out there.
Not Synced

I wouldn't endorse my own demo as
the thing you need to
Not Synced

run in production necessarily.
Not Synced

For example, it's a little stale.
Not Synced

I did this almost a year ago, and wrk
added a lot of cool features since then,
Not Synced

I wish it was more merged.
Not Synced

I similarly created Cassandra-Stress 2,
Not Synced

which is what generated all of these,
Not Synced

which is Cassandra-Stress, corrected.
Not Synced

It reports both response time,
and service time.
Not Synced

What it used to call latency
was service time.
Not Synced

So you see the data, and this is
what it would report,
Not Synced

and now it reports both.
Not Synced

Again, right now I would consider that
a cool demo,
Not Synced

not something I would say, "Go do this,
and I'll help you if it goes wrong."
Not Synced

I'm hoping to affect other people
actually doing it right.
Not Synced

And I'd appreciate any help from people
that want to do that too.
Not Synced

Unfortunately, to the original question,
Not Synced

there are tools that'll show you the data,
Not Synced

there are tools that'll hint that
the data is really bad.
Not Synced

As far as having better data,
Not Synced

that's a much harsher answer.
Not Synced

You're in trouble.
Not Synced

That's the reality, and I'm sorry.
Not Synced

There was a question up here?
Not Synced

(Audience Member): "I'm actually having
trouble phrasing that question,
Not Synced

so I may just want to come talk to
you after."
Not Synced

Okay, so the question will be later.
Not Synced

Any others?
Not Synced

Okay, well thanks everyone.
Not Synced

Hopefully this was useful.
Not Synced

(Applause)

Title:: "How NOT to Measure Latency" by Gil Tene
Video Language:: English
Team:: Captions Requested
Duration:: 42:59

	Joseph Wickham edited English subtitles for "How NOT to Measure Latency" by Gil Tene
