"How NOT to Measure Latency" by Gil Tene

  • 0:05 - 0:08
    Hi everyone, I'm Gil Tene.
  • 0:08 - 0:12
    I'm going to be talking about this subject
    that I call "How NOT to Measure Latency".
  • 0:14 - 0:17
    It's a subject that I've been talking
    about now for 3 years or so.
  • 0:17 - 0:20
    I keep the title and change all
    the slides every time.
  • 0:21 - 0:22
    A bunch of this stuff is new.
  • 0:22 - 0:28
    So if you've seen any of my previous "How NOT to",
    you'll see only some things that are common.
  • 0:28 - 0:31
    A nickname for the subject is this...
  • 0:31 - 0:36
    Because I often will get that reaction
    from some people in the audience.
  • 0:37 - 0:39
    Ever since I've told people that it's a
    nickname,
  • 0:39 - 0:42
    They feel free to actually exclaim,
    "Oh S@%#!".
  • 0:42 - 0:44
    And feel free to do that here in this talk.
  • 0:45 - 0:47
    I'll prompt you in a couple of places
    where it is natural.
  • 0:48 - 0:51
    But if you just have the urge, go ahead.
  • 0:51 - 0:53
    Ummm...
  • 0:53 - 0:55
    So just a tiny bit about me.
  • 0:55 - 0:57
    I am the co-founder of Azul Systems.
  • 0:58 - 1:00
    I play around with garbage collection a lot.
  • 1:00 - 1:03
    Here is some evidence of me playing around
    with garbage collection in my kitchen.
  • 1:03 - 1:05
    That's a trash compactor.
  • 1:05 - 1:10
    The compaction function wasn't working right,
    so I had to fix it.
  • 1:10 - 1:17
    I thought it'd be funny to take a picture
    with a book.
  • Not Synced
    I've also built a lot of things.
  • Not Synced
    I've been playing with computers since
    the early 80's.
  • Not Synced
    I've built hardware.
  • Not Synced
    I've helped design chips.
  • Not Synced
    I've built software at many
    different levels.
  • Not Synced
    Operating systems, drivers...
    JVMs, obviously.
  • Not Synced
    And lots of big systems at the system level.
  • Not Synced
    Built our own app server in the late 90's
    because WebLogic wasn't around yet.
  • Not Synced
    So, I've made a lot of mistakes,
    and I've learned from a few of them.
  • Not Synced
    This is actually a combination of a bunch
    of those mistakes looking at latency.
  • Not Synced
    I do have this hobby of depressing people
    by pulling the wool up from over your eyes,
  • Not Synced
    and this is what this talk is about.
  • Not Synced
    So, I need to give you a choice right here.
  • Not Synced
    There's the door.
  • Not Synced
    You can take the blue pill,
    and you can leave.
  • Not Synced
    Tomorrow you can keep believing whatever
    it is you want to believe.
  • Not Synced
    But if you stay here and take the red pill,
    I will show you a glimpse of how
  • Not Synced
    far down the rabbit hole goes,
    and it will never be the same again.
  • Not Synced
    Let's talk about latency.
  • Not Synced
    And when I say latency, I'm talking about
    latency, response time, any of those things
  • Not Synced
    where you measure time from 'here to here',
    and you're interested in how long it took.
  • Not Synced
    We do this all the time, but I see a lot
    of mish-mash in how people
  • Not Synced
    treat the data, or think about it.
  • Not Synced
    Latency is basically the time it took
    something to happen once.
  • Not Synced
    That one time, how long did it take.
  • Not Synced
    And when we measure stuff, like we did
    a million operations in the last hour,
  • Not Synced
    we have a million latencies. Not one,
    we have a million of them.
  • Not Synced
    Our actual goal is to figure out how to
    describe that million.
  • Not Synced
    How did the million behave?
  • Not Synced
    For example, 'they're all really good, and
    they're all exactly the same', would be a
  • Not Synced
    behavior that you will never see,
    but that would be a great behavior.
  • Not Synced
    So we need to talk about how things behave,
    communicate, think, evaluate,
  • Not Synced
    set requirements for, talk to other people,
    but these are all common things around that.
  • Not Synced
    To do that, we have to describe the
    distribution, the set, the behavior,
  • Not Synced
    but not the one.
  • Not Synced
    For example, the behavior that says
    "the common case was x" is a piece of
  • Not Synced
    information about the behavior,
    but it's a tiny sliver.
  • Not Synced
    Usually the least relevant one.
  • Not Synced
    Well, there's some less relevant ones,
    but not a strongly relevant one,
  • Not Synced
    and one that people often focus on.
  • Not Synced
    To take a look at what we actually do
    with this stuff, almost on a daily basis,
  • Not Synced
    this is a snapshot from a monitoring system.
  • Not Synced
    A small dashboard on a big screen
    in a monitoring system.
  • Not Synced
    Where you're watching the response time of
    a system over time.
  • Not Synced
    This is a two hour window.
  • Not Synced
    These lines that are 95th percentile,
    90, 75, 50, and 25th percentiles,
  • Not Synced
    you can look at how they behave over time.
  • Not Synced
    We're a small audience here, if you look at
    this picture, what draws your eye?
  • Not Synced
    What do you want to go investigate here
    or pay attention to?
  • Not Synced
    It's the big red spike there, right?
  • Not Synced
    So we could look at the red spike,
    cause it's different,
  • Not Synced
    and say, "Woah, the 95th percentile shot up
    here. And look, the 90th percentile
  • Not Synced
    shot up at about the same time.
  • Not Synced
    The rest of them didn't shoot up,
    so maybe something happened here
  • Not Synced
    that affected that much, I should probably
    pay attention to it
  • Not Synced
    because it's a monitoring system, and
    I like things to be calm."
  • Not Synced
    You could go investigate the why.
  • Not Synced
    At this point, I've managed to waste
    about 90 seconds of your life,
  • Not Synced
    looking at a completely meaningless chart,
    which unfortunately you do
  • Not Synced
    every day, all the time.
  • Not Synced
    This chart is the chart you want to show
    somebody if you want to
  • Not Synced
    hide the truth from them.
  • Not Synced
    If you want to pull the wool
    over their eyes.
  • Not Synced
    This is the chart of the good stuff.
  • Not Synced
    What's not on this chart?
  • Not Synced
    The 5% worst things that happened during
    this two hours.
  • Not Synced
    They're not here.
  • Not Synced
    This is only the good things that happened
    during the things.
  • Not Synced
    And to get this spike, that 5% had to be
    so bad that it even pulled
  • Not Synced
    the 95th percentile all up.
  • Not Synced
    There is zero information here at all about
    what happened bad during this two hours,
  • Not Synced
    which makes it a bad fit for
    a monitoring system.
  • Not Synced
    It's a really good thing for
    a marketing system.
  • Not Synced
    It's a great way to get the bonus from your boss, even though you didn't do the work.
  • Not Synced
    If you want to learn how to do that,
    we can do another talk about that.
  • Not Synced
    But this is not a good way to look at latency.
  • Not Synced
    It's the opposite of good.
  • Not Synced
    Unfortunately, this is one of the most
    common tools used for
  • Not Synced
    server monitoring on earth right now.
  • Not Synced
    That's where the snapshot is from,
    and this is what people look at.
  • Not Synced
    I find this chart to be a goldmine
    of information.
  • Not Synced
    When I first showed it in another talk
    like this, I had this really cool experience.
  • Not Synced
    Somebody came up to me and said, "Hey,
    as I was sitting here, I was texting one
  • Not Synced
    of our guys, and he was saying,
  • Not Synced
    'look, we have this issue with
    our 95th percentile'."
  • Not Synced
    And I got this chart from him!
  • Not Synced
    So I went and said, "Hey, what does the
    rest of the spectrum look like?"
  • Not Synced
    This is the actual chart they got.
  • Not Synced
    And when they look at the rest of the
    spectrum, it looked like that.
  • Not Synced
    That's what was hiding.
  • Not Synced
    Notice the scales are a little different.
  • Not Synced
    That yellow line is that yellow line.
  • Not Synced
    So that's a much more representative number.
  • Not Synced
    Is it? Is that good enough?
  • Not Synced
    That's the 99th percentile.
  • Not Synced
    We still have another 1% of really bad
    stuff that's hiding above the blue line.
  • Not Synced
    I wonder how big that is?
  • Not Synced
    I don't know because he didn't have the data.
  • Not Synced
    So a common problem that we have is that
    we only plot what's convenient.
  • Not Synced
    We only plot what gives us nice,
    colorful graphs.
  • Not Synced
    And often, when we have to choose between
    the stuff that hides the rest of the data,
  • Not Synced
    and the stuff that is noise, we choose
    the noise to display.
  • Not Synced
    I like to rant about latency.
  • Not Synced
    This is from a blog that I don't write
    enough in, but the format for it was simple.
  • Not Synced
    I tweet a single tweet about latency,
    latency tip of the day,
  • Not Synced
    and then I rant about my own tweet.
  • Not Synced
    As an example, this chart is a goldmine
    of information because it has so many
  • Not Synced
    different things that are wrong in it,
    but we won't get into all of them.
  • Not Synced
    You can read it online.
  • Not Synced
    Anyway, this is one to take away from
    what we just said.
  • Not Synced
    If you are not measuring and showing the
    maximum value, what is it you are hiding?
  • Not Synced
    And from whom?
  • Not Synced
    If your job is to hide the truth from
    others, this is a good way to do it.
  • Not Synced
    But if you actually are interested in what's
    going on, the number one indicator
  • Not Synced
    you should never get rid of is the
    maximum value.
  • Not Synced
    That is not noise, that is the signal.
  • Not Synced
    The rest of it is noise.
  • Not Synced
    Okay, let's look at this chart for some
    more cool stuff.
  • Not Synced
    I'm gonna zoom in to a small part
    of the chart, and ask you what that means.
  • Not Synced
    What does the average of the 95th percentile
    over 2 hours mean?
  • Not Synced
    What is the math that does that?
  • Not Synced
    What does it do?
  • Not Synced
    Let's look at that, and I'll give you
    an example with another percentile.
  • Not Synced
    The 100th percentile. The max, right?
  • Not Synced
    Let's take a data set.
  • Not Synced
    Suppose this was the maximum every minute
    for 15 minutes.
  • Not Synced
    What does it mean to say that the average
    max over the last 15 minutes was 42?
  • Not Synced
    I specifically chose the data to
    make that happen.
  • Not Synced
    It's a meaningless statement.
  • Not Synced
    It's a completely meaningless statement.
  • Not Synced
    But when you see 95th percentile,
    average 184, you think that the 95th
  • Not Synced
    percentile for the last two hours
    was around 184.
  • Not Synced
    It makes you think that.
  • Not Synced
    Putting this on a piece of paper is not
    just noise and irrelevant,
  • Not Synced
    it's a way to mislead people.
  • Not Synced
    It's a way to mislead yourself, because
    you'll start to believe your own mistruths.
  • Not Synced
    This is true for any percentile.
  • Not Synced
    There is no percentile that you could do
    this math on.
  • Not Synced
    Another tip, you cannot average percentiles.
  • Not Synced
    That math doesn't happen.
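
To make the point about averaging concrete, here is a minimal sketch in Python. The latency values and the three reporting intervals are hypothetical, chosen only to show that the average of per-interval 95th percentiles is not the 95th percentile of the combined data.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer math
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds for three equal reporting intervals.
intervals = [
    [10] * 100,
    [10] * 100,
    [10] * 80 + [1000] * 20,  # one interval with a bad patch
]

per_interval_p95 = [percentile(chunk, 95) for chunk in intervals]
average_of_p95 = sum(per_interval_p95) / len(per_interval_p95)

# The real 95th percentile has to be computed over all the samples together.
pooled = [v for chunk in intervals for v in chunk]
true_p95 = percentile(pooled, 95)

print("per-interval p95:", per_interval_p95)
print("average of the p95 values:", average_of_p95)
print("p95 of all samples:", true_p95)
```

With this made-up data the averaged figure is 340 ms, a latency that no request ever had, while the 95th percentile of the whole window is 1000 ms.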
  • Not Synced
    But percentiles do matter. You really
    want to know about them.
  • Not Synced
    And a common misperception is that we want
    to look at the main part of the spectrum,
  • Not Synced
    not those outliers and perfection stuff.
  • Not Synced
    Only people that actually bet their house
    every day, or the bank on it,
  • Not Synced
    need to know about the "five-nine's",
    and all those.
  • Not Synced
    The 99th percentile is a pretty
    good number.
  • Not Synced
    Is 99% really rare?
  • Not Synced
    Let's look at some stuff, because we can
    ask questions like, "If I were looking
  • Not Synced
    at a webpage, what is the chance of me
    hitting the 99th percentile?"
  • Not Synced
    Of things like this: a search engine node,
    or a key value store,
  • Not Synced
    or a database, or a CDN, right?
  • Not Synced
    Because they will report their 99th percentile.
  • Not Synced
    They won't tell you anything above that,
    but how many of the
  • Not Synced
    webpages that we go to
    actually experience this?
  • Not Synced
    You want to say 1%, right?
  • Not Synced
    Well, I went to some webpages and I counted
    how many HTTP requests were generated
  • Not Synced
    by one click into that webpage,
    and here are the numbers.
  • Not Synced
    I did that about a year ago.
  • Not Synced
    They've probably gone up since then.
  • Not Synced
    Now that translates into this math.
  • Not Synced
    This is the likelihood of one click seeing
    the 99th percentile.
  • Not Synced
    And the only page where that is less than
    50% is the clean Google search page.
  • Not Synced
    Where only a quarter will see the
    99th percentile.
  • Not Synced
    The 99th percentile is the thing that most
    of your webpages will see.
  • Not Synced
    Most of them will be there.
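
The math behind that claim can be sketched as follows. This assumes each request is an independent draw from the latency distribution, which is a simplification, and the request counts in the loop are hypothetical rather than the ones measured for the slide.

```python
def chance_of_seeing_tail(requests_per_click, percentile=99.0):
    """Probability that at least one request is worse than the given percentile."""
    p_within = percentile / 100.0
    return 1.0 - p_within ** requests_per_click

# Hypothetical request counts per page load.
for n in (1, 30, 69, 200):
    print(f"{n:4d} requests -> {chance_of_seeing_tail(n):.1%}")
```

Under this assumption a page needs only 69 requests before more than half of its loads include something worse than the 99th percentile, and around 30 requests already gives roughly a quarter.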
  • Not Synced
    Now, we could look at other things.
  • Not Synced
    We can pick which things to focus on.
  • Not Synced
    Let's say I had to pick between the 95th
    percentile, and the three 9's (99.9%).
  • Not Synced
    The three 9's is way into perfection mode
    for most people, or they think.
  • Not Synced
    Which one of those represents our
    community better?
  • Not Synced
    Our population?
  • Not Synced
    Our users?
  • Not Synced
    Our experience?
  • Not Synced
    Let's run a hypothetical.
  • Not Synced
    Suppose we don't have that many pages,
    and that many resources like we said before.
  • Not Synced
    We'll be much more conservative.
  • Not Synced
    A user session will only go through five
    clicks, and each click will only bring up
  • Not Synced
    up to 40 things.
  • Not Synced
    A lot less, and they're all as clean
    as the Google page.
  • Not Synced
    How many of the users will not experience
    something worse than the 95th percentile?
  • Not Synced
    Because that's what the 95th percentile
    is good for, the people who see that.
  • Not Synced
    Anybody above that, is that.
  • Not Synced
    What are the chances of not seeing it?
  • Not Synced
    That's an interesting number.
  • Not Synced
    So you're watching a number that is
    relevant to 0.003% of your users.
  • Not Synced
    99.997% of your users are going to
    see worse than this number.
  • Not Synced
    Why are you looking at it?
  • Not Synced
    Why are you spending time
    thinking about it?
  • Not Synced
    In reverse, we could say how many people
    are going to see something
  • Not Synced
    worse than the three 9's (99.9%)?
  • Not Synced
    That's going to be 18%.
  • Not Synced
    In reverse, 82% of the people will see
    the three 9's (99.9%) or better.
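
The numbers in this hypothetical can be reproduced with a few lines of Python. As before, this is a sketch that assumes independent requests; the constant names are made up for illustration.

```python
CLICKS_PER_SESSION = 5
REQUESTS_PER_CLICK = 40
REQUESTS_PER_SESSION = CLICKS_PER_SESSION * REQUESTS_PER_CLICK  # 200

def share_never_worse_than(percentile, n=REQUESTS_PER_SESSION):
    """Fraction of sessions whose requests all come in at or under the percentile."""
    return (percentile / 100.0) ** n

only_p95_or_better = share_never_worse_than(95.0)
only_p999_or_better = share_never_worse_than(99.9)

print(f"never worse than the 95th percentile: {only_p95_or_better:.4%}")
print(f"worse than 99.9% at least once: {1 - only_p999_or_better:.0%}")
print(f"99.9% or better throughout: {only_p999_or_better:.0%}")
```

This gives roughly 0.003%, 18% and 82%, matching the figures quoted in the talk.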
  • Not Synced
    That's a slightly better representation.
  • Not Synced
    Probably not good enough either.
  • Not Synced
    We could look at some more math with them,
    same kind of scenario.
  • Not Synced
    What percentile of HTTP response time
    will be the thing that 95%
  • Not Synced
    of people experience in this scenario?
  • Not Synced
    It's the 99.97 percentile that 95%
    of people see.
  • Not Synced
    And if you want to know what 99%
    of the people see,
  • Not Synced
    that's four and a half 9's (99.995%).
  • Not Synced
    You want to know that number from Akamai
    if you want to predict what 1%
  • Not Synced
    of your users are going to experience.
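
The same scenario can be run in reverse. This sketch assumes 200 independent requests per session (5 clicks of 40 requests), and the function name is hypothetical.

```python
def percentile_needed(share_of_users, requests_per_session=200):
    """Per-request percentile that the given share of sessions never exceeds."""
    return 100.0 * share_of_users ** (1.0 / requests_per_session)

for share in (0.95, 0.99):
    print(f"{share:.0%} of users -> {percentile_needed(share):.3f}th percentile")
```

The results are about 99.974 and 99.995, which is where the 99.97 and "four and a half 9's" figures come from.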
  • Not Synced
    When you know the 99th percentile,
    you kind of know a tiny bit.
  • Not Synced
    So here's another tip.
  • Not Synced
    And this is not an exaggeration,
    by the way.
  • Not Synced
    The median, which is a much smaller
    percentile, has that minuscule a chance
  • Not Synced
    of ever being the number that
    anybody experiences.
  • Not Synced
    This is the chance of getting worse
    than the median.
  • Not Synced
    Which makes the median an irrelevant
    number to look at.
  • Not Synced
    Unfortunately, it's probably the most
    common one looked at.
  • Not Synced
    When people say "the typical",
    they look at the thing that
  • Not Synced
    everything will be worse than.
  • Not Synced
    Okay, I'm sorry about that part.
  • Not Synced
    We'll do some other parts.
  • Not Synced
    Now, why is it that when we look at these
    monitoring systems, we don't see
  • Not Synced
    data with a lot of 9's?
  • Not Synced
    Why do we stop at the
    90, 95, 99th percentile?
  • Not Synced
    Why don't we look further?
  • Not Synced
    Now, some of it is because people think,
    "Well that's perfection, I don't need it."
  • Not Synced
    The other part is that it's hard.
  • Not Synced
    It's hard because you can't
    average percentiles.
  • Not Synced
    We already talked about that.
  • Not Synced
    But you also can't derive your
    five 9's (99.999%) out of a lot
  • Not Synced
    of 10 second samples of percentiles.
  • Not Synced
    And the reason for that is, "Hey, in 10
    seconds, maybe I only had 1,000 things."
  • Not Synced
    I could take all the 10 seconds in the
    world, there's no way to say what the
  • Not Synced
    hour five 9's (99.999%) were, what the
    minutes five 9's were
  • Not Synced
    if I'm collecting just this data.
  • Not Synced
    And unfortunately, the data being collected
    and reported to the back ends of monitoring
  • Not Synced
    is usually summarized at a second,
    5 seconds, 10 seconds, etc.
  • Not Synced
    Basically throwing away all the good data,
    and leaving you with absolutely no way
  • Not Synced
    to compute large 9's for longer
    periods of time.
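
A rough sketch of why summaries lose the tail while counts do not. The data is hypothetical: 1,000 reporting intervals of 1,000 requests each, with 20 slow requests in one interval. A plain `Counter` of exact values stands in for a real histogram here; it is not how any particular monitoring tool stores data.

```python
import math
from collections import Counter

def percentile_from_counts(counts, p):
    """Nearest-rank percentile (p in 0..100) from a {value: count} histogram."""
    total = sum(counts.values())
    rank = max(1, math.ceil(p / 100.0 * total))
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value
    return max(counts)

# Hypothetical per-interval histograms of latency in milliseconds.
intervals = [Counter({1: 1000}) for _ in range(999)]
intervals.append(Counter({1: 980, 500: 20}))

# Per-interval p99 summaries: 999 of them say 1 ms and one says 500 ms.
summaries = [percentile_from_counts(c, 99) for c in intervals]

# Counts can be added together, so the whole window stays queryable.
merged = Counter()
for c in intervals:
    merged.update(c)

print("distinct p99 summaries:", sorted(set(summaries)))
print("window p99:", percentile_from_counts(merged, 99))
print("window p99.999:", percentile_from_counts(merged, 99.999))
```

Nothing in the list of summaries says whether the five 9's for the whole window is 1 ms or 500 ms. The merged counts answer it directly.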
  • Not Synced
    So, this is where you might want to look
    at HDR Histogram.
  • Not Synced
    It's an open source thing I created
    a few years ago.
  • Not Synced
    I did it in Java, and now there are
    C, C#, Python, Erlang,
  • Not Synced
    and Go ports of this that I didn't create.
  • Not Synced
    And it lets you actually get an entire
    percentile spectrum.
  • Not Synced
    Some of you here I know are
    already using it.
  • Not Synced
    And you can look at all the percentiles.
  • Not Synced
    Any number of 9's that's in the data, if
    you just keep it right and report it right,
  • Not Synced
    it's got a log format, you can
    store things forever.
  • Not Synced
    Well, for a long time.
  • Not Synced
    Okay, so it lets you have nice things.
  • Not Synced
    Enough for that advertisement.
  • Not Synced
    Now, latency... Well, I think this is
    slightly out of order.
  • Not Synced
    Yeah, sorry.
  • Not Synced
    This is the red/blue pill part, so I warn
    you, this is your last chance.
  • Not Synced
    There's a problem I call the
    coordinated omission problem.
  • Not Synced
    The coordinated omission problem is
    basically a conspiracy.
  • Not Synced
    It's a conspiracy that we're all part of.
  • Not Synced
    I don't think anybody actually meant
    to do it, but once I've noticed it,
  • Not Synced
    everywhere I look, there it is.
  • Not Synced
    Now, I've been using a specific way of
    showing you numbers so far.
  • Not Synced
    Has anybody here noticed how
    I spell percentile?
  • Not Synced
    (Audience Member): "You put lie at the
    end of the percent sign."
  • Not Synced
    Yeah, good.
  • Not Synced
    So the coordinated omission problem is the
    "lie" in %lies.
  • Not Synced
    And this is how it works.
  • Not Synced
    One common way to do this is
    to use a load generator.
  • Not Synced
    Pretty much all load generators
    have this problem.
  • Not Synced
    There are two that I know of that don't.
  • Not Synced
    What you do with a load generator,
    is you test.
  • Not Synced
    You issue requests, or send packets.
  • Not Synced
    And you measure how long something took.
  • Not Synced
    And as long as the numbers go right,
    measure them, put them in a bucket,
  • Not Synced
    study them later, and get your
    percentiles from it.
  • Not Synced
    But what if the thing that you are
    measuring took longer than the time
  • Not Synced
    it would've taken until you send
    the next thing?
  • Not Synced
    You're supposed to send something
    every second,
  • Not Synced
    but this one took a second and a half.
  • Not Synced
    Well you've got to wait before
    you send the next one.
  • Not Synced
    You just avoided measuring something
    when the system was problematic.
  • Not Synced
    You've coordinated with it.
  • Not Synced
    You weren't looking at it then.
  • Not Synced
    That's common scenario A: You've backed
    off, and avoided measuring when it was bad.
  • Not Synced
    Another way, is you measure inside your code.
  • Not Synced
    We all do this. We all have to do this,
  • Not Synced
    where we measure time, do something,
    then measure time.
  • Not Synced
    The delta between them is how long it took.
  • Not Synced
    We can then put it in a stats bucket,
    and then do the percentiles in that.
  • Not Synced
    Unfortunately, if the system freezes right
    here, for any reason,
  • Not Synced
    an interrupt, a context switch,
  • Not Synced
    a cache buffer flushed to disk,
  • Not Synced
    a garbage collection,
  • Not Synced
    a re-indexing of your database,
    this is a database.
  • Not Synced
    This is Cassandra by the way,
    measuring itself.
  • Not Synced
    In any of the above, then you will
    have one bad report
  • Not Synced
    while 10,000 things are waiting in line.
  • Not Synced
    And when they come in, they will look
    really, really good.
  • Not Synced
    Even though each one of them has had
    a really bad experience.
  • Not Synced
    It can even get worse, where maybe the
    freeze happened outside the timing,
  • Not Synced
    and you won't even know there was a freeze.
  • Not Synced
    Now these are examples of omitting data
    that is bad on a very selective basis.
  • Not Synced
    It's not random sampling.
  • Not Synced
    It's, "I don't like bad data",
  • Not Synced
    or "I couldn't handle it",
  • Not Synced
    or "I don't know about it",
  • Not Synced
    so we'll just talk about the good.
  • Not Synced
    What does that do to your data?
  • Not Synced
    Because it often makes people feel like,
  • Not Synced
    "Okay, yeah, I understand,
    but it's a little bit of noise."
  • Not Synced
    Let's run some hypotheticals,
    and I'll show you some real numbers.
  • Not Synced
    Imagine a perfect system.
  • Not Synced
    It's doing 100 requests a second,
    at exactly a millisecond each.
  • Not Synced
    But we go and freeze the system,
    after 100 seconds of perfect operations
  • Not Synced
    for 100 seconds, and then repeat.
  • Not Synced
    Now, I'm going to describe how the system
    behaves in terms that should mean something,
  • Not Synced
    and then we'll measure it.
  • Not Synced
    If we actually wanted to describe the
    system,
  • Not Synced
    on the left we have an average
    of one millisecond by definition,
  • Not Synced
    and on the right we have an
    average of 50 seconds.
  • Not Synced
    Why 50? Because if I randomly came in
    in that 100 seconds,
  • Not Synced
    I'll get anything from 0 to 100
    with even distribution.
  • Not Synced
    The overall average over 200 seconds
    is 25 seconds.
  • Not Synced
    If I just came in here and said,
    "Surprise, how long did this take?"
  • Not Synced
    On average, it will be 25.
  • Not Synced
    I can also do the percentiles.
  • Not Synced
    50th percentile will be really good,
    and then it'll get really bad.
  • Not Synced
    The four 9's is terrible.
  • Not Synced
    This is a fair honest description of
    this system if this is what it did.
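
The honest description can be sketched in code. This assumes requests keep arriving at a constant 100 per second throughout, and that a request arriving during the freeze waits until the freeze ends and is then served in 1 ms with no extra queueing. Both are simplifying assumptions, and the names are made up.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile, p in 0..100."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(p / 100.0 * len(ordered))))
    return ordered[rank - 1]

RATE = 100         # requests per second
GOOD_S = 100       # seconds of perfect operation
FREEZE_S = 100     # seconds frozen
SERVICE_S = 0.001  # one millisecond per request

# Requests during the good period each take the service time.
good = [SERVICE_S] * (RATE * GOOD_S)
# A request arriving i/RATE seconds into the freeze waits out the remainder.
frozen = [FREEZE_S - i / RATE + SERVICE_S for i in range(RATE * FREEZE_S)]

experienced = good + frozen
mean_s = sum(experienced) / len(experienced)

print(f"average: {mean_s:.1f} s")
for p in (50, 75, 99, 99.99):
    print(f"{p}th percentile: {percentile(experienced, p):.3f} s")
```

The average comes out at about 25 seconds, the median is still one millisecond, and everything from the 75th percentile up is measured in tens of seconds.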
  • Not Synced
    And you can make the system do that.
  • Not Synced
    That's what Control-Z is good for.
  • Not Synced
    You can make any of your systems do that.
  • Not Synced
    Now let's go measure this system with
    a load generator,
  • Not Synced
    or with a monitoring system.
  • Not Synced
    The common ones.
  • Not Synced
    The ones everybody does.
  • Not Synced
    On the left, we're going to get 10,000
    results of one millisecond each.
  • Not Synced
    Great.
  • Not Synced
    And we're going to get one result of
    100 seconds.
  • Not Synced
    Wow, really big response time.
  • Not Synced
    This is our data.
  • Not Synced
    This is OUR data.
  • Not Synced
    So now you go do math with it.
  • Not Synced
    The average of that is 10.9 milliseconds.
  • Not Synced
    A little less than 25 seconds.
  • Not Synced
    And here are the percentiles.
  • Not Synced
    Your load generator monitoring system
    will tell you that this system is perfect.
  • Not Synced
    You could go to production with it.
  • Not Synced
    You like what you see.
  • Not Synced
    Look at that, four 9's.
  • Not Synced
    It is lying to you.
  • Not Synced
    To your face.
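
For comparison, here is a sketch of the statistics computed from what the coordinating load generator actually recorded: 10,000 results of one millisecond and a single result of 100 seconds.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile, p in 0..100."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(p / 100.0 * len(ordered))))
    return ordered[rank - 1]

# What the load generator recorded, in seconds.
recorded = [0.001] * 10_000 + [100.0]

mean_ms = 1000 * sum(recorded) / len(recorded)
print(f"average: {mean_ms:.2f} ms")
for p in (50, 99, 99.9, 99.99):
    print(f"{p}th percentile: {percentile(recorded, p) * 1000:.0f} ms")
print(f"max: {max(recorded):.0f} s")
```

The average lands at about 11 milliseconds, in line with the roughly 10.9 quoted above, and every percentile up to four 9's reads one millisecond. Only the maximum still shows the freeze.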
  • Not Synced
    And you can catch it doing that with a
    Control-Z test.
  • Not Synced
    But people tend to not want to do that,
    because then what are they going to do?
  • Not Synced
    If you just do that test, and calibrate
    your system, and you find it
  • Not Synced
    telling you that, about this, the next
    step should be to throw all the numbers away.
  • Not Synced
    Don't believe anything else it says.
  • Not Synced
    If it lies this big, what else did it do?
  • Not Synced
    Don't waste your time on numbers
    from uncalibrated systems.
  • Not Synced
    Now the problem here was, that if you
    want to measure the system,
  • Not Synced
    you have to measure at random rates,
    or same rates.
  • Not Synced
    If you measure 10,000 things in 100 seconds,
    there should be another 10,000 things here.
  • Not Synced
    If you measure them, you would've gotten
    all the right numbers.
  • Not Synced
    Coordinated omission is the simple act of
    erasing all that bad stuff.
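
One way to put the erased measurements back is to back-fill them when a result comes in later than the expected interval between requests. The sketch below is an assumption about how such a correction could look, similar in spirit to the expected-interval recording offered by HdrHistogram. It is not taken from the talk, and the function name is hypothetical.

```python
def record_with_expected_interval(samples, value_ms, expected_interval_ms):
    """Record a latency, plus the samples that were skipped while it was stuck."""
    samples.append(value_ms)
    # Each request that should have been sent during the stall would have
    # seen a latency one interval shorter than the one before it.
    missing = value_ms - expected_interval_ms
    while missing >= expected_interval_ms:
        samples.append(missing)
        missing -= expected_interval_ms

samples = []
for _ in range(10_000):
    record_with_expected_interval(samples, 1, 10)    # 1 ms results, one per 10 ms
record_with_expected_interval(samples, 100_000, 10)  # the single 100 s result

print(len(samples), "samples, average", sum(samples) / len(samples), "ms")
```

With the back-fill, the one 100-second result expands into the 10,000 measurements that should have been taken, and the average returns to about 25 seconds.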
  • Not Synced
    The conspiracy here is that we all do it
    without meaning to.
  • Not Synced
    I don't know who put that in our systems,
    but it happens to all of us.
  • Not Synced
    Now, I often get people saying,
    "Okay, I get it. All the numbers are wrong,
  • Not Synced
    but at least for my job where I tune
    performance, and I try to make things
  • Not Synced
    faster, I can use the numbers to figure
    out if I'm going in the right direction."
  • Not Synced
    Is it better, or is it worse? Let me
    dispel that for you for a second.
  • Not Synced
    Suppose I went and took this system,
    and improved it dramatically.
  • Not Synced
    Rather than freezing for 100 seconds,
    it will now answer every question.
  • Not Synced
    It'll take a little longer,
    5 milliseconds instead of one,
  • Not Synced
    but it's much better than freezing, right?
  • Not Synced
    So let's measure that system that we spent
    weeks and weeks improving,
  • Not Synced
    and see if it's better.
  • Not Synced
    That's the data.
  • Not Synced
    If we do the percentiles, it'll tell us
    that we just really hurt the four 9's.
  • Not Synced
    We made it go 5 times worse than before.
  • Not Synced
    We should revert this change, go back to
    that much better system we had before.
  • Not Synced
    So this is just to make sure that you
    don't think that you can have
  • Not Synced
    any intuition based on any of these numbers.
  • Not Synced
    They go backwards sometimes.
  • Not Synced
    You don't know which way is good or bad.
  • Not Synced
    And you'll never know which way is good
    or bad with a system that lies like that.
  • Not Synced
    The other cool technique is
    what I call "Cheating Twice".
  • Not Synced
    You have a constant load generator,
    and it needs to do 100 per second.
  • Not Synced
    When it woke up after 200 seconds,
    it says,
  • Not Synced
    "Woah, we're 9,999 behind.
    We've got to issue those requests."
  • Not Synced
    So it issues those requests.
  • Not Synced
    At this point, not only did it get rid of
    all the bad requests,
  • Not Synced
    it replaced every one of them with
    a perfect request.
  • Not Synced
    Coining the four 9's (99.99%), all the way
    to four and a half 9's (99.995%),
  • Not Synced
    it's twice as wrong as dropping them.
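
A small sketch of the arithmetic behind "cheating twice", using the same hypothetical system. It compares the share of recorded results that look perfect when the missed requests are dropped with the share when they are replaced by catch-up requests.

```python
GOOD_MS = 1
BAD_MS = 100_000

# Dropping: 10,000 good results plus the one that saw the freeze.
dropped = [GOOD_MS] * 10_000 + [BAD_MS]
# Catching up: the 9,999 skipped requests are issued after the freeze
# and all come back in 1 ms.
caught_up = dropped + [GOOD_MS] * 9_999

def share_good(samples):
    """Fraction of recorded results equal to the good latency."""
    return sum(1 for s in samples if s == GOOD_MS) / len(samples)

print(f"dropped:   {share_good(dropped):.5%} look perfect")
print(f"caught up: {share_good(caught_up):.5%} look perfect")
```

Dropping leaves the single bad result just past the four 9's mark. Catching up pushes it out to 99.995%, while in reality half of the requests that should have been made would have seen the freeze.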
  • Not Synced
    So these are all cool things that
    happen to you.
  • Not Synced
    I'm not going to spend much time on how
    to fix those and avoid those.
  • Not Synced
    There's a lot of other material that you
    can find with me
  • Not Synced
    talking about that, in longer talks.
  • Not Synced
    But this is pretty bad.
  • Not Synced
    And like I said...
  • Not Synced
    That should've been up there before.
  • Not Synced
    How did this repeat itself?
  • Not Synced
    Did I create a loop in the
    presentation somehow?
  • Not Synced
    I don't know how to do that.
  • Not Synced
    Let's see if I can get through here.
  • Not Synced
    Hopefully editing later will take it out.
  • Not Synced
    So we have the cheats twice.
  • Not Synced
    There, okay.
  • Not Synced
    So, after we look at coordinated
    omission that way,
  • Not Synced
    we should also look at response time,
    and service time.
  • Not Synced
    Coordinated omission, what it really is
    achieving for you, unfortunately,
  • Not Synced
    is that it makes something that you think
    is response time, and only shows you
  • Not Synced
    the service time component of latency.
  • Not Synced
    This is a simple depiction of what service
    time and response times are.
  • Not Synced
    This guy is taking a certain amount of
    time to take payment
  • Not Synced
    or make a cup of coffee.
  • Not Synced
    That's service time.
  • Not Synced
    How long does it take to do the work?
  • Not Synced
    This person has experienced
    the response time,
  • Not Synced
    which includes the amount of time they
    have to wait before they
  • Not Synced
    get to the person that does the work.
  • Not Synced
    And the difference between those
    two is immense.
  • Not Synced
    The coordinated omission problem makes
    something that you think is
  • Not Synced
    response time, only measure the
    service time,
  • Not Synced
    and basically hide the fact that things
    stalled, waited in line,
  • Not Synced
    that this guy might've taken a lunch break,
  • Not Synced
    and now we have a line around
    the building three times.
  • Not Synced
    Service time stays the same.
  • Not Synced
    This is the backwards part...
  • Not Synced
    Now, let's look at what it
    actually looks like.
  • Not Synced
    In a load generator that I fixed,
    I measured both
  • Not Synced
    response time and service time,
  • Not Synced
    this happens to be Cassandra,
  • Not Synced
    at a very low load.
  • Not Synced
    And you can see that they're very very
    similar, at a very low load.
  • Not Synced
    Why? Because there's nobody in line.
  • Not Synced
    This thing is really fast.
  • Not Synced
    We're not asking for too much.
  • Not Synced
    Cassandra's pretty fast,
    so they're the same.
  • Not Synced
    But if I increase the load, we
    start seeing gaps.
  • Not Synced
    If I increase the load a little more,
    the gap grows.
  • Not Synced
    If I increase the load a little more,
    the gap grows.
  • Not Synced
    Now this is not the failure point yet.
  • Not Synced
    If I actually increase it all the way past
    the point where the system
  • Not Synced
    can't even do the work I want,
    service time stays the same,
  • Not Synced
    response time goes through the roof.
  • Not Synced
    This was when it was 100 and something
    milliseconds, now it's 7 and a half seconds.
  • Not Synced
    Why 7 and a half seconds?
  • Not Synced
    Cause you're waiting in line that long
    to go around the block.
  • Not Synced
    The guy just can't serve as many people
    as are showing up in line, you fall behind.
  • Not Synced
    This is a virtual world reaction to this.
  • Not Synced
    I really like this slide, it's where I came
    up with the notion of a blue/red pill.
  • Not Synced
    When you actually measure reality, people
    tend to have this reaction when
  • Not Synced
    they compare the two.
  • Not Synced
    And if we actually look at these on the
    two sides of a collapse point of a system,
  • Not Synced
    this specific system can only do 87,000
    things a second.
  • Not Synced
    No matter how hard you press it,
    that's all it'll do.
  • Not Synced
    The service time on the two sides of
    the collapse looks virtually identical,
  • Not Synced
    which it would.
  • Not Synced
    But if you compare the response time,
    you have a very different picture.
  • Not Synced
    And I'm showing this picture so you get
    a feeling for what to look at
  • Not Synced
    on whether or not you're measuring
    the right one.
  • Not Synced
    Whenever you push, you try and push load
    beyond what the system can do,
  • Not Synced
    you are falling behind over time.
  • Not Synced
    This is a 250 second run,
  • Not Synced
    where at the end of it
    you are waiting for 8 seconds in line.
  • Not Synced
    Why? Because for every second
    that goes by, there are
  • Not Synced
    3,000 more things that are
    added to the line.
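The arithmetic behind the growing line can be sketched as follows. The 87,000 per second capacity, the 90k offered load, and the 250-second run come from the talk; treating both rates as perfectly constant, and assuming the 250-second run is the 90k one, are simplifications made here.

```python
capacity_per_sec = 87_000   # what the system can actually do
offered_per_sec = 90_000    # what the load generator asks for
run_seconds = 250

# Every second, the line grows by the work that could not be served.
backlog_growth = offered_per_sec - capacity_per_sec   # 3,000 per second
backlog_at_end = backlog_growth * run_seconds         # 750,000 requests in line

# A request arriving at the end waits for the whole backlog to drain first.
wait_at_end = backlog_at_end / capacity_per_sec

print(backlog_growth)           # 3000
print(backlog_at_end)           # 750000
print(round(wait_at_end, 1))    # 8.6
```

Under these assumptions the wait grows linearly with elapsed time and ends up at roughly 8.6 seconds, the same ballpark as the figure quoted in the talk.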
  • Not Synced
    The interesting thing that happens when
    you cross the threshold limit,
  • Not Synced
    or capability of the system, is that
    response time grows over time linearly.
  • Not Synced
    It doesn't happen if you're below.
  • Not Synced
    Only if you're above.
  • Not Synced
    It's the point where that happens, and
    any load generator that doesn't show
  • Not Synced
    that line when you try pushing harder
    than you can, is lying to you.
  • Not Synced
    It's a simple sanity check.
  • Not Synced
    If your load generator shows you that,
    it didn't push.
  • Not Synced
    Or it pushed, but it didn't
    report correctly,
  • Not Synced
    whichever it is.
  • Not Synced
    If we draw that to scale...
  • Not Synced
    Just to make sure, this was not to scale,
    this is the scale, I just zoomed in
  • Not Synced
    so you could see that it was
    relatively stable.
  • Not Synced
    So... I don't know what happened to the
    order of the slides.
  • Not Synced
    It's like looping and randoming.
  • Not Synced
    There's some conspiracy going on there.
  • Not Synced
    Now, latency doesn't live on its own.
  • Not Synced
    You do need to look at latency in the
    context of load.
  • Not Synced
    Cause as I showed you, as you're nearly
    idle, things are nearly perfect.
  • Not Synced
    Even these mistakes won't show up.
  • Not Synced
    But as you start pressing, things start
    cracking or behaving differently.
  • Not Synced
    And usually when you want to know how much
    your system can handle,
  • Not Synced
    the answer is not 87,000 things a second,
    because nobody wants the
  • Not Synced
    response time that comes with that.
  • Not Synced
    It's how many things can I handle so
    that I don't get angry phone calls.
  • Not Synced
    So I do get my bonus, and so my
    company stays above ground.
  • Not Synced
    This is not sustainable speed.
  • Not Synced
    Running this experiment is really
    interesting with software,
  • Not Synced
    because it actually doesn't hurt, but
    spending the next 6 months of your time
  • Not Synced
    repeating this experiment, trying to
    change the shape of the bumper
  • Not Synced
    every time you hit the thing
    is a waste of your time.
  • Not Synced
    Your goal when you're trying to figure
    out sustainable speed throughput,
  • Not Synced
    whatever it is, is to see how fast you can
    go without this happening,
  • Not Synced
    and then to try and engineer
    to improve that.
  • Not Synced
    Meaning, can I make it go faster
    without this happening?
  • Not Synced
    Measuring what happens after you
    hit the pole is useless for that exercise.
  • Not Synced
    The only thing that matters about hitting
    the pole, is that you hit the pole.
  • Not Synced
    When you go and study the behavior
    of latency, at saturation,
  • Not Synced
    you are doing this.
  • Not Synced
    You're looking at this and saying, "That
    bumper, I don't like the shape of that.
  • Not Synced
    Let's measure it closely and do this 100
    times to see if we can vary it."
  • Not Synced
    That's what it means to look at latency
    at saturation,
  • Not Synced
    and repeat, and repeat, and change,
    and tune, and see if you can do it again.
  • Not Synced
    If you're pressing it to the wall,
    it should look like this.
  • Not Synced
    And it shouldn't be a surprise that it's
    a 7 and a half second response time.
  • Not Synced
    In fact, if it's not, something is
    terribly wrong with what you're measuring.
  • Not Synced
    You should look at that instead.
  • Not Synced
    So don't do this.
  • Not Synced
    Try to minimize the number of times
    that you actually run red cars
  • Not Synced
    into poles in your testing.
  • Not Synced
    I'm not saying don't do it, but use it
    to establish the end.
  • Not Synced
    And then you need to test all the speeds,
    and we need to see when you hit the pole.
  • Not Synced
    Maybe you hit the pole at 100 mph,
    but maybe you also hit the pole at 70 mph.
  • Not Synced
    Maybe you don't hit it at 20.
  • Not Synced
    We should find out how fast is safe.
  • Not Synced
    When you have data, you can compare
    it like this.
  • Not Synced
    This is what I would say is a recommended
    way to look at it.
  • Not Synced
    Plot requirements, that's the hitting
    the pole.
  • Not Synced
    And some things hit the pole,
    and some things don't.
  • Not Synced
    And you run different scenarios,
    different loads,
  • Not Synced
    different configurations,
  • Not Synced
    different settings,
  • Not Synced
    and see what works, and what doesn't.
  • Not Synced
    Your goal is to stay here, and carry
    more while staying there.
  • Not Synced
    Usually.
  • Not Synced
    It's very useful for figuring out how many
    machines I need to carry a certain thing.
  • Not Synced
    If you don't know this, you don't know
    how many machines to deploy.
  • Not Synced
    Okay, I'm going to run through
    some comparisons of
  • Not Synced
    latency or response time behaviors
    between different configurations
  • Not Synced
    to show you some of the places
    people look, and some of the
  • Not Synced
    intuitive and non-intuitive
    things to do with them.
  • Not Synced
    The common thing,
  • Not Synced
    and again, this is that Cassandra thing,
  • Not Synced
    comparing two systems, A and B.
  • Not Synced
    I'll let you guess which one is A,
    and which one is B.
  • Not Synced
    It's two systems, and saying
    which is better, what can I do with this?
  • Not Synced
    And we're measuring here at two
    throughputs, 85 and 90k.
  • Not Synced
    As I said in here, 90k is past the
    capability of the system.
  • Not Synced
    You can sort of see it here.
  • Not Synced
    See, 85 for both of them is here,
    and 90k is here.
  • Not Synced
    So you could look at this and say,
  • Not Synced
    "Look, when the car hits the pole,
    the blue system is better."
  • Not Synced
    It's half as bad, but that's just
    the wrong place to look.
  • Not Synced
    They both suck.
  • Not Synced
    You do not want to be doing this.
  • Not Synced
    The fact that this system is better
    than that system
  • Not Synced
    doesn't make you want to use it.
  • Not Synced
    This is the wrong place to measure.
  • Not Synced
    This is where latency is irrelevant.
  • Not Synced
    How they behave past this point
    doesn't matter.
  • Not Synced
    What we should be doing is saying,
  • Not Synced
    "Well, then don't measure here.
    Let's look there."
  • Not Synced
    So if we zoom just at the 85k's on these
    two systems, okay, they're different.
  • Not Synced
    And now...
  • Not Synced
    The red and the blue alternate here,
    whatever that is.
  • Not Synced
    And now you look at this,
    and okay, it's better.
  • Not Synced
    But we're still in the wrong place,
    because we are 1.5% from hitting the pole.
  • Not Synced
    It is not where you will be
    running in production.
  • Not Synced
    It's not the interesting place
    to study latency.
  • Not Synced
    That's the place that if you're anywhere
    close to that, you should be on the phone
  • Not Synced
    getting more servers now, rather than
    trying to figure out how the latency behaves.
  • Not Synced
    You know it's going to collapse if just
    a little bit of noise happens.
  • Not Synced
    What you should be doing is looking
    far away from the knee,
  • Not Synced
    far away from that.
  • Not Synced
    For example, let's go to half the
    throughput that causes collapse,
  • Not Synced
    and see what things happen there.
  • Not Synced
    And here you can see,
  • Not Synced
    okay, these are two systems, and one
    of them does better.
  • Not Synced
    You can say that this percentile is better,
  • Not Synced
    that percentile, whatever these are.
  • Not Synced
    It is interesting, but what
    can we do with this?
  • Not Synced
    How do we tell our boss what this means?
  • Not Synced
    Or how do we translate this into,
    how many machines do I need?
  • Not Synced
    Now so far, I've been comparing
    things at the same throughput,
  • Not Synced
    and looking at latencies.
  • Not Synced
    And that's good for pass/fail kind of
    things, or getting quantitative things,
  • Not Synced
    but once you get to this point,
    you can start saying,
  • Not Synced
    "Wait, what if I do it at
    different throughputs?"
  • Not Synced
    How slow do I need to make this blue thing
    to make it look closer to the red thing,
  • Not Synced
    or the other way around.
  • Not Synced
    I don't want to move this fast to 3-L too,
    I want to move this to be there.
  • Not Synced
    For example, slow that one down by 4X,
    and look,
  • Not Synced
    the two 9's are actually starting
    to look similar.
  • Not Synced
    If you slow it by...
  • Not Synced
    So you can make a statement like this:
  • Not Synced
    The 99th percentile, if you had a goal
    like this,
  • Not Synced
    and now you've passed the goal,
  • Not Synced
    You'd say, "Both of them passed the goal,
    but system B does it at 4 times the load."
  • Not Synced
    That drives a choice, right?
  • Not Synced
    You can make a harsher goal, and say,
  • Not Synced
    I need the three 9's to be below
    10 milliseconds,
  • Not Synced
    so you'll slow these down even further.
  • Not Synced
    At this point, you can make this statement:
  • Not Synced
    If you want those, one of them is
    10 times better.
  • Not Synced
    Meaning, not that the system is
    10 times faster,
  • Not Synced
    but I can carry 10 times the load
    before I fail, before I have to pull.
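One way to turn that comparison into a number is to ask, for each system, what the highest load is at which a chosen percentile still meets the goal. This is only a sketch: the helper `max_load_meeting_goal` and all of the measurements below are made up for illustration and are not data from the talk.

```python
def max_load_meeting_goal(results, goal_ms):
    """results maps load (ops/sec) to the measured percentile latency in ms.

    Returns the highest load whose latency meets the goal, or None.
    """
    passing = [load for load, latency_ms in results.items() if latency_ms <= goal_ms]
    return max(passing) if passing else None


# Hypothetical 99.9th percentile response times at three load levels.
system_a = {10_000: 8.0, 20_000: 15.0, 40_000: 60.0}
system_b = {10_000: 3.0, 20_000: 5.0, 40_000: 9.0}

goal_ms = 10.0
load_a = max_load_meeting_goal(system_a, goal_ms)
load_b = max_load_meeting_goal(system_b, goal_ms)

print(load_a, load_b)     # 10000 40000
print(load_b / load_a)    # 4.0
```

With these invented numbers, both systems can meet the goal, but one of them carries 4 times the load while doing so. Tightening or loosening `goal_ms` changes the ratio, which is why the answer depends on the requirement.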
  • Not Synced
    What I'm trying to demonstrate here,
    is that how much more, or not,
  • Not Synced
    you can get out of a system depends
    on your requirements,
  • Not Synced
    and whether or not you need to meet them.
  • Not Synced
    Without setting those requirements,
    looking at the percentile spectrum
  • Not Synced
    of response time, not service time,
  • Not Synced
    you'll never know how much you
    need or not.
  • Not Synced
    You can do a lot of other things,
    these are just demonstrations
  • Not Synced
    of how to look at data sets.
  • Not Synced
    You may measure at a lot of levels.
  • Not Synced
    You can look for systemic behaviors.
  • Not Synced
    For example, this is one system, but
    at varying levels.
  • Not Synced
    You can sort of see that as you increase
    the load, the percentiles move to the left.
  • Not Synced
    That's a good observation.
  • Not Synced
    It's not all systems that'll do it, but
    for this system it'll be that.
  • Not Synced
    You can also see that even though
    this didn't totally collapse,
  • Not Synced
    it's completely out of whack with
    the rest,
  • Not Synced
    so that kind of tells you let's not
    look there.
  • Not Synced
    So throw away the behavior...
  • Not Synced
    You just know not to go to 80.
  • Not Synced
    No need to study it much.
  • Not Synced
    Now that's the remaining set.
  • Not Synced
    You could look at that.
  • Not Synced
    You could look at the set from the other
    system and compare them.
  • Not Synced
    Maybe put them next to
    each other like this.
  • Not Synced
    Or if you actually can fit enough lines,
    with enough colors on a chart,
  • Not Synced
    you can try and do stuff like that.
  • Not Synced
    These are all good ways to actually
    look at latencies,
  • Not Synced
    actually study them.
  • Not Synced
    And notice that in all these cases,
    I didn't pick a number.
  • Not Synced
    "Oh, let's compare the 99.9 percentile,"
    because I won't get
  • Not Synced
    any feeling for the shapes if I did that.
  • Not Synced
    You want to look at the entire spectrum.
  • Not Synced
    And that is what an HDR histogram
    is very good for.
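To illustrate what looking at the entire spectrum means, here is a small sketch that works on raw recorded samples in plain Python rather than on an actual HDR histogram. The nearest-rank `percentile` helper and the sample data are assumptions made for this example.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    n = len(sorted_values)
    rank = min(n, max(1, math.ceil(pct * n / 100)))
    return sorted_values[rank - 1]


def percentile_spectrum(latencies, points=(50, 90, 99, 99.9, 100)):
    """Return {percentile: latency} for several points, ending at the max."""
    values = sorted(latencies)
    return {pct: percentile(values, pct) for pct in points}


# Hypothetical run: mostly 1 ms, a few 50 ms, and two 2-second stalls.
latencies_ms = [1] * 980 + [50] * 18 + [2000] * 2
spectrum = percentile_spectrum(latencies_ms)

for pct, value in spectrum.items():
    print(f"{pct:>6}%  {value} ms")
```

Reading only the median here would report 1 ms; the shape only becomes visible when the higher percentiles and the max are listed side by side.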
  • Not Synced
    So, you know... You get those.
  • Not Synced
    Now...
  • Not Synced
    Wow, we're actually doing okay on time.
  • Not Synced
    Now, this is one of my favorite ways to
    depict things.
  • Not Synced
    Remember I told you that if you don't plot
    the max, what are you hiding?
  • Not Synced
    It turns out that if you plot the max,
  • Not Synced
    usually it's the number one
    signal to look at over time,
  • Not Synced
    these are just those two systems.
  • Not Synced
    And with a simple visual, you get
    a great intuition.
  • Not Synced
    Same load, one of them's noisy, one's not.
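A sketch of how the max-over-time series could be produced from raw samples, assuming each sample is a (timestamp in seconds, latency in milliseconds) pair. The function name `max_per_interval` and the sample values are illustrative.

```python
from collections import defaultdict


def max_per_interval(samples, interval_sec=1.0):
    """samples: iterable of (timestamp_sec, latency_ms) pairs.

    Returns {interval_index: worst latency seen in that interval}.
    """
    worst = defaultdict(float)
    for timestamp, latency_ms in samples:
        bucket = int(timestamp // interval_sec)
        worst[bucket] = max(worst[bucket], latency_ms)
    return dict(sorted(worst.items()))


samples = [(0.1, 2.0), (0.7, 5.0), (1.2, 300.0), (1.9, 3.0), (2.5, 4.0)]
series = max_per_interval(samples)
print(series)  # {0: 5.0, 1: 300.0, 2: 4.0}
```

Plotting one such series per system on the same axes gives the noisy-versus-quiet picture described above.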
  • Not Synced
    You can look at the response time
    and service time,
  • Not Synced
    and all of the numbers of different
    samples of percentiles,
  • Not Synced
    but if you actually want to show a CEO
    something,
  • Not Synced
    this is a pretty good thing to show them.
  • Not Synced
    "Look what I did over the weekend."
  • Not Synced
    "Before the weekend it looked like that,
    and I fixed it."
  • Not Synced
    "I deserve a prize."
  • Not Synced
    With that, a simple thing to remember
    is that this is your load on system A,
  • Not Synced
    this is your load on system B.
  • Not Synced
    Any questions?
  • Not Synced
    This is from an anti-drug commercial
    in the 80's,
  • Not Synced
    I don't know if anybody can remember.
  • Not Synced
    So with that, we're ready
    for any questions.
  • Not Synced
    Any questions?
  • Not Synced
    Wow, that bad?
  • Not Synced
    (Laughing) Dreadful.
  • Not Synced
    Okay, I have one here, and
    one back there.
  • Not Synced
    Let's start with the back.
  • Not Synced
    (Audience Member): You said that there are
    all these tools that you could use
  • Not Synced
    that give you reasonable numbers,
    and reasonable answers as far as
  • Not Synced
    latency is concerned, so what are those
    tools that you use?
  • Not Synced
    So the question was, there are a couple of
    tools I mentioned that could give you
  • Not Synced
    better information, and I used some
    to chart here,
  • Not Synced
    let me see, there are a lot of tools.
  • Not Synced
    I used HDR histogram to plot
    all these charts
  • Not Synced
    with the continuous percentile curves.
  • Not Synced
    I highly recommend you look at using it.
  • Not Synced
    Just go to hdrhistogram.org
    and read stuff.
  • Not Synced
    Or google it.
  • Not Synced
    There's a bunch of people using it.
  • Not Synced
    And the basic thing it does, is that it
    gives you a tool that
  • Not Synced
    allows you a practical way to have
    this kind of
  • Not Synced
    fidelity, dynamic range, and resolution
    to even look at the shapes.
  • Not Synced
    The other way to do it is to keep
    all the data.
  • Not Synced
    You don't have to have histograms
    if you kept every single result,
  • Not Synced
    but in many places that's not practical,
    or makes it harder for the system to run.
  • Not Synced
    If you can do that, that's even better.
  • Not Synced
    And then run it through an HDR histogram
    for analysis.
  • Not Synced
    So that's as far as viewing things.
  • Not Synced
    If you have data viewing it.
  • Not Synced
    Unfortunately, HDR histogram is not
    going to make the data good.
  • Not Synced
    It's just going to show you
    the data you have.
  • Not Synced
    One of the things I would highly
    recommend you try to do,
  • Not Synced
    I'm going backwards, and hopefully
    I'll hit what I wanted.
  • Not Synced
    I highly recommend you look at your
    data sets, and remember
  • Not Synced
    that in this visual,
  • Not Synced
    one strong tip I will give you, is that
    any time you see a vertical
  • Not Synced
    rise like that, you have a 99.9% chance
    of looking at coordinated omission.
  • Not Synced
    This is what coordinated omission
    looks like.
  • Not Synced
    There's a couple of other things that
    can also look like that.
  • Not Synced
    I haven't seen them in a while, but
    I can make them artificially happen,
  • Not Synced
    so it's not conclusive that this is
    coordinated omission,
  • Not Synced
    but suspect it.
  • Not Synced
    Suspect it hard.
  • Not Synced
    So if you plot your data with
    coordinated omission,
  • Not Synced
    you will get a view of whether or not
    you have this other problem.
  • Not Synced
    But honestly, there's a much simpler
    way to do it.
  • Not Synced
    Run your control Z test, and see
    if you have the problem.
  • Not Synced
    This will just show you how it works.
  • Not Synced
    A non-omitted, a sane response time
    test, or latency test,
  • Not Synced
    tends to have these more smooth humps
    of curves transitioning between numbers.
  • Not Synced
    Any vertical rise tends to
    indicate omission.
  • Not Synced
    So that's one thing there.
  • Not Synced
    As far as the tools actually
    measuring correctly,
  • Not Synced
    remember I told you what the name of
    the talk is,
  • Not Synced
    so let me rattle off some tools.
  • Not Synced
    Actually, let's do this.
  • Not Synced
    You guys measure stuff here,
    I assume.
  • Not Synced
    Could you rattle off some tools
    that you use?
  • Not Synced
    What do you use for load generation
    and measurement right now?
  • Not Synced
    Volunteers?
  • Not Synced
    JMeter?
  • Not Synced
    Okay, JMeter.
  • Not Synced
    Gatling.
  • Not Synced
    Anybody else?
  • Not Synced
    Okay, anybody with Grinder, WRK,
    some of the commercial...
  • Not Synced
    Oh, well yeah.
  • Not Synced
    Gatling is the only tool I know of
    right now, that is an actual tool
  • Not Synced
    people use, not a demo, that has fixed
    a coordinated omission
  • Not Synced
    problem in its measurement.
  • Not Synced
    There was actually a bug filed against it,
  • Not Synced
    and the control Z edition in it
    was fixed.
  • Not Synced
    It is actually possible to perfectly
    fix this.
  • Not Synced
    You don't have to correct your guess,
    you can actually correctly compute
  • Not Synced
    the exact response time in any load
    generator on earth,
  • Not Synced
    if you just do it right.
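One reading of "do it right", consistent with the service time versus response time distinction above, is to time each request from the moment it was supposed to be sent under the planned rate, not from the moment the stalled generator actually sent it. The simulation below is a sketch under that assumption; it is not the code of any particular tool, and `measure` and its inputs are hypothetical.

```python
def measure(service_durations, interval):
    """Simulate a load generator that plans one request every `interval` seconds
    but only sends the next request after the previous one has finished.

    service_durations[i] is how long request i takes once it is actually sent.
    Returns (service_times, response_times).
    """
    now = 0.0
    service_times, response_times = [], []
    for i, duration in enumerate(service_durations):
        intended_start = i * interval              # when the plan said to send
        actual_start = max(now, intended_start)    # a stalled generator sends late
        finish = actual_start + duration
        service_times.append(finish - actual_start)     # what a naive tool records
        response_times.append(finish - intended_start)  # includes time behind schedule
        now = finish
    return service_times, response_times


# One request planned per second; the second request stalls for 3.5 seconds.
service, response = measure([0.1, 3.5, 0.1, 0.1, 0.1], interval=1.0)
print([round(t, 1) for t in service])   # [0.1, 3.5, 0.1, 0.1, 0.1]
print([round(t, 1) for t in response])  # [0.1, 3.5, 2.6, 1.7, 0.8]
```

Measured from the actual send time, only one request looks slow. Measured from the intended send time, the three requests that were held up behind the stall show up as slow too.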
  • Not Synced
    All the other tools,
  • Not Synced
    JMeter, Grinder, WRK,
  • Not Synced
    the commercial tools that I won't mention,
  • Not Synced
    they all do this wrong, unfortunately.
  • Not Synced
    Cassandra stress, YCSB.
  • Not Synced
    Oh, I'll take it back,
  • Not Synced
    there's one more tool.
  • Not Synced
    YCSB that is now on GitHub has been
    fixed by my colleague,
  • Not Synced
    and added avoidance of
    coordinated omission.
  • Not Synced
    So if you use the one on GitHub,
    YCSB is actually correct now.
  • Not Synced
    But all the previous parts are wrong.
  • Not Synced
    Now, that's the really bad news,
  • Not Synced
    because there are very few tools any of
    you use that actually work right.
  • Not Synced
    Somebody here mentioned WRK-2.
  • Not Synced
    WRK-2 is something I built by taking WRK,
  • Not Synced
    which is a really cool load generator and
    just fixing its measurement technique,
  • Not Synced
    and adding a rate limiter to it, so you
    can actually measure without
  • Not Synced
    hitting the pole all the time.
  • Not Synced
    I did that as a demonstration,
  • Not Synced
    and the tool is useful probably.
  • Not Synced
    People actually use it, and I think that
    some people have actually
  • Not Synced
    forked it and went further with it.
  • Not Synced
    But I'm not maintaining it.
  • Not Synced
    I'm not paying much attention to it,
    so if you want to take it over, go ahead.
  • Not Synced
    It's not like I can tell you there's a
    good tool out there.
  • Not Synced
    I wouldn't endorse my own demo as
    the thing you need to
  • Not Synced
    run in production necessarily.
  • Not Synced
    For example, it's a little stale.
  • Not Synced
    I did this almost a year ago, and WRK
    added a lot of cool features since then,
  • Not Synced
    I wish it was more merged.
  • Not Synced
    I similarly created Cassandra-Stress 2,
  • Not Synced
    which is what generated all of these,
  • Not Synced
    which Cassandra-Stress corrected.
  • Not Synced
    It reports both response time,
    and service time.
  • Not Synced
    What it used to call latency
    was service time.
  • Not Synced
    So you see the data, and this is
    what it would report,
  • Not Synced
    and now it reports both.
  • Not Synced
    Again, right now I would consider that
    a cool demo,
  • Not Synced
    not something I would say, "Go do this,
    and I'll help you if it goes wrong."
  • Not Synced
    I'm hoping to affect other people
    actually doing it right.
  • Not Synced
    And I'd appreciate any help from people
    that want to do that too.
  • Not Synced
    Unfortunately, to the original question,
  • Not Synced
    there are tools that'll show you the data,
  • Not Synced
    there are tools that'll hint that
    the data is really bad.
  • Not Synced
    As far as having better data,
  • Not Synced
    that's a much harsher answer.
  • Not Synced
    You're in trouble.
  • Not Synced
    That's the reality, and I'm sorry.
  • Not Synced
    There was a question up here?
  • Not Synced
    (Audience Member): "I'm actually having
    trouble phrasing that question,
  • Not Synced
    so I may just want to come talk to
    you after."
  • Not Synced
    Okay, so the question will be later.
  • Not Synced
    Any others?
  • Not Synced
    Okay, well thanks everyone.
  • Not Synced
    Hopefully this was useful.
  • Not Synced
    (Applause)
Title:
"How NOT to Measure Latency" by Gil Tene
Video Language:
English
Team:
Captions Requested
Duration:
42:59