So, the title of this talk is "Jepsen IV: Hope Springs Eternal."

There is in fact no Jepsen III. Don't go looking for it. It doesn't exist. That should tell you a little bit about what we're going to discuss in the talk today.

My name is Kyle Kingsbury. You might know me as "aphyr." I am scared and confused and excited about computers all the time. Today, I work at a new company called Stripe. We are a credit card company: we help you sell goods and services via some API, which is phenomenal, right? Developers say, "Oh, I wanna take credit cards. How do I do it? Well, I could build all this PCI-compliant infrastructure, or I could just make an HTTP request to some magical rainbow in the sky, and everything will work out just great."
We've all built APIs, right? They're not real things. They're abstractions. They're imaginary constructs. So that's this rainbow up top. That's our API, right? But it's actually supported on this steel girder framework, which is our code. That's written in Ruby. And then that code rests on top of all these libraries and the Ruby VM, represented here by this big pile of sticks. And those things in turn rest on a pile of tires, which symbolize our databases and queues and other associated systems. Now, does anybody who's used a database... ah, how many of you have, actually? Oh, very good. Less than I expected.

(Audience laughs as slide is changed)
Anybody who's used a database knows everything is on fire all the time. And our job is to continue building the tower higher and higher to try and get away from the flames.

(Audience laughs again)

This is called the hierarchy of abstraction. As good information hiders, we're trying to isolate the terrible things happening down below from the good things which our customers think are happening. And ideally it works. Ideally, customers continue to come to the site, they make requests, and data flows properly, even though nodes internally are failing and bad things are happening to the network. That's the goal.
What kind of bad things am I talking about? Well, if you've ever used a queue or a database or any consistency system whatsoever, you know they can get into split-brain scenarios, where different nodes believe different truths. Or there might be foreign key problems, where you go to look up a record and it's gone. Or maybe you see read anomalies: you do a write, then a read, and it's not there. Then you do another read, and it comes back. You do another, and it's gone. Right? Interesting flip-flops in state can occur.

So, in the face of these problems, we want to know: how likely are they to actually affect our systems? Is our use of the database correct? Is our software likely to show these problems to users?
And the only way to do that is to measure them. So Jepsen is a project I'm working on to understand distributed systems better. It's code, it's articles, it's talks, trying to put pressure on vendors, trying to educate users, and to figure out for myself what consistency actually means.
In Jepsen, we take a systems-based approach. What is a system? Well, anything you can put a box around. Which means pretty much everything is a system. Inside of the system we're gonna have some interacting components, and they're gonna have complicated and nuanced things happen between them. But our goal is that if we draw a box around the system as a whole, we should be able to characterize what goes on inside the box just based on the interactions it has with its environment. And hopefully those interactions are simple. So at the edge, when we're pushing things into a system and taking things out, we should have simple rules, like: maybe matter is conserved, or maybe [format] is conserved, or maybe if I put things into a cube they come out again. Maybe if I write data into a database, it should still be there in a week. Those are simple, high-level rules. And then internally, the database can do whatever kind of locks and transformations it wants to make those things happen.
Thinking about invariants for a minute: what sort of rules, what sort of high-level properties could we want? We might think about a simple program, right? Like: allocate a variable x, set initially to A; read the variable by printing it, which prints A; then say x equals B, and print x. We all know what this does, right? We've written a program like this before. What we think it does is this: we start off with a state A and we read it. Then we change the state to B by performing a write. And then we read B. So we'll print A and then B. Straightforward enough.
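Written out as a little sketch (Python here, purely for illustration):

```python
outputs = []

x = "A"            # allocate a variable x, set initially to A
outputs.append(x)  # read the variable: we see A
x = "B"            # a write: change the state to B
outputs.append(x)  # read again: we see B

print(outputs)     # ['A', 'B']
```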
But this is not the only possible interpretation of this program. We could read B. We could read A. We could do anything. The program doesn't have to obey any laws unless we tell it to. So the invariants in our head map the program's structure (the sort of lexical structure of the process) to the behaviors which it's allowed to undergo. Invariants are constraints that apply to the history of the system: what possible orderings of states could we see? What possible ways could the operations come together?
We might think about not just a single-threaded program, but a concurrent one, where multiple processes interact with the state at the same time. So here the top process reads A and then B. That's crazy! Why did it change? Well, it changed because a process on the bottom performed a write with the value B. So individual processes may not have the whole story. But if we unify the whole history from every process, we should get something that follows the same rules.
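That two-process history can be sketched with Python threads. The sleeps are only there to force the particular interleaving described above; real schedulers are not this polite:

```python
import threading
import time

x = "A"   # shared state, initially A
log = []  # the top process's observations

def top_process():
    log.append(x)    # first read: sees A
    time.sleep(0.3)  # meanwhile, the bottom process runs...
    log.append(x)    # second read: sees B. Why did it change?

def bottom_process():
    global x
    time.sleep(0.1)
    x = "B"          # a write from another process changes the state

t = threading.Thread(target=top_process)
b = threading.Thread(target=bottom_process)
t.start(); b.start()
t.join(); b.join()

print(log)  # ['A', 'B'] with this timing
```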
But it's a little more complicated than that, because our systems aren't just concurrent. They're also distributed. Now, distributed systems have some characteristic distance between nodes. It takes time for messages to propagate. If I want to write a value to a register in DRAM, I have to go off of the CPU, into the memory controller, through a bunch of caches, back onto some wire, like eleven whole centimeters over to the DRAM, into the actual bit, and it flips. And that takes, like, what? A nanosecond? So then I have to come all the way back, and that takes more time. So whenever we're manipulating state somewhere, it's gonna take a little bit of time to go back and forth. And that time means that things might be ambiguous.
We could see different orders. For example, if I do a write of B, and I concurrently begin a read, and then the write completes, and the read completes, one possible ordering, depending on how those messages propagated, is that I see the value B, because the read arrives after the write. But I could also see A, if the read arrives first. A or B. All that changes is the ordering of the messages. In an asynchronous network, like the ones we're usually working with in IP, we don't get to control what those orderings are. So both histories are valid. We have to consider all possible interleavings of the concurrent operations. But it's not infinitely bad, because we're not allowed to break the laws of physics. We can't go back in time. If we're actually talking to a single point of truth somewhere in the system, then the earliest possible state I could interact with is the one just after I send my request, and the latest possible state is the one just before I receive a response. So you get some window between invocation and response, and that's the range of states we can interact with. This property is called linearizability.
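One way to make that window concrete: treat each operation as an (invocation, response) interval and ask whether some order that respects real time explains every read. This is roughly the kind of check Jepsen performs; the brute-force sketch below is my own illustration, not Jepsen's actual code:

```python
from itertools import permutations

def linearizable(history, initial):
    """Check a history of register ops, each a dict with op
    ('read'/'write'), value, invoke time, and complete time.
    An order is admissible if, whenever one op completes before
    another is invoked, it comes first; it is legal if every
    read sees the latest preceding write."""
    for order in permutations(history):
        # Real-time constraint: an op placed later in the order
        # must not have completed before an earlier op was invoked.
        if any(order[j]["complete"] < order[i]["invoke"]
               for i in range(len(order))
               for j in range(i + 1, len(order))):
            continue
        # Replay the register through this order.
        state, legal = initial, True
        for op in order:
            if op["op"] == "write":
                state = op["value"]
            elif op["value"] != state:
                legal = False
                break
        if legal:
            return True
    return False

# A write of B overlaps a concurrent read: the read may see A or B.
concurrent = [
    {"op": "write", "value": "B", "invoke": 0, "complete": 4},
    {"op": "read",  "value": "A", "invoke": 1, "complete": 3},
]
print(linearizable(concurrent, initial="A"))  # True: A is a valid result

# But once the write has completed, a later read must see B, not A.
after = [
    {"op": "write", "value": "B", "invoke": 0, "complete": 4},
    {"op": "read",  "value": "A", "invoke": 5, "complete": 6},
]
print(linearizable(after, initial="A"))  # False: a stale read
```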
And it's kind of the gold standard for both distributed and concurrent systems. You want things like atomic registers, like mutexes, that give you real-time guarantees on the states you'll interact with. With a mutex, when you lock it, nobody else gets to claim it until you release it. That's a really nice, strong property.

So linearizability means that any operations that are not concurrent are ordered relative to each other. It's not a total order, right? We can still have those windows of concurrency where you could get A or B. But that only happens when two operations overlap in time. And that means that if I do a write, and the write completes, and then I begin a read, because those two aren't concurrent, the read is guaranteed to see the write, or some later state. So everybody's gonna agree on the same order of events. We all see A, B, C, D. Everybody's going to agree on when an operation is visible. Once you've completed an operation, you know everybody else is gonna see it. And this means we get nice invariants like no stale reads, like mutual exclusion. Lots of systems rely on these properties.

But we don't have to be that strong. This is a really rigid thing to do. We could relax the time constraints. We could let you go backwards and forwards in time.
You could read something from the past and write something to the future. Well, how do you write to the future? You put your message in a bottle and throw it in the ocean, and somebody else picks it up. If your ocean is well-ordered, as mine is, then this works just fine. Everything arrives in order at your recipient, maybe on a different coast, and they apply your operations, and they see the same history of messages. They just see them a week later. So all participants are going to agree on the same order of events. They'll all see A, B, C. But they might not agree on what time those things happened.

We can relax that invariant still further. If you're thinking about a transactional system with lots of different cells interacting, maybe like an SQL database, we get a property like serializability, where we say that every transaction has to fit somewhere into a single linear history. But we don't have to agree on when it happened, or even in what order, just as long as you can squeeze the transaction in somewhere.
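A minimal sketch of that idea (my own toy model, not any particular database): serializability just asks whether some permutation of the transactions, replayed one at a time, reproduces what each transaction observed. Note there is no real-time constraint here, unlike linearizability:

```python
from itertools import permutations

def serializable(txns, initial):
    """Each transaction takes the current state and returns
    (new_state, observations_consistent?). The history is
    serializable if SOME serial order replays legally."""
    for order in permutations(txns):
        state, ok = initial, True
        for txn in order:
            state, consistent = txn(state)
            if not consistent:
                ok = False
                break
        if ok:
            return True
    return False

# t1 claims it read 0, then wrote 1. t2 claims it read 1.
def t1(state):
    return 1, state == 0

def t2(state):
    return state, state == 1

print(serializable([t1, t2], initial=0))  # True: the order t1, t2 fits

# If t2 had claimed to read 2, no serial order could explain it.
def t2_bad(state):
    return state, state == 2

print(serializable([t1, t2_bad], initial=0))  # False
```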
So, in a serializable system, operations happen in some order; we just don't know what it is. It's perfectly legal, for example, in a register, to put all of your writes at time infinity and all of your reads at time zero, and just have /dev/null be your database. That doesn't seem very useful, right? But it's a sort of trivial solution. If you have a more complex system, like an SQL database, where you're saying something like "set this row, this particular column, to be three if and only if this thing plus that thing is equal to this other thing," those constraints are actually a lot more difficult to preserve. And so you get a stronger invariant out of it.

Finally, we don't have to have just a single line of history. We could actually have a forking, diverging history. In eventually consistent systems, the constraint is that things have to come together somehow.
So if we have a counter whose value is zero, that counter could diverge. We could have two distinct copies of it, and each one of them could increment separately and return one. Reads would show one for a while. And then, when the counters have exchanged information, the value will converge on two, if we do everything right. So, eventually consistent systems have to converge, and that convergence doesn't just mean they have to agree on some value. Ideally, they should converge on the correct value. The correct value might be the most recently written value in a register. The correct value might be the number of increments that have occurred. It depends on the semantics of the data type that you're working with.

Finally, we could imagine invariants that just order two operations, like a write and a write, or a write and a read. And we get this family of session guarantees: monotonic writes, monotonic reads, writes follow reads, read your writes.
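The diverging counter can be sketched as a grow-only counter CRDT (a G-counter, one common way to build such a counter; this is an illustrative sketch, not a system's actual implementation). Each replica increments its own slot, and merging takes the element-wise max, so the copies converge on the correct total regardless of how merge messages are ordered:

```python
class GCounter:
    """A grow-only replicated counter."""

    def __init__(self, n_replicas, replica_id):
        self.counts = [0] * n_replicas  # one slot per replica
        self.id = replica_id

    def increment(self):
        # Each replica only ever touches its own slot.
        self.counts[self.id] += 1

    def value(self):
        # The counter's value is the sum over all replicas.
        return sum(self.counts)

    def merge(self, other):
        # Element-wise max: commutative, associative, idempotent,
        # so replicas converge no matter the message order.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

# Two copies of a counter at zero diverge: each increments separately...
a, b = GCounter(2, 0), GCounter(2, 1)
a.increment()
b.increment()
print(a.value(), b.value())  # 1 1 -- reads show one for a while
# ...then they exchange information and converge on the correct value:
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 2 2
```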