-
Thank you all for coming
-
and before I forget
-
thank you very much to Midwest
-
for having me back
-
I was privileged enough to speak
-
at their first iteration of this last year
-
and I had a great time
-
and was excited to submit again
-
although I told them not to pick me
-
because I had been here last year
-
and I figured they should branch
-
out a bit but here I am nonetheless
-
So we're here to talk about time today
-
and, specifically, we're not talking about
-
time zones and leap years and leap seconds
-
and September 1752 and Easter
-
and the other Easter
-
That's a different talk
-
In fact that's probably a whole
series of talks
-
We are here to talk today
about logical time.
-
I am John Daily
-
and I work for a company
named Basho Technologies
-
We write a distributed
database called Riak
-
and distributed databases are a big place
-
where time and causality come into play
-
thus I've been exposed to a lot of
this theory in practice
-
a lot of this theory in practice
of the last few years
-
Quick note: lunch is at the end of my talk
-
I will try not to get between you at lunch
-
So if I seem to be going a little fast
-
it's because I don't want to be
between you and lunch
-
but please feel free to interrupt
-
especially if I jump the rails
-
I'm very good at tangents
-
It's very easy to get me off track
-
Pop up, ask questions
-
Otherwise, what I plan on doing is
-
if you have general questions that
-
don't involve 'Hey John, I have no idea
-
what you're talking about'
-
try to save them for the end
-
and we can talk after while other people
-
are free to go get their lunch
-
and we can talk about your questions.
-
So, the question is
-
Why are we here at all?
-
I appreciate everyone who bothered to show up
-
So, causality and co-ordination are
-
such trivial concepts on single
machine applications
-
Something you'd never even think about really
-
However, as soon as you introduce
-
a second machine, and a third machine
-
causality and co-ordination become much more complicated
-
So complicated in fact that
-
we've been at this for 40 years or so
-
trying to address these problems
or even longer
-
and we're still finding new areas
-
of research and new ways to improve
-
our handling of this issue.
-
If I can use an astrophysics analogy here,
-
this is much like the two-body vs. three-body
-
problem in physics.
-
Two-body is trivial. Three-body's a whole mess.
-
In computers, one server is fine.
-
Two servers are more of a challenge.
-
Let's start by talking about the
-
simplest, non-trivial problem I can come up with.
-
Two clients, two servers
-
Key value store
-
and two concurrent attempts
-
to set the same value.
-
Or where they set the same key with different values
-
The question here is, who would win?
-
Now, you may say 'John, this is obvious'
-
'The last one to arrive would win.'
-
And that's a simple enough heuristic.
-
We can certainly apply that.
-
In this case, the 2:42pm
write would arrive.
-
After the 2:37 arrived, we'd say
-
'Well the balance must be $300.'
-
So then the question becomes
-
'Given that heuristic, what happens
-
when both arrive at the same time?'
-
RFC677,
-
one of those internet RFC's
-
you've never read,
-
is written by BBN, my former
-
employer way back in the day,
-
1975? or 73? Somewhere on that time frame.
-
They came up with a
-
set of rules for keeping a distributed
-
key value store in sync
-
and one of those rules is that
-
you implement a tie-breaker here.
-
Each object, each operation in your system,
-
is tagged with a logical time stamp.
-
It's tagged with non only the physical time stamp
-
but also the name of the server
-
and you pick an ordering on your servers,
-
and that one wins.
-
So, in this case, B is greater than A.
-
We pick the write that arrived on
server B
-
We call that the winner.
-
One of the questions that will arise
-
out of this is
-
when it comes to business logic
-
'Is just blindly picking the
-
last write to arrive the right choice?'
-
We'll talk more about that.
-
So there are any number of lists on the internet
-
of things that programmers believe that
-
are not true.
-
This is a subset of things they believe
-
about time that are not true.
-
I have personally been bitten by
-
several of these but my number one
-
was number 16
-
'Time on the server clock and time
-
on the client clock would never be
-
different by a matter of decades'
-
YES
-
(laughter)
-
Astonishingly enough.
-
My very first IT job, I was
-
among other things supporting
Windows 3.1
-
clients connecting up through dial-up.
-
And we had a customer in Northern Indiana,
-
and some of their desktops refused
-
to connect to our system.
-
And so I was setting up in Chicago anyway
-
so I dropped in and tried to figure out
-
what in the world's going on.
-
It turned out that some of their desktops
-
had somehow been set at least
ten years
-
in the future.
-
I don't remember the exact date
-
but it was far enough advanced that
-
DOS could not accurately represent
the date.
-
It was 2000-and-something
-
and nobody was quite sure what it was.
-
I figured it was at least ten years.
-
We reset the clocks back.
-
Everything worked fine.
-
So yes, server clocks, client clocks are rarely in sync.
-
Server clocks themselves are rarely in sync.
-
And this is a vital problem we're trying to solve.
-
You may say, 'We've got NTP,
-
why are we talking about this at all?'
-
Fact of the matter is,
-
NTP is a great tool
-
and like all tools, it fails.
-
Firewall rules are set up wrong.
-
Demons are not configured to set up a boot.
-
Demons give up if synchronisation
-
gets drifted too far.
-
NTP can also roll your clock backwards
-
which is a terrible, terrible thing.
-
We'll talk more about that.
-
Yes, I have heard of Spanner.
-
For anyone who hasn't,
-
Google published a paper on their
-
time synchronisation strategies
-
which involved bajillions of dollars,
-
specialised hardware, atomic clocks,
GPS clocks,
-
highly reliable networks,
-
their own server hardware.
-
All this for the goal of
-
trying to synchronise clocks across
-
their entire environment.
-
Even their true time API,
which they supply,
-
still gives you an error range
-
any time you ask for the current time.
-
Even Google throwing all these problems
-
still doesn't know what time it is.
-
Time is a hard problem.
-
This is not something that's
trivially solved.
-
Even with all the money and resources
-
in the world.
-
Lesley Lamport
-
Who here has heard of Lesley Lamport?
-
Okay, good percentage.
-
I'm embarassed to admit
-
the only reason I knew about Leslie
-
Lamport before this job was LaTeX.
-
He is the LA in LaTeX
-
which is the macro
-
programming language
-
for tech that a lot more of people use
-
Lesley Lamport is a lot
-
than the author of the tech LaTeX.
-
He's an Turing award winner
-
and he is a distributed
systems pioneer
-
In the mid-seventies he
wrote this paper
-
which has been sighted more than
-
all but one other computer science paper
-
in the history of computer science.
-
Time, Clocks and the
Ordering of Events in
-
a Distributed System.
-
And Lesley recognized that hardware
-
clocks were bad.
-
Of course, in the 70s I would imagine
-
hardware clocks were even worse
-
than they are today.
-
There was no NTP yet
and he was looking at
-
the general problem of
-
"How do you distinguish time in an
-
intelligible way under
-
the distributed system?"
-
Obviously enough.
-
And he had a key realisation.
-
In this case we have three
-
discreet processes.
-
Those processes are
characterized by
-
internal events.
Those internal events
-
are all running under
same hardware so
-
within each process you can
-
trivially order the
sequence of events.
-
On one computer we know
-
what happened first.
-
That's easy enough.
-
Between processes
-
the communication is messages.
-
And Lesley realized that these messages
-
provide us an event horizon
-
or event boundary.
-
A point in time where we can say
-
P1 happened before Q2 because
-
P1 sent a message and it arrived
-
at queue at Q2.
See you can not send
-
a message after it arrives, right?
-
We have some basic laws of physics
-
even though physics self-struggles
-
to tell what time is it exactly.
-
We do know that messages arrive
-
after the're sent.
-
That provides a strict ordering.
-
We know that P1 and Q1 are concurrent
-
not necessarily parallel.
-
Parallel implies things are happening
-
at the same time.
-
Concurrent just says they happened and
-
we don't know which happened first.
-
That's one primitive definition of
-
the two terms.
-
P1 and Q1 are concurrent.
-
We can't really see anything about their
-
ordering other than they happened
-
maybe roughly the same time.
-
But regardless Q1 happened before Q2 ,
-
P1 happened before Q2.
-
So, Lesley builds up this system in order
-
to share a global clock.
-
That global clock is no longer
-
a physical time stamp.
-
It's a simple counter.
-
Each time you send a message you send
-
your counter along with it and
-
the process that receives that message,
-
if its counter is behind yours because
-
it hasn't committed enough events
-
in its timeline,
-
it will advance that counter.
-
So in this case A is on 7,
B is on event 4.
-
When A sends its message,
B picks it up,
-
says "Well, A is in 7 events so I need to
-
move mine at least 8."
-
There is a vital vital word that
-
does not appear in Lesley's document but
-
shows up in a lot of other papers
-
around clocks and time and logical time.
-
Monotonicity in this context means -
-
Time always goes forward.
-
Time should never go backwards.
-
Obviously, when those three won
-
I'd have to set clocks back in time.
-
You don't want to do that
for database server.
-
If you set a relational database
-
that's actively processing transactions,
-
if you rewind that clock
-
bad things happen.
-
Never set a clock backwards.
-
Has anyone actually done that and
-
then been burn by it? Out of curiosity.
-
Hey, we have one lucky here.
-
I've never been burned, lucky enough
-
but I have to admit in my midspan youth
-
I would in fact set a clock backwards
-
cause I didn't know any better.
It's a bad idea.
-
All right, 1988.
-
Lamport's ideas have been
around for a bit.
-
We'll actually rewind to an earlier paper
-
here in a few minutes.
-
Colin Fidge in Australia takes
-
Lamport's ideas further
-
and he defines the concept
of vector clocks.
-
So, each process in your system
-
tracks a global state
-
not in terms of a SQL counter
-
but it's the last time stamp it knows
-
for each of the other processes
in the system.
-
And every time it sends a message,
-
it will share that global state
-
with whatever processes on the other end.
-
And the remote process can then
-
take a look at that compare to
-
it's notion of what the timestamps are
-
and push everything forward.
-
So, in this case,
-
B did not know that A had
-
advanced to event 6
-
so it pushes 6 forward
-
B's own event counter gets incremented
-
because the receipt of a message
is another event
-
in it's lifecycle.
-
and it pushes C's
-
clock forward to 13
-
because C sent the message
-
and that 'sending a message' constitutes
-
another event.
-
So, B can go ahead and advance
up to 13.
-
So now, at the...
-
Say we're troubleshooting or we're trying
-
to understand how the system
arrive in the state
-
We have a much more comprehensive history
-
and we can do a better job of analyzing
-
what the order of events across
the system was
-
even if the physical time stamps
-
themselves did not align.
-
Now, you may be asking yourself,
-
"How would I troubleshoot log files
that have
-
these monotonically
incrementing counters, right?"
-
"What does 12 mean?"
-
"What time did that happen?"
-
"Was there a solar flare yesterday
at 1:00?"
-
"Maybe I should be looking for something there?"
-
So, Strange Loop...
-
Anyone been to Strange Loop?
-
I hope some of you have...
-
Excellent.
-
So Strange Loop in St. Louis every year,
-
one of the best technical conferences
-
you will ever attend,
-
John Moore from ComCast gave a
-
great talk this year about clocks.
-
Specifically about hybrid clocks,
-
trying to marry logical clocks
-
and physical time stamps and
-
use that marriage in order to
-
A) keep physical clock better in synched
-
across your cluster, and
-
B) have something that operators can
-
actually look at in a log file and
figure out
-
what in the world is going on.
-
So, physical timestamps are
tremendously useful.
-
But, they're not the focus of today's talk.
-
Do, you may have noticed that
-
I pulled a fast one on you.
-
I started by talking about the problem of
-
data synchronization, of choosing a winner
-
in conflicting rights.
-
And I jumped to global-state order.
-
It turns out that these are very
similar problems,
-
they have very similar solutions, but they
-
are, in fact, separate streams of research
-
for the most part, although people
-
are trying to unify the two now.
-
So, let's go back in time a little
bit to 1983.
-
A team from UCLA was writing a network
-
operating system called LOCUS.
-
And LOCUS's two big ideas were this,
-
you take all the computers in the network,
-
and you expose a global process state
-
and a global file system state.
-
So, you can manipulate processes
-
on any machine in your network
-
and you can edit files that are stored on
-
any machine in your network.
-
Incidentally, I tried to work on this
-
problem many years ago and we got nowhere
-
near as far as they did.
-
There are some very hard problems
to solve here
-
So, this paper
-
was specifically focused on the problem
-
of network partitions.
-
The paper itself introduces
version vectors,
-
not to be confused with vector clocks,
-
although many people do.
-
Including REAC, our database
-
got that name wrong. We use version
-
vectors, but we call them vector clocks.
-
As importantly, it talks about what
-
partitions actually mean for a network.
-
Now, one of those myths that programmers
-
believe about networks is that
-
network partitions are rare.
-
That could not be further from the truth.
-
Network partitions are evil, they are
-
pernicious, they are ubiquitous.
-
And they're extremely difficult
to troubleshoot.
-
Network partitions don't have
to be bi-directional.
-
YOu can have Server A able to talk to B
-
and B can't talk to Server A.
-
Good luck troubleshooting that one.
-
And good luck writing the software that
-
can account for that, right?