Next talk
Chris and Holger are going to talk to us again
about reproducible builds and tell us
where they're up to.
Thanks very much
The outline of this talk is from last year
we realised there were a lot of questions.
The rough plan is to quickly go over
what reproducible builds are
I guess everyone is up to speed
but getting everyone on the same page
would be a good idea.
Then Holger's going to jump in
and give the status update
and then we're going to talk about
future work, questions etc
What is the actual problem we're
solving here?
You can always inspect the source code of
free software for malicious flaws
or just flaws as well.
Unfortunately distributions provide
precompiled binaries to end users.
So can you actually trust this
compilation process has not
introduced flaws of its own?
The problem is it seems very effective:
if you want to go after end users,
you can go after developers.
Because if you go infect a developer's
machine you will then infect all the
users of the software they generate.
Financial incentives. There always were
but they are even more so these days
with mobile phones etc.
You can also have very subtle flaws.
This one in particular there was a
root exploit in OpenSSH just by changing
a compare equal.
That sort of assembler jump thing and it
gives you root
but with only a single bit difference in
the binary.
Which is not too shabby.
Then you have all sorts of cute demos
where you load up the source code in VIM
and it just looks like 'Hello world' but
when you compile it with GCC
your kernel rootkit just goes 'oh I'm
going to give you a different file'
and self-replicates, things like that.
Difficult to trust the process.
And there's some recent history as well
around XcodeGhost and iOS
and adverts and things like that.
You can Google those things.
Really scary stuff.
The last example is actually coming from
a CIA design paper from 2012.
Which was then found in the wild in 2014.
So these exploits are actually happening.
People are targeting developers to get
users.
XcodeGhost had 20 million user
installations.
It was probably not the CIA or NSA but
we don't know who it was.
There are many people who do these
exploits in the wild.
Yeah it's not just 'Here's this cute
thing we can talk about'.
It's actually happening.
The motivation is to ensure no flaws are
introduced during the build process.
We do this by ensuring the build always
produces identical results.
Then multiple parties do the same thing.
I build it, you build it, your friends
build it etc
An attacker would need to infect
everyone simultaneously
otherwise they'd be detected.
For example if my machine was compromised
I would suddenly come up with a
different result.
I would come up with different binaries.
And you'd be 'what's going on here' and
eventually we would discover
that my machine was rootkitted etc.
You probably know it but identically
means bit by bit identical.
As that is really the same.
Yeah, bit, SHA, MD5 whatever you want.
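A minimal sketch of that comparison, assuming each party publishes the bytes (or just the digest) of what they built. The builder names and artifact contents are hypothetical, for illustration only.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def builds_agree(artifacts: dict[str, bytes]) -> bool:
    """True only if every builder produced bit-for-bit identical output."""
    digests = {sha256_of(blob) for blob in artifacts.values()}
    return len(digests) == 1


# Hypothetical builders; the third one differs by a single byte.
results = {
    "builder_a": b"\x7fELF...same bytes",
    "builder_b": b"\x7fELF...same bytes",
    "builder_c": b"\x7fELF...same bytez",
}
print(builds_agree(results))  # False
```

One differing digest does not say which builder is compromised, only that somebody needs to look.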
There are a bunch of challenges here.
The biggest one is timestamps.
A lot of software just loves to include
timestamps everywhere.
Documentation, __DATE__
and __TIME__ macros.
Just all over the place, in file names etc
Things like that.
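To illustrate the timestamp problem, here is a rough sketch that packs files into a tar archive with a pinned modification time instead of the current clock. `make_tar` and the file contents are hypothetical names for this example, not something from the talk.

```python
import io
import tarfile


def make_tar(files: dict[str, bytes], mtime: int) -> bytes:
    """Pack files into an uncompressed tar with a fixed mtime on every entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # stable member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime  # pinned, not time.time()
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


# Hypothetical package contents.
files = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
print(make_tar(files, 0) == make_tar(files, 0))  # True
print(make_tar(files, 0) == make_tar(files, 1))  # False
```

The second comparison shows how a one-second difference in an embedded timestamp is enough to change the output.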
Builds often vary by locale and timezone.
Different new lines, different sorting
orders for example collations.
Different versions of libraries. I'm not
sure what this refers to exactly.
Moving on.
Non-deterministic file ordering for
example shell globs are not really defined
to be, I say not really defined they
aren't defined to come out in normal order.
Also the readdir syscall, it doesn't actually
promise any particular ordering.
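A small sketch of the usual fix, assuming the build script is the one listing the directory: sort the names explicitly rather than trusting the order the filesystem returns. `list_sources` is a hypothetical helper.

```python
import os
import tempfile


def list_sources(directory: str) -> list[str]:
    """Return file names in a stable order, sorted by code point."""
    # os.listdir() returns names in arbitrary order, so sort explicitly.
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.c", "a.c", "c.c"):
        open(os.path.join(tmp, name), "w").close()
    print(list_sources(tmp))  # ['a.c', 'b.c', 'c.c']
```

Python's `sorted` on strings does not depend on the locale, which also sidesteps the collation differences mentioned earlier.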
Dictionary/hash key ordering. So this is
in things like Perl and Python
you use a dict or a hash. If you iterate
over the keys, it's a non-deterministic order.
If your build system loops over such a
hash or a dictionary
then the results from this build could be
non-reproducible and non-deterministic.
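As a hedged illustration in Python: iteration over a set of strings can differ between runs unless PYTHONHASHSEED is fixed (dicts keep insertion order since Python 3.7, so sets are the clearer example today). Sorting before emitting output removes the dependence on hash order. `render_manifest` and `deps` are hypothetical names.

```python
def render_manifest(entries: set[str]) -> str:
    """Emit one entry per line in sorted order, whatever the hash seed is."""
    return "\n".join(sorted(entries)) + "\n"


# Hypothetical dependency names collected during a build.
deps = {"zlib", "openssl", "libc"}
print(render_manifest(deps), end="")
# libc
# openssl
# zlib
```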
And also things like files that are part of
the build process will just absorb
stuff from the surrounding environment
like umask and all that kind of
stuff that lives outside there.
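One way to stop the umask leaking into the output, sketched here under the assumption of a POSIX system, is to set the file mode explicitly after writing. `write_with_fixed_mode` is a hypothetical helper.

```python
import os
import stat
import tempfile


def write_with_fixed_mode(path: str, data: bytes, mode: int = 0o644) -> None:
    """Write a file, then set its permissions regardless of the caller's umask."""
    with open(path, "wb") as fh:
        fh.write(data)
    # os.chmod() is not filtered by the umask, unlike the mode used at creation.
    os.chmod(path, mode)


old_umask = os.umask(0o077)  # simulate a builder with a restrictive umask
try:
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.bin")
        write_with_fixed_mode(target, b"payload")
        print(oct(stat.S_IMODE(os.stat(target).st_mode)))  # 0o644 on POSIX
finally:
    os.umask(old_umask)
```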
Build paths is a very interesting one which
we cover in greater detail on another slide.
Also specifying the environment, we'll also
cover this one in the build info slides.
So not only are there privacy and security
advantages of using,
moving towards reproducible builds there
are also technical advantages.
It's faster to build if you basically
keep hitting cache.
I'm pretty certain this is why Google are
interested in it.
Because of the amount of
compilation they do
they're just going to save a whole bucket
load of money just by
'Oh we don't need to rebuild this because
it's the same SHA' etc
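The caching idea can be sketched as a store keyed on the hash of the inputs. This is an illustration under stated assumptions, not how any particular build farm works; `BuildCache` and the stand-in compile function are hypothetical.

```python
import hashlib
from typing import Callable


class BuildCache:
    """Cache build outputs keyed by the SHA-256 of their inputs."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.builds = 0  # number of real (uncached) builds performed

    def build(self, source: bytes, compile_fn: Callable[[bytes], bytes]) -> bytes:
        key = hashlib.sha256(source).hexdigest()
        if key not in self._store:
            self.builds += 1
            self._store[key] = compile_fn(source)
        return self._store[key]


cache = BuildCache()
fake_compile = bytes.upper  # stand-in for a real, deterministic compiler
cache.build(b"int main;", fake_compile)
cache.build(b"int main;", fake_compile)  # same input: served from cache
print(cache.builds)  # 1
```

Reusing the cached output is only sound if the build is deterministic, which is where reproducibility comes in.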
It's very nice to test revisions and
changes. I use all our tools
when doing QA uploads or NMUs: you
rebuild a package
and then you compare to the previous one.
And as the only things that have changed
should be the things that you've changed,
there haven't been all sorts
of random other nonsense being
reordered, with timestamps added.
You can get rid of all that noise and
just be 'oh yeah brilliant I can see that
the patch I've applied here has actually
changed the behaviour of the program'
and only that. It hasn't done all sorts
of weird, weird stuff.
So you have safer uploads in that sense.
Speaking of safety a reproducible build
won't go talking to the Internet
like a lot of modern package managers
like to do. Maven-style ones.
Also a reproducible build will typically
not have any
non-deterministic failure modes.
So there's a lot of tests and test suites
in Debian that will
try and test things like 'Oh is this
algorithm N squared or bigger than N'.
And it will try doing that by running
some sort of benchmark
and fail if it doesn't meet some sort of
arbitrary time difference and
that's obviously not reliable. So we
get rid of all those nonsense things.
It also finds bugs in really weird
locales. We build in French,
Swiss-French, and it just comes up with
all sorts of nonsense.
Or timezones, if you build in UTC-12 then
this date library doesn't work anymore
and it's like 'you had one job
to be a date library'.
[audience laughter]
It's pretty scary and some pretty
cute bugs.
It also detects if your machine is, you
just have a broken ??? [8:28].
We build a year and a month in the
future. You find things like the
maintainer has added a pre-generated SSL
certificate to their tests and
it expires within the year. And so it breaks.
We're preemptively detecting that fail to
??? [9:01] source.