WEBVTT 00:00:00.800 --> 00:00:03.924 So, I lead a team at Google that works on machine intelligence; 00:00:03.948 --> 00:00:08.598 in other words, the engineering discipline of making computers and devices 00:00:08.622 --> 00:00:11.041 able to do some of the things that brains do. 00:00:11.439 --> 00:00:14.538 And this makes us interested in real brains 00:00:14.562 --> 00:00:15.851 and neuroscience as well, 00:00:15.875 --> 00:00:20.047 and especially interested in the things that our brains do 00:00:20.071 --> 00:00:24.113 that are still far superior to the performance of computers. NOTE Paragraph 00:00:25.209 --> 00:00:28.818 Historically, one of those areas has been perception, 00:00:28.842 --> 00:00:31.881 the process by which things out there in the world -- 00:00:31.905 --> 00:00:33.489 sounds and images -- 00:00:33.513 --> 00:00:35.691 can turn into concepts in the mind. 00:00:36.235 --> 00:00:38.752 This is essential for our own brains, 00:00:38.776 --> 00:00:41.240 and it's also pretty useful on a computer. 00:00:41.636 --> 00:00:44.986 The machine perception algorithms, for example, that our team makes, 00:00:45.010 --> 00:00:48.884 are what enable your pictures on Google Photos to become searchable, 00:00:48.908 --> 00:00:50.305 based on what's in them. 00:00:51.594 --> 00:00:55.087 The flip side of perception is creativity: 00:00:55.111 --> 00:00:58.149 turning a concept into something out there in the world. 00:00:58.173 --> 00:01:01.728 So over the past year, our work on machine perception 00:01:01.752 --> 00:01:06.611 has also unexpectedly connected with the world of machine creativity 00:01:06.635 --> 00:01:07.795 and machine art. NOTE Paragraph 00:01:08.556 --> 00:01:11.840 I think Michelangelo had a penetrating insight 00:01:11.864 --> 00:01:15.520 into this dual relationship between perception and creativity. 
00:01:16.023 --> 00:01:18.029 This is a famous quote of his: 00:01:18.053 --> 00:01:21.376 "Every block of stone has a statue inside of it, 00:01:22.036 --> 00:01:25.038 and the job of the sculptor is to discover it." 00:01:26.029 --> 00:01:29.245 So I think that what Michelangelo was getting at 00:01:29.269 --> 00:01:32.449 is that we create by perceiving, 00:01:32.473 --> 00:01:35.496 and that perception itself is an act of imagination 00:01:35.520 --> 00:01:37.981 and is the stuff of creativity. NOTE Paragraph 00:01:38.691 --> 00:01:42.616 The organ that does all the thinking and perceiving and imagining, 00:01:42.640 --> 00:01:44.228 of course, is the brain. 00:01:45.089 --> 00:01:47.634 And I'd like to begin with a brief bit of history 00:01:47.658 --> 00:01:49.960 about what we know about brains. 00:01:50.496 --> 00:01:52.942 Because unlike, say, the heart or the intestines, 00:01:52.966 --> 00:01:56.110 you really can't say very much about a brain by just looking at it, 00:01:56.134 --> 00:01:57.546 at least with the naked eye. 00:01:57.983 --> 00:02:00.399 The early anatomists who looked at brains 00:02:00.423 --> 00:02:04.230 gave the superficial structures of this thing all kinds of fanciful names, 00:02:04.254 --> 00:02:06.687 like hippocampus, meaning "little shrimp." 00:02:06.711 --> 00:02:09.475 But of course that sort of thing doesn't tell us very much 00:02:09.499 --> 00:02:11.817 about what's actually going on inside. 
NOTE Paragraph 00:02:12.780 --> 00:02:16.393 The first person who, I think, really developed some kind of insight 00:02:16.417 --> 00:02:18.347 into what was going on in the brain 00:02:18.371 --> 00:02:22.291 was the great Spanish neuroanatomist, Santiago Ramón y Cajal, 00:02:22.315 --> 00:02:23.859 in the 19th century, 00:02:23.883 --> 00:02:27.638 who used microscopy and special stains 00:02:27.662 --> 00:02:31.832 that could selectively fill in or render in very high contrast 00:02:31.856 --> 00:02:33.864 the individual cells in the brain, 00:02:33.888 --> 00:02:37.042 in order to start to understand their morphologies. 00:02:37.972 --> 00:02:40.863 And these are the kinds of drawings that he made of neurons 00:02:40.887 --> 00:02:42.096 in the 19th century. NOTE Paragraph 00:02:42.120 --> 00:02:44.004 This is from a bird brain. 00:02:44.028 --> 00:02:47.085 And you see this incredible variety of different sorts of cells -- 00:02:47.109 --> 00:02:50.544 even the cellular theory itself was quite new at this point. 00:02:50.568 --> 00:02:51.846 And these structures, 00:02:51.870 --> 00:02:54.129 these cells that have these arborizations, 00:02:54.153 --> 00:02:56.761 these branches that can go very, very long distances -- 00:02:56.785 --> 00:02:58.401 this was very novel at the time. 00:02:58.779 --> 00:03:01.682 They're reminiscent, of course, of wires. 00:03:01.706 --> 00:03:05.163 That might have been obvious to some people in the 19th century; 00:03:05.187 --> 00:03:09.501 the revolutions of wiring and electricity were just getting underway. 00:03:09.964 --> 00:03:11.142 But in many ways, 00:03:11.166 --> 00:03:14.479 these microanatomical drawings of Ramón y Cajal's, like this one, 00:03:14.503 --> 00:03:16.835 they're still in some ways unsurpassed. NOTE Paragraph 00:03:16.859 --> 00:03:18.713 We're still, more than a century later, 00:03:18.737 --> 00:03:21.562 trying to finish the job that Ramón y Cajal started. 
00:03:21.586 --> 00:03:24.720 These are raw data from our collaborators 00:03:24.744 --> 00:03:27.625 at the Max Planck Institute of Neuroscience. 00:03:27.649 --> 00:03:29.439 And what our collaborators have done 00:03:29.463 --> 00:03:34.464 is to image little pieces of brain tissue. 00:03:34.488 --> 00:03:37.814 The entire sample here is about one cubic millimeter in size, 00:03:37.838 --> 00:03:40.459 and I'm showing you a very, very small piece of it here. 00:03:40.483 --> 00:03:42.829 That bar on the left is about one micron. 00:03:42.853 --> 00:03:45.262 The structures you see are mitochondria 00:03:45.286 --> 00:03:47.330 that are the size of bacteria. 00:03:47.354 --> 00:03:48.905 And these are consecutive slices 00:03:48.929 --> 00:03:52.077 through this very, very tiny block of tissue. 00:03:52.101 --> 00:03:54.504 Just for comparison's sake, 00:03:54.528 --> 00:03:58.320 the diameter of an average strand of hair is about 100 microns. 00:03:58.344 --> 00:04:00.618 So we're looking at something much, much smaller 00:04:00.642 --> 00:04:02.040 than a single strand of hair. NOTE Paragraph 00:04:02.064 --> 00:04:06.095 And from these kinds of serial electron microscopy slices, 00:04:06.119 --> 00:04:11.127 one can start to make reconstructions in 3D of neurons that look like these. 00:04:11.151 --> 00:04:14.308 So these are sort of in the same style as Ramón y Cajal. 00:04:14.332 --> 00:04:15.824 Only a few neurons lit up, 00:04:15.848 --> 00:04:18.629 because otherwise we wouldn't be able to see anything here. 00:04:18.653 --> 00:04:19.965 It would be so crowded, 00:04:19.989 --> 00:04:21.319 so full of structure, 00:04:21.343 --> 00:04:24.067 of wiring all connecting one neuron to another. NOTE Paragraph 00:04:25.293 --> 00:04:28.097 So Ramón y Cajal was a little bit ahead of his time, 00:04:28.121 --> 00:04:30.676 and progress on understanding the brain 00:04:30.700 --> 00:04:32.971 proceeded slowly over the next few decades. 
00:04:33.455 --> 00:04:36.308 But we knew that neurons used electricity, 00:04:36.332 --> 00:04:39.268 and by World War II, our technology was advanced enough 00:04:39.292 --> 00:04:42.098 to start doing real electrical experiments on live neurons 00:04:42.122 --> 00:04:44.228 to better understand how they worked. 00:04:44.631 --> 00:04:48.987 This was the very same time when computers were being invented, 00:04:49.011 --> 00:04:52.111 very much based on the idea of modeling the brain -- 00:04:52.135 --> 00:04:55.220 of "intelligent machinery," as Alan Turing, 00:04:55.244 --> 00:04:57.235 one of the fathers of computer science, called it. NOTE Paragraph 00:04:57.923 --> 00:05:02.555 Warren McCulloch and Walter Pitts looked at Ramón y Cajal's drawing 00:05:02.579 --> 00:05:03.896 of visual cortex, 00:05:03.920 --> 00:05:05.482 which I'm showing here. 00:05:05.506 --> 00:05:09.948 This is the cortex that processes imagery that comes from the eye. 00:05:10.424 --> 00:05:13.932 And for them, this looked like a circuit diagram. 00:05:14.353 --> 00:05:18.188 So there are a lot of details in McCulloch and Pitts's circuit diagram 00:05:18.212 --> 00:05:19.564 that are not quite right. 00:05:19.588 --> 00:05:20.823 But this basic idea 00:05:20.847 --> 00:05:24.839 that visual cortex works like a series of computational elements 00:05:24.863 --> 00:05:27.609 that pass information one to the next in a cascade, 00:05:27.633 --> 00:05:29.235 is essentially correct. NOTE Paragraph 00:05:29.259 --> 00:05:31.609 Let's talk for a moment 00:05:31.633 --> 00:05:35.665 about what a model for processing visual information would need to do. 00:05:36.228 --> 00:05:38.969 The basic task of perception 00:05:38.993 --> 00:05:43.187 is to take an image like this one and say, 00:05:43.211 --> 00:05:44.387 "That's a bird," 00:05:44.411 --> 00:05:47.285 which is a very simple thing for us to do with our brains. 
00:05:47.309 --> 00:05:50.730 But you should all understand that for a computer, 00:05:50.754 --> 00:05:53.841 this was pretty much impossible just a few years ago. 00:05:53.865 --> 00:05:55.781 The classical computing paradigm 00:05:55.805 --> 00:05:58.312 is not one in which this task is easy to do. NOTE Paragraph 00:05:59.366 --> 00:06:01.918 So what's going on between the pixels, 00:06:01.942 --> 00:06:05.970 between the image of the bird and the word "bird," 00:06:05.994 --> 00:06:08.808 is essentially a set of neurons connected to each other 00:06:08.832 --> 00:06:09.987 in a neural network, 00:06:10.011 --> 00:06:11.234 as I'm diagramming here. 00:06:11.258 --> 00:06:14.530 This neural network could be biological, inside our visual cortices, 00:06:14.554 --> 00:06:16.716 or, nowadays, we start to have the capability 00:06:16.740 --> 00:06:19.194 to model such neural networks on the computer. 00:06:19.834 --> 00:06:22.187 And I'll show you what that actually looks like. NOTE Paragraph 00:06:22.211 --> 00:06:25.627 So the pixels you can think about as a first layer of neurons, 00:06:25.651 --> 00:06:27.890 and that's, in fact, how it works in the eye -- 00:06:27.914 --> 00:06:29.577 that's the neurons in the retina. 00:06:29.601 --> 00:06:31.101 And those feed forward 00:06:31.125 --> 00:06:34.528 into one layer after another layer, after another layer of neurons, 00:06:34.552 --> 00:06:37.585 all connected by synapses of different weights. 00:06:37.609 --> 00:06:38.944 The behavior of this network 00:06:38.968 --> 00:06:42.252 is characterized by the strengths of all of those synapses. 00:06:42.276 --> 00:06:45.564 Those characterize the computational properties of this network. 00:06:45.588 --> 00:06:47.058 And at the end of the day, 00:06:47.082 --> 00:06:49.529 you have a neuron or a small group of neurons 00:06:49.553 --> 00:06:51.200 that light up, saying, "bird." 
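The cascade just described -- pixels feeding forward through layer after layer of weighted synapses until an output neuron lights up -- can be sketched numerically. This is only an illustrative toy, not the networks the talk demonstrates: the layer sizes are made up, the weights are random rather than trained, and numpy stands in for the real machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights):
    """Feed activations through one weighted layer after another."""
    a = x
    for w in weights:
        a = np.tanh(a @ w)  # weighted synapses plus a simple nonlinearity
    return a

# Made-up sizes for illustration: a 16-"pixel" input cascading through
# two hidden layers down to 3 output "concept" neurons.
weights = [rng.normal(size=(16, 8)),
           rng.normal(size=(8, 4)),
           rng.normal(size=(4, 3))]

x = rng.normal(size=16)   # the input image, as a vector of pixel values
y = forward(x, weights)   # the output activations ("bird" would be one of these)
print(y.shape)            # -> (3,)
```

The behavior of the whole network lives in those weight matrices: change them and the same pixels produce different output activations.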
NOTE Paragraph 00:06:51.824 --> 00:06:54.956 Now I'm going to represent those three things -- 00:06:54.980 --> 00:06:59.676 the input pixels and the synapses in the neural network, 00:06:59.700 --> 00:07:01.285 and bird, the output -- 00:07:01.309 --> 00:07:04.366 by three variables: x, w and y. 00:07:04.853 --> 00:07:06.664 There are maybe a million or so x's -- 00:07:06.688 --> 00:07:08.641 a million pixels in that image. 00:07:08.665 --> 00:07:11.111 There are billions or trillions of w's, 00:07:11.135 --> 00:07:14.556 which represent the weights of all these synapses in the neural network. 00:07:14.580 --> 00:07:16.455 And there's a very small number of y's, 00:07:16.479 --> 00:07:18.337 of outputs that the network has. 00:07:18.361 --> 00:07:20.110 "Bird" is only four letters, right? 00:07:21.088 --> 00:07:24.514 So let's pretend that this is just a simple formula, 00:07:24.538 --> 00:07:26.701 x "×" w = y. 00:07:26.725 --> 00:07:28.761 I'm putting the times in scare quotes 00:07:28.785 --> 00:07:31.065 because what's really going on there, of course, 00:07:31.089 --> 00:07:34.135 is a very complicated series of mathematical operations. NOTE Paragraph 00:07:35.172 --> 00:07:36.393 That's one equation. 00:07:36.417 --> 00:07:38.089 There are three variables. 00:07:38.113 --> 00:07:40.839 And we all know that if you have one equation, 00:07:40.863 --> 00:07:44.505 you can solve for one variable by knowing the other two. 00:07:45.158 --> 00:07:48.538 So the problem of inference, 00:07:48.562 --> 00:07:51.435 that is, figuring out that the picture of a bird is a bird, 00:07:51.459 --> 00:07:52.733 is this one: 00:07:52.757 --> 00:07:56.216 it's where y is the unknown and w and x are known. 00:07:56.240 --> 00:07:58.699 You know the neural network, you know the pixels. 00:07:58.723 --> 00:08:02.050 As you can see, that's actually a relatively straightforward problem. 00:08:02.074 --> 00:08:04.260 You multiply two times three and you're done. 
00:08:04.862 --> 00:08:06.985 I'll show you an artificial neural network 00:08:07.009 --> 00:08:09.305 that we've built recently, doing exactly that. NOTE Paragraph 00:08:09.634 --> 00:08:12.494 This is running in real time on a mobile phone, 00:08:12.518 --> 00:08:15.831 and that's, of course, amazing in its own right, 00:08:15.855 --> 00:08:19.323 that mobile phones can do so many billions and trillions of operations 00:08:19.347 --> 00:08:20.595 per second. 00:08:20.619 --> 00:08:22.234 What you're looking at is a phone 00:08:22.258 --> 00:08:25.805 looking at one after another picture of a bird, 00:08:25.829 --> 00:08:28.544 and actually not only saying, "Yes, it's a bird," 00:08:28.568 --> 00:08:31.979 but identifying the species of bird with a network of this sort. 00:08:32.890 --> 00:08:34.716 So in that picture, 00:08:34.740 --> 00:08:38.542 the x and the w are known, and the y is the unknown. 00:08:38.566 --> 00:08:41.074 I'm glossing over the very difficult part, of course, 00:08:41.098 --> 00:08:44.959 which is how on earth do we figure out the w, 00:08:44.983 --> 00:08:47.170 the brain that can do such a thing? 00:08:47.194 --> 00:08:49.028 How would we ever learn such a model? NOTE Paragraph 00:08:49.418 --> 00:08:52.651 So this process of learning, of solving for w, 00:08:52.675 --> 00:08:55.322 if we were doing this with the simple equation 00:08:55.346 --> 00:08:57.346 in which we think about these as numbers, 00:08:57.370 --> 00:09:00.057 we know exactly how to do that: 6 = 2 × w, 00:09:00.081 --> 00:09:03.393 well, we divide by two and we're done. 00:09:04.001 --> 00:09:06.221 The problem is with this operator. 00:09:06.823 --> 00:09:07.974 So, division -- 00:09:07.998 --> 00:09:11.119 we've used division because it's the inverse of multiplication, 00:09:11.143 --> 00:09:12.583 but as I've just said, 00:09:12.607 --> 00:09:15.056 the multiplication is a bit of a lie here. 
00:09:15.080 --> 00:09:18.406 This is a very, very complicated, very non-linear operation; 00:09:18.430 --> 00:09:20.134 it has no inverse. 00:09:20.158 --> 00:09:23.308 So we have to figure out a way to solve the equation 00:09:23.332 --> 00:09:25.356 without a division operator. 00:09:25.380 --> 00:09:27.723 And the way to do that is fairly straightforward. 00:09:27.747 --> 00:09:30.418 You just say, let's play a little algebra trick, 00:09:30.442 --> 00:09:33.348 and move the six over to the right-hand side of the equation. 00:09:33.372 --> 00:09:35.198 Now, we're still using multiplication. 00:09:35.675 --> 00:09:39.255 And that zero -- let's think about it as an error. 00:09:39.279 --> 00:09:41.794 In other words, if we've solved for w the right way, 00:09:41.818 --> 00:09:43.474 then the error will be zero. 00:09:43.498 --> 00:09:45.436 And if we haven't gotten it quite right, 00:09:45.460 --> 00:09:47.209 the error will be greater than zero. NOTE Paragraph 00:09:47.233 --> 00:09:50.599 So now we can just take guesses to minimize the error, 00:09:50.623 --> 00:09:53.310 and that's the sort of thing computers are very good at. 00:09:53.334 --> 00:09:54.927 So you've taken an initial guess: 00:09:54.951 --> 00:09:56.107 what if w = 0? 00:09:56.131 --> 00:09:57.371 Well, then the error is 6. 00:09:57.395 --> 00:09:58.841 What if w = 1? The error is 4. 00:09:58.865 --> 00:10:01.232 And then the computer can sort of play Marco Polo, 00:10:01.256 --> 00:10:03.623 and drive down the error close to zero. 00:10:03.647 --> 00:10:07.021 As it does that, it's getting successive approximations to w. 00:10:07.045 --> 00:10:10.701 Typically, it never quite gets there, but after about a dozen steps, 00:10:10.725 --> 00:10:15.349 we're up to w = 2.999, which is close enough. 00:10:16.302 --> 00:10:18.116 And this is the learning process. 
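The guessing game just described -- try a w, measure the error, nudge w to shrink it -- can be written out for the toy equation 2 × w = 6. A minimal sketch: the step size and iteration count are arbitrary choices, which is why the final digits here differ slightly from the 2.999 quoted in the talk.

```python
# Toy version of learning: solve 2 * w = 6 for w without ever dividing,
# by guessing and shrinking the error. The error is squared here
# (the talk's "error of 6" at w = 0 is the unsquared |2 * 0 - 6|);
# squaring just gives a smooth quantity to walk downhill on.

def error(w):
    return (2 * w - 6) ** 2   # zero exactly when w is right

w, step = 0.0, 0.05           # initial guess, and an arbitrary step size
for _ in range(12):
    grad = 4 * (2 * w - 6)    # slope of the squared error at this w
    w -= step * grad          # nudge w downhill -- the "Marco Polo" game

# After about a dozen steps, w ≈ 2.993: never exactly 3, but close enough.
```

This is the same successive-approximation process, just on one weight instead of billions.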
NOTE Paragraph 00:10:18.140 --> 00:10:20.870 So remember that what's been going on here 00:10:20.894 --> 00:10:25.272 is that we've been taking a lot of known x's and known y's 00:10:25.296 --> 00:10:28.750 and solving for the w in the middle through an iterative process. 00:10:28.774 --> 00:10:32.330 It's exactly the same way that we do our own learning. 00:10:32.354 --> 00:10:34.584 We have many, many images as babies 00:10:34.608 --> 00:10:37.241 and we get told, "This is a bird; this is not a bird." 00:10:37.714 --> 00:10:39.812 And over time, through iteration, 00:10:39.836 --> 00:10:42.764 we solve for w, we solve for those neural connections. NOTE Paragraph 00:10:43.460 --> 00:10:47.546 So now, we've held x and w fixed to solve for y; 00:10:47.570 --> 00:10:49.417 that's everyday, fast perception. 00:10:49.441 --> 00:10:51.204 We figure out how we can solve for w, 00:10:51.228 --> 00:10:53.131 that's learning, which is a lot harder, 00:10:53.155 --> 00:10:55.140 because we need to do error minimization, 00:10:55.164 --> 00:10:56.851 using a lot of training examples. NOTE Paragraph 00:10:56.875 --> 00:11:00.062 And about a year ago, Alex Mordvintsev, on our team, 00:11:00.086 --> 00:11:03.636 decided to experiment with what happens if we try solving for x, 00:11:03.660 --> 00:11:05.697 given a known w and a known y. 00:11:06.124 --> 00:11:07.275 In other words, 00:11:07.299 --> 00:11:08.651 you know that it's a bird, 00:11:08.675 --> 00:11:11.978 and you already have your neural network that you've trained on birds, 00:11:12.002 --> 00:11:14.346 but what is the picture of a bird? 00:11:15.034 --> 00:11:20.058 It turns out that by using exactly the same error-minimization procedure, 00:11:20.082 --> 00:11:23.512 one can do that with the network trained to recognize birds, 00:11:23.536 --> 00:11:26.924 and the result turns out to be ... 00:11:30.400 --> 00:11:31.705 a picture of birds. 
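Solving for x with the same error-minimization machinery can be sketched too. Everything here is an assumption for illustration: a fixed random linear map stands in for the trained network (a real one is deep and nonlinear), and "bird" is just the second of two made-up class scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a trained network: a fixed weight matrix mapping a
# 9-"pixel" image to two class scores, [not-bird, bird].
w = rng.normal(size=(9, 2))

def bird_score(x):
    return (x @ w)[1]         # how strongly the network answers "bird"

# Now w and the desired answer y ("bird") are known; x is the unknown.
# Start from near-blank pixels and walk them uphill on the bird score --
# the same iterative machinery as learning, pointed at the input.
x = rng.normal(size=9) * 0.01
before = bird_score(x)
for _ in range(100):
    grad = w[:, 1]            # d(bird_score)/dx for this linear stand-in
    x += 0.1 * grad           # gradient ascent on the pixels, not the weights

after = bird_score(x)         # much larger: x now "looks like" a bird to w
```

The only thing that changed from learning is which variable the optimizer is allowed to move.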
00:11:32.814 --> 00:11:36.551 So this is a picture of birds generated entirely by a neural network 00:11:36.575 --> 00:11:38.401 that was trained to recognize birds, 00:11:38.425 --> 00:11:41.963 just by solving for x rather than solving for y, 00:11:41.987 --> 00:11:43.275 and doing that iteratively. NOTE Paragraph 00:11:43.732 --> 00:11:45.579 Here's another fun example. 00:11:45.603 --> 00:11:49.040 This was a work made by Mike Tyka in our group, 00:11:49.064 --> 00:11:51.372 which he calls "Animal Parade." 00:11:51.396 --> 00:11:54.272 It reminds me a little bit of William Kentridge's artworks, 00:11:54.296 --> 00:11:56.785 in which he makes sketches, rubs them out, 00:11:56.809 --> 00:11:58.269 makes sketches, rubs them out, 00:11:58.293 --> 00:11:59.691 and creates a movie this way. 00:11:59.715 --> 00:12:00.866 In this case, 00:12:00.890 --> 00:12:04.167 what Mike is doing is varying y over the space of different animals, 00:12:04.191 --> 00:12:06.573 in a network designed to recognize and distinguish 00:12:06.597 --> 00:12:08.407 different animals from each other. 00:12:08.431 --> 00:12:12.182 And you get this strange, Escher-like morph from one animal to another. NOTE Paragraph 00:12:14.221 --> 00:12:18.835 Here he and Alex together have tried reducing 00:12:18.859 --> 00:12:21.618 the y's to a space of only two dimensions, 00:12:21.642 --> 00:12:25.080 thereby making a map out of the space of all things 00:12:25.104 --> 00:12:26.823 recognized by this network. 00:12:26.847 --> 00:12:28.870 Doing this kind of synthesis 00:12:28.894 --> 00:12:31.276 or generation of imagery over that entire surface, 00:12:31.300 --> 00:12:34.146 varying y over the surface, you make a kind of map -- 00:12:34.170 --> 00:12:37.311 a visual map of all the things the network knows how to recognize. 00:12:37.335 --> 00:12:40.200 The animals are all here; "armadillo" is right in that spot. 
NOTE Paragraph 00:12:40.919 --> 00:12:43.398 You can do this with other kinds of networks as well. 00:12:43.422 --> 00:12:46.296 This is a network designed to recognize faces, 00:12:46.320 --> 00:12:48.320 to distinguish one face from another. 00:12:48.344 --> 00:12:51.593 And here, we're putting in a y that says, "me," 00:12:51.617 --> 00:12:53.192 my own face parameters. 00:12:53.216 --> 00:12:54.922 And when this thing solves for x, 00:12:54.946 --> 00:12:57.564 it generates this rather crazy, 00:12:57.588 --> 00:13:02.016 kind of cubist, surreal, psychedelic picture of me 00:13:02.040 --> 00:13:03.846 from multiple points of view at once. 00:13:03.870 --> 00:13:06.604 The reason it looks like multiple points of view at once 00:13:06.628 --> 00:13:10.315 is because that network is designed to get rid of the ambiguity 00:13:10.339 --> 00:13:12.815 of a face being in one pose or another pose, 00:13:12.839 --> 00:13:16.215 being looked at with one kind of lighting, another kind of lighting. 00:13:16.239 --> 00:13:18.324 So when you do this sort of reconstruction, 00:13:18.348 --> 00:13:20.652 if you don't use some sort of guide image 00:13:20.676 --> 00:13:21.887 or guide statistics, 00:13:21.911 --> 00:13:25.676 then you'll get a sort of confusion of different points of view, 00:13:25.700 --> 00:13:27.068 because it's ambiguous. 00:13:27.786 --> 00:13:32.009 This is what happens if Alex uses his own face as a guide image 00:13:32.033 --> 00:13:35.354 during that optimization process to reconstruct my own face. 00:13:36.284 --> 00:13:38.612 So you can see it's not perfect. 00:13:38.636 --> 00:13:40.510 There's still quite a lot of work to do 00:13:40.534 --> 00:13:42.987 on how we optimize that optimization process. 00:13:43.011 --> 00:13:45.838 But you start to get something more like a coherent face, 00:13:45.862 --> 00:13:47.876 rendered using my own face as a guide. 
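The guide-image trick just described can also be sketched: instead of maximizing the face score alone (which yields the ambiguous, every-pose-at-once result), penalize distance from a guide image during the optimization. Again these are toy stand-ins -- a linear "face score" direction and a random vector playing the guide -- chosen only to show the shape of the objective.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumptions for illustration: a direction along which "face-ness"
# grows, and a random 9-"pixel" vector playing the guide image.
w = rng.normal(size=9)
guide = rng.normal(size=9)
lam = 1.0                     # how strongly to stay near the guide

# Objective: raise the face score while penalizing distance from the
# guide, i.e. minimize  -w . x + lam * ||x - guide||^2  over x.
x = np.zeros(9)
for _ in range(200):
    grad = -w + 2 * lam * (x - guide)   # gradient of the combined objective
    x -= 0.05 * grad

# The optimum blends both pulls: x converges to guide + w / (2 * lam),
# a reconstruction anchored to the guide rather than a free-floating
# confusion of viewpoints.
```

Turning lam up pins the result to the guide; turning it down recovers the unconstrained hallucination.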
NOTE Paragraph 00:13:48.892 --> 00:13:51.393 You don't have to start with a blank canvas 00:13:51.417 --> 00:13:52.573 or with white noise. 00:13:52.597 --> 00:13:53.901 When you're solving for x, 00:13:53.925 --> 00:13:57.814 you can begin with an x that is itself already some other image. 00:13:57.838 --> 00:14:00.394 That's what this little demonstration is. 00:14:00.418 --> 00:14:04.540 This is a network that is designed to categorize 00:14:04.564 --> 00:14:07.683 all sorts of different objects -- man-made structures, animals ... 00:14:07.707 --> 00:14:10.300 Here we're starting with just a picture of clouds, 00:14:10.324 --> 00:14:11.995 and as we optimize, 00:14:12.019 --> 00:14:16.505 basically, this network is figuring out what it sees in the clouds. 00:14:16.931 --> 00:14:19.251 And the more time you spend looking at this, 00:14:19.275 --> 00:14:22.028 the more things you also will see in the clouds. 00:14:23.004 --> 00:14:26.379 You could also use the face network to hallucinate into this, 00:14:26.403 --> 00:14:28.215 and you get some pretty crazy stuff. NOTE Paragraph 00:14:28.239 --> 00:14:29.389 (Laughter) NOTE Paragraph 00:14:30.401 --> 00:14:33.145 Or, Mike has done some other experiments 00:14:33.169 --> 00:14:37.074 in which he takes that cloud image, 00:14:37.098 --> 00:14:40.605 hallucinates, zooms, hallucinates, zooms, hallucinates, zooms. 00:14:40.629 --> 00:14:41.780 And in this way, 00:14:41.804 --> 00:14:45.479 you can get a sort of fugue state of the network, I suppose, 00:14:45.503 --> 00:14:49.183 or a sort of free association, 00:14:49.207 --> 00:14:51.434 in which the network is eating its own tail. 00:14:51.458 --> 00:14:54.879 So every image is now the basis for, 00:14:54.903 --> 00:14:56.324 "What do I think I see next? 00:14:56.348 --> 00:14:59.151 What do I think I see next? What do I think I see next?" 
NOTE Paragraph 00:14:59.487 --> 00:15:02.423 I showed this for the first time in public 00:15:02.447 --> 00:15:07.884 to a group at a lecture in Seattle called "Higher Education" -- 00:15:07.908 --> 00:15:10.345 this was right after marijuana was legalized. NOTE Paragraph 00:15:10.369 --> 00:15:12.784 (Laughter) NOTE Paragraph 00:15:14.627 --> 00:15:16.731 So I'd like to finish up quickly 00:15:16.755 --> 00:15:21.010 by just noting that this technology is not constrained. 00:15:21.034 --> 00:15:24.699 I've shown you purely visual examples because they're really fun to look at. 00:15:24.723 --> 00:15:27.174 It's not a purely visual technology. 00:15:27.198 --> 00:15:29.191 Our artist collaborator, Ross Goodwin, 00:15:29.215 --> 00:15:32.886 has done experiments involving a camera that takes a picture, 00:15:32.910 --> 00:15:37.144 and then a computer in his backpack writes a poem using neural networks, 00:15:37.168 --> 00:15:39.112 based on the contents of the image. 00:15:39.136 --> 00:15:42.083 And that poetry neural network has been trained 00:15:42.107 --> 00:15:44.341 on a large corpus of 20th-century poetry. 00:15:44.365 --> 00:15:45.864 And the poetry is, you know, 00:15:45.888 --> 00:15:47.802 I think, kind of not bad, actually. NOTE Paragraph 00:15:47.826 --> 00:15:49.210 (Laughter) NOTE Paragraph 00:15:49.234 --> 00:15:50.393 In closing, 00:15:50.417 --> 00:15:52.549 I think that per Michelangelo, 00:15:52.573 --> 00:15:53.807 I think he was right; 00:15:53.831 --> 00:15:57.267 perception and creativity are very intimately connected. 00:15:57.611 --> 00:16:00.245 What we've just seen are neural networks 00:16:00.269 --> 00:16:02.572 that are entirely trained to discriminate, 00:16:02.596 --> 00:16:04.838 or to recognize different things in the world, 00:16:04.862 --> 00:16:08.023 able to be run in reverse, to generate. 
00:16:08.047 --> 00:16:09.830 One of the things that suggests to me 00:16:09.854 --> 00:16:12.252 is not only that Michelangelo really did see 00:16:12.276 --> 00:16:14.728 the sculpture in the blocks of stone, 00:16:14.752 --> 00:16:18.390 but that any creature, any being, any alien 00:16:18.414 --> 00:16:22.071 that is able to do perceptual acts of that sort 00:16:22.095 --> 00:16:23.470 is also able to create 00:16:23.494 --> 00:16:26.718 because it's exactly the same machinery that's used in both cases. NOTE Paragraph 00:16:26.742 --> 00:16:31.274 Also, I think that perception and creativity are by no means 00:16:31.298 --> 00:16:32.508 uniquely human. 00:16:32.532 --> 00:16:36.240 We start to have computer models that can do exactly these sorts of things. 00:16:36.264 --> 00:16:39.592 And that ought to be unsurprising; the brain is computational. NOTE Paragraph 00:16:39.616 --> 00:16:41.273 And finally, 00:16:41.297 --> 00:16:45.965 computing began as an exercise in designing intelligent machinery. 00:16:45.989 --> 00:16:48.451 It was very much modeled after the idea 00:16:48.475 --> 00:16:51.488 of how could we make machines intelligent. 00:16:51.512 --> 00:16:53.674 And we finally are starting to fulfill now 00:16:53.698 --> 00:16:56.104 some of the promises of those early pioneers, 00:16:56.128 --> 00:16:57.841 of Turing and von Neumann 00:16:57.865 --> 00:17:00.130 and McCulloch and Pitts. 00:17:00.154 --> 00:17:04.252 And I think that computing is not just about accounting 00:17:04.276 --> 00:17:06.423 or playing Candy Crush or something. 00:17:06.447 --> 00:17:09.025 From the beginning, we modeled them after our minds. 00:17:09.049 --> 00:17:12.318 And they give us both the ability to understand our own minds better 00:17:12.342 --> 00:17:13.871 and to extend them. NOTE Paragraph 00:17:14.627 --> 00:17:15.794 Thank you very much. NOTE Paragraph 00:17:15.818 --> 00:17:21.757 (Applause)