How a computer learns to recognize objects instantly
-
0:01 - 0:02Ten years ago,
-
0:02 - 0:05computer vision researchers
thought that getting a computer -
0:05 - 0:07to tell the difference
between a cat and a dog -
0:08 - 0:09would be almost impossible,
-
0:10 - 0:13even with the significant advance
in the state of artificial intelligence. -
0:13 - 0:17Now we can do it at a level
greater than 99 percent accuracy. -
0:18 - 0:20This is called image classification --
-
0:20 - 0:23give it an image,
put a label to that image -- -
0:23 - 0:26and computers know
thousands of other categories as well. -
0:27 - 0:30I'm a graduate student
at the University of Washington, -
0:30 - 0:31and I work on a project called Darknet,
-
0:32 - 0:33which is a neural network framework
-
0:33 - 0:36for training and testing
computer vision models. -
0:36 - 0:39So let's just see what Darknet thinks
-
0:39 - 0:41of this image that we have.
-
0:43 - 0:45When we run our classifier
-
0:45 - 0:46on this image,
-
0:46 - 0:49we see we don't just get
a prediction of dog or cat, -
0:49 - 0:51we actually get
specific breed predictions. -
0:51 - 0:53That's the level
of granularity we have now. -
0:53 - 0:55And it's correct.
-
0:55 - 0:57My dog is in fact a malamute.
-
0:57 - 1:01So we've made amazing strides
in image classification, -
1:01 - 1:03but what happens
when we run our classifier -
1:03 - 1:05on an image that looks like this?
-
1:07 - 1:08Well ...
-
1:13 - 1:17We see that the classifier comes back
with a pretty similar prediction. -
1:17 - 1:20And it's correct,
there is a malamute in the image, -
1:20 - 1:23but just given this label,
we don't actually know that much -
1:23 - 1:25about what's going on in the image.
-
1:25 - 1:27We need something more powerful.
-
1:27 - 1:30I work on a problem
called object detection, -
1:30 - 1:33where we look at an image
and try to find all of the objects, -
1:33 - 1:34put bounding boxes around them
-
1:34 - 1:36and say what those objects are.
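A minimal sketch of what "bounding boxes plus labels" could look like as data. The `Detection` class, its field names, and the sample values are hypothetical and are not taken from Darknet; they only illustrate why a list of boxes carries more information than one image-level label.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object: a label, a confidence score, and a box in pixels."""
    label: str
    confidence: float  # assumed to lie in [0, 1]
    x: float           # left edge of the box
    y: float           # top edge of the box
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


# Made-up values for an image that contains a dog and a cat.
detections = [
    Detection("dog", 0.92, x=40, y=60, width=300, height=260),
    Detection("cat", 0.88, x=380, y=150, width=180, height=170),
]

# With boxes we can reason about relative location and size,
# which a single classification label cannot express.
dog, cat = detections
dog_is_left_of_cat = dog.center()[0] < cat.center()[0]
dog_is_larger = dog.area() > cat.area()
```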
-
1:36 - 1:40So here's what happens
when we run a detector on this image. -
1:41 - 1:43Now, with this kind of result,
-
1:44 - 1:46we can do a lot more
with our computer vision algorithms. -
1:46 - 1:49We see that it knows
that there's a cat and a dog. -
1:49 - 1:51It knows their relative locations,
-
1:52 - 1:53their size.
-
1:53 - 1:55It may even know some extra information.
-
1:55 - 1:57There's a book sitting in the background.
-
1:57 - 2:01And if you want to build a system
on top of computer vision, -
2:01 - 2:04say a self-driving vehicle
or a robotic system, -
2:04 - 2:06this is the kind
of information that you want. -
2:07 - 2:10You want something so that
you can interact with the physical world. -
2:11 - 2:13Now, when I started working
on object detection, -
2:13 - 2:16it took 20 seconds
to process a single image. -
2:16 - 2:20And to get a feel for why
speed is so important in this domain, -
2:21 - 2:24here's an example of an object detector
-
2:24 - 2:26that takes two seconds
to process an image. -
2:26 - 2:29So this is 10 times faster
-
2:29 - 2:32than the 20-seconds-per-image detector,
-
2:32 - 2:35and you can see that by the time
it makes predictions, -
2:35 - 2:37the entire state of the world has changed,
-
2:38 - 2:40and this wouldn't be very useful
-
2:40 - 2:42for an application.
-
2:42 - 2:44If we speed this up
by another factor of 10, -
2:44 - 2:47this is a detector running
at five frames per second. -
2:47 - 2:49This is a lot better,
-
2:49 - 2:51but for example,
-
2:51 - 2:53if there's any significant movement,
-
2:53 - 2:56I wouldn't want a system
like this driving my car. -
2:57 - 3:00This is our detection system
running in real time on my laptop. -
3:01 - 3:04So it smoothly tracks me
as I move around the frame, -
3:04 - 3:08and it's robust to a wide variety
of changes in size, -
3:09 - 3:11pose,
-
3:11 - 3:13forward, backward.
-
3:13 - 3:14This is great.
-
3:14 - 3:16This is what we really need
-
3:16 - 3:19if we're going to build systems
on top of computer vision. -
3:19 - 3:23(Applause)
-
3:24 - 3:26So in just a few years,
-
3:26 - 3:29we've gone from 20 seconds per image
-
3:29 - 3:33to 20 milliseconds per image,
a thousand times faster. -
3:33 - 3:34How did we get there?
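The speed figures quoted so far can be checked with a few lines of arithmetic. This is only a sketch; the function names are illustrative. It converts each per-image time into frames per second (0.05, 0.5, 5 and 50) and confirms that 20 seconds to 20 milliseconds is a factor of about 1,000.

```python
def frames_per_second(seconds_per_image: float) -> float:
    """Throughput is the reciprocal of the time spent on one image."""
    return 1.0 / seconds_per_image


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new detector is than the old one."""
    return old_seconds / new_seconds


# Per-image processing times mentioned in the talk, in seconds.
timings = {
    "20 s": 20.0,
    "2 s": 2.0,
    "0.2 s": 0.2,
    "20 ms": 0.020,
}

for name, seconds in timings.items():
    print(f"{name} per image -> {frames_per_second(seconds):.2f} frames per second")

overall = speedup(timings["20 s"], timings["20 ms"])
print(f"overall speedup: about {overall:.0f}x")
```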
-
3:34 - 3:37Well, in the past,
object detection systems -
3:37 - 3:39would take an image like this
-
3:39 - 3:42and split it into a bunch of regions
-
3:42 - 3:45and then run a classifier
on each of these regions, -
3:45 - 3:47and high scores for that classifier
-
3:47 - 3:51would be considered
detections in the image. -
3:51 - 3:55But this involved running a classifier
thousands of times over an image, -
3:55 - 3:58thousands of neural network evaluations
to produce detection. -
3:59 - 4:04Instead, we trained a single network
to do all of detection for us. -
4:04 - 4:08It produces all of the bounding boxes
and class probabilities simultaneously. -
4:09 - 4:12With our system, instead of looking
at an image thousands of times -
4:12 - 4:14to produce detection,
-
4:14 - 4:15you only look once,
-
4:15 - 4:18and that's why we call it
the YOLO method of object detection. -
4:19 - 4:23So with this speed,
we're not just limited to images; -
4:23 - 4:26we can process video in real time.
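To make the contrast concrete, here is a rough sketch that only counts network evaluations. It is an assumption-laden stand-in, not the YOLO implementation: a simple sliding window replaces whatever region scheme earlier detectors used, `CountingNetwork` is a hypothetical stub that does no real inference, and the image size, window sizes, and stride are arbitrary.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_window_regions(image_w: int, image_h: int,
                           window: int, stride: int) -> List[Box]:
    """Enumerate square candidate regions that fit inside the image."""
    return [
        (x, y, window, window)
        for y in range(0, image_h - window + 1, stride)
        for x in range(0, image_w - window + 1, stride)
    ]


class CountingNetwork:
    """Stand-in for a neural network; it only counts how often it is run."""

    def __init__(self) -> None:
        self.evaluations = 0

    def run(self, region: Box) -> None:
        self.evaluations += 1


def detect_by_regions(net: CountingNetwork, image_w: int, image_h: int) -> int:
    """Older style: one network evaluation per candidate region."""
    for window in (64, 128, 256):
        for region in sliding_window_regions(image_w, image_h, window, stride=16):
            net.run(region)
    return net.evaluations


def detect_single_pass(net: CountingNetwork, image_w: int, image_h: int) -> int:
    """Single-network style: the whole image goes through the network once."""
    net.run((0, 0, image_w, image_h))
    return net.evaluations


region_evaluations = detect_by_regions(CountingNetwork(), 640, 480)
single_evaluations = detect_single_pass(CountingNetwork(), 640, 480)
print(region_evaluations, "evaluations vs", single_evaluations)
```

Under these assumed settings the region-based loop runs the network a couple of thousand times per image, while the single-pass version runs it once, which is where most of the speedup comes from.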
-
4:26 - 4:29And now, instead of just seeing
that cat and dog, -
4:29 - 4:32we can see them move around
and interact with each other. -
4:35 - 4:37This is a detector that we trained
-
4:37 - 4:41on 80 different classes
-
4:41 - 4:44in Microsoft's COCO dataset.
-
4:44 - 4:48It has all sorts of things
like spoon and fork, bowl, -
4:48 - 4:49common objects like that.
-
4:50 - 4:53It has a variety of more exotic things:
-
4:53 - 4:57animals, cars, zebras, giraffes.
-
4:57 - 4:59And now we're going to do something fun.
-
4:59 - 5:01We're just going to go
out into the audience -
5:01 - 5:03and see what kind of things we can detect.
-
5:03 - 5:04Does anyone want a stuffed animal?
-
5:06 - 5:08There are some teddy bears out there.
-
5:10 - 5:15And we can turn down
our threshold for detection a little bit, -
5:15 - 5:18so we can find more of you guys
out in the audience. -
5:20 - 5:22Let's see if we can get these stop signs.
-
5:22 - 5:24We find some backpacks.
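Turning down the detection threshold can be sketched as a simple filter on confidence scores. The function name, labels, and scores below are invented for illustration and do not come from the demo; the point is only that a lower threshold keeps more, less certain, candidates.

```python
from typing import List, Tuple

ScoredLabel = Tuple[str, float]  # (class label, confidence in [0, 1])


def filter_by_threshold(candidates: List[ScoredLabel],
                        threshold: float) -> List[ScoredLabel]:
    """Keep only candidates whose confidence is at least the threshold."""
    return [(label, score) for label, score in candidates if score >= threshold]


# Hypothetical raw candidates for one video frame.
candidates = [
    ("person", 0.91),
    ("teddy bear", 0.62),
    ("backpack", 0.34),
    ("stop sign", 0.27),
    ("person", 0.18),
]

strict = filter_by_threshold(candidates, threshold=0.50)
relaxed = filter_by_threshold(candidates, threshold=0.25)

print(len(strict), "detections at 0.50;", len(relaxed), "detections at 0.25")
```

The trade-off is that a lower threshold also lets through more false positives, so the right value depends on the application.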
-
5:26 - 5:28Let's just zoom in a little bit.
-
5:30 - 5:32And this is great.
-
5:32 - 5:35And all of the processing
is happening in real time -
5:35 - 5:36on the laptop.
-
5:37 - 5:39And it's important to remember
-
5:39 - 5:42that this is a general purpose
object detection system, -
5:42 - 5:47so we can train this for any image domain.
-
5:48 - 5:51The same code that we use
-
5:51 - 5:53to find stop signs or pedestrians,
-
5:53 - 5:55bicycles in a self-driving vehicle,
-
5:55 - 5:58can be used to find cancer cells
-
5:58 - 6:01in a tissue biopsy.
-
6:01 - 6:05And there are researchers around the globe
already using this technology -
6:06 - 6:10for advances in things
like medicine, robotics. -
6:10 - 6:11This morning, I read a paper
-
6:11 - 6:16where they were taking a census
of animals in Nairobi National Park -
6:16 - 6:19with YOLO as part
of this detection system. -
6:19 - 6:22And that's because Darknet is open source
-
6:22 - 6:24and in the public domain,
free for anyone to use. -
6:26 - 6:31(Applause)
-
6:31 - 6:36But we wanted to make detection
even more accessible and usable, -
6:36 - 6:40so through a combination
of model optimization, -
6:40 - 6:43network binarization and approximation,
-
6:43 - 6:47we actually have object detection
running on a phone. -
6:53 - 6:58(Applause)
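The talk does not say which binarization scheme is used. As one common illustration, and purely an assumption here, each weight can be replaced by its sign multiplied by a single shared scale (the mean absolute value), so that most of the stored values need only one bit. The function names are hypothetical.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> Tuple[float, List[int]]:
    """Approximate weights as scale * sign, with scale = mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return scale, signs


def reconstruct(scale: float, signs: List[int]) -> List[float]:
    """Rebuild the approximate real-valued weights from the binary form."""
    return [scale * s for s in signs]


weights = [0.5, -0.25, 0.75, -0.5]
scale, signs = binarize(weights)
approx = reconstruct(scale, signs)

print("scale:", scale)
print("signs:", signs)
print("approximation:", approx)
```

The approximation loses precision, but storing one sign per weight plus one scale is far smaller than storing full floating-point values, which is the kind of saving that helps a model fit on a phone.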
-
6:59 - 7:04And I'm really excited because
now we have a pretty powerful solution -
7:04 - 7:06to this low-level computer vision problem,
-
7:06 - 7:10and anyone can take it
and build something with it. -
7:10 - 7:13So now the rest is up to all of you
-
7:13 - 7:16and people around the world
with access to this software, -
7:16 - 7:20and I can't wait to see what people
will build with this technology. -
7:20 - 7:21Thank you.
-
7:21 - 7:25(Applause)
- Title:
- How a computer learns to recognize objects instantly
- Speaker:
- Joseph Redmon
- Description:
-
Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source method of object detection that can identify objects in images and video -- from zebras to stop signs -- with lightning-quick speed. In a remarkable live demo, Redmon shows off this important step forward for applications like self-driving cars, robotics and even cancer detection.
- Video Language:
- English
- Team:
- closed TED
- Project:
- TEDTalks
- Duration:
- 07:37