Big data is better data
-
0:01 - 0:05America's favorite pie is?
-
0:05 - 0:08Audience: Apple.
Kenneth Cukier: Apple. Of course it is.
-
0:08 - 0:09How do we know it?
-
0:09 - 0:12Because of data.
-
0:12 - 0:14You look at supermarket sales.
-
0:14 - 0:17You look at supermarket sales of 30-centimeter pies
-
0:17 - 0:21that are frozen, and apple wins, no contest.
-
0:21 - 0:26The majority of the sales are apple.
-
0:26 - 0:29But then supermarkets started selling
-
0:29 - 0:32smaller, 11-centimeter pies,
-
0:32 - 0:36and suddenly, apple fell to fourth or fifth place.
-
0:36 - 0:39Why? What happened?
-
0:39 - 0:42Okay, think about it.
-
0:42 - 0:46When you buy a 30-centimeter pie,
-
0:46 - 0:48the whole family has to agree,
-
0:48 - 0:52and apple is everyone's second favorite.
-
0:52 - 0:54(Laughter)
-
0:54 - 0:57But when you buy an individual 11-centimeter pie,
-
0:57 - 1:01you can buy the one that you want.
-
1:01 - 1:05You can get your first choice.
-
1:05 - 1:07You have more data.
-
1:07 - 1:08You can see something
-
1:08 - 1:09that you couldn't see
-
1:09 - 1:13when you only had smaller amounts of it.
-
1:13 - 1:16Now, the point here is that more data
-
1:16 - 1:18doesn't just let us see more,
-
1:18 - 1:20more of the same thing we were looking at.
-
1:20 - 1:23More data allows us to see new.
-
1:23 - 1:27It allows us to see better.
-
1:27 - 1:30It allows us to see different.
-
1:30 - 1:33In this case, it allows us to see
-
1:33 - 1:36what America's favorite pie is:
-
1:36 - 1:39not apple.
-
1:39 - 1:42Now, you probably all have heard the term big data.
-
1:42 - 1:44In fact, you're probably sick of hearing the term
-
1:44 - 1:46big data.
-
1:46 - 1:49It is true that there is a lot of hype around the term,
-
1:49 - 1:52and that is very unfortunate,
-
1:52 - 1:55because big data is an extremely important tool
-
1:55 - 1:59by which society is going to advance.
-
1:59 - 2:02In the past, we used to look at small data
-
2:02 - 2:04and think about what it would mean
-
2:04 - 2:05to try to understand the world,
-
2:05 - 2:07and now we have a lot more of it,
-
2:07 - 2:10more than we ever could before.
-
2:10 - 2:12What we find is that when we have
-
2:12 - 2:15a large body of data, we can fundamentally do things
-
2:15 - 2:18that we couldn't do when we only had smaller amounts.
-
2:18 - 2:21Big data is important, and big data is new,
-
2:21 - 2:22and when you think about it,
-
2:22 - 2:25the only way this planet is going to deal
-
2:25 - 2:26with its global challenges —
-
2:26 - 2:30to feed people, supply them with medical care,
-
2:30 - 2:33supply them with energy, electricity,
-
2:33 - 2:34and to make sure they're not burnt to a crisp
-
2:34 - 2:36because of global warming —
-
2:36 - 2:40is because of the effective use of data.
-
2:40 - 2:44So what is new about big data? What is the big deal?
-
2:44 - 2:46Well, to answer that question, let's think about
-
2:46 - 2:48what information looked like,
-
2:48 - 2:51physically looked like in the past.
-
2:51 - 2:55In 1908, on the island of Crete,
-
2:55 - 3:00archaeologists discovered a clay disc.
-
3:00 - 3:04They dated it to 2000 B.C., so it's 4,000 years old.
-
3:04 - 3:06Now, there are inscriptions on this disc,
-
3:06 - 3:07but we actually don't know what they mean.
-
3:07 - 3:09It's a complete mystery, but the point is that
-
3:09 - 3:11this is what information used to look like
-
3:11 - 3:134,000 years ago.
-
3:13 - 3:16This is how society stored
-
3:16 - 3:19and transmitted information.
-
3:19 - 3:23Now, society hasn't advanced all that much.
-
3:23 - 3:27We still store information on discs,
-
3:27 - 3:30but now we can store a lot more information,
-
3:30 - 3:31more than ever before.
-
3:31 - 3:34Searching it is easier. Copying it is easier.
-
3:34 - 3:38Sharing it is easier. Processing it is easier.
-
3:38 - 3:41And what we can do is we can reuse this information
-
3:41 - 3:42for uses that we never even imagined
-
3:42 - 3:46when we first collected the data.
-
3:46 - 3:48In this respect, the data has gone
-
3:48 - 3:51from a stock to a flow,
-
3:51 - 3:55from something that is stationary and static
-
3:55 - 3:59to something that is fluid and dynamic.
-
3:59 - 4:03There is, if you will, a liquidity to information.
-
4:03 - 4:06The disc that was discovered on Crete
-
4:06 - 4:10that's 4,000 years old, is heavy,
-
4:10 - 4:12it doesn't store a lot of information,
-
4:12 - 4:15and that information is unchangeable.
-
4:15 - 4:19By contrast, all of the files
-
4:19 - 4:21that Edward Snowden took
-
4:21 - 4:24from the National Security Agency in the United States
-
4:24 - 4:26fit on a memory stick
-
4:26 - 4:29the size of a fingernail,
-
4:29 - 4:34and it can be shared at the speed of light.
-
4:34 - 4:39More data. More.
-
4:39 - 4:41Now, one reason why we have so much data in the world today
-
4:41 - 4:43is we are collecting things
-
4:43 - 4:46that we've always collected information on,
-
4:46 - 4:49but another reason why is we're taking things
-
4:49 - 4:51that have always been informational
-
4:51 - 4:54but have never been rendered into a data format
-
4:54 - 4:56and we are putting it into data.
-
4:56 - 5:00Think, for example, of the question of location.
-
5:00 - 5:02Take, for example, Martin Luther.
-
5:02 - 5:03If we wanted to know in the 1500s
-
5:03 - 5:06where Martin Luther was,
-
5:06 - 5:08we would have to follow him at all times,
-
5:08 - 5:10maybe with a feathery quill and an inkwell,
-
5:10 - 5:12and record it,
-
5:12 - 5:14but now think about what it looks like today.
-
5:14 - 5:16You know that somewhere,
-
5:16 - 5:19probably in a telecommunications carrier's database,
-
5:19 - 5:22there is a spreadsheet or at least a database entry
-
5:22 - 5:24that records your information
-
5:24 - 5:26of where you've been at all times.
-
5:26 - 5:27If you have a cell phone,
-
5:27 - 5:30and that cell phone has GPS, but even if it doesn't have GPS,
-
5:30 - 5:33it can record your information.
-
5:33 - 5:37In this respect, location has been datafied.
-
5:37 - 5:41Now think, for example, of the issue of posture,
-
5:41 - 5:42the way that you are all sitting right now,
-
5:42 - 5:45the way that you sit,
-
5:45 - 5:47the way that you sit, the way that you sit.
-
5:47 - 5:49It's all different, and it's a function of your leg length
-
5:49 - 5:51and your back and the contours of your back,
-
5:51 - 5:54and if I were to put sensors, maybe 100 sensors
-
5:54 - 5:56into all of your chairs right now,
-
5:56 - 5:59I could create an index that's fairly unique to you,
-
5:59 - 6:04sort of like a fingerprint, but it's not your finger.
-
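The chair-sensor "fingerprint" the speaker describes can be sketched as a toy program. Everything here is an assumption for illustration: the sensor count, the normalization, the Euclidean-distance comparison, and the matching tolerance are all hypothetical choices, not anything from the Tokyo research the talk mentions.

```python
import math

def posture_signature(readings):
    """Turn raw seat-sensor pressure readings (hypothetically ~100 of them)
    into a scale-free posture 'fingerprint' by normalizing to sum to 1."""
    total = sum(readings)
    return [r / total for r in readings]

def same_sitter(sig_a, sig_b, tolerance=0.05):
    """Treat two signatures as the same person when the normalized pressure
    patterns are nearly identical (small Euclidean distance)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))
    return dist < tolerance
```

Normalizing first means the signature depends on the *shape* of the pressure pattern, not on how heavily someone happens to sit down that day.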
6:04 - 6:07So what could we do with this?
-
6:07 - 6:09Researchers in Tokyo are using it
-
6:09 - 6:14as a potential anti-theft device in cars.
-
6:14 - 6:16The idea is that the carjacker sits behind the wheel,
-
6:16 - 6:19tries to speed off, but the car recognizes
-
6:19 - 6:21that a non-approved driver is behind the wheel,
-
6:21 - 6:23and maybe the engine just stops, unless you
-
6:23 - 6:26type in a password into the dashboard
-
6:26 - 6:31to say, "Hey, I have authorization to drive." Great.
-
6:31 - 6:33What if every single car in Europe
-
6:33 - 6:35had this technology in it?
-
6:35 - 6:38What could we do then?
-
6:38 - 6:40Maybe, if we aggregated the data,
-
6:40 - 6:44maybe we could identify telltale signs
-
6:44 - 6:47that best predict that a car accident
-
6:47 - 6:53is going to take place in the next five seconds.
-
6:53 - 6:55And then what we will have datafied
-
6:55 - 6:57is driver fatigue,
-
6:57 - 6:59and the service would be when the car senses
-
6:59 - 7:03that the person slumps into that position,
-
7:03 - 7:07automatically knows, hey, set an internal alarm
-
7:07 - 7:09that would vibrate the steering wheel, honk inside
-
7:09 - 7:11to say, "Hey, wake up,
-
7:11 - 7:12pay more attention to the road."
-
7:12 - 7:14These are the sorts of things we can do
-
7:14 - 7:17when we datafy more aspects of our lives.
-
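The datafied driver-fatigue service can be sketched the same way. This is a minimal hypothetical: the "slump score" per second, the alert threshold, and the averaging window are all invented for illustration, standing in for whatever the seat sensors would actually report.

```python
def fatigue_alert(slump_scores, threshold=0.8, window=3):
    """Fire the in-car alarm when the driver's recent posture settles into
    the slump pattern that precedes dozing off.

    `slump_scores` is a hypothetical stream of values in [0, 1], one per
    second, where 1.0 means the seat sensors see a full slump.
    """
    recent = slump_scores[-window:]
    if len(recent) == window and sum(recent) / window >= threshold:
        return "vibrate steering wheel"  # the wake-up cue from the talk
    return None
```

Averaging over a short window rather than reacting to a single reading keeps one bump in the road from honking at an alert driver.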
7:17 - 7:21So what is the value of big data?
-
7:21 - 7:23Well, think about it.
-
7:23 - 7:25You have more information.
-
7:25 - 7:29You can do things that you couldn't do before.
-
7:29 - 7:30One of the most impressive areas
-
7:30 - 7:32where this concept is taking place
-
7:32 - 7:35is in the area of machine learning.
-
7:35 - 7:39Machine learning is a branch of artificial intelligence,
-
7:39 - 7:42which itself is a branch of computer science.
-
7:42 - 7:43The general idea is that instead of
-
7:43 - 7:46instructing a computer what to do,
-
7:46 - 7:48we are going to simply throw data at the problem
-
7:48 - 7:51and tell the computer to figure it out for itself.
-
7:51 - 7:53And it will help you understand it
-
7:53 - 7:57by seeing its origins.
-
7:57 - 7:59In the 1950s, a computer scientist
-
7:59 - 8:03at IBM named Arthur Samuel liked to play checkers,
-
8:03 - 8:04so he wrote a computer program
-
8:04 - 8:07so he could play against the computer.
-
8:07 - 8:10He played. He won.
-
8:10 - 8:12He played. He won.
-
8:12 - 8:15He played. He won,
-
8:15 - 8:17because the computer only knew
-
8:17 - 8:19what a legal move was.
-
8:19 - 8:21Arthur Samuel knew something else.
-
8:21 - 8:26Arthur Samuel knew strategy.
-
8:26 - 8:28So he wrote a small sub-program alongside it
-
8:28 - 8:30operating in the background, and all it did
-
8:30 - 8:32was score the probability
-
8:32 - 8:34that a given board configuration would likely lead
-
8:34 - 8:37to a winning board versus a losing board
-
8:37 - 8:40after every move.
-
8:40 - 8:43He plays the computer. He wins.
-
8:43 - 8:45He plays the computer. He wins.
-
8:45 - 8:49He plays the computer. He wins.
-
8:49 - 8:51And then Arthur Samuel leaves the computer
-
8:51 - 8:54to play itself.
-
8:54 - 8:57It plays itself. It collects more data.
-
8:57 - 9:01It collects more data. It increases the accuracy of its prediction.
-
9:01 - 9:03And then Arthur Samuel goes back to the computer
-
9:03 - 9:06and he plays it, and he loses,
-
9:06 - 9:08and he plays it, and he loses,
-
9:08 - 9:10and he plays it, and he loses,
-
9:10 - 9:13and Arthur Samuel has created a machine
-
9:13 - 9:19that surpasses his ability in a task that he taught it.
-
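Samuel's sub-program can be sketched in miniature. This is a hypothetical toy, not his actual checkers code: it simply counts, for each board configuration, how often games containing it were won, and picks the move whose resulting board has the best estimated odds. The accuracy of those estimates improves as self-play feeds in more games, which is the mechanism the talk describes.

```python
from collections import defaultdict

class BoardScorer:
    """Samuel-style scorer: estimate the probability that a board
    configuration leads to a win, from outcomes of games it appeared in."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])  # board -> [wins, games]

    def record_game(self, boards, won):
        """After a game ends, credit every configuration that occurred in it."""
        for board in boards:
            self.stats[board][0] += 1 if won else 0
            self.stats[board][1] += 1

    def score(self, board):
        wins, games = self.stats[board]
        return wins / games if games else 0.5  # unseen positions are 50/50

    def best_move(self, candidate_boards):
        """Pick the legal move whose resulting board scores highest."""
        return max(candidate_boards, key=self.score)
```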
9:19 - 9:21And this idea of machine learning
-
9:21 - 9:25is going everywhere.
-
9:25 - 9:28How do you think we have self-driving cars?
-
9:28 - 9:31Are we any better off as a society
-
9:31 - 9:34enshrining all the rules of the road into software?
-
9:34 - 9:36No. Memory is cheaper? No.
-
9:36 - 9:40Algorithms are faster? No. Processors are better? No.
-
9:40 - 9:43All of those things matter, but that's not why.
-
9:43 - 9:46It's because we changed the nature of the problem.
-
9:46 - 9:48We changed the nature of the problem from one
-
9:48 - 9:50in which we tried to overtly and explicitly
-
9:50 - 9:53explain to the computer how to drive
-
9:53 - 9:54to one in which we say,
-
9:54 - 9:56"Here's a lot of data around the vehicle.
-
9:56 - 9:57You figure it out.
-
9:57 - 9:59You figure it out that that is a traffic light,
-
9:59 - 10:01that that traffic light is red and not green,
-
10:01 - 10:03that that means that you need to stop
-
10:03 - 10:06and not go forward."
-
10:06 - 10:08Machine learning is at the basis
-
10:08 - 10:10of many of the things that we do online:
-
10:10 - 10:12search engines,
-
10:12 - 10:16Amazon's personalization algorithm,
-
10:16 - 10:18computer translation,
-
10:18 - 10:22voice recognition systems.
-
10:22 - 10:25Researchers recently have looked at
-
10:25 - 10:28the question of biopsies,
-
10:28 - 10:31cancerous biopsies,
-
10:31 - 10:33and they've asked the computer to identify
-
10:33 - 10:36by looking at the data and survival rates
-
10:36 - 10:40to determine whether cells are actually
-
10:40 - 10:43cancerous or not,
-
10:43 - 10:45and sure enough, when you throw the data at it,
-
10:45 - 10:47through a machine-learning algorithm,
-
10:47 - 10:49the machine was able to identify
-
10:49 - 10:51the 12 telltale signs that best predict
-
10:51 - 10:54that this biopsy of the breast cancer cells
-
10:54 - 10:57is indeed cancerous.
-
10:57 - 11:00The problem: The medical literature
-
11:00 - 11:03only knew nine of them.
-
11:03 - 11:04Three of the traits were ones
-
11:04 - 11:07that people didn't need to look for,
-
11:07 - 11:13but that the machine spotted.
-
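The kind of feature discovery described here can be sketched with a deliberately crude stand-in. This is not the researchers' method: real systems use proper statistical learning, whereas this toy just ranks features by how far apart their average values sit in cancerous versus benign samples. The data and the separation score are hypothetical; the point is only that the ranking falls out of the data rather than out of the medical literature.

```python
def rank_telltale_features(samples, labels):
    """Rank feature indices by how differently each feature behaves in
    cancerous (label 1) versus benign (label 0) samples -- a crude
    stand-in for the feature discovery a learning algorithm performs."""
    n_features = len(samples[0])
    gaps = []
    for j in range(n_features):
        cancerous = [s[j] for s, y in zip(samples, labels) if y == 1]
        benign = [s[j] for s, y in zip(samples, labels) if y == 0]
        # How far apart are the two groups on this feature?
        gap = abs(sum(cancerous) / len(cancerous) - sum(benign) / len(benign))
        gaps.append((gap, j))
    return [j for gap, j in sorted(gaps, reverse=True)]
```

A trait the literature never flagged would simply surface near the top of this ranking if the data separates on it.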
11:13 - 11:19Now, there are dark sides to big data as well.
-
11:19 - 11:21It will improve our lives, but there are problems
-
11:21 - 11:24that we need to be conscious of,
-
11:24 - 11:26and the first one is the idea
-
11:26 - 11:29that we may be punished for predictions,
-
11:29 - 11:33that the police may use big data for their purposes,
-
11:33 - 11:35a little bit like "Minority Report."
-
11:35 - 11:38Now, it's a term called predictive policing,
-
11:38 - 11:40or algorithmic criminology,
-
11:40 - 11:42and the idea is that if we take a lot of data,
-
11:42 - 11:44for example where past crimes have been,
-
11:44 - 11:47we know where to send the patrols.
-
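The location half of predictive policing reduces to a very simple aggregation, which can be sketched as follows. The grid-cell size and the coordinates are hypothetical; this is only the "count past crimes, patrol the hotspots" step the talk describes, before any of the individual-level data it goes on to warn about.

```python
from collections import Counter

def patrol_targets(crime_locations, cell_size=1.0, n_patrols=2):
    """Aggregate past crime coordinates into grid cells and return the
    cells with the most incidents -- the hotspots where patrols go."""
    cells = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in crime_locations
    )
    return [cell for cell, _count in cells.most_common(n_patrols)]
```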
11:47 - 11:49That makes sense, but the problem, of course,
-
11:49 - 11:53is that it's not simply going to stop on location data,
-
11:53 - 11:56it's going to go down to the level of the individual.
-
11:56 - 11:59Why don't we use data about the person's
-
11:59 - 12:01high school transcript?
-
12:01 - 12:02Maybe we should use the fact that
-
12:02 - 12:04they're unemployed or not, their credit score,
-
12:04 - 12:06their web-surfing behavior,
-
12:06 - 12:08whether they're up late at night.
-
12:08 - 12:11Their Fitbit, when it's able to identify biochemistries,
-
12:11 - 12:15will show that they have aggressive thoughts.
-
12:15 - 12:17We may have algorithms that are likely to predict
-
12:17 - 12:19what we are about to do,
-
12:19 - 12:20and we may be held accountable
-
12:20 - 12:23before we've actually acted.
-
12:23 - 12:25Privacy was the central challenge
-
12:25 - 12:28in a small data era.
-
12:28 - 12:30In the big data age,
-
12:30 - 12:34the challenge will be safeguarding free will,
-
12:34 - 12:38moral choice, human volition,
-
12:38 - 12:41human agency.
-
12:43 - 12:45There is another problem:
-
12:45 - 12:48Big data is going to steal our jobs.
-
12:48 - 12:52Big data and algorithms are going to challenge
-
12:52 - 12:55white collar, professional knowledge work
-
12:55 - 12:57in the 21st century
-
12:57 - 12:59in the same way that factory automation
-
12:59 - 13:01and the assembly line
-
13:01 - 13:04challenged blue collar labor in the 20th century.
-
13:04 - 13:06Think about a lab technician
-
13:06 - 13:08who is looking through a microscope
-
13:08 - 13:09at a cancer biopsy
-
13:09 - 13:12and determining whether it's cancerous or not.
-
13:12 - 13:14The person went to university.
-
13:14 - 13:15The person buys property.
-
13:15 - 13:17He or she votes.
-
13:17 - 13:21He or she is a stakeholder in society.
-
13:21 - 13:22And that person,
-
13:22 - 13:24as well as an entire fleet
-
13:24 - 13:26of professionals like that person,
-
13:26 - 13:29is going to find that their jobs are radically changed
-
13:29 - 13:31or actually completely eliminated.
-
13:31 - 13:33Now, we like to think
-
13:33 - 13:36that technology creates jobs over a period of time
-
13:36 - 13:39after a short, temporary period of dislocation,
-
13:39 - 13:41and that is true for the frame of reference
-
13:41 - 13:43with which we all live, the Industrial Revolution,
-
13:43 - 13:46because that's precisely what happened.
-
13:46 - 13:48But we forget something in that analysis:
-
13:48 - 13:50There are some categories of jobs
-
13:50 - 13:53that simply get eliminated and never come back.
-
13:53 - 13:55The Industrial Revolution wasn't very good
-
13:55 - 13:59if you were a horse.
-
13:59 - 14:01So we're going to need to be careful
-
14:01 - 14:05and take big data and adjust it for our needs,
-
14:05 - 14:08our very human needs.
-
14:08 - 14:10We have to be the master of this technology,
-
14:10 - 14:12not its servant.
-
14:12 - 14:15We are just at the outset of the big data era,
-
14:15 - 14:18and honestly, we are not very good
-
14:18 - 14:22at handling all the data that we can now collect.
-
14:22 - 14:25It's not just a problem for the National Security Agency.
-
14:25 - 14:28Businesses collect lots of data, and they misuse it too,
-
14:28 - 14:32and we need to get better at this, and this will take time.
-
14:32 - 14:34It's a little bit like the challenge that was faced
-
14:34 - 14:36by primitive man and fire.
-
14:36 - 14:38This is a tool, but this is a tool that,
-
14:38 - 14:42unless we're careful, will burn us.
-
14:44 - 14:47Big data is going to transform how we live,
-
14:47 - 14:50how we work and how we think.
-
14:50 - 14:52It is going to help us manage our careers
-
14:52 - 14:55and lead lives of satisfaction and hope
-
14:55 - 14:58and happiness and health,
-
14:58 - 15:02but in the past, we've often looked at information technology
-
15:02 - 15:04and our eyes have only seen the T,
-
15:04 - 15:06the technology, the hardware,
-
15:06 - 15:08because that's what was physical.
-
15:08 - 15:11We now need to recast our gaze at the I,
-
15:11 - 15:12the information,
-
15:12 - 15:14which is less apparent,
-
15:14 - 15:18but in some ways a lot more important.
-
15:18 - 15:21Humanity can finally learn from the information
-
15:21 - 15:24that it can collect,
-
15:24 - 15:26as part of our timeless quest
-
15:26 - 15:29to understand the world and our place in it,
-
15:29 - 15:34and that's why big data is a big deal.
-
15:34 - 15:38(Applause)
- Title:
- Big data is better data
- Speaker:
- Kenneth Cukier
- Description:
-
Self-driving cars were just the start. What's the future of big data-driven technology and design? In a thrilling science talk, Kenneth Cukier looks at what's next for machine learning — and human knowledge.
- Video Language:
- English
- Team:
- closed TED
- Project:
- TEDTalks
- Duration:
- 15:51