0:00:00.738,0:00:02.735 If you remember that first decade of the web, 0:00:02.735,0:00:04.990 it was really a static place. 0:00:04.990,0:00:07.235 You could go online, you could look at pages, 0:00:07.235,0:00:09.748 and they were put up either by organizations 0:00:09.748,0:00:11.269 who had teams to do it 0:00:11.269,0:00:13.498 or by individuals who were really tech-savvy 0:00:13.498,0:00:15.235 for the time. 0:00:15.235,0:00:16.810 And with the rise of social media 0:00:16.810,0:00:19.209 and social networks in the early 2000s, 0:00:19.209,0:00:21.358 the web was completely changed 0:00:21.358,0:00:24.966 to a place where now the vast majority of content 0:00:24.966,0:00:28.278 we interact with is put up by average users, 0:00:28.278,0:00:30.975 either in YouTube videos or blog posts 0:00:30.975,0:00:34.290 or product reviews or social media postings. 0:00:34.290,0:00:36.637 And it's also become a much more interactive place, 0:00:36.637,0:00:39.274 where people are interacting with others, 0:00:39.274,0:00:40.970 they're commenting, they're sharing, 0:00:40.970,0:00:42.584 they're not just reading. 0:00:42.584,0:00:44.450 So Facebook is not the only place you can do this, 0:00:44.450,0:00:45.548 but it's the biggest, 0:00:45.548,0:00:47.332 and it serves to illustrate the numbers. 0:00:47.332,0:00:50.809 Facebook has 1.2 billion users per month. 0:00:50.809,0:00:52.739 So half the Earth's Internet population 0:00:52.739,0:00:54.392 is using Facebook. 0:00:54.392,0:00:56.324 They are a site, along with others, 0:00:56.324,0:00:59.543 that has allowed people to create an online persona 0:00:59.543,0:01:01.325 with very little technical skill, 0:01:01.325,0:01:03.801 and people responded by putting huge amounts 0:01:03.801,0:01:05.784 of personal data online. 
0:01:05.784,0:01:08.327 So the result is that we have behavioral, 0:01:08.327,0:01:10.313 preference, demographic data 0:01:10.313,0:01:12.414 for hundreds of millions of people, 0:01:12.414,0:01:14.440 which is unprecedented in history. 0:01:14.440,0:01:17.000 And as a computer scientist, what this means is that 0:01:17.000,0:01:18.664 I've been able to build models 0:01:18.664,0:01:20.986 that can predict all sorts of hidden attributes 0:01:20.986,0:01:23.270 for all of you that you don't even know 0:01:23.270,0:01:25.472 you're sharing information about. 0:01:25.472,0:01:27.854 As scientists, we use that to help 0:01:27.854,0:01:29.968 the way people interact online, 0:01:29.968,0:01:32.467 but there's less altruistic applications, 0:01:32.467,0:01:34.848 and there's a problem in that users don't really 0:01:34.848,0:01:37.318 understand these techniques and how they work, 0:01:37.318,0:01:40.446 and even if they did, they don't have a lot of control over it. 0:01:40.446,0:01:41.936 So what I want to talk to you about today 0:01:41.936,0:01:44.638 is some of these things that we're able to do, 0:01:44.638,0:01:47.401 and then give us some ideas of how we might go forward 0:01:47.401,0:01:50.170 to move some control back into the hands of users. 0:01:50.170,0:01:51.756 So this is Target, the company. 0:01:51.756,0:01:53.080 I didn't just put that logo 0:01:53.080,0:01:55.250 on this poor, pregnant woman's belly. 0:01:55.250,0:01:57.090 You may have seen this anecdote that was printed 0:01:57.090,0:01:59.151 in Forbes magazine where Target 0:01:59.151,0:02:01.512 sent a flyer to this 15-year-old girl 0:02:01.512,0:02:03.222 with advertisements and coupons 0:02:03.222,0:02:05.776 for baby bottles and diapers and cribs 0:02:05.776,0:02:07.460 two weeks before she told her parents 0:02:07.460,0:02:09.324 that she was pregnant. 0:02:09.324,0:02:12.028 Yeah, the dad was really upset.
0:02:12.028,0:02:13.744 He said, "How did Target figure out 0:02:13.744,0:02:15.568 that this high school girl was pregnant 0:02:15.568,0:02:17.528 before she told her parents?" 0:02:17.528,0:02:20.149 It turns out that they have the purchase history 0:02:20.149,0:02:22.450 for hundreds of thousands of customers 0:02:22.450,0:02:25.180 and they compute what they call a pregnancy score, 0:02:25.180,0:02:27.512 which is not just whether or not a woman's pregnant, 0:02:27.512,0:02:29.242 but what her due date is. 0:02:29.242,0:02:30.546 And they compute that 0:02:30.546,0:02:32.314 not by looking at the obvious things, 0:02:32.314,0:02:34.826 like, she's buying a crib or baby clothes, 0:02:34.826,0:02:37.769 but things like, she bought more vitamins 0:02:37.769,0:02:39.486 than she normally had, 0:02:39.486,0:02:40.950 or she bought a handbag 0:02:40.950,0:02:42.661 that's big enough to hold diapers. 0:02:42.661,0:02:44.571 And by themselves, those purchases don't seem 0:02:44.571,0:02:47.040 like they might reveal a lot, 0:02:47.040,0:02:49.018 but it's a pattern of behavior that, 0:02:49.018,0:02:52.135 when you take it in the context of thousands of other people, 0:02:52.135,0:02:54.892 starts to actually reveal some insights. 0:02:54.892,0:02:56.685 So that's the kind of thing that we do 0:02:56.685,0:02:59.252 when we're predicting stuff about you on social media. 0:02:59.252,0:03:02.048 We're looking for little patterns of behavior that, 0:03:02.048,0:03:04.730 when you detect them among millions of people, 0:03:04.730,0:03:07.436 lets us find out all kinds of things.
0:03:07.436,0:03:09.183 So in my lab and with colleagues, 0:03:09.183,0:03:10.960 we've developed mechanisms where we can 0:03:10.960,0:03:12.520 quite accurately predict things 0:03:12.520,0:03:14.245 like your political preference, 0:03:14.245,0:03:17.997 your personality score, gender, sexual orientation, 0:03:17.997,0:03:20.870 religion, age, intelligence, 0:03:20.870,0:03:22.264 along with things like 0:03:22.264,0:03:24.201 how much you trust the people you know 0:03:24.201,0:03:26.005 and how strong those relationships are. 0:03:26.005,0:03:27.790 We can do all of this really well. 0:03:27.790,0:03:29.987 And again, it doesn't come from what you might 0:03:29.987,0:03:32.089 think of as obvious information. 0:03:32.089,0:03:34.370 So my favorite example is from this study 0:03:34.370,0:03:35.610 that was published this year 0:03:35.610,0:03:37.405 in the Proceedings of the National Academies. 0:03:37.405,0:03:38.690 If you Google this, you'll find it. 0:03:38.690,0:03:40.562 It's four pages, easy to read. 0:03:40.562,0:03:43.565 And they looked at just people's Facebook likes, 0:03:43.565,0:03:45.485 so just the things you like on Facebook, 0:03:45.485,0:03:47.623 and used that to predict all these attributes, 0:03:47.623,0:03:49.268 along with some other ones. 0:03:49.268,0:03:52.229 And in their paper they listed the five likes 0:03:52.229,0:03:55.016 that were most indicative of high intelligence. 0:03:55.016,0:03:57.340 And among those was liking a page 0:03:57.340,0:03:59.245 for curly fries. (Laughter) 0:03:59.245,0:04:01.338 Curly fries are delicious, 0:04:01.338,0:04:03.868 but liking them does not necessarily mean 0:04:03.868,0:04:05.948 that you're smarter than the average person. 
0:04:05.948,0:04:09.155 So how is it that one of the strongest indicators 0:04:09.155,0:04:10.725 of your intelligence 0:04:10.725,0:04:12.172 is liking this page 0:04:12.172,0:04:14.424 when the content is totally irrelevant 0:04:14.424,0:04:16.951 to the attribute that's being predicted? 0:04:16.951,0:04:18.535 And it turns out that we have to look at 0:04:18.535,0:04:20.153 a whole bunch of underlying theories 0:04:20.153,0:04:22.722 to see why we're able to do this. 0:04:22.722,0:04:25.635 One of them is a sociological theory called homophily, 0:04:25.635,0:04:28.727 which basically says people are friends with people like them. 0:04:28.727,0:04:30.741 So if you're smart, you tend to be friends with smart people, 0:04:30.741,0:04:33.371 and if you're young, you tend to be friends with young people, 0:04:33.371,0:04:34.998 and this is well established 0:04:34.998,0:04:36.743 for hundreds of years. 0:04:36.743,0:04:37.975 We also know a lot 0:04:37.975,0:04:40.525 about how information spreads through networks. 0:04:40.525,0:04:42.279 It turns out things like viral videos 0:04:42.279,0:04:44.685 or Facebook likes or other information 0:04:44.685,0:04:46.573 spreads in exactly the same way 0:04:46.573,0:04:49.027 that diseases spread through social networks. 0:04:49.027,0:04:50.818 So this is something we've studied for a long time. 0:04:50.818,0:04:52.394 We have good models of it. 0:04:52.394,0:04:54.551 And so you can put those things together 0:04:54.551,0:04:57.639 and start seeing why things like this happen. 0:04:57.639,0:04:59.453 So if I were to give you a hypothesis, 0:04:59.453,0:05:02.680 it would be that a smart guy started this page, 0:05:02.680,0:05:04.619 or maybe one of the first people who liked it 0:05:04.619,0:05:06.355 would have scored high on that test.
0:05:06.355,0:05:08.643 And they liked it, and their friends saw it, 0:05:08.643,0:05:11.765 and by homophily, we know that he probably had smart friends, 0:05:11.765,0:05:14.821 and so it spread to them, and some of them liked it, 0:05:14.821,0:05:16.010 and they had smart friends, 0:05:16.010,0:05:16.817 and so it spread to them, 0:05:16.817,0:05:18.790 and so it propagated through the network 0:05:18.790,0:05:21.359 to a host of smart people, 0:05:21.359,0:05:23.415 so that by the end, the action 0:05:23.415,0:05:25.959 of liking the curly fries page 0:05:25.959,0:05:27.574 is indicative of high intelligence, 0:05:27.574,0:05:29.377 not because of the content, 0:05:29.377,0:05:31.899 but because the actual action of liking 0:05:31.899,0:05:33.799 reflects back the common attributes 0:05:33.799,0:05:36.267 of other people who have done it. 0:05:36.267,0:05:39.164 So this is pretty complicated stuff, right? 0:05:39.164,0:05:41.363 It's a hard thing to sit down and explain 0:05:41.363,0:05:44.211 to an average user, and even if you do, 0:05:44.211,0:05:46.399 what can the average user do about it? 0:05:46.399,0:05:48.447 How do you know that you've liked something 0:05:48.447,0:05:49.939 that indicates a trait for you 0:05:49.939,0:05:53.484 that's totally irrelevant to the content of what you've liked? 0:05:53.484,0:05:56.030 There's a lot of power that users don't have 0:05:56.030,0:05:58.260 to control how this data is used. 0:05:58.260,0:06:01.372 And I see that as a real problem going forward. 0:06:01.372,0:06:03.349 So I think there's a couple paths 0:06:03.349,0:06:04.350 that we want to look at 0:06:04.350,0:06:06.260 if we want to give users some control 0:06:06.260,0:06:08.000 over how this data is used, 0:06:08.000,0:06:09.940 because it's not always going to be used 0:06:09.940,0:06:11.321 for their benefit.
0:06:11.321,0:06:12.743 An example I often give is that, 0:06:12.743,0:06:14.389 if I ever get bored being a professor, 0:06:14.389,0:06:16.042 I'm going to go start a company 0:06:16.042,0:06:17.496 that predicts all of these attributes 0:06:17.496,0:06:19.098 and things like how well you work in teams 0:06:19.098,0:06:21.769 and if you're a drug user, if you're an alcoholic. 0:06:21.769,0:06:23.209 We know how to predict all that. 0:06:23.209,0:06:24.970 And I'm going to sell reports 0:06:24.970,0:06:27.070 to H.R. companies and big businesses 0:06:27.070,0:06:29.343 that want to hire you. 0:06:29.343,0:06:30.520 We totally can do that now. 0:06:30.520,0:06:32.308 I could start that business tomorrow, 0:06:32.308,0:06:34.360 and you would have absolutely no control 0:06:34.360,0:06:36.498 over me using your data like that. 0:06:36.498,0:06:38.790 That seems to me to be a problem. 0:06:38.790,0:06:40.700 So one of the paths we can go down 0:06:40.700,0:06:42.732 is the policy and law path. 0:06:42.732,0:06:45.778 And in some respects, I think that that would be most effective, 0:06:45.778,0:06:48.534 but the problem is we'd actually have to do it. 0:06:48.534,0:06:51.314 Observing our political process in action 0:06:51.314,0:06:53.693 makes me think it's highly unlikely 0:06:53.693,0:06:55.290 that we're going to get a bunch of representatives 0:06:55.290,0:06:57.276 to sit down, learn about this, 0:06:57.276,0:06:59.382 and then enact sweeping changes 0:06:59.382,0:07:01.539 to intellectual property law in the U.S. 0:07:01.539,0:07:04.000 so users control their data. 0:07:04.000,0:07:05.304 We could go the policy route, 0:07:05.304,0:07:06.783 where social media companies say, 0:07:06.783,0:07:08.185 you know what? You own your data. 0:07:08.185,0:07:10.674 You have total control over how it's used.
0:07:10.674,0:07:12.522 The problem is that the revenue models 0:07:12.522,0:07:14.246 for most social media companies 0:07:14.246,0:07:18.277 rely on sharing or exploiting users' data in some way. 0:07:18.277,0:07:20.110 It's sometimes said of Facebook that the users 0:07:20.110,0:07:22.638 aren't the customer, they're the product. 0:07:22.638,0:07:25.352 And so how do you get a company 0:07:25.352,0:07:27.910 to cede control of their main asset 0:07:27.910,0:07:29.159 back to the users? 0:07:29.159,0:07:30.860 It's possible, but I don't think it's something 0:07:30.860,0:07:33.180 that we're going to see change quickly. 0:07:33.180,0:07:34.680 So I think the other path 0:07:34.680,0:07:36.968 that we can go down that's going to be more effective 0:07:36.968,0:07:38.476 is one of more science. 0:07:38.476,0:07:40.986 It's doing science that allowed us to develop 0:07:40.986,0:07:42.736 all these mechanisms for computing 0:07:42.736,0:07:44.788 this personal data in the first place. 0:07:44.788,0:07:46.894 And it's actually very similar research 0:07:46.894,0:07:48.332 that we'd have to do 0:07:48.332,0:07:50.718 if we want to develop mechanisms 0:07:50.718,0:07:52.139 that can say to a user, 0:07:52.139,0:07:54.368 "Here's the risk of that action you just took." 0:07:54.368,0:07:56.448 By liking that Facebook page, 0:07:56.448,0:07:58.983 or by sharing this piece of personal information, 0:07:58.983,0:08:00.485 you've now improved my ability 0:08:00.485,0:08:02.571 to predict whether or not you're using drugs 0:08:02.571,0:08:05.433 or whether or not you get along well in the workplace. 0:08:05.433,0:08:07.281 And that, I think, can affect whether or not 0:08:07.281,0:08:08.791 people want to share something, 0:08:08.791,0:08:12.030 keep it private, or just keep it offline altogether.
0:08:12.030,0:08:13.593 We can also look at things like 0:08:13.593,0:08:16.321 allowing people to encrypt data that they upload, 0:08:16.321,0:08:18.176 so it's kind of invisible and worthless 0:08:18.176,0:08:19.607 to sites like Facebook 0:08:19.607,0:08:22.236 or third party services that access it, 0:08:22.236,0:08:25.483 but that select users who the person who posted it 0:08:25.483,0:08:28.153 want to see it have access to see it. 0:08:28.153,0:08:30.319 This is all super exciting research 0:08:30.319,0:08:31.939 from an intellectual perspective, 0:08:31.939,0:08:33.798 and so scientists are going to be willing to do it. 0:08:33.798,0:08:37.408 So that gives us an advantage over the law side. 0:08:37.408,0:08:39.133 One of the problems that people bring up 0:08:39.133,0:08:40.728 when I talk about this is, they say, 0:08:40.728,0:08:43.374 you know, if people start keeping all this data private, 0:08:43.374,0:08:45.487 all those methods that you've been developing 0:08:45.487,0:08:48.140 to predict their traits are going to fail. 0:08:48.140,0:08:51.660 And I say, absolutely, and for me, that's success, 0:08:51.660,0:08:53.446 because as a scientist, 0:08:53.446,0:08:57.134 my goal is not to infer information about users, 0:08:57.134,0:08:59.901 it's to improve the way people interact online. 0:08:59.901,0:09:03.119 And sometimes that involves inferring things about them, 0:09:03.119,0:09:06.141 but if users don't want me to use that data, 0:09:06.141,0:09:08.179 I think they should have the right to do that. 0:09:08.179,0:09:10.830 I want users to be informed and consenting 0:09:10.830,0:09:12.942 users of the tools that we develop.
0:09:12.942,0:09:15.894 And so I think encouraging this kind of science 0:09:15.894,0:09:17.240 and supporting researchers 0:09:17.240,0:09:20.263 who want to cede some of that control back to users 0:09:20.263,0:09:22.574 and away from the social media companies 0:09:22.574,0:09:25.245 means that going forward, as these tools evolve 0:09:25.245,0:09:26.721 and advance, 0:09:26.721,0:09:28.135 means that we're going to have an educated 0:09:28.135,0:09:29.829 and empowered user base, 0:09:29.829,0:09:30.929 and I think all of us can agree 0:09:30.929,0:09:33.493 that that's a pretty ideal way to go forward. 0:09:33.493,0:09:35.677 Thank you. 0:09:35.677,0:09:38.757 (Applause)