But ever since these companies started amassing our data, the clock has been ticking. In 2008 we showed you how your and your friends' Facebook data could be accessed by a rogue Facebook application without consent. In 2011, a researcher called Michal Kosinski warned that if computers could analyse enough Facebook data from enough people, they could spot connections between the way you act online and your personality traits, the type of person you are.

"What's really world-changing about those algorithms is that they can take your music preferences or your book preferences and extract from this seemingly innocent information very accurate predictions about your religiosity, leadership potential, political views, personality and so on."

"By having hundreds and hundreds of thousands of Americans undertake this survey, we were able to form a model to predict the personality of every single adult in the United States of America."

By 2016, Alexander Nix was explaining how Cambridge Analytica could use this kind of research to find people of different personality types and target them with specific messages that might influence their behaviour.

"If you know the personality of the people you're targeting, you can nuance your messaging to resonate more effectively with those key audience groups, because it's personality that drives behaviour, and behaviour that obviously influences how you vote."

Soon afterwards, these techniques were used by two political campaigns that would rock the world.

Yes, your likes and dislikes, your comments and posts, your personal data: they are valuable. But it's what they say about you as a person; that's where the real power lies.

No one knows exactly how much these techniques actually contributed to the results of the votes. One of the first researchers to ask the question was Paul-Olivier Dehaye. He worked on an article at the end of 2016 that investigated what was happening, and this week he was here in London to give evidence to MPs about the latest revelations. Sitting alongside him in the Commons Select Committee was Cambridge Analytica whistleblower Christopher Wylie, and straight after the session Paul sat down with me.

"This isn't just about Facebook, and this isn't just about Cambridge Analytica, is it? This kind of data collection and analysis has been going on for a long time, and it's being done by lots of people."

"Right, so in two ways it's not just about those companies. Facebook enables a lot more companies than just Cambridge Analytica to suck out data in similar ways, so that's the first thing. And then Facebook is just one player in a big ecosystem of online advertising and online profiling. Some of the companies you have heard of, but some of them you just have no relationship with. Even if you fully understand the terms and conditions that you're agreeing to about what data you're sharing, I don't think anyone really understood what can be inferred from the data."

"So not the list of your friends, not your likes and dislikes, but the things that you've never talked about, that now they can tell from your digital footprint?"

"Yeah, it's really hard to understand the inference power of this data, what can be deduced from it. That's true of how people make decisions, basically whether they think about the issue before making a decision or not. Another way to say this is that they were trying to find gullible people. So if you are able to do that, you can just make them, you know, buy into anything, into any content."

"It's easy to believe that Facebook managed to swing the US election, to swing Brexit. Was it only people on Facebook who saw these ads that were targeted to them and then went out and possibly changed their votes? Is that what we're talking about, or are we talking about Facebook just being used as a research tool that could then be applied to the wider community?"

"In many ways. Whether people were individually convinced to vote differently, I don't personally believe that's how it happens. What I believe is that Facebook itself could be manipulated using those techniques to make some content go viral that would affect public discourse, so that would steer the conversation, basically. And if you're able to do this more or less automated, in a repetitive fashion, then you've partly already won the election, because you're steering the conversation around the election. And that's precisely the point that Hillary Clinton has been making again and again about Cambridge Analytica: their ability to steer the conversation on specific topics, like emails. And that had an impact. The fact that some content was reshared widely during the election had an impact on editorial decisions made by classic media, more established media, which in turn had an impact on, you know, other people's opinions."

Paul says that even though Facebook and Google have recently allowed us to download everything that they have on us, it's not really everything.

"So Facebook can collect data on people who don't have Facebook accounts?"

"Yeah, it's called shadow profiles. That practice, for instance, has been forbidden in Belgium, where I'm from. And even people who do have an account are being tracked all over the web, and that same information is collected about them. Why can't they see it? Why can't they see all the web pages that Facebook knows they visited? Making that transparent would have a very dramatic effect, I think, in making people aware of how much tracking goes on."

"Do you think that UK or EU regulation is strong enough when it comes to protecting our data?"

"That's part of what I wanted to say in the committee. We have very strong regulations around personal data that are going to get stronger, but they're completely useless, and actually worse than not having them, if we're not going to enforce them. They need to be enforced; that's the critical point where currently things are failing."

"Why are they not being enforced?"

"Because the regulators currently see their role as balancing commercial interests with democratic interests around oversight of personal data, and the balancing they've done so far was wrong, simply wrong: too much on the side of commercial interests and not enough on the counterbalances."

Facebook's reputation, and its wealth, have taken a massive hit in the last couple of weeks, with 80 billion dollars being wiped off its value. So can the recently announced new privacy tools help to restore confidence? Is this the end for Facebook?

"Facebook can still adapt their ways. They can still change. They will have to anyway, because of the regulation that's coming into force. It's an opportunity to redeem themselves, if you want to say it that way."