0:00:14.641,0:00:16.809 All right, thanks Chris[br]and thanks for having me here. 0:00:16.809,0:00:19.441 Can everybody hear me OK? 0:00:21.665,0:00:24.825 So, today I'm going to talk 0:00:24.825,0:00:27.094 about a type of data[br]we're all very familiar with 0:00:27.094,0:00:30.861 and I think most of us like, intrinsically. 0:00:30.861,0:00:34.878 And that is geographic data[br]and particularly imagery of the Earth. 0:00:34.878,0:00:38.131 And we've already seen[br]some examples of that today. 0:00:38.131,0:00:41.839 I'm going to start[br]with a little show and tell here. 0:00:41.845,0:00:44.305 I had to bring a prop, I couldn't resist. 0:00:44.305,0:00:51.144 My old Macintosh PowerBook 145 from 1992. 0:00:51.144,0:00:56.052 This is the first computer[br]I had with a hard drive. 0:00:56.052,0:00:58.747 It came with 4, was it 4? 0:00:58.747,0:01:03.534 Actually with about 6 megabytes of memory. 0:01:03.534,0:01:07.434 That was a big deal[br]and I was just blown away. 0:01:07.434,0:01:09.376 I couldn't believe I had that much memory. 0:01:09.376,0:01:11.240 I'm just going to put this back here. 0:01:11.240,0:01:14.610 I couldn't believe[br]I had that much memory at my fingertips. 0:01:14.610,0:01:16.953 Today, we now all have computers. 0:01:16.979,0:01:19.178 I can go down to Staples[br]and buy a computer 0:01:19.178,0:01:22.143 that has a quarter million times[br]more memory than that 0:01:22.143,0:01:25.149 for about $400 or $600[br]or something like that. 0:01:25.149,0:01:27.930 Times have changed[br]and that's 20 years ago. 0:01:27.930,0:01:31.726 As a result,[br]with all this increased computing power, 0:01:31.726,0:01:34.393 we are drowning in data. 0:01:34.393,0:01:36.236 We're just absolutely drowning in data. 0:01:36.236,0:01:38.775 And one of the types of data[br]we are most drowning in 0:01:38.775,0:01:41.931 is remote sensing[br]or imagery data, satellite data, 0:01:41.931,0:01:44.127 aerial data, things like that. 
0:01:44.127,0:01:46.345 And we've all played around[br]with this, I'm sure. 0:01:46.345,0:01:49.015 We all love Google Earth,[br]it's free and it's fun. 0:01:49.015,0:01:51.728 And it's just teeming with imagery. 0:01:51.728,0:01:54.276 So, what do we do with all this stuff? 0:01:54.276,0:01:56.065 How do we make use of this? 0:01:56.065,0:01:59.014 Here's an image of Baltimore.[br]This is urban Baltimore. 0:01:59.061,0:02:00.856 It's got all these great objects. 0:02:00.856,0:02:02.675 I can look in there and see, 0:02:02.675,0:02:04.140 it's hard with this projector, 0:02:04.163,0:02:06.525 but I can see trees and buildings,[br]things like that. 0:02:06.549,0:02:07.890 And let's just say 0:02:07.916,0:02:12.144 I wanted to actually do some kind[br]of a quantitative study with that. 0:02:12.144,0:02:14.262 Say I had to do something[br]that required knowing 0:02:14.262,0:02:15.580 where the trees really were. 0:02:15.580,0:02:17.200 I can see where the trees are, 0:02:17.200,0:02:20.392 but the computer doesn't know,[br]it has no clue what a tree is. 0:02:20.392,0:02:23.021 Let's just say I wanted to do[br]something like, 0:02:23.021,0:02:25.114 these are actually[br]the locations of crimes, 0:02:25.114,0:02:29.114 say I wanted to know[br]if the density of trees affects crime. 0:02:29.114,0:02:32.291 There's no way I can do that[br]with imagery the way we have it now, 0:02:32.291,0:02:35.894 in a computing environment. 0:02:35.894,0:02:38.645 Part of the reason for this 0:02:38.645,0:02:41.395 is that computers[br]aren't really good at recognizing things 0:02:41.395,0:02:43.348 the way that we can recognize things. 0:02:43.348,0:02:47.911 We are excellent at recognizing[br]things with very slight differences. 0:02:47.911,0:02:50.572 I can tell you within two seconds 0:02:50.572,0:02:53.359 that that's George Carlin[br]and that's Sigmund Freud. 0:02:53.359,0:02:56.148 That that is the Big Lebowski[br]and that is Eddie Vedder. 
0:02:56.148,0:02:59.896 For me to train a computer[br]to recognize the difference 0:02:59.896,0:03:03.644 between the Big Lebowski, a.k.a. the Dude,[br]and Eddie Vedder, 0:03:03.644,0:03:07.394 would take me unbelievable amounts[br]of time to do. 0:03:07.394,0:03:11.648 Yet I can do that instantly,[br]so that's an issue right here. 0:03:11.648,0:03:14.866 So, let's cut to the chase here. 0:03:14.866,0:03:17.798 On the left I have raw data, 0:03:17.824,0:03:20.510 color infrared remotely sensed imagery. 0:03:20.510,0:03:26.557 On the right I have[br]a classified GIS layer. 0:03:26.573,0:03:29.644 That is usable information. 0:03:29.644,0:03:33.944 The computer knows what's grass,[br]what's buildings and knows what's trees. 0:03:33.944,0:03:36.976 How do I get from one to the other? 0:03:36.976,0:03:41.223 This is a major, major conundrum[br]in today's world of high resolution data. 0:03:41.223,0:03:44.192 Here's an image[br]of just a typical suburban area. 0:03:44.192,0:03:46.144 I look at it and I see[br]all sorts of features 0:03:46.160,0:03:48.325 and I see that it is at a[br]very fine resolution. 0:03:48.325,0:03:52.063 If I were to be working[br]with remote sensing data 15 years ago, 0:03:52.063,0:03:54.785 I'd have coarse resolution imagery. 0:03:54.811,0:03:57.224 This is the same exact location[br]using 30 meter pixels. 0:03:57.224,0:04:01.980 Back then, classifying this stuff[br]was a qualitatively different thing, 0:04:01.980,0:04:05.509 because all I really needed to do[br]is get in the general ballpark. 0:04:05.509,0:04:08.315 These pixels here[br]are sort of generally urbanized, 0:04:08.315,0:04:11.121 these pixels here are generally forested. 0:04:11.121,0:04:14.707 I didn't really have to know[br]about the specific identity of objects. 0:04:14.707,0:04:18.587 Now fast forward to today[br]and I've got imagery which I can zoom in 0:04:18.587,0:04:23.472 and I can see[br]a million different types of objects. 
0:04:23.472,0:04:29.917 From cars in a parking lot[br]to shipping containers, 0:04:29.917,0:04:33.818 to the cranes[br]at the shipping container facility 0:04:33.858,0:04:37.556 to the mechanicals on top of a building 0:04:37.937,0:04:39.270 to, I love this one, 0:04:39.270,0:04:42.908 this is the Sphinx[br]at the Luxor Hotel in Las Vegas. 0:04:42.908,0:04:45.088 Try telling a computer what that is. 0:04:45.088,0:04:48.504 Here's another Vegas one,[br]I love Google Earth in Vegas, 0:04:48.504,0:04:50.024 it's the best, it's so much fun. 0:04:50.024,0:04:52.746 This is a tropical fish-shaped pool. 0:04:52.793,0:04:56.218 Again, not so easy to tell a computer[br]what that is. 0:04:56.250,0:04:57.710 And this is the best of all. 0:04:57.757,0:05:03.103 I believe this is high-resolution[br]satellite imagery 0:05:03.103,0:05:07.496 of camels in the middle of Africa[br]and it's part of this Google, 0:05:07.496,0:05:11.889 National Geographic,[br]Africa mega fly-over project. 0:05:11.889,0:05:14.056 The number of possible things 0:05:14.082,0:05:16.282 I have to prepare computers[br]to be ready for, 0:05:16.282,0:05:18.687 the types of objects[br]out on the surface of the Earth, 0:05:18.713,0:05:20.345 is staggering. 0:05:20.745,0:05:25.208 And it's a project[br]that we'll never see finished, 0:05:25.208,0:05:29.381 that project of teaching computers[br]the artificial intelligence, 0:05:29.381,0:05:31.663 giving them the artificial intelligence[br]they need 0:05:31.687,0:05:33.885 to recognize all of this variation[br]on the Earth. 0:05:33.909,0:05:35.353 Here's another even better one. 0:05:35.369,0:05:38.114 This is actually a real thing,[br]this is the Colonel. 0:05:38.114,0:05:42.733 Someone actually did mega art[br]of the Colonel in the middle of Nevada. 0:05:43.233,0:05:46.371 It's interesting[br]how many of these come from Nevada. 
0:05:46.400,0:05:48.828 (Laughter) 0:05:49.471,0:05:51.618 So, let's explain why it's difficult 0:05:51.618,0:05:55.125 to use the methods[br]we've always used in the past 0:05:55.125,0:05:58.302 for this generation[br]of high resolution imagery. 0:05:58.302,0:06:01.337 Here's a high resolution image[br]of Burlington. 0:06:01.337,0:06:06.232 If I try to classify each pixel,[br]pixel by pixel, 0:06:06.232,0:06:10.710 I get this awful pixelated[br]meaningless gobbledygook. 0:06:10.710,0:06:14.370 If I look at a particular object[br]like a house, that house is made up, 0:06:14.370,0:06:17.410 it's hard to see, but it's just dozens[br]of different pixel values 0:06:17.410,0:06:19.005 that don't really mean anything. 0:06:19.029,0:06:23.784 If I take a single tree, again,[br]it's made up of a gobbledygook of pixels. 0:06:23.784,0:06:26.802 Now, if I zoom in on that tree,[br]for instance, 0:06:26.802,0:06:33.312 I will see that it's made up of pixels,[br]lots of different tones, different colors, 0:06:33.312,0:06:37.837 and if this is the direct representation[br]in classified pixels, 0:06:37.837,0:06:39.332 it's meaningless, right? 0:06:39.332,0:06:40.949 This is not finding objects. 0:06:40.949,0:06:45.969 So, I need to teach a computer[br]to see objects and to think like me. 0:06:46.017,0:06:48.795 So this means teaching a computer[br]to think like a human, 0:06:48.795,0:06:52.148 which means working[br]based on shape, size, tone, pattern, 0:06:52.148,0:06:55.310 texture, site and association,[br]a lot of this is all spatial. 0:06:55.310,0:07:00.450 We have to stop thinking pixel by pixel[br]and start thinking of things spatially. 0:07:00.450,0:07:04.265 That means taking an image[br]and doing what's called segmenting it, 0:07:04.265,0:07:05.757 turning it into objects. 0:07:05.788,0:07:08.694 And the process of segmenting it[br]is very difficult. 0:07:08.694,0:07:12.763 You have to train a computer[br]to segment imagery correctly. 
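To make the idea of segmenting concrete, here is a minimal sketch, not the method used in the talk. It assumes a tiny single-band image held as a list of lists and groups 4-connected pixels whose values fall within a tolerance of the region's first pixel. The names `segment`, `toy_image` and `tolerance` are illustrative assumptions; production segmentation algorithms are far more elaborate.

```python
from collections import deque


def segment(image, tolerance):
    """Label 4-connected regions whose values stay within `tolerance` of the region's seed pixel.

    Returns (labels, count): a grid of region ids starting at 1, and the number of regions.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet assigned"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            # Start a new region at the first unassigned pixel in scan order.
            count += 1
            seed = image[r][c]
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not labels[ny][nx] and abs(image[ny][nx] - seed) <= tolerance:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical 3x4 image with three visually distinct patches.
toy_image = [
    [10, 12, 11, 80],
    [11, 10, 82, 81],
    [50, 52, 51, 79],
]
labels, count = segment(toy_image, tolerance=5)
for row in labels:
    print(row)
```

Each labeled region is an "object" that can then be described by its shape, size and neighbors rather than by individual pixel values.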
0:07:12.763,0:07:15.496 And if I look, here is a house,[br]there's one side of the roof 0:07:15.496,0:07:17.911 and another side of the roof,[br]there is the driveway, 0:07:17.939,0:07:19.727 they're segmented as different objects 0:07:19.753,0:07:21.702 and I can then re-aggregate those objects 0:07:21.702,0:07:25.181 into something that is just a house[br]and another one that is just a driveway. 0:07:25.181,0:07:27.761 And at the end of the day,[br]what I'm going to end up with 0:07:27.787,0:07:29.107 is something like this. 0:07:29.107,0:07:31.829 I will be able to tell you the difference. 0:07:31.845,0:07:37.503 Even though the spectral signature[br]of this roof and this road is the same, 0:07:37.598,0:07:40.845 I know that their compactness[br]factor is different, 0:07:40.845,0:07:43.568 and because of that,[br]because of the shape metrics of them, 0:07:43.568,0:07:46.291 I can tell you which one's a roof[br]and which one's a road, 0:07:46.291,0:07:49.015 and I can start[br]classifying things in that way. 0:07:52.845,0:07:55.646 To do this requires huge rule sets. 0:07:55.646,0:08:01.125 The rule sets could be dozens and dozens[br]to over a hundred pages long 0:08:01.125,0:08:06.186 of all these classification rules[br]and I won't bore you with the details. 0:08:06.186,0:08:09.687 I also will make use of[br]a lot of ancillary data. 0:08:09.687,0:08:13.136 There's all sorts of great GIS data[br]that helps me classify things now. 0:08:13.136,0:08:16.174 Most cities are collecting things[br]about building footprints, 0:08:16.174,0:08:18.721 we know where parcels are,[br]we know where sewer lines are 0:08:18.737,0:08:20.371 and roads are and things like that. 0:08:20.398,0:08:21.806 We can use this to help us, 0:08:21.806,0:08:23.823 but the most important[br]form of ancillary data 0:08:23.849,0:08:27.767 that's out there today is called LIDAR:[br]Light Detection and Ranging. 
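As an aside on the roof-versus-road point above, here is a minimal sketch of one shape rule. It assumes a common definition of compactness, 4πA/P² (1.0 for a circle, near 0 for long thin shapes), which is not necessarily the "compactness factor" computed by the software used in the talk. The rectangle dimensions and the 0.4 threshold are made-up illustrative values.

```python
import math


def compactness(area, perimeter):
    """Compactness as 4*pi*A / P**2: 1.0 for a circle, approaching 0 for elongated shapes."""
    return 4 * math.pi * area / perimeter ** 2


def rectangle_compactness(width, length):
    """Compactness of a width-by-length rectangle, used here as a stand-in for a segmented object."""
    return compactness(width * length, 2 * (width + length))


def classify_gray_object(width, length, threshold=0.4):
    """Hypothetical single rule: compact gray objects are roofs, elongated ones are roads."""
    return "roof" if rectangle_compactness(width, length) >= threshold else "road"


# Two objects with the same spectral signature but very different shapes (sizes in meters).
print(classify_gray_object(10, 12))   # a compact 10 m x 12 m object
print(classify_gray_object(6, 200))   # a 6 m wide, 200 m long strip
```

A real rule set would chain many rules like this one and combine shape with tone, texture and context.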
0:08:27.767,0:08:30.044 And LIDAR has been used[br]in engineering for a while 0:08:30.060,0:08:32.808 and it allows us to essentially create[br]models of the surface. 0:08:32.808,0:08:35.566 This is Columbus Circle[br]in Central Park in New York City 0:08:35.566,0:08:38.787 and this is a surface elevation[br]of the trees. 0:08:38.787,0:08:42.107 The LIDAR tells me[br]where the canopy of the trees is, 0:08:42.107,0:08:47.580 where the tops of the buildings are,[br]where the ground surface is too. 0:08:47.580,0:08:50.702 And I can create these incredibly detailed[br]models of the world 0:08:50.702,0:08:52.195 so now I'm not just working 0:08:52.195,0:08:55.539 with spatial spectral information,[br]reflectance information, 0:08:55.539,0:08:59.011 I'm also working with height information,[br]I know the heights of things 0:08:59.011,0:09:02.349 so I can see two objects[br]that are green and woody, 0:09:02.349,0:09:05.687 but I can tell that one of them[br]is a shrub and one of them is a tree. 0:09:05.687,0:09:09.027 And this is just zooming[br]into that stuff there. 0:09:09.027,0:09:12.042 Now the problem is,[br]this is incredibly data-intensive 0:09:12.052,0:09:17.757 and nobody's figured out until recently,[br]and I mean like maybe two years ago, 0:09:17.757,0:09:20.862 people were doing this[br]on a tile by tile basis 0:09:20.862,0:09:22.840 to work on one little tile of data[br]at a time 0:09:22.866,0:09:24.672 that might be, you know, 0:09:24.672,0:09:28.552 one, that just that red outline[br]that you see right there, 0:09:28.576,0:09:31.250 and that might be half a gigabyte[br]or something like that. 0:09:31.250,0:09:35.148 So we've worked on turning this[br]into an enterprise environment, 0:09:35.148,0:09:38.397 that's what we have to do,[br]make an enterprise environment out of this 0:09:38.397,0:09:41.322 so we can start looking[br]at thousands of tiles of data at a time, 0:09:41.322,0:09:42.860 and we've successfully done that. 
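The shrub-versus-tree point above can be sketched in a few lines. This assumes two co-registered LIDAR-derived grids, a surface model (tops of whatever the laser hit) and a ground model, both in meters, and that every pixel in the toy grid has already been classified as vegetation. The 2.0 m cutoff and all function names are illustrative assumptions, not values from the talk.

```python
def height_above_ground(surface, ground):
    """Subtract the ground elevation from the surface elevation, pixel by pixel."""
    return [
        [s - g for s, g in zip(surface_row, ground_row)]
        for surface_row, ground_row in zip(surface, ground)
    ]


def classify_vegetation(height_m, tree_min_height_m=2.0):
    """Hypothetical rule: vegetation at or above the cutoff is a tree, otherwise a shrub."""
    return "tree" if height_m >= tree_min_height_m else "shrub"


# Toy 2x2 grids of elevations in meters; all four pixels are assumed to be green and woody.
surface = [[101.0, 112.5], [100.5, 103.0]]
ground = [[100.0, 100.0], [100.0, 100.0]]

heights = height_above_ground(surface, ground)
classes = [[classify_vegetation(h) for h in row] for row in heights]
for row in classes:
    print(row)
```

Spectrally these pixels would look alike; only the height layer separates them.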
0:09:44.316,0:09:48.088 My lab, which is the[br]Spatial Analysis Lab. 0:09:48.114,0:09:51.860 The Spatial Analysis Lab is the lab I run, 0:09:51.860,0:09:54.921 and they've been doing this stuff[br]for a number of years, 0:09:54.921,0:09:59.142 and they've collected,[br]through 64 projects, 837 communities, 0:09:59.142,0:10:03.895 covering 28 million people,[br]almost 9,000 square miles of data mapped, 0:10:03.895,0:10:06.977 250 billion pixels of land cover[br]products generated 0:10:06.977,0:10:09.024 and 110 terabytes of data. 0:10:09.064,0:10:12.346 So this is a major undertaking[br]but it's only the beginning. 0:10:12.386,0:10:15.578 Going back to the crime data and those trees[br]that I was telling you about, 0:10:15.578,0:10:17.141 so here's Baltimore again. 0:10:17.181,0:10:20.395 Using this method,[br]we turn data into information. 0:10:20.434,0:10:22.640 We get trees,[br]we now know where trees are. 0:10:22.695,0:10:24.274 I overlay it with the crime. 0:10:24.456,0:10:28.479 I end up with information,[br]I can now do a study, 0:10:28.532,0:10:30.683 and we just submitted this[br]for publication. 0:10:30.707,0:10:34.008 We just found out in fact[br]there's a strong negative correlation 0:10:34.048,0:10:35.437 between trees and crime 0:10:35.460,0:10:37.618 even when we adjust for about[br]fifty other things. 0:10:37.658,0:10:42.844 We couldn't have done that[br]without this sort of information. 0:10:42.852,0:10:45.352 So with that, I will say thanks[br]to the people 0:10:45.391,0:10:46.986 from the Spatial Analysis Lab, 0:10:47.031,0:10:50.221 and particularly Jarlath O'Neil-Dunne[br]who helped me put this together 0:10:50.222,0:10:53.245 and has been doing this research[br]for a long time and thanks to you. 0:10:53.250,0:10:56.312 Thank you. (Applause)