[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:00.36,0:00:03.29,Default,,0000,0000,0000,,Hello and welcome to Chapter 11, Regular\NExpressions Dialogue: 0,0:00:03.29,0:00:06.66,Default,,0000,0000,0000,,from the book Python for Informatics:\NExploring Information. Dialogue: 0,0:00:07.73,0:00:12.29,Default,,0000,0000,0000,,As always, these slides are copyright\NCreative Commons Attribution, as well as Dialogue: 0,0:00:12.29,0:00:15.28,Default,,0000,0000,0000,,the audio and the video that you're\Nwatching or listening to right now. Dialogue: 0,0:00:16.33,0:00:22.87,Default,,0000,0000,0000,,And, so regular expressions are an\Ninteresting thing. Dialogue: 0,0:00:22.87,0:00:25.53,Default,,0000,0000,0000,,You've seen from, in the chapters up till\Nnow, I've, Dialogue: 0,0:00:25.53,0:00:30.52,Default,,0000,0000,0000,,I've had a singular focus on sort of\Npulling information out of data. Dialogue: 0,0:00:30.52,0:00:34.29,Default,,0000,0000,0000,,Raw data, this mailbox file that perhaps\Nyou're getting tired of already. Dialogue: 0,0:00:34.29,0:00:35.86,Default,,0000,0000,0000,,But it's a lot of fun, because I can have Dialogue: 0,0:00:35.86,0:00:38.03,Default,,0000,0000,0000,,you go look for something and, and\Npick it out. Dialogue: 0,0:00:38.03,0:00:42.24,Default,,0000,0000,0000,,And you're doing something that like would\Nbe really painful to do sort of by hand. Dialogue: 0,0:00:45.09,0:00:47.06,Default,,0000,0000,0000,,And while it's not all of computing, I\Nmean, there's games Dialogue: 0,0:00:47.06,0:00:50.67,Default,,0000,0000,0000,,and there's, you know, things like\Nweather computations that do calculations, Dialogue: 0,0:00:52.46,0:00:56.73,Default,,0000,0000,0000,,pulling and extracting data out is a big\Npart of computing. Dialogue: 0,0:00:56.73,0:01:01.35,Default,,0000,0000,0000,,And so there's actually a library that's\Nbuilt specifically to do this. 
Dialogue: 0,0:01:01.35,0:01:06.23,Default,,0000,0000,0000,,And, and if you start doing a few finds\Nand slicing, it gets kind of Dialogue: 0,0:01:06.23,0:01:08.11,Default,,0000,0000,0000,,long after a while and that's like split,\Nfor example, Dialogue: 0,0:01:08.11,0:01:10.59,Default,,0000,0000,0000,,really saved us a lot of time. Dialogue: 0,0:01:10.59,0:01:13.76,Default,,0000,0000,0000,,But sometimes the data that you are\Nlooking for is a little Dialogue: 0,0:01:13.76,0:01:18.13,Default,,0000,0000,0000,,more sophisticated than broken into spaces\Nor colons or something like that. Dialogue: 0,0:01:18.13,0:01:21.05,Default,,0000,0000,0000,,And you just want to like tell something\Nto go find Dialogue: 0,0:01:21.05,0:01:25.96,Default,,0000,0000,0000,,I see what I want, and I see where it's\Nembedded in the string, go get it for me. Dialogue: 0,0:01:25.96,0:01:29.16,Default,,0000,0000,0000,,And regular expressions are themselves a\Nprogramming language. Dialogue: 0,0:01:29.16,0:01:33.68,Default,,0000,0000,0000,,They're like a really smart wild card for\Nsearching. Dialogue: 0,0:01:33.68,0:01:35.23,Default,,0000,0000,0000,,So we've used wild cards in various Dialogue: 0,0:01:35.23,0:01:40.18,Default,,0000,0000,0000,,things in search, but they're, they're a\Nreally smart version of a wild card. Dialogue: 0,0:01:42.01,0:01:47.04,Default,,0000,0000,0000,,And so, regular expressions are quite\Npowerful and they're very cryptic. Dialogue: 0,0:01:47.04,0:01:49.08,Default,,0000,0000,0000,,And as a matter of fact, you don't even\Nneed Dialogue: 0,0:01:49.08,0:01:50.74,Default,,0000,0000,0000,,to learn them if you don't feel like it,\Nright? Dialogue: 0,0:01:51.87,0:01:53.42,Default,,0000,0000,0000,,I've got this little guide. Dialogue: 0,0:01:53.42,0:01:56.04,Default,,0000,0000,0000,,I need a guide for myself when I do\Nregular expressions. 
Dialogue: 0,0:01:56.04,0:01:58.38,Default,,0000,0000,0000,,It sometimes takes me a few minutes to\Nwrite Dialogue: 0,0:01:58.38,0:02:00.27,Default,,0000,0000,0000,,a regular expression to do exactly what I\Nwant. Dialogue: 0,0:02:00.27,0:02:05.38,Default,,0000,0000,0000,,So in a way, writing a regular expression\Nis like program, writing a program. Dialogue: 0,0:02:05.38,0:02:08.93,Default,,0000,0000,0000,,It's highly specialized to searching and\Nextracting data from strings. Dialogue: 0,0:02:08.93,0:02:11.50,Default,,0000,0000,0000,,But it's like writing a program and it\Ntakes a while to get Dialogue: 0,0:02:11.50,0:02:15.41,Default,,0000,0000,0000,,it right and you kind of like, oh, change\Nthis, what about a slash there? Dialogue: 0,0:02:15.41,0:02:18.13,Default,,0000,0000,0000,,And, so, you, but they actually are kind\Nof fun. Dialogue: 0,0:02:18.13,0:02:22.16,Default,,0000,0000,0000,,And, and they are a great way to sort of\Nexchange little program snippets Dialogue: 0,0:02:22.16,0:02:25.33,Default,,0000,0000,0000,,to say, oh yeah, I'm looking for this, oh\Nhere's a little reg expression you might Dialogue: 0,0:02:25.33,0:02:28.38,Default,,0000,0000,0000,,try and then, so they're, they're like\Nprograms themselves. Dialogue: 0,0:02:29.66,0:02:32.54,Default,,0000,0000,0000,,It is this language of marker characters,\Nso when we Dialogue: 0,0:02:32.54,0:02:37.21,Default,,0000,0000,0000,,look for regular expressions, some\Ncharacters like A, B, C, have meaning Dialogue: 0,0:02:37.21,0:02:40.75,Default,,0000,0000,0000,,as A, B, C but some characters like caret or\Ndollar sign mean Dialogue: 0,0:02:40.75,0:02:42.88,Default,,0000,0000,0000,,at the beginning of the line, or at the\Nend of the line. Dialogue: 0,0:02:42.88,0:02:47.42,Default,,0000,0000,0000,,And so we encode in this string a, a\Nprogram, basically. 
Dialogue: 0,0:02:47.42,0:02:50.94,Default,,0000,0000,0000,,And so it's a rather old-school language.\NIt's from Dialogue: 0,0:02:50.94,0:02:51.61,Default,,0000,0000,0000,,a long time ago. Dialogue: 0,0:02:51.61,0:02:55.46,Default,,0000,0000,0000,,It predates Python, which is over 20 years\Nold, and so Dialogue: 0,0:02:55.46,0:03:00.63,Default,,0000,0000,0000,,it's, it also marks you as sort of a\Nlittle cool, right? Dialogue: 0,0:03:00.63,0:03:03.57,Default,,0000,0000,0000,,It's a, it's a distinct marking that makes Dialogue: 0,0:03:03.57,0:03:06.32,Default,,0000,0000,0000,,it so that you know something other people\Ndon't. Dialogue: 0,0:03:06.32,0:03:09.56,Default,,0000,0000,0000,,Right? So you can know how to program, but\Nif you know regular expressions Dialogue: 0,0:03:09.56,0:03:13.38,Default,,0000,0000,0000,,it'll be like woah, I tried to look at those\Nand they're kind of tough. Dialogue: 0,0:03:13.38,0:03:16.03,Default,,0000,0000,0000,,In a way, knowing regular expressions is Dialogue: 0,0:03:16.03,0:03:17.98,Default,,0000,0000,0000,,kind of like a tattoo. Dialogue: 0,0:03:17.98,0:03:20.79,Default,,0000,0000,0000,,So I, it's casual Friday and that's why\NI'm wearing a T-shirt Dialogue: 0,0:03:20.79,0:03:24.03,Default,,0000,0000,0000,,today and so I figured I would come in\Ntoday in a T-shirt, Dialogue: 0,0:03:24.03,0:03:26.25,Default,,0000,0000,0000,,but seeing as it's the first time I'm wearing\Na short-sleeved shirt, it's Dialogue: 0,0:03:26.25,0:03:29.45,Default,,0000,0000,0000,,also the first time I can show you my,\Nshow my real tattoo here. Dialogue: 0,0:03:29.45,0:03:32.59,Default,,0000,0000,0000,,So, here's my real tattoo and in the\Nmiddle is Sakai, Dialogue: 0,0:03:32.59,0:03:36.16,Default,,0000,0000,0000,,the open source learning management system\Nalways close to my heart.
Dialogue: 0,0:03:36.16,0:03:37.78,Default,,0000,0000,0000,,And then you have the IMS logo, which Dialogue: 0,0:03:37.78,0:03:41.20,Default,,0000,0000,0000,,is IMS Learning Tools Interoperability,\Nwhich is a standard, Dialogue: 0,0:03:41.20,0:03:46.44,Default,,0000,0000,0000,,it means a lot to me.\NBlackboard, OLAT, Learning Objects, Angel, Dialogue: 0,0:03:46.44,0:03:51.79,Default,,0000,0000,0000,,Moodle, Instructure, Jenzabar, and\NDesire2Learn. Dialogue: 0,0:03:51.79,0:03:54.42,Default,,0000,0000,0000,,I call this the ring of compliance,\Nbecause these are all Dialogue: 0,0:03:54.42,0:03:59.80,Default,,0000,0000,0000,,of the first six or seven learning\Nmanagement systems that complied Dialogue: 0,0:03:59.80,0:04:00.91,Default,,0000,0000,0000,,with the IMS Learning Tools Dialogue: 0,0:04:00.91,0:04:03.21,Default,,0000,0000,0000,,Interoperability standards\Nspecification, which is Dialogue: 0,0:04:03.21,0:04:06.25,Default,,0000,0000,0000,,something that I spent a lot of my life\Nmaking work. Dialogue: 0,0:04:06.25,0:04:06.94,Default,,0000,0000,0000,,So Dialogue: 0,0:04:06.94,0:04:09.75,Default,,0000,0000,0000,,I figured I'd make a tattoo and just\Nkind of Dialogue: 0,0:04:09.75,0:04:12.81,Default,,0000,0000,0000,,part of my rough, tough image and,\Nand actually Dialogue: 0,0:04:12.81,0:04:15.94,Default,,0000,0000,0000,,regular expressions are indeed part of my\Nrough, tough image, Dialogue: 0,0:04:15.94,0:04:18.87,Default,,0000,0000,0000,,because I'm like, I'm down with\Nregular expressions. Dialogue: 0,0:04:18.87,0:04:22.80,Default,,0000,0000,0000,,And people are like impressed with my\Nregular expression knowledge.
Dialogue: 0,0:04:22.80,0:04:26.71,Default,,0000,0000,0000,,But as impressive as I am, I still need a\Ncheat sheet, so I'll have a cheat Dialogue: 0,0:04:26.71,0:04:29.23,Default,,0000,0000,0000,,sheet that you can download hopefully on\Nthe pythonlearn Dialogue: 0,0:04:29.23,0:04:31.95,Default,,0000,0000,0000,,website or whatever, and I just, it Dialogue: 0,0:04:31.95,0:04:32.75,Default,,0000,0000,0000,,doesn't have to be much. Dialogue: 0,0:04:32.75,0:04:36.37,Default,,0000,0000,0000,,It's really just a kind of a, a crutch,\Nand these are the characters that have Dialogue: 0,0:04:36.37,0:04:38.44,Default,,0000,0000,0000,,special meaning, like caret or\Ndollar sign Dialogue: 0,0:04:38.44,0:04:41.03,Default,,0000,0000,0000,,match the beginning or end of line,\Nrespectively. Dialogue: 0,0:04:41.03,0:04:44.31,Default,,0000,0000,0000,,So they're not really matching a dollar\Nsign, they match, they, Dialogue: 0,0:04:44.31,0:04:47.49,Default,,0000,0000,0000,,they mean something in our little mini\Nstring-like programming language. Dialogue: 0,0:04:48.80,0:04:52.91,Default,,0000,0000,0000,,So, like many things that we do in Python\Ngoing forward, once you want some Dialogue: 0,0:04:52.91,0:04:55.50,Default,,0000,0000,0000,,sophisticated capability, it comes with\NPython, but Dialogue: 0,0:04:55.50,0:04:57.61,Default,,0000,0000,0000,,it comes in the form of a library. Dialogue: 0,0:04:57.61,0:05:00.87,Default,,0000,0000,0000,,And so the regular expression library we\Nhave to say import r-e Dialogue: 0,0:05:00.87,0:05:04.11,Default,,0000,0000,0000,,at the beginning of our programs to import\Nthe regular expression library. Dialogue: 0,0:05:04.11,0:05:06.38,Default,,0000,0000,0000,,Then we call re.search to say I'm Dialogue: 0,0:05:06.38,0:05:09.24,Default,,0000,0000,0000,,looking for search from the regular\Nexpression library. 
Dialogue: 0,0:05:09.24,0:05:11.59,Default,,0000,0000,0000,,There's two basic functions or method,\Ntwo, two basic Dialogue: 0,0:05:11.59,0:05:14.23,Default,,0000,0000,0000,,capabilities inside this library that\Nwe're going to look at. Dialogue: 0,0:05:14.23,0:05:18.94,Default,,0000,0000,0000,,One is search, that replaces find, it's\Nlike a smart find, and then Dialogue: 0,0:05:18.94,0:05:24.13,Default,,0000,0000,0000,,findall is a combination of a smart find\Nand an automatic extraction. Dialogue: 0,0:05:24.13,0:05:25.67,Default,,0000,0000,0000,,So we'll look at both of those in turn. Dialogue: 0,0:05:25.67,0:05:28.76,Default,,0000,0000,0000,,And I'll do it by comparing them to\Nexisting Dialogue: 0,0:05:28.76,0:05:31.23,Default,,0000,0000,0000,,Python that you kind of already should\Nknow at this point. Dialogue: 0,0:05:34.32,0:05:37.08,Default,,0000,0000,0000,,So here's some code that's, say, looking\Nfor lines that Dialogue: 0,0:05:37.08,0:05:40.10,Default,,0000,0000,0000,,have the word fr-, have the string From\Ncolon in them. Dialogue: 0,0:05:40.10,0:05:43.54,Default,,0000,0000,0000,,Right, so, we're going to open a file,\Nwe're going to strip the white space. Dialogue: 0,0:05:43.54,0:05:47.62,Default,,0000,0000,0000,,We call find to hunt within line for\NFrom. Dialogue: 0,0:05:47.62,0:05:51.41,Default,,0000,0000,0000,,If it's greater than or equal to zero then\Nwe'll print it. And so this Dialogue: 0,0:05:51.41,0:05:55.01,Default,,0000,0000,0000,,is just going to give us a number. If it's,\Nif it's not found, it's negative one. Dialogue: 0,0:05:55.01,0:05:58.04,Default,,0000,0000,0000,,So it's only going to print the lines\Nthat have From in them. Dialogue: 0,0:05:58.04,0:05:59.52,Default,,0000,0000,0000,,Here is the equivalent using Dialogue: 0,0:05:59.52,0:06:03.18,Default,,0000,0000,0000,,regular expressions.\NSo these two things are equivalent.
Dialogue: 0,0:06:03.18,0:06:04.82,Default,,0000,0000,0000,,So we have to import the library, like I Dialogue: 0,0:06:04.82,0:06:07.43,Default,,0000,0000,0000,,mentioned before, and all the rest of it's\Nthe same. Dialogue: 0,0:06:07.43,0:06:10.93,Default,,0000,0000,0000,,The if test is re.search. That says within Dialogue: 0,0:06:10.93,0:06:15.26,Default,,0000,0000,0000,,the library re, call the search utility\Nand then Dialogue: 0,0:06:15.26,0:06:17.95,Default,,0000,0000,0000,,pass in the line, the string we're looking\Nfor Dialogue: 0,0:06:17.95,0:06:20.48,Default,,0000,0000,0000,,and the line, the actual text we're\Nlooking in. Dialogue: 0,0:06:20.48,0:06:24.92,Default,,0000,0000,0000,,So this is like look for From inside of\Nline and return me a Dialogue: 0,0:06:24.92,0:06:28.93,Default,,0000,0000,0000,,True or a False, whichever, depending on\Nwhether you find it or not. Dialogue: 0,0:06:28.93,0:06:32.80,Default,,0000,0000,0000,,Now you might say, I, you just got done\Ntelling me that it, it was more dense. Dialogue: 0,0:06:32.80,0:06:34.73,Default,,0000,0000,0000,,And the answer is, there's a few more\Ncharacters here. Dialogue: 0,0:06:34.73,0:06:36.07,Default,,0000,0000,0000,,But we'll see in a second how you Dialogue: 0,0:06:36.07,0:06:39.08,Default,,0000,0000,0000,,can quickly add more power to the regular\Nexpression. 
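A minimal sketch of the two equivalent programs just compared, assuming a short list of made-up lines in place of the mailbox file (the sample lines and variable names are hypothetical, not taken from the slides). Note that `re.search` actually returns a match object or `None`; the `if` test simply treats those as true or false.

```python
import re

# Hypothetical sample lines standing in for the mailbox file.
lines = [
    "From: someone@example.com",
    "Subject: weekly report",
    "X-Sieve: CMU Sieve 2.3",
]

# String method: find() returns an index, or -1 when the text is absent.
with_find = [line for line in lines if line.find("From:") >= 0]

# Regular expression: search() returns a match object, or None when absent.
with_search = [line for line in lines if re.search("From:", line)]

print(with_find)    # ['From: someone@example.com']
print(with_search)  # ['From: someone@example.com']
```

Both versions pick out the same lines; the difference only shows up once the search string starts carrying special characters.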
Dialogue: 0,0:06:39.08,0:06:40.73,Default,,0000,0000,0000,,Find, you have to start adding more Dialogue: 0,0:06:40.73,0:06:42.91,Default,,0000,0000,0000,,Python lines to make it more sophisticated\Nwhere in Dialogue: 0,0:06:42.91,0:06:45.95,Default,,0000,0000,0000,,the regular expression you start changing, Dialogue: 0,0:06:45.95,0:06:49.95,Default,,0000,0000,0000,,you change the search string to give more of Dialogue: 0,0:06:49.95,0:06:51.94,Default,,0000,0000,0000,,the direction of what you're looking for,\Nand that's what Dialogue: 0,0:06:51.94,0:06:54.55,Default,,0000,0000,0000,,we'll be doing, pretty much, is changing\Nthe search string. Dialogue: 0,0:06:54.55,0:06:58.42,Default,,0000,0000,0000,,So now if we wanted to switch to say,\Nwait, wait, wait, we don't Dialogue: 0,0:06:58.42,0:07:02.90,Default,,0000,0000,0000,,just want the From anywhere in the line,\Nwe want it to start with From. Dialogue: 0,0:07:02.90,0:07:05.73,Default,,0000,0000,0000,,So we would change\Nline.startswith('From'), Dialogue: 0,0:07:05.73,0:07:06.53,Default,,0000,0000,0000,,and that's either going to be true or false Dialogue: 0,0:07:06.53,0:07:10.49,Default,,0000,0000,0000,,depending on whether or not the\Nline starts with From. Dialogue: 0,0:07:10.49,0:07:11.92,Default,,0000,0000,0000,,Now, we do the same thing with Dialogue: 0,0:07:11.92,0:07:14.72,Default,,0000,0000,0000,,regular expressions by changing the\Nsearch string. Dialogue: 0,0:07:15.95,0:07:17.29,Default,,0000,0000,0000,,So now we are in regular expressions. Dialogue: 0,0:07:17.29,0:07:19.98,Default,,0000,0000,0000,,So this really just isn't a string, it's a\Nstring plus Dialogue: 0,0:07:19.98,0:07:21.66,Default,,0000,0000,0000,,characters that are interpreted as Dialogue: 0,0:07:21.66,0:07:24.35,Default,,0000,0000,0000,,commands by the regular expression\Nlibrary. 
Dialogue: 0,0:07:24.35,0:07:27.97,Default,,0000,0000,0000,,So the caret, which is the first one on\Nour, Dialogue: 0,0:07:27.97,0:07:31.83,Default,,0000,0000,0000,,our little regular expression sheet, matches\Nthe beginning of the line. Dialogue: 0,0:07:31.83,0:07:32.86,Default,,0000,0000,0000,,It's not actually a caret. Dialogue: 0,0:07:32.86,0:07:37.35,Default,,0000,0000,0000,,So that says, the first character, this\Ntwo-character sequence, caret F, Dialogue: 0,0:07:37.35,0:07:40.91,Default,,0000,0000,0000,,means F but in column one, in the first\Ncharacter of the line. Dialogue: 0,0:07:40.91,0:07:43.11,Default,,0000,0000,0000,,And so, again, this is going to give us a Dialogue: 0,0:07:43.11,0:07:46.43,Default,,0000,0000,0000,,True or a False, if this regular\Nexpression matches. Dialogue: 0,0:07:46.43,0:07:49.90,Default,,0000,0000,0000,,The, the beginning of the line, From: and\Nit's the same as Dialogue: 0,0:07:49.90,0:07:54.44,Default,,0000,0000,0000,,this, it's, does it start with From.\NSo again, these two are equivalent. Dialogue: 0,0:07:54.44,0:08:00.24,Default,,0000,0000,0000,,But you see the pattern where we're\Ngoing to do something to this string using Dialogue: 0,0:08:00.24,0:08:05.91,Default,,0000,0000,0000,,these characters that have meaning, okay?\NSo, the next thing that's Dialogue: 0,0:08:05.91,0:08:11.92,Default,,0000,0000,0000,,most commonly done other than caret and\Ndollar sign for the end of line, is Dialogue: 0,0:08:11.92,0:08:16.20,Default,,0000,0000,0000,,the wildcard characters and so, we've used\Nwildcards Dialogue: 0,0:08:16.20,0:08:19.51,Default,,0000,0000,0000,,possibly in like DOS, where we can use ? Dialogue: 0,0:08:19.51,0:08:25.13,Default,,0000,0000,0000,,or * in like a dir command. dir *.* if\Nyou're familiar with that, Dialogue: 0,0:08:25.13,0:08:29.51,Default,,0000,0000,0000,,or even a Unix command like ls, you\Nknow, star dot whatever.
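A small sketch of the caret example described above, assuming two made-up lines (hypothetical, not from the slides). It shows that `startswith` and a caret-anchored search agree, while an unanchored search also accepts From: in the middle of a line.

```python
import re

# Hypothetical lines: one starts with From:, one only mentions it later.
lines = [
    "From: someone@example.com",
    "Reply note: see the From: header",
]

starts = [line for line in lines if line.startswith("From:")]
anchored = [line for line in lines if re.search("^From:", line)]
anywhere = [line for line in lines if re.search("From:", line)]

print(starts)         # ['From: someone@example.com']
print(anchored)       # ['From: someone@example.com']
print(len(anywhere))  # 2
```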
Dialogue: 0,0:08:29.51,0:08:31.52,Default,,0000,0000,0000,,This is not how regular expressions Dialogue: 0,0:08:31.52,0:08:33.73,Default,,0000,0000,0000,,work. And the difference is that dot Dialogue: 0,0:08:33.73,0:08:38.02,Default,,0000,0000,0000,,matches any single character in\Nregular expressions. Dialogue: 0,0:08:38.02,0:08:41.45,Default,,0000,0000,0000,,Asterisk means any number of times. Dialogue: 0,0:08:41.45,0:08:46.62,Default,,0000,0000,0000,,So if I look at this, if I look at\Nthis and color-code this to make a Dialogue: 0,0:08:46.62,0:08:52.05,Default,,0000,0000,0000,,little more sense, the caret is actually\Nkind of part of the Dialogue: 0,0:08:52.05,0:08:56.56,Default,,0000,0000,0000,,regular expression\Nprogramming language. It says, Dialogue: 0,0:08:56.56,0:08:58.91,Default,,0000,0000,0000,,I'm a virtual character matching the\Nbeginning of line. Dialogue: 0,0:08:58.91,0:09:00.62,Default,,0000,0000,0000,,The X is a real character. Dialogue: 0,0:09:00.62,0:09:04.59,Default,,0000,0000,0000,,The dot is part of the regular expression\Nprogramming language, any character. Dialogue: 0,0:09:04.59,0:09:07.59,Default,,0000,0000,0000,,Star is part of the regular expression\Nprogramming language; it says Dialogue: 0,0:09:07.59,0:09:12.22,Default,,0000,0000,0000,,the immediately preceding character many\Ntimes, zero or more times. Dialogue: 0,0:09:12.22,0:09:14.85,Default,,0000,0000,0000,,And then colon matches the colon. Dialogue: 0,0:09:14.85,0:09:19.91,Default,,0000,0000,0000,,And so if you look at lines, these are the\Nkinds of lines that will give me a True. Dialogue: 0,0:09:19.91,0:09:22.38,Default,,0000,0000,0000,,Because they start with an X, Dialogue: 0,0:09:22.38,0:09:25.75,Default,,0000,0000,0000,,followed by some number of characters,\Nfollowed by a colon. Dialogue: 0,0:09:25.75,0:09:26.90,Default,,0000,0000,0000,,So that's true.
Dialogue: 0,0:09:26.90,0:09:30.99,Default,,0000,0000,0000,,Start with a X, followed by some number of\Ncharacters, followed by a colon. Dialogue: 0,0:09:30.99,0:09:32.27,Default,,0000,0000,0000,,Okay? Dialogue: 0,0:09:32.27,0:09:35.18,Default,,0000,0000,0000,,And so that's basically how this works. Dialogue: 0,0:09:35.18,0:09:38.84,Default,,0000,0000,0000,,And so this little, this, in this Dialogue: 0,0:09:38.84,0:09:42.15,Default,,0000,0000,0000,,five-character string there are, you know,\Nsome of Dialogue: 0,0:09:42.15,0:09:44.32,Default,,0000,0000,0000,,these things are like instructions and\Nsome of Dialogue: 0,0:09:44.32,0:09:46.44,Default,,0000,0000,0000,,them are the actual characters we're\Nlooking for. Dialogue: 0,0:09:46.44,0:09:47.67,Default,,0000,0000,0000,,So the X and the colon Dialogue: 0,0:09:47.67,0:09:49.06,Default,,0000,0000,0000,,are the characters we're looking Dialogue: 0,0:09:49.06,0:09:55.00,Default,,0000,0000,0000,,for, and the caret, dot, and star are\Nprogramming. Dialogue: 0,0:09:55.00,0:09:57.45,Default,,0000,0000,0000,,Right? They are logic that we're adding\Nto the string. Dialogue: 0,0:09:59.99,0:10:00.62,Default,,0000,0000,0000,,Okay. Dialogue: 0,0:10:00.62,0:10:04.84,Default,,0000,0000,0000,,So let's say, for example, you're... \NPart of any of these things, Dialogue: 0,0:10:04.84,0:10:07.34,Default,,0000,0000,0000,,and part of the stuff we have done so far, Dialogue: 0,0:10:07.34,0:10:10.53,Default,,0000,0000,0000,,has to assume that the data is some\Nlevel of being clean and Dialogue: 0,0:10:10.53,0:10:14.44,Default,,0000,0000,0000,,so the data that I have been giving you,\Nmbox.txt, is not inconsistent. Dialogue: 0,0:10:15.48,0:10:17.57,Default,,0000,0000,0000,,Right? It doesn't have like too much\Nweirdness in it. 
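A sketch of the five-character pattern caret, X, dot, star, colon discussed above, run against a few made-up header lines (the sample lines are assumptions, not the ones on the slide).

```python
import re

# Hypothetical sample lines in the style of mail headers.
lines = [
    "X-Sieve: CMU Sieve 2.3",
    "X-DSPAM-Result: Innocent",
    "Subject: weekly report",
]

# ^  beginning of line (instruction)
# X  a literal X
# .  any single character (instruction)
# *  zero or more of the preceding item (instruction)
# :  a literal colon
matched = [line for line in lines if re.search("^X.*:", line)]

print(matched)  # ['X-Sieve: CMU Sieve 2.3', 'X-DSPAM-Result: Innocent']
```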
Dialogue: 0,0:10:17.57,0:10:20.12,Default,,0000,0000,0000,,I'm not trying to trick you and\Nmislead you, although Dialogue: 0,0:10:20.12,0:10:22.82,Default,,0000,0000,0000,,we've had situations where you sort of get\Na traceback because Dialogue: 0,0:10:22.82,0:10:25.02,Default,,0000,0000,0000,,you think there's going to be five words,\Nyou grab a line, Dialogue: 0,0:10:25.02,0:10:27.57,Default,,0000,0000,0000,,you break it, and there's only two\Nwords and then you get Dialogue: 0,0:10:27.57,0:10:31.25,Default,,0000,0000,0000,,a traceback because you're looking at the\Nfifth word, or something like that. Dialogue: 0,0:10:32.58,0:10:35.38,Default,,0000,0000,0000,,But if your data is less clean, or even if\Nyou just Dialogue: 0,0:10:35.38,0:10:39.89,Default,,0000,0000,0000,,want to be real careful, you can\Nfine-tune your matching. Dialogue: 0,0:10:39.89,0:10:42.52,Default,,0000,0000,0000,,So, here's that same match. Dialogue: 0,0:10:42.52,0:10:45.12,Default,,0000,0000,0000,,Give me a character X, followed by any\Nnumber of Dialogue: 0,0:10:45.12,0:10:48.09,Default,,0000,0000,0000,,characters, followed by a colon, and that's\Nwhat I'm looking for. Dialogue: 0,0:10:48.09,0:10:50.10,Default,,0000,0000,0000,,Give me lines that match that pattern. Dialogue: 0,0:10:50.10,0:10:52.22,Default,,0000,0000,0000,,So this X starts at any number of\Ncharacters, Dialogue: 0,0:10:52.22,0:10:55.29,Default,,0000,0000,0000,,colon, great, this, any number of\Ncharacters good, great. Dialogue: 0,0:10:55.29,0:10:57.42,Default,,0000,0000,0000,,Oh wait, and there's an email X that says Dialogue: 0,0:10:57.42,0:11:01.02,Default,,0000,0000,0000,,X Plane is two weeks behind\Nschedule, colon, two weeks. Dialogue: 0,0:11:01.02,0:11:05.61,Default,,0000,0000,0000,,Well, the regular expression didn't know\Nthat the dash made sense to you.
Dialogue: 0,0:11:05.61,0:11:07.30,Default,,0000,0000,0000,,And you just assumed that everything that\Nstarted Dialogue: 0,0:11:07.30,0:11:09.49,Default,,0000,0000,0000,,with a capital X had a dash after it. Dialogue: 0,0:11:09.49,0:11:15.13,Default,,0000,0000,0000,,So X is what it starts with, any number of\Nany character, and then Dialogue: 0,0:11:15.13,0:11:17.43,Default,,0000,0000,0000,,a colon. So this becomes True. Dialogue: 0,0:11:17.43,0:11:21.94,Default,,0000,0000,0000,,This may not make you happy, right? It may\Nnot be what you're looking for. Dialogue: 0,0:11:21.94,0:11:26.29,Default,,0000,0000,0000,,Because you haven't been specific enough\Nin your regular expression. Dialogue: 0,0:11:26.29,0:11:30.55,Default,,0000,0000,0000,,So, we can be more specific in our regular\Nexpression. Dialogue: 0,0:11:30.55,0:11:35.31,Default,,0000,0000,0000,,So for example, this is a more specific\Nregular expression. Dialogue: 0,0:11:35.31,0:11:40.39,Default,,0000,0000,0000,,It still says start with an X as the first\Ncharacter, then a dash, Dialogue: 0,0:11:40.39,0:11:43.22,Default,,0000,0000,0000,,that's a real character not a, then this Dialogue: 0,0:11:43.22,0:11:47.46,Default,,0000,0000,0000,,next thing, instead of being a dot, this\Nbackslash capital S. Dialogue: 0,0:11:47.46,0:11:49.51,Default,,0000,0000,0000,,It's on the sheet. Dialogue: 0,0:11:49.51,0:11:51.41,Default,,0000,0000,0000,,Whoa. It's not on the sheet. Dialogue: 0,0:11:51.41,0:11:53.90,Default,,0000,0000,0000,,I lost the sheet. Come back, sheet. Dialogue: 0,0:11:54.90,0:11:55.40,Default,,0000,0000,0000,,I lost the sheet. Dialogue: 0,0:11:56.07,0:11:58.73,Default,,0000,0000,0000,,I can't live without my sheet. Dialogue: 0,0:12:00.82,0:12:06.18,Default,,0000,0000,0000,,Backslash capital S means a\Nnon-whitespace character. Dialogue: 0,0:12:06.18,0:12:09.04,Default,,0000,0000,0000,,So that means spaces won't match. 
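A sketch contrasting the loose pattern with the more specific one, assuming made-up sample lines (the exact wording of the X Plane line is a stand-in for the one on the slide). In the stricter pattern the plus sign means one or more of the preceding item, so backslash capital S plus reads as one or more non-whitespace characters.

```python
import re

# Hypothetical sample lines; the last one is the troublemaker.
lines = [
    "X-Sieve: CMU Sieve 2.3",
    "X-DSPAM-Result: Innocent",
    "X Plane is behind schedule: two weeks",
]

# Loose: an X, any characters, then a colon.
loose = [line for line in lines if re.search(r"^X.*:", line)]

# Strict: X, a literal dash, one or more non-whitespace characters, a colon.
strict = [line for line in lines if re.search(r"^X-\S+:", line)]

print(len(loose))  # 3
print(strict)      # ['X-Sieve: CMU Sieve 2.3', 'X-DSPAM-Result: Innocent']
```

The raw-string prefix `r` is a choice made here so that the backslash reaches the regular expression library unchanged.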
Dialogue: 0,0:12:09.04,0:12:14.43,Default,,0000,0000,0000,,And then I changed the asterisk, zero or\Nmore times thing, to a plus. Dialogue: 0,0:12:14.43,0:12:16.34,Default,,0000,0000,0000,,And that means one or more times. Dialogue: 0,0:12:16.34,0:12:20.44,Default,,0000,0000,0000,,Here is a character, a non-whitespace.\NThese two things kind of work together. Dialogue: 0,0:12:20.44,0:12:25.17,Default,,0000,0000,0000,,A non-whitespace character at least one\Ntime, as many as we like. Dialogue: 0,0:12:25.17,0:12:26.23,Default,,0000,0000,0000,,And then, a colon. Dialogue: 0,0:12:27.39,0:12:30.68,Default,,0000,0000,0000,,So, if we look here, it starts with X dash, Dialogue: 0,0:12:30.68,0:12:35.43,Default,,0000,0000,0000,,any number of non-whitespace\Ncharacters, and ends in colon. Dialogue: 0,0:12:35.43,0:12:37.15,Default,,0000,0000,0000,,Starts with X dash, any number Dialogue: 0,0:12:37.15,0:12:39.85,Default,,0000,0000,0000,,of non-whitespace characters, ends\Nin a colon. Dialogue: 0,0:12:39.85,0:12:41.52,Default,,0000,0000,0000,,True. True. Dialogue: 0,0:12:41.52,0:12:45.61,Default,,0000,0000,0000,,This one starts with an X, but doesn't\Nstart with an X dash. Dialogue: 0,0:12:45.61,0:12:49.34,Default,,0000,0000,0000,,Oh, as a matter of fact, these characters\Nare blanks, so this becomes a False. Dialogue: 0,0:12:49.34,0:12:52.71,Default,,0000,0000,0000,,It does have an X and it does have a colon\Nand match the previous one, Dialogue: 0,0:12:52.71,0:12:55.50,Default,,0000,0000,0000,,but this one here is more specific. Dialogue: 0,0:12:59.72,0:13:02.68,Default,,0000,0000,0000,,Okay? So it's more specific and so it\Nmatches what you want. Dialogue: 0,0:13:02.68,0:13:04.00,Default,,0000,0000,0000,,Now it depends on what you are looking for. Dialogue: 0,0:13:04.00,0:13:05.09,Default,,0000,0000,0000,,Maybe you do want this line, Dialogue: 0,0:13:05.09,0:13:08.74,Default,,0000,0000,0000,,and so you're looking for X. I don't\Nknow. 
But if you want, you can be Dialogue: 0,0:13:08.74,0:13:12.77,Default,,0000,0000,0000,,increasingly sophisticated in what Dialogue: 0,0:13:12.77,0:13:15.00,Default,,0000,0000,0000,,you're looking for in a regular\Nexpression. Dialogue: 0,0:13:15.00,0:13:19.95,Default,,0000,0000,0000,,So now, let's talk about extracting data. Dialogue: 0,0:13:19.95,0:13:23.55,Default,,0000,0000,0000,,So everything we've done so far is,\Nis it there or is it not. Dialogue: 0,0:13:23.55,0:13:24.74,Default,,0000,0000,0000,,But it's really common once Dialogue: 0,0:13:24.74,0:13:27.13,Default,,0000,0000,0000,,you find something that you want to\Nbreak it into pieces. Dialogue: 0,0:13:27.13,0:13:31.56,Default,,0000,0000,0000,,So we can combine the searching and the\Nparsing into one statement. Dialogue: 0,0:13:32.59,0:13:36.71,Default,,0000,0000,0000,,And instead of using search, which returns\Nfor us a true/false, we are going to use Dialogue: 0,0:13:36.71,0:13:41.87,Default,,0000,0000,0000,,findall.\NSo in this example, I'm going to show Dialogue: 0,0:13:41.87,0:13:51.01,Default,,0000,0000,0000,,you a new syntax. The square bracket in\Nregular expression language means Dialogue: 0,0:13:51.01,0:13:52.85,Default,,0000,0000,0000,,a way to list a set of characters. Dialogue: 0,0:13:52.85,0:13:57.62,Default,,0000,0000,0000,,So this says, this is a single character\Nthat says, Dialogue: 0,0:13:57.62,0:14:00.49,Default,,0000,0000,0000,,I want to match anything in the range\N0 through 9. Dialogue: 0,0:14:01.92,0:14:04.11,Default,,0000,0000,0000,,Plus means one or more of those. Dialogue: 0,0:14:04.11,0:14:08.56,Default,,0000,0000,0000,,So that says, so this is, this whole thing\Nsays one or more digits. Dialogue: 0,0:14:08.56,0:14:11.59,Default,,0000,0000,0000,,That's a regular expression that says one\Nor more digits. Dialogue: 0,0:14:11.59,0:14:13.31,Default,,0000,0000,0000,,You can put other things inside here.
Dialogue: 0,0:14:14.82,0:14:16.04,Default,,0000,0000,0000,,You can put like, you know, Dialogue: 0,0:14:17.28,0:14:21.67,Default,,0000,0000,0000,,you could make a thing that says a b c d.\NAnd that would say, I'm Dialogue: 0,0:14:21.67,0:14:26.09,Default,,0000,0000,0000,,going to match a single character that's\Na or b or c or d. Or you could say like, Dialogue: 0,0:14:26.95,0:14:32.30,Default,,0000,0000,0000,,you know, 1 3 5 7, bracket. Dialogue: 0,0:14:32.30,0:14:33.18,Default,,0000,0000,0000,,That's a single character Dialogue: 0,0:14:33.18,0:14:35.03,Default,,0000,0000,0000,,that's either a 1 or a 3 or a 5 or a 7. Dialogue: 0,0:14:35.03,0:14:37.08,Default,,0000,0000,0000,,So the bracket is a list of matching Dialogue: 0,0:14:37.08,0:14:41.35,Default,,0000,0000,0000,,characters and the dash inside the\Nbracket means range. Dialogue: 0,0:14:41.35,0:14:44.60,Default,,0000,0000,0000,,We'll see in a second that you can stick a\Nnot inside the bracket. It's on this. Dialogue: 0,0:14:44.60,0:14:47.33,Default,,0000,0000,0000,,So, so again, remember in this little Dialogue: 0,0:14:47.33,0:14:49.92,Default,,0000,0000,0000,,mini-language, we are programming, right? Dialogue: 0,0:14:49.92,0:14:54.66,Default,,0000,0000,0000,,We are giving instructions to the regular\Nexpression engine, as it were. Okay? Dialogue: 0,0:14:58.07,0:15:03.37,Default,,0000,0000,0000,,So, if we do this, and here is an\Nexpression that Dialogue: 0,0:15:03.37,0:15:09.33,Default,,0000,0000,0000,,says I would like to find, you know, things\Nthat are one or more digits. Dialogue: 0,0:15:09.33,0:15:09.89,Default,,0000,0000,0000,,And so, Dialogue: 0,0:15:13.70,0:15:16.64,Default,,0000,0000,0000,,so it's one or more digits and, and so\Nit's going to look Dialogue: 0,0:15:16.64,0:15:19.45,Default,,0000,0000,0000,,through here and it's going to find it as\Nmany times as it can. 
Dialogue: 0,0:15:20.55,0:15:24.47,Default,,0000,0000,0000,,So there is one or more digits, there is\None or more digits, Dialogue: 0,0:15:24.47,0:15:26.72,Default,,0000,0000,0000,,and there is one or more digits. Dialogue: 0,0:15:26.72,0:15:30.40,Default,,0000,0000,0000,,And so what findall gives us back is a\Nlist of strings. Dialogue: 0,0:15:30.40,0:15:31.80,Default,,0000,0000,0000,,So it found it. Dialogue: 0,0:15:31.80,0:15:33.18,Default,,0000,0000,0000,,Where do I match?\NWhere do I match? Dialogue: 0,0:15:33.18,0:15:37.83,Default,,0000,0000,0000,,It's looking the whole time and then,\Nit says, oh, I've got it. Dialogue: 0,0:15:37.83,0:15:39.41,Default,,0000,0000,0000,,2, 19, 42. Dialogue: 0,0:15:39.41,0:15:43.40,Default,,0000,0000,0000,,So it actually extracts the strings that\Nmatch Dialogue: 0,0:15:43.40,0:15:46.59,Default,,0000,0000,0000,,and gives you a Python list of strings. Dialogue: 0,0:15:46.59,0:15:48.04,Default,,0000,0000,0000,,Python list of strings. Dialogue: 0,0:15:48.04,0:15:53.36,Default,,0000,0000,0000,,Kind of like split, except it's like a\Nsuper smart split, right? Dialogue: 0,0:15:53.36,0:15:56.94,Default,,0000,0000,0000,,It's split, but I've directed it what to\Nlook for, and if, Dialogue: 0,0:16:01.32,0:16:04.53,Default,,0000,0000,0000,,so here's an example of, you know, that's\Nthe one I just did. Dialogue: 0,0:16:04.53,0:16:10.32,Default,,0000,0000,0000,,Find me one or more digits and extract\Nthem, so 2, 19, 42. Dialogue: 0,0:16:10.32,0:16:14.33,Default,,0000,0000,0000,,Here I'm saying, using the same bracket\Nsyntax, to look for a single Dialogue: 0,0:16:14.33,0:16:19.90,Default,,0000,0000,0000,,character A, capital A E I O or U, and one\Nor more Dialogue: 0,0:16:19.90,0:16:24.52,Default,,0000,0000,0000,,of those. And if you look, there are no\Nupper-case vowels in my string.
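A sketch of the two `findall` calls described above. The sample sentence is an assumption chosen so that the digits come out as 2, 19, 42; the transcript does not show the actual string on the slide.

```python
import re

# Assumed sample text (hypothetical); it contains three runs of digits
# and no upper-case vowels.
x = "My 2 favorite numbers are 19 and 42"

# [0-9]+ : one or more characters from the range 0 through 9.
digits = re.findall("[0-9]+", x)

# [AEIOU]+ : one or more upper-case vowels; none occur in x.
vowels = re.findall("[AEIOU]+", x)

print(digits)  # ['2', '19', '42']
print(vowels)  # []
```

The extracted digits are strings, so they would still need `int()` before doing arithmetic with them.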
Dialogue: 0,0:16:24.52,0:16:26.85,Default,,0000,0000,0000,,So it says I'm going to find all the\Nthings that match Dialogue: 0,0:16:26.85,0:16:35.88,Default,,0000,0000,0000,,A E I O U. So things like AA would match\Nand, you know, OU would match. Dialogue: 0,0:16:36.99,0:16:39.43,Default,,0000,0000,0000,,And so that's what we, we would get if\Nthey were in the string. Dialogue: 0,0:16:40.52,0:16:43.83,Default,,0000,0000,0000,,But because there are none, we get an\Nempty list. Dialogue: 0,0:16:43.83,0:16:45.64,Default,,0000,0000,0000,,So even if there are none, you get an\Nempty list. Dialogue: 0,0:16:45.64,0:16:48.26,Default,,0000,0000,0000,,So it always returns a list. Dialogue: 0,0:16:48.26,0:16:51.91,Default,,0000,0000,0000,,It may be a zero-length list, and that's\Nwhat you have Dialogue: 0,0:16:51.91,0:16:54.47,Default,,0000,0000,0000,,to check. Okay? Dialogue: 0,0:17:00.47,0:17:02.43,Default,,0000,0000,0000,,Okay, now Dialogue: 0,0:17:03.43,0:17:05.73,Default,,0000,0000,0000,,matching has this notion of greedy, Dialogue: 0,0:17:06.73,0:17:10.12,Default,,0000,0000,0000,,where when you put one of these pluses Dialogue: 0,0:17:10.12,0:17:15.65,Default,,0000,0000,0000,,or asterisks it kind of has this outward\Npushing feeling, right? Dialogue: 0,0:17:15.65,0:17:17.30,Default,,0000,0000,0000,,And so when you say, Dialogue: 0,0:17:17.30,0:17:19.30,Default,,0000,0000,0000,,I'm looking for something that starts with\Nan Dialogue: 0,0:17:19.30,0:17:21.50,Default,,0000,0000,0000,,F at the beginning of the line, followed Dialogue: 0,0:17:21.50,0:17:23.70,Default,,0000,0000,0000,,by one or more characters, followed by a Dialogue: 0,0:17:23.70,0:17:27.21,Default,,0000,0000,0000,,colon, you can think of this as pushing\Noutward.
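The two findall calls described above can be sketched roughly as follows. The sample string is an assumption, picked only because it contains the numbers 2, 19 and 42 and no upper-case vowels; the patterns are the ones spoken in the lecture.

```python
import re

# Assumed sample string: contains 2, 19 and 42, and no upper-case vowels.
x = 'My 2 favorite numbers are 19 and 42'

# One or more digits, found as many times as possible.
digits = re.findall('[0-9]+', x)
print(digits)

# One or more upper-case vowels; nothing in x qualifies.
vowels = re.findall('[AEIOU]+', x)
print(vowels)

# findall always hands back a list, so "no match" is a zero-length list.
if len(vowels) == 0:
    print('no upper-case vowels found')
```

Because the result is always a list, checking its length is the way to tell "found nothing" apart from "found something".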
Dialogue: 0,0:17:27.21,0:17:32.10,Default,,0000,0000,0000,,So if we look at a line here that has From\Ncolon using the colon Dialogue: 0,0:17:32.10,0:17:37.40,Default,,0000,0000,0000,,character, it will try to expand, so it\Ncertainly has Dialogue: 0,0:17:37.40,0:17:42.59,Default,,0000,0000,0000,,to match the F and it's looking for a\Ncolon, any number of characters, Dialogue: 0,0:17:42.59,0:17:46.95,Default,,0000,0000,0000,,but it's trying to make the string that\Nmatches as big as possible. Dialogue: 0,0:17:46.95,0:17:49.73,Default,,0000,0000,0000,,So it skips over this colon and goes to\Nthat Dialogue: 0,0:17:49.73,0:17:51.95,Default,,0000,0000,0000,,colon and so the thing that we get is\Nhere. Dialogue: 0,0:17:51.95,0:17:56.11,Default,,0000,0000,0000,,And so, it ignored this and said I will\Nmake as large a string as I can. Dialogue: 0,0:17:57.27,0:17:59.49,Default,,0000,0000,0000,,So, that's the plus that's doing it. Dialogue: 0,0:17:59.49,0:18:04.10,Default,,0000,0000,0000,,Dot plus pushes, it's like, I've got a Dialogue: 0,0:18:04.10,0:18:06.66,Default,,0000,0000,0000,,colon, but is there another colon out\Nthere? Dialogue: 0,0:18:06.66,0:18:09.01,Default,,0000,0000,0000,,So you push it, okay? Dialogue: 0,0:18:09.01,0:18:10.97,Default,,0000,0000,0000,,So that's greedy matching. Dialogue: 0,0:18:10.97,0:18:14.86,Default,,0000,0000,0000,,It can get you in some trouble, like being\Ngreedy Dialogue: 0,0:18:14.86,0:18:18.21,Default,,0000,0000,0000,,in general, and both asterisk and plus sort\Nof behave Dialogue: 0,0:18:18.21,0:18:20.42,Default,,0000,0000,0000,,in a greedy way because they're zero or more\Nor one Dialogue: 0,0:18:20.42,0:18:24.24,Default,,0000,0000,0000,,or more characters, so they can sort of\Npush outward, okay? Dialogue: 0,0:18:26.33,0:18:28.11,Default,,0000,0000,0000,,Now you can turn this off. Dialogue: 0,0:18:28.11,0:18:31.80,Default,,0000,0000,0000,,It's a programming language, we can tweak\Nit, okay?
Dialogue: 0,0:18:31.80,0:18:35.79,Default,,0000,0000,0000,,And so we add a question mark. Dialogue: 0,0:18:35.79,0:18:40.83,Default,,0000,0000,0000,,So this is a three-character sequence now.\NSo if you say dot plus question Dialogue: 0,0:18:40.83,0:18:46.07,Default,,0000,0000,0000,,mark, that says one or more of any\Ncharacters, push, Dialogue: 0,0:18:46.07,0:18:51.68,Default,,0000,0000,0000,,but instead of being greedy and pushing as\Nfar as you can, this means stop Dialogue: 0,0:18:51.68,0:18:57.17,Default,,0000,0000,0000,,at the first. Stop at the first. Dialogue: 0,0:18:57.17,0:18:59.45,Default,,0000,0000,0000,,Oops, stop at the first. Dialogue: 0,0:18:59.45,0:19:01.80,Default,,0000,0000,0000,,I can never draw on this thing fast\Nenough. Dialogue: 0,0:19:01.80,0:19:03.26,Default,,0000,0000,0000,,Stop at the first. Dialogue: 0,0:19:03.26,0:19:04.02,Default,,0000,0000,0000,,Okay? Dialogue: 0,0:19:04.02,0:19:05.91,Default,,0000,0000,0000,,And that's it, just don't be greedy, don't Dialogue: 0,0:19:05.91,0:19:08.26,Default,,0000,0000,0000,,try to make the string as large as\Npossible. Dialogue: 0,0:19:08.26,0:19:11.17,Default,,0000,0000,0000,,Go with the smaller one, the smaller\Npossible one. Dialogue: 0,0:19:11.17,0:19:13.15,Default,,0000,0000,0000,,We still need to find an F, and we still\Nneed Dialogue: 0,0:19:13.15,0:19:16.62,Default,,0000,0000,0000,,to find a colon, but when you find the\Nfirst colon, stop. Dialogue: 0,0:19:16.62,0:19:18.85,Default,,0000,0000,0000,,And so what this does is this changes it\Nso that Dialogue: 0,0:19:18.85,0:19:22.69,Default,,0000,0000,0000,,what we match is from colon instead of\Ngoing all the way. Dialogue: 0,0:19:22.69,0:19:26.92,Default,,0000,0000,0000,,So the greedy match pushes as far as it\Ncan. The non-greedy match Dialogue: 0,0:19:26.92,0:19:32.70,Default,,0000,0000,0000,,is satisfied with the first thing that\Nmeets the criterion of the string. 
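Greedy versus non-greedy is easier to see side by side. This is a small sketch; the sample line is an assumption modeled on the "From colon using the colon character" line described in the lecture.

```python
import re

# Assumed sample line with two colons in it.
x = 'From: Using the : character'

# Greedy: .+ pushes outward to the LAST colon it can reach.
greedy = re.findall('^F.+:', x)
print(greedy)      # ['From: Using the :']

# Non-greedy: .+? is satisfied with the FIRST colon.
non_greedy = re.findall('^F.+?:', x)
print(non_greedy)  # ['From:']
```

The only difference between the two patterns is the question mark after the plus.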
Dialogue: 0,0:19:32.70,0:19:35.78,Default,,0000,0000,0000,,So this is a little three-character\Nprogramming sequence, Dialogue: 0,0:19:35.78,0:19:38.78,Default,,0000,0000,0000,,any character one or more times and not\Ngreedy. Dialogue: 0,0:19:48.46,0:19:50.57,Default,,0000,0000,0000,,If, for example, we were trying to solve the\Nproblem Dialogue: 0,0:19:50.57,0:19:53.36,Default,,0000,0000,0000,,of pulling the email address out of a\Nstring. Dialogue: 0,0:19:54.51,0:19:55.01,Default,,0000,0000,0000,,Right? Dialogue: 0,0:19:57.26,0:20:00.88,Default,,0000,0000,0000,,We can make good use of this non-blank\Ncharacter Dialogue: 0,0:20:00.88,0:20:04.35,Default,,0000,0000,0000,,and so the at sign is just a character and Dialogue: 0,0:20:04.35,0:20:07.68,Default,,0000,0000,0000,,then we can say, I want at least one\Nnon-blank Dialogue: 0,0:20:07.68,0:20:11.50,Default,,0000,0000,0000,,character before it and at least one\Nnon-blank character after it. Dialogue: 0,0:20:11.50,0:20:15.98,Default,,0000,0000,0000,,So the way regular expressions does it\Nsays, okay, I find my at sign and Dialogue: 0,0:20:15.98,0:20:19.80,Default,,0000,0000,0000,,I push in a greedy manner outwards, as Dialogue: 0,0:20:19.80,0:20:22.17,Default,,0000,0000,0000,,long as there are non-blank characters,\Npush, push, push, push, Dialogue: 0,0:20:22.17,0:20:26.59,Default,,0000,0000,0000,,push, push, push, oops, stop.\NPush, push, push, push, push, stop. Dialogue: 0,0:20:26.59,0:20:27.27,Default,,0000,0000,0000,,Okay? Dialogue: 0,0:20:27.27,0:20:30.46,Default,,0000,0000,0000,,So it's some number of non-blank\Ncharacters, an Dialogue: 0,0:20:30.46,0:20:33.04,Default,,0000,0000,0000,,at sign, followed by some number of\Nnon-blank characters. Dialogue: 0,0:20:33.04,0:20:38.08,Default,,0000,0000,0000,,So it's, that's using greedy matching. It,\Nit's doing that, okay? 
Dialogue: 0,0:20:38.08,0:20:41.38,Default,,0000,0000,0000,,And so this is where we get Stephen\NMarquard, we can, and, Dialogue: 0,0:20:41.38,0:20:45.87,Default,,0000,0000,0000,,and we would know if there wasn't one there by\Nthe empty list, right? Dialogue: 0,0:20:45.87,0:20:51.04,Default,,0000,0000,0000,,And so we get stephen.marquard@uct.ac.za. Dialogue: 0,0:20:53.04,0:20:59.35,Default,,0000,0000,0000,,Now, we can also fine-tune what we\Nextract, right? Dialogue: 0,0:20:59.35,0:21:05.47,Default,,0000,0000,0000,,In the previous slide, we extracted\Nwhatever matched. Dialogue: 0,0:21:05.47,0:21:06.07,Default,,0000,0000,0000,,Right? Dialogue: 0,0:21:06.07,0:21:10.31,Default,,0000,0000,0000,,Whatever this matched, it looked across\Nthe whole string and found it, Dialogue: 0,0:21:10.31,0:21:14.63,Default,,0000,0000,0000,,found the thing, shoved it over, and gave\Nus what it matched. Dialogue: 0,0:21:14.63,0:21:18.58,Default,,0000,0000,0000,,But it's possible to make the match larger\Nthan what's extracted, Dialogue: 0,0:21:18.58,0:21:22.86,Default,,0000,0000,0000,,to extract a subset of the match, and we'll\Nsee that on this next slide. Dialogue: 0,0:21:22.86,0:21:23.79,Default,,0000,0000,0000,,Okay? Dialogue: 0,0:21:23.79,0:21:29.95,Default,,0000,0000,0000,,So here's this same thing, which is an at\Nsign followed, and then Dialogue: 0,0:21:29.95,0:21:33.89,Default,,0000,0000,0000,,with non-blank characters as far as the\Neye can see in either direction. Dialogue: 0,0:21:33.89,0:21:37.45,Default,,0000,0000,0000,,But I'm going to add to it caret From\Nspace. Dialogue: 0,0:21:37.45,0:21:44.47,Default,,0000,0000,0000,,So, so this has to start with, the\Ncaret means the beginning of the line, this, Dialogue: 0,0:21:44.47,0:21:45.81,Default,,0000,0000,0000,,it's gotta have the word From, Dialogue: 0,0:21:45.81,0:21:50.56,Default,,0000,0000,0000,,it's gotta have one space and then,\Nimmediately, it's gotta find this, right?
Dialogue: 0,0:21:50.56,0:21:53.50,Default,,0000,0000,0000,,It's gotta find a series of non-blanks,\Nfollowed by an at sign, Dialogue: 0,0:21:53.50,0:21:57.62,Default,,0000,0000,0000,,followed by another series of one or\Nmore non-blanks. And then Dialogue: 0,0:21:57.62,0:22:00.49,Default,,0000,0000,0000,,what we do, so this, if we didn't put\Nthese parentheses Dialogue: 0,0:22:00.49,0:22:03.90,Default,,0000,0000,0000,,in, it would match and we would get all of\Nthis data. Dialogue: 0,0:22:03.90,0:22:04.78,Default,,0000,0000,0000,,It would go to here. Dialogue: 0,0:22:05.90,0:22:09.22,Default,,0000,0000,0000,,But what we can do with the parentheses,\Nthe parentheses are part Dialogue: 0,0:22:09.22,0:22:12.33,Default,,0000,0000,0000,,of the regular expression language,\Nsaying, Dialogue: 0,0:22:12.33,0:22:14.62,Default,,0000,0000,0000,,okay, I want to match the whole thing. Dialogue: 0,0:22:14.62,0:22:17.19,Default,,0000,0000,0000,,The parentheses aren't part of the\Nstring up here. Dialogue: 0,0:22:17.19,0:22:18.55,Default,,0000,0000,0000,,I want to match the whole thing, but Dialogue: 0,0:22:18.55,0:22:20.62,Default,,0000,0000,0000,,I only want to extract this part in\Nparentheses. Dialogue: 0,0:22:21.80,0:22:24.96,Default,,0000,0000,0000,,So this whole thing is a regular\Nexpression that's matched Dialogue: 0,0:22:24.96,0:22:28.68,Default,,0000,0000,0000,,and then the parentheses part is what's\Nretrieved for you. Dialogue: 0,0:22:28.68,0:22:31.62,Default,,0000,0000,0000,,And so this makes it so that the only time\Nit's going to Dialogue: 0,0:22:31.62,0:22:35.14,Default,,0000,0000,0000,,look for at signs is, are on lines that\Nstart with From space. Dialogue: 0,0:22:35.14,0:22:39.22,Default,,0000,0000,0000,,It is going to want the immediate next\Ncharacter to be a non-blank.
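To make "match the whole thing, extract only the part in parentheses" concrete, here is a rough sketch. The sample line is an assumption built around the stephen.marquard@uct.ac.za address mentioned in the lecture; re.search is used only to show what the full match looks like without parentheses.

```python
import re

# Assumed sample line in the style of the mailbox file.
x = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# No parentheses: findall returns whatever matched.
addresses = re.findall(r'\S+@\S+', x)
print(addresses)   # ['stephen.marquard@uct.ac.za']

# The stricter pattern matches more text, including 'From '.
whole_match = re.search(r'^From \S+@\S+', x).group(0)
print(whole_match)  # From stephen.marquard@uct.ac.za

# With parentheses: the same strict match, but only the
# parenthesized part is extracted.
extracted = re.findall(r'^From (\S+@\S+)', x)
print(extracted)    # ['stephen.marquard@uct.ac.za']
```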
Dialogue: 0,0:22:40.59,0:22:42.92,Default,,0000,0000,0000,,Some number of non-blank characters\Nfollowed by an at sign, Dialogue: 0,0:22:42.92,0:22:45.58,Default,,0000,0000,0000,,some number of non-blank characters, it's\Ngoing to stop right there. Dialogue: 0,0:22:45.58,0:22:48.11,Default,,0000,0000,0000,,And it's only going to extract from here\Nto here, Dialogue: 0,0:22:48.11,0:22:50.56,Default,,0000,0000,0000,,and so we get out Stephen Marquard. Dialogue: 0,0:22:50.56,0:22:55.86,Default,,0000,0000,0000,,But this is a pretty narrowly scoped thing\Nbecause Dialogue: 0,0:22:55.86,0:22:57.69,Default,,0000,0000,0000,,the first five characters have to be From\Nspace. Dialogue: 0,0:22:57.69,0:23:00.64,Default,,0000,0000,0000,,And so that's a way to combine a stricter\Nmatch, Dialogue: 0,0:23:00.64,0:23:03.97,Default,,0000,0000,0000,,even though you don't actually want\Nall the data. Dialogue: 0,0:23:03.97,0:23:05.86,Default,,0000,0000,0000,,So you can add those things all over the\Nplace. Dialogue: 0,0:23:05.86,0:23:09.33,Default,,0000,0000,0000,,Okay? Okay. Dialogue: 0,0:23:09.33,0:23:15.45,Default,,0000,0000,0000,,Then, we, we, we can compare the different\Nways of extracting data. Dialogue: 0,0:23:15.45,0:23:19.73,Default,,0000,0000,0000,,So if we look at how we extract the host\Nname. Dialogue: 0,0:23:19.73,0:23:23.20,Default,,0000,0000,0000,,Remember how we did this many chapters ago. Dialogue: 0,0:23:23.20,0:23:26.08,Default,,0000,0000,0000,,So we did a data.find, which says oh, Dialogue: 0,0:23:26.08,0:23:29.85,Default,,0000,0000,0000,,the first at sign is at 21.\NSo the first at sign is at 21. Dialogue: 0,0:23:29.85,0:23:34.33,Default,,0000,0000,0000,,Then we say we want to find the space\Nafter that. Dialogue: 0,0:23:34.33,0:23:38.97,Default,,0000,0000,0000,,So that's the at position, that's 31.\NAnd then we want to extract the data Dialogue: 0,0:23:38.97,0:23:44.46,Default,,0000,0000,0000,,that's one beyond the at up to but not\Nincluding the space.
Dialogue: 0,0:23:45.71,0:23:47.54,Default,,0000,0000,0000,,And that is the variable that we are\Ngoing to print out, host. Dialogue: 0,0:23:47.54,0:23:51.61,Default,,0000,0000,0000,,And so we've extracted this bit of\Ninformation and out comes the host. Dialogue: 0,0:23:51.61,0:23:52.88,Default,,0000,0000,0000,,Quite nice. Okay? Dialogue: 0,0:23:53.88,0:23:57.31,Default,,0000,0000,0000,,We also saw another technique, and by the\Nway, all these techniques are okay. Dialogue: 0,0:23:58.68,0:24:00.32,Default,,0000,0000,0000,,All these techniques are fine. Dialogue: 0,0:24:00.32,0:24:02.30,Default,,0000,0000,0000,,Another technique we saw, once we sort of\Nplayed Dialogue: 0,0:24:02.30,0:24:04.30,Default,,0000,0000,0000,,with split and lists, was what we, what I Dialogue: 0,0:24:04.30,0:24:07.91,Default,,0000,0000,0000,,called a double split version of this,\Nwhere the Dialogue: 0,0:24:07.91,0:24:09.74,Default,,0000,0000,0000,,first thing we do is we split that line. Dialogue: 0,0:24:11.89,0:24:15.74,Default,,0000,0000,0000,,The first thing we do is split the line\Nand then we know, and blanks, Dialogue: 0,0:24:19.04,0:24:23.75,Default,,0000,0000,0000,,that the second thing, which is the\Nsub one, words sub one, Dialogue: 0,0:24:23.75,0:24:28.72,Default,,0000,0000,0000,,is the entire email address. Then this is\Nthe double split. Dialogue: 0,0:24:28.72,0:24:32.26,Default,,0000,0000,0000,,We take the email address and we split it by Dialogue: 0,0:24:32.26,0:24:34.95,Default,,0000,0000,0000,,an at sign and then we get a list of the Dialogue: 0,0:24:34.95,0:24:38.18,Default,,0000,0000,0000,,pieces of the email address, the email\Nname and the Dialogue: 0,0:24:38.18,0:24:44.00,Default,,0000,0000,0000,,email host, and then we grab the, the\Nsub one of that, Dialogue: 0,0:24:44.00,0:24:45.42,Default,,0000,0000,0000,,and then we have the host. Dialogue: 0,0:24:45.42,0:24:49.53,Default,,0000,0000,0000,,So that's a double, the double split way\Nof doing this, right? 
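For comparison, the find-and-slice approach and the double split approach might look like this. The sample line and the variable names atpos, sppos, words, email and pieces are assumptions; the line was chosen so the at sign lands at position 21 and the following space at 31, as described.

```python
# Assumed sample line in the style of the mailbox file.
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Find and slice: locate the at sign, then the first space after it.
atpos = data.find('@')
sppos = data.find(' ', atpos)
host = data[atpos + 1:sppos]   # one beyond the at, up to but not including the space
print(atpos, sppos, host)

# Double split: split on blanks first, then split the address on '@'.
words = data.split()
email = words[1]
pieces = email.split('@')
print(pieces[1])
```

Both versions assume the line is clean, which is exactly the assumption the regular expression version lets you tighten up.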
Dialogue: 0,0:24:49.53,0:24:53.29,Default,,0000,0000,0000,,Now in this, we still haven't done\Nthe From yet, Dialogue: 0,0:24:53.29,0:24:57.15,Default,,0000,0000,0000,,but it is the double split way to do this. Dialogue: 0,0:24:57.15,0:25:03.50,Default,,0000,0000,0000,,So, if we think about how we would do\Nthis in a regular expression, okay? Dialogue: 0,0:25:03.50,0:25:12.32,Default,,0000,0000,0000,,We're going to say, look through the\Nstring, findall, we're going to, Dialogue: 0,0:25:12.32,0:25:15.36,Default,,0000,0000,0000,,use the findall, and the regular\Nexpression exploded up says Dialogue: 0,0:25:16.36,0:25:20.83,Default,,0000,0000,0000,,look through the string for an at.\NDo, do, do, do, do, do, got an at. Dialogue: 0,0:25:20.94,0:25:25.97,Default,,0000,0000,0000,,Then, oh, start extracting. End extracting. Dialogue: 0,0:25:25.97,0:25:28.52,Default,,0000,0000,0000,,And then this is another form of the Dialogue: 0,0:25:28.52,0:25:31.15,Default,,0000,0000,0000,,this is one character, it's a Dialogue: 0,0:25:31.15,0:25:35.30,Default,,0000,0000,0000,,single character, match any non-blank\Ncharacter, and Dialogue: 0,0:25:35.30,0:25:37.34,Default,,0000,0000,0000,,zero or more of them. Okay? Dialogue: 0,0:25:37.34,0:25:42.22,Default,,0000,0000,0000,,So find an at sign, start extracting, Dialogue: 0,0:25:42.22,0:25:47.98,Default,,0000,0000,0000,,end extracting, match, this is one character. Dialogue: 0,0:25:47.98,0:25:53.74,Default,,0000,0000,0000,,That is a set of possible matches, and\Nthat's some character, this means not. Dialogue: 0,0:25:56.88,0:25:58.99,Default,,0000,0000,0000,,Okay? Not a blank, that's a blank Dialogue: 0,0:25:58.99,0:26:01.10,Default,,0000,0000,0000,,right there, that's a blank character\Nright there. Dialogue: 0,0:26:01.10,0:26:03.90,Default,,0000,0000,0000,,Not a blank, as many times as you want. 
Dialogue: 0,0:26:03.90,0:26:05.05,Default,,0000,0000,0000,,You might want to, we might want to turn Dialogue: 0,0:26:05.05,0:26:07.52,Default,,0000,0000,0000,,that into a plus to guarantee at least one. Dialogue: 0,0:26:07.52,0:26:09.78,Default,,0000,0000,0000,,So that might be better done as a plus\Nright there. Dialogue: 0,0:26:13.68,0:26:15.88,Default,,0000,0000,0000,,So this is, probably make more sense as a\Nplus, to say, I Dialogue: 0,0:26:15.88,0:26:21.03,Default,,0000,0000,0000,,want at least, after the at sign, I want\Nat least one non-blank character. Dialogue: 0,0:26:26.21,0:26:30.80,Default,,0000,0000,0000,,And the parentheses simply say, I don't\Nwant the at sign. Dialogue: 0,0:26:30.80,0:26:35.62,Default,,0000,0000,0000,,So if the at sign, I really want those\Nnon-blank characters after the at sign. Dialogue: 0,0:26:35.62,0:26:38.55,Default,,0000,0000,0000,,Okay? So that's what I want to extract. Dialogue: 0,0:26:38.55,0:26:41.87,Default,,0000,0000,0000,,So it's like, go find the at sign. Dialogue: 0,0:26:41.87,0:26:43.64,Default,,0000,0000,0000,,Okay, great, found the at sign. Start Dialogue: 0,0:26:43.64,0:26:48.00,Default,,0000,0000,0000,,extracting, look for non-blank characters,\Nend extracting. Dialogue: 0,0:26:48.00,0:26:50.44,Default,,0000,0000,0000,,So pull that part out and put it right\Nthere. 
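Here is roughly what that host extraction looks like as code, along with why switching the star to a plus can matter. The sample strings and variable names are assumptions made up for illustration.

```python
import re

# Assumed sample line in the style of the mailbox file.
lin = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Find an at sign, then extract zero or more non-blank characters after it.
hosts_star = re.findall('@([^ ]*)', lin)
print(hosts_star)   # ['uct.ac.za']

# Same idea with a plus: at least one non-blank character is required.
hosts_plus = re.findall('@([^ ]+)', lin)
print(hosts_plus)   # ['uct.ac.za']

# The difference shows up when an at sign is followed by a blank.
odd = 'lonely@ sign'
print(re.findall('@([^ ]*)', odd))  # [''], a match with an empty extraction
print(re.findall('@([^ ]+)', odd))  # [], no match at all
```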
Dialogue: 0,0:26:53.01,0:26:56.29,Default,,0000,0000,0000,,Now an even cooler version of this that Dialogue: 0,0:26:56.29,0:26:59.07,Default,,0000,0000,0000,,you probably kind of imagined right away is, Dialogue: 0,0:27:01.36,0:27:07.47,Default,,0000,0000,0000,,we say, you know what, I would like this\Nfirst character, the first Dialogue: 0,0:27:07.47,0:27:13.35,Default,,0000,0000,0000,,part of the line to be From, with a blank,\Nfollowed by any number of characters, Dialogue: 0,0:27:17.16,0:27:20.93,Default,,0000,0000,0000,,followed by an at sign, so the at sign is\Nreal, then start Dialogue: 0,0:27:20.93,0:27:25.87,Default,,0000,0000,0000,,extracting, then any number of non-blank\Ncharacters, end extracting. Dialogue: 0,0:27:27.35,0:27:32.42,Default,,0000,0000,0000,,So this is a, this is like eight or nine\Nlines of Python Dialogue: 0,0:27:32.42,0:27:35.75,Default,,0000,0000,0000,,all rolled into one thing, okay? Dialogue: 0,0:27:38.80,0:27:44.20,Default,,0000,0000,0000,,So, start at the beginning of the line.\NLook for string From, with a space. Dialogue: 0,0:27:44.20,0:27:50.03,Default,,0000,0000,0000,,Then skip a bunch of characters looking\Nfor an at sign, skip characters until Dialogue: 0,0:27:50.03,0:27:53.37,Default,,0000,0000,0000,,you encounter an at sign, then start Dialogue: 0,0:27:53.37,0:27:58.43,Default,,0000,0000,0000,,extracting, match any non-blank, a single\Nnon-blank character. Dialogue: 0,0:27:58.43,0:28:00.64,Default,,0000,0000,0000,,This is kind of like one non-blank Dialogue: 0,0:28:00.64,0:28:03.86,Default,,0000,0000,0000,,character, one non-blank character, but\Nonce you Dialogue: 0,0:28:03.86,0:28:08.50,Default,,0000,0000,0000,,suffix it with the asterisk that changes it to\Nbe many non-blank characters. Dialogue: 0,0:28:10.60,0:28:13.02,Default,,0000,0000,0000,,And then stop extracting, okay? Dialogue: 0,0:28:14.05,0:28:19.43,Default,,0000,0000,0000,,And so, you know, and so it's like find\NFrom followed by a space, great. 
Dialogue: 0,0:28:20.59,0:28:22.25,Default,,0000,0000,0000,,That's the first part. Dialogue: 0,0:28:22.25,0:28:25.13,Default,,0000,0000,0000,,Now throw away characters until you find\Nan at sign. Dialogue: 0,0:28:26.13,0:28:28.11,Default,,0000,0000,0000,,Then start extracting. Dialogue: 0,0:28:28.11,0:28:31.48,Default,,0000,0000,0000,,Keep going with non-blank characters until\Nyou hit Dialogue: 0,0:28:31.48,0:28:34.18,Default,,0000,0000,0000,,the first blank character and pull that\Npart out. Dialogue: 0,0:28:34.18,0:28:35.79,Default,,0000,0000,0000,,Now the result is we get the exact same Dialogue: 0,0:28:35.79,0:28:42.07,Default,,0000,0000,0000,,data. But with this added to it, we are\Nmuch more narrow in the kind of things Dialogue: 0,0:28:42.07,0:28:46.69,Default,,0000,0000,0000,,that we're looking for and if we get\Nnoisy data, something like, Dialogue: 0,0:28:46.69,0:28:52.82,Default,,0000,0000,0000,,you know, meet at Joe's, right?\NWe don't want that. Dialogue: 0,0:28:52.82,0:28:53.84,Default,,0000,0000,0000,,That won't match, right? Dialogue: 0,0:28:53.84,0:28:55.95,Default,,0000,0000,0000,,We want that to be like a False. Dialogue: 0,0:28:55.95,0:28:59.40,Default,,0000,0000,0000,,And, and it allows us to sort of really\Nfine-tune our matching Dialogue: 0,0:28:59.40,0:29:02.95,Default,,0000,0000,0000,,and extracting. And this is just the\Nbeginning, they are very, very powerful. Dialogue: 0,0:29:02.95,0:29:08.85,Default,,0000,0000,0000,,So, the last thing that I will show you is\Nsort of a program that is kind of like one Dialogue: 0,0:29:08.85,0:29:11.83,Default,,0000,0000,0000,,of the programs that we did in a previous\Nsection, Dialogue: 0,0:29:11.83,0:29:14.56,Default,,0000,0000,0000,,except now we're going to use regular\Nexpressions to do it.
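The narrower version, and what it does with noisy data, can be sketched like this. Both sample lines are assumptions; the noisy one is a stand-in for the "meet at Joe's" example, written with an actual at sign so the looser pattern has something to trip over.

```python
import re

# Assumed sample lines: one real From line, one noisy line with an at sign.
good = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
noisy = 'Let us meet @Joes for lunch'

loose = '@([^ ]*)'            # any at sign, anywhere
strict = '^From .*@([^ ]*)'   # only on lines that start with 'From '

print(re.findall(loose, good))    # ['uct.ac.za']
print(re.findall(strict, good))   # ['uct.ac.za']

# The loose pattern mis-hits on the noisy line; the strict one ignores it.
print(re.findall(loose, noisy))   # ['Joes']
print(re.findall(strict, noisy))  # []
```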
Dialogue: 0,0:29:14.56,0:29:16.26,Default,,0000,0000,0000,,So if you remember, we had this thing where Dialogue: 0,0:29:16.26,0:29:19.91,Default,,0000,0000,0000,,we're doing spam confidence, where we're\Nlooking for lines and Dialogue: 0,0:29:21.45,0:29:23.31,Default,,0000,0000,0000,,you know, and pulling this number out and then Dialogue: 0,0:29:23.31,0:29:26.43,Default,,0000,0000,0000,,calculating the average, or the\Nmaximum, or whatever. Dialogue: 0,0:29:26.43,0:29:31.64,Default,,0000,0000,0000,,And so here is a, we import the regular\Nexpression library, we open the file, Dialogue: 0,0:29:31.64,0:29:35.29,Default,,0000,0000,0000,,we're going to do this with the, appending\Nto the, a list, we'll put the list. Dialogue: 0,0:29:35.29,0:29:37.72,Default,,0000,0000,0000,,We'll put the numbers in a list rather\Nthan doing the calculation in a loop. Dialogue: 0,0:29:39.18,0:29:40.31,Default,,0000,0000,0000,,We strip the data. Dialogue: 0,0:29:40.31,0:29:42.16,Default,,0000,0000,0000,,Now, here's the key thing, right? Dialogue: 0,0:29:42.16,0:29:44.83,Default,,0000,0000,0000,,We're going to have a regular expression\Nthat says, Dialogue: 0,0:29:46.20,0:29:49.02,Default,,0000,0000,0000,,look for the first character being X,\Nfollowed by Dialogue: 0,0:29:49.02,0:29:51.06,Default,,0000,0000,0000,,a dash, followed by all this,\Nall this Dialogue: 0,0:29:51.06,0:29:54.74,Default,,0000,0000,0000,,exactly has to match literally, followed\Nby a colon. Dialogue: 0,0:29:54.74,0:30:00.95,Default,,0000,0000,0000,,And then there's a space, and then we\Nbegin extracting and we are looking for Dialogue: 0,0:30:00.95,0:30:06.43,Default,,0000,0000,0000,,the digit 0 through 9 or a dot and\Nwe are looking for one or Dialogue: 0,0:30:06.43,0:30:09.78,Default,,0000,0000,0000,,more, and then we end extracting. Dialogue: 0,0:30:09.78,0:30:12.72,Default,,0000,0000,0000,,So that's the, the parentheses are telling\Nus what to pull out. 
Dialogue: 0,0:30:12.72,0:30:15.40,Default,,0000,0000,0000,,So that just means that we're going to\Npull out those numbers, all Dialogue: 0,0:30:15.40,0:30:18.07,Default,,0000,0000,0000,,the digits and the numbers, until we get\Nsomething other, I mean, Dialogue: 0,0:30:18.07,0:30:21.01,Default,,0000,0000,0000,,all the digits and the period, and we'll\Nget something other than Dialogue: 0,0:30:21.01,0:30:24.38,Default,,0000,0000,0000,,a digit and a period, and we, and then\Nwe'll be done, okay? Dialogue: 0,0:30:24.38,0:30:30.03,Default,,0000,0000,0000,,And so if we, and so this is going to pull\Nthose numbers out and give us back a list. Dialogue: 0,0:30:30.03,0:30:31.47,Default,,0000,0000,0000,,Now the thing about it is, we have Dialogue: 0,0:30:31.47,0:30:34.71,Default,,0000,0000,0000,,to realize that sometimes this is not\Ngoing to match, because Dialogue: 0,0:30:34.71,0:30:37.70,Default,,0000,0000,0000,,we're sending every line, not just the\Nones that start Dialogue: 0,0:30:37.70,0:30:41.20,Default,,0000,0000,0000,,with X, we're sending every line through\Nthis and so Dialogue: 0,0:30:41.20,0:30:44.26,Default,,0000,0000,0000,,we need to know when we didn't get a\Nmatch. Dialogue: 0,0:30:44.26,0:30:48.00,Default,,0000,0000,0000,,And that, the way we know we didn't get a\Nmatch is if the list, the Dialogue: 0,0:30:48.00,0:30:52.46,Default,,0000,0000,0000,,number of items in the list that we got\Nback, is zero, then we're going to continue. Dialogue: 0,0:30:52.46,0:30:56.99,Default,,0000,0000,0000,,So this is kind of our if where we're\Nsearching for the needle in the haystack. Dialogue: 0,0:30:56.99,0:31:00.01,Default,,0000,0000,0000,,But then once we find what we are looking Dialogue: 0,0:31:00.01,0:31:02.45,Default,,0000,0000,0000,,for, the actual number that we are\Ninterested in, Dialogue: 0,0:31:04.56,0:31:07.98,Default,,0000,0000,0000,,is already sitting here in stuff sub zero.\NOkay? 
Dialogue: 0,0:31:07.98,0:31:10.57,Default,,0000,0000,0000,,And then we convert it to a float, we append it. Dialogue: 0,0:31:10.57,0:31:14.10,Default,,0000,0000,0000,,And when the loop is all done, we print\Nout the maximum. Dialogue: 0,0:31:14.10,0:31:14.81,Default,,0000,0000,0000,,Okay? Dialogue: 0,0:31:14.81,0:31:17.18,Default,,0000,0000,0000,,And so this is sort of encoding a number\Nof things Dialogue: 0,0:31:17.18,0:31:22.10,Default,,0000,0000,0000,,and ending up with a very, a very solid and\Nsafe matching. Dialogue: 0,0:31:22.10,0:31:25.91,Default,,0000,0000,0000,,So we're really, it's hard for this to\Nfind a line that's wrong and Dialogue: 0,0:31:25.91,0:31:29.59,Default,,0000,0000,0000,,you could even improve this a little bit\Nto make it even a little tighter Dialogue: 0,0:31:29.59,0:31:35.34,Default,,0000,0000,0000,,where we'd go find a number like 0.999.\NYou could say, oh, it's Dialogue: 0,0:31:35.34,0:31:41.04,Default,,0000,0000,0000,,all the numbers are zero dot, so Dialogue: 0,0:31:41.04,0:31:46.75,Default,,0000,0000,0000,,you could make this a little, a little more\Nprecise. Dialogue: 0,0:31:46.75,0:31:49.45,Default,,0000,0000,0000,,So it wouldn't, so it would even skip\Nthings that Dialogue: 0,0:31:49.45,0:31:52.58,Default,,0000,0000,0000,,you can make it, so it looks exactly the\Nway you want it to look. Dialogue: 0,0:31:52.58,0:31:54.69,Default,,0000,0000,0000,,So, I emphasize that this Dialogue: 0,0:31:54.69,0:31:57.38,Default,,0000,0000,0000,,is kind of a weird language and you need\Nsome kind of thing. Dialogue: 0,0:31:57.38,0:31:58.92,Default,,0000,0000,0000,,We talked about all these. Dialogue: 0,0:31:58.92,0:32:01.50,Default,,0000,0000,0000,,We have the beginning of the line, we have\Nthe end Dialogue: 0,0:32:01.50,0:32:03.83,Default,,0000,0000,0000,,of the line, matching any character, Dialogue: 0,0:32:03.83,0:32:07.62,Default,,0000,0000,0000,,matching space characters, matching\Nnon-whitespace characters. 
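Put together, the spam confidence program might look something like the sketch below. Several things here are assumptions: the header name X-DSPAM-Confidence (the lecture only says "X, followed by a dash, followed by all this"), the in-memory list of lines standing in for the opened file, and the names numlist and stuff.

```python
import re

# Assumed stand-in for the lines of the mailbox file.
lines = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'X-DSPAM-Confidence: 0.8475\n',
    'X-DSPAM-Probability: 0.0000\n',
    'X-DSPAM-Confidence: 0.6178\n',
    'Subject: nothing to see here\n',
]

numlist = []
for line in lines:
    line = line.rstrip()
    # Literal prefix, colon, space, then extract one or more digits or dots.
    stuff = re.findall('^X-DSPAM-Confidence: ([0-9.]+)', line)
    # Every line goes through the pattern, so most come back as an empty list.
    if len(stuff) == 0:
        continue
    num = float(stuff[0])
    numlist.append(num)

print('Maximum:', max(numlist))
```

With a real file you would loop over the open file handle instead of the list; the rest of the loop stays the same.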
Dialogue: 0,0:32:07.62,0:32:12.75,Default,,0000,0000,0000,,Star is a modifier that says zero or more\Ntimes. Dialogue: 0,0:32:12.75,0:32:18.33,Default,,0000,0000,0000,,Star question mark is a modifier that says\Nzero or more times non-greedy. Dialogue: 0,0:32:18.33,0:32:20.70,Default,,0000,0000,0000,,Plus is one or more times. Dialogue: 0,0:32:20.70,0:32:24.54,Default,,0000,0000,0000,,Plus question mark is one or more times\Nnon-greedy. Dialogue: 0,0:32:24.54,0:32:27.28,Default,,0000,0000,0000,,When you have bracket syntax, it's a set, Dialogue: 0,0:32:27.28,0:32:30.61,Default,,0000,0000,0000,,it's a single character that's in the\Nlisted set. Dialogue: 0,0:32:30.61,0:32:32.53,Default,,0000,0000,0000,,So that's lower-case vowels. Dialogue: 0,0:32:33.71,0:32:35.28,Default,,0000,0000,0000,,You can also have the first, if the first Dialogue: 0,0:32:35.28,0:32:38.68,Default,,0000,0000,0000,,character of this is a caret, that flips it. Dialogue: 0,0:32:38.68,0:32:42.85,Default,,0000,0000,0000,,So that means everything except capital\NX, capital Y, capital Z. Dialogue: 0,0:32:42.85,0:32:45.40,Default,,0000,0000,0000,,So it's everything that's not in the set, Dialogue: 0,0:32:45.40,0:32:47.96,Default,,0000,0000,0000,,capital X, capital Y, capital Z, and then Dialogue: 0,0:32:47.96,0:32:51.08,Default,,0000,0000,0000,,you can also put dashes in to represent\Nranges. Dialogue: 0,0:32:51.08,0:32:53.39,Default,,0000,0000,0000,,Bracket a through z and 0 through 9,\Nand lower-case Dialogue: 0,0:32:53.39,0:32:58.45,Default,,0000,0000,0000,,letters and digits will match, but again,\Nthis is a single character. Dialogue: 0,0:32:58.45,0:33:00.75,Default,,0000,0000,0000,,Now, you can put a plus or a star after Dialogue: 0,0:33:00.75,0:33:04.44,Default,,0000,0000,0000,,these guys to make them happen more than\None time. Dialogue: 0,0:33:04.44,0:33:05.68,Default,,0000,0000,0000,,And you can even put them in twice. 
Dialogue: 0,0:33:05.68,0:33:12.24,Default,,0000,0000,0000,,So if I wanted a two-digit number, I could\Nsay 0 dash 9, 0 dash 9. Dialogue: 0,0:33:12.81,0:33:14.87,Default,,0000,0000,0000,,Oops. This is one character. Dialogue: 0,0:33:14.87,0:33:18.35,Default,,0000,0000,0000,,This is one character and this is the\Npossible things. Dialogue: 0,0:33:18.35,0:33:22.34,Default,,0000,0000,0000,,So that's, you know, 0 0\Nwould match. Dialogue: 0,0:33:22.34,0:33:26.28,Default,,0000,0000,0000,,1 0 would match, 99 would match, etc. Dialogue: 0,0:33:26.28,0:33:26.98,Default,,0000,0000,0000,,Okay? Dialogue: 0,0:33:29.02,0:33:31.98,Default,,0000,0000,0000,,And then the parentheses are the things\Nthat if you Dialogue: 0,0:33:31.98,0:33:34.34,Default,,0000,0000,0000,,are in the middle of a big long matching\Nstring and Dialogue: 0,0:33:34.34,0:33:37.25,Default,,0000,0000,0000,,you don't want to extract the whole thing,\Nyou can limit the Dialogue: 0,0:33:37.25,0:33:40.47,Default,,0000,0000,0000,,things you're extracting to, to the stuff\Nthat's just in there. Dialogue: 0,0:33:41.48,0:33:43.99,Default,,0000,0000,0000,,With all these characters that have all\Nthis meaning, Dialogue: 0,0:33:43.99,0:33:46.31,Default,,0000,0000,0000,,we have to have a way to match those\Ncharacters. Dialogue: 0,0:33:46.31,0:33:50.10,Default,,0000,0000,0000,,So dollar sign is the end of a line. Dialogue: 0,0:33:50.10,0:33:51.84,Default,,0000,0000,0000,,But what if we're looking for something that Dialogue: 0,0:33:51.84,0:33:53.36,Default,,0000,0000,0000,,actually has a dollar sign in the string? Dialogue: 0,0:33:54.76,0:33:56.83,Default,,0000,0000,0000,,And that's what the backslash is for. Dialogue: 0,0:33:56.83,0:33:58.47,Default,,0000,0000,0000,,So if you put the backslash in front of Dialogue: 0,0:33:58.47,0:34:04.32,Default,,0000,0000,0000,,an otherwise meaningful character, you\Ndon't, it becomes the actual character.
Dialogue: 0,0:34:04.32,0:34:06.97,Default,,0000,0000,0000,,So this is saying match a dollar sign. Dialogue: 0,0:34:06.97,0:34:09.25,Default,,0000,0000,0000,,Those two characters say match a dollar\Nsign. Dialogue: 0,0:34:09.25,0:34:13.70,Default,,0000,0000,0000,,And then this says one character that's\N0 through 9 or a, or a dot. Dialogue: 0,0:34:13.70,0:34:16.94,Default,,0000,0000,0000,,And then we put the plus modifier to say Dialogue: 0,0:34:16.94,0:34:19.92,Default,,0000,0000,0000,,at least one or more times and so that sort\Nof is Dialogue: 0,0:34:19.92,0:34:21.36,Default,,0000,0000,0000,,a greedy, of course. Dialogue: 0,0:34:21.36,0:34:25.18,Default,,0000,0000,0000,,So that will get us this and extract it,\Nincluding the dollar sign. Dialogue: 0,0:34:25.18,0:34:28.27,Default,,0000,0000,0000,,So the escape character is the backslash. Dialogue: 0,0:34:29.29,0:34:31.18,Default,,0000,0000,0000,,Okay. So there we are. Dialogue: 0,0:34:31.18,0:34:32.37,Default,,0000,0000,0000,,Now we're done. Dialogue: 0,0:34:32.37,0:34:34.55,Default,,0000,0000,0000,,So this is a little bit cryptic. Dialogue: 0,0:34:34.55,0:34:38.04,Default,,0000,0000,0000,,It's, it's kind of a puzzle. Dialogue: 0,0:34:38.04,0:34:38.76,Default,,0000,0000,0000,,It's kind of fun. Dialogue: 0,0:34:38.76,0:34:42.85,Default,,0000,0000,0000,,And it's extremely powerful.\NAnd you don't have to know it. Dialogue: 0,0:34:42.85,0:34:43.75,Default,,0000,0000,0000,,You don't have to learn it. Dialogue: 0,0:34:45.24,0:34:48.88,Default,,0000,0000,0000,,But if you do, you'll find that it's very\Nuseful as we sort Dialogue: 0,0:34:48.88,0:34:53.24,Default,,0000,0000,0000,,of dig through data and are trying to\Nwrite things that are pretty quick. Dialogue: 0,0:34:53.24,0:34:58.52,Default,,0000,0000,0000,,And, and, and they, the thing I like about\Nregular expressions is that they Dialogue: 0,0:34:58.52,0:35:03.48,Default,,0000,0000,0000,,tend to be, if you write them well, they\Ntend to be less sensitive to bad data.
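A few of the summary items above, tried out in code. The sample strings are assumptions invented for illustration; the patterns follow the ones described: an escaped dollar sign, two single-digit sets side by side, and a set flipped with a caret.

```python
import re

# Escaping: backslash-dollar matches a real dollar sign, not end of line.
x = 'We just received $10.00 for cookies.'
amounts = re.findall(r'\$[0-9.]+', x)
print(amounts)    # ['$10.00']

# Two bracket sets in a row: each one is a single character.
two_digit = re.findall('[0-9][0-9]', 'rooms 00, 10 and 99')
print(two_digit)  # ['00', '10', '99']

# A caret as the first character inside the brackets flips the set.
not_xyz = re.findall('[^XYZ]', 'XaYbZ')
print(not_xyz)    # ['a', 'b']
```

The raw string prefix (r'...') keeps Python from interpreting the backslash before the regular expression engine sees it.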
Dialogue: 0,0:35:04.67,0:35:06.61,Default,,0000,0000,0000,,They tend to ignore data, they're, you Dialogue: 0,0:35:06.61,0:35:09.80,Default,,0000,0000,0000,,can put more detail, I exactly want this. Dialogue: 0,0:35:09.80,0:35:10.17,Default,,0000,0000,0000,,Whereas you're, Dialogue: 0,0:35:10.17,0:35:12.24,Default,,0000,0000,0000,,if you're writing find and extract, you're Dialogue: 0,0:35:12.24,0:35:14.29,Default,,0000,0000,0000,,making a lot of assumptions about the\Ndata. Dialogue: 0,0:35:14.29,0:35:17.44,Default,,0000,0000,0000,,That it's clean and you're not going to,\Nyou know, mis-hit on something. Dialogue: 0,0:35:17.44,0:35:21.51,Default,,0000,0000,0000,,So, okay, well, good luck in your Dialogue: 0,0:35:21.51,0:35:23.54,Default,,0000,0000,0000,,use of regular expressions, and we'll\Nsee you later.