Python for Informatics - Chapter 7 Files
-
0:00 - 0:02Welcome to Chapter Seven.
-
0:02 - 0:04Python for Informatics: Exploring
Information. -
0:04 - 0:05I'm Charles Severance.
-
0:05 - 0:10I'm the author of the book and your host.
And, as always, this is brought to you by. -
0:10 - 0:10No, I'm sorry.
-
0:10 - 0:15It's all creative copyright, Creative
Commons Attribution. -
0:15 - 0:19The audio, the video, the slides, and even
the book. -
0:19 - 0:21So, here we go.
-
0:21 - 0:25Oh, and so, frankly, where
we've been working -
0:25 - 0:34all along is, we have been writing code
and talking to the CPU. -
0:34 - 0:37Hang on, let me, let me go get
my CPU and stuff. -
0:37 - 0:42Hang on, be right back.
-
0:44 - 0:50[SOUND]
Okay. -
0:50 - 0:54Here we go. Here we go.
-
0:54 - 0:59Here's all the stuff. Remember the stuff
from the first lecture? -
1:01 - 1:02There we go with that.
-
1:03 - 1:06Remember the motherboard from the first
lecture? -
1:06 - 1:08This is kind of the picture of what's on
the screen. -
1:09 - 1:12The motherboard, the CPU plugs in here,
memory plugs in here. -
1:12 - 1:18And remember how the CPU is sort of the
brains, as -
1:18 - 1:23much brains as there is, for the operation.
The CPU is asking what next. -
1:23 - 1:26The instructions come in through these
little pins. -
1:26 - 1:30There's data inside, and the sort of
semi-permanent -
1:30 - 1:33data, the variables, is all stored pretty
much here in RAM. -
1:35 - 1:38And we write our programs, and so your
Python programs, they're sitting here -
1:38 - 1:44in this RAM, and they're being fed to this
CPU through those chips. -
1:44 - 1:45Through those pins, right?
-
1:45 - 1:48The pins, I mean it doesn't really connect
like that. -
1:48 - 1:52And so, so frankly, up to now, everything
that we've been doing -
1:52 - 1:55is just the Python programming language.
-
1:55 - 1:58And so the only place we've really been
operating is here. -
2:00 - 2:03We have been putting Python into the main
memory. -
2:03 - 2:06And we have
-
2:06 - 2:10been effectively feeding instructions to
the CPU, -
2:10 - 2:14the central processing unit, as it needed
them, and then the program would stop. -
2:14 - 2:16And everything we've done so far
-
2:16 - 2:17everything
-
2:17 - 2:22is just sort of fiddling around here.
We have never escaped it. -
2:22 - 2:26So now we are finally going to escape
-
2:26 - 2:28from the central processing unit and the
memory. -
2:29 - 2:32We'll still write programs and have
variables in here. -
2:33 - 2:39But now we're going to use the disk,
the secondary storage, the -
2:39 - 2:44permanent media, right?
So if I go grab my Raspberry Pi, -
2:44 - 2:46alright, that goes right there.
-
2:46 - 2:51Here's my Raspberry Pi, so here we've got
the Raspberry Pi, which is the small version, -
2:51 - 2:56which of course has a CPU, memory, and
-
2:56 - 2:59graphics processor, all in this little chip
right here. -
2:59 - 3:03But the secondary memory
is this little -
3:03 - 3:06SD card; that is the secondary memory for
the Raspberry Pi. -
3:06 - 3:08So the structure of the Raspberry Pi is
-
3:08 - 3:09exactly the same as the structure
of any other -
3:09 - 3:13personal computer, it's just smaller and
less expensive. -
3:13 - 3:15And so in the Raspberry Pi, if you're
-
3:15 - 3:18programming the Raspberry Pi, you're sort
of finally escaping. -
3:18 - 3:20All your programs were in here.
-
3:20 - 3:24Your CPU is in here and that's pretty much
how, how far you've got to run. -
3:24 - 3:29But now, of course when you save your files,
you save them to here. -
3:29 - 3:35But now we are going to start looking at
data on the disk drive and so it's time -
3:35 - 3:39to escape to the secondary memory.
Okay? -
3:39 - 3:41Time to escape to the secondary memory.
-
3:41 - 3:44And Raspberry Pi, you can go right there.
Okay? -
3:44 - 3:46So it's time to find some data to mess
with. -
3:46 - 3:49So a lot of what we've been doing so far
is just -
3:49 - 3:53kind of the pre-work to get to the point
where we can do this. -
3:53 - 3:55And in here we're going to have data files.
-
3:55 - 3:56Now, we've been making data files.
-
3:56 - 4:00You've been writing, every Python program
that you write on your computer gets saved -
4:00 - 4:03as a file. Then Python reads the file and runs it.
-
4:04 - 4:07But now we're actually going to start
messing with some data. -
4:09 - 4:12And so, files are where we're going to be
working. -
4:12 - 4:17And so, one of things about secondary memory
is it's much larger. -
4:19 - 4:21And this is, main memory of the computer
is pretty large, it's just -
4:21 - 4:26not large enough to hold everything that
the computer is capable of holding. -
4:26 - 4:28So the files that we're going to work with.
-
4:28 - 4:32Now we're not talking about image files or
Quicktime movies or things like that. -
4:32 - 4:34We're going to work with text files
because the -
4:34 - 4:38theme of this course is digging through
text. -
4:38 - 4:39Sometimes we'll pull it off the Internet.
-
4:39 - 4:42Sometimes we'll read files, but it's
digging through and -
4:42 - 4:44using all the things that we've learned so
far, -
4:44 - 4:46looping and strings, and all those things,
-
4:46 - 4:49to make sense of a sequence of
information. -
4:51 - 4:52Okay?
-
4:52 - 4:58Now, to access file information, we have
to do this thing called opening the file. -
4:58 - 5:02We can't just say, yo, the information is
just omnipresent because there is -
5:02 - 5:06so much data that you can't have Python
sort of know all the data. -
5:06 - 5:09You literally have hundreds of thousands
of files on -
5:09 - 5:12your computer's hard drive.
So, -
5:12 - 5:14which one are you going to read?
-
5:14 - 5:16So there's a step that you have to do,
-
5:16 - 5:19that you call this built-in function
called open. -
5:19 - 5:22And say, oh, this is the file that I
want to work with, -
5:22 - 5:24of the hundreds of thousands, and then
once you do, -
5:24 - 5:28you've kind of got this little
connector into it. -
5:28 - 5:32And the open is a built-in function inside
Python. -
5:32 - 5:34Hang on a sec, let's say good bye to that.
The open -
5:34 - 5:40function is a built-in function in Python,
and it takes two parameters. -
5:40 - 5:46The first parameter is the name of the
file, like mbox.txt, -
5:46 - 5:49and then the second is how you're going to
read it. -
5:49 - 5:49Are you going to read it?
-
5:49 - 5:50are you going to write it? et cetera.
-
5:50 - 5:53Now most of the time we'll be reading our
files. -
5:53 - 5:56So we call the open function and pass it
in the name of -
5:56 - 5:59the file we want to open, and then how we
want to read it. -
5:59 - 6:02Now you can leave this second parameter
off and it -
6:02 - 6:05assumes that you're going to want to read
the file. -
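A minimal sketch of the open call just described, written for Python 3 (the lecture itself uses Python 2). The file name demo.txt and its contents are made up so the sketch can run on its own; in the lecture the file would be something like mbox.txt.

```python
import os
import tempfile

# Hypothetical sample file, created here so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:      # 'w' means open for writing
    out.write("first line\nsecond line\n")

fhand = open(path, "r")           # 'r' means open for reading
same = open(path)                 # second parameter left off: reading is assumed
modes = (fhand.mode, same.mode)
print(modes)

fhand.close()
same.close()
```

Both handles report the same mode, which is the point about the second parameter being optional when you only want to read.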
6:05 - 6:05Now.
-
6:09 - 6:12When the open is successful, it doesn't
actually read all -
6:12 - 6:17of the data because the memory is small,
small compared to -
6:17 - 6:19the hard drive, and so you have to sort of
-
6:19 - 6:22step through the data, you'll tell it when
to read it. -
6:22 - 6:27So the act of opening it is not
actually reading all data. -
6:27 - 6:31It is creating kind of like a connection
between the -
6:31 - 6:33memory and the data that's on the hard
drive, right? -
6:33 - 6:34It's connecting
-
6:34 - 6:38between, oh listen to this.
Oh that's going to fall down. -
6:38 - 6:42Is it going to stand up that way?
-
6:42 - 6:45Oh, I should come up with a way to
make that stand. -
6:46 - 6:48So it's a connection.
-
6:48 - 6:50So the, your program's kind of running in
here. -
6:50 - 6:54And the, and the file handle is just sort
of it's -
6:54 - 6:58like a phone call between your memory and
your disk drive. -
6:58 - 7:00It's not the actual data.
The actual data is still -
7:00 - 7:06sitting on the disk drive, okay?
So, a graphical way to take a look at this -
7:06 - 7:12is, the file handle, the thing that comes
back from the open request. -
7:12 - 7:15The open goes and finds the file out on
the disk drive and -
7:15 - 7:20yada, yada, yada, and then the handle is
something that lives in the memory -
7:20 - 7:22that is sort of like the thing that
-
7:22 - 7:26maintains its connection to where all the
data is -
7:26 - 7:29on the disk or on the SD card that's in it.
-
7:29 - 7:31So the handle is not all the data, but it is
-
7:31 - 7:34a mechanism that you can use to get at the
data. -
7:34 - 7:38So if you print it out, it doesn't have
all the data from the file, -
7:38 - 7:44it says, I am a file handle that's opened
this file and we're in read mode. -
7:44 - 7:46So, that doesn't actually have the data,
-
7:46 - 7:48even though this is the data that's
in the file. -
7:48 - 7:51And then we have operations that we do to
the handle like open it, -
7:51 - 7:53close it, read it, write it.
So we do things. -
7:53 - 7:56So we work with the handle, and through the
handle Python actually changes -
7:56 - 7:59what's on the disk or reads
what's on the disk. -
7:59 - 8:02So the handle is kind of a thing that's
not there. -
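A small sketch of the idea that the handle is not the data, assuming Python 3. The file name and contents are made up for illustration, and the exact text Python shows for a handle differs between versions, so the sketch only checks for the mode.

```python
import os
import tempfile

# Hypothetical sample file so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("cheese and crackers\n")

fhand = open(path)
description = repr(fhand)       # the same text print(fhand) would show
print(description)              # names the file and the mode, not the contents

first_line = fhand.readline()   # the data only arrives when we ask for it
print(first_line)
fhand.close()
```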
8:03 - 8:06Now, when you attempt to open a file, you give the name
of the file. -
8:06 - 8:09The way we're going to do this,
the files need to be -
8:09 - 8:14in the same folder on your computer
as your Python code. -
8:14 - 8:16Now, there are trickier ways to do it, but
-
8:16 - 8:17we're going to keep it simple.
-
8:17 - 8:19This is the name of a file in the
-
8:19 - 8:22same folder as the Python code that you're
running. -
8:22 - 8:28[SOUND] And if it's not, then we get, of
course, a traceback, and we're -
8:28 - 8:32used to reading tracebacks by
now: no such file or directory, stuff.txt. -
8:32 - 8:35Oh, of course, I forgot to save it or I
typed it wrong. -
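A sketch of what happens when the file is not there, assuming Python 3, where the error is called FileNotFoundError (in the lecture's Python 2 it was an IOError). The try and except, which the lecture covers near the end, is used here only so the sketch keeps running and can show the message instead of a traceback.

```python
import os
import tempfile

# A path inside a fresh, empty folder, so stuff.txt is certain not to exist.
missing = os.path.join(tempfile.mkdtemp(), "stuff.txt")

try:
    open(missing)
    outcome = "opened"
except FileNotFoundError as err:
    # err.strerror is the operating system's description of the problem.
    outcome = err.strerror
print(outcome)
```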
8:38 - 8:39So.
-
8:39 - 8:43The next thing we have to learn is the
notion of the newline character. -
8:43 - 8:44You haven't seen this so far,
-
8:44 - 8:48but there's a special character in files
-
8:48 - 8:52that is used to indicate the end of a line.
-
8:52 - 8:54Because these text files that we've been
writing, -
8:54 - 8:58including Python programs that you have,
are organized into lines. -
8:58 - 9:00Each line has variable length and there is
-
9:00 - 9:03a special non-printing character that you
just don't see. -
9:03 - 9:06Now you see it because you see a line,
-
9:06 - 9:11multiple lines, but you don't see the
character itself. -
9:11 - 9:13So it turns out that this character is
very -
9:13 - 9:16important because the data is just a
stream of -
9:16 - 9:19characters on disk and then it's
punctuated by newlines -
9:19 - 9:22that tell it when it's time to end the
line. -
9:22 - 9:29So if we are building a string, the
constant for newline is backslash n. -
9:29 - 9:33And so, when we make a string that we
want to -
9:33 - 9:38have a newline in it, we'll say Hello
backslash n World. -
9:38 - 9:41And then if you print it out one way, you
actually see the backslash n. -
9:41 - 9:44But then if you use the print to print it
out, you see sort of -
9:44 - 9:50like the, it moves back down, you know,
to the left margin and down. -
9:50 - 9:56So, so, sometimes you see the slash n
and sometimes it's shown as movement. -
9:56 - 9:57Right? You, it moves it.
-
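The two ways of looking at the same string can be sketched like this, assuming Python 3. Using repr stands in for typing the variable name at the interactive prompt.

```python
stuff = "Hello\nWorld"

shown_raw = repr(stuff)
print(shown_raw)   # shows the backslash n as typed: 'Hello\nWorld'
print(stuff)       # shows the newline as movement: Hello, then World below it
```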
9:59 - 10:00The other thing that's important is even
-
10:00 - 10:02though we represent this as two
characters, -
10:02 - 10:06the backslash n is represented as two characters
in a string, it's actually one character. -
10:06 - 10:10So if we print it out, we see
X newline Y -
10:10 - 10:13and if we ask how many characters are
in stuff, -
10:13 - 10:17which is this string, it says 3.
That's important. -
10:17 - 10:18Okay?
-
10:18 - 10:22There is one, two, three.
The newline is a single character. -
10:22 - 10:27This is just a syntax that we use to
sort of encode a newline in a string. -
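A quick check of the character count, assuming Python 3. The second string is an extra illustration, not from the lecture: doubling the backslash gives a real backslash followed by the letter n, which is two characters rather than one.

```python
stuff = "X\nY"
print(stuff)
length = len(stuff)
print(length)                  # 3: X, the newline, Y

escaped = "X\\nY"              # a literal backslash, then the letter n
escaped_length = len(escaped)
print(escaped_length)          # 4: X, backslash, n, Y
```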
10:28 - 10:28Okay?
-
10:29 - 10:34So, even though these are just a
-
10:34 - 10:37long sequence of characters punctuated by
newlines, -
10:37 - 10:41visually, text editors and operating
systems show them, show -
10:41 - 10:44these files to us as a sequence of lines.
-
10:44 - 10:46And it doesn't take very long to just
start thinking about them -
10:46 - 10:48as a sequence of lines.
-
10:48 - 10:51As a matter of fact, maybe you never, wish
I'd never told you about newlines. -
10:52 - 10:53But when we start reading files, we're
-
10:53 - 10:55going to have to deal with these newlines.
-
10:55 - 10:59So the way that we sort of have to
mentally visualize what these text -
10:59 - 11:04files look like is they have a newline
that punctuates the end of the line. -
11:04 - 11:09Now in reality, if we look at this, this
R really comes right after it. -
11:09 - 11:09Right?
-
11:09 - 11:13This is all a bunch of characters and the
newlines are punctuation, okay? -
11:13 - 11:17To say this is first line, second line,
third line, and fourth line. -
11:17 - 11:19So, you gotta think that each of these
things -
11:19 - 11:22is here, sitting at the end of the line.
-
11:22 - 11:25And so the number of characters in this
line includes that newline. -
11:25 - 11:27Now the newline is one character.
-
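One way to see the "stream of characters punctuated by newlines" picture, assuming Python 3. The two-line text is made up, and splitlines(keepends=True) is used here only as a convenient way to cut the stream while keeping each newline attached to its line.

```python
# A file's contents as one long string: characters punctuated by newlines.
text = "From: a\nTo: b\n"

# Cut the stream after each newline, keeping the newline with its line.
lines = text.splitlines(keepends=True)
lengths = [len(line) for line in lines]

print(lines)
print(lengths)   # each count includes the newline at the end of the line
```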
11:27 - 11:32Okay? So, how do we read these files?
-
11:32 - 11:36Well, we've already talked about doing an
open xfile. -
11:36 - 11:39And this xfile, again, that's
just a mnemonic -
11:39 - 11:42name that I made up. This is a handle.
-
11:42 - 11:44Remember, it's not all the data.
-
11:44 - 11:46But the handle is the way that we can read
the data. -
11:46 - 11:49We can use it as an access point.
-
11:49 - 11:52The coolest way to read a file, if it's a
text file with multiple -
11:52 - 11:58lines, is to use a definite loop, a
for loop: for cheese in xfile. -
11:58 - 12:03So this, remember we would put a list of
numbers or a string here. -
12:03 - 12:04Now we've put a file
-
12:04 - 12:05handle here.
-
12:05 - 12:09Python knows automatically that each time
we are going to run this -
12:09 - 12:12loop, it's going to go to the next line of
the file. -
12:12 - 12:16Automatically. And cheese is just a
stupid name that I came up with. -
12:16 - 12:20It would be better to call it line rather than
cheese, but for cheese in, and then it goes -
12:20 - 12:23dot, dot, dot, dot, dot, dot, dot,
each line, -
12:23 - 12:26and then it stops when it has read
the whole file. -
12:26 - 12:29So this line will print out every line
-
12:29 - 12:34in the file, that's how you do it.
These three lines open a file, -
12:36 - 12:42read every line in the file, okay?
So a file handle itself is a special kind -
12:42 - 12:47of a sequence, much like a list of numbers
or a string is a sequence of characters. -
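The three-line loop can be sketched as follows, assuming Python 3 (print is a function there). The sample file and its contents are made up; the names xfile and cheese follow the lecture.

```python
import os
import tempfile

# Hypothetical sample file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("one\ntwo\nthree\n")

seen = []                 # kept only so the result can be inspected afterwards
xfile = open(path)
for cheese in xfile:      # each pass through the loop gets the next line
    seen.append(cheese)
    print(cheese)         # each line still carries its own newline
xfile.close()
```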
12:47 - 12:49So one of the things we can do to combine
one of -
12:49 - 12:52our counting idioms is count the number of
lines in a file. -
12:53 - 12:54Okay? And so how we
-
12:54 - 12:57would do that is we would open
the file, set a -
12:57 - 13:01counter to zero, this time I'll use a
mnemonic variable called count. -
13:01 - 13:03For line in fhand, that says run this
-
13:03 - 13:06indented text once for each line in the
file. -
13:06 - 13:08For each line in the file, count equals
count plus 1. -
13:08 - 13:11When the for loop is done, print the
count. -
13:13 - 13:14Pretty straightforward.
-
13:14 - 13:18Very few other languages are capable of
writing that program in -
13:18 - 13:22as quick and as dense and succinct a way as
Python is. -
13:22 - 13:25Python does a really, really nice
job of this. -
13:25 - 13:28Okay? So that's how you count the lines.
-
13:28 - 13:31Open it, write a for loop, and then add
one. -
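The counting idiom described above, as a minimal Python 3 sketch. The sample file is made up; fhand and count are the names used in the lecture.

```python
import os
import tempfile

# Hypothetical three-line sample file.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("one\ntwo\nthree\n")

fhand = open(path)
count = 0
for line in fhand:        # runs the indented line once per line in the file
    count = count + 1
fhand.close()
print("Line Count:", count)
```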
13:31 - 13:36So here's what you
can't do, and this gives you a sense. -
13:36 - 13:37You can't say len,
-
13:37 - 13:40fhand.
-
13:40 - 13:43And that's because this isn't really the
data. -
13:43 - 13:45You have to, like,
pull it -
13:45 - 13:48and read it to get the data out of it.
-
13:48 - 13:50Although we'll see another way of reading
it later. -
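A sketch of why len does not work on a handle, assuming Python 3. The try and except is only there so the sketch can report the error instead of stopping; the sample file is made up.

```python
import os
import tempfile

# Hypothetical sample file.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("one\ntwo\n")

fhand = open(path)
try:
    len(fhand)                       # the handle is not the data
    result = "len worked"
except TypeError:
    result = "a file handle has no len()"
fhand.close()
print(result)
```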
13:51 - 13:53Okay? So that's counting the lines in a
file. -
13:55 - 13:57It turns out you can also read the entire
file. -
13:59 - 14:02Now if you read the entire file, it's not
broken into lines. -
14:02 - 14:04You're getting all the characters
punctuated -
14:04 - 14:06by newlines and you get everything.
-
14:06 - 14:10Now you don't want to read this if it's
too big, because it's -
14:10 - 14:13going to try to read it all into the
memory of the computer. -
14:13 - 14:16And if the memory is not big enough,
you're going to slow down to a crawl. -
14:16 - 14:19But if it's a real tiny file, this works
just fine. -
14:19 - 14:22And so we open
-
14:22 - 14:27a file and we say fhand.read. This is
basically saying, hey, -
14:27 - 14:31dear fhand, read it all and return it to
me as a string. -
14:32 - 14:34So that's a string with all the lines of
the file concatenated -
14:34 - 14:39together with newlines, which is actually
exactly what's in the file. -
14:39 - 14:40It's the raw data.
-
14:40 - 14:42That for loop sort of looks for the newline
-
14:42 - 14:44and does all of the stuff
automatically for us. -
14:44 - 14:45It's quite nice.
-
14:46 - 14:50So then we can, like, because inp is a
string at this point, -
14:50 - 14:51we can just print the length of it.
-
14:51 - 14:53And we can say, oh, there's 94,626
-
14:53 - 14:57characters that came from that file.
-
14:57 - 15:02It reads the whole thing, whole file,
reads the whole file. -
15:02 - 15:04We can also do things like, you know, slice
it now. -
15:04 - 15:10And so this is the first 20 characters,
from zero up to, but not including, 20. -
15:10 - 15:13So this, this is our file. Okay?
-
15:13 - 15:16So that's reading through the whole file.
-
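Reading the whole file at once can be sketched like this, assuming Python 3. The sample text is made up and tiny, so its length will not match the number quoted in the lecture; inp and fhand are the lecture's names.

```python
import os
import tempfile

# Hypothetical sample contents; the real mbox.txt is far larger.
sample = "From: someone@example.com\nSubject: hello there\n"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write(sample)

fhand = open(path)
inp = fhand.read()        # one string: every line, newlines included
fhand.close()

print(len(inp))           # now len works, because inp is a string
first_twenty = inp[:20]   # positions 0 up to, but not including, 20
print(first_twenty)
```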
15:16 - 15:18So, let me go back a little bit, this is
the file that we're -
15:18 - 15:19going to play with.
-
15:20 - 15:25This file here that we're going to play
with in this class is a mailbox file. -
15:25 - 15:27And this is actual real data.
And these are real people. -
15:27 - 15:29And these are real dates, having to do
with -
15:29 - 15:32an open source project that I worked on
called Sakai. -
15:32 - 15:36I actually have a tattoo of Sakai here on
my shoulder. -
15:36 - 15:38Maybe in an upcoming lecture, I'll have a
-
15:38 - 15:40short-sleeved shirt, and show you my
tattoo. -
15:40 - 15:44But for now, I can't because I've got, got
clothes on. -
15:44 - 15:52So, but this is real data.
It's the mbox.txt, mbox.txt file. -
15:52 - 15:56So, so that's the file that we're going to
use for most of the next few assignments. -
15:56 - 15:58It'll be the same file. You'll get tired of it.
-
15:58 - 16:00And you'll get to know all these people,
Stephen, -
16:00 - 16:02Chen Wen, and all the people in the file.
-
16:05 - 16:06Okay, so.
-
16:07 - 16:10We can search for lines that have a
prefix. -
16:10 - 16:14This is kind of the find pattern from the
looping lecture. -
16:14 - 16:18So we're going to go through a list of, of
lines in a file, -
16:18 - 16:21and we're going to only print out the ones
that match a certain thing. -
16:21 - 16:23So again, we open the file up.
-
16:23 - 16:25We're going to write a for loop that's
going to say, for each line in the -
16:25 - 16:30file, if the line and then we can call a,
a utility function -
16:30 - 16:33inside of string, because line is a string.
-
16:33 - 16:35If line startswith From, print it out.
-
16:35 - 16:38So this means it's going to loop through
all of the lines in the -
16:38 - 16:43file and it's going to print the ones that
start with the string 'From:' -
16:45 - 16:46Okay?
-
16:46 - 16:50Again, four lines, complete Python program
to read this -
16:50 - 16:53file and print the lines that have a
prefix of from. -
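The four-line search program, sketched for Python 3 with a made-up sample file in place of mbox.txt. The addresses are invented for illustration. The matches list is an addition so the result can be inspected.

```python
import os
import tempfile

# Hypothetical stand-in for mbox.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("From: first@example.com\nSubject: a\n"
              "From: second@example.com\nSubject: b\n")

matches = []
fhand = open(path)
for line in fhand:
    if line.startswith("From:"):   # line is a string, so string methods work
        matches.append(line)
        print(line)
fhand.close()
```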
16:55 - 16:59So, if you run this program, and I suggest
that you do, -
17:01 - 17:03this is what the output's going to look like.
-
17:04 - 17:07And it's like, wait a second, I'm seeing
the lines, -
17:10 - 17:14seeing the lines that have the froms, but
then I get these blank lines. -
17:17 - 17:19And why is that?
Why are these blank lines there? -
17:19 - 17:24If I look at the program, I mean, I'm not
printing blank lines. -
17:24 - 17:26I'm only printing lines that
start with from. -
17:26 - 17:28I'm not doing that, so why?
-
17:31 - 17:31What do you think?
-
17:32 - 17:33I'll give you a second.
-
17:35 - 17:38I've certainly done enough foreshadowing
in this lecture. -
17:38 - 17:41Well it turns out these newlines are the
problem. -
17:41 - 17:44So it turns out that the print, we've been
doing this -
17:44 - 17:47all along, you just, we didn't make a fuss
about it. -
17:47 - 17:50The print adds a newline at the end of
everything that it prints. -
17:50 - 17:53So the yellow newlines are coming from
the print statement. -
17:53 - 17:58But when we read the file, each line ends
in a newline. -
17:58 - 18:00So these green newlines are actually from
the file. -
18:03 - 18:06They're the ones from the file.
-
18:06 - 18:08So what's happening is we're seeing two
-
18:08 - 18:11newlines, and so that turns into a
blank line. -
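The two newlines can be made visible with a small sketch, assuming Python 3. Printing into an io.StringIO buffer is just a device for capturing what print writes so it can be inspected; the line itself is made up.

```python
import io

line = "From: someone@example.com\n"   # as it arrives from the file

buffer = io.StringIO()
print(line, file=buffer)               # print adds its own newline at the end
printed = buffer.getvalue()

print(repr(printed))                   # two newlines in a row: a blank line
```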
18:12 - 18:14So, how do we deal with that?
-
18:14 - 18:19Well, we've got a string function that
conveniently solves that problem, okay? -
18:19 - 18:21And that is we're going to call rstrip.
-
18:21 - 18:25If you recall, we had strip, lstrip, and
rstrip to strip -
18:25 - 18:28white space on one side, on the other
side, or on both sides. -
18:28 - 18:30So in this one,
-
18:30 - 18:31we're going to use rstrip.
-
18:31 - 18:33We're going to say, we're going to read
the line, that -
18:33 - 18:36this line is going to have a newline in it.
-
18:36 - 18:40rstrip says pull white space, and the
newlines are also counted as white space. -
18:40 - 18:43Blanks or newlines are white space.
-
18:43 - 18:47And then we're going to replace this with
no newline in it. -
18:47 - 18:50Then we're going to ask if it starts with
a from and then we're going to print it -
18:50 - 18:52out, and then we go and we're going to
-
18:52 - 18:55see exactly what we're looking for
in this file. -
18:55 - 18:56And there's no newlines.
-
18:56 - 19:01So the newline that's coming out here
is the one from the print, not the -
19:01 - 19:04one from the file, because the one from
-
19:04 - 19:07the file got wiped out by that particular
line. -
19:08 - 19:08Okay?
-
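The rstrip fix, sketched for Python 3 with a made-up sample file. The kept list is an addition so the stripped lines can be inspected.

```python
import os
import tempfile

# Hypothetical stand-in for mbox.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("From: first@example.com\nSubject: a\n"
              "From: second@example.com\nSubject: b\n")

kept = []
fhand = open(path)
for line in fhand:
    line = line.rstrip()             # drop the newline that came from the file
    if line.startswith("From:"):
        kept.append(line)
        print(line)                  # the only newline left is the one print adds
fhand.close()
```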
19:10 - 19:13So another general pattern of these
file-based loops -
19:13 - 19:18that we have been doing is a skipping
pattern. -
19:18 - 19:20Now, the non-skipping
pattern -
19:20 - 19:23is where you're saying, I'm going to look
for lines -
19:23 - 19:26that start with from and do something to
them. -
19:26 - 19:30Sometimes, instead,
you want to say, -
19:30 - 19:33here's a bunch of lines I'm going to
skip, and then I'm going to do something. -
19:33 - 19:37So the skipping pattern uses continue.
-
19:37 - 19:39And so the first few lines here are the
same. -
19:39 - 19:42We open a file, we read each line
in the file, -
19:42 - 19:44but we're going to strip off the white
space. -
19:44 - 19:46You're going to get tired of typing these
three lines, -
19:46 - 19:47because you're going to do it a lot.
-
19:47 - 19:52Open the file, start reading the file,
strip the whitespace for each line. -
19:52 - 19:58And you can make it so that you can look
for some fact. -
19:58 - 20:01In this case, I'm going to say, if not
line startswith From, this -
20:01 - 20:05means this is true for all the lines that
don't start with from, -
20:05 - 20:09continue. And if you remember, continue
goes up. -
20:09 - 20:11So the continue says I'm done, it
finishes -
20:11 - 20:14the iteration, and it doesn't do anything
down here. -
20:14 - 20:15Okay?
-
20:15 - 20:18And then, we can do
something. -
20:18 - 20:21So, I've kind of flipped this, where I
said, these are the -
20:21 - 20:25things I'm interested in,
that's lines that start with from. -
20:25 - 20:26So, I'm going to skip the lines that
don't. -
20:26 - 20:28So I'm going to use continue.
-
20:28 - 20:32You can do it either way, depending on the
complexity or how much code there is. -
20:32 - 20:34This is a good pattern
when -
20:34 - 20:36you have lots of lines of code down here
-
20:36 - 20:38that you're going to do a lot of cool
stuff with. -
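The skipping pattern with continue, sketched for Python 3 with a made-up sample file. The processed list stands in for the "lots of lines of code" that would follow the skip.

```python
import os
import tempfile

# Hypothetical stand-in for mbox.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("From: first@example.com\nSubject: a\n"
              "From: second@example.com\nSubject: b\n")

processed = []
fhand = open(path)
for line in fhand:
    line = line.rstrip()
    if not line.startswith("From:"):
        continue                     # skip this line, go on to the next one
    # Only the interesting lines reach this point.
    processed.append(line)
    print(line)
fhand.close()
```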
20:39 - 20:43You can also use things like in to select
lines. -
20:43 - 20:43Right?
-
20:43 - 20:51So I'm going to look for
lines that have @uct.ac.za in them. -
20:51 - 20:53So again, I'm going to open it up.
-
20:53 - 20:56I'm going to open these, go through each
line in the file. -
20:56 - 21:01I'm going to strip the white space out,
and [COUGH] -
21:01 - 21:03if not u-c-t,
-
21:03 - 21:08if this string is not in line, then I'm
going to continue. -
21:08 - 21:12So it's a way for me to skip all of the
lines that don't have this string in it. -
21:14 - 21:19So these lines do, that one has it too,
and then we're going to print it out. -
21:19 - 21:24It will print out the ones that make it past
here, okay? -
21:24 - 21:28So in is another way to do searching,
alongside startswith, -
21:28 - 21:29et cetera.
-
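Selecting lines with in, sketched for Python 3. The sample file and the addresses in it are invented for illustration; only the @uct.ac.za string comes from the lecture.

```python
import os
import tempfile

# Hypothetical sample file with made-up addresses.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("From: alice@uct.ac.za\n"
              "From: bob@example.com\n"
              "X-Authentication-Warning: set sender to carol@uct.ac.za\n")

selected = []
fhand = open(path)
for line in fhand:
    line = line.rstrip()
    if "@uct.ac.za" not in line:     # same test as: not "@uct.ac.za" in line
        continue
    selected.append(line)
    print(line)
fhand.close()
```

Unlike startswith, in matches the string anywhere in the line, which is why the third sample line is selected too.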
21:31 - 21:38So one more thing that you might want to
try: we can count, right? -
21:38 - 21:40And this is a pattern for prompting
for a file name. -
21:42 - 21:46And so here you'll get tired of
sort of -
21:46 - 21:49changing your code every time you want to
open a different file, -
21:49 - 21:51because you probably want to run the
program -
21:51 - 21:54with mbox once and mbox-short once,
just so you -
21:54 - 21:58can test it with different sets of data.
So here's just another pattern. -
21:58 - 22:02We add this line to say raw_input, enter
the file name. -
22:02 - 22:05And there you go, we'll type in the file
name. -
22:05 - 22:08And then the thing that we open is
whatever we entered as the file name. -
22:08 - 22:11And then the rest of it is pretty much
yada yada. -
22:11 - 22:14So here I'm, it's reading the whole file.
-
22:14 - 22:17If the line starts with subject, count
equals count plus one. -
22:17 - 22:19And then there were 1797 subject
-
22:19 - 22:22lines in mbox.txt.
-
22:22 - 22:26There were 27 subject lines in
mbox-short.txt, okay? -
22:26 - 22:29So that's prompting for the file names.
-
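A sketch of the prompting pattern, assuming Python 3, where the lecture's raw_input is spelled input. The function names count_subject_lines and prompt_and_count and the ask parameter are made up for this sketch; ask exists only so the demonstration at the bottom can supply a file name without anyone typing. The sample file is also invented.

```python
import os
import tempfile

def count_subject_lines(fname):
    """Count the lines in fname that start with 'Subject:'."""
    count = 0
    fhand = open(fname)
    for line in fhand:
        if line.startswith("Subject:"):
            count = count + 1
    fhand.close()
    return count

def prompt_and_count(ask=input):
    """Ask for a file name, then report how many subject lines it has."""
    fname = ask("Enter the file name: ")
    count = count_subject_lines(fname)
    print("There were", count, "subject lines in", fname)
    return count

# Hypothetical sample file, and a stand-in for the user typing its name.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("From: a@example.com\nSubject: one\n"
              "From: b@example.com\nSubject: two\n")
demo_count = prompt_and_count(ask=lambda prompt: path)
```

Calling prompt_and_count() with no argument would prompt at the keyboard, which is how you would normally run it.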
22:29 - 22:31Now, open.
-
22:31 - 22:35The open call fails if the file
doesn't exist. -
22:35 - 22:37So, you might want to add a try and
-
22:37 - 22:40except around that. If
you're just writing -
22:40 - 22:43code for yourself and you assume that
everything's okay, -
22:43 - 22:45then you don't have to write try/except,
but if -
22:45 - 22:51you want to catch it [SOUND]
and catch a bad file name, -
22:51 - 22:56then you take the open and turn it
into these four lines. -
22:56 - 22:58So this is the code that we think might
blow up, -
23:00 - 23:01and it's going to blow up, we know it's
going to blow up. -
23:01 - 23:04If they enter a bad file name like
-
23:04 - 23:07na na boo boo, right, this is going to
blow up. -
23:07 - 23:09So what do we do?
We use try and except. -
23:09 - 23:10We put try
-
23:10 - 23:10around that.
-
23:10 - 23:14We're going to take out some insurance on
that particular line. -
23:14 - 23:17And then, if it fails, we're going to
print -
23:17 - 23:20this message and then say exit, to get
out. -
23:20 - 23:23So if you get a good file,
-
23:26 - 23:28if you get a good file, it works, skips the
-
23:28 - 23:32except, then runs the thing, prints out
the count. -
23:32 - 23:36That's what's happening here. If, on the
other hand, you get a bad file, -
23:37 - 23:42it comes here, open blows up, runs the
except, prints this out, and then quits. -
23:43 - 23:46So that's how this one works with a bad
file. -
23:47 - 23:49And now, no traceback, right?
-
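A sketch of taking out insurance on the open, assuming Python 3. The lecture's version prints a message and then exits the program; this sketch returns None instead so that both the good and the bad case can be shown in one run, which is a deliberate change. The function name and sample file are made up.

```python
import os
import tempfile

def count_subject_lines(fname):
    """Return the number of 'Subject:' lines, or None if fname cannot be opened."""
    try:
        fhand = open(fname)          # the one line that might blow up
    except OSError:                  # a missing file raises a kind of OSError
        print("File cannot be opened:", fname)
        return None                  # the lecture's version exits here instead
    count = 0
    for line in fhand:
        if line.startswith("Subject:"):
            count = count + 1
    fhand.close()
    return count

# Hypothetical good file, plus a name that does not exist in the same folder.
folder = tempfile.mkdtemp()
good = os.path.join(folder, "demo.txt")
with open(good, "w") as out:
    out.write("From: a@example.com\nSubject: one\n")

good_result = count_subject_lines(good)
bad_result = count_subject_lines(os.path.join(folder, "na na boo boo"))
print(good_result, bad_result)
```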
23:54 - 23:55So we are
-
23:57 - 24:00It's kind of a short lecture.
We're done with Chapter Seven. -
24:01 - 24:04We open a file.
-
24:04 - 24:06We read the file.
-
24:06 - 24:09We take out white space at the end with
rstrip. -
24:09 - 24:12We have used string functions.
-
24:12 - 24:15So, this is kind of putting it all
together. -
24:15 - 24:17And it's kind of short little programs
now. -
24:17 - 24:22So, it's not.
And you know, starting now, -
24:22 - 24:25we are going to start putting these things
together and start actually doing work. -
24:25 - 24:28Because now, we have, from the first few
chapters, -
24:28 - 24:32we have basic capabilities of Python.
Now we have some data to work with. -
24:32 - 24:33Now going forward
-
24:33 - 24:37we are going to do increasingly
sophisticated things with that data. -
24:37 - 24:38So I can't wait to see you in the next
lecture.
- Description:
-
This is Chapter 7 for Python for Informatics. www.pythonearn.com
All Lectures: http://www.youtube.com/playlist?list=PLlRFEj9H3Oj4JXIwMwN1_ss1Tk8wZShEJ - Video Language:
- English
- Duration:
- 24:39