In this section, we're going to look at a new form of data called a table. Once we see how tables work, we'll play around with code that manipulates tables. This is very similar to the way we earlier looked at images and then at the code that manipulates images. In some ways, the code to work with tables will look similar to the code that worked on images, and my goal is that the real patterns that make any sort of code work will start coming through.

Tables are a really common way to organize data on the computer. As a running example for this section, I'm going to use the Social Security baby names database. The Social Security Administration handles retirement benefits and so on in the US, but every year it also happens to track what names are given to babies born in the US that year. So that's going to be a fun data set to use.

Here I've structured this as an example of a table. As I was saying, a table is a way of storing data, and you can basically think of it as a rectangle. A table is first organized into fields. The baby data is organized into four fields: name, rank, gender, and year. You can think of the fields as basically the columns that make the table up. The data itself is stored in what we'll call rows. The first row has the data for the name Jacob: it says the name is Jacob and the rank is 1, and what rank 1 means in this data set is that Jacob is the most popular boy name for babies born in 2010. Then the gender is boy and the year is 2010. Each name has its own row, so the second row has another name. In this case it says the name is Isabella and the rank is 1, which means Isabella was the most popular girl name for babies born in 2010.
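To make the field/row structure concrete, here is a minimal sketch in plain JavaScript, assuming each row is just an object holding the four fields named above. The makeRow helper is hypothetical: it only imitates the row.getField call used later in this section and is not the class's actual table code.

```javascript
// Sketch: the baby-name table as plain JavaScript data (an assumption; the
// class environment loads real tables for you). makeRow is a hypothetical
// helper that imitates the lecture's row.getField(...) call.
const fields = ["name", "rank", "gender", "year"];

function makeRow(values) {
  return {
    getField(fieldName) {
      // In this sketch, asking for a field that doesn't exist is an error.
      if (!fields.includes(fieldName)) {
        throw new Error("No such field: " + fieldName);
      }
      return values[fieldName];
    },
  };
}

// One row per name; every row has the same four fields.
const table = [
  makeRow({ name: "Jacob", rank: 1, gender: "boy", year: 2010 }),
  makeRow({ name: "Isabella", rank: 1, gender: "girl", year: 2010 }),
];

console.log(table[0].getField("name")); // Jacob
console.log(table[1].getField("rank")); // 1
```

The fields are fixed and shared by every row; adding more names only adds more rows.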
Then we see that Ethan has rank 2 for boy names, Sophia has rank 2 for girl names, and so on. The table just has all the names, here shown sorted by rank, with one row per name. In this case it has the top 1,000 boy names and the top 1,000 girl names, so there are 2,000 rows overall.

As I was saying, tables are really common for storing all sorts of data on the computer. You may have heard the term database; a database is a concept closely related to this simple, basic idea of a table. Generally, you can think of the fields as the categories, and the number of fields is not very big: there might be eight or ten or so. They represent the fixed categories we want to keep track of.
The number of rows, on the other hand, could be enormous: millions or maybe even billions of rows. Let me mention a couple of examples. You could think of your email inbox as being stored in a table on the computer. What would the fields be? They might be something like from, to, date, subject, and a few other things stored per message. Then one row is just one message: each message gets its own row, with this fixed number of fields. There might be 10,000 rows in there for all your email, and when you go to your inbox, maybe it just selects the ten most recent ones and shows you, not all the fields, but the most important fields from each message. Another example is Craigslist.
Or, you know, any sort of online auction site, where one row in the table is one item for sale. The fields would again be the categories you want for one item: the price, the date it was listed, maybe a short description and a long description, and a few things like that. Those are just a couple of examples of how many of the things you deal with day to day on the computer are stored in some kind of table.

All right, to make this real, I want to look at code to manipulate tables. I'm going to use the baby name table as our working example for the next couple of sections. In this case, the baby data for 2010 is stored in baby-2010.csv. I should mention that CSV stands for Comma Separated Values. It's a standard for storing table data in a text file, and it's a really simple, fairly old standard, so it's an easy way to interchange data from one program to another.

For the code, I'll make my analogy to images. For images, we had for (pixel: image), which looped through all the pixels in the image and, for each pixel, ran whatever code was inside the curly braces. The table is very similar: we're going to have for (row: table), which loops through each row in the table. It starts at the top and goes through each one, and for each row it runs whatever code I put in the curly braces. So here's our first example.
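Here is a rough plain-JavaScript sketch of loading CSV text and looping over its rows. It is not how the class environment reads baby-2010.csv: the CSV text is inlined as a four-row sample, parseCsvTable is a hypothetical helper, and it assumes the first line lists the field names and that no value contains commas or quotes (real CSV allows quoted values, which this naive split would mishandle).

```javascript
// Sketch: parse CSV text into a table, then loop over every row.
// Assumptions (not from the lecture): the first line holds the field names,
// and no value contains a comma or a quote, so a plain split(",") suffices.
const csvText = `name,rank,gender,year
Jacob,1,boy,2010
Isabella,1,girl,2010
Ethan,2,boy,2010
Sophia,2,girl,2010`;

function parseCsvTable(text) {
  const lines = text.trim().split(/\r?\n/);
  const fieldNames = lines[0].split(",");
  return lines.slice(1).map((line) => {
    const values = line.split(",");
    const record = {};
    fieldNames.forEach((field, i) => {
      // Store numeric-looking values as numbers so rank can be compared to 6.
      const asNumber = Number(values[i]);
      record[field] =
        values[i] !== "" && !Number.isNaN(asNumber) ? asNumber : values[i];
    });
    return {
      getField: (fieldName) => record[fieldName],
      toString: () => values.join(", "),
    };
  });
}

const table = parseCsvTable(csvText);

// Like the lecture's for (row: table): the body runs once per row, top to bottom.
let rowCount = 0;
for (const row of table) {
  console.log(String(row)); // first line printed: Jacob, 1, boy, 2010
  rowCount++;
}
console.log(rowCount + " rows"); // 4 rows in this sample; the real file has 2,000
```

The for...of loop plays the role of for (row: table): whatever is in its braces runs once for each row.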
The example starts with a line, very similar to loading an image, that grabs the table and stores it in a variable, which I'll invariably just call table. Then here I have the for loop, looking through all the rows. In this case, the simplest thing I can do is say print(row), which just prints each row in the data. So this is the baby data. Let's run it.
There's row one, row two, and so on. You can see Jacob, Isabella, Ethan, those fairly popular names. It actually made my web page quite tall, because of course there are 2,000 of these rows. There's Kourtney with a K, the 637th most popular girl name. It runs all the way down here, as I was saying.
Oops, all the way down to rank 1,000. What this bulk output shows is that the print line ran 2,000 times, once for each row in the table. Just as with the image, the for loop went through and looked at each row. Now I'm going to comment this out and run again, just to get rid of the output so my web page won't be a mile high.

So what are we going to do with the table? Just looping through and printing every row, whether for Craigslist or for your email, is never what you want. What we want is to loop through all the rows and pick out just the few rows, out of the 2,000, that we want. This is a very common thing to do with tables; in database terminology it's sometimes called a query, where we narrow down to just the rows we want. So let's talk about the code to do that. We're going to do this with an if statement: put an if statement inside the loop, and in the if test, write a test that selects just some of the rows.

Here's my first example. Here is the for loop, looping through all the rows, and inside the for loop I've got this if statement. The highlighted code runs again and again, once for each row in the table. I've written a test here, and the goal in this case is to pick out just the rows where the rank is 6. Let me talk about how that works. The highlighted test is evaluated once for every row, so 2,000 times. I structure the test so it's true for a row I care about, and inside the if I put a print, so it prints the rows I care about. For all the other rows the test is false, so it won't print those.

So how does this work? Just as a pixel had getRed(), getGreen(), and getBlue(), a row has getField(). Remember, we call it a row because all the way across it has a bunch of different fields, so you have to say which field you want. The way this works is that each field has a name; here the names are name, rank, gender, and year. So I say row.getField, and then within the parentheses I say, in a string, which field I want by name. In this case I want to go to the row and pick out the rank, so this highlighted part, row.getField("rank"), goes to the row and pulls out the rank. Just as pixel.getRed() pulled the red value out of a pixel, this is the analogous operation for a table.

My goal for this example was to show just the rows where the rank is 6, and that requires a new little bit of code. Having picked the rank out, I then write ==. I think we already used this before: two equal signs next to each other compares two things for equality, testing whether they are the same. So row.getField("rank") == 6 says: get the rank out, and test if it's 6. If it is 6, the test is true; if it's not, the test is false. Let me try running this.
When I run it, it goes through all 2,000 rows, and for these two rows the test was true, because those are the rows where the rank was 6. Obviously, I could put 127 here or whatever, and we'd again get two rows. It just happens that each rank number has one boy name and one girl name in this data set, so that's why I keep getting two rows here.
Oh, I should also mention a warning about this. Let me change this back to 6. This use of two equal signs for equality is a little odd in computer code. It would be very reasonable to think there should be just one equal sign: if rank = 6. Unfortunately, the single equal sign in JavaScript is already used for variable assignment; it's dedicated to meaning that, so it couldn't be used for equality, and that's why there's a different symbol for equality. It's actually a pretty common coding error to accidentally type a single equal sign when you meant two equal signs for comparison. So, just for this class, I've outfitted the run button with some special checking code: if it sees a single equal sign in an if test, it gives an error message that basically says, hey, did you maybe mean to use two equal signs? That's an easy error to make, but hit the run button and we'll catch it for you. That's something I did just for this class.
The test I did before checked whether rank was 6, but really any kind of test, as we did before with images, will work here. In this case, I want to go through the data set and find the data for, let's say, Alice. As I mentioned before, with getField you can pass in the name of any field, so you need to know what the field names are; for this data set they are name, rank, gender, and year. So here I go to the row and say, give me the name field, so I put "name" there. Then I use == to test if the name is the same as "Alice". If I run that, in effect it pulls out just the Alice row: it goes through all the rows, does this test, and if the name is Alice (that's the English translation of this test), it prints the row out.

All right, so that's the basic pattern; let me work a few examples with it. The pattern is just what I've been doing: we have a for loop with an if statement inside it, and really all of the action is in the parentheses of the test, where I say row.getField(something) and write some test about it. Let's try these. If I run it this way, it says if name == "Alice", and I get the Alice row. If I wanted to pull out some other data, we could say Robert. So Alice is rank 172, and Robert is 54. Let's try Abby: 284. What's happening is that this highlighted test runs all 2,000 times, and it's just a question of which rows we pick out. I did Robert before; let me show you something kind of funny. If you do Bob and run it.
Nothing appears here. What's going on is that apparently no one names their kid Bob: zero printing happens, because this test was just never true. I guess the pattern for how people name babies is that on the form they put the long name, like Robert, and Bob isn't what they put on the form; maybe that's just what they actually call the kid.

All right, let me try a different test. Let's say I want to test if the rank is 1. I change getField to say "rank" here, and then with == I can say 1. That gives me the two rows, Jacob and Isabella, which we saw before are rank 1. What was the other one we did? 1,000. So I say rank == 1000, and we get the two rank-1,000 rows. The tests we used earlier with images, like less than and less than or equal to, all work too. Let's say I want to see if the rank is less than 10. I say < 10, and when I run that, you can see I get rank 1, rank 2, rank 3, and so on: all the rank numbers where the less-than-10 test is true.
Notice, though, that the last ones I get are Aiden and Chloe, number 9; I don't get the rows where the rank is 10. That's because this form of less than is a strict less than: it's true for 9 but not for 10. If you want, there's another form where you say less than or equal to. I don't think we did this for images, but what you do is put an equal sign right after it: <= means less than or equal to. If I run it now, it goes through 10.
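To check the strict versus non-strict behavior, this sketch builds a synthetic table with the shape the lecture describes (ranks 1 to 1,000 for each gender, no names; made-up structure, not the real file) and counts how many rows pass each test, including the greater-than test coming up next. countRows is a hypothetical helper.

```javascript
// Strict < versus <= (and >), on a synthetic table that mirrors the real
// one's shape: ranks 1 through 1000 for each gender. Names are left out.
const table = [];
for (const gender of ["boy", "girl"]) {
  for (let rank = 1; rank <= 1000; rank++) {
    const record = { rank, gender };
    table.push({ getField: (fieldName) => record[fieldName] });
  }
}

// Count the rows for which a test is true.
function countRows(test) {
  let count = 0;
  for (const row of table) {
    if (test(row)) {
      count++;
    }
  }
  return count;
}

const lessThan10 = countRows((row) => row.getField("rank") < 10);  // ranks 1-9
const atMost10 = countRows((row) => row.getField("rank") <= 10);   // ranks 1-10
const above990 = countRows((row) => row.getField("rank") > 990);   // ranks 991-1000

console.log(lessThan10, atMost10, above990); // 18 20 20
```

Each count is doubled because every rank appears once for boys and once for girls, which is why the lecture keeps seeing rows in pairs.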
That works for greater than as well, so let's try a greater-than test. I could say I'd like to see all the rows where the rank is greater than 990, and I get 991, 992, and so on up through 1,000. Okay, let me try one more. So far I've done examples with name and rank, and invariably I'm calling row.getField and just changing the string there to pull out a different field. Let me try pulling out the gender field. The way this data is coded, the gender field is just strings: it's either the string "boy" or the string "girl". So
if I say if gender == "girl" and run it, then if you look as I scroll here, I've gotten all 1,000 girl rows and none of the 1,000 boy rows. Whoops.
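String fields use the same kind of test, but the comparison is exact, character for character. In this sketch (sample rows from this lecture; makeRow and selectNames are hypothetical helpers), "girl" matches, "Girl" with a capital G matches nothing, and "Bob" matches nothing because no such row is in the data.

```javascript
// String fields: the test compares text exactly, character for character.
function makeRow(name, rank, gender) {
  const record = { name, rank, gender };
  return { getField: (fieldName) => record[fieldName] };
}

// Sample rows mentioned in this lecture; gender is coded as "boy" or "girl".
const table = [
  makeRow("Jacob", 1, "boy"),
  makeRow("Isabella", 1, "girl"),
  makeRow("Ethan", 2, "boy"),
  makeRow("Sophia", 2, "girl"),
  makeRow("Robert", 54, "boy"),
  makeRow("Alice", 172, "girl"),
];

// Collect the names of the rows for which the test is true.
function selectNames(test) {
  const names = [];
  for (const row of table) {
    if (test(row)) {
      names.push(row.getField("name"));
    }
  }
  return names;
}

const girls = selectNames((row) => row.getField("gender") === "girl");
const wrongCase = selectNames((row) => row.getField("gender") === "Girl");
const bobs = selectNames((row) => row.getField("name") === "Bob");

console.log(girls);     // [ 'Isabella', 'Sophia', 'Alice' ]
console.log(wrongCase); // [] (capitalization must match the data)
console.log(bobs);      // [] (no such row)
```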
Sorry, let me get this back. This is just a trick where I comment out the print, so it prints nothing, and run it again; that way it blanks out the output here.

Just to repeat the pattern: these first few lines were always the same, and printing the row was always the same. What I change is the if test. The gist of it is that I say row.getField with whatever field I care about, and then write ==, or <, or <=, or something: say, a comparison on the rank, or == on the name, to pull out the rows I want. The rule is that I pull out a row if this test is true. With that in mind, this can be a good source of some exercises.
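The whole pattern can be packaged as one small function that takes the test as a parameter. selectRows is a hypothetical name and the rows are a tiny hand-made sample; JavaScript arrays also have a built-in filter method that runs the same loop for you.

```javascript
// The section's pattern as one reusable function: loop over every row and
// keep the rows for which the test is true. selectRows is a hypothetical name.
function selectRows(table, test) {
  const selected = [];
  for (const row of table) {
    if (test(row)) {
      selected.push(row);
    }
  }
  return selected;
}

// Hypothetical stand-in for the class's row objects.
function makeRow(name, rank, gender, year) {
  const record = { name, rank, gender, year };
  return { getField: (fieldName) => record[fieldName] };
}

const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Robert", 54, "boy", 2010),
  makeRow("Alice", 172, "girl", 2010),
  makeRow("Abby", 284, "girl", 2010),
];

const alice = selectRows(table, (row) => row.getField("name") === "Alice");
const top100 = selectRows(table, (row) => row.getField("rank") <= 100);

// Arrays have a built-in filter method that does the same loop.
const sameTop100 = table.filter((row) => row.getField("rank") <= 100);

console.log(alice.map((row) => row.getField("rank"))); // [ 172 ]
console.log(top100.length, sameTop100.length);         // 3 3
```

Only the test changes from one query to the next; the loop and the collecting stay the same.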