Return to Video

05-01-relational-algebra-1.mp4

  • 0:00 - 0:01
    This is the first of two
  • 0:01 - 0:04
    videos where we learn about relational algebra.
  • 0:04 - 0:07
    Relational Algebra is a formal language.
  • 0:07 - 0:09
    It's an algebra that forms
  • 0:09 - 0:12
    the underpinnings of implemented languages like SQL.
  • 0:12 - 0:14
    In this video we're going to
  • 0:14 - 0:15
    learn the basics of the
  • 0:15 - 0:19
    Relational Algebra Query Language and a few of the most popular operators.
  • 0:19 - 0:20
    In the second video we'll learn
  • 0:20 - 0:22
    some additional operators and some
  • 0:22 - 0:26
    alternate notations for relational algebra.
  • 0:26 - 0:28
    Now, let's just review first from
  • 0:28 - 0:29
    our previous video on relational
  • 0:29 - 0:31
    querying that queries over
  • 0:31 - 0:33
    relational databases operate on
  • 0:33 - 0:36
    relations and they also produce relations as a result.
  • 0:36 - 0:38
    So if we write a query
  • 0:38 - 0:39
    that operates say on the three
  • 0:39 - 0:41
    relations depicted here, the
  • 0:41 - 0:44
    result of that query is going to be a new relation.
  • 0:44 - 0:46
    And, in fact, we can
  • 0:46 - 0:47
    post queries on that
  • 0:47 - 0:49
    new relation or combine that
  • 0:49 - 0:52
    new relation with our previous relations.
  • 0:52 - 0:54
    So let's start out with Relational Algebra.
  • 0:54 - 0:56
    For the examples in
  • 0:56 - 0:57
    this video we're going to be
  • 0:57 - 1:00
    using a simple college admission relations database with three relations.
  • 1:00 - 1:02
    The first relation, the college
  • 1:02 - 1:03
    relation, contains information about the
  • 1:03 - 1:06
    college name, state, and enrollment of the college.
  • 1:06 - 1:08
    The second relation, the student
  • 1:08 - 1:10
    relation, contains an ID
  • 1:10 - 1:11
    for each student, the student's name,
  • 1:11 - 1:14
    GPA and the size of the high school they attended.
  • 1:14 - 1:16
    And, finally, the third relation contains
  • 1:16 - 1:18
    information about students applying to colleges.
  • 1:18 - 1:20
    Specifically, the student's ID, the
  • 1:20 - 1:21
    college name where they're
  • 1:21 - 1:22
    applying, the major they're
  • 1:22 - 1:25
    applying for and the decision of that application.
  • 1:25 - 1:28
    I've underlined the keys for these three relations.
  • 1:28 - 1:30
    As a reminder, a key is
  • 1:30 - 1:31
    an attribute or a set of
  • 1:31 - 1:33
    attributes whose value is
  • 1:33 - 1:35
    guaranteed to be unique.
  • 1:35 - 1:36
    So, for example, we're going
  • 1:36 - 1:38
    to assume the college names are unique,
  • 1:38 - 1:40
    student IDs are unique and that
  • 1:40 - 1:41
    students will only apply to
  • 1:41 - 1:46
    each college for a particular major one time.
  • 1:46 - 1:47
    So, we're going to have a
  • 1:47 - 1:48
    picture of these three relations at
  • 1:48 - 1:52
    the bottom of the slides throughout the video.
  • 1:52 - 1:54
    The simplest query in relational
  • 1:54 - 1:55
    algebra is a query
  • 1:55 - 1:57
    that is simply the name of a relation.
  • 1:57 - 1:58
    So, for example, we can
  • 1:58 - 2:01
    write a query, "student" and that's
  • 2:01 - 2:04
    a valid expression in relational algebra.
  • 2:04 - 2:05
    If we run that query on
  • 2:05 - 2:07
    our database we'll get as
  • 2:07 - 2:09
    a result a copy of the student relation.
  • 2:09 - 2:11
    Pretty straightforward .
  • 2:11 - 2:12
    Now what happens next
  • 2:12 - 2:13
    is that we're going to use
  • 2:13 - 2:15
    operators of the relational algebra
  • 2:15 - 2:19
    to filter relations, slice relations, and combine relations.
  • 2:19 - 2:23
    So, let's through those operators.
  • 2:23 - 2:26
    The first operator is the select operator.
  • 2:26 - 2:27
    So, the select operator is used
  • 2:27 - 2:30
    to pick certain rows out of a relation.
  • 2:30 - 2:31
    The select operator is denoted by
  • 2:31 - 2:33
    a Sigma with a subscript--that's
  • 2:33 - 2:35
    the condition that's used to
  • 2:35 - 2:37
    filter the rows that we extract from the relations.
  • 2:37 - 2:40
    So, we're just going through three examples here.
  • 2:40 - 2:41
    The first example says that we
  • 2:41 - 2:42
    want to find the students
  • 2:42 - 2:44
    whose GPA is greater than 3.7.
  • 2:44 - 2:45
    So to write that
  • 2:45 - 2:47
    expression in relational algebra, we write
  • 2:47 - 2:48
    the sigma which is the
  • 2:48 - 2:50
    selection operator as a
  • 2:50 - 2:51
    subscript the condition that we're
  • 2:51 - 2:54
    filtering for--GPA greater than
  • 2:54 - 2:58
    3.7--and the relation over which we're finding that selection predicate.
  • 2:58 - 3:00
    So, this expression will return
  • 3:00 - 3:02
    a subset of the student
  • 3:02 - 3:04
    table containing those rows
  • 3:04 - 3:07
    where the GPA is greater 3.7.
  • 3:07 - 3:08
    If we want to filter for two
  • 3:08 - 3:09
    conditions, we just do
  • 3:09 - 3:12
    an "and" of the conditions in the subscript of the sigma.
  • 3:12 - 3:13
    So if we want, say, students
  • 3:13 - 3:15
    whose GPA is greater than 3.7
  • 3:15 - 3:16
    and whose high school size
  • 3:16 - 3:18
    is less than a thousand, we'll
  • 3:18 - 3:23
    write select GPA greater than 3.7.
  • 3:23 - 3:24
    We used a logical
  • 3:24 - 3:27
    and operator--a caret, high school
  • 3:27 - 3:28
    size is less than a
  • 3:28 - 3:32
    thousand, and again we'll apply that to the student relation.
  • 3:32 - 3:33
    And once again, the result of
  • 3:33 - 3:34
    that will be a subset of
  • 3:34 - 3:37
    the student relation containing the rows that satisfy the condition.
  • 3:37 - 3:39
    If we want to find
  • 3:39 - 3:41
    the applications to Stanford for
  • 3:41 - 3:42
    a CS major, then we'll be
  • 3:42 - 3:45
    applying a selection condition to the apply relation.
  • 3:45 - 3:46
    Again, we write the sigma
  • 3:46 - 3:48
    and now the subscript is
  • 3:48 - 3:49
    going to say that the college
  • 3:49 - 3:54
    name is Stanford and the major is CS.
  • 3:54 - 3:57
    Again, the and operator, and
  • 3:57 - 3:59
    that will be applied
  • 3:59 - 4:01
    to the apply relation and
  • 4:01 - 4:02
    it will return as a
  • 4:02 - 4:06
    result, a subset of the apply relation.
  • 4:06 - 4:07
    So the general case of the
  • 4:07 - 4:10
    select operator is that we have the sigma.
  • 4:10 - 4:12
    We have a condition as a
  • 4:12 - 4:14
    subscript and then we have a relation name.
  • 4:14 - 4:17
    And we return as a result the subset of the relation.
  • 4:17 - 4:21
    Our next operator is the Project Operator.
  • 4:21 - 4:23
    So the select operator picks certain
  • 4:23 - 4:26
    rows, and the project operator picks certain columns.
  • 4:26 - 4:28
    So let's say we're
  • 4:28 - 4:29
    interested in the applications, but all
  • 4:29 - 4:30
    we wanted to know was the
  • 4:30 - 4:34
    list of ID's and the decisions for those applications.
  • 4:34 - 4:35
    The project operator is written
  • 4:35 - 4:37
    using the Greek pi symbol,
  • 4:37 - 4:39
    and now the subscript is
  • 4:39 - 4:42
    a list of the column names that we would like to extract.
  • 4:42 - 4:43
    So we write ID, sorry,
  • 4:43 - 4:46
    student ID and decision, and
  • 4:46 - 4:49
    we apply that to the apply relation again.
  • 4:49 - 4:51
    And now what we'll get
  • 4:51 - 4:54
    back is a relation that has just two rows.
  • 4:54 - 4:55
    It's going to have all the
  • 4:55 - 4:56
    tuples of apply, but it's
  • 4:56 - 4:57
    only going to have the
  • 4:57 - 5:00
    student ID and the decision columns.
  • 5:00 - 5:02
    So the general case of
  • 5:02 - 5:04
    a project operator is the
  • 5:04 - 5:05
    projection, and then a
  • 5:05 - 5:07
    list of attributes, can be
  • 5:07 - 5:13
    any number, and then a relation name.
  • 5:13 - 5:14
    Now, what if we're interested in
  • 5:14 - 5:16
    picking both rows and columns at the same time.
  • 5:16 - 5:18
    So we want only some of
  • 5:18 - 5:20
    the rows, and we want only some of the columns.
  • 5:20 - 5:23
    Now we're going to compose operators.
  • 5:23 - 5:26
    Remember that relational queries produce relations .
  • 5:26 - 5:27
    So we can write a
  • 5:27 - 5:29
    query, say, with the
  • 5:29 - 5:31
    select operator of the
  • 5:31 - 5:33
    students whose GPA is greater than 3.7.
  • 5:33 - 5:35
    And this is how we do that.
  • 5:35 - 5:37
    And now, we can take
  • 5:37 - 5:39
    that whole expression which produces a
  • 5:39 - 5:40
    relation, and we can
  • 5:40 - 5:42
    apply the project operator to that, and
  • 5:42 - 5:43
    we can get out the student
  • 5:43 - 5:49
    ID and the student name.
  • 5:49 - 5:50
    Okay.
  • 5:50 - 5:51
    So, what we actually see
  • 5:51 - 5:54
    now is that the general
  • 5:54 - 5:55
    case of the selection and
  • 5:55 - 5:58
    projection operators weren't quite what I told you at first.
  • 5:58 - 5:59
    I was deceiving you slightly.
  • 5:59 - 6:01
    When we write the select
  • 6:01 - 6:03
    operator, it's a select
  • 6:03 - 6:06
    with the condition on any
  • 6:06 - 6:07
    expression of the relational
  • 6:07 - 6:08
    algebra, and if it's
  • 6:08 - 6:09
    a big one we might want
  • 6:09 - 6:11
    to put parenthesis on it,
  • 6:11 - 6:13
    and similarly the project operator is
  • 6:13 - 6:16
    a list of attributes from
  • 6:16 - 6:18
    any expression of the relational algebra.
  • 6:18 - 6:20
    And we can compose these as much as we want.
  • 6:20 - 6:22
    We can have select over project, over
  • 6:22 - 6:24
    select, select, project, and so on.
  • 6:24 - 6:27
    Now let's talk about
  • 6:27 - 6:31
    duplicate values in the results of relational algebra queries.
  • 6:31 - 6:33
    Let's suppose we ask
  • 6:33 - 6:34
    for a list of the majors that
  • 6:34 - 6:37
    people have applied for and the decision for those majors.
  • 6:37 - 6:37
    So we write that as the
  • 6:37 - 6:40
    project of the major
  • 6:40 - 6:45
    and the decision on the applied relation.
  • 6:45 - 6:46
    You might think that when we get
  • 6:46 - 6:49
    the results of this query, we're going to have a lot of duplicate values.
  • 6:49 - 6:51
    So we'll have CS yes, CS
  • 6:51 - 6:54
    yes, CS no, EE yes, EE no, and so on.
  • 6:54 - 6:55
    You can imagine in a
  • 6:55 - 6:57
    large realistic database of applications,
  • 6:57 - 6:59
    there's going to be hundreds of
  • 6:59 - 7:02
    people applying for majors and having a yes or a no decision.
  • 7:02 - 7:04
    The semantics of relational
  • 7:04 - 7:08
    algebra says that duplicates are always eliminated.
  • 7:08 - 7:09
    So if you run a query
  • 7:09 - 7:10
    that would logically have a
  • 7:10 - 7:11
    lot of duplicate values, you just
  • 7:11 - 7:14
    get one value for each result.
  • 7:14 - 7:16
    That's actually a bit of a difference with the SQL language.
  • 7:16 - 7:18
    So, SQL is based on
  • 7:18 - 7:20
    what's known as multi-sets or
  • 7:20 - 7:21
    bags and that means
  • 7:21 - 7:24
    that we don't eliminate duplicates, whereas
  • 7:24 - 7:26
    relational algebra is based
  • 7:26 - 7:29
    on sets themselves and duplicates are eliminated.
  • 7:29 - 7:30
    There is a multi-set or
  • 7:30 - 7:32
    bad relational algebra defined as
  • 7:32 - 7:33
    well but we'll be fine by
  • 7:33 - 7:38
    just considering the set relational algebra in these videos.
  • 7:38 - 7:39
    Our first operator that combines two
  • 7:39 - 7:41
    relations is the cross-product operator,
  • 7:41 - 7:43
    also known as the Cartesian product.
  • 7:43 - 7:45
    What this operator does, is it
  • 7:45 - 7:46
    takes two relations and kinda
  • 7:46 - 7:48
    glues them together so that
  • 7:48 - 7:49
    their schema of the result
  • 7:49 - 7:50
    is the union of the
  • 7:50 - 7:52
    schemas of the two relations and
  • 7:52 - 7:54
    the contents of the result are every
  • 7:54 - 7:56
    combination of tuples from those relations.
  • 7:56 - 7:57
    This is in fact the normal
  • 7:57 - 7:59
    set cross product that you
  • 7:59 - 8:01
    might have learned way back in the elementary school.
  • 8:01 - 8:03
    So let's talk about, say,
  • 8:03 - 8:06
    doing the cross products of students and apply.
  • 8:06 - 8:08
    So if we do this
  • 8:08 - 8:09
    cross products, just to
  • 8:09 - 8:11
    save drawing, I'm just gonna glue
  • 8:11 - 8:14
    these two relations together here.
  • 8:14 - 8:15
    So if we do the
  • 8:15 - 8:16
    cross product we'll get at the
  • 8:16 - 8:18
    result a big relation, here, which
  • 8:18 - 8:20
    is going to have eight attributes.
  • 8:20 - 8:21
    The eight attributes across the student
  • 8:21 - 8:24
    and apply now the only
  • 8:24 - 8:25
    small little trick is that
  • 8:25 - 8:27
    when we glue two relations together
  • 8:27 - 8:28
    sometimes they'll have the same
  • 8:28 - 8:31
    attribute and we can see we have SID on both sides.
  • 8:31 - 8:33
    So just as a notational convention,
  • 8:33 - 8:35
    when cross-product is done
  • 8:35 - 8:36
    and there's two attributes that are
  • 8:36 - 8:39
    named, they're prefaced with the name of the relation they came from.
  • 8:39 - 8:40
    So this one would be referred
  • 8:40 - 8:42
    to in the cross-product as
  • 8:42 - 8:44
    the student dot SID where this
  • 8:44 - 8:45
    one over here would be referred
  • 8:45 - 8:48
    to as the apply dot SID.
  • 8:48 - 8:49
    So, again, we glue together in
  • 8:49 - 8:51
    the Cartesian product the two
  • 8:51 - 8:53
    relations with four attributes each,
  • 8:53 - 8:55
    we get a result with eight attributes.
  • 8:55 - 8:57
    Now let's talk about the contents of these.
  • 8:57 - 9:00
    So let's suppose that the student
  • 9:00 - 9:02
    relation had s-tuples in it
  • 9:02 - 9:04
    and that's how many tuples, while
  • 9:04 - 9:05
    the apply had 8 tuples in
  • 9:05 - 9:07
    it, the result of the
  • 9:07 - 9:08
    Cartesian products is gonna
  • 9:08 - 9:10
    have S times A tuples,
  • 9:10 - 9:12
    is going to have one tuple
  • 9:12 - 9:14
    for every combination of tuples
  • 9:14 - 9:19
    from the student relation and the apply relation.
  • 9:19 - 9:20
    Now, the cross-product seems
  • 9:20 - 9:21
    like it might not be that
  • 9:21 - 9:23
    helpful, but what is
  • 9:23 - 9:24
    interesting is when we use
  • 9:24 - 9:26
    the cross-product together with other operators.
  • 9:26 - 9:28
    And let's see a big example of that.
  • 9:28 - 9:30
    Let's suppose that we want
  • 9:30 - 9:32
    to get the names and GPAs of
  • 9:32 - 9:33
    students with a high school
  • 9:33 - 9:35
    size greater than a thousand who
  • 9:35 - 9:37
    applied to CS and were rejected.
  • 9:37 - 9:40
    Okay, so let's take a look.
  • 9:40 - 9:41
    We're going to have to access
  • 9:41 - 9:43
    the students and the apply
  • 9:43 - 9:45
    records in order to run this query.
  • 9:45 - 9:46
    So what we'll do is we'll
  • 9:46 - 9:49
    take student cross apply as our starting point.
  • 9:49 - 9:51
    So now we have
  • 9:51 - 9:53
    a big relation that contains
  • 9:53 - 9:57
    eight attributes and all of those tuples that we described previously.
  • 9:57 - 9:58
    But now we're going to
  • 9:58 - 10:00
    start making things more interesting, because
  • 10:00 - 10:01
    what we're going to do is
  • 10:01 - 10:03
    a big selection over this relation.
  • 10:03 - 10:04
    And that selection is first
  • 10:04 - 10:06
    of all going to make sure
  • 10:06 - 10:08
    that it only combines student and
  • 10:08 - 10:11
    apply tuples that are referring to the same student.
  • 10:11 - 10:12
    So to do that, we write
  • 10:12 - 10:15
    student dot SID equals
  • 10:15 - 10:17
    apply dot SID.
  • 10:17 - 10:18
    So now we've filtered the result
  • 10:18 - 10:20
    of that cross-product to only
  • 10:20 - 10:22
    include combinations of student
  • 10:22 - 10:24
    and apply by couples that make sets.
  • 10:24 - 10:26
    Now we have to do a little bit of additional filtering.
  • 10:26 - 10:28
    We said that we want the
  • 10:28 - 10:29
    high school size to be
  • 10:29 - 10:31
    greater than a thousand, so
  • 10:31 - 10:33
    we do an "and" operator in the high school.
  • 10:33 - 10:35
    We want them to have applied
  • 10:35 - 10:37
    to CS so that's and major equals CS.
  • 10:37 - 10:40
    We're getting a nice big query here.
  • 10:40 - 10:41
    And finally we want them to
  • 10:41 - 10:42
    have been rejected, so "and
  • 10:42 - 10:47
    decision" equals, we'll just be using R for reject.
  • 10:47 - 10:50
    So now, we've got that gigantic query.
  • 10:50 - 10:51
    But that gets us exactly what
  • 10:51 - 10:53
    we want except for one more thing,
  • 10:53 - 10:56
    which is, as I said, all we want is their names and GPAs.
  • 10:56 - 10:57
    So finally we take a
  • 10:57 - 10:59
    big parentheses around here and
  • 10:59 - 11:00
    we apply to that the
  • 11:00 - 11:02
    projection operator, getting the
  • 11:02 - 11:06
    student name and the GPA.
  • 11:06 - 11:07
    And that is the relational
  • 11:07 - 11:09
    algebra expression that produces
  • 11:09 - 11:12
    the query that we have written in English.
  • 11:12 - 11:13
    Now we have seen how the
  • 11:13 - 11:14
    cross product allows us to
  • 11:14 - 11:16
    combine tuples and then
  • 11:16 - 11:18
    apply selection conditions to
  • 11:18 - 11:21
    get meaningful combinations of tuples.
  • 11:21 - 11:22
    It turns out that relational algebra
  • 11:22 - 11:24
    includes an operator called
  • 11:24 - 11:25
    the natural join that is
  • 11:25 - 11:27
    used pretty much for the exact purpose.
  • 11:27 - 11:29
    What the natural join does is
  • 11:29 - 11:31
    it performs a cross-product
  • 11:31 - 11:33
    but then it enforces equality
  • 11:33 - 11:35
    on all of the attributes with the same name.
  • 11:35 - 11:36
    So if we set up our
  • 11:36 - 11:38
    schema properly, for example,
  • 11:38 - 11:40
    we have student ID and student
  • 11:40 - 11:42
    ID here, meaning the same
  • 11:42 - 11:43
    thing, and when the cross
  • 11:43 - 11:45
    product is created, it's only
  • 11:45 - 11:48
    going to combine tuples where the student ID is the same.
  • 11:48 - 11:49
    And furthermore, if we add
  • 11:49 - 11:51
    college in, we can
  • 11:51 - 11:54
    see that we have the college name here and the college name here.
  • 11:54 - 11:56
    If we combine college and apply
  • 11:56 - 11:59
    tuples, we'll only combine tuples that are talking about the same college.
  • 11:59 - 12:01
    Now in addition, one more
  • 12:01 - 12:02
    thing that it does is it
  • 12:02 - 12:06
    gets rid of these pesky attributes that have the same names.
  • 12:06 - 12:07
    So since when we combine,
  • 12:07 - 12:09
    for example, student and apply
  • 12:09 - 12:11
    with the natural join, we're only
  • 12:11 - 12:13
    combining tuples where the
  • 12:13 - 12:16
    student SID is the same as the apply SID.
  • 12:16 - 12:17
    Then we don't need to keep two
  • 12:17 - 12:19
    copies of that
  • 12:19 - 12:23
    column because the values are always going to be equal.
  • 12:23 - 12:25
    So the natural join operator
  • 12:25 - 12:28
    is written using a bow
  • 12:28 - 12:30
    tie, that's just the convention.
  • 12:30 - 12:34
    You will find that in your text editing programs if you look carefully.
  • 12:34 - 12:37
    So let's do some examples now.
  • 12:37 - 12:38
    Let's go back to our same
  • 12:38 - 12:39
    query where we were finding
  • 12:39 - 12:42
    the names and GPAs of students
  • 12:42 - 12:45
    from large high schools who applied to CS and were rejected.
  • 12:45 - 12:46
    So now, instead of using
  • 12:46 - 12:47
    the cross-product we're gonna
  • 12:47 - 12:51
    use the natural join, which, as I said, was written with a bow tie.
  • 12:51 - 12:53
    What that allows us to
  • 12:53 - 12:54
    do, once we do that natural
  • 12:54 - 12:56
    join, is we don't have
  • 12:56 - 12:57
    to write that condition, that enforced
  • 12:57 - 12:59
    equality on those two
  • 12:59 - 13:01
    attributes, because it's going to do it itself.
  • 13:01 - 13:03
    And once we have done that then
  • 13:03 - 13:04
    all we need to do is
  • 13:04 - 13:05
    apply the rest of our conditions,
  • 13:05 - 13:06
    which were that the high school
  • 13:06 - 13:08
    is greater than a thousand and the
  • 13:08 - 13:11
    major is CS and the decision
  • 13:11 - 13:14
    is reject, again we'll
  • 13:14 - 13:16
    call that R. And then,
  • 13:16 - 13:17
    since we're only getting the names
  • 13:17 - 13:19
    and GPAs, we write the
  • 13:19 - 13:24
    student name and the GPA.
  • 13:24 - 13:24
    Okay.
  • 13:24 - 13:26
    And that's the result of the query using a natural join.
  • 13:26 - 13:27
    So, as you can see that's a
  • 13:27 - 13:29
    little bit simpler than the original
  • 13:29 - 13:30
    with the cross-product and by setting
  • 13:30 - 13:33
    up schemas correctly, natural join can be very useful.
  • 13:33 - 13:37
    Now let's add one more complication to our query.
  • 13:37 - 13:38
    Let's suppose that we're only interested
  • 13:38 - 13:41
    in applications to colleges where the enrollment is greater than 20,000.
  • 13:41 - 13:43
    So, so far in
  • 13:43 - 13:44
    our expression we refer to the
  • 13:44 - 13:46
    student relation and the apply
  • 13:46 - 13:48
    relation, but we haven't used the college relation.
  • 13:48 - 13:50
    But if we want to have a
  • 13:50 - 13:51
    filter on enrollment, we're going to have
  • 13:51 - 13:55
    to bring the college relation into the picture.
  • 13:55 - 13:57
    This turns out to perhaps be easier than you think.
  • 13:57 - 13:59
    Let's just erase a couple
  • 13:59 - 14:01
    of our parentheses here, and what
  • 14:01 - 14:02
    we're going to do is we're
  • 14:02 - 14:04
    going to join in the
  • 14:04 - 14:07
    college relation, with the two relations we have already.
  • 14:07 - 14:11
    Now, technically, the natural
  • 14:11 - 14:13
    join is the binary operator, people
  • 14:13 - 14:15
    often use it without parentheses
  • 14:15 - 14:16
    because it's associative, but if
  • 14:16 - 14:20
    we get pedantic about it we could add that and then we're in good shape.
  • 14:20 - 14:22
    Now we've joined all three relations together.
  • 14:22 - 14:24
    And remember, automatically the natural
  • 14:24 - 14:27
    join enforces equality on the shared attributes.
  • 14:27 - 14:29
    Very specifically, the college
  • 14:29 - 14:30
    name here is going to
  • 14:30 - 14:33
    be set equal to the apply college name as well.
  • 14:33 - 14:36
    Now once we've done that, we've got all the information we need.
  • 14:36 - 14:37
    We just need to add one
  • 14:37 - 14:39
    more filtering condition, which is
  • 14:39 - 14:42
    that the college enrollment is greater than 20,000.
  • 14:42 - 14:47
    And with that, we've solved our query.
  • 14:47 - 14:48
    So to summarize the
  • 14:48 - 14:52
    natural join, the natural join combines relations.
  • 14:52 - 14:54
    It automatically sets values equal
  • 14:54 - 14:55
    when attribute names are the
  • 14:55 - 14:58
    same and then it removes the duplicate columns.
  • 14:58 - 15:01
    The natural join actually does not
  • 15:01 - 15:04
    add any expressive power to relational algebra.
  • 15:04 - 15:08
    We can rewrite the natural join without it using the cross-product.
  • 15:08 - 15:10
    So let me just show that rewrite here.
  • 15:10 - 15:12
    If we have, and now I'm
  • 15:12 - 15:14
    going to use the general case of two expressions.
  • 15:14 - 15:17
    One expression, natural join
  • 15:17 - 15:19
    with another expression, that is
  • 15:19 - 15:21
    actually equivalent to doing
  • 15:21 - 15:23
    a projection on the
  • 15:23 - 15:25
    schema of the first
  • 15:25 - 15:27
    expression - I'll just
  • 15:27 - 15:29
    call it E1 now - union
  • 15:29 - 15:30
    the schema of the second expression.
  • 15:30 - 15:32
    That's a real union, so that
  • 15:32 - 15:33
    means if we have two copies we
  • 15:33 - 15:36
    just keep one of them. Over
  • 15:36 - 15:39
    the selection of. Now we're
  • 15:39 - 15:40
    going to set all the shared attributes
  • 15:40 - 15:42
    of the first expression to
  • 15:42 - 15:44
    be equal to the shared attributes of the second.
  • 15:44 - 15:45
    So I'll just write E1, A1
  • 15:45 - 15:48
    equals E2, A1
  • 15:48 - 15:54
    and E1, A2 equals E2 dot A2.
  • 15:54 - 15:57
    Now these are the cases where,
  • 15:57 - 16:00
    again, the attributes have the same names, and so on.
  • 16:00 - 16:02
    So we're setting all those equal,
  • 16:02 - 16:04
    and that is applied over
  • 16:04 - 16:07
    expression one cross-product expression two.
  • 16:07 - 16:10
    So again, the natural
  • 16:10 - 16:12
    join is not giving us
  • 16:12 - 16:18
    additional expressive power, but it is very convenient notationally.
  • 16:18 - 16:20
    The last operator that I'm
  • 16:20 - 16:23
    going to cover in this video is the theta join operator.
  • 16:23 - 16:24
    Like natural join, theta join is
  • 16:24 - 16:28
    actually an abbreviation that doesn't add expressive power to the language.
  • 16:28 - 16:28
    Let me just write it.
  • 16:28 - 16:30
    The theta join operator takes
  • 16:30 - 16:32
    two expressions and combines them
  • 16:32 - 16:34
    with the bow tie looking
  • 16:34 - 16:37
    operator, but with a subscript theta.
  • 16:37 - 16:39
    That theta is a condition.
  • 16:39 - 16:40
    It's a condition in the style
  • 16:40 - 16:43
    of the condition in the selection operator.
  • 16:43 - 16:45
    And what this actually says
  • 16:45 - 16:47
    - it's pretty simple -
  • 16:47 - 16:49
    is it's equivalent to applying
  • 16:49 - 16:51
    the theta condition to the
  • 16:51 - 16:53
    cross-product of the two expressions.
  • 16:53 - 16:55
    So you might wonder why
  • 16:55 - 16:56
    I even mention the theta
  • 16:56 - 16:57
    join operator, and the reason I
  • 16:57 - 16:59
    mention it is that most
  • 16:59 - 17:01
    database management systems implement the
  • 17:01 - 17:03
    theta join as their basic
  • 17:03 - 17:05
    operation for combining relations.
  • 17:05 - 17:07
    So the basic operation is
  • 17:07 - 17:08
    take two relations, combine all tuples,
  • 17:08 - 17:10
    but then only keep the combinations
  • 17:10 - 17:12
    that pass the theta condition.
  • 17:12 - 17:13
    Often when you talk to
  • 17:13 - 17:15
    people who build database systems or
  • 17:15 - 17:16
    use databases, when they
  • 17:16 - 17:21
    use the word join, they really mean the theta join.
  • 17:21 - 17:25
    So, in conclusion, relational algebra is a formal language.
  • 17:25 - 17:27
    It operates on sets of
  • 17:27 - 17:29
    relations and produces relations as a result.
  • 17:29 - 17:30
    The simplest query is just
  • 17:30 - 17:32
    the name of a relation and
  • 17:32 - 17:33
    then operators are used
  • 17:33 - 17:36
    to filter relations, slice them, and combine them.
  • 17:36 - 17:37
    So far, we've learned the
  • 17:37 - 17:40
    select operator for selecting rows;
  • 17:40 - 17:41
    the project operator for selecting
  • 17:41 - 17:43
    columns; the cross-product
  • 17:43 - 17:45
    operator for combining every possible
  • 17:45 - 17:46
    pair of tuples from two
  • 17:46 - 17:48
    relations; and then two
  • 17:48 - 17:50
    abbreviations, the natural join,
  • 17:50 - 17:51
    which a very useful way to combine
  • 17:51 - 17:53
    relations by enforcing a equality
  • 17:53 - 17:56
    on certain columns; and the theta join operator.
  • 17:56 - 17:58
    In the next video, we'll learn
  • 17:58 - 18:00
    some additional operators of relational
  • 18:00 - 18:01
    algebra and also some alternative
  • 18:01 -
    notations for relational algebra expressions.
Title:
05-01-relational-algebra-1.mp4
Video Language:
English
Duration:
18:05
Amara Bot added a translation

English subtitles

Revisions