Return to Video

06-02-basic-select-statement.mp4

  • 0:00 - 0:01
    This is the first of seven
  • 0:01 - 0:04
    videos where we're going to learn the SQL language.
  • 0:04 - 0:05
    The videos are largely going to
  • 0:05 - 0:07
    be live demos of SQL queries
  • 0:07 - 0:09
    and updates running on an actual database.
  • 0:09 - 0:10
    The first video is going
  • 0:10 - 0:13
    to focus on the basics of the SELECT statement.
  • 0:13 - 0:15
    As a reminder, the SELECT statement
  • 0:15 - 0:16
    selects a set of attributes from
  • 0:16 - 0:20
    a set of relations satisfying a particular condition.
  • 0:20 - 0:21
    We will see in the demo that
  • 0:21 - 0:23
    even with the these three clauses,
  • 0:23 - 0:26
    we can write quite powerful queries.
  • 0:26 - 0:27
    All of the seven demos are
  • 0:27 - 0:29
    going to be using the simple
  • 0:29 - 0:31
    college admissions database that we
  • 0:31 - 0:33
    learned about in the relational algebra videos.
  • 0:33 - 0:36
    As a reminder, we have three relations.
  • 0:36 - 0:38
    We have the college relation: college
  • 0:38 - 0:39
    relation contains information about the
  • 0:39 - 0:43
    name of the colleges, the state, and the enrollment of those colleges.
  • 0:43 - 0:44
    We have the student relation, which
  • 0:44 - 0:46
    contains student IDs, their names,
  • 0:46 - 0:49
    their GPA, and the size of the high school that they come from.
  • 0:49 - 0:51
    And finally, the application information, that
  • 0:51 - 0:53
    tells us that a particular student
  • 0:53 - 0:55
    applied to a particular college
  • 0:55 - 0:56
    for a particular major and there
  • 0:56 - 0:58
    was a decision of that application
  • 0:58 - 1:00
    Now as a reminder, in the
  • 1:00 - 1:01
    relational model, when we underline
  • 1:01 - 1:04
    attributes, that means we're designating a key for the relation.
  • 1:04 - 1:06
    So, the underlying attributes in
  • 1:06 - 1:07
    our example say that the
  • 1:07 - 1:10
    knowledge name is going to be unique within the college relation.
  • 1:10 - 1:11
    The student's idea is unique within
  • 1:11 - 1:13
    the student relation and in
  • 1:13 - 1:17
    the applied relation, the combination of these three attributes is unique.
  • 1:17 - 1:18
    That means that student can, if
  • 1:18 - 1:20
    he or she wishes, apply to a
  • 1:20 - 1:21
    college many times, or apply
  • 1:21 - 1:22
    for a major many times, but
  • 1:22 - 1:24
    can only apply to a
  • 1:24 - 1:26
    college for a particular major once.
  • 1:26 - 1:28
    Let's turn to the demo.
  • 1:28 - 1:32
    Let's start by looking at the actual data that we're going to be querying over.
  • 1:32 - 1:33
    We have a set of four
  • 1:33 - 1:36
    colleges: Stanford,
    Berkeley, MIT and Cornell.
  • 1:36 - 1:39
    We have a bunch of students.
  • 1:39 - 1:40
    And a reminder, each student has an
  • 1:40 - 1:43
    ID, a name, a GPA, and a size of high school.
  • 1:43 - 1:45
    And finally, we have a set
  • 1:45 - 1:47
    of application records where a
  • 1:47 - 1:48
    student with a particular ID
  • 1:48 - 1:50
    applies to a college for a
  • 1:50 - 1:51
    particular major, and there's a
  • 1:51 - 1:53
    yes or no decision on that application.
  • 1:53 - 1:56
    So let's go to our first SQL query.
  • 1:56 - 1:57
    This query is going to find
  • 1:57 - 1:59
    the ID, name, and GPA of
  • 1:59 - 2:01
    students whose GPA is greater than 3.6.
  • 2:01 - 2:03
    So, very simple, it's the
  • 2:03 - 2:05
    basic SELECT FROM WHERE structure.
  • 2:05 - 2:06
    The SELECT gives our table name,
  • 2:06 - 2:08
    the WHERE gives our filtering condition
  • 2:08 - 2:11
    and the SELECT tells us what we want to get out of the query.
  • 2:11 - 2:13
    We'll execute that query and
  • 2:13 - 2:14
    we will find here all of
  • 2:14 - 2:17
    our students with a GPA greater than 3.6.
  • 2:17 - 2:19
    Now, it's not necessary
  • 2:19 - 2:20
    to include the GPA in
  • 2:20 - 2:24
    the result to the query even if we filter on the GPA.
  • 2:24 - 2:25
    So, I could just take GPA away
  • 2:25 - 2:27
    from the SELECT clause, run the
  • 2:27 - 2:28
    query again and now, we
  • 2:28 - 2:32
    see the same result but without the GPA.
  • 2:32 - 2:33
    Okay.
  • 2:33 - 2:35
    Let's go to our second query.
  • 2:35 - 2:39
    Our second query is going to combine two relations.
  • 2:39 - 2:40
    In this query, we're going to
  • 2:40 - 2:41
    find the names of the students
  • 2:41 - 2:44
    and the majors for which they've applied.
  • 2:44 - 2:45
    So, now, we're involving both the
  • 2:45 - 2:46
    student table and the
  • 2:46 - 2:48
    apply table and the
  • 2:48 - 2:49
    condition we see here is the
  • 2:49 - 2:51
    join condition that tells us
  • 2:51 - 2:52
    we want to combine students with
  • 2:52 - 2:55
    apply records that have the same student ID.
  • 2:55 - 2:57
    This is what would happen automatically in
  • 2:57 - 2:58
    a natural join of the
  • 2:58 - 3:00
    relational algebra, but in SQL
  • 3:00 - 3:00
    we need to always write the
  • 3:00 - 3:03
    join condition explicitly, and finally
  • 3:03 - 3:05
    we get the student name and the major.
  • 3:05 - 3:06
    And if we execute the
  • 3:06 - 3:08
    query, we get, expectedly, a
  • 3:08 - 3:11
    bunch of students and the majors that they've applied for.
  • 3:11 - 3:14
    Now, we do notice here that we have several duplicate values.
  • 3:14 - 3:15
    We have two copies of Amy
  • 3:15 - 3:19
    applying to CS and two copies of Craig applying to Bio-Engineering.
  • 3:19 - 3:20
    As we discussed in the relational
  • 3:20 - 3:22
    algebra video, in relational algebra
  • 3:22 - 3:24
    which underlies SQL, it's by
  • 3:24 - 3:26
    default the set model; we don't have duplicates.
  • 3:26 - 3:28
    But in the SQL language we
  • 3:28 - 3:31
    do have duplicates, it's based on a multi-set model.
  • 3:31 - 3:32
    If we don't like the duplicates in
  • 3:32 - 3:35
    our results SQL provides us a convenient way to get rid of them.
  • 3:35 - 3:37
    We simply add the keyword,
  • 3:37 - 3:39
    "distinct", to our query after
  • 3:39 - 3:41
    the word, "select", we execute, and
  • 3:41 - 3:42
    now we get the same result
  • 3:42 - 3:47
    but with the duplicate values eliminated.
  • 3:47 - 3:49
    Our next query is going
  • 3:49 - 3:50
    to be a little more complicated; it's
  • 3:50 - 3:52
    going to find the names
  • 3:52 - 3:54
    and GPAs of students whose
  • 3:54 - 3:55
    size high school is less
  • 3:55 - 3:56
    than a thousand, they've applied to
  • 3:56 - 3:57
    CS at Stanford, and we're going
  • 3:57 - 3:59
    to get the decision associated with that.
  • 3:59 - 4:01
    So again we have two
  • 4:01 - 4:04
    relations, two tables involved, the student and the apply.
  • 4:04 - 4:06
    We have the join condition, making
  • 4:06 - 4:07
    sure we're talking about the same
  • 4:07 - 4:10
    student and the student and apply tuples.
  • 4:10 - 4:11
    Very important to remember that one.
  • 4:11 - 4:13
    We are going to filter the result
  • 4:13 - 4:15
    based on size high school, major,
  • 4:15 - 4:18
    and the college to which they're applying.
  • 4:18 - 4:20
    So let's run this query and
  • 4:20 - 4:21
    we will see the result that
  • 4:21 - 4:22
    we have two students who
  • 4:22 - 4:26
    have applied to CS at Stanford from a small high school.
  • 4:26 - 4:30
    Our next query is again a join of two relations.
  • 4:30 - 4:32
    This time we're going to find all large
  • 4:32 - 4:35
    campuses that have someone applying to that campus in CS.
  • 4:35 - 4:38
    So this time we're going to join the college table and the apply table.
  • 4:38 - 4:39
    And again, we need to
  • 4:39 - 4:40
    be careful to make sure we
  • 4:40 - 4:42
    only join tuples that are talking about the same college.
  • 4:42 - 4:43
    So we have college.cname
  • 4:43 - 4:46
    equals apply.cname.
  • 4:46 - 4:47
    We have an enrollment that's greater
  • 4:47 - 4:49
    than 20,000 and a major that equals CS.
  • 4:49 - 4:51
    Let's run this query.
  • 4:51 - 4:53
    Oops, we got an error!
  • 4:53 - 4:54
    Well, actually I knew that was
  • 4:54 - 4:57
    coming, but I wanted to show you what happens here.
  • 4:57 - 4:58
    So the error is that we
  • 4:58 - 4:59
    have an ambiguous column name,
  • 4:59 - 5:02
    and that's the one right here, the C name.
  • 5:02 - 5:03
    So I haven't pointed it
  • 5:03 - 5:06
    out explicitly, but whenever I've
  • 5:06 - 5:06
    referred to attributes where there's
  • 5:06 - 5:08
    an attribute from both of
  • 5:08 - 5:10
    the relations we're querying, I prefaced
  • 5:10 - 5:11
    it with the name of
  • 5:11 - 5:14
    the relation that we cared about, the college here in the apply.
  • 5:14 - 5:16
    So the attribute name here
  • 5:16 - 5:17
    in the select clause is actually
  • 5:17 - 5:19
    ambiguous because there's a
  • 5:19 - 5:22
    C name attribute in college and there's one there in apply.
  • 5:22 - 5:23
    Now we happen to set those equal,
  • 5:23 - 5:25
    but in order for the query to
  • 5:25 - 5:26
    actually run we have to choose
  • 5:26 - 5:27
    So let's just say we're
  • 5:27 - 5:30
    going to take that C name from college.
  • 5:30 - 5:33
    Now, everything should be fine, and here we go.
  • 5:33 - 5:34
    So those are the colleges where we
  • 5:34 - 5:36
    have at least one
  • 5:36 - 5:38
    CS major and their enrollment is greater than 20,000.
  • 5:38 - 5:41
    Again, we see duplicates
  • 5:41 - 5:42
    so if we don't like
  • 5:42 - 5:43
    the two copies of Berkeley, we
  • 5:43 - 5:46
    simply add distinct and we run the query again.
  • 5:46 - 5:48
    And now we have Berkeley and Cornell.
  • 5:48 - 5:52
    Now, let's do a query with a bigger result.
  • 5:52 - 5:55
    This time we're finally going to join all three of our relations.
  • 5:55 - 5:56
    Student, college and apply.
  • 5:56 - 5:57
    And we're going to apply the
  • 5:57 - 5:59
    joint conditions that ensure that
  • 5:59 - 6:03
    we're talking about the same student and the same college.
  • 6:03 - 6:04
    And then from the result
  • 6:04 - 6:06
    of that big cross-product, that
  • 6:06 - 6:07
    big join, we're going to
  • 6:07 - 6:08
    get the student ID, their name,
  • 6:08 - 6:10
    their GPA, the college that
  • 6:10 - 6:12
    they're applying to and the enrollment of that college.
  • 6:12 - 6:13
    So just a whole bunch of
  • 6:13 - 6:17
    information associated with this students' applications.
  • 6:17 - 6:19
    And we execute this and here
  • 6:19 - 6:21
    we get the result with all the attributes that we asked for.
  • 6:21 - 6:23
    Now, one thing I haven't
  • 6:23 - 6:24
    mentioned yet is the order
  • 6:24 - 6:27
    of the results that we get when we run SQL queries.
  • 6:27 - 6:30
    SO SQL is, at its heart, an unordered model.
  • 6:30 - 6:31
    That means that we can get
  • 6:31 - 6:32
    the results of our queries in
  • 6:32 - 6:34
    any order, and in fact,
  • 6:34 - 6:35
    we could run a query today
  • 6:35 - 6:37
    and get our results in a particular order.
  • 6:37 - 6:39
    And then run the query tomorrow and get a different order.
  • 6:39 - 6:41
    And that's permitted with the
  • 6:41 - 6:44
    specification of SQL on relational databases.
  • 6:44 - 6:46
    If we care about the order
  • 6:46 - 6:48
    of our result SQL provides a
  • 6:48 - 6:49
    clause that we can ask for a
  • 6:49 - 6:50
    result to be sorted by
  • 6:50 - 6:55
    a particular attribute or set of attributes. So
  • 6:55 - 6:56
    let's say we want our application information here
  • 6:56 - 6:59
    sorted by descending GPA.
  • 6:59 - 7:02
    Then we add another clause called the order by clause.
  • 7:02 - 7:04
    We tell the attribute we'd like
  • 7:04 - 7:05
    to be ordering by and then
  • 7:05 - 7:08
    if we want it to be descending we write DESC.
  • 7:08 - 7:12
    The default behavior is actually ascending.
  • 7:12 - 7:13
    So if we run this query
  • 7:13 - 7:15
    now we get our results by
  • 7:15 - 7:19
    descending the GPA we
  • 7:19 - 7:21
    see all the 3.9's, 3.8, 3.7, and so forth.
  • 7:21 - 7:22
    Now we might still want
  • 7:22 - 7:23
    to further sort within all the
  • 7:23 - 7:25
    3.9s if we want
  • 7:25 - 7:27
    to do that we can specify another
  • 7:27 - 7:29
    attribute to sort each group by.
  • 7:29 - 7:30
    So, for example, if we
  • 7:30 - 7:31
    decide from that we
  • 7:31 - 7:35
    want to sort by enrollment
  • 7:35 - 7:36
    and ascending, we won't put
  • 7:36 - 7:38
    anything because ascending is the default.
  • 7:38 - 7:39
    And we execute.
  • 7:39 - 7:41
    Now we still have GPA
  • 7:41 - 7:43
    as descending as our primary
  • 7:43 - 7:44
    sort order and then within each
  • 7:44 - 7:47
    of those will be sorting by ascending enrollment.
  • 7:47 - 7:51
    This query introduces the like predicate.
  • 7:51 - 7:53
    Like is a built-in operator
  • 7:53 - 7:54
    in SQL that allows us
  • 7:54 - 7:57
    to do simple string matching on attribute values.
  • 7:57 - 7:58
    Let's suppose, for example, that we
  • 7:58 - 7:59
    wanted to find all students
  • 7:59 - 8:02
    who were applying for a major that had to do with bio.
  • 8:02 - 8:03
    Instead of listing all the
  • 8:03 - 8:05
    biology majors we can
  • 8:05 - 8:06
    simply pattern match the major
  • 8:06 - 8:08
    against the special string here
  • 8:08 - 8:10
    which says, match any major
  • 8:10 - 8:12
    where there's some set of characters,
  • 8:12 - 8:14
    followed by bio, followed by
  • 8:14 - 8:16
    some set of characters we execute
  • 8:16 - 8:18
    the query, and we'll find the
  • 8:18 - 8:19
    students who have applied for various
  • 8:19 - 8:21
    bio type majors.
  • 8:21 - 8:23
    Now, I want to introduce another construct.
  • 8:23 - 8:25
    I'm going to use the same query to
  • 8:25 - 8:28
    do it, which is the construct select star.
  • 8:28 - 8:30
    So far, we've always listed
  • 8:30 - 8:31
    explicitly the attributes that we
  • 8:31 - 8:33
    want to get in the result of a query.
  • 8:33 - 8:34
    But if we simply want to get
  • 8:34 - 8:38
    all attributes, then we can just write select star.
  • 8:38 - 8:39
    And when we do that, we
  • 8:39 - 8:41
    don't project away any attributes,
  • 8:41 - 8:45
    but we get all the attributes in the result of the from and where expression.
  • 8:45 - 8:48
    While we're at it, let's do a gigantic query.
  • 8:48 - 8:49
    We'll just do the cross-product
  • 8:49 - 8:51
    and student college without any
  • 8:51 - 8:52
    combination, and we'll do
  • 8:52 - 8:54
    select star to get all the attributes out.
  • 8:54 - 8:56
    So, here goes, and you can
  • 8:56 - 8:57
    see, we get all the attributes
  • 8:57 - 9:00
    and we get a whole lot of tuples as well.
  • 9:00 - 9:01
    Our last query is going to
  • 9:01 - 9:03
    demonstrate the ability to use
  • 9:03 - 9:05
    arithmetic within SQL clauses.
  • 9:05 - 9:07
    So we see here a query
  • 9:07 - 9:08
    that selects all the information
  • 9:08 - 9:10
    from the student relation but adds
  • 9:10 - 9:13
    to it a scaled GPA where
  • 9:13 - 9:14
    we're going to boost the student's
  • 9:14 - 9:15
    GPA if they're from a big
  • 9:15 - 9:18
    high school and reduce it if they're from a small one.
  • 9:18 - 9:20
    Specifically, we'll take their GPA, multiply
  • 9:20 - 9:21
    it by the size high school divided by a thousand.
  • 9:21 - 9:23
    So, let's run this
  • 9:23 - 9:24
    query and you can see
  • 9:24 - 9:25
    that we have the whole student table
  • 9:25 - 9:28
    here with an additional column that
  • 9:28 - 9:31
    has scaled their GPA based on the size of their high school.
  • 9:31 - 9:33
    Now, if we don't like the
  • 9:33 - 9:34
    label on this column, we
  • 9:34 - 9:36
    could change it and so
  • 9:36 - 9:37
    I'll use this query as an
  • 9:37 - 9:38
    example to demonstrate the 'as'
  • 9:38 - 9:40
    feature which allows us
  • 9:40 - 9:43
    to change the labeling of the schema in a query result.
  • 9:43 - 9:45
    Let's say as scaled GPA,
  • 9:45 - 9:46
    and we should get the same
  • 9:46 - 9:49
    result with a more nicely labeled attribute.
  • 9:49 - 9:52
    That concludes our video introducing the basic select statement.
  • 9:52 - 9:53
    We'll see many other features in
  • 9:53 -
    the upcoming six videos on SQL.
Title:
06-02-basic-select-statement.mp4
Video Language:
English
Duration:
09:57
Amara Bot edited English subtitles for 06-02-basic-select-statement.mp4
Amara Bot added a translation

English subtitles

Revisions