WEBVTT 00:00:01.050 --> 00:00:02.890 In this video, we'll learn about the Relational Model. 00:00:03.850 --> 00:00:05.340 The Relational Model is more than 00:00:05.570 --> 00:00:07.490 35 years old, and it's 00:00:07.690 --> 00:00:09.370 really the foundation of database management systems. 00:00:09.920 --> 00:00:12.010 It's spawned a many billion dollar industry. 00:00:13.610 --> 00:00:15.420 The relational model underlies all 00:00:15.700 --> 00:00:17.800 commercial database systems at this point in time. 00:00:18.550 --> 00:00:21.360 It's actually an extremely simple model and that's one of its benefits. 00:00:22.410 --> 00:00:23.740 Furthermore, it can be queried. 00:00:24.500 --> 00:00:25.320 By that I mean we can ask 00:00:25.560 --> 00:00:26.920 questions of databases in the 00:00:26.980 --> 00:00:28.860 model using High Level Languages. 00:00:29.990 --> 00:00:31.470 High Level Languages are simple, yet 00:00:31.840 --> 00:00:34.280 extremely expressive for asking questions over the database. 00:00:35.380 --> 00:00:36.900 And finally, very importantly there 00:00:37.450 --> 00:00:39.800 are extremely efficient implementations of 00:00:39.970 --> 00:00:41.700 the relational model and of 00:00:41.890 --> 00:00:43.290 the query languages on that model. 00:00:43.760 --> 00:00:45.000 So let's move ahead and 00:00:45.130 --> 00:00:47.440 understand the basic constructs in the relational model. 00:00:48.660 --> 00:00:51.440 So, the primary construct is in fact, the relation. 00:00:51.980 --> 00:00:53.390 A database consists of a 00:00:53.430 --> 00:00:54.930 set of relations or sometimes 00:00:55.390 --> 00:00:57.530 referred to as "tables", each of which has a name. 00:00:58.400 --> 00:01:00.620 So, we're gonna use two relations in our example. 00:01:01.990 --> 00:01:03.130 Our example is gonna be a fictitious database 00:01:03.580 --> 00:01:05.100 about students applying to colleges. 00:01:05.480 --> 00:01:07.470 For now we're just gonna look at the students and colleges themselves. 00:01:08.500 --> 00:01:09.460 So we're gonna have two tables, 00:01:09.970 --> 00:01:11.180 and let's call those tables 00:01:11.850 --> 00:01:14.230 the Student table and the College table. 00:01:15.340 --> 00:01:16.420 Now, as an aside, there's 00:01:16.620 --> 00:01:17.830 a healthy debate in the 00:01:18.230 --> 00:01:19.820 database world about whether tables 00:01:20.310 --> 00:01:22.620 relations ought to be named using the singular or the plural. 00:01:23.310 --> 00:01:24.980 I personally don't have a stake in that debate. 00:01:25.200 --> 00:01:26.050 I'm going to use the singular. 00:01:27.980 --> 00:01:29.750 Next, we have the concept of attributes. 00:01:30.340 --> 00:01:31.790 So every relation and relational 00:01:32.270 --> 00:01:33.870 database has a predefined set 00:01:34.390 --> 00:01:37.110 of columns or attributes each of which has a name. 00:01:37.760 --> 00:01:39.110 So, for our student table, 00:01:39.790 --> 00:01:40.720 let's say that each student 00:01:41.120 --> 00:01:42.010 is gonna have an ID, a 00:01:42.840 --> 00:01:46.310 name, a GPA and a photo. 00:01:46.990 --> 00:01:48.120 And for our college 00:01:48.710 --> 00:01:49.830 table, let's say that every 00:01:50.030 --> 00:01:50.880 college is going to have 00:01:51.000 --> 00:01:54.090 a name, a state, and an enrollment. 00:01:55.170 --> 00:01:56.440 We'll just abbreviate that ENR. 00:01:57.660 --> 00:01:58.530 So those are the labeled columns. 00:01:59.810 --> 00:02:01.010 Now the actual data itself 00:02:01.500 --> 00:02:02.320 is stored in what are 00:02:02.430 --> 00:02:04.740 called the tuples (or the rows) in the tables. 00:02:05.690 --> 00:02:06.820 So let's put a couple 00:02:06.990 --> 00:02:09.180 of the data tables, data tuples in our tables. 00:02:09.300 --> 00:02:11.530 So let's start with the 00:02:12.140 --> 00:02:13.630 students and lets say that 00:02:14.120 --> 00:02:15.430 our first student has ID 123. 00:02:15.600 --> 00:02:18.050 Name is Amy - 00:02:18.480 --> 00:02:21.790 GPA 3.9, and she's happy with that. 00:02:21.900 --> 00:02:22.600 So she has a smiley photo. 00:02:23.720 --> 00:02:25.290 And our second student is 234, 00:02:26.130 --> 00:02:28.320 Bob - his GPA 00:02:29.680 --> 00:02:29.880 is 3.4. 00:02:30.080 --> 00:02:30.850 He's not quite as happy. 00:02:32.050 --> 00:02:32.970 And typically of course a 00:02:33.130 --> 00:02:35.090 table will have thousands, maybe 00:02:35.360 --> 00:02:37.070 millions, even sometimes billions of 00:02:37.210 --> 00:02:40.390 rows, each row containing a value for each attribute. 00:02:41.310 --> 00:02:42.620 In our college table, let's suppose 00:02:43.110 --> 00:02:44.030 we have .... well of course 00:02:44.180 --> 00:02:46.630 we're going to start with Stanford in 00:02:46.770 --> 00:02:49.720 the state of California and Stanford's enrollment is 15,000. 00:02:52.310 --> 00:02:53.870 We'll include our cross-bay rival Berkeley 00:02:54.770 --> 00:02:55.520 again, in the state of California. 00:02:56.250 --> 00:02:58.110 Berkeley's enrollment is a whopping 36,000. 00:02:58.660 --> 00:03:00.570 And last of all, 00:03:01.270 --> 00:03:03.810 we are going to not be West Coast biased. 00:03:04.170 --> 00:03:05.500 We'll include MIT in the 00:03:05.560 --> 00:03:07.560 state of Massachusetts with an enrollment of 10,000. 00:03:08.330 --> 00:03:09.530 Now of course there's 00:03:09.670 --> 00:03:11.040 gonna be again many more 00:03:11.100 --> 00:03:12.270 tuples in the college table, and 00:03:12.400 --> 00:03:13.800 many more tuples in the student table. 00:03:15.050 --> 00:03:16.640 Okay, next, let me 00:03:16.870 --> 00:03:18.220 mention that in a relational 00:03:18.710 --> 00:03:19.900 database, typically each attribute or 00:03:19.930 --> 00:03:21.210 column has a type 00:03:21.910 --> 00:03:23.040 sometimes referred to as a domain. 00:03:23.940 --> 00:03:25.040 For example, the ID might 00:03:25.190 --> 00:03:26.350 be an integer, the name 00:03:26.600 --> 00:03:27.920 might be a string, GPA might 00:03:28.180 --> 00:03:29.600 be a float, photo might 00:03:29.930 --> 00:03:31.670 be a jpeg file. 00:03:32.930 --> 00:03:34.160 We do also in most relational 00:03:34.580 --> 00:03:36.530 databases have a concept of enumerated domain. 00:03:37.010 --> 00:03:38.430 So for example, the state might 00:03:38.720 --> 00:03:40.660 be an enumerated domain for 00:03:40.990 --> 00:03:42.550 the 50 abbreviations for states. 00:03:43.510 --> 00:03:45.020 Now, it's typical for relational 00:03:45.040 --> 00:03:46.480 databases to have just 00:03:46.670 --> 00:03:47.920 atomic types in their 00:03:48.380 --> 00:03:50.180 attributes as we have 00:03:50.400 --> 00:03:52.070 here, but many database 00:03:52.440 --> 00:03:53.740 systems do also support structured 00:03:54.280 --> 00:03:55.460 types inside attributes. 00:03:58.550 --> 00:03:59.390 Okay, a little bit more terminology. 00:04:00.300 --> 00:04:01.340 The schema of a 00:04:01.380 --> 00:04:04.110 database is the structure of the relation. 00:04:04.590 --> 00:04:07.010 So the schema includes the name 00:04:07.430 --> 00:04:09.000 of the relation and the attributes 00:04:09.290 --> 00:04:11.380 of the relation and the types of those attributes. 00:04:12.140 --> 00:04:13.310 Where the instance is the 00:04:13.410 --> 00:04:15.990 actual contents of the table at a given point in time. 00:04:16.350 --> 00:04:17.240 So, typically you set up 00:04:17.360 --> 00:04:19.020 a schema in advance, then the 00:04:19.490 --> 00:04:21.810 instances of the data will change over time. 00:04:23.920 --> 00:04:25.860 Now, I mentioned that most columns have types. 00:04:26.660 --> 00:04:28.000 But, there's also a special value 00:04:28.480 --> 00:04:29.680 that's in any type of 00:04:29.760 --> 00:04:31.240 any column and that's a 00:04:31.350 --> 00:04:32.590 special value known as null, and nulls 00:04:32.730 --> 00:04:34.340 are actually quite important in relational databases. 00:04:35.390 --> 00:04:36.760 Null values are used to 00:04:36.910 --> 00:04:38.270 denote that a particular 00:04:38.980 --> 00:04:41.280 value is maybe unknown or undefined. 00:04:42.650 --> 00:04:44.220 And, so let's suppose, let's add 00:04:44.580 --> 00:04:46.930 another tuple to our database. 00:04:47.630 --> 00:04:49.680 Let's say 345, another student 00:04:50.120 --> 00:04:51.670 named Craig, and for, 00:04:51.810 --> 00:04:53.220 whatever reason Craig doesn't have a GPA. 00:04:53.530 --> 00:04:54.630 Maybe Craig is home schooled, 00:04:55.120 --> 00:04:56.810 maybe Craig doesn't want to reveal his GPA. 00:04:57.540 --> 00:04:58.740 So then the database would contain 00:04:59.210 --> 00:05:00.530 a null value for Craig 00:05:01.620 --> 00:05:03.310 and we'll just put a neutral face there. 00:05:04.430 --> 00:05:06.930 Or, for example, maybe Bob doesn't 00:05:07.220 --> 00:05:08.250 want to have his photo in 00:05:08.340 --> 00:05:09.730 the database, so then Bob 00:05:10.560 --> 00:05:12.100 would have a null value 00:05:12.810 --> 00:05:14.890 for his photo, again nulls can go anywhere. 00:05:15.690 --> 00:05:16.750 Now null values are useful 00:05:17.160 --> 00:05:17.850 but one has to be very 00:05:18.290 --> 00:05:19.570 careful in a database system 00:05:19.920 --> 00:05:21.020 when you run queries 00:05:21.660 --> 00:05:23.200 over relations that have null values. 00:05:23.650 --> 00:05:24.710 In a later video we'll go 00:05:24.820 --> 00:05:25.910 into this in more detail, but I 00:05:25.990 --> 00:05:26.690 just wanted to give a just 00:05:26.940 --> 00:05:28.260 sort of example of what can happen. 00:05:29.130 --> 00:05:30.290 So, let's suppose we're 00:05:30.360 --> 00:05:31.610 asking a query over our 00:05:31.800 --> 00:05:32.900 student table of all students 00:05:33.410 --> 00:05:35.060 whose GPA is greater than 3.5. 00:05:36.040 --> 00:05:37.330 So when we run 00:05:37.460 --> 00:05:38.380 that query on our database 00:05:38.930 --> 00:05:40.020 obviously we'll get Amy out, 00:05:40.720 --> 00:05:43.070 obviously we won't get Bob out, but should we get Craig? 00:05:43.490 --> 00:05:44.500 The answer is No. 00:05:44.940 --> 00:05:45.840 We don't know for a fact 00:05:46.070 --> 00:05:47.140 that Craig's GPA is greater 00:05:47.400 --> 00:05:48.840 than 3.5, so we'll only get 00:05:48.980 --> 00:05:50.580 one student out from that query. 00:05:50.950 --> 00:05:52.640 Now let's suppose we 00:05:52.740 --> 00:05:53.750 had another query, where we 00:05:53.830 --> 00:05:54.900 were gonna ask for the GPA 00:05:55.760 --> 00:05:56.940 less than or equal to 3.5. 00:05:57.590 --> 00:05:59.360 So, similarly where we 00:05:59.660 --> 00:06:00.640 would not have Amy in result 00:06:01.090 --> 00:06:02.210 and we would certainly have Bob in 00:06:02.330 --> 00:06:04.080 the result and similarly would 00:06:04.230 --> 00:06:05.230 not have Craig in the 00:06:05.400 --> 00:06:06.380 result because we don't know that 00:06:06.490 --> 00:06:08.160 his GPA is less than or equal to 3.5. 00:06:08.720 --> 00:06:10.070 So far so good, but 00:06:10.400 --> 00:06:11.630 it gets a little weird is 00:06:11.760 --> 00:06:12.830 when we add an or here 00:06:13.050 --> 00:06:14.240 in our query, we say I 00:06:14.640 --> 00:06:15.700 want everyone who's GPA is greater 00:06:16.000 --> 00:06:18.700 than 3.5 or who's GPA is less than or equal to 3.5. 00:06:18.970 --> 00:06:20.660 And even though it 00:06:20.750 --> 00:06:22.120 looks like every tuple should 00:06:22.500 --> 00:06:23.790 satisfy this condition, that it's 00:06:23.920 --> 00:06:26.320 always true, that's not the case when we have null values. 00:06:27.070 --> 00:06:28.030 So, that's why one has to 00:06:28.130 --> 00:06:29.160 be careful when one uses 00:06:29.460 --> 00:06:31.110 null values in relational databases. 00:06:32.360 --> 00:06:33.830 Let me erase this now and 00:06:33.970 --> 00:06:35.110 let's move on to our next concept 00:06:35.740 --> 00:06:37.060 which is the concept of Key. 00:06:38.210 --> 00:06:41.160 Key is again another important concept in relational databases. 00:06:42.480 --> 00:06:43.590 And, a key is an 00:06:43.990 --> 00:06:45.070 attribute in of a relation 00:06:45.410 --> 00:06:48.600 where every value for that attribute is unique. 00:06:49.540 --> 00:06:50.330 So if we look at the 00:06:50.410 --> 00:06:51.960 student relation, we can feel 00:06:52.150 --> 00:06:54.560 pretty confident that the ID is going to be a key. 00:06:54.960 --> 00:06:56.120 In other words, every tuple is 00:06:56.250 --> 00:06:58.130 going to have a unique for ID. 00:06:59.640 --> 00:07:02.220 Thinking about the college relation, it's a little less clear. 00:07:02.610 --> 00:07:03.640 We might be tempted to say 00:07:03.730 --> 00:07:04.830 that the name of the college is 00:07:04.960 --> 00:07:06.330 an ID, that actually college 00:07:06.720 --> 00:07:09.410 names probably are not unique across the country. 00:07:10.000 --> 00:07:10.780 There's probably a lot of or 00:07:10.880 --> 00:07:13.130 several colleges named Washington college for example. 00:07:14.090 --> 00:07:15.060 You know what, we're allowed to 00:07:15.270 --> 00:07:16.720 have sets of attributes that 00:07:16.920 --> 00:07:18.830 are unique and that makes sense in the college relation. 00:07:19.780 --> 00:07:21.600 Most likely the combination of 00:07:21.670 --> 00:07:22.850 the name and state of 00:07:22.930 --> 00:07:24.200 a college is unique, and that's 00:07:24.490 --> 00:07:26.670 what we would identify as the key for the college relation. 00:07:27.720 --> 00:07:28.820 Now, you might wonder why it's 00:07:28.980 --> 00:07:32.170 even important to have attributes that are identified as keys. 00:07:32.570 --> 00:07:34.040 There's actually several uses for them. 00:07:34.700 --> 00:07:37.110 One of them is just to identify specific tuples. 00:07:37.550 --> 00:07:38.530 So if you want to run 00:07:38.970 --> 00:07:39.750 a query to get a specific 00:07:40.360 --> 00:07:41.900 tuple out of the database you 00:07:42.050 --> 00:07:44.650 would do that by asking for that tuple by its key. 00:07:45.050 --> 00:07:46.640 And related to that 00:07:46.870 --> 00:07:48.010 database systems for efficiency 00:07:48.770 --> 00:07:50.090 tend to build special index 00:07:50.560 --> 00:07:52.810 structures or store the database in a particular way. 00:07:53.080 --> 00:07:55.850 So it's very fast to find a tuple based on its key. 00:07:56.810 --> 00:07:58.250 And lastly, if one relation 00:07:58.900 --> 00:08:00.200 in a relational database wants 00:08:00.460 --> 00:08:01.470 to refer to tuples of another, 00:08:02.110 --> 00:08:04.360 there 's no concept of pointer in relational databases. 00:08:05.400 --> 00:08:06.800 Therefore, the first relation 00:08:07.260 --> 00:08:08.470 will typically refer to a 00:08:08.530 --> 00:08:10.690 tuple in the second relation by its unique key. 00:08:11.090 --> 00:08:13.460 As our videos develop we'll see the importance of keys. 00:08:15.100 --> 00:08:16.310 Okay, just to wrap up, 00:08:16.960 --> 00:08:18.520 I'll mention how one creates relations 00:08:19.100 --> 00:08:20.990 or tables in the SQL language. 00:08:21.440 --> 00:08:22.580 It's very simple, you just 00:08:22.890 --> 00:08:24.420 say "create table," give the 00:08:24.500 --> 00:08:26.200 name of the relation and a list of the attributes. 00:08:27.470 --> 00:08:28.910 And if you want to give types for the attributes. 00:08:29.760 --> 00:08:30.920 It's similar except you follow 00:08:31.190 --> 00:08:32.910 each attribute name with its type. 00:08:33.970 --> 00:08:35.100 So to wrap up, the relational 00:08:35.630 --> 00:08:37.080 model has been around a long time. 00:08:37.440 --> 00:08:39.090 Has started a huge industry. 00:08:39.350 --> 00:08:40.560 It's used by all database systems. 00:08:41.490 --> 00:08:42.330 As you've seen it's a very 00:08:42.490 --> 00:08:43.520 simple model and will shortly 00:08:43.950 --> 00:08:46.010 see that it can be queried with very nice languages. 00:08:46.650 --> 00:08:48.450 And, finally, it's been implemented very efficiently.