0:00:01.050,0:00:02.890 In this video, we'll learn about the Relational Model. 0:00:03.850,0:00:05.340 The Relational Model is more than 0:00:05.570,0:00:07.490 35 years old, and it's 0:00:07.690,0:00:09.370 really the foundation of database management systems. 0:00:09.920,0:00:12.010 It's spawned a many billion dollar industry. 0:00:13.610,0:00:15.420 The relational model underlies all 0:00:15.700,0:00:17.800 commercial database systems at this point in time. 0:00:18.550,0:00:21.360 It's actually an extremely simple model and that's one of its benefits. 0:00:22.410,0:00:23.740 Furthermore, it can be queried. 0:00:24.500,0:00:25.320 By that I mean we can ask 0:00:25.560,0:00:26.920 questions of databases in the 0:00:26.980,0:00:28.860 model using High Level Languages. 0:00:29.990,0:00:31.470 High Level Languages are simple, yet 0:00:31.840,0:00:34.280 extremely expressive for asking questions over the database. 0:00:35.380,0:00:36.900 And finally, very importantly there 0:00:37.450,0:00:39.800 are extremely efficient implementations of 0:00:39.970,0:00:41.700 the relational model and of 0:00:41.890,0:00:43.290 the query languages on that model. 0:00:43.760,0:00:45.000 So let's move ahead and 0:00:45.130,0:00:47.440 understand the basic constructs in the relational model. 0:00:48.660,0:00:51.440 So, the primary construct is in fact, the relation. 0:00:51.980,0:00:53.390 A database consists of a 0:00:53.430,0:00:54.930 set of relations or sometimes 0:00:55.390,0:00:57.530 referred to as "tables", each of which has a name. 0:00:58.400,0:01:00.620 So, we're gonna use two relations in our example. 0:01:01.990,0:01:03.130 Our example is gonna be a fictitious database 0:01:03.580,0:01:05.100 about students applying to colleges. 0:01:05.480,0:01:07.470 For now we're just gonna look at the students and colleges themselves. 0:01:08.500,0:01:09.460 So we're gonna have two tables, 0:01:09.970,0:01:11.180 and let's call those tables 0:01:11.850,0:01:14.230 the Student table and the College table. 0:01:15.340,0:01:16.420 Now, as an aside, there's 0:01:16.620,0:01:17.830 a healthy debate in the 0:01:18.230,0:01:19.820 database world about whether tables 0:01:20.310,0:01:22.620 relations ought to be named using the singular or the plural. 0:01:23.310,0:01:24.980 I personally don't have a stake in that debate. 0:01:25.200,0:01:26.050 I'm going to use the singular. 0:01:27.980,0:01:29.750 Next, we have the concept of attributes. 0:01:30.340,0:01:31.790 So every relation and relational 0:01:32.270,0:01:33.870 database has a predefined set 0:01:34.390,0:01:37.110 of columns or attributes each of which has a name. 0:01:37.760,0:01:39.110 So, for our student table, 0:01:39.790,0:01:40.720 let's say that each student 0:01:41.120,0:01:42.010 is gonna have an ID, a 0:01:42.840,0:01:46.310 name, a GPA and a photo. 0:01:46.990,0:01:48.120 And for our college 0:01:48.710,0:01:49.830 table, let's say that every 0:01:50.030,0:01:50.880 college is going to have 0:01:51.000,0:01:54.090 a name, a state, and an enrollment. 0:01:55.170,0:01:56.440 We'll just abbreviate that ENR. 0:01:57.660,0:01:58.530 So those are the labeled columns. 0:01:59.810,0:02:01.010 Now the actual data itself 0:02:01.500,0:02:02.320 is stored in what are 0:02:02.430,0:02:04.740 called the tuples (or the rows) in the tables. 0:02:05.690,0:02:06.820 So let's put a couple 0:02:06.990,0:02:09.180 of the data tables, data tuples in our tables. 0:02:09.300,0:02:11.530 So let's start with the 0:02:12.140,0:02:13.630 students and lets say that 0:02:14.120,0:02:15.430 our first student has ID 123. 0:02:15.600,0:02:18.050 Name is Amy - 0:02:18.480,0:02:21.790 GPA 3.9, and she's happy with that. 0:02:21.900,0:02:22.600 So she has a smiley photo. 0:02:23.720,0:02:25.290 And our second student is 234, 0:02:26.130,0:02:28.320 Bob - his GPA 0:02:29.680,0:02:29.880 is 3.4. 0:02:30.080,0:02:30.850 He's not quite as happy. 0:02:32.050,0:02:32.970 And typically of course a 0:02:33.130,0:02:35.090 table will have thousands, maybe 0:02:35.360,0:02:37.070 millions, even sometimes billions of 0:02:37.210,0:02:40.390 rows, each row containing a value for each attribute. 0:02:41.310,0:02:42.620 In our college table, let's suppose 0:02:43.110,0:02:44.030 we have .... well of course 0:02:44.180,0:02:46.630 we're going to start with Stanford in 0:02:46.770,0:02:49.720 the state of California and Stanford's enrollment is 15,000. 0:02:52.310,0:02:53.870 We'll include our cross-bay rival Berkeley 0:02:54.770,0:02:55.520 again, in the state of California. 0:02:56.250,0:02:58.110 Berkeley's enrollment is a whopping 36,000. 0:02:58.660,0:03:00.570 And last of all, 0:03:01.270,0:03:03.810 we are going to not be West Coast biased. 0:03:04.170,0:03:05.500 We'll include MIT in the 0:03:05.560,0:03:07.560 state of Massachusetts with an enrollment of 10,000. 0:03:08.330,0:03:09.530 Now of course there's 0:03:09.670,0:03:11.040 gonna be again many more 0:03:11.100,0:03:12.270 tuples in the college table, and 0:03:12.400,0:03:13.800 many more tuples in the student table. 0:03:15.050,0:03:16.640 Okay, next, let me 0:03:16.870,0:03:18.220 mention that in a relational 0:03:18.710,0:03:19.900 database, typically each attribute or 0:03:19.930,0:03:21.210 column has a type 0:03:21.910,0:03:23.040 sometimes referred to as a domain. 0:03:23.940,0:03:25.040 For example, the ID might 0:03:25.190,0:03:26.350 be an integer, the name 0:03:26.600,0:03:27.920 might be a string, GPA might 0:03:28.180,0:03:29.600 be a float, photo might 0:03:29.930,0:03:31.670 be a jpeg file. 0:03:32.930,0:03:34.160 We do also in most relational 0:03:34.580,0:03:36.530 databases have a concept of enumerated domain. 0:03:37.010,0:03:38.430 So for example, the state might 0:03:38.720,0:03:40.660 be an enumerated domain for 0:03:40.990,0:03:42.550 the 50 abbreviations for states. 0:03:43.510,0:03:45.020 Now, it's typical for relational 0:03:45.040,0:03:46.480 databases to have just 0:03:46.670,0:03:47.920 atomic types in their 0:03:48.380,0:03:50.180 attributes as we have 0:03:50.400,0:03:52.070 here, but many database 0:03:52.440,0:03:53.740 systems do also support structured 0:03:54.280,0:03:55.460 types inside attributes. 0:03:58.550,0:03:59.390 Okay, a little bit more terminology. 0:04:00.300,0:04:01.340 The schema of a 0:04:01.380,0:04:04.110 database is the structure of the relation. 0:04:04.590,0:04:07.010 So the schema includes the name 0:04:07.430,0:04:09.000 of the relation and the attributes 0:04:09.290,0:04:11.380 of the relation and the types of those attributes. 0:04:12.140,0:04:13.310 Where the instance is the 0:04:13.410,0:04:15.990 actual contents of the table at a given point in time. 0:04:16.350,0:04:17.240 So, typically you set up 0:04:17.360,0:04:19.020 a schema in advance, then the 0:04:19.490,0:04:21.810 instances of the data will change over time. 0:04:23.920,0:04:25.860 Now, I mentioned that most columns have types. 0:04:26.660,0:04:28.000 But, there's also a special value 0:04:28.480,0:04:29.680 that's in any type of 0:04:29.760,0:04:31.240 any column and that's a 0:04:31.350,0:04:32.590 special value known as null, and nulls 0:04:32.730,0:04:34.340 are actually quite important in relational databases. 0:04:35.390,0:04:36.760 Null values are used to 0:04:36.910,0:04:38.270 denote that a particular 0:04:38.980,0:04:41.280 value is maybe unknown or undefined. 0:04:42.650,0:04:44.220 And, so let's suppose, let's add 0:04:44.580,0:04:46.930 another tuple to our database. 0:04:47.630,0:04:49.680 Let's say 345, another student 0:04:50.120,0:04:51.670 named Craig, and for, 0:04:51.810,0:04:53.220 whatever reason Craig doesn't have a GPA. 0:04:53.530,0:04:54.630 Maybe Craig is home schooled, 0:04:55.120,0:04:56.810 maybe Craig doesn't want to reveal his GPA. 0:04:57.540,0:04:58.740 So then the database would contain 0:04:59.210,0:05:00.530 a null value for Craig 0:05:01.620,0:05:03.310 and we'll just put a neutral face there. 0:05:04.430,0:05:06.930 Or, for example, maybe Bob doesn't 0:05:07.220,0:05:08.250 want to have his photo in 0:05:08.340,0:05:09.730 the database, so then Bob 0:05:10.560,0:05:12.100 would have a null value 0:05:12.810,0:05:14.890 for his photo, again nulls can go anywhere. 0:05:15.690,0:05:16.750 Now null values are useful 0:05:17.160,0:05:17.850 but one has to be very 0:05:18.290,0:05:19.570 careful in a database system 0:05:19.920,0:05:21.020 when you run queries 0:05:21.660,0:05:23.200 over relations that have null values. 0:05:23.650,0:05:24.710 In a later video we'll go 0:05:24.820,0:05:25.910 into this in more detail, but I 0:05:25.990,0:05:26.690 just wanted to give a just 0:05:26.940,0:05:28.260 sort of example of what can happen. 0:05:29.130,0:05:30.290 So, let's suppose we're 0:05:30.360,0:05:31.610 asking a query over our 0:05:31.800,0:05:32.900 student table of all students 0:05:33.410,0:05:35.060 whose GPA is greater than 3.5. 0:05:36.040,0:05:37.330 So when we run 0:05:37.460,0:05:38.380 that query on our database 0:05:38.930,0:05:40.020 obviously we'll get Amy out, 0:05:40.720,0:05:43.070 obviously we won't get Bob out, but should we get Craig? 0:05:43.490,0:05:44.500 The answer is No. 0:05:44.940,0:05:45.840 We don't know for a fact 0:05:46.070,0:05:47.140 that Craig's GPA is greater 0:05:47.400,0:05:48.840 than 3.5, so we'll only get 0:05:48.980,0:05:50.580 one student out from that query. 0:05:50.950,0:05:52.640 Now let's suppose we 0:05:52.740,0:05:53.750 had another query, where we 0:05:53.830,0:05:54.900 were gonna ask for the GPA 0:05:55.760,0:05:56.940 less than or equal to 3.5. 0:05:57.590,0:05:59.360 So, similarly where we 0:05:59.660,0:06:00.640 would not have Amy in result 0:06:01.090,0:06:02.210 and we would certainly have Bob in 0:06:02.330,0:06:04.080 the result and similarly would 0:06:04.230,0:06:05.230 not have Craig in the 0:06:05.400,0:06:06.380 result because we don't know that 0:06:06.490,0:06:08.160 his GPA is less than or equal to 3.5. 0:06:08.720,0:06:10.070 So far so good, but 0:06:10.400,0:06:11.630 it gets a little weird is 0:06:11.760,0:06:12.830 when we add an or here 0:06:13.050,0:06:14.240 in our query, we say I 0:06:14.640,0:06:15.700 want everyone who's GPA is greater 0:06:16.000,0:06:18.700 than 3.5 or who's GPA is less than or equal to 3.5. 0:06:18.970,0:06:20.660 And even though it 0:06:20.750,0:06:22.120 looks like every tuple should 0:06:22.500,0:06:23.790 satisfy this condition, that it's 0:06:23.920,0:06:26.320 always true, that's not the case when we have null values. 0:06:27.070,0:06:28.030 So, that's why one has to 0:06:28.130,0:06:29.160 be careful when one uses 0:06:29.460,0:06:31.110 null values in relational databases. 0:06:32.360,0:06:33.830 Let me erase this now and 0:06:33.970,0:06:35.110 let's move on to our next concept 0:06:35.740,0:06:37.060 which is the concept of Key. 0:06:38.210,0:06:41.160 Key is again another important concept in relational databases. 0:06:42.480,0:06:43.590 And, a key is an 0:06:43.990,0:06:45.070 attribute in of a relation 0:06:45.410,0:06:48.600 where every value for that attribute is unique. 0:06:49.540,0:06:50.330 So if we look at the 0:06:50.410,0:06:51.960 student relation, we can feel 0:06:52.150,0:06:54.560 pretty confident that the ID is going to be a key. 0:06:54.960,0:06:56.120 In other words, every tuple is 0:06:56.250,0:06:58.130 going to have a unique for ID. 0:06:59.640,0:07:02.220 Thinking about the college relation, it's a little less clear. 0:07:02.610,0:07:03.640 We might be tempted to say 0:07:03.730,0:07:04.830 that the name of the college is 0:07:04.960,0:07:06.330 an ID, that actually college 0:07:06.720,0:07:09.410 names probably are not unique across the country. 0:07:10.000,0:07:10.780 There's probably a lot of or 0:07:10.880,0:07:13.130 several colleges named Washington college for example. 0:07:14.090,0:07:15.060 You know what, we're allowed to 0:07:15.270,0:07:16.720 have sets of attributes that 0:07:16.920,0:07:18.830 are unique and that makes sense in the college relation. 0:07:19.780,0:07:21.600 Most likely the combination of 0:07:21.670,0:07:22.850 the name and state of 0:07:22.930,0:07:24.200 a college is unique, and that's 0:07:24.490,0:07:26.670 what we would identify as the key for the college relation. 0:07:27.720,0:07:28.820 Now, you might wonder why it's 0:07:28.980,0:07:32.170 even important to have attributes that are identified as keys. 0:07:32.570,0:07:34.040 There's actually several uses for them. 0:07:34.700,0:07:37.110 One of them is just to identify specific tuples. 0:07:37.550,0:07:38.530 So if you want to run 0:07:38.970,0:07:39.750 a query to get a specific 0:07:40.360,0:07:41.900 tuple out of the database you 0:07:42.050,0:07:44.650 would do that by asking for that tuple by its key. 0:07:45.050,0:07:46.640 And related to that 0:07:46.870,0:07:48.010 database systems for efficiency 0:07:48.770,0:07:50.090 tend to build special index 0:07:50.560,0:07:52.810 structures or store the database in a particular way. 0:07:53.080,0:07:55.850 So it's very fast to find a tuple based on its key. 0:07:56.810,0:07:58.250 And lastly, if one relation 0:07:58.900,0:08:00.200 in a relational database wants 0:08:00.460,0:08:01.470 to refer to tuples of another, 0:08:02.110,0:08:04.360 there 's no concept of pointer in relational databases. 0:08:05.400,0:08:06.800 Therefore, the first relation 0:08:07.260,0:08:08.470 will typically refer to a 0:08:08.530,0:08:10.690 tuple in the second relation by its unique key. 0:08:11.090,0:08:13.460 As our videos develop we'll see the importance of keys. 0:08:15.100,0:08:16.310 Okay, just to wrap up, 0:08:16.960,0:08:18.520 I'll mention how one creates relations 0:08:19.100,0:08:20.990 or tables in the SQL language. 0:08:21.440,0:08:22.580 It's very simple, you just 0:08:22.890,0:08:24.420 say "create table," give the 0:08:24.500,0:08:26.200 name of the relation and a list of the attributes. 0:08:27.470,0:08:28.910 And if you want to give types for the attributes. 0:08:29.760,0:08:30.920 It's similar except you follow 0:08:31.190,0:08:32.910 each attribute name with its type. 0:08:33.970,0:08:35.100 So to wrap up, the relational 0:08:35.630,0:08:37.080 model has been around a long time. 0:08:37.440,0:08:39.090 Has started a huge industry. 0:08:39.350,0:08:40.560 It's used by all database systems. 0:08:41.490,0:08:42.330 As you've seen it's a very 0:08:42.490,0:08:43.520 simple model and will shortly 0:08:43.950,0:08:46.010 see that it can be queried with very nice languages. 0:08:46.650,0:08:48.450 And, finally, it's been implemented very efficiently.