1 00:00:01,050 --> 00:00:02,890 In this video, we'll learn about the Relational Model. 2 00:00:03,850 --> 00:00:05,340 The Relational Model is more than 3 00:00:05,570 --> 00:00:07,490 35 years old, and it's 4 00:00:07,690 --> 00:00:09,370 really the foundation of database management systems. 5 00:00:09,920 --> 00:00:12,010 It's spawned a many billion dollar industry. 6 00:00:13,610 --> 00:00:15,420 The relational model underlies all 7 00:00:15,700 --> 00:00:17,800 commercial database systems at this point in time. 8 00:00:18,550 --> 00:00:21,360 It's actually an extremely simple model and that's one of its benefits. 9 00:00:22,410 --> 00:00:23,740 Furthermore, it can be queried. 10 00:00:24,500 --> 00:00:25,320 By that I mean we can ask 11 00:00:25,560 --> 00:00:26,920 questions of databases in the 12 00:00:26,980 --> 00:00:28,860 model using High Level Languages. 13 00:00:29,990 --> 00:00:31,470 High Level Languages are simple, yet 14 00:00:31,840 --> 00:00:34,280 extremely expressive for asking questions over the database. 15 00:00:35,380 --> 00:00:36,900 And finally, very importantly there 16 00:00:37,450 --> 00:00:39,800 are extremely efficient implementations of 17 00:00:39,970 --> 00:00:41,700 the relational model and of 18 00:00:41,890 --> 00:00:43,290 the query languages on that model. 19 00:00:43,760 --> 00:00:45,000 So let's move ahead and 20 00:00:45,130 --> 00:00:47,440 understand the basic constructs in the relational model. 21 00:00:48,660 --> 00:00:51,440 So, the primary construct is in fact, the relation. 22 00:00:51,980 --> 00:00:53,390 A database consists of a 23 00:00:53,430 --> 00:00:54,930 set of relations or sometimes 24 00:00:55,390 --> 00:00:57,530 referred to as "tables", each of which has a name. 25 00:00:58,400 --> 00:01:00,620 So, we're gonna use two relations in our example. 26 00:01:01,990 --> 00:01:03,130 Our example is gonna be a fictitious database 27 00:01:03,580 --> 00:01:05,100 about students applying to colleges. 28 00:01:05,480 --> 00:01:07,470 For now we're just gonna look at the students and colleges themselves. 29 00:01:08,500 --> 00:01:09,460 So we're gonna have two tables, 30 00:01:09,970 --> 00:01:11,180 and let's call those tables 31 00:01:11,850 --> 00:01:14,230 the Student table and the College table. 32 00:01:15,340 --> 00:01:16,420 Now, as an aside, there's 33 00:01:16,620 --> 00:01:17,830 a healthy debate in the 34 00:01:18,230 --> 00:01:19,820 database world about whether tables 35 00:01:20,310 --> 00:01:22,620 relations ought to be named using the singular or the plural. 36 00:01:23,310 --> 00:01:24,980 I personally don't have a stake in that debate. 37 00:01:25,200 --> 00:01:26,050 I'm going to use the singular. 38 00:01:27,980 --> 00:01:29,750 Next, we have the concept of attributes. 39 00:01:30,340 --> 00:01:31,790 So every relation and relational 40 00:01:32,270 --> 00:01:33,870 database has a predefined set 41 00:01:34,390 --> 00:01:37,110 of columns or attributes each of which has a name. 42 00:01:37,760 --> 00:01:39,110 So, for our student table, 43 00:01:39,790 --> 00:01:40,720 let's say that each student 44 00:01:41,120 --> 00:01:42,010 is gonna have an ID, a 45 00:01:42,840 --> 00:01:46,310 name, a GPA and a photo. 46 00:01:46,990 --> 00:01:48,120 And for our college 47 00:01:48,710 --> 00:01:49,830 table, let's say that every 48 00:01:50,030 --> 00:01:50,880 college is going to have 49 00:01:51,000 --> 00:01:54,090 a name, a state, and an enrollment. 50 00:01:55,170 --> 00:01:56,440 We'll just abbreviate that ENR. 51 00:01:57,660 --> 00:01:58,530 So those are the labeled columns. 52 00:01:59,810 --> 00:02:01,010 Now the actual data itself 53 00:02:01,500 --> 00:02:02,320 is stored in what are 54 00:02:02,430 --> 00:02:04,740 called the tuples (or the rows) in the tables. 55 00:02:05,690 --> 00:02:06,820 So let's put a couple 56 00:02:06,990 --> 00:02:09,180 of the data tables, data tuples in our tables. 57 00:02:09,300 --> 00:02:11,530 So let's start with the 58 00:02:12,140 --> 00:02:13,630 students and lets say that 59 00:02:14,120 --> 00:02:15,430 our first student has ID 123. 60 00:02:15,600 --> 00:02:18,050 Name is Amy - 61 00:02:18,480 --> 00:02:21,790 GPA 3.9, and she's happy with that. 62 00:02:21,900 --> 00:02:22,600 So she has a smiley photo. 63 00:02:23,720 --> 00:02:25,290 And our second student is 234, 64 00:02:26,130 --> 00:02:28,320 Bob - his GPA 65 00:02:29,680 --> 00:02:29,880 is 3.4. 66 00:02:30,080 --> 00:02:30,850 He's not quite as happy. 67 00:02:32,050 --> 00:02:32,970 And typically of course a 68 00:02:33,130 --> 00:02:35,090 table will have thousands, maybe 69 00:02:35,360 --> 00:02:37,070 millions, even sometimes billions of 70 00:02:37,210 --> 00:02:40,390 rows, each row containing a value for each attribute. 71 00:02:41,310 --> 00:02:42,620 In our college table, let's suppose 72 00:02:43,110 --> 00:02:44,030 we have .... well of course 73 00:02:44,180 --> 00:02:46,630 we're going to start with Stanford in 74 00:02:46,770 --> 00:02:49,720 the state of California and Stanford's enrollment is 15,000. 75 00:02:52,310 --> 00:02:53,870 We'll include our cross-bay rival Berkeley 76 00:02:54,770 --> 00:02:55,520 again, in the state of California. 77 00:02:56,250 --> 00:02:58,110 Berkeley's enrollment is a whopping 36,000. 78 00:02:58,660 --> 00:03:00,570 And last of all, 79 00:03:01,270 --> 00:03:03,810 we are going to not be West Coast biased. 80 00:03:04,170 --> 00:03:05,500 We'll include MIT in the 81 00:03:05,560 --> 00:03:07,560 state of Massachusetts with an enrollment of 10,000. 82 00:03:08,330 --> 00:03:09,530 Now of course there's 83 00:03:09,670 --> 00:03:11,040 gonna be again many more 84 00:03:11,100 --> 00:03:12,270 tuples in the college table, and 85 00:03:12,400 --> 00:03:13,800 many more tuples in the student table. 86 00:03:15,050 --> 00:03:16,640 Okay, next, let me 87 00:03:16,870 --> 00:03:18,220 mention that in a relational 88 00:03:18,710 --> 00:03:19,900 database, typically each attribute or 89 00:03:19,930 --> 00:03:21,210 column has a type 90 00:03:21,910 --> 00:03:23,040 sometimes referred to as a domain. 91 00:03:23,940 --> 00:03:25,040 For example, the ID might 92 00:03:25,190 --> 00:03:26,350 be an integer, the name 93 00:03:26,600 --> 00:03:27,920 might be a string, GPA might 94 00:03:28,180 --> 00:03:29,600 be a float, photo might 95 00:03:29,930 --> 00:03:31,670 be a jpeg file. 96 00:03:32,930 --> 00:03:34,160 We do also in most relational 97 00:03:34,580 --> 00:03:36,530 databases have a concept of enumerated domain. 98 00:03:37,010 --> 00:03:38,430 So for example, the state might 99 00:03:38,720 --> 00:03:40,660 be an enumerated domain for 100 00:03:40,990 --> 00:03:42,550 the 50 abbreviations for states. 101 00:03:43,510 --> 00:03:45,020 Now, it's typical for relational 102 00:03:45,040 --> 00:03:46,480 databases to have just 103 00:03:46,670 --> 00:03:47,920 atomic types in their 104 00:03:48,380 --> 00:03:50,180 attributes as we have 105 00:03:50,400 --> 00:03:52,070 here, but many database 106 00:03:52,440 --> 00:03:53,740 systems do also support structured 107 00:03:54,280 --> 00:03:55,460 types inside attributes. 108 00:03:58,550 --> 00:03:59,390 Okay, a little bit more terminology. 109 00:04:00,300 --> 00:04:01,340 The schema of a 110 00:04:01,380 --> 00:04:04,110 database is the structure of the relation. 111 00:04:04,590 --> 00:04:07,010 So the schema includes the name 112 00:04:07,430 --> 00:04:09,000 of the relation and the attributes 113 00:04:09,290 --> 00:04:11,380 of the relation and the types of those attributes. 114 00:04:12,140 --> 00:04:13,310 Where the instance is the 115 00:04:13,410 --> 00:04:15,990 actual contents of the table at a given point in time. 116 00:04:16,350 --> 00:04:17,240 So, typically you set up 117 00:04:17,360 --> 00:04:19,020 a schema in advance, then the 118 00:04:19,490 --> 00:04:21,810 instances of the data will change over time. 119 00:04:23,920 --> 00:04:25,860 Now, I mentioned that most columns have types. 120 00:04:26,660 --> 00:04:28,000 But, there's also a special value 121 00:04:28,480 --> 00:04:29,680 that's in any type of 122 00:04:29,760 --> 00:04:31,240 any column and that's a 123 00:04:31,350 --> 00:04:32,590 special value known as null, and nulls 124 00:04:32,730 --> 00:04:34,340 are actually quite important in relational databases. 125 00:04:35,390 --> 00:04:36,760 Null values are used to 126 00:04:36,910 --> 00:04:38,270 denote that a particular 127 00:04:38,980 --> 00:04:41,280 value is maybe unknown or undefined. 128 00:04:42,650 --> 00:04:44,220 And, so let's suppose, let's add 129 00:04:44,580 --> 00:04:46,930 another tuple to our database. 130 00:04:47,630 --> 00:04:49,680 Let's say 345, another student 131 00:04:50,120 --> 00:04:51,670 named Craig, and for, 132 00:04:51,810 --> 00:04:53,220 whatever reason Craig doesn't have a GPA. 133 00:04:53,530 --> 00:04:54,630 Maybe Craig is home schooled, 134 00:04:55,120 --> 00:04:56,810 maybe Craig doesn't want to reveal his GPA. 135 00:04:57,540 --> 00:04:58,740 So then the database would contain 136 00:04:59,210 --> 00:05:00,530 a null value for Craig 137 00:05:01,620 --> 00:05:03,310 and we'll just put a neutral face there. 138 00:05:04,430 --> 00:05:06,930 Or, for example, maybe Bob doesn't 139 00:05:07,220 --> 00:05:08,250 want to have his photo in 140 00:05:08,340 --> 00:05:09,730 the database, so then Bob 141 00:05:10,560 --> 00:05:12,100 would have a null value 142 00:05:12,810 --> 00:05:14,890 for his photo, again nulls can go anywhere. 143 00:05:15,690 --> 00:05:16,750 Now null values are useful 144 00:05:17,160 --> 00:05:17,850 but one has to be very 145 00:05:18,290 --> 00:05:19,570 careful in a database system 146 00:05:19,920 --> 00:05:21,020 when you run queries 147 00:05:21,660 --> 00:05:23,200 over relations that have null values. 148 00:05:23,650 --> 00:05:24,710 In a later video we'll go 149 00:05:24,820 --> 00:05:25,910 into this in more detail, but I 150 00:05:25,990 --> 00:05:26,690 just wanted to give a just 151 00:05:26,940 --> 00:05:28,260 sort of example of what can happen. 152 00:05:29,130 --> 00:05:30,290 So, let's suppose we're 153 00:05:30,360 --> 00:05:31,610 asking a query over our 154 00:05:31,800 --> 00:05:32,900 student table of all students 155 00:05:33,410 --> 00:05:35,060 whose GPA is greater than 3.5. 156 00:05:36,040 --> 00:05:37,330 So when we run 157 00:05:37,460 --> 00:05:38,380 that query on our database 158 00:05:38,930 --> 00:05:40,020 obviously we'll get Amy out, 159 00:05:40,720 --> 00:05:43,070 obviously we won't get Bob out, but should we get Craig? 160 00:05:43,490 --> 00:05:44,500 The answer is No. 161 00:05:44,940 --> 00:05:45,840 We don't know for a fact 162 00:05:46,070 --> 00:05:47,140 that Craig's GPA is greater 163 00:05:47,400 --> 00:05:48,840 than 3.5, so we'll only get 164 00:05:48,980 --> 00:05:50,580 one student out from that query. 165 00:05:50,950 --> 00:05:52,640 Now let's suppose we 166 00:05:52,740 --> 00:05:53,750 had another query, where we 167 00:05:53,830 --> 00:05:54,900 were gonna ask for the GPA 168 00:05:55,760 --> 00:05:56,940 less than or equal to 3.5. 169 00:05:57,590 --> 00:05:59,360 So, similarly where we 170 00:05:59,660 --> 00:06:00,640 would not have Amy in result 171 00:06:01,090 --> 00:06:02,210 and we would certainly have Bob in 172 00:06:02,330 --> 00:06:04,080 the result and similarly would 173 00:06:04,230 --> 00:06:05,230 not have Craig in the 174 00:06:05,400 --> 00:06:06,380 result because we don't know that 175 00:06:06,490 --> 00:06:08,160 his GPA is less than or equal to 3.5. 176 00:06:08,720 --> 00:06:10,070 So far so good, but 177 00:06:10,400 --> 00:06:11,630 it gets a little weird is 178 00:06:11,760 --> 00:06:12,830 when we add an or here 179 00:06:13,050 --> 00:06:14,240 in our query, we say I 180 00:06:14,640 --> 00:06:15,700 want everyone who's GPA is greater 181 00:06:16,000 --> 00:06:18,700 than 3.5 or who's GPA is less than or equal to 3.5. 182 00:06:18,970 --> 00:06:20,660 And even though it 183 00:06:20,750 --> 00:06:22,120 looks like every tuple should 184 00:06:22,500 --> 00:06:23,790 satisfy this condition, that it's 185 00:06:23,920 --> 00:06:26,320 always true, that's not the case when we have null values. 186 00:06:27,070 --> 00:06:28,030 So, that's why one has to 187 00:06:28,130 --> 00:06:29,160 be careful when one uses 188 00:06:29,460 --> 00:06:31,110 null values in relational databases. 189 00:06:32,360 --> 00:06:33,830 Let me erase this now and 190 00:06:33,970 --> 00:06:35,110 let's move on to our next concept 191 00:06:35,740 --> 00:06:37,060 which is the concept of Key. 192 00:06:38,210 --> 00:06:41,160 Key is again another important concept in relational databases. 193 00:06:42,480 --> 00:06:43,590 And, a key is an 194 00:06:43,990 --> 00:06:45,070 attribute in of a relation 195 00:06:45,410 --> 00:06:48,600 where every value for that attribute is unique. 196 00:06:49,540 --> 00:06:50,330 So if we look at the 197 00:06:50,410 --> 00:06:51,960 student relation, we can feel 198 00:06:52,150 --> 00:06:54,560 pretty confident that the ID is going to be a key. 199 00:06:54,960 --> 00:06:56,120 In other words, every tuple is 200 00:06:56,250 --> 00:06:58,130 going to have a unique for ID. 201 00:06:59,640 --> 00:07:02,220 Thinking about the college relation, it's a little less clear. 202 00:07:02,610 --> 00:07:03,640 We might be tempted to say 203 00:07:03,730 --> 00:07:04,830 that the name of the college is 204 00:07:04,960 --> 00:07:06,330 an ID, that actually college 205 00:07:06,720 --> 00:07:09,410 names probably are not unique across the country. 206 00:07:10,000 --> 00:07:10,780 There's probably a lot of or 207 00:07:10,880 --> 00:07:13,130 several colleges named Washington college for example. 208 00:07:14,090 --> 00:07:15,060 You know what, we're allowed to 209 00:07:15,270 --> 00:07:16,720 have sets of attributes that 210 00:07:16,920 --> 00:07:18,830 are unique and that makes sense in the college relation. 211 00:07:19,780 --> 00:07:21,600 Most likely the combination of 212 00:07:21,670 --> 00:07:22,850 the name and state of 213 00:07:22,930 --> 00:07:24,200 a college is unique, and that's 214 00:07:24,490 --> 00:07:26,670 what we would identify as the key for the college relation. 215 00:07:27,720 --> 00:07:28,820 Now, you might wonder why it's 216 00:07:28,980 --> 00:07:32,170 even important to have attributes that are identified as keys. 217 00:07:32,570 --> 00:07:34,040 There's actually several uses for them. 218 00:07:34,700 --> 00:07:37,110 One of them is just to identify specific tuples. 219 00:07:37,550 --> 00:07:38,530 So if you want to run 220 00:07:38,970 --> 00:07:39,750 a query to get a specific 221 00:07:40,360 --> 00:07:41,900 tuple out of the database you 222 00:07:42,050 --> 00:07:44,650 would do that by asking for that tuple by its key. 223 00:07:45,050 --> 00:07:46,640 And related to that 224 00:07:46,870 --> 00:07:48,010 database systems for efficiency 225 00:07:48,770 --> 00:07:50,090 tend to build special index 226 00:07:50,560 --> 00:07:52,810 structures or store the database in a particular way. 227 00:07:53,080 --> 00:07:55,850 So it's very fast to find a tuple based on its key. 228 00:07:56,810 --> 00:07:58,250 And lastly, if one relation 229 00:07:58,900 --> 00:08:00,200 in a relational database wants 230 00:08:00,460 --> 00:08:01,470 to refer to tuples of another, 231 00:08:02,110 --> 00:08:04,360 there 's no concept of pointer in relational databases. 232 00:08:05,400 --> 00:08:06,800 Therefore, the first relation 233 00:08:07,260 --> 00:08:08,470 will typically refer to a 234 00:08:08,530 --> 00:08:10,690 tuple in the second relation by its unique key. 235 00:08:11,090 --> 00:08:13,460 As our videos develop we'll see the importance of keys. 236 00:08:15,100 --> 00:08:16,310 Okay, just to wrap up, 237 00:08:16,960 --> 00:08:18,520 I'll mention how one creates relations 238 00:08:19,100 --> 00:08:20,990 or tables in the SQL language. 239 00:08:21,440 --> 00:08:22,580 It's very simple, you just 240 00:08:22,890 --> 00:08:24,420 say "create table," give the 241 00:08:24,500 --> 00:08:26,200 name of the relation and a list of the attributes. 242 00:08:27,470 --> 00:08:28,910 And if you want to give types for the attributes. 243 00:08:29,760 --> 00:08:30,920 It's similar except you follow 244 00:08:31,190 --> 00:08:32,910 each attribute name with its type. 245 00:08:33,970 --> 00:08:35,100 So to wrap up, the relational 246 00:08:35,630 --> 00:08:37,080 model has been around a long time. 247 00:08:37,440 --> 00:08:39,090 Has started a huge industry. 248 00:08:39,350 --> 00:08:40,560 It's used by all database systems. 249 00:08:41,490 --> 00:08:42,330 As you've seen it's a very 250 00:08:42,490 --> 00:08:43,520 simple model and will shortly 251 00:08:43,950 --> 00:08:46,010 see that it can be queried with very nice languages. 252 00:08:46,650 --> 00:08:48,450 And, finally, it's been implemented very efficiently.