0:00:00.039,0:00:02.065 In the previous video, we learned the basics of XML. 0:00:02.065,0:00:04.036 In this video, we're 0:00:04.036,0:00:06.003 going to learn about Document Type Definitions, 0:00:06.003,0:00:11.023 also known as DTDs, and also about ID and IDREF attributes. 0:00:11.023,0:00:13.028 We learned that well-formed XML 0:00:13.028,0:00:14.072 is XML that adheres to 0:00:14.072,0:00:16.077 basic structural requirements: a single 0:00:16.077,0:00:18.073 root element, matched tags with 0:00:18.073,0:00:20.082 proper nesting, and unique 0:00:20.082,0:00:23.004 attributes within each element. 0:00:23.004,0:00:26.048 Now we're going to learn about what's known as valid XML. 0:00:26.048,0:00:27.096 Valid XML has to adhere 0:00:27.096,0:00:30.019 to the same basic structural requirements 0:00:30.019,0:00:32.000 as well-formed XML, but it 0:00:32.000,0:00:35.026 also adheres to content-specific specifications. 0:00:35.026,0:00:38.054 And we're going to learn two languages for those specifications. 0:00:38.054,0:00:39.099 One of them is Document Type 0:00:39.099,0:00:41.092 Definitions, or DTDs, and the 0:00:41.092,0:00:44.084 other, a more powerful language, is XML Schema. 0:00:44.084,0:00:46.036 Specifications in XML 0:00:46.036,0:00:50.075 Schema are known as XSDs, for XML Schema Definitions. 0:00:50.075,0:00:52.003 So as a reminder, here's how 0:00:52.003,0:00:54.039 things worked with well-formed XML documents. 0:00:54.039,0:00:55.073 We sent the document to a 0:00:55.073,0:00:57.014 parser, and the parser would 0:00:57.014,0:00:58.038 either return that the document 0:00:58.038,0:01:02.001 was not well-formed or it would return parsed XML. 0:01:02.001,0:01:03.099 Now let's consider what happens with valid XML. 0:01:03.099,0:01:05.092 Now we use a validating 0:01:05.092,0:01:07.011 XML parser, and we have 0:01:07.011,0:01:08.032 an additional input to the 0:01:08.032,0:01:10.005 process, which is a 0:01:10.005,0:01:12.096 specification, either a DTD or an XSD.
0:01:12.096,0:01:15.049 So that's also fed to the parser, along with the document. 0:01:15.049,0:01:17.007 The parser can again 0:01:17.007,0:01:18.052 say the document is 0:01:18.052,0:01:22.006 not well-formed if it doesn't meet the basic structural requirements. 0:01:22.006,0:01:23.019 It could also say that the 0:01:23.019,0:01:24.075 document is not valid, meaning 0:01:24.075,0:01:26.039 the structure of the document doesn't 0:01:26.039,0:01:28.006 match the content-specific specification. 0:01:28.006,0:01:30.033 If everything is good, then 0:01:30.033,0:01:33.025 once again "parsed XML" is returned. 0:01:33.025,0:01:36.048 Now let's talk about document type definitions, or DTDs. 0:01:36.048,0:01:37.041 We see a DTD in 0:01:37.041,0:01:38.046 the lower-left corner of the 0:01:38.046,0:01:39.057 video, but we won't look 0:01:39.057,0:01:40.091 at it in any detail, because we'll 0:01:40.091,0:01:44.008 be doing demos of DTDs a little later on. 0:01:44.008,0:01:45.004 A DTD is a language 0:01:45.004,0:01:47.078 that's kind of like a grammar, and 0:01:47.078,0:01:49.004 what you can specify in that language, for 0:01:49.004,0:01:51.025 a particular document, is what elements 0:01:51.025,0:01:52.086 you want that document to contain, 0:01:52.086,0:01:54.058 the tags of the elements, 0:01:54.058,0:01:55.008 what attributes can be in 0:01:55.008,0:01:59.006 the elements, and how the different types of elements can be nested. 0:01:59.006,0:02:00.008 Sometimes the ordering of the 0:02:00.008,0:02:01.094 elements needs to be 0:02:01.094,0:02:06.017 specified, and sometimes the number of occurrences of different elements. 0:02:06.017,0:02:07.078 DTDs also allow the 0:02:07.078,0:02:09.000 introduction of special types of 0:02:09.000,0:02:11.091 attributes, called ID, IDREF, and IDREFS. 0:02:11.091,0:02:13.019 And, effectively, what these allow you 0:02:13.019,0:02:15.007 to do is specify pointers within 0:02:15.007,0:02:19.003 a document, although these pointers are untyped.
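The three possible outcomes of a validating parser described above can be sketched in Python. This is only an illustration under stated assumptions: the standard library's `xml.etree.ElementTree` checks well-formedness but does not validate against a DTD, so the content-specific specification is modeled here as a hypothetical checker function (`only_books`) rather than a real DTD or XSD.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Union

# A "checker" stands in for the content-specific specification (DTD or XSD).
# It returns a list of error messages; an empty list means "valid".
# This interface is a hypothetical illustration, not a real validator API.
Checker = Callable[[ET.Element], List[str]]


def validating_parse(text: str, check: Checker) -> Union[str, ET.Element]:
    """Return "not well-formed", "not valid", or the parsed root element."""
    try:
        root = ET.fromstring(text)  # basic structural requirements
    except ET.ParseError:
        return "not well-formed"
    if check(root):  # content-specific requirements
        return "not valid"
    return root  # "parsed XML"


def only_books(root: ET.Element) -> List[str]:
    """Toy specification: every child of the root must be a <book>."""
    return [f"unexpected <{child.tag}>" for child in root if child.tag != "book"]


print(validating_parse("<bookstore><book/></bookstore>", only_books).tag)  # bookstore
print(validating_parse("<bookstore><book></bookstore>", only_books))  # not well-formed
print(validating_parse("<bookstore><cd/></bookstore>", only_books))  # not valid
```

Keeping the two checks in separate steps mirrors the lecture's point: a document that is not well-formed never reaches the validity check at all.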
0:02:19.003,0:02:20.039 Before moving to the demo, 0:02:20.039,0:02:21.045 let's talk a little bit about 0:02:21.045,0:02:22.098 the positives and negatives of 0:02:22.098,0:02:24.035 choosing to use a DTD 0:02:24.035,0:02:26.026 or an XSD for one's XML data. 0:02:26.026,0:02:27.055 After all, if you're 0:02:27.055,0:02:29.022 building an application that encodes 0:02:29.022,0:02:30.052 its data in XML, you'll have 0:02:30.052,0:02:32.002 to decide whether you want the 0:02:32.002,0:02:33.069 XML to be merely well-formed 0:02:33.069,0:02:34.094 or whether you want to 0:02:34.094,0:02:37.000 have specifications and require the 0:02:37.000,0:02:40.035 XML to be valid, that is, to satisfy those specifications. 0:02:40.035,0:02:41.081 So let's list a few positives 0:02:41.081,0:02:44.043 of requiring a DTD or an XSD. 0:02:44.043,0:02:46.058 First of all, one of 0:02:46.058,0:02:47.065 them is that when you write your 0:02:47.065,0:02:49.049 program, you can assume 0:02:49.049,0:02:52.056 that the data adheres to a specific structure. 0:02:52.056,0:02:54.048 So programs can assume a 0:02:54.048,0:02:56.052 structure, and so the 0:02:56.052,0:02:57.064 programs themselves are simpler because they don't 0:02:57.064,0:03:00.069 have to do a lot of error checking on the data. 0:03:00.069,0:03:01.095 They'll know that before the data 0:03:01.095,0:03:03.062 reaches the program, it's been 0:03:03.062,0:03:07.025 run through a validator and it does satisfy a particular structure. 0:03:07.025,0:03:08.084 Second, we talked 0:03:08.084,0:03:10.098 some time ago about 0:03:10.098,0:03:13.013 the Cascading Style Sheets language 0:03:13.013,0:03:15.092 and the Extensible Stylesheet Language. 0:03:15.092,0:03:17.088 These are languages that take XML 0:03:17.088,0:03:19.008 and apply rules to it 0:03:19.008,0:03:22.029 to display it or process it into a different form, often HTML.
0:03:22.029,0:03:24.017 When you write those rules, if 0:03:24.017,0:03:25.019 you know that the data 0:03:25.019,0:03:26.079 has a certain structure, then those 0:03:26.079,0:03:28.044 rules can be simpler. So, like 0:03:28.044,0:03:30.017 the programs, they can also 0:03:30.017,0:03:33.047 assume a particular structure, which makes them simpler. 0:03:33.047,0:03:35.017 Now, another use for DTDs 0:03:35.017,0:03:36.081 or XSDs is as a 0:03:36.081,0:03:39.014 specification language for conveying 0:03:39.014,0:03:41.061 what XML needs to look like. 0:03:41.061,0:03:43.081 So, as an example, if you're 0:03:43.081,0:03:45.059 performing data exchange using 0:03:45.059,0:03:47.011 XML, say a company is 0:03:47.011,0:03:48.097 going to receive purchase orders in 0:03:48.097,0:03:50.024 XML, the company can 0:03:50.024,0:03:51.042 actually use the DTD as 0:03:51.042,0:03:53.015 a specification for what 0:03:53.015,0:03:54.059 the XML needs to look 0:03:54.059,0:03:56.099 like when it arrives at 0:03:56.099,0:03:59.058 the program that's going to operate on it. 0:03:59.058,0:04:01.022 Also, for documentation, it can 0:04:01.022,0:04:02.043 be useful to use one of 0:04:02.043,0:04:04.018 these specifications simply to document 0:04:04.018,0:04:06.042 what the data itself looks like. 0:04:06.042,0:04:08.008 In general, really what 0:04:08.008,0:04:11.013 we have here are the benefits of typing. 0:04:11.013,0:04:13.045 We're talking about strongly typed data 0:04:13.045,0:04:17.094 versus loosely typed data, if you want to think of it that way. 0:04:17.094,0:04:21.003 Now let's look at when we might prefer not to use a DTD. 0:04:21.003,0:04:22.065 So what I'm going to describe down 0:04:22.065,0:04:25.046 here are the benefits of not using a DTD. 0:04:25.046,0:04:27.084 The biggest benefit is flexibility. 0:04:27.084,0:04:30.011 A DTD makes your 0:04:30.011,0:04:33.015 XML data conform to a specification.
0:04:33.015,0:04:34.093 If you want more flexibility or 0:04:34.093,0:04:36.078 you want ease of change 0:04:36.078,0:04:37.075 in the way that the data is 0:04:37.075,0:04:39.014 formatted without running into 0:04:39.014,0:04:40.059 a lot of errors, then, 0:04:40.059,0:04:42.018 for those goals, 0:04:42.018,0:04:45.004 a DTD can be constraining. 0:04:45.004,0:04:46.082 Another issue is that DTDs can 0:04:46.082,0:04:48.008 be fairly messy, and this 0:04:48.008,0:04:49.014 is not going to be obvious 0:04:49.014,0:04:50.024 to you until we get 0:04:50.024,0:04:52.099 into the demo, but if 0:04:52.099,0:04:55.048 the data is irregular, very irregular, then 0:04:55.048,0:04:57.009 specifying its structure can 0:04:57.009,0:05:00.051 be hard. 0:05:00.051,0:05:02.066 Actually, when we see 0:05:02.066,0:05:04.099 the schema language, we'll 0:05:04.099,0:05:06.081 discover that XSDs can be, 0:05:06.081,0:05:10.066 I would say, really messy; they can actually get very large. 0:05:10.066,0:05:11.077 It's possible to have a 0:05:11.077,0:05:13.007 document where the specification of 0:05:13.007,0:05:14.096 the structure of the document is 0:05:14.096,0:05:16.033 much, much larger than the 0:05:16.033,0:05:18.016 document itself, which seems 0:05:18.016,0:05:19.039 counterintuitive, but when we get to 0:05:19.039,0:05:22.007 learn about XSDs, I think you'll see how that can happen. 0:05:22.007,0:05:23.078 So, overall, these are 0:05:23.078,0:05:26.002 the benefits of no typing. 0:05:26.002,0:05:28.038 It's really quite similar to 0:05:28.038,0:05:31.078 the analogous trade-off in programming languages. 0:05:31.078,0:05:33.002 The remainder of this video will 0:05:33.002,0:05:35.094 teach DTDs themselves through a set of examples. 0:05:35.094,0:05:36.083 We'll have a separate video 0:05:36.083,0:05:39.044 for learning about XML Schema and XSDs.
0:05:39.044,0:05:41.066 So, here we are 0:05:41.066,0:05:43.033 with our first document that we're 0:05:43.033,0:05:45.079 going to look at with a document type definition. 0:05:45.079,0:05:47.061 We have on the left the document itself. 0:05:47.061,0:05:49.017 We have on the right the document type 0:05:49.017,0:05:50.033 definition, and then we have 0:05:50.033,0:05:51.096 in the lower right a command-line 0:05:51.096,0:05:55.015 shell that we're going to use to validate the document. 0:05:55.015,0:05:56.028 So this is similar data to 0:05:56.028,0:05:57.049 what we saw in the last video, 0:05:57.049,0:05:59.005 but let's go through it just to see what we have. 0:05:59.005,0:06:01.022 We have an outermost element called 0:06:01.022,0:06:04.083 bookstore, and we have two books in our bookstore. 0:06:04.083,0:06:08.027 The first book has an ISBN number, a price, and an edition 0:06:08.027,0:06:09.065 as attributes, and then it 0:06:09.065,0:06:12.001 has a subelement called title, another 0:06:12.001,0:06:13.062 subelement called authors with two 0:06:13.062,0:06:16.031 authors underneath, each with a first name and a last name. 0:06:16.031,0:06:18.005 The second book element is 0:06:18.005,0:06:20.067 similar, except it doesn't have an edition. 0:06:20.067,0:06:23.032 It also has, as we see, a remark. 0:06:23.032,0:06:24.084 Now let's take a look at 0:06:24.084,0:06:25.062 the DTD. I'm just going 0:06:25.062,0:06:27.081 to walk through the DTD, not 0:06:27.081,0:06:29.019 too slowly, not too fast, and 0:06:29.019,0:06:30.079 explain exactly what it's doing. 0:06:30.079,0:06:31.096 So the start of the 0:06:31.096,0:06:33.027 DTD says this is a 0:06:33.027,0:06:35.017 DTD named bookstore, and the 0:06:35.017,0:06:37.007 root element is called bookstore, 0:06:37.007,0:06:40.008 and now we have the first grammar-like construct. 0:06:40.008,0:06:42.016 These constructs, in fact, are 0:06:42.016,0:06:44.053 a little bit like regular expressions, if you know them.
0:06:44.053,0:06:45.049 What this says is that 0:06:45.049,0:06:47.025 a bookstore element has as 0:06:47.025,0:06:49.011 its subelements any number 0:06:49.011,0:06:51.028 of elements that are called book or magazine. 0:06:51.028,0:06:53.066 The vertical bar means book or magazine. 0:06:53.066,0:06:55.059 We don't have any magazines yet, but we'll add one. 0:06:55.059,0:06:58.069 And then this star says zero or more instances. 0:06:58.069,0:07:02.015 It's the Kleene closure operator, for those of you familiar with regular expressions. 0:07:02.015,0:07:04.034 Now let's talk about 0:07:04.034,0:07:07.091 what the book element has; that's our next specification. 0:07:07.091,0:07:09.039 The book element has a 0:07:09.039,0:07:11.089 title followed by authors, 0:07:11.089,0:07:13.073 followed by an optional remark. 0:07:13.073,0:07:14.052 So now we don't have an 0:07:14.052,0:07:15.007 "or"; we have a comma, and 0:07:15.007,0:07:16.077 that says that these are going to 0:07:16.077,0:07:17.099 be in that order: title, 0:07:17.099,0:07:19.031 authors, and remark. And the 0:07:19.031,0:07:22.021 question mark says that the remark is optional. 0:07:22.021,0:07:24.074 Next we have the attributes of our book elements. 0:07:24.074,0:07:26.043 So this <!ATTLIST declaration 0:07:26.043,0:07:27.064 says we're going to describe 0:07:27.064,0:07:28.085 the attributes, and we're going 0:07:28.085,0:07:31.038 to have three of them: the ISBN, 0:07:31.038,0:07:33.007 the price, and the edition. 0:07:33.007,0:07:35.016 CDATA is the type of the attribute. 0:07:35.016,0:07:36.024 It's just a string. 0:07:36.024,0:07:37.072 And then #REQUIRED says that 0:07:37.072,0:07:39.028 the attribute must be present, whereas 0:07:39.028,0:07:41.042 #IMPLIED says it doesn't have to be present. 0:07:41.042,0:07:45.023 As you may remember, we have one book that doesn't have an edition.
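Because content models behave like regular expressions over the sequence of child elements, the two declarations just described can be approximated with Python's `re` module. A minimal sketch, assuming lowercase element names (`bookstore`, `book`, `magazine`, `title`, `authors`, `remark`); the translation is done by hand and is not how xmllint or any real validator works internally.

```python
import re
import xml.etree.ElementTree as ET

# Hand translations of two content models into Python regular expressions.
# Each child tag is spelled "name " (trailing space) so names cannot blur.
CONTENT_MODELS = {
    "bookstore": r"(?:(?:book|magazine) )*",  # (book | magazine)*
    "book": r"title authors (?:remark )?",  # (title, authors, remark?)
}


def children_match(elem: ET.Element) -> bool:
    """True if the sequence of elem's child tags fits its content model."""
    sequence = "".join(child.tag + " " for child in elem)
    return re.fullmatch(CONTENT_MODELS[elem.tag], sequence) is not None


print(children_match(ET.fromstring("<book><title/><authors/></book>")))  # True
print(children_match(ET.fromstring("<book><authors/><title/></book>")))  # False
print(children_match(ET.fromstring("<bookstore/>")))  # True: zero books is fine
```

The bar becomes regex alternation, the comma becomes concatenation, and `?` and `*` carry over unchanged, which is exactly the regular-expression analogy made in the video.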
0:07:45.023,0:07:46.056 Our magazines are simply going 0:07:46.056,0:07:47.066 to have titles, and they're going 0:07:47.066,0:07:49.089 to have two attributes, month and year. 0:07:49.089,0:07:51.095 Again, we don't have any magazines yet. 0:07:51.095,0:07:53.074 A title is going to 0:07:53.074,0:07:55.058 consist of string data. 0:07:55.058,0:07:58.025 So here we see our title, "A First Course in Database Systems." 0:07:58.025,0:08:02.001 You can think of that as the leaf data in the XML tree. 0:08:02.001,0:08:03.068 And when you have a leaf that 0:08:03.068,0:08:05.036 consists of text data, this is 0:08:05.036,0:08:06.007 what you put in the DTD 0:08:06.007,0:08:08.009 (just take my word for it): 0:08:08.009,0:08:10.078 #PCDATA in parentheses. 0:08:10.078,0:08:14.031 Now our authors element still has structure. 0:08:14.031,0:08:16.068 The authors element has 0:08:16.068,0:08:18.016 author subelements, 0:08:18.016,0:08:19.064 and we're going to 0:08:19.064,0:08:21.002 specify here that the 0:08:21.002,0:08:23.007 authors element must have one 0:08:23.007,0:08:25.023 or more author subelements. 0:08:25.023,0:08:26.058 So that's what the plus 0:08:26.058,0:08:29.054 is saying here, again taken from regular expressions. 0:08:29.054,0:08:32.016 "Plus" means one or more instances. 0:08:32.016,0:08:33.053 We have the remark, which 0:08:33.053,0:08:36.037 is just going to be #PCDATA, or string data. 0:08:36.037,0:08:38.004 Each author consists 0:08:38.004,0:08:40.002 of a first name subelement and 0:08:40.002,0:08:42.086 a last name subelement, in that order. 0:08:42.086,0:08:46.018 And then finally, our first names and last names are also strings. 0:08:46.018,0:08:47.067 So, this is the entire 0:08:47.067,0:08:49.005 DTD, and it describes 0:08:49.005,0:08:51.064 in detail the structure 0:08:51.064,0:08:53.026 of our document.
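Putting the whole DTD together, the same regex idea can be applied recursively to every element, with `(#PCDATA)` leaves allowed to hold text but no child elements. The sketch below assumes the element names `first_name` and `last_name` and introduces a hypothetical helper, `structure_errors`. It checks element structure only, not attributes.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the full bookstore DTD. None marks a (#PCDATA) leaf:
# text is allowed, child elements are not. Element names are assumptions.
MODELS = {
    "bookstore": r"(?:(?:book|magazine) )*",  # (book | magazine)*
    "book": r"title authors (?:remark )?",  # (title, authors, remark?)
    "magazine": r"title ",  # (title)
    "authors": r"(?:author )+",  # (author+)
    "author": r"first_name last_name ",  # (first_name, last_name)
    "title": None,  # (#PCDATA)
    "remark": None,  # (#PCDATA)
    "first_name": None,  # (#PCDATA)
    "last_name": None,  # (#PCDATA)
}


def structure_errors(elem: ET.Element) -> list:
    """Recursively check each element's children against MODELS."""
    if elem.tag not in MODELS:
        return [f"undeclared element <{elem.tag}>"]
    model = MODELS[elem.tag]
    sequence = "".join(child.tag + " " for child in elem)
    errors = []
    if model is None and sequence:
        errors.append(f"<{elem.tag}> is #PCDATA but has child elements")
    elif model is not None and re.fullmatch(model, sequence) is None:
        errors.append(f"<{elem.tag}> children do not match: {sequence!r}")
    for child in elem:
        errors.extend(structure_errors(child))
    return errors


doc = ET.fromstring(
    "<bookstore><book><title>A First Course in Database Systems</title>"
    "<authors><author><first_name>Jeffrey</first_name>"
    "<last_name>Ullman</last_name></author></authors></book></bookstore>"
)
print(structure_errors(doc))  # []
```

An `<authors/>` element with no `<author>` inside fails the `+`, and swapping `<last_name>` before `<first_name>` fails the sequence, matching the order error shown later in the demo.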
0:08:53.026,0:08:54.053 Now we have a command; we're 0:08:54.053,0:08:57.002 using something called xmllint, 0:08:57.002,0:09:00.009 which will check whether the document matches the structure. 0:09:00.009,0:09:02.021 We'll just run that command 0:09:02.021,0:09:03.087 here with a couple of options, and 0:09:03.087,0:09:05.015 it doesn't give us any output, 0:09:05.015,0:09:09.049 which actually means that our document is valid. 0:09:09.049,0:09:13.014 We'll be making some edits to see what happens when we run the command on a document that is not valid. 0:09:13.014,0:09:14.078 So let's make our first edit. 0:09:14.078,0:09:16.014 Let's say that we decide that 0:09:16.014,0:09:17.071 we want the edition attribute 0:09:17.071,0:09:21.033 of our books to be #REQUIRED rather than #IMPLIED. 0:09:21.033,0:09:23.009 So we'll change the DTD. 0:09:23.009,0:09:27.007 We'll save the file and run our command again. 0:09:27.007,0:09:28.087 As expected, we got an 0:09:28.087,0:09:30.031 error, and the error said 0:09:30.031,0:09:33.031 that one of our book elements does not have the attribute edition. 0:09:33.031,0:09:36.073 Now that edition is required, every book element must have it. 0:09:36.073,0:09:39.038 So let's add an edition to our second book. 0:09:39.038,0:09:41.028 Let's say that it's 0:09:41.028,0:09:43.003 the second edition, save the 0:09:43.003,0:09:44.079 file, validate our 0:09:44.079,0:09:48.035 document again, and now everything is good. Let's 0:09:48.035,0:09:49.076 do an edit to the document 0:09:49.076,0:09:51.018 this time to see what 0:09:51.018,0:09:52.013 happens when we change the 0:09:52.013,0:09:54.086 order of the first name and the last name. 0:09:54.086,0:09:58.068 So we've swapped Jeffrey Ullman to be Ullman Jeffrey. 0:09:58.068,0:10:00.007 We validate our document, and now 0:10:00.007,0:10:02.005 we see we got an error 0:10:02.005,0:10:04.007 because the elements are not in the correct order.
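The first edit, switching edition from #IMPLIED to #REQUIRED, can be mimicked with a small attribute check. This is a sketch only. The attribute-list declaration is represented as a hypothetical Python dict, the attribute names are assumed to be `ISBN`, `price`, and `edition`, and the attribute values are placeholders rather than data from the demo.

```python
import xml.etree.ElementTree as ET

# Hypothetical dict form of the book attribute-list declaration:
#   <!ATTLIST book ISBN CDATA #REQUIRED
#                  price CDATA #REQUIRED
#                  edition CDATA #IMPLIED>
BOOK_ATTRS = {"ISBN": "#REQUIRED", "price": "#REQUIRED", "edition": "#IMPLIED"}


def attribute_errors(elem: ET.Element, decls: dict) -> list:
    """Report missing #REQUIRED attributes and undeclared attributes."""
    errors = [
        f"<{elem.tag}> lacks required attribute {name}"
        for name, default in decls.items()
        if default == "#REQUIRED" and name not in elem.attrib
    ]
    errors += [
        f"<{elem.tag}> has undeclared attribute {name}"
        for name in elem.attrib
        if name not in decls
    ]
    return errors


# Placeholder values; like the second book in the demo, no edition attribute.
second_book = ET.fromstring('<book ISBN="ISBN-2" price="85"/>')
print(attribute_errors(second_book, BOOK_ATTRS))  # []

# The edit from the demo: make edition #REQUIRED instead of #IMPLIED.
strict = dict(BOOK_ATTRS, edition="#REQUIRED")
print(attribute_errors(second_book, strict))
# ['<book> lacks required attribute edition']
```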
0:10:04.007,0:10:06.046 In this case, let's undo that 0:10:06.046,0:10:09.029 change, rather than change our DTD. 0:10:09.029,0:10:11.028 Let's try another edit to our document. 0:10:11.028,0:10:13.035 Let's add a remark to our first book. 0:10:13.035,0:10:14.064 But what we'll do is 0:10:14.064,0:10:16.038 leave the remark empty, so 0:10:16.038,0:10:18.005 we'll add an opening tag and then 0:10:18.005,0:10:24.021 immediately a closing tag, and let's see if that validates. 0:10:24.021,0:10:25.021 So, it did validate. 0:10:25.021,0:10:26.068 And in fact, when we have 0:10:26.068,0:10:27.087 #PCDATA as the type 0:10:27.087,0:10:32.039 of an element, it's perfectly acceptable to have an empty element. 0:10:32.039,0:10:34.086 As a final change, let's add a magazine to our database. 0:10:34.086,0:10:37.046 You'll have to bear with me as I type. 0:10:37.046,0:10:39.008 I'm always a little bit slow. 0:10:39.008,0:10:40.043 So we see over here that 0:10:40.043,0:10:41.056 when we have a magazine, there are 0:10:41.056,0:10:44.052 two required attributes, the month and the year. 0:10:44.052,0:10:45.091 So, let's say the month is 0:10:45.091,0:10:48.001 January and the year, 0:10:48.001,0:10:50.096 let's make that 2011, 0:10:50.096,0:10:53.094 and then we have a title for our magazine. 0:10:53.094,0:10:54.017 Here. 0:10:54.017,0:10:55.073 We'll go down here. 0:10:55.073,0:11:00.052 Our title, let's make it National Geographic. 0:11:00.052,0:11:03.066 We'll close the title tag. 0:11:03.066,0:11:05.061 And then, sorry again about my typing. 0:11:05.061,0:11:08.039 Let's go ahead and validate the document. 0:11:08.039,0:11:11.081 We got an error: "premature end of" something or other. 0:11:11.081,0:11:13.022 We forgot our closing tag for 0:11:13.022,0:11:17.072 magazine; let's put that in. 0:11:17.072,0:11:19.009 My terrible typing, and here we go. 0:11:19.009,0:11:23.004 Let's validate, and we're done. 0:11:23.004,0:11:26.077 Now we're going to learn about ID and IDREF attributes.
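Two points from this part of the demo can be checked with the standard library. First, an empty `<remark></remark>` simply has no text and no children, which a `(#PCDATA)` declaration permits. Second, a missing `</magazine>` tag is a well-formedness failure, caught before any DTD is consulted. This sketch uses `xml.etree.ElementTree`, whose error text differs from xmllint's, so only the pass/fail result is compared. `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the standard-library parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An empty (#PCDATA) element: no characters and no child elements.
remark = ET.fromstring("<remark></remark>")
print(remark.text, len(remark))  # None 0

# The magazine edit with its closing tag forgotten, then fixed.
broken = (
    '<bookstore><magazine month="January" year="2011">'
    "<title>National Geographic</title></bookstore>"
)
fixed = broken.replace("</bookstore>", "</magazine></bookstore>")
print(is_well_formed(broken), is_well_formed(fixed))  # False True
```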
0:11:26.077,0:11:28.031 The document on the left side 0:11:28.031,0:11:29.056 contains the same data as 0:11:29.056,0:11:32.041 our previous document, but completely restructured. 0:11:32.041,0:11:33.099 Instead of having authors as 0:11:33.099,0:11:35.064 subelements of book elements, 0:11:35.064,0:11:37.059 we're going to have our authors listed separately, 0:11:37.059,0:11:41.055 and then effectively point from the books to the authors of each book. 0:11:41.055,0:11:42.004 We'll take a look at the 0:11:42.004,0:11:43.083 data first, and then 0:11:43.083,0:11:47.011 we'll look at the DTD that describes the data. 0:11:47.011,0:11:48.037 Let's actually start with the 0:11:48.037,0:11:51.043 authors. Our bookstore element 0:11:51.043,0:11:55.006 here has two subelements that are books and three that are authors. 0:11:55.006,0:11:56.091 So, looking at the authors, we have 0:11:56.091,0:11:58.014 the first name and last name 0:11:58.014,0:11:59.095 as subelements as usual, but 0:11:59.095,0:12:02.038 we've added an attribute we call ident. 0:12:02.038,0:12:03.059 That's not a keyword; we've just 0:12:03.059,0:12:05.026 called the attribute ident, and 0:12:05.026,0:12:07.005 then for each of the three authors, 0:12:07.005,0:12:08.083 we've given a string value 0:12:08.083,0:12:10.018 to that attribute that we're going 0:12:10.018,0:12:12.094 to use as the target of the pointers in the books. 0:12:12.094,0:12:16.021 So we have our three authors; now let's take a look at the books. 0:12:16.021,0:12:18.042 Our book has the ISBN number and price. 0:12:18.042,0:12:21.032 I've taken the edition out for now. 0:12:21.032,0:12:23.082 And now it has a special attribute called authors. 0:12:23.082,0:12:25.084 Authors is an IDREFS 0:12:25.084,0:12:27.069 attribute, and its value 0:12:27.069,0:12:28.098 consists of one or 0:12:28.098,0:12:31.029 more strings, each of which is the value of an ID 0:12:31.029,0:12:32.062 attribute in another element.
0:12:32.062,0:12:33.066 So that's what we're doing here. 0:12:33.066,0:12:36.077 We're referring to the two author elements here. 0:12:36.077,0:12:40.044 And in our second book, we're referring to all three author elements. 0:12:40.044,0:12:41.007 We still have the title subelement 0:12:41.007,0:12:44.091 and we still have the remark subelement. 0:12:44.091,0:12:46.027 And furthermore, we have one 0:12:46.027,0:12:47.087 other cute thing here, which is, 0:12:47.087,0:12:49.081 instead of referring to 0:12:49.081,0:12:51.015 the book by name within the 0:12:51.015,0:12:52.057 remark when we're talking about 0:12:52.057,0:12:56.001 the other book, we have another type of pointer. 0:12:56.001,0:12:57.062 So we'll specify that the 0:12:57.062,0:12:59.088 ISBN is an ID 0:12:59.088,0:13:01.064 for books, and then this 0:13:01.064,0:13:03.061 is an IDREF attribute 0:13:03.061,0:13:07.083 that's referring to the ID of the other book. 0:13:07.083,0:13:11.063 The DTD on the right describes the structure of this document. 0:13:11.063,0:13:12.092 This time our bookstore is 0:13:12.092,0:13:14.031 going to contain zero or more 0:13:14.031,0:13:17.038 books followed by zero or more authors. 0:13:17.038,0:13:18.077 Our books contain a title and 0:13:18.077,0:13:20.083 an optional remark as subelements, and 0:13:20.083,0:13:22.097 now they have three attributes: 0:13:22.097,0:13:24.056 the ISBN, which is 0:13:24.056,0:13:26.072 now a special type of 0:13:26.072,0:13:28.061 attribute called an ID; the 0:13:28.061,0:13:30.001 price, which is a string 0:13:30.001,0:13:31.036 value as usual; and the 0:13:31.036,0:13:32.077 authors, which is the special type 0:13:32.077,0:13:34.085 called IDREFS. Let's keep 0:13:34.085,0:13:37.082 going: our title is just a string value as usual. 0:13:37.082,0:13:41.055 The remark is actually an interesting construct.
0:13:41.055,0:13:43.081 A remark consists of 0:13:43.081,0:13:46.002 #PCDATA, which is string data, 0:13:46.002,0:13:47.058 or a book reference, and then 0:13:47.058,0:13:50.009 zero or more instances of those. 0:13:50.009,0:13:51.016 This is the type of construct 0:13:51.016,0:13:52.073 that can be used to mix 0:13:52.073,0:13:55.019 strings and subelements within an element. 0:13:55.019,0:13:56.035 So anytime you want an 0:13:56.035,0:13:57.063 element that might have some 0:13:57.063,0:14:00.089 strings, then another element, then more string values, 0:14:00.089,0:14:01.082 that's how it's done: 0:14:01.082,0:14:05.097 #PCDATA or the element type, zero or more times. 0:14:05.097,0:14:08.002 Then we have our book reference, 0:14:08.002,0:14:09.091 which is actually an empty element. It's 0:14:09.091,0:14:11.039 only interesting because it has 0:14:11.039,0:14:12.039 an attribute. So let's go 0:14:12.039,0:14:13.046 back here: we see our book 0:14:13.046,0:14:14.077 reference here. It actually doesn't 0:14:14.077,0:14:16.049 have any data or 0:14:16.049,0:14:17.072 subelements, but it has an 0:14:17.072,0:14:20.099 attribute called book, and that is an IDREF. 0:14:20.099,0:14:22.074 That means it refers to an 0:14:22.074,0:14:26.002 ID attribute of another 0:14:26.002,0:14:27.004 element. 0:14:27.004,0:14:28.085 Now we have our authors: the first 0:14:28.085,0:14:30.046 name and the last name, and 0:14:30.046,0:14:33.018 our author elements again have 0:14:33.018,0:14:35.089 an ID attribute, which we're calling ident. 0:14:35.089,0:14:39.039 And finally, the first name and last name are string values. 0:14:39.039,0:14:40.009 This may seem overwhelming, but the 0:14:40.009,0:14:43.045 key points in this DTD 0:14:43.045,0:14:44.031 are the ID attributes.
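A mixed-content declaration like the remark's lets text appear anywhere while restricting which child elements may appear, and an EMPTY element may have no content at all. The sketch below assumes the declarations `<!ELEMENT remark (#PCDATA | book_ref)*>` and `<!ELEMENT book_ref EMPTY>`. The element name `book_ref`, the helper `remark_errors`, and the sample remark text are hypothetical; the video just says "book reference".

```python
import xml.etree.ElementTree as ET

# Assumed declarations (book_ref is a hypothetical element name):
#   <!ELEMENT remark (#PCDATA | book_ref)*>
#   <!ELEMENT book_ref EMPTY>


def remark_errors(remark: ET.Element) -> list:
    """Text may appear anywhere in <remark>; the only child element allowed
    is <book_ref>, and each <book_ref> must have no content at all."""
    errors = []
    for child in remark:
        if child.tag != "book_ref":
            errors.append(f"<{child.tag}> is not allowed inside <remark>")
        elif len(child) > 0 or child.text:
            errors.append("<book_ref> is declared EMPTY but has content")
    return errors


remark = ET.fromstring(
    '<remark>Compare with <book_ref book="ISBN-1"/> and also '
    '<book_ref book="JU"/> before buying.</remark>'
)
print(remark_errors(remark))  # []

# Mixed content in document order: leading text, then each child and its tail.
pieces = [remark.text] + [p for child in remark for p in (child.tag, child.tail)]
print(pieces)
# ['Compare with ', 'book_ref', ' and also ', 'book_ref', ' before buying.']
```

ElementTree stores the text that follows a child in that child's `tail`, which is why the interleaving of strings and subelements has to be reassembled this way.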
0:14:44.031,0:14:46.051 So the ID attributes, the ISBN 0:14:46.051,0:14:48.028 attribute in the book and 0:14:48.028,0:14:50.066 the ident 0:14:50.066,0:14:52.049 attribute in the author, 0:14:52.049,0:14:53.093 are special attributes, and by 0:14:53.093,0:14:54.094 the way, their values do need to be 0:14:54.094,0:14:57.021 unique, 0:14:57.021,0:14:58.075 and they're special in that 0:14:58.075,0:15:01.000 IDREF and IDREFS attributes can refer 0:15:01.000,0:15:03.052 to them, and that will be checked as well. 0:15:03.052,0:15:04.064 Now, I did want to 0:15:04.064,0:15:05.081 point out that the book 0:15:05.081,0:15:08.043 reference here says IDREF, singular. 0:15:08.043,0:15:09.009 When you have a singular 0:15:09.009,0:15:11.019 IDREF, the string has 0:15:11.019,0:15:13.058 to be exactly one ID value. 0:15:13.058,0:15:15.066 When you have the plural IDREFS, 0:15:15.066,0:15:17.019 the string value of the 0:15:17.019,0:15:19.001 attribute is one or 0:15:19.001,0:15:21.038 more ID values 0:15:21.038,0:15:24.039 separated by spaces. 0:15:24.039,0:15:27.071 So it's a little bit clunky, but it does work. 0:15:27.071,0:15:31.044 Now let's go to our command line and validate the document. 0:15:31.044,0:15:33.007 So the document is in fact valid. 0:15:33.007,0:15:34.005 That's what it means when we 0:15:34.005,0:15:35.065 get nothing back. Let's 0:15:35.065,0:15:36.089 make some changes, as we did 0:15:36.089,0:15:39.001 before, to explore what structure 0:15:39.001,0:15:42.002 is imposed and what's checked by this DTD in the presence of 0:15:42.002,0:15:44.006 IDs and IDREFs. 0:15:44.006,0:15:46.031 As a first change, let's change 0:15:46.031,0:15:48.031 this ID, this identifier 0:15:48.031,0:15:51.004 HG, to JU.
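The ID and IDREF(S) rules just described can be sketched as a two-pass check: ID values must be unique, an IDREF holds exactly one ID, an IDREFS holds one or more space-separated IDs, and no reference may dangle. The attribute-type table, the helper `id_errors`, and the sample values (`ISBN-1`, `HG`, and so on) are illustrative assumptions. The sketch also skips the XML rule that ID values must be valid XML names (for example, not starting with a digit).

```python
import xml.etree.ElementTree as ET

# Assumed attribute types, mirroring the second DTD in the lecture
# (element and attribute names are illustrative).
ATTR_TYPES = {
    ("book", "ISBN"): "ID",
    ("book", "authors"): "IDREFS",
    ("author", "ident"): "ID",
    ("book_ref", "book"): "IDREF",
}


def id_errors(root: ET.Element) -> list:
    errors, ids = [], set()
    # Pass 1: ID values must be unique across the entire document.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if ATTR_TYPES.get((elem.tag, name)) == "ID":
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
    # Pass 2: IDREF holds exactly one token, IDREFS one or more tokens
    # separated by whitespace; every token must match some ID value.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            kind = ATTR_TYPES.get((elem.tag, name))
            if kind not in ("IDREF", "IDREFS"):
                continue
            tokens = value.split()
            if not tokens or (kind == "IDREF" and len(tokens) > 1):
                errors.append(f"{kind} {name} has the wrong number of IDs")
            errors += [f"dangling reference {t!r}" for t in tokens if t not in ids]
    return errors


# Author name subelements are omitted for brevity.
DOC = (
    "<bookstore>"
    '<book ISBN="ISBN-1" authors="JU JW"><title>T1</title></book>'
    '<book ISBN="ISBN-2" authors="HG JU JW"><title>T2</title>'
    '<remark>See <book_ref book="ISBN-1"/></remark></book>'
    '<author ident="HG"/><author ident="JU"/><author ident="JW"/>'
    "</bookstore>"
)
print(id_errors(ET.fromstring(DOC)))  # []

# The demo's edit: rename HG to JU.
print(id_errors(ET.fromstring(DOC.replace('ident="HG"', 'ident="JU"'))))
# ["duplicate ID 'JU'", "dangling reference 'HG'"]
```

The renaming edit produces the same two problems the demo reports: a duplicate ID and a dangling pointer.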
0:15:51.004,0:15:52.005 That should actually cause a couple of problems. 0:15:52.005,0:15:53.033 Let's 0:15:53.033,0:15:56.061 validate the document and see what happens. 0:15:56.061,0:15:58.094 And we do in fact get two different errors. 0:15:58.094,0:16:00.058 The first error says that 0:16:00.058,0:16:03.007 we have two instances of "JU". 0:16:03.007,0:16:04.026 As you can see here, we 0:16:04.026,0:16:06.004 now have JU twice, whereas 0:16:06.004,0:16:08.007 ID values have to be unique. 0:16:08.007,0:16:10.089 They have to be globally unique throughout the document. 0:16:10.089,0:16:12.029 The second error that occurred 0:16:12.029,0:16:14.045 when we changed HG to JU 0:16:14.045,0:16:17.027 is that we effectively have a dangling pointer. 0:16:17.027,0:16:19.004 We refer to HG here 0:16:19.004,0:16:21.049 in this IDREFS attribute, but there's 0:16:21.049,0:16:24.026 no longer an element whose ID value is HG. 0:16:24.026,0:16:25.084 So that's an error as well. 0:16:25.084,0:16:27.072 So let's change it back to 0:16:27.072,0:16:31.001 HG so our document is valid again. 0:16:31.001,0:16:34.076 Now let's make another change; let's take our book reference. 0:16:34.076,0:16:37.076 We can see that our book reference is referring to the other book. 0:16:37.076,0:16:39.019 We're in "The Complete Book" here, 0:16:39.019,0:16:40.046 and the remark is 0:16:40.046,0:16:41.075 referring to "A First Course" 0:16:41.075,0:16:44.047 through its ISBN number. But let's 0:16:44.047,0:16:47.055 change this string to refer to HG instead. 0:16:47.055,0:16:49.053 So now we're actually referring 0:16:49.053,0:16:51.087 to an author rather than another book. 0:16:51.087,0:16:54.023 Let's check if the document validates. 0:16:54.023,0:16:55.044 In fact, it does. 0:16:55.044,0:16:56.064 And that shows that the 0:16:56.064,0:16:59.008 pointers in a DTD are untyped.
0:16:59.008,0:17:01.004 So it does check to make 0:17:01.004,0:17:02.007 sure that this is an 0:17:02.007,0:17:03.072 ID of another element, but we 0:17:03.072,0:17:05.005 weren't able to specify that 0:17:05.005,0:17:07.019 it should be a book element 0:17:07.019,0:17:08.063 in our DTD, and since we're 0:17:08.063,0:17:10.004 not able to specify it, of 0:17:10.004,0:17:11.091 course, it's not possible to check it. 0:17:11.091,0:17:13.022 We will see that in XML 0:17:13.022,0:17:14.086 Schema, we can have typed 0:17:14.086,0:17:17.096 pointers, but it's not possible to have them in DTDs. 0:17:17.096,0:17:19.016 The last change I'm going to 0:17:19.016,0:17:20.066 show is to add a 0:17:20.066,0:17:22.081 second book reference within our remark. 0:17:22.081,0:17:24.017 As I pointed out over 0:17:24.017,0:17:26.034 here, when we write #PCDATA 0:17:26.034,0:17:28.014 or an element type 0:17:28.014,0:17:29.061 followed by the Kleene closure, the 0:17:29.061,0:17:31.035 zero-or-more star, that 0:17:31.035,0:17:34.031 means we can freely mix text and subelements. 0:17:34.031,0:17:39.071 So right in the middle here, let's put a book reference, 0:17:39.071,0:17:41.041 and we can put, let's say, 0:17:41.041,0:17:45.067 book equals JU, and that 0:17:45.067,0:17:46.092 will be the end of our reference 0:17:46.092,0:17:48.062 there. And now we 0:17:48.062,0:17:50.027 see that we have text followed 0:17:50.027,0:17:51.067 by a subelement followed by more 0:17:51.067,0:17:53.031 text, and so on. 0:17:53.031,0:17:56.065 That should validate fine, and in fact it does. 0:17:56.065,0:17:58.081 That completes our demonstration of 0:17:58.081,9:59:59.000 XML documents with DTDs.
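The difference between a DTD's untyped pointers and a typed pointer can be shown in a few lines. A DTD only asks whether the referenced ID exists; a typed reference would also ask what kind of element carries it. This is a hypothetical illustration of that extra check. `id_index`, `dtd_style_ok`, and `typed_ok` are not APIs of any library, and the sketch is not XML Schema's own mechanism, which is covered in the separate video.

```python
import xml.etree.ElementTree as ET

# Which (element, attribute) pairs are IDs is an assumption for this sketch.
ID_ATTRS = {("book", "ISBN"), ("author", "ident")}


def id_index(root: ET.Element) -> dict:
    """Map each ID value to the tag of the element that carries it."""
    return {
        value: elem.tag
        for elem in root.iter()
        for name, value in elem.attrib.items()
        if (elem.tag, name) in ID_ATTRS
    }


def dtd_style_ok(ref: str, index: dict) -> bool:
    # DTD IDREF semantics: the target only has to exist somewhere.
    return ref in index


def typed_ok(ref: str, index: dict, expected_tag: str) -> bool:
    # Hypothetical typed reference: the target must also be the right kind.
    return index.get(ref) == expected_tag


doc = ET.fromstring('<bookstore><book ISBN="ISBN-1"/><author ident="HG"/></bookstore>')
index = id_index(doc)
print(dtd_style_ok("HG", index))  # True: like the demo, book="HG" is accepted
print(typed_ok("HG", index, "book"))  # False: HG identifies an author
```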