Welcome to this cloud computing concepts course. In this lecture we'll be covering several concepts that are assumed as a part of the topics we'll be discussing in the cloud computing concepts course.

We'll be covering several of the basics — things considered to be basics in computer science. Typically, these are concepts that computer science students learn in their first couple of years of an undergraduate computer science degree program. For those of you who are already familiar with this material, this could be a refresher, or you could simply skip it if you already know this stuff. In any case, you can always use this orientation lecture as a reference to keep coming back to if you encounter a term during the regular lectures that doesn't make sense to you, or that you'd like to be reminded about.

We'll discuss a few topics. We'll discuss some basic data structures, what processes are, a little bit about computer architecture, and then a little bit of math with Big O notation, basic probability, and a few other miscellaneous topics. So, let's get started.
For basic data structures, we'll discuss two different data structures. The first one is a queue. A queue is essentially a First-In First-Out data structure. Elements of the queue could be of any data type; in this example, the elements are integers. So, in this example right now, the queue has five elements: 3, 5, 8, 0, 7. This is an ordered list of elements. When you remove an element from the queue, you remove it from the head of the queue; in this case, the head is the element 3. When you insert a new element, you insert it at the tail of the queue, which means that if I insert a new element, say 1, it's going to get inserted right after 7 in this queue.

So removal always comes from the head, while insertion always happens at the tail. For instance, given this particular queue consisting of five elements, if you dequeue, or remove, an item, then that item will be 3. At that point in time the queue will consist of only 5, 8, 0, and 7, with the head pointing to 5. The next item that you dequeue, or remove, will be 5. After that, the queue will consist of only 8, 0, and 7. Then the next item that you dequeue will be 8, and so on and so forth.
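The enqueue and dequeue operations we just walked through can be sketched in a few lines of Python. This is purely illustrative (the lecture itself is language-agnostic); `collections.deque` is my choice of container here, since it gives efficient operations at both ends:

```python
from collections import deque

# A FIFO queue holding the lecture's example elements, head first.
q = deque([3, 5, 8, 0, 7])

q.append(1)          # insert (enqueue) at the tail: 1 goes right after 7
head = q.popleft()   # remove (dequeue) from the head: returns 3
print(head)          # 3
print(list(q))       # [5, 8, 0, 7, 1] -- head now points to 5
```

Note that a plain Python list would also work, but removing from the front of a list costs O(N), whereas `deque.popleft()` is O(1).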
Dequeues, or removals, as well as insertions can occur concurrently: you can remove an element from the head while at the same time inserting an element at the tail.

So, that's a queue, which is a First-In First-Out data structure. A different data structure is the stack, which is a First-In Last-Out data structure. Imagine a stack of dishes that you place on your table. The dish that you put on the top, the one that you added last, will be the first one that you can remove, right? And that's essentially a stack. A removal, also known as a pop, happens from the top; an insert, or a push, also happens at the top. So in this case, if you push a new element onto this particular stack, say 9, the 9 will go above the 3. After that, if you pop, or remove, an element, you get the last element that you added, which is 9 in this case. The next pop after that will get 3, because 9 was just removed and 3 is now the top of the stack. After you've removed 3, the next pop, or removal, will get you 5, and so on and so forth. After that you'll get 8, and then 0, and 7.
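The push/pop sequence just described can be sketched with a plain Python list acting as a stack (again a sketch of my own, not part of the lecture materials; the end of the list plays the role of the top):

```python
# The lecture's example stack, with 3 on top.
# Convention: the top of the stack is the END of the list.
stack = [7, 0, 8, 5, 3]

stack.append(9)      # push: 9 goes above 3
top = stack.pop()    # pop: returns the last element added
print(top)           # 9
print(stack.pop())   # 3 -- now the top of the stack
print(stack.pop())   # 5; further pops would give 8, then 0, then 7
```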
Okay, so, once again, a removal always returns you the last item that was added, and an insertion, or push, adds an item right at the top of the stack. These two data structures, queue and stack, are used very widely in computer science, and in fact we'll be using the notion of a stack right away, when we discuss processes soon in this particular orientation lecture itself.

So, speaking of processes, let's discuss a process next. A process is essentially a program in action. You may write a program in a programming language such as C or C++, and an example is shown here. This example code consists of a main function, which calls a function called f1. f1 itself has a local integer variable x, and then f1 itself calls a different function, f2. This is shown pictorially here, in the circle, where main calls f1 and f1 calls f2. This is the code that you would write; then, you know, you would compile the code and execute it. And when you're executing it, when your program is in action, that is a process.
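The slide shows this example in C, and only the call structure (main calls f1, f1 calls f2) is given there. As a rough sketch of that same shape, with function bodies invented purely for illustration:

```python
def f2(y):
    # f2 does some work and returns its result to its caller, f1.
    return y * 2

def f1():
    x = 21           # f1's local integer variable x
    return f2(x)     # f1 calls f2; f2's return value comes back to f1

def main():
    return f1()      # main calls f1 and receives f1's return value

print(main())        # 42
```

When this program is compiled (or, in Python's case, interpreted) and run, that running instance is the process.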
So, a process consists of your code, but also several other elements that help your code execute on the computer on which you are executing it. So, we are going to discuss some elements of a process which are critical.

Here are some of these elements. First, let's look at the code. The code itself is static: once you write the code, nothing changes it. (We are not considering the variable values to be a part of the code; I'll talk about that in a little bit.) But the code itself is static; it does not change once you write it. However, there is a program counter, which is typically maintained by the computer on which you're running your process, and which points to the line number inside the code where the program, or rather the process, is currently executing. This is called the Program Counter, or the PC, and it is an important part of the process state. So, if you were to take a core dump, or stop the process right now and take a snapshot of that process, that snapshot must include the program counter, because it says where in the code the execution currently is.
Next, when functions call each other, or, in an object-oriented program, when methods call each other, they use a stack. So every process contains a stack. More specifically, a process might contain multiple threads, and each thread will contain its own stack; but let's just say there is one thread in this process. So, the process contains one stack, and this stack is used by these functions or methods to pass arguments and return the result values. For instance, when main calls f1, main will push the arguments for f1 on top of the stack. When f1 starts executing, it will pop, or remove, the elements from the top of the stack and use them to execute. Similarly, when f1 calls f2, f1 will push the arguments for f2 on top of the stack, and then f2 will pop them, execute, and then push the result values on top of the stack. The result values are then popped by f1. This is the way in which f1 gets back the result, the final int return value, from f2.
And finally, when f1 needs to return a result value to main, it pushes it on top of the stack; when the execution returns to main, main pops it, or removes it from the top of the stack, and main has the final return value from f1. So, the stack is an important part of the process state as well, because it tells you where in the program execution you are with respect to functions calling each other.

Finally, functions have local variables, like x. There might also be global variables, and of course in object-oriented programs you have objects, which store a lot of fields in them. A lot of these pieces of data are stored in what is known as a heap. So, a heap is essentially a lot of data that was created by functions or methods, or by objects, and it holds all of that data. When a function is executing, say f1 is executing, the object x might be created inside the heap; when f1 is done executing, the object x will be removed from the heap.
Similarly, there are also registers, and I'll talk more about registers when we discuss computer architecture, but registers hold some of the recent values that have been accessed by the process, P1 in this case.

Okay, speaking of computer architecture, let's discuss a simplified version of computer architecture. Computer architectures today can be very complicated, but typically when we think of how a process executes in a computer architecture, we can think of a simplified view. So there is a CPU, which executes the actual instructions present in your code. There are also a bunch of registers that are co-located with the CPU. These are small pieces of memory that can be accessed very quickly by the CPU. Typically, there is only a small number of registers, not more than a few tens of registers. There is also a cache, which is a slightly larger memory than the collection of registers. And the larger a piece of memory is, the slower it is to access, so accessing the cache is slower than accessing registers, but accessing a cache is still pretty fast.
Then, beyond the cache, there is the main memory, or the Random Access Memory, the RAM, which is even larger than the cache; the memory itself, therefore, is slower than the cache, but it is still pretty fast. And then finally there's the hard disk, or if you would like, a solid-state drive, a flash drive, or whatever, which provides a lot more storage than the main memory but, of course, is much slower. So, as you go up from disk, to memory, to cache, to registers, the speed increases, while the total amount of storage space that you have decreases.

So, when you write a program in, say, C++, or Java, or C, and you compile it, it gets compiled to low-level machine instructions. These machine instructions might be specific to the architecture of the machine on which you are running, or they might be virtual machine code, like JVM bytecode. In any case, these low-level machine instructions are the executable version of your program, and this is stored in the file system on your disk.
When your program starts executing, when it becomes a process, the CPU loads instructions in batches, or, in a more simplified view, it loads instructions one by one. It loads them into memory, and then into the cache, and into the registers. Typically, the cache and the registers contain the last few accessed instructions; they follow what is known as the least recently used principle. Now, as the CPU executes each instruction of the process, it loads the data that is necessary for that instruction into memory, and then, if necessary, into the cache and the registers. And if any changes happen to a variable like x, these are stored first into the cache and then into memory.

Once in a while, you may need to store something on disk, so that you have permanent storage; in case the process goes offline, you still have the data. Say, for instance, the program might open a file and write something into the file. In this case, you may want to write those updates to the file onto the disk. Now, of course, this is a very highly simplified picture.
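The "least recently used" principle mentioned above can be illustrated with a tiny cache sketch. This is a deliberate simplification of my own (real CPU caches operate on memory lines in hardware, not on Python dictionaries), but the eviction policy is the same idea:

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # insertion order = recency order

    def access(self, key, value):
        # Touching an entry moves it to the most-recently-used end.
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the LRU entry

cache = LRUCache(2)
cache.access("x", 1)
cache.access("y", 2)
cache.access("x", 1)          # x is now the most recently used
cache.access("z", 3)          # cache is full: evicts y, the LRU entry
print(list(cache.entries))    # ['x', 'z']
```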
Computer architectures can be far more complex than this today, but for our practical purposes in this course, and as far as explaining how processes work, this will suffice.

So next, we move on to a few mathematical topics. The first mathematical topic is the Big O notation. The Big O notation is one of the most basic ways of analyzing algorithms. When I give you an algorithm and I say "analyze the algorithm mathematically," essentially I expect you to come up with at least a Big O analysis of that algorithm. The Big O notation describes an upper bound on the behavior of an algorithm as some variable is scaled, or increased in value, all the way to infinity. Typically, this variable is the size of the input, maybe the number of input elements; you'll see a few examples of this. So, essentially, Big O analysis tells us: how does the algorithm itself scale as the size of the input is increased? This analyzes the run time, or, typically, another performance metric as well.
Most of the time we analyze run time; sometimes, as you'll see in the course, we'll analyze things like the number of messages, or even the bandwidth that is exchanged in the total network itself. The important thing about Big O notation is that it captures worst-case performance. Okay, so this is not expected performance or best-case performance; Big O is worst-case performance. Given this algorithm, what is the worst that it can do in terms of the performance metric that we're measuring?

So, essentially, when I say an algorithm A is order of foo, this means that algorithm A takes < c * foo time to complete (assuming time is the metric we are measuring), for some fixed constant c, beyond some input size N. Typically, this foo that is present in the definition is some function of the input size N. For instance, I could say that an algorithm is order N, which means that algorithm A takes less than c times N time to complete, for some constant c, beyond some input size N. Similarly, the algorithm might be order N^2, which means that it is quadratic in time, and it takes time less than c times N squared, for some constant c.
Once again, when we say order N^2 or order N, notice that we do not include the constant c in the definition itself. This is because the constant is irrelevant, as long as there is some fixed constant. That is good enough for us as far as Big O notation is concerned. Okay, so we do not state the constants in Big O notation.

So let's see a couple of examples. First, I give you an unsorted list of, say, integers, and I ask you to search for a specific element in the list. This essentially means that you have to iterate through the list, one item at a time, and search for the element by trying to match it for equality with each element present in the list. What is the worst-case performance? That's what we need for Big O notation, right? So, the worst-case performance happens when either the element is not in the list at all, so you return false, or the element is the very last one in the list, the very last one that you match against. Therefore, the worst case involves N operations, and the number of operations involved is < c * N, with c = 2, because the number of operations is N, where N is the size of the list.
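The worst-case linear search just described might look like this in Python (a sketch; the lecture does not prescribe any particular language):

```python
def linear_search(items, target):
    """Return True if target is in items, scanning one element at a time.

    Worst case: target is absent, or is the very last element, so we
    perform N comparisons -- order N, where N = len(items).
    """
    for item in items:
        if item == target:
            return True
    return False

data = [3, 5, 8, 0, 7]
print(linear_search(data, 7))   # True  -- 7 is the last element checked
print(linear_search(data, 4))   # False -- 4 is not in the list at all
```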
Okay, so, assuming each operation takes one unit of time, both the time taken by this algorithm, which iterates through the list, and the number of operations are order N. That's why we see order N over here.

Let's take a different example to see how Big O notation works. This deals with insertion sorting of an unsorted list. So suppose I give you a list of elements, say integers, and this list is unsorted, and I ask you to sort them, say in increasing order of the elements. The insertion sort algorithm works as follows: first, you create a new list that is empty. Then, you iterate through the elements in the unsorted original list, you remove one element at a time, and you insert that element into the new list at the appropriate position. Essentially, you search through the new list linearly, looking for the right position where this element should be. So, if you do this, the first element, of course, takes one operation to insert; that's one insert operation. The second element, in the worst case, takes two operations to insert.
One operation for the comparison with the one element that's already present in the new sorted list, and one more operation to insert. Imagine the worst case, where this second element is inserted right at the very end of the list. Similarly, the third element will take three operations to insert: the first two operations compare it with the two already existing elements, and the third operation inserts it at the very end of the list. If you go on like this, the i-th element takes i operations to insert, and you can calculate that the total time required to insert all the N elements is 1 + 2 + 3 + ... and so on, up to N. This is a well-known sum, which comes to N(N+1)/2, which is < 1 * N^2. So, with a value of c equal to one, this insertion sorting example is an algorithm that takes order N^2 in time, and also order N^2 in the number of operations.

Next, we look at a little bit of basic probability. Before probability, we need to discuss a few definitions. The first definition is a set. A set is, essentially, a collection of things.
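The insertion sort analyzed above, building a new sorted list by linearly searching for each element's position, can be sketched as follows. This is a deliberately naive transcription of the algorithm as described, so that its order N^2 cost (element i costs up to i operations) stays visible:

```python
def insertion_sort(unsorted):
    """Sort by inserting each element at its position in a new list.

    Element i costs up to i operations (comparisons plus the insert),
    so the total is 1 + 2 + ... + N = N(N+1)/2, i.e. order N^2.
    """
    result = []
    for element in unsorted:
        # Linear search for the first position whose value exceeds element.
        pos = 0
        while pos < len(result) and result[pos] <= element:
            pos += 1
        result.insert(pos, element)
    return result

print(insertion_sort([3, 5, 8, 0, 7]))   # [0, 3, 5, 7, 8]
```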
So, for instance, a set S might be the set of all humans who live in this world at this current moment. A subset is another set, also a collection of things, that is a part of the larger set. So, I might describe a set S2, which is the set of all humans who currently live in Europe. S2 is a subset of S, because every element that is present in S2 is also present in S, but the reverse is not necessarily true.

Okay, so now, probability. Any event has a probability of happening. Let me give you an example. Suppose you wake up at a random hour of the day; say you're jet-lagged or something. And I ask you: what is the probability that the time is between 10 a.m. and 11 a.m.? Let's try to calculate this. Well, there are 24 hours in a day, so you have a set that consists of 24 elements, one for each hour: 12 a.m., 1 a.m., 2 a.m., all the way through 10 a.m. and 11 a.m., up to 11 p.m. When I say an element like 1 a.m., I mean the hour between 1 a.m. and 1:59 a.m.; similarly, 10 a.m. means between 10 a.m. and 10:59 a.m. Out of this set of 24 elements, you pick one at random.
The probability that you're going to pick 10 a.m. is essentially 1 divided by 24, because you're picking one element at random, there are 24 elements in the set, and the probability that you pick that one special element, 10 a.m., is 1 over 24; you have to be lucky to pick that 10 a.m. slot. So, if you wake up at a random hour of the day, the probability that the time is between 10 a.m. and 11 a.m. is 1 over 24.

Now, there might be multiple events, and you might need to calculate the probability of conjunctions or disjunctions of these events; I'll describe what these are soon. So say E1 is one event and E2 is another event, and E1 and E2 are independent of each other. This means that E1 does not influence E2, and E2 does not influence E1. Then the probability that E1 and E2 both happen is essentially the multiplication of their probabilities: you take the probability of E1, multiply it by the probability of E2, and that gives you the probability that both E1 and E2 are true. Let's discuss an example of this. Suppose you have three shirts, colored blue, green, and red.
0:18:20.065,0:18:23.101 And also you wake up at a random[br]hour and blindly pick a shirt: 0:18:23.101,0:18:25.504 you put your hand[br]into your closet, 0:18:25.504,0:18:27.606 and you blindly[br]pick out a shirt. 0:18:27.606,0:18:28.640 What is the probability that 0:18:28.640,0:18:30.776 you woke up between[br]10 a.m. and 11 a.m. 0:18:30.776,0:18:34.046 and that you picked[br]a green shirt to wear? 0:18:34.046,0:18:36.181 Well, the probability[br]of the first event, 0:18:36.181,0:18:37.483 that you woke up[br]between 10 and 11, 0:18:37.483,0:18:38.951 is essentially 1 over 24, 0:18:38.951,0:18:40.853 like we calculated[br]on the previous slide. 0:18:40.853,0:18:43.055 The probability that you pick[br]the green shirt is essentially 0:18:43.055,0:18:45.390 one out of three, because there[br]are three colors of shirts: 0:18:45.390,0:18:46.525 you have three shirts 0:18:46.525,0:18:48.660 and you want[br]to pick the green one, 0:18:48.660,0:18:49.862 you want to be lucky[br]to pick the green one, 0:18:49.862,0:18:51.296 so that's one over three. 0:18:51.296,0:18:53.499 And so, the probability that[br]both these events are true, 0:18:53.499,0:18:54.867 that you woke up[br]between 10 and 11 0:18:54.867,0:18:56.869 and that you wore[br]the green shirt, 0:18:56.869,0:18:58.370 is the multiplication[br]of the probabilities 0:18:58.370,0:18:59.538 of those events. 0:18:59.538,0:19:03.542 So it's (1/24)*(1/3)=1/72. 0:19:05.010,0:19:05.744 One of the things 0:19:05.744,0:19:06.712 that you have[br]to be careful about here 0:19:06.712,0:19:09.548 is that, if E1 and E2[br]are not independent, 0:19:09.548,0:19:11.683 meaning that they influence[br]each other in some way, 0:19:11.683,0:19:15.420 then you cannot multiply the probabilities 0:19:15.420,0:19:16.288 with each other.
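The multiplication rule for independent events can be checked with a short Python sketch. The exact fractions mirror the lecture's wake-hour and green-shirt example; the simulation part is my own illustration of why the product is the right answer.

```python
from fractions import Fraction
import random

# P(wake between 10 and 11 a.m.): one favorable hour out of 24.
p_hour = Fraction(1, 24)
# P(picking the green shirt): one favorable shirt out of three.
p_green = Fraction(1, 3)

# For independent events, P(E1 AND E2) = P(E1) * P(E2).
p_both = p_hour * p_green
print(p_both)  # 1/72

# Sanity check by simulation: draw the hour and the shirt independently.
random.seed(0)
trials = 200_000
hits = sum(
    1
    for _ in range(trials)
    if random.randrange(24) == 10
    and random.choice(["blue", "green", "red"]) == "green"
)
print(hits / trials)  # close to 1/72, about 0.0139
```

The `Fraction` type keeps the arithmetic exact, which makes it easy to see that the product matches the 1/72 worked out above.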
0:19:16.288,0:19:20.459 Okay, so, for instance,[br]if 0:19:20.459,0:19:22.461 the shirts that you have[br]in your closet 0:19:22.461,0:19:25.030 change from one hour[br]to another, because, 0:19:25.030,0:19:26.431 say, 0:19:26.431,0:19:28.300 someone in your house[br]changes them around, 0:19:28.300,0:19:30.602 then you can't necessarily multiply these probabilities 0:19:30.602,0:19:31.436 as they are here. 0:19:32.771,0:19:34.139 A different thing[br]that we need to do 0:19:34.139,0:19:36.875 with two event probabilities is[br]to OR them. 0:19:36.875,0:19:38.944 So, if I give you two events,[br]E1 and E2, 0:19:38.944,0:19:40.546 and I ask you[br]the probability 0:19:40.546,0:19:43.248 of either E1 or E2 happening, 0:19:43.248,0:19:46.285 then you can say that[br]Prob(E1 OR E2)= 0:19:46.285,0:19:51.290 Prob(E1)+Prob(E2)-[br]Prob(E1 AND E2). 0:19:51.290,0:19:53.458 Okay, so this last term is the intersection probability here. 0:19:53.458,0:19:55.827 If you do not know[br]the probability of E1 AND E2, 0:19:55.827,0:19:57.162 this last term over here, 0:19:57.162,0:20:00.599 then you can write the inequality Prob(E1 OR E2)<= 0:20:00.599,0:20:03.402 the sum of the constituent probabilities, okay? 0:20:04.570,0:20:06.872 Sometimes we use these[br]inequalities to upper-bound 0:20:06.872,0:20:10.676 the probabilities[br]of the disjunctions, 0:20:10.676,0:20:11.677 as we see here. 0:20:13.545,0:20:14.980 Okay, let's discuss 0:20:14.980,0:20:16.214 a few miscellaneous topics. 0:20:16.214,0:20:18.116 The first miscellaneous topic[br]is the Domain Name System, 0:20:18.116,0:20:19.184 also known as DNS. 0:20:19.184,0:20:20.185 This is a collection of servers 0:20:20.185,0:20:22.220 that are spread throughout[br]the world, and 0:20:22.220,0:20:23.221 the DNS is very critical 0:20:23.221,0:20:25.591 to the operation[br]of the web.
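Going back to the OR rule for two events: a minimal Python sketch, reusing the wake-hour and green-shirt events from the earlier example, that evaluates the inclusion-exclusion formula and confirms the union-bound inequality.

```python
from fractions import Fraction

p_e1 = Fraction(1, 24)   # E1: woke up between 10 and 11 a.m.
p_e2 = Fraction(1, 3)    # E2: picked the green shirt
p_both = p_e1 * p_e2     # independent, so P(E1 AND E2) = P(E1)*P(E2)

# Inclusion-exclusion: P(E1 OR E2) = P(E1) + P(E2) - P(E1 AND E2).
p_or = p_e1 + p_e2 - p_both
print(p_or)  # 13/36

# Union bound: dropping the intersection term only ever
# overestimates, so the sum is an upper bound on the OR.
assert p_or <= p_e1 + p_e2
```

Subtracting the intersection term avoids double-counting the outcomes where both events happen at once, which is exactly why the plain sum can only be an upper bound.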
0:20:25.591,0:20:26.692 Typically the input 0:20:26.692,0:20:30.095 to the DNS system[br]is a URL, say coursera.org. 0:20:30.095,0:20:31.563 The URL is a name; 0:20:31.563,0:20:32.764 it's a human-readable string 0:20:32.764,0:20:35.000 that uniquely identifies[br]the object. 0:20:35.000,0:20:36.468 Typically, when you go to your browser, 0:20:36.468,0:20:39.504 whether it's Safari, Chrome, Internet Explorer, Netscape, 0:20:39.504,0:20:41.840 Firefox, or whatever[br]your favorite browser is, 0:20:41.840,0:20:44.576 you might enter a URL[br]like coursera.org. 0:20:44.576,0:20:46.178 The moment you hit Enter[br]on your browser, 0:20:46.178,0:20:49.081 your browser goes off[br]and contacts the DNS system, 0:20:49.081,0:20:52.250 and gives the DNS system 0:20:52.250,0:20:55.721 the name of this URL,[br]which is coursera.org. 0:20:55.721,0:20:57.289 What does the DNS[br]give back to your browser? 0:20:57.289,0:20:59.224 It gives back the IP address[br]of a web server 0:20:59.224,0:21:01.793 that hosts that content,[br]so that your machine, 0:21:01.793,0:21:04.296 your browser, can[br]then go and send a request 0:21:04.296,0:21:05.631 to that IP address 0:21:05.631,0:21:08.634 and fetch the actual content[br]of that web page. 0:21:08.634,0:21:09.935 So, the IP address is an ID; 0:21:09.935,0:21:12.371 it's a unique string that points to that particular object, 0:21:12.371,0:21:14.706 for instance,[br]the coursera.org web page. 0:21:14.706,0:21:18.577 And the ID itself[br]may not be human-readable. 0:21:18.577,0:21:20.912 So, essentially, DNS[br]is a system that translates 0:21:20.912,0:21:22.914 between a human-readable name, 0:21:22.914,0:21:25.884 such as a URL,[br]and a unique ID, 0:21:25.884,0:21:27.986 which may[br]or may not be human-readable, 0:21:27.986,0:21:29.454 like the IP address.
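The name-to-address translation the lecture describes can be tried directly from Python's standard library. This is a minimal sketch, not what a browser literally runs; it assumes a working network resolver, so the lookup is wrapped in a try/except for offline machines.

```python
import socket

# Ask the DNS system to translate a human-readable name into an
# IP address, as a browser does before fetching the page.
host = "coursera.org"
try:
    ip = socket.gethostbyname(host)  # performs the DNS lookup
    print(host, "resolves to", ip)   # the exact address varies
except socket.gaierror as err:
    # No network or no resolver available.
    print("DNS lookup failed:", err)
```

Running it several times, or from different locations, may print different addresses, which is consistent with the point made next about content distribution networks answering on a site's behalf.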
0:21:30.656,0:21:32.224 In this case, the IP address[br]may refer to either 0:21:32.224,0:21:33.959 the actual web server[br]that is storing the content, 0:21:33.959,0:21:35.527 or maybe an indirection server 0:21:35.527,0:21:37.162 such as a content distribution network server, 0:21:37.162,0:21:40.132 for example, an Akamai server that is 0:21:40.132,0:21:42.834 hosting or caching[br]the content on your behalf. 0:21:45.470,0:21:46.605 Graphs: 0:21:46.605,0:21:48.840 so when we say graphs[br]in computer science, 0:21:48.840,0:21:51.943 we don't necessarily mean plots, which have x and y axes. 0:21:51.943,0:21:54.079 A graph[br]is essentially a network. 0:21:54.079,0:21:57.149 So here I am showing you a graph[br]of some cities: 0:21:57.149,0:22:00.318 Amsterdam, Boston,[br]Chicago, Delhi, and Edmonton, 0:22:00.318,0:22:03.689 and the travel times[br]between those cities by air, 0:22:03.689,0:22:06.291 rounded up[br]to the nearest hour. 0:22:06.291,0:22:08.894 Okay, so traveling[br]from Amsterdam to Boston 0:22:08.894,0:22:09.961 takes about six hours. 0:22:09.961,0:22:11.496 Traveling[br]from Chicago to Delhi 0:22:11.496,0:22:12.831 takes about 14 hours. 0:22:12.831,0:22:14.499 Say these are the flights[br]that are available 0:22:14.499,0:22:16.468 on some particular airline. 0:22:16.468,0:22:17.736 Right? 0:22:17.736,0:22:19.304 So, you see both cities here,[br]as well as 0:22:19.304,0:22:21.406 connections between[br]pairs of cities, 0:22:21.406,0:22:25.577 and also some values along those connections, 0:22:25.577,0:22:27.579 which are the number of hours it takes to transit 0:22:27.579,0:22:28.346 between those cities. 0:22:29.581,0:22:31.183 These all have names 0:22:31.183,0:22:32.217 in the world of graphs. 0:22:32.217,0:22:35.253 So, each city[br]would be a node or a vertex.
0:22:35.253,0:22:38.123 Each connection[br]between a pair of nodes 0:22:38.123,0:22:40.158 would be called an edge, 0:22:40.158,0:22:44.096 and the value along an edge[br]is called the edge weight. 0:22:44.096,0:22:46.832 Also, when I say[br]that A is adjacent to B, 0:22:46.832,0:22:51.636 this means that A has an edge that connects it to B directly. 0:22:51.636,0:22:53.405 Typically, an edge[br]only goes between 0:22:53.405,0:22:55.941 two nodes[br]or two vertices. 0:22:55.941,0:22:58.076 An edge does not span[br]three nodes. 0:22:58.076,0:23:03.181 So here, for instance, I have[br]1, 2, 3, 4, 5 vertices, 0:23:03.181,0:23:04.483 and how many edges do I have? 0:23:04.483,0:23:08.987 I have an A-E edge, an A-B edge,[br]a B-E edge, a B-C edge, 0:23:08.987,0:23:09.888 and a C-D edge, 0:23:09.888,0:23:12.457 so I also have five[br]different edges, 0:23:12.457,0:23:14.626 each with its own[br]edge weight. 0:23:17.662,0:23:19.331 So, in this lecture,[br]we have covered 0:23:19.331,0:23:22.000 basic data structures,[br]such as a queue and a stack; 0:23:22.000,0:23:24.436 we have seen that processes[br]are programs in action 0:23:24.436,0:23:26.805 and that they consist[br]of multiple different pieces; 0:23:26.805,0:23:28.940 processes, of course, run on computer architecture, 0:23:28.940,0:23:30.041 and it's important[br]for you to know 0:23:30.041,0:23:32.744 at least a simplified version of the computer architecture, 0:23:32.744,0:23:35.080 so that you know what's[br]operating where 0:23:35.080,0:23:36.648 when you run a process. 0:23:36.648,0:23:37.816 We've seen some[br]mathematical topics, 0:23:37.816,0:23:38.950 such as Big O notation, 0:23:38.950,0:23:40.418 which analyzes the worst-case[br]performance 0:23:40.418,0:23:41.920 of any given algorithm, 0:23:41.920,0:23:43.588 and then some basic probability. 0:23:43.588,0:23:46.057 And finally, we have seen[br]a few miscellaneous topics.
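The city graph described earlier, with its five vertices and five weighted edges, can be sketched as a Python adjacency list. Only the Amsterdam-Boston (6 hours) and Chicago-Delhi (14 hours) weights come from the lecture; the remaining weights are made-up placeholders for illustration.

```python
# Weighted undirected graph as an adjacency list (dict of dicts).
# Weights for A-E, B-E, and B-C are placeholders, not from the lecture.
graph = {
    "Amsterdam": {"Boston": 6, "Edmonton": 8},
    "Boston": {"Amsterdam": 6, "Edmonton": 5, "Chicago": 3},
    "Chicago": {"Boston": 3, "Delhi": 14},
    "Delhi": {"Chicago": 14},
    "Edmonton": {"Amsterdam": 8, "Boston": 5},
}

# Adjacency: A is adjacent to B iff an edge connects them directly.
print("Boston" in graph["Amsterdam"])  # True
print("Delhi" in graph["Amsterdam"])   # False

# Count edges: each undirected edge is stored once per endpoint.
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(num_edges)  # 5, matching the five edges in the figure
```

Storing each undirected edge under both endpoints makes adjacency checks a single dictionary lookup, at the cost of recording every weight twice.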
0:23:46.057,0:23:48.093 I hope you[br]keep coming back to this 0:23:48.093,0:23:49.494 as a reference,[br]whenever you need it, 0:23:49.494,0:23:51.329 throughout the course.