Welcome to this cloud computing concepts course. In this lecture we'll be covering several concepts that are assumed as background for the topics we'll be discussing in the cloud computing concepts course. These are considered basics in computer science; typically, they are concepts that computer science students learn in their first couple of years of an undergraduate computer science degree program. For those of you who are already familiar with this material, this can be a refresher, or you can simply skip it if you already know it. In any case, you can always use this orientation lecture as a reference to keep coming back to if you encounter a term during the regular lectures that doesn't make sense to you, or that you'd like to be reminded about. We'll discuss a few topics: some basic data structures, what processes are, a little bit about computer architecture, and then a little bit of math with Big O notation, basic probability, and a few other miscellaneous topics. So, let's get started.

First, basic data structures. We'll discuss two different data structures. The first one is a queue. A queue is essentially a First-in, First-out (FIFO) data structure. The elements of a queue can be of any data type; in this example, the elements are integers. Right now the queue has five elements: 3, 5, 8, 0, 7. This is an ordered list of elements. When you remove an element from the queue, you remove it from the head of the queue; in this case, the head is the element 3. When you insert a new element, you insert it at the tail of the queue, which means that if I insert a new element, say 1, it gets inserted right after 7 in this queue. So removal always happens at the head, while insertion always happens at the tail. For instance, given this particular queue of five elements, if you dequeue (remove) an item, that item will be 3. At that point the queue consists of only 5, 8, 0, and 7, with the head pointing to 5. The next item that you dequeue will be 5; after that the queue consists of only 8, 0, and 7. The next item that you dequeue will be 8, and so on and so forth. Dequeues (removals) and insertions can occur concurrently: you can remove an element and at the same time insert an element at the tail. So that's a queue, a First-in, First-out data structure.

A different data structure is the stack, which is a Last-in, First-out (LIFO) data structure, or equivalently, First-in, Last-out. Imagine a stack of dishes that you place on your table: the dish that you put on top, the one you added last, will be the first one that you can remove. That's essentially a stack. A removal, also known as a pop, happens from the top; an insert, or push, also happens at the top. So in this case, if you push a new element, say 9, onto this particular stack, the 9 goes above the 3. After that, if you pop (remove) an element, it will be the last element that you added, which is 9 in this case. The next pop after that will get 3, because 9 was just removed and 3 is now the top of the stack. After you've removed 3, the next pop will get you 5, and so on and so forth; after that you'll get 8, then 0, then 7. So, once again, a removal always returns the last item that was added, and an insertion, or push, adds an item right at the top of the stack.
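Here is how the two walkthroughs above look in code. This is a minimal sketch using the C++ standard library's std::queue and std::stack, not something from the lecture slides themselves; the element values are the ones from the example.

```cpp
#include <iostream>
#include <queue>
#include <stack>

int main() {
    // Queue: FIFO. Build the queue 3, 5, 8, 0, 7 (3 at the head).
    std::queue<int> q;
    for (int v : {3, 5, 8, 0, 7}) q.push(v);  // insertions happen at the tail
    q.push(1);                                // 1 goes in right after 7

    std::cout << "dequeue: " << q.front() << "\n";  // 3 (removed from the head)
    q.pop();
    std::cout << "dequeue: " << q.front() << "\n";  // 5
    q.pop();

    // Stack: LIFO. Push 7, 0, 8, 5, 3 so that 3 ends up on top.
    std::stack<int> s;
    for (int v : {7, 0, 8, 5, 3}) s.push(v);
    s.push(9);                                // 9 now sits above 3

    std::cout << "pop: " << s.top() << "\n";  // 9 (the last element added)
    s.pop();
    std::cout << "pop: " << s.top() << "\n";  // 3
    s.pop();
    std::cout << "pop: " << s.top() << "\n";  // 5
    s.pop();
}
```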
These two data structures, queue and stack, are used very widely in computer science, and in fact we'll use the notion of a stack right away when we discuss processes, later in this orientation lecture itself.

Speaking of processes, let's discuss a process next. A process is essentially a program in action. You may write a program in a programming language such as C or C++; an example is shown here. This example code consists of a main function, which calls a function called f1. f1 itself has a local integer variable x, and then f1 calls a different function, f2. This is shown pictorially here in the circle, where main calls f1 and f1 calls f2. This is the code that you would write; you would then compile the code and execute it. When you're executing it, when your program is in action, that is a process. So a process consists of your code, but also of several other elements that help your code execute on the computer on which you are running it. We are going to discuss some elements of a process that are critical. Here are some of these elements.

First, let's look at the code. The code itself is static: once you write it, nothing changes it (we are not considering the variable values to be part of the code; I'll talk about those in a little bit). But there is a program counter, typically maintained by the computer on which your process is running, which points to the line number inside the code where the program, or rather the process, is currently executing. This is called the Program Counter, or PC, and it is an important part of the process state. If you were to take a core dump, or stop the process right now and take a snapshot of that process, that snapshot must include the program counter, because it says where in the code the execution currently is.

Next, when functions call each other, or when methods call each other in an object-oriented program, they use a stack. So every process contains a stack. More specifically, a process might contain multiple threads, and each thread contains its own stack; but let's say there is one thread in this process, so the process contains one stack. This stack is used by the functions or methods to pass arguments and return result values. For instance, when main calls f1, main pushes the arguments to f1 on top of the stack; when f1 starts executing, it pops (removes) those elements from the top of the stack and uses them to execute. Similarly, when f1 calls f2, f1 pushes the arguments for f2 on top of the stack, and then f2 pops them, executes, and pushes its result values on top of the stack. The result values are then popped by f1; this is the way f1 gets back the final int return value from f2. And finally, when f1 needs to return a result value to main, it pushes it on top of the stack; when execution returns to main, main pops it from the top of the stack, and main has the final return value from f1.
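For concreteness, here is a minimal sketch of the slide's example in C++. The call structure (main calls f1, f1 has a local int x and calls f2) is from the lecture; the function bodies and the specific values are made up, since the slide only shows the structure.

```cpp
#include <iostream>

int f2(int arg) {
    return arg * 2;               // result pushed on the stack for f1 to pop
}

int f1() {
    int x = 21;                   // local variable x, in f1's stack frame
    return f2(x);                 // argument to f2 passed via the process stack
}

int main() {
    int result = f1();            // main pops f1's return value off the stack
    std::cout << result << "\n";  // prints 42
    return 0;
}
```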
So the stack is an important part of the process state as well, because it tells you where in the program execution you are with respect to functions calling each other.

Finally, functions have local variables, like x. There might also be global variables, and of course in object-oriented programs you have objects, which store many fields. Much of this data is stored in what is known as a heap. A heap essentially holds data that was created by functions, methods, or objects: for instance, while f1 is executing, an object that it allocates is created inside the heap, and when f1 is done with that object, it is removed from the heap. There are also registers, and I'll say more about registers when we discuss computer architecture, but registers hold some of the recent values that have been accessed by the process, P1 in this case.

Speaking of computer architecture, let's discuss a simplified version of it. Computer architectures today can be very complicated, but typically, when we think of how a process executes on a computer architecture, we can use a simplified view. There is a CPU, which executes the actual instructions present in your code. There is also a bunch of registers co-located with the CPU; these are small pieces of memory that the CPU can access very quickly, and typically there is only a small number of them, not more than a few tens. There is also a cache, which is a slightly larger memory than the collection of registers; and the larger a piece of memory is, the slower it is to access. So accessing the cache is slower than accessing registers, but it is still pretty fast. Beyond the cache there is the main memory, or Random Access Memory (RAM), which is even larger than the cache and therefore slower, but still fast. And finally there is the hard disk, or if you like a solid-state drive or flash drive, which holds a lot more data than main memory but is, of course, much slower. So as you go up from disk to memory to cache to registers, the speed increases and the total amount of storage space decreases.
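The speed gap between the levels of this hierarchy is easy to observe for yourself. The following small experiment is not from the lecture, just an illustration: it sums the same matrix twice, once walking memory sequentially (cache-friendly) and once jumping across it with a large stride (cache-unfriendly). On most machines the second loop is noticeably slower, even though both perform the same arithmetic.

```cpp
#include <chrono>
#include <iostream>
#include <vector>

int main() {
    const int n = 4096;  // matrix size chosen arbitrarily for illustration
    std::vector<int> m(static_cast<size_t>(n) * n, 1);

    auto t0 = std::chrono::steady_clock::now();
    long long rowSum = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            rowSum += m[static_cast<size_t>(i) * n + j];  // consecutive addresses
    auto t1 = std::chrono::steady_clock::now();

    long long colSum = 0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            colSum += m[static_cast<size_t>(i) * n + j];  // jumps n elements per step
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::cout << "row-major:    "
              << std::chrono::duration_cast<ms>(t1 - t0).count()
              << " ms (sum " << rowSum << ")\n"
              << "column-major: "
              << std::chrono::duration_cast<ms>(t2 - t1).count()
              << " ms (sum " << colSum << ")\n";
}
```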
When you write a program in a language such as C++, Java, or C and compile it, it gets compiled to low-level machine instructions. These machine instructions might be specific to the architecture of the machine on which you are running, or they might be code for a virtual machine, like the JVM. In any case, these low-level machine instructions are the executable version of your program, and they are stored in the file system on your disk.

When your program starts executing, when it becomes a process, the CPU loads instructions in batches or, in a more simplified view, one by one. It loads them into memory, and then into the cache and into registers. Typically, the cache and the registers contain the last few accessed instructions; they follow what is known as the least-recently-used principle. As it executes each instruction, the CPU loads the data necessary for that instruction into memory and then, if necessary, into the cache and the registers. If any changes happen to a variable like x, these are stored first into the cache and then into memory. Once in a while, you may need to store something on disk so that you have permanent storage, so that if the process goes offline, you still have the data. For instance, the program might open a file and write something into it; in that case, you may want to write those updates to the file onto the disk. Of course, this is a very highly simplified picture, and computer architectures can be far more complex than this today, but for our practical purposes in this course, and as far as explaining how processes work, it will suffice.

Next, we move on to a few mathematical topics. The first is Big O notation, one of the most basic ways of analyzing algorithms. When I give you an algorithm and ask you to analyze it mathematically, I essentially expect you to come up with at least a Big O analysis of that algorithm. Big O notation describes an upper bound on the behavior of an algorithm as some variable is scaled, or increased in value, all the way to infinity. Typically, this variable is the size of the input, perhaps the number of input elements; you'll see a few examples of this. Essentially, a Big O analysis tells us how the algorithm itself scales as the size of the input is increased. This analyzes the running time or, sometimes, another performance metric. Most of the time we analyze running time; sometimes, as you'll see in the course, we'll analyze things like the number of messages, or even the bandwidth exchanged in the total network. The important thing about Big O notation is that it captures worst-case performance. This is not expected performance or best-case performance; Big O is worst-case performance: given this algorithm, what is the worst it can do in terms of the performance metric we're measuring? Essentially, when I say an algorithm A is order of foo, written O(foo), this means that algorithm A takes less than c*foo time to complete (assuming time is the metric we are measuring), for some fixed constant c, beyond some input size N. Typically the foo in this definition is some function of the input size N. For instance, I could say that an algorithm is order N, O(N), which means it takes less than c*N time to complete, for some constant c, beyond some input size N. Similarly, an algorithm might be order N^2, O(N^2), which means it is quadratic in time: it takes time less than c*N^2 for some constant c. Once again, when we say order N or order N^2, notice that we do not include the constant c in the definition itself. This is because the constant is irrelevant, as long as there is some fixed constant. That is good enough for us as far as Big O notation is concerned, so we do not state the constants in Big O notation.
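Written out formally, the definition above says:

```latex
% A is O(f(N)) if there exist a fixed constant c and an input size N_0
% such that A's running time T_A(N) is bounded by c * f(N) beyond N_0:
\[
  A \text{ is } O(f(N)) \iff \exists\, c > 0,\ N_0 :\quad
  T_A(N) < c \cdot f(N) \quad \text{for all } N > N_0.
\]
% Example: if T_A(N) = 3N + 2, then A is O(N),
% since 3N + 2 < 4N whenever N > 2 (take c = 4, N_0 = 2).
```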
So let's see a couple of examples. First, I give you an unsorted list of, say, integers, and I ask you to search for a specific element in the list. This essentially means that you have to iterate through the list one item at a time, and search for the element by trying to match it against each element present in the list. What is the worst-case performance? That's what we need for Big O notation, right? The worst case happens either when the element is not in the list at all, so you return false, or when the element is the very last one in the list, the very last one that you match against. Therefore, the worst case involves N operations, where N is the size of the list, and the number of operations is less than c*N with c = 2. So, assuming each operation takes one unit of time, both the time taken by this algorithm, which iterates through the list, and its number of operations are order N. That's why we see O(N) here.

Let's take a different example to see how Big O notation works. This one deals with insertion sorting of an unsorted list. Suppose I give you a list of elements, say integers, that is unsorted, and I ask you to sort them, say in increasing order. The insertion sort algorithm works as follows. First, you create a new list that is empty. Then you iterate through the elements in the original unsorted list, remove one element at a time, and insert that element into the new list at the appropriate position; essentially, you search through the new list linearly for the right position where this element should go. If you do this, the first element takes one operation to insert: one insert operation. The second element, in the worst case, takes two operations to insert: one operation for the comparison with the one element already present in the new sorted list, and one more operation to insert. Imagine the worst case, where this second element is inserted right at the very end of the list. Similarly, the third element takes three operations to insert: the first two compare it with the two already-present elements, and the third operation inserts it at the very end of the list. If you go on like this, the i-th element takes i operations to insert, and you can calculate that the total time required to insert all N elements is 1 + 2 + 3 + ... + N. This is a well-known sum that comes to N(N+1)/2, which is at most 1*N^2. So, with a value of c equal to one, this insertion sort example is an algorithm that takes order N^2 in time, and also order N^2 in number of operations.
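Here are both examples as code: a minimal C++ sketch of the linear search and of insertion sort exactly as described above, building a new sorted list by linear scanning. The helper names are mine, not from the lecture.

```cpp
#include <iostream>
#include <vector>

// Linear search through an unsorted list. The worst case (element absent, or
// matching only the very last element) examines all N elements: O(N).
bool contains(const std::vector<int>& list, int target) {
    for (int v : list)
        if (v == target) return true;
    return false;
}

// Insertion sort as described above: build a new, sorted list by inserting
// each element at its appropriate position, found by a linear scan. The i-th
// insertion costs up to i operations, so the total is 1+2+...+N = N(N+1)/2,
// which is O(N^2).
std::vector<int> insertionSort(const std::vector<int>& unsorted) {
    std::vector<int> sorted;
    for (int v : unsorted) {
        auto pos = sorted.begin();
        while (pos != sorted.end() && *pos < v) ++pos;  // find the right spot
        sorted.insert(pos, v);                          // insert there
    }
    return sorted;
}

int main() {
    std::vector<int> list = {3, 5, 8, 0, 7};
    std::cout << "contains 8: " << contains(list, 8) << "\n";  // prints 1
    for (int v : insertionSort(list)) std::cout << v << " ";   // 0 3 5 7 8
    std::cout << "\n";
}
```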
Next, we look at a little bit of basic probability. Before probability, we need to discuss a few definitions. The first definition is a set. A set is essentially a collection of things. For instance, a set S might be the set of all humans who live in this world at this current moment. A subset is another set, a collection of things that is a part of the larger set. So I might describe a set S2 as the set of all humans who currently live in Europe. S2 is a subset of S, because every element present in S2 is also present in S, but the reverse is not necessarily true.

Okay, so now, probability. Any event has a probability of happening. Let me give you an example. Suppose you wake up at a random hour of the day, say because you're jet-lagged, and I ask you: what is the probability that the time is between 10 a.m. and 11 a.m.? Let's try to calculate this. Well, there are 24 hours in a day, so you have a set that consists of 24 elements, one for each hour: 12 a.m., 1 a.m., 2 a.m., and so on through 10 a.m. and 11 a.m., all the way to 11 p.m. When I say an element like 1 a.m., I mean the hour between 1:00 a.m. and 1:59 a.m.; similarly, 10 a.m. means 10:00 a.m. through 10:59 a.m. Out of this set of 24 elements, you pick one at random. The probability that you pick 10 a.m. is 1 divided by 24: you are picking one element at random, there are 24 elements in the set, and you have to be lucky enough to pick that one special element, 10 a.m. So, if you wake up at a random hour of the day, the probability that the time is between 10 a.m. and 11 a.m. is 1/24.

Now, there might be multiple events, and you might need to calculate the probability of conjunctions or disjunctions of these events; I'll describe what these are soon. Say E1 is one event and E2 is another event, and E1 and E2 are independent of each other. This means that E1 does not influence E2, and E2 does not influence E1. Then the probability that E1 and E2 both happen is the multiplication of their probabilities: you take the probability of E1, multiply it by the probability of E2, and that gives you the probability that both E1 and E2 are true.

Let's discuss an example of this. Suppose you have three shirts, colored blue, green, and red, and when you wake up at that random hour you also blindly pick a shirt: you put your hand into your closet and pull out a shirt without looking. What is the probability that you woke up between 10 a.m. and 11 a.m. and that you picked the green shirt to wear? Well, the probability of the first event, that you woke up between 10 and 11, is 1/24, as we calculated on the previous slide. The probability that you pick the green shirt is one out of three, because there are three shirts and you have to be lucky enough to pick the green one. So the probability that both these events are true, that you woke up between 10 and 11 and that you wore the green shirt, is the multiplication of the probabilities of those events: (1/24)*(1/3) = 1/72. One of the things you have to be careful about here is that if E1 and E2 are not independent, meaning they influence each other in some way, then you cannot multiply their probabilities like this. For instance, if the shirts in your closet change from one hour to another, because, say, someone in your house moves them around, then you can't necessarily multiply these probabilities as we did here.

A different thing that we need to do with two event probabilities is to OR them. If I give you two events E1 and E2 and I ask you the probability of either E1 or E2 happening, then Prob(E1 OR E2) = Prob(E1) + Prob(E2) - Prob(E1 AND E2); this last term is the intersection probability. If you do not know the probability of E1 AND E2, then you can write the inequality Prob(E1 OR E2) <= Prob(E1) + Prob(E2), the sum of the constituent probabilities. Sometimes we use this inequality to upper-bound the probability of a disjunction, as we see here.
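The two rules used above, written out:

```latex
% Independent events multiply, as in the wake-up/green-shirt example:
\[
  P(E_1 \cap E_2) = P(E_1) \cdot P(E_2),
  \qquad \text{e.g.}\quad \tfrac{1}{24} \cdot \tfrac{1}{3} = \tfrac{1}{72}.
\]
% Disjunction (inclusion-exclusion), and the upper bound that holds
% even when the intersection probability is unknown:
\[
  P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)
  \;\le\; P(E_1) + P(E_2).
\]
```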
Okay, let's discuss a few miscellaneous topics. The first miscellaneous topic is the Domain Name System, or DNS. This is a collection of servers spread throughout the world, and the DNS is very critical to the operation of the web. Typically, the input to the DNS system is a URL, say coursera.org. The URL is a name: a human-readable string that uniquely identifies an object. Typically, when you go to your browser, whether it's Safari, Chrome, Internet Explorer, Netscape, Firefox, or whatever your favorite browser is, you might enter a URL like coursera.org. The moment you hit Enter, your browser goes off and contacts the DNS system and gives it the name in this URL, coursera.org. What does the DNS give back to your browser? It gives back the IP address of a web server that hosts that content, so that your browser can then send a request to that IP address and fetch the actual content of that web page. The IP address is an ID: a unique string that points to that particular object, for instance the coursera.org web page, and the ID itself may not be human-readable. So, essentially, DNS is a system that translates a human-readable name, such as a URL, into a unique ID, which may or may not be human-readable, such as an IP address. In this case, the IP address may refer either to the actual web server storing the content, or to an indirection server, such as a content distribution network (CDN) server, for instance an Akamai server that hosts or caches the content on your behalf.

Next, graphs. When we say graphs in computer science, we don't necessarily mean plots with x and y axes. A graph is essentially a network. Here I am showing you a graph of some cities, Amsterdam, Boston, Chicago, Delhi, and Edmonton, and the travel times between those cities by air, rounded up to the nearest hour. Traveling from Amsterdam to Boston takes about six hours; traveling from Chicago to Delhi takes about fourteen hours. Say these are the flights that are available on some particular airline. So you see cities, connections between pairs of cities, and values along those connections, which are the number of hours it takes to travel between those cities. These all have names in the world of graphs. Each city is a node, or vertex. Each connection between a pair of nodes is called an edge, and the value along an edge is called the edge weight. Also, when I say that A is adjacent to B, this means that A has an edge that connects it to B directly. Typically, an edge only goes between two nodes, or two vertices; an edge does not cross three nodes. So here, for instance, I have five vertices, and how many edges do I have? I have an A-E edge, an A-B edge, a B-E edge, a B-C edge, and a C-D edge, so I also have five edges, each with its own edge weight.
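Here is one common way to store a graph like this in code, as a weighted adjacency list in C++. Note that the lecture gives only two of the five edge weights (Amsterdam-Boston, 6 hours; Chicago-Delhi, 14 hours); the other three weights below are made up for illustration.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Each city (node) maps to a list of (neighbor, travel hours) pairs.
    std::map<std::string, std::vector<std::pair<std::string, int>>> graph;

    auto addEdge = [&](const std::string& a, const std::string& b, int hours) {
        graph[a].push_back({b, hours});  // undirected: store the edge both ways
        graph[b].push_back({a, hours});
    };

    addEdge("Amsterdam", "Boston", 6);    // weight given in the lecture
    addEdge("Chicago", "Delhi", 14);      // weight given in the lecture
    addEdge("Amsterdam", "Edmonton", 8);  // hypothetical weight
    addEdge("Boston", "Edmonton", 5);     // hypothetical weight
    addEdge("Boston", "Chicago", 3);      // hypothetical weight

    // "Adjacent" means directly connected by an edge: list each node's neighbors.
    for (const auto& [city, neighbors] : graph) {
        std::cout << city << " is adjacent to:";
        for (const auto& [other, hours] : neighbors)
            std::cout << " " << other << " (" << hours << "h)";
        std::cout << "\n";
    }
}
```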
So, in this lecture we have covered basic data structures, such as the queue and the stack. We have seen that processes are programs in action, and that they consist of multiple different pieces. Processes, of course, run on a computer architecture, and it's important for you to know at least a simplified version of that architecture, so that you know what is operating where when you run a process. We've seen some mathematical topics, such as Big O notation, which analyzes the worst-case performance of any given algorithm, and then some basic probability. And finally, we have seen a few miscellaneous topics. I hope you keep coming back to this lecture as a reference, whenever you need it, throughout the course.