Hello. After having spent a lot of time on the relatively simple two-dimensional problem of handwritten digit recognition, we are now ready to tackle the general problem, which is detecting objects in images of 3D scenes. The setting in which we study this problem these days is most commonly the so-called PASCAL object detection challenge, which has been going on for about five years or so. What these folks have done is collect a set of about 10,000 images, and in each of these images they have marked a certain set of objects. The object categories include dining table, dog, horse, motorbike, person, potted plant, sheep, and so on; there are twenty different categories in all. For each object belonging to a category, they have marked its bounding box. So, for example, here is the bounding box corresponding to the dog in this image, there is a bounding box corresponding to a horse here, and there will also be bounding boxes corresponding to the people, because in this image we have horses and people. The goal is to detect these objects. So what a computer program is supposed to do, say when we are trying to find dogs, is to mark bounding boxes corresponding to where the dogs are in the image. It will then be judged on whether each dog is in the right location, that is, whether the predicted bounding box overlaps sufficiently with the correct bounding box. This is the dominant dataset for studying object detection. Now, let's see what techniques we can use for addressing this problem. We start, of course, with the basic paradigm of the multi-scale sliding window. This paradigm was introduced for face detection back in the 90s, and since then it has also been used for pedestrian detection and so forth. The basic idea is that we consider a window, let's say starting in the top-left corner of the image; this green box corresponds to one of those windows. We then evaluate the answer to the question: is there a face there, or is there a bus there? We shift the window slightly and ask the same question. And since the people in an image can come in a variety of sizes, we have to repeat this process for windows of different sizes, so as to detect small objects as well as large objects. A good and standard building block for the window classifier is a linear support vector machine trained on histogram of oriented gradients (HOG) features; a sketch of the scanning loop itself follows below.
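To make the scanning loop concrete, here is a minimal sketch of multi-scale sliding-window detection. The classifier `score_window`, the window size, the stride, and the scale step are all placeholders I am assuming for illustration, not parameters of any particular system discussed here:

```python
import numpy as np

def multiscale_sliding_window(image, score_window, window=(64, 128),
                              stride=8, scale_step=1.25, threshold=0.0):
    """Scan an image pyramid with a fixed-size window.

    `score_window` is any function mapping a window-sized crop to a
    real-valued score (e.g. a linear SVM on HOG features); higher means
    more object-like. Returns (x, y, w, h, score) boxes mapped back to
    original image coordinates.
    """
    detections = []
    scale = 1.0
    img = image
    while img.shape[0] >= window[1] and img.shape[1] >= window[0]:
        for y in range(0, img.shape[0] - window[1] + 1, stride):
            for x in range(0, img.shape[1] - window[0] + 1, stride):
                crop = img[y:y + window[1], x:x + window[0]]
                s = score_window(crop)
                if s > threshold:
                    # map the box back to the original resolution
                    detections.append((int(x * scale), int(y * scale),
                                       int(window[0] * scale),
                                       int(window[1] * scale), s))
        # downsample by scale_step (nearest-neighbor, to stay self-contained)
        scale *= scale_step
        h, w = int(image.shape[0] / scale), int(image.shape[1] / scale)
        if h < window[1] or w < window[0]:
            break
        ys = (np.arange(h) * scale).astype(int)
        xs = (np.arange(w) * scale).astype(int)
        img = image[np.ix_(ys, xs)]
    return detections
```

Scanning a shrunken image with a fixed window is equivalent to scanning the original with a larger window, which is why one pyramid plus one template covers objects of many sizes.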
This framework was introduced by Dalal and Triggs in 2005, and their paper has details about how they compute each of the blocks and how they normalize; if you are interested in those details, you should read that paper (a simplified sketch of the descriptor also appears at the end of this passage). Now, note that the Dalal and Triggs approach was tested on pedestrians, and in the case of pedestrians a single template is enough: you try to detect the whole object in one go. When we deal with more complex objects, such as people in general poses, or dogs and cats, we find that these are very non-rigid, so one single rigid template is not effective. What we really want are part-based approaches. Nowadays, there are two dominant part-based approaches. The first is the so-called deformable part models, due to Felzenszwalb et al.; the archival paper on that appeared around 2010. The other approach is the so-called poselets, and this is due to Lubomir Bourdev and various other collaborators in my group. So, what's the basic idea? Let me get into Felzenszwalb's approach first. Their basic idea is to have a root filter, which tries to find the object as a whole, and then a set of part filters, which might correspond to, say, detectors for faces or legs and so forth, except that these part filters have to fire in certain spatial relationships with respect to the root filter. So the overall detector is a combination of a holistic detector and a set of part filters that must be in certain spatial relationships with respect to the whole object. This requires training both the root filter and the various part filters, which can be done using the so-called latent SVM approach, and it does not require any extra annotation. And note that when I said parts such as faces and legs, that was me getting carried away: the detected parts need not correspond to anything semantically meaningful. In the case of the poselets approach, the idea is to have semantically meaningful parts, and the way they go about doing this is by making use of extra annotation. Suppose you have images of people; these images might be annotated with keypoints corresponding to the left shoulder, right shoulder, left elbow, right elbow, and so on. Other object categories will have other keypoints: for an airplane, for example, you might have a keypoint on the tip of the nose or the tips of the wings, and so on and so forth.
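Picking up the promise above, here is a deliberately simplified sketch of a HOG-style descriptor. It keeps the per-cell orientation histograms but collapses Dalal and Triggs's overlapping block normalization into a single global normalization and skips their interpolation steps, so it illustrates the idea rather than reproducing their exact descriptor:

```python
import numpy as np

def hog_features(gray, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of gradient orientation,
    weighted by gradient magnitude, followed by one global L2
    normalization (the real descriptor normalizes overlapping
    2x2-cell blocks)."""
    gray = gray.astype(float)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]   # centered [-1, 0, 1] filter
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientations
    H, W = gray.shape
    ch, cw = H // cell, W // cell
    hist = np.zeros((ch, cw, bins))
    b = (ang / (180.0 / bins)).astype(int) % bins   # orientation bin per pixel
    for i in range(ch * cell):
        for j in range(cw * cell):
            hist[i // cell, j // cell, b[i, j]] += mag[i, j]
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-6)
```

A linear SVM trained on such vectors, one per window, is the standard detector building block mentioned above.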
This keypoint annotation requires extra work, because somebody has to go through all the images in the dataset and mark the keypoints, but the consequence is that we will be able to do a few more things afterwards. Here's a slide which shows how object detection with discriminatively trained part-based models works. This is the DPM model of Felzenszwalb, Girshick, McAllester, and Ramanan, and here the model is illustrated with the example of bicycle detection. In fact, you don't train just one model; you train a mixture of models. So there is a model here corresponding to the side view of a bicycle. The root filter is shown here; it is a HOG template, looking for edges of particular orientations, such as might be found on the side view of a bicycle. Then we have various part filters. Each of the rectangles here corresponds to a part filter; this one, for instance, corresponds to something like a template for detecting wheels. What we then have to do to come up with the final score is to combine the score of the root filter's HOG template with the scores of the HOG templates for each of the parts. Note that this detector for the side view of a bicycle will probably not do a good job on front views of bicycles, like the one here, and so for those they will have a different model. Again the model is shown here, and here the wheels and the other parts may be somewhat different. So overall, you have a mixture model with multiple components corresponding to different poses, and each component, as I said, consists of a root filter and various part filters. There is some subtlety in the training, because the annotations carry no labels for parts or keypoints, so the learning approach has to guess where the parts should be as part of the training process; you can find the details in their paper.
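As an illustration of how a root filter and part filters can combine, here is a much simplified sketch of scoring one mixture component. It brute-forces the displacement search that the real DPM implements efficiently with a generalized distance transform, and it ignores the fact that the real model evaluates parts at twice the root's resolution; the array names, the search radius, and the quadratic-plus-linear deformation cost form are assumptions for the sketch:

```python
import numpy as np

def dpm_component_score(root_score_map, part_score_maps, anchors, defcosts):
    """Score one component of a simplified deformable part model at
    every root location.

    root_score_map  : HxW root-filter response map
    part_score_maps : list of HxW response maps, one per part
    anchors         : list of (dy, dx) ideal part offsets from the root
    defcosts        : list of (a, b) deformation weights

    Each part contributes max over displacements d of
        part_score(p + anchor + d) - a*||d||^2 - b*||d||_1.
    """
    H, W = root_score_map.shape
    total = root_score_map.copy()
    R = 4  # search radius for part displacement
    for pmap, (ay, ax), (a, b) in zip(part_score_maps, anchors, defcosts):
        best = np.full((H, W), -np.inf)
        for dy in range(-R, R + 1):
            for dx in range(-R, R + 1):
                pen = a * (dy * dy + dx * dx) + b * (abs(dy) + abs(dx))
                shifted = np.full((H, W), -np.inf)
                ys = np.arange(H) + ay + dy
                xs = np.arange(W) + ax + dx
                vy = (ys >= 0) & (ys < H)
                vx = (xs >= 0) & (xs < W)
                shifted[np.ix_(vy, vx)] = pmap[np.ix_(ys[vy], xs[vx])]
                best = np.maximum(best, shifted - pen)  # best placement so far
        total += best
    return total
```

The deformation penalty is what makes the parts "deformable": a part may drift from its anchor if its filter response gains more than the penalty costs.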
How well does it do? There is a standard methodology that we use in computer vision for evaluating detection performance, and here is how we do it for the case of, say, a motorcycle detector. One computes the so-called precision-recall curve. The idea is that the detection algorithm comes up with guesses of bounding boxes where the motorbikes may be, and we then evaluate each of these guessed bounding boxes: is it right or wrong? A guess is judged to be right if its intersection over union with respect to a true motorbike's bounding box is at least 50%. Then we have a choice of how strict to be, via a threshold on the detection score. We could pass through most of our candidate bounding boxes, and if you guess enough of them then of course you are guaranteed to find all of the motorbikes, but that hardly seems right. So what we do is pick a threshold, and with that threshold we can evaluate the precision and the recall. These terms have the following meaning. Precision means what fraction of the detections you declared are actually true motorcycles; recall is the question of how many of the true motorcycles you managed to detect. Ideally, we want precision to be 100 percent and recall to be 100 percent. In reality, it doesn't work out that way; we are able to detect only some fraction of the true motorbikes. So here, for example, at this point the precision is 0.7, meaning that the detections we declare are 70 percent accurate, and this point corresponds to a recall of something like 55 percent, meaning that at that threshold we recover 55 percent of the true motorbikes. As we make the threshold more lenient, we are going to get more false positives, but we will manage to detect more of the true motorbikes. So as this curve goes down, this particular detector manages to detect something like 70 to 80 percent of the true motorcycles. The curves in this figure correspond to different algorithms, and the way we compare different algorithms is by measuring the area under the curve. In the ideal case that area would be 100 percent; in fact it is something like 50 to 60 percent for these algorithms. That area is what we call AP, or average precision, and it is how we compare different algorithms. Here is the precision-recall curve for a different category, namely person detection, and the different curves again correspond to different algorithms: this algorithm is probably not a good one, this algorithm is a better one. And notice that in both examples we are not able to detect all the people; if you look at the roughly 30 percent of the people who are not detected by any approach, usually there is heavy occlusion or an unusual pose, and so on. So there are phenomena that make life difficult for us.
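Since the 50% overlap criterion and the area under the precision-recall curve are the heart of this evaluation, here is a minimal sketch of both. It is the uninterpolated form; the official PASCAL protocol differs in details (early years used 11-point interpolated precision):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def average_precision(is_true_positive, scores, n_ground_truth):
    """Area under the precision-recall curve, sweeping the score threshold.

    is_true_positive[i] says whether detection i matched a previously
    unmatched ground-truth box with iou >= 0.5 (the PASCAL criterion)."""
    order = np.argsort(-np.asarray(scores))          # most confident first
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1.0)  # fraction of detections correct
    recall = cum_tp / float(n_ground_truth)          # fraction of true objects found
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)                       # step integration over recall
        prev_r = r
    return ap
```

Sweeping the threshold from strict to lenient traces the curve from high precision and low recall toward the opposite corner, exactly as described above.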
Finally, the PASCAL VOC people have computed the average precision for every class, and they give two measures. Max means the best algorithm for that category: the max for motorbike is something like 58%, meaning that the best algorithm for detecting motorbikes has an average precision of 58%. The median is, of course, the median over the different algorithms that were submitted. So we conclude that some categories are easier than others. Motorbikes are probably the easiest; their average precision is 58%. And something like potted plant is really hard to detect; the average precision there is sixteen percent. So, if you want to ask how well we are doing: the categories where the best average precision is over 50 percent are motorbike, bicycle, bus, airplane, horse, car, cat, and train. So, about 50%. You may like that or not, in the sense that this is a case of the glass being half full or half empty; since it's about 50%, maybe you can call it both. Let's look a little bit at some of the difficult categories. Here is the category of boat, and the average precision here is about [inaudible]. If you look at this set of examples, you will see why this is so hard: there is so much variation in appearance from one boat to another that it's really difficult to train a detector to manage all these cases. Okay, an even more difficult example: chairs. Here we are supposed to mark bounding boxes corresponding to the chairs, and here they are. Now imagine you're looking for a HOG template which is going to detect the characteristic edges corresponding to a chair. You can readily see that there is no hope of managing that. Probably, the way humans detect chairs is by making use of the fact that there's a human sitting on one in a certain pose, so there is a lot of contextual information which currently is not being captured by the algorithms. I'll turn to images of people now. Analyzing images of people is very important: it enables us to build human-computer interaction systems, it enables us to analyze video, recognize actions, and so on and so forth. It's made hard by the fact that people appear in a variety of poses and a variety of clothing, and can be occluded, can be small, can be large, and so on. So this is a really challenging category, even though it's perhaps the most important category for object recognition.
So, I'm going to show you some research from an approach which is based on poselets, the other part-based paradigm that I referred to. The big idea is that we can build on the success of face detectors and pedestrian detectors. Face detection, we know, works well, and so does pedestrian detection when you're talking about a vertical, standing or walking pedestrian. Essentially, both of these rely on pattern matching, and they capture patterns that are common and visually characteristic. But those are not the only common and characteristic patterns. For instance, we can have a pattern corresponding to a pair of legs, and if we can detect that, we are sure that we are looking at a person. Or we can have a pattern which doesn't correspond to a single anatomical part, say half of a face together with half of the torso and the shoulder; that is fine, it is still a pretty characteristic observation of a person. Now, the way we train face detectors, of course, is that we have images where all the faces have been marked out, so the faces can be used directly as positive examples for a machine learning algorithm. But how are we going to find all these configurations corresponding to legs, and face plus shoulder, and so on? The poselet idea is exactly to train such detectors, but without having to determine the configurations in advance. First, let me show you some examples of what poselets are. The name is "pose" plus the diminutive "-let": a poselet is a detector tuned to a small part of the human pose. So the top row here corresponds to a face, upper body, and hands in a certain configuration; the second row corresponds to two legs; the third row corresponds to the back view of a person. In fact, we can build a pretty long list of these poselets (a sketch of how the training examples for one poselet can be gathered follows below).
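To give a feel for how the keypoint annotations make this possible, here is a rough sketch of gathering training examples for one poselet: given a seed patch, rank all annotated candidate patches by how well their keypoint configurations align under a similarity transform, and keep the closest ones as positives. The alignment method and both function names are my assumptions for illustration; the actual procedure in the poselets papers differs in its details:

```python
import numpy as np

def keypoint_distance(seed_kp, cand_kp):
    """Dissimilarity between two keypoint configurations, each an (N, 2)
    array of (x, y) positions inside its patch, with NaN rows for
    keypoints that are not visible. Configurations are compared after a
    least-squares similarity alignment (Procrustes-style)."""
    vis = ~(np.isnan(seed_kp[:, 0]) | np.isnan(cand_kp[:, 0]))
    if vis.sum() < 3:
        return np.inf                      # too few shared keypoints to compare
    a, b = seed_kp[vis], cand_kp[vis]
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(b.T @ a)      # optimal rotation via SVD
    scale = s.sum() / (b ** 2).sum()       # optimal isotropic scale
    b_aligned = scale * (b @ (u @ vt))
    return np.sqrt(((a - b_aligned) ** 2).sum() / vis.sum())

def collect_poselet_examples(seed_kp, all_kps, k=200):
    """Indices of the k candidate patches whose keypoint configurations
    are most similar to the seed; these become the positive training
    examples for one poselet classifier (e.g. HOG + linear SVM)."""
    d = np.array([keypoint_distance(seed_kp, kp) for kp in all_kps])
    return np.argsort(d)[:k]
```

The point is that the annotation, not a human designer, decides which recurring pose fragments become detectors.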
Now, the value of these poselets is that they enable us to do later tasks more easily. For example, we can train gender classifiers. Say we want to distinguish men from women: that can be done from the face, also from the back view of a person, and from the legs, because the clothing worn by many women is different. So once we have this machinery for training poselet detectors, we can actually train two versions of each poselet detector, one for male faces and one for female faces, and we can do that for every poselet. Essentially this gives us a handle on how to come up with finer-grained classifications of people. I'm going to show you some results here; these are actual results from this approach. The top row is where the algorithm thinks there are men, and the bottom row is where it thinks there are women. There are some mistakes: for example, these are really women, and so are these. So there are some mistakes, but it's surprisingly good. Here is what the detector thinks are people wearing long pants, in the top row, and people not wearing long pants, in the bottom row. Notice that once we can start to do this, we gain the ability to describe people: in an image, I want to be able to say that this is a tall, blond man wearing green trousers. Here, in the top row, is what the algorithm thinks are people wearing hats, and in the bottom row, people not wearing hats. This approach applies to detecting actions as well. Here are actions as revealed in still images, so you just have a single frame: this image corresponds to a sitting person, here is a person talking on the telephone, a person riding a horse, a person running, and so on. Again, the poselet paradigm can be adapted to this framework, and we can, for example, train poselets corresponding to phoning people, running people, walking people, and people riding horses. I should note that the problem of detecting actions is a much more general problem, and we obviously don't want to just make use of the static information. If we have video, we can compute optical flow vectors, and that gives us an extra handle on the problem. The kinds of actions we want to be able to recognize include movement and posture change, object manipulation, conversational gesture, sign language, etc. If you want, you can think of objects as the nouns of English and actions as the verbs. And it turns out that some of the techniques that have been applied to object recognition carry over to this domain. Techniques such as bags of spatio-temporal words, which are generalizations of SIFT features to video, turn out to be quite useful and give some of the best results on action recognition tasks.
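Here is a minimal sketch of a bag-of-words pipeline in that spirit: local descriptors (for video, spatio-temporal ones) are quantized against a learned vocabulary, each clip becomes a word histogram, and a classifier is trained on the histograms. The use of scikit-learn's KMeans and LinearSVC is my choice for the sketch, not the tooling of any particular paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bag_of_words_histograms(descriptor_sets, vocab_size=500, kmeans=None):
    """Quantize local descriptors against a visual vocabulary and return
    one L1-normalized histogram per clip.

    descriptor_sets : list of (n_i, D) arrays, one per video clip
    kmeans          : pass a fitted model to reuse the training vocabulary
    """
    if kmeans is None:
        # learn the vocabulary by clustering all training descriptors
        kmeans = KMeans(n_clusters=vocab_size, n_init=4, random_state=0)
        kmeans.fit(np.vstack(descriptor_sets))
    hists = []
    for d in descriptor_sets:
        words = kmeans.predict(d)                    # assign each descriptor a word
        h = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
        hists.append(h / (h.sum() + 1e-9))           # normalize away clip length
    return np.array(hists), kmeans

# Hypothetical usage, with train_desc / train_labels assumed given:
# hists, vocab = bag_of_words_histograms(train_desc)
# clf = LinearSVC().fit(hists, train_labels)
# test_hists, _ = bag_of_words_histograms(test_desc, kmeans=vocab)
# predictions = clf.predict(test_hists)
```

Discarding the spatial and temporal layout of the words is what makes the representation robust, at the cost of ignoring structure, the same trade-off bags of visual words make for object recognition.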
Let me conclude here. I think our community has made a lot of progress on object recognition, action recognition, and so on, but a lot remains to be done. There is this phrase that people in the multimedia information systems community use: the so-called semantic gap. Their point is that images and videos are typically represented as pixels, pixel brightness values, pixel RGB values, and so on, whereas what we are really interested in is the semantic content. What are the objects in the scene? What scene is it? What are the events taking place? That is the level at which we would like to operate, and we're not there yet; we are nowhere near human performance. But I think we have made significant progress, and more will continue to happen over the next few years. Thank you.