1
00:00:08,334 --> 00:00:09,417
Hi.

2
00:00:10,562 --> 00:00:13,951
We are living in an exciting era,

3
00:00:13,952 --> 00:00:19,281
where innovation and technology have the potential to do the unimaginable,

4
00:00:19,282 --> 00:00:22,559
and it becomes even more unimaginable

5
00:00:22,560 --> 00:00:26,560
when it breaks down the gaps between disability and ability.

6
00:00:28,345 --> 00:00:31,325
15% of the world population

7
00:00:32,564 --> 00:00:35,324
- 1 billion people around the world -

8
00:00:35,325 --> 00:00:37,184
live with disabilities,

9
00:00:37,185 --> 00:00:41,668
which makes people with disabilities the largest minority in the world.

10
00:00:42,605 --> 00:00:45,264
And they are not living on a different planet.

11
00:00:45,265 --> 00:00:50,145
They may be part of our families, friends, or colleagues.

12
00:00:51,426 --> 00:00:55,985
Today, I'm going to tell you how people with speech disabilities

13
00:00:55,986 --> 00:00:59,366
will have a way to better communicate.

14
00:00:59,375 --> 00:01:03,233
I was 7 years old when my sister Amal was born.

15
00:01:03,234 --> 00:01:05,893
I was too young to see the challenges

16
00:01:05,894 --> 00:01:09,463
that my family was facing on a daily basis,

17
00:01:09,464 --> 00:01:13,813
but I could see that Amal couldn't crawl, or eat, or talk

18
00:01:13,814 --> 00:01:16,913
like any other baby her age.

19
00:01:16,914 --> 00:01:22,063
But with time, we adjusted to raising a baby with cerebral palsy,

20
00:01:22,064 --> 00:01:26,492
while understanding her special communication patterns and needs.

21
00:01:28,406 --> 00:01:29,845
Nine years later,

22
00:01:29,846 --> 00:01:33,469
my family was blessed to have another baby, Ahmad.
23
00:01:34,469 --> 00:01:38,288
Ahmad decided to grow up exactly like his sister Amal,

24
00:01:38,289 --> 00:01:42,838
being so smart, so sharp, curious about everything around him,

25
00:01:42,839 --> 00:01:47,208
but he also decided to invent his own special communication patterns

26
00:01:47,209 --> 00:01:48,809
to communicate with us,

27
00:01:49,782 --> 00:01:53,081
and for the other people who couldn't understand him,

28
00:01:53,082 --> 00:01:55,208
we had to translate.

29
00:01:55,209 --> 00:01:59,626
Amal and Ahmad say "num" when they are hungry,

30
00:01:59,659 --> 00:02:04,528
and they say "ahh" to call Nora, my sister.

31
00:02:04,542 --> 00:02:08,833
And when they want to call my name, they say "abeya".

32
00:02:08,834 --> 00:02:12,585
When they want to go to the bathroom, they say "kkhh".

33
00:02:13,366 --> 00:02:16,945
We understand most of their special communication patterns,

34
00:02:16,946 --> 00:02:20,546
but it's only us, the close circle.

35
00:02:20,551 --> 00:02:25,131
And this is the case for most people who have an unclear voice.

36
00:02:26,292 --> 00:02:29,471
One of those people is Urit.

37
00:02:29,472 --> 00:02:33,691
Urit is a 34-year-old woman with cerebral palsy.

38
00:02:33,692 --> 00:02:35,946
She is living an independent life.

39
00:02:35,947 --> 00:02:41,003
She can drive her car, go to the gym, and do a lot of other things.

40
00:02:42,917 --> 00:02:47,656
However, when it comes to communicating using her voice,

41
00:02:47,657 --> 00:02:50,912
sometimes, it can become harder than going to the gym,

42
00:02:50,913 --> 00:02:53,122
and more frustrating,

43
00:02:53,123 --> 00:02:58,542
because she finds herself repeating the same words again and again

44
00:02:58,543 --> 00:03:01,067
in order to be understood.

45
00:03:01,068 --> 00:03:04,738
We asked Urit to say a few words in English.
46
00:03:06,370 --> 00:03:08,199
Let's listen to her together

47
00:03:08,200 --> 00:03:10,790
and see if you can understand what she's trying to say.

48
00:03:11,856 --> 00:03:13,946
(unclear speech)

49
00:03:17,481 --> 00:03:21,861
I don't know how many of you could understand her the first time,

50
00:03:21,862 --> 00:03:23,471
but let's listen to her again,

51
00:03:23,472 --> 00:03:27,521
and really focus and try to understand what she's trying to say.

52
00:03:27,522 --> 00:03:29,488
(unclear speech)

53
00:03:33,251 --> 00:03:37,491
Try to memorize what she has just said; we'll get to that later.

54
00:03:38,664 --> 00:03:41,883
With my siblings, and Urit, and other people I have gotten to know,

55
00:03:41,884 --> 00:03:46,443
I had the chance to see a world full of challenges,

56
00:03:46,444 --> 00:03:49,454
- a world of people with special needs.

57
00:03:50,353 --> 00:03:53,772
And this allowed me to examine the existing technology

58
00:03:53,773 --> 00:03:57,635
in search of an answer to what my siblings were seeking.

59
00:03:58,542 --> 00:04:02,334
Unfortunately, the current state-of-the-art assistive technology,

60
00:04:02,335 --> 00:04:07,258
including speech recognition applications, could not provide an answer.

61
00:04:08,485 --> 00:04:13,534
So far, all assistive technology has completely bypassed the voice,

62
00:04:13,535 --> 00:04:17,411
opting to use other modes of communication,

63
00:04:18,362 --> 00:04:22,361
[by] replacing the voice with symbols and images,

64
00:04:22,362 --> 00:04:26,222
or movements of the head or the eyes.

65
00:04:27,356 --> 00:04:31,806
This brings me to the other lightweight alternative that does use the voice,

66
00:04:32,695 --> 00:04:35,844
which is speech recognition applications.

67
00:04:35,845 --> 00:04:39,395
This technology takes two approaches.

68
00:04:40,281 --> 00:04:44,401
The first approach attempts to discover which word has been said.
69
00:04:46,013 --> 00:04:49,302
The second approach relies on phonemes.

70
00:04:49,303 --> 00:04:54,443
Phonemes are all the sounds we produce using our mouth and nose.

71
00:04:55,618 --> 00:04:59,806
Both approaches rely on statistical models

72
00:04:59,807 --> 00:05:03,136
built from a large database of standard speech.

73
00:05:03,137 --> 00:05:05,959
But once the speech is not standard

74
00:05:05,960 --> 00:05:09,659
- when I say not standard, I mean it's enough to have an accent,

75
00:05:09,660 --> 00:05:11,739
like most of us here -

76
00:05:11,740 --> 00:05:13,590
this will not work.

77
00:05:14,444 --> 00:05:19,593
My colleagues and I developed a new approach to assistive technology

78
00:05:19,594 --> 00:05:22,355
that does use the person's own voice

79
00:05:22,356 --> 00:05:26,175
and can understand non-standard speech patterns,

80
00:05:26,176 --> 00:05:31,506
with the mission to give people with a speech disability their voice back.

81
00:05:32,858 --> 00:05:36,407
So, whose life is this going to change?

82
00:05:36,408 --> 00:05:39,166
People with cerebral palsy,

83
00:05:39,167 --> 00:05:41,959
people with Parkinson's and myasthenia gravis,

84
00:05:41,972 --> 00:05:44,347
so many [other] neurological disorders,

85
00:05:44,348 --> 00:05:46,637
people who are born with hearing disabilities,

86
00:05:46,638 --> 00:05:51,717
or people who suddenly have a stroke and their whole life is changed,

87
00:05:51,718 --> 00:05:54,569
but not only theirs.

88
00:05:54,570 --> 00:05:58,803
Not only the people who have difficulty expressing themselves,

89
00:05:58,804 --> 00:06:03,473
but everyone who interacts with them on a daily basis.

90
00:06:03,474 --> 00:06:08,547
This will make it easier for them to be socially included

91
00:06:08,548 --> 00:06:13,195
- because every one of us wants to be socially included.

92
00:06:13,196 --> 00:06:17,508
And now, you may be asking yourself, "How does it work?"
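[Editor's note: the limitation described earlier — statistical models fitted to a database of standard speech — can be sketched with a toy example. All numbers, phoneme symbols, and model values below are invented for illustration; this is not a real recognizer.]

```python
# Toy sketch: a conventional recognizer scores acoustic frames against
# per-phoneme Gaussian models estimated from *standard* speech. Speech
# that deviates from those distributions scores poorly, so the word is
# misrecognized or rejected. (All values are hypothetical.)
import math

def gaussian_logpdf(x, mean, std):
    """Log-likelihood of one acoustic frame under a phoneme model."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

# Hypothetical phoneme models (mean, std) fitted to standard speech.
standard_model = {"t": (1.0, 0.2), "uw": (3.0, 0.2)}

def score(frames, phonemes, model):
    """Average log-likelihood of the frames under a phoneme sequence."""
    return sum(gaussian_logpdf(f, *model[p])
               for f, p in zip(frames, phonemes)) / len(frames)

standard_two     = [1.1, 2.9]   # pronunciation close to the model
non_standard_two = [1.8, 2.1]   # same word, atypical pronunciation

print(score(standard_two, ["t", "uw"], standard_model))      # relatively high
print(score(non_standard_two, ["t", "uw"], standard_model))  # much lower
```

The atypical pronunciation is not wrong, it is simply outside the distribution the model was trained on — which is why even an accent can be enough to break recognition.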
93
00:06:17,509 --> 00:06:22,078
"How come the current speech recognition technology couldn't do the same?"

94
00:06:24,978 --> 00:06:27,598
Because our technology works in a different way.

95
00:06:28,808 --> 00:06:32,217
So, each person has to go through two phases.

96
00:06:32,218 --> 00:06:35,357
The first phase is called the calibration phase,

97
00:06:35,358 --> 00:06:41,047
where the person has to teach the device and the application his own patterns

98
00:06:41,048 --> 00:06:44,227
by entering the patterns and building his own dictionary.

99
00:06:44,228 --> 00:06:45,920
This phase usually happens

100
00:06:45,921 --> 00:06:48,920
with the person who understands him the most.

101
00:06:48,921 --> 00:06:51,090
Together they will build the dictionary.

102
00:06:51,091 --> 00:06:55,340
This generally takes only one to three hours,

103
00:06:55,341 --> 00:06:58,280
and it depends on the speaking capability of the speaker.

104
00:06:58,281 --> 00:07:00,022
After building the dictionary,

105
00:07:00,023 --> 00:07:03,642
we move to the second phase, which is the recognition phase.

106
00:07:03,643 --> 00:07:07,628
The application will be able to recognize unintelligible speech patterns

107
00:07:07,629 --> 00:07:10,828
from the dictionary that is already built

108
00:07:10,829 --> 00:07:14,369
and translate them into a clear voice in real time.

109
00:07:15,660 --> 00:07:19,819
Our approach is user-dependent and language-independent,

110
00:07:19,820 --> 00:07:23,470
which means it can work in any language in the world,

111
00:07:24,347 --> 00:07:26,476
even the invented ones.

112
00:07:26,477 --> 00:07:29,726
And the key word here is 'pattern-matching'.

113
00:07:29,727 --> 00:07:35,016
Once the person builds his own dictionary and says a word that already exists there,

114
00:07:35,017 --> 00:07:36,652
there will be a pattern-matching

115
00:07:36,653 --> 00:07:39,832
between what he says and what already exists.
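[Editor's note: the two phases described above — calibration, where personal templates are recorded into a dictionary, and recognition, where new utterances are matched against it — can be sketched with a dynamic-time-warping (DTW) template matcher. This is an illustrative reconstruction, not Talkitt's actual algorithm; the feature sequences and dictionary entries are invented.]

```python
# Sketch of user-dependent pattern matching: each dictionary entry holds
# a feature sequence recorded during calibration; recognition returns
# the entry whose template is closest under dynamic time warping (DTW),
# which tolerates stretching or compressing the utterance in time.

def dtw_distance(a, b):
    """DTW distance between two 1-D feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch template
                                 cost[i][j - 1],      # stretch utterance
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

def recognize(utterance, dictionary):
    """Return the word whose calibration template is nearest."""
    return min(dictionary, key=lambda w: dtw_distance(utterance, dictionary[w]))

# Calibration phase: templates entered with a close family member
# (hypothetical feature values standing in for real acoustic features).
dictionary = {
    "hungry":   [0.1, 0.9, 0.8, 0.2],   # the pattern "num"
    "bathroom": [0.7, 0.7, 0.1, 0.1],   # the pattern "kkhh"
}

# Recognition phase: a new, slightly stretched utterance of "num".
print(recognize([0.1, 0.8, 0.9, 0.85, 0.2], dictionary))  # prints "hungry"
```

Because matching is against the speaker's own templates rather than a population-level model, the same mechanism works in any language, including invented patterns like "num" or "kkhh".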
116
00:07:39,833 --> 00:07:41,852
But here we found a problem.

117
00:07:41,853 --> 00:07:44,921
We found that people with a speech disability

118
00:07:44,922 --> 00:07:48,012
pronounce different words in similar ways.

119
00:07:49,652 --> 00:07:53,601
And the challenge was to differentiate between them.

120
00:07:53,602 --> 00:07:57,314
So we created a technology called Adaptive Framing.

121
00:07:58,255 --> 00:08:03,825
Adaptive Framing adapts the frame to the width of each event in the pattern.

122
00:08:03,834 --> 00:08:09,543
In the existing technology, you can see the L and the A in the same frame.

123
00:08:10,402 --> 00:08:15,011
But in our new technology, you can see that the L and the A are in different frames,

124
00:08:15,012 --> 00:08:18,042
which increases the accuracy of the pattern-matching.

125
00:08:18,844 --> 00:08:22,454
And this makes our pattern-matching so much better.

126
00:08:23,463 --> 00:08:26,352
I suppose you still remember Urit, right?

127
00:08:26,353 --> 00:08:30,523
Let's listen to her again now, but this time using Talkitt:

128
00:08:33,520 --> 00:08:34,568
(unclear speech)

129
00:08:34,570 --> 00:08:36,042
Now I can ...

130
00:08:36,043 --> 00:08:37,373
(unclear speech)

131
00:08:37,374 --> 00:08:38,374
... start

132
00:08:38,375 --> 00:08:39,881
(unclear speech)

133
00:08:39,881 --> 00:08:41,342
... speaking freely.

134
00:08:42,982 --> 00:08:44,542
(Applause)

135
00:08:55,552 --> 00:08:57,906
Talkitt is only one step

136
00:08:57,907 --> 00:09:02,026
towards bridging the gap between disability and ability

137
00:09:02,027 --> 00:09:04,946
by letting people express their potential.

138
00:09:04,947 --> 00:09:07,085
The more we challenge our minds,

139
00:09:07,086 --> 00:09:11,512
the more gaps will collapse to let us all have a normal life.

140
00:09:11,513 --> 00:09:12,622
Thank you.

141
00:09:12,623 --> 00:09:13,653
(Applause)