Welcome to my talk. Thanks for the kind introduction and for the warm welcome from all of you!
As you can see, the talk has the allusive title "Surveillance and Language",
which obviously alludes to Foucault's "Surveillance and Punish" (published in English as "Discipline and Punish").
However, long before Foucault described the genesis of the disciplinary society,
there was a lovely moral tale in a children's book,
called "The Kid in the Glass House", by Heinrich Oswalt, written in 1877 and remarkably prescient:
In Frankfurt lives a glazier master,
Mr. Lebrecht Scheibenmann his name;
He had a little daughter,
Who never wanted to be washed.
And when Gretchen came with sponge and soap,
The bad girl ran away;
She even knocked over the washstand -
The water flooded the house.
So Mr. Lebrecht Scheibenmann began
to build a strange house,
A house made only of glass, that, alas!
Was transparent throughout.
And in this glass house
the bad daughter was then seated.
So that, in order to see,
People stopped on the street.
So the kid was ashamed and ran around
In the entire house and screamed:
"Where can I hide?
You can see me from everywhere!
The roof, the cellar, every room
Is made of glass! you can always see me!"
The mother said: "My dear child!
There is a quick fix to that:
If people see you behaving decently,
They will walk on by;
[...]
The daughter remembered that;
And tried to be seemly.
And because she no longer screamed while washing,
Other people no longer laughed;
Since everyone who peeks into the house
Sees a kid that's very seemly.
And if you have your own child, you people,
That always screams while washing,
Just tell it to Mr. Lebrecht Scheibenmann,
He will deliver you a glass house immediately.
Yes, there... tentative attempts at applause. laughter
Applause
Yes, an interesting story, and certainly fitting for our times,
because today Lebrecht Scheibenmann is called Keith Alexander and works for the NSA.
The NSA has made glass houses out of all our homes;
we can all be seen in these glass houses.
We don't know for sure, but I'm quite certain that educational purposes are being pursued here:
to teach us that certain actions are no longer acceptable
and to make us internalize this observation.
In this observation, language obviously plays a very important role:
much of what we express is expressed in the medium of language.
This has also given hackers the idea of tricking the NSA with a site like "Hello NSA",
a website that assembles suspicious words into messages, like a "bullshitter".
These messages are then tweeted, mailed or posted in chats
to achieve something like "Operation Troll the NSA":
the idea is to jam the NSA's scanners with a kind of DDoS attack,
simply by sending too much content that looks suspicious on the basis of keywords.
The point of my presentation is to show that this image of the NSA is wrong.
We cannot assume that people at the NSA really print something out
as soon as a keyword shows up laughter and then start to analyse everything,
look at it more closely and do a qualitative evaluation.
That would certainly be a very labour-intensive task,
and therefore a keyword-spam DDoS would certainly be ineffective.
You have probably all read the NSA's Thanksgiving talking points.
I don't know if you noticed, but under the fourth point there is something utterly important:
"NSA brings together the best linguists, analysts, mathematicians, engineers and computer scientists
in the United States."
and the linguists are named first.
slight laughter
So you can see that the NSA is definitely aware that language is an important medium,
and one that is very important to them. So it surely makes sense to deal with it.
As it happens, the Secretary of the Interior has leaked the most recent analysis software, the "Advanced Security Toolkit",
developed by the Von-Leitner-Institute for Distributed Realtime Java. laughter
First, we'll look at today's mission.
Today's task is to check out the German blogosphere,
which seems to have been radicalizing since the grand coalition took over the government.
It's important to check whether actions are being prepared and, if necessary, to identify radical subjects
who are especially conspicuous. To start, we choose our targets; of course, some are suggested to us.
Unfortunately I can only present a small selection of possible targets. I would have loved to take more
There are a few socio-critical blogs and news sites
like blog.fefe.de, Indymedia, Mädchenmannschaft, Netzpolitik.org, rebellmarkt.blogger.de
and religiously motivated websites like kreuz.net, the islambruderschaft.com blog and a Salafist discussion board,
and of course we confirm the selection. This is a very sensitive selection
The following analyses are possible. Naturally, I can only show a selection of possible analytic tools today
I wish I could show lots more, but there won't be enough time.
First we'll look at what authors write about possible sensitive targets,
meaning we'll run a target analysis.
On the basis of Named Entity Recognition, it examines the collocations of possible terror targets.
We have to... what is this? ...let's have a look in the manual, what Named Entities are
since it is our first day today
First of all, Named Entities are expressions which distinguish one entity clearly from other entities with similar attributes
Names come to mind first, but it's not trivial to say what a name is.
Accordingly, Named Entity Recognition is the procedure with which one identifies such Named Entities
There are of course different classes of Named Entities, e.g. people, organisations, places.
Sometimes it's not entirely clear which class a Named Entity belongs to, e.g. "der Bundestag" (the lower house of the German parliament):
this can be a geographical place as well as an organisation.
Now we still need to know what collocations are
They are word combinations that occur together more often than chance would predict.
So "we define a collocation as a combination of two words that exhibit a tendency to occur near each other in natural language, that is, to cooccur",
like "take a road", "go down a road".
Those are typical connections between the word "road" and "go down" or "take",
and these connections form collocations if they occur more often than chance would predict,
as we can determine with statistical tests,
and we can observe them in natural language.
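The talk does not say which statistical test is meant. As a minimal sketch, assuming the log-likelihood ratio (a common choice in corpus linguistics), the check for "more often than chance" could look like this. The function name and the counts are made up for illustration.

```python
import math

def log_likelihood(o11, o12, o21, o22):
    """Log-likelihood ratio (G2) for a 2x2 table of co-occurrence counts.

    o11: node word and candidate word occur together
    o12: node word occurs without the candidate
    o21: candidate occurs without the node word
    o22: neither of the two occurs
    """
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect if the two words were independent.
            expected = rows[i] * cols[j] / n
            if observed[i][j] > 0:
                total += observed[i][j] * math.log(observed[i][j] / expected)
    return 2 * total

# Hypothetical counts, not taken from the talk.
g2 = log_likelihood(30, 10, 10, 950)
# 3.84 is the chi-square critical value for p = 0.05 with one degree of freedom.
is_collocation = g2 > 3.84
```

A table in which the two words are independent, for example `log_likelihood(10, 10, 10, 10)`, yields a score of zero. Note that the score also grows when two words avoid each other, so a real analysis would additionally check that the observed joint count exceeds the expected one.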
One example (you don't need to read all of it now): I wanted to show the word "Spezialexperte" ("special expert").
You can see the "keyword in context" view here, with the requested keyword,
and you can see the contexts of this word; apparently they haven't found a "chosen special expert for internet issues".
We won't have to play a guessing game about which blog this comes from.
What you do for a collocation analysis is examine contexts,
e.g. here five words to the left and five words to the right, up to the beginning or end of the sentence.
You just count the words that are in the blue area
and compare their relative frequency with that of the words to the left and right in the white area.
If a word appears significantly more frequently in the blue area, you can say it is a collocation of the word "Spezialexperte".
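The counting step described here can be sketched roughly as follows. This is an illustration under assumptions: the "white area" is taken to be every token outside the window, the sentences are invented, and the function names are hypothetical.

```python
from collections import Counter

def window_counts(sentences, node, span=5):
    """Count tokens inside and outside a +/- span window around the node word.

    Windows never cross sentence boundaries because each sentence is
    processed on its own.
    """
    inside, outside = Counter(), Counter()
    for tokens in sentences:
        near = set()
        for i, tok in enumerate(tokens):
            if tok == node:
                near.update(range(max(0, i - span), min(len(tokens), i + span + 1)))
        for i, tok in enumerate(tokens):
            if tok == node:
                continue  # the node word itself is not a collocate candidate
            (inside if i in near else outside)[tok] += 1
    return inside, outside

def relative_frequencies(word, inside, outside):
    """Relative frequency of a word inside and outside the window."""
    p_in = inside[word] / max(1, sum(inside.values()))
    p_out = outside[word] / max(1, sum(outside.values()))
    return p_in, p_out

# Invented toy sentences, already tokenized.
sentences = [
    ["the", "Adobe", "Spezialexperte", "will", "get", "it", "fixed"],
    ["today", "it", "rains", "in", "Berlin"],
]
inside, outside = window_counts(sentences, "Spezialexperte", span=2)
adobe_in, adobe_out = relative_frequencies("Adobe", inside, outside)
```

With real data, the two relative frequencies would then be fed into a significance test rather than compared by eye.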
What is striking here for example is "kriegen" or "Adobe-Spezialexperten" laughter
You can visualize collocations as graphs. laughter
The nodes denote lexemes (I'm not sure what there is to laugh about,
that's serious linguistics!) and the edges denote "is collocation of".
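Such a graph can be held in a plain adjacency structure. The sketch below assumes that "is collocation of" is treated as a symmetric relation, which the talk does not state; the edges are illustrative only and the function names are hypothetical.

```python
from collections import defaultdict

def build_collocation_graph(pairs):
    """Build an undirected graph: nodes are lexemes, edges mean 'is collocation of'."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def neighbours_within(graph, start, depth):
    """Return all lexemes reachable from start in at most depth steps."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        frontier = {n for node in frontier for n in graph.get(node, ()) if n not in seen}
        seen |= frontier
    return seen - {start}

# Illustrative edges only; the real graph comes from the significance test.
graph = build_collocation_graph([
    ("Spezialexperte", "Adobe"),
    ("Spezialexperte", "kriegen"),
    ("Adobe", "Backup"),
])
direct = neighbours_within(graph, "Spezialexperte", 1)
two_steps = neighbours_within(graph, "Spezialexperte", 2)
```

Increasing `depth` corresponds to letting the displayed graph grow outward from the queried word.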
So here you see "the best of the best, sir", Sarrazin and Mehdorn belong there.
It branches out a little further. "Adobe-Backup", "Backup-Spezialexperten" … interesting
Ok. Now we are in the area of the target analysis. Let's start the analysis.
What are we doing there? We are recognizing all Named Entities in all corpora.
We first compute them with machine learning methods,
meaning we examine certain contexts in which the Named Entities occur.
We have a training corpus in which the Named Entities are already annotated,
e.g. that "Bundestag" is an organisation, and the software learns from these contexts
what typical contexts for such Named Entities are and tries to apply them to new corpora.
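To make the idea of learning from contexts concrete, here is a deliberately tiny sketch. It is not the method used by the software in the talk, which is not specified. It assumes that the only features are the words directly before and after a mention, and the training sentences are invented.

```python
from collections import Counter, defaultdict

def context_features(tokens, index):
    """Use the lowercased previous and next token as context features."""
    prev_tok = tokens[index - 1].lower() if index > 0 else "<s>"
    next_tok = tokens[index + 1].lower() if index + 1 < len(tokens) else "</s>"
    return [("prev", prev_tok), ("next", next_tok)]

def train(examples):
    """Count how often each context feature occurs with each entity class."""
    model = defaultdict(Counter)
    for tokens, index, label in examples:
        for feature in context_features(tokens, index):
            model[feature][label] += 1
    return model

def classify(model, tokens, index, default="OTHER"):
    """Let the context features of an unseen mention vote for a class."""
    votes = Counter()
    for feature in context_features(tokens, index):
        votes.update(model.get(feature, {}))
    return votes.most_common(1)[0][0] if votes else default

# Invented annotated examples: (tokens, position of the entity, class).
training = [
    (["the", "Bundestag", "voted", "yesterday"], 1, "ORG"),
    (["the", "SPD", "voted", "no"], 1, "ORG"),
    (["in", "Berlin", "it", "rains"], 1, "LOC"),
]
model = train(training)
guess_org = classify(model, ["the", "Bundesrat", "voted", "today"], 1)
guess_loc = classify(model, ["in", "Frankfurt", "it", "snows"], 1)
```

Neither "Bundesrat" nor "Frankfurt" appears in the training data; the classes are guessed from the surrounding words alone, which is the point the talk is making. Production systems use far richer features and statistical sequence models.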
What we're doing here: in all corpora, in all the blogs we examine, we identify the Named Entities.
We sort these Named Entities into people, organisations, geographical locations and other,
and then we calculate the collocations of the relevant Named Entities,
e.g. "Angela Merkel" could be interesting, or something like that.
And then we also check whether the collocations contain any danger words,
meaning words that indicate terror plans or the like. Now we'll do that.
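The last step, checking collocations against a list of danger words, amounts to a set intersection per Named Entity. In the sketch below the danger-word list is hypothetical, the function name is made up, and the collocates are words quoted later in the talk; how the software derives its danger level from such hits is not described, so that part is left out.

```python
def flag_targets(collocations, danger_words):
    """Return each Named Entity whose collocates include danger words."""
    flagged = {}
    for entity, collocates in collocations.items():
        hits = sorted(set(collocates) & danger_words)
        if hits:
            flagged[entity] = hits
    return flagged

# Collocates as quoted in the talk; the danger-word list is an assumption.
collocations = {
    "SPD": ["betrayer party", "hang", "top candidate"],
    "Berlin": ["Slum", "Berliner Hipster"],
}
danger_words = {"hang", "bomb"}
flagged = flag_targets(collocations, danger_words)
```

A hit only says that a danger word co-occurs with the entity. Whether it signals a plan or merely an accusation still has to be judged from the context.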
The analysis seems to be finished, and the result is danger level 1 of 5, so it's not really dramatic.
The software suggests checking the danger level regarding Berlin,
as the potential target location of donalphonso, the blogger of Rebellmarkt.
A potential target of Fefe is the SPD (Social Democratic Party) laughter, and that of Mädchenmannschaft is "Kristina Schröder" (Minister of Family Affairs).
As an example, we have now received an order to see what bad things donalphonso writes about Berlin and whether he is planning something.
Now we can display collocation graphs or geo-collocations
This means that we have a map, and at the places donalphonso writes about, the corresponding collocations are shown.
For America, he writes about Boyd and culture, lone perpetrators, confused and "hate mail" and so on.
Germany, Central Europe, is in focus of course. It reaches down to Italy.
There you can also see what donalphonso writes about
We're approaching Berlin. There are too many collocations to evaluate
So we look at our collocation graph and look for references to terror that could take place
I'll read out some: "Berlin", "Slum", "Reichshauptslum" ("imperial capital slum"), "arm" ("poor"), "Transferleistung" ("welfare transfer"), "abscheulich" ("abominable"), "Berliner Hipster". laughter
While this may show quite a negative attitude towards the subject, it's not exactly suspicious of terror.
The other potential target was the organisation "SPD" for Fefe.
We'll look at the collocation graph. Fefe and the SPD. laughter, applause
Hey, "betrayer party", "fall-over party"; let's go back briefly.
In total, in the entire list, we really did find words such as
"hang", "force", "top candidate", "betrayer party", "fall-over party", "pest", "cholera". laughter, applause
If we look at the collocation graph, we can already see that those are accusations,
but Fefe is not planning to finish off the top candidate.
Let's continue with the ideology monitor. We'd want to take some measurements now...
It has been shown that the NSA has filed many software patents on algorithms for Named Entity Recognition.
There was quite a lot of research going on some time ago.
But at first you only find out what the interesting targets are and what is said about them.
You can certainly improve on that by measuring ideologies.
What we want to calculate now is the similarity of texts from blogs to certain ideologies.
We have the option of measuring extreme leftist, rightist or Islamist attitudes.
We do this by calculating typical collocations... for a certain corpus.
From this corpus we learn; it serves as our model for comparison.
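The transcript breaks off before the similarity measure itself is explained. Purely as an assumption, one simple way to compare a blog with such a reference model is the overlap between their sets of typical collocations, for example the Jaccard coefficient. The collocation pairs below are invented.

```python
def jaccard(model_collocations, blog_collocations):
    """Share of collocations that two sets have in common (0.0 to 1.0)."""
    a, b = set(model_collocations), set(blog_collocations)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented collocation pairs standing in for a reference corpus and a blog.
reference_model = {("class", "struggle"), ("state", "repression"), ("police", "violence")}
blog = {("class", "struggle"), ("tax", "return")}
similarity = jaccard(reference_model, blog)
```

Here one of four distinct collocations is shared, giving a similarity of 0.25. A weighted measure that takes collocation strength into account would be a natural refinement.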
subtitles created by c3subtitles.de