Welcome to my talk. Thanks for the kind introduction and the warm welcome from all of you!

As you can see, the talk has the allusive title "Surveillance and Language",

which obviously alludes to Foucault's "Surveillance and Punish" ("Discipline and Punish" in English).

However, long before Foucault presented the genesis of the disciplinary society,

you find a lovely moral tale in a children's book,

called "The Kid in the Glass House", by Heinrich Oswalt, written in 1877 and remarkably prescient:

<i>In Frankfurt lives a glazier master,
Mr. Lebrecht Scheibenmann his name;</i>

<i>He had a little daughter,
Who never wanted to be washed.</i>

<i>And Gretchen came with sponge and soap,
So the bad girl ran away;</i>

<i>It even flipped the washing table -
The water flooded the house.</i>

<i>So Mr. Lebrecht Scheibenmann began
to build a strange house,</i>

<i>A house made only of glass, that, alas!
Was transparent throughout.</i>

<i>And in this glass house
The bad daughter was then seated.</i>

<i>So that, in order to see,
People stopped on the street.</i>

<i>So the kid was ashamed and ran around
In the entire house and screamed:</i>

<i>"Where can I hide?
You can see me from everywhere!</i>

<i>The roof, the cellar, every room
Is made of glass! you can always see me!"</i>

<i>The mother said: "My dear child!
There is a quick fix to that:</i>

<i>If people see you decent
They will pass by;</i>

<i>[...]
The daughter remembered that;
And tried to be seemly.</i>

<i>And because it no longer screamed while washing,
Other people never laughed;</i>

<i>Since everyone who peeked into the house,
Sees a kid that's very seemly.</i>

<i>And if you have your own child, you people,
That always screams while washing,</i>

<i>Just tell it Mr. Lebrecht Scheibenmann,
He will deliver you a glass house immediately.</i>

Yes, there... tentative attempts at applause <i>laughs</i>
<i>Applause</i>

Yes, an interesting story, and certainly fitting for our times,

since today Lebrecht Scheibenmann is called Keith Alexander and works for the NSA.

The NSA has made glass houses out of all our homes

we can all be seen in these glass houses

and we can't know for certain, but I at least am quite sure that this serves educational purposes:

to teach us that certain actions are no longer acceptable

and to make us internalize this observation.

Regarding this observation, language obviously plays a very important role

Many of our statements take place in the medium of language

This has also given hackers the idea of tricking the NSA with a site like "Hello NSA",

a website that, like a "bullshitter", assembles suspicious words into messages,

which are then tweeted, mailed or sent via chat

to achieve something like "Operation Troll the NSA":

jamming the NSA's scanners and thereby carrying out a DDoS attack

simply by sending too much content that looks suspicious on the basis of keywords.

The point of my presentation is to show that this image of the NSA is wrong.

We cannot assume that people at the NSA really print something out

as soon as a keyword shows up <i>laughter</i> and then start to analyse everything,

look at it more closely and do a qualitative evaluation.

That would certainly be a very labour-intensive task, so they won't work that way,

and therefore a keyword-spam DDoS would certainly be ineffective.

You have probably all read the NSA's Thanksgiving talking points.

I don't know whether you noticed, but the fourth point contains something extremely important:

"NSA brings together the best linguists, analysts, mathematicians, engineers and computer scientists

in the United States."
and the linguists are named first.

<i>slight laughter</i>

So you can see that the NSA is definitely aware of language as an important medium,

one that is also very important to them. So it surely makes sense to look into this.

As it happens, the Secretary of the Interior has leaked the most recent analysis software, the "Advanced Security Toolkit",

developed by the Von-Leitner-Institute for Distributed Realtime Java. <i>laughter</i>

First, we'll look at today's mission.

Today's task is to check out the German blogosphere

which seems to have been radicalizing since the grand coalition took over the government.

It's important to check whether actions are being prepared and, if necessary, to identify radical subjects

who are especially conspicuous. To start, we choose our targets; of course, some are suggested to us.

Unfortunately I can only present a small selection of possible targets. I would have loved to take more

There are a few socio-critical blogs and news sites

like blog.fefe.de, Indymedia, Mädchenmannschaft, Netzpolitik.org, rebellmarkt.blogger.de

And religiously motivated websites like kreuz.net, islambruderschaft.com, and a Salafist blog and discussion board,

and of course we confirm the selection. This is a very sensitive selection

The following analyses are possible. Naturally, I can only show a selection of possible analytic tools today

I wish I could show lots more, but there won't be enough time.

First we'll look at what authors write about possibly sensitive targets.

That is, we'll run a target analysis.

Based on Named Entity Recognition, it examines the collocations of possible terror targets.

We have to... what is this? ...let's look up in the manual what Named Entities are,

since it is our first day today

First of all, Named Entities are expressions which distinguish one entity clearly from other entities with similar attributes

One immediately thinks of names, but it's not trivial to say what a name is.

Accordingly, Named Entity Recognition is the procedure with which one identifies such Named Entities

There are, of course, different classes of Named Entities, e.g. people, organisations, places.

Sometimes it's not very clear which class a Named Entity belongs to, e.g. "der Bundestag" (lower house of the German parliament):

this can be a geographical place as well as an organisation.

Now we still need to know what collocations are

They are word combinations that occur more frequently than chance would predict,

so "we define a collocation as a combination of two words that exhibit a tendency to occur near each other in natural language, that is, to cooccur",

like "take a road", "go down a road"

Those are typical connections between the words "road", "go down", or "take"

and these connections form collocations if they occur more often than chance would predict,

as we can determine with statistical tests,

and we can observe them in natural language

One example (you don't need to read all of it now): I wanted to show the word "Spezialexperte".

Here you can see the "keyword in context" view for the requested keyword,

and you can see the contexts of this word, so apparently they haven't found a "chosen special expert for internet issues".

We don't need to turn it into a quiz which blog this might come from.

What you do then, for a collocation analysis, is examine contexts,

e.g. here five words to the left and five words to the right, up to the beginning or end of the sentence.

You just count the words that are in the blue area

and compare their relative frequency with that of the words further to the left and right, in the white area.

If a word appears significantly more frequently in the blue area, you can say it is a collocation of the word "Spezialexperte".

What is striking here for example is "kriegen" or "Adobe-Spezialexperten" <i>laughter</i>
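
The window counting just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolkit from the slides: the function names, the toy corpus and the choice of the log-likelihood ratio as the significance test are all made up for this sketch (the talk only says "statistical tests"). The threshold 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import math
from collections import Counter

def window_counts(sentences, node, span=5):
    """Count words inside ("blue area") and outside ("white area") the window."""
    inside, outside = Counter(), Counter()
    for sent in sentences:
        hits = [i for i, w in enumerate(sent) if w == node]
        for i, w in enumerate(sent):
            if w == node:
                continue
            # Sentences are processed one by one, so a window never
            # crosses the beginning or end of a sentence.
            if any(abs(i - h) <= span for h in hits):
                inside[w] += 1
            else:
                outside[w] += 1
    return inside, outside

def log_likelihood(a, b, c, d):
    """G2 statistic for a 2x2 table: a = word in window, b = other words in
    window, c = word outside window, d = other words outside window."""
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    return 2 * sum(obs * math.log(obs * n / (row * col))
                   for obs, row, col in cells if obs > 0)

def collocates(sentences, node, span=5, threshold=3.84):
    """Words that are significantly MORE frequent near `node` than elsewhere."""
    inside, outside = window_counts(sentences, node, span)
    n_in, n_out = sum(inside.values()), sum(outside.values())
    result = {}
    for word, a in inside.items():
        c = outside[word]
        if n_out and a / n_in <= c / n_out:
            continue  # not over-represented inside the window
        score = log_likelihood(a, n_in - a, c, n_out - c)
        if score >= threshold:
            result[word] = score
    return result

# Toy corpus (invented): each sentence is a list of lowercased tokens.
corpus = ([["der", "adobe", "spezialexperte", "sagt", "nichts"]] * 10
          + [["das", "wetter", "ist", "heute", "gut"]] * 40)
print(sorted(collocates(corpus, "spezialexperte")))
```

On real blog text you would tokenize and split sentences first; the sketch assumes that has already happened.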

You can visualize collocations as graphs. <i>laughter</i>

The nodes denote lexemes (I'm not sure what there is to laugh about)

(that's serious linguistics!) and the edges denote "is collocation of".

So here you see "the best of the best, sir", Sarrazin and Mehdorn belong there.

It proliferates a little more. "Adobe-Backup", "Backup-Spezialexperten" … interesting
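
A collocation graph of this kind is easy to represent as an adjacency structure. The following Python sketch is an assumption-laden illustration: the helper names are hypothetical, the edges in the example are invented (only the words come from the slide), and treating "is collocation of" as an undirected relation is a simplification chosen here.

```python
from collections import defaultdict

def build_collocation_graph(collocations):
    """collocations: dict mapping a lexeme to an iterable of its collocates."""
    graph = defaultdict(set)
    for lexeme, partners in collocations.items():
        for partner in partners:
            # One undirected edge "is collocation of" per pair.
            graph[lexeme].add(partner)
            graph[partner].add(lexeme)
    return graph

def neighbours_within(graph, start, depth=2):
    """Lexemes reachable from `start` over at most `depth` edges."""
    seen, frontier = {start}, {start}
    for _ in range(depth):
        frontier = {n for node in frontier for n in graph.get(node, ())} - seen
        seen |= frontier
    return seen - {start}

# Invented toy edges, using words that appear on the slide.
graph = build_collocation_graph({
    "Spezialexperte": ["kriegen", "Adobe-Spezialexperten"],
    "Adobe-Spezialexperten": ["Adobe-Backup"],
})
print(sorted(neighbours_within(graph, "Spezialexperte", depth=2)))
```

Increasing `depth` is what makes the graph "proliferate": each step pulls in the collocates of the collocates.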

Ok. Now we are in the area of the target analysis. Let's start the analysis.

What is it we are doing there? We are recognizing all Named Entities in all corpora.

First we compute them with machine learning methods,

meaning you examine the contexts in which Named Entities occur.

We have a training corpus in which the Named Entities are already marked,

e.g. that "Bundestag" is an organisation, and the software learns from these contexts

what the typical contexts for such Named Entities are and tries to apply them to new corpora.
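
To make the idea of learning typical contexts concrete, here is a deliberately tiny Python sketch. It is an assumption for illustration only: the talk does not say which learning method is used, and real Named Entity Recognition systems use far richer features and sequence models. The label names ("ORG", "O" for "no entity"), the function names and the training sentences are all invented.

```python
from collections import Counter, defaultdict

def context_features(tokens, i):
    """The context of token i: the word before it and the word after it."""
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return [("prev", prev_tok), ("next", next_tok)]

def train(tagged_sentences):
    """Count how often each context feature occurs with each label."""
    model = defaultdict(Counter)
    for sent in tagged_sentences:
        tokens = [tok for tok, _ in sent]
        for i, (_, label) in enumerate(sent):
            for feat in context_features(tokens, i):
                model[feat][label] += 1
    return model

def tag(model, tokens):
    """Label each token by letting its known context features vote."""
    labels = []
    for i in range(len(tokens)):
        votes = Counter()
        for feat in context_features(tokens, i):
            votes.update(model.get(feat, {}))
        labels.append(votes.most_common(1)[0][0] if votes else "O")
    return labels

# Invented training corpus with the entities already marked.
training = [
    [("Der", "O"), ("Bundestag", "ORG"), ("tagt", "O"), ("heute", "O")],
    [("Der", "O"), ("Bundesrat", "ORG"), ("tagt", "O"), ("morgen", "O")],
]
model = train(training)
print(tag(model, ["Der", "Landtag", "tagt", "nichts"]))
```

The point of the sketch is that "Landtag" never occurs in the training data; it is labelled purely because it stands in a context that was typical for organisations there.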

What we're doing here: in all corpora, in all the blogs we examine, we identify the Named Entities.

We categorize these Named Entities into people, organisations, geographical locations and others,

and then we calculate the collocations of the relevant Named Entities.

"Angela Merkel", for example, could be interesting.

And then we also check whether the collocations contain any danger words,

meaning words that indicate terror plans or the like. Let's do that now.
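
The last step, checking collocations against danger words, might look roughly like the following Python sketch. Everything specific in it is an assumption: the danger word list is invented, the function name is hypothetical, and the rule that each matching word raises the level by one (capped at 5) is made up for this sketch, since the talk does not explain how a danger level is derived.

```python
def danger_level(collocations_by_entity, danger_words, max_level=5):
    """Map each Named Entity to (level, matched danger words).

    Level 1 means that none of the entity's collocates is a danger word.
    """
    danger = {w.lower() for w in danger_words}
    report = {}
    for entity, collocates in collocations_by_entity.items():
        hits = sorted({w.lower() for w in collocates} & danger)
        report[entity] = (min(max_level, 1 + len(hits)), hits)
    return report

# Invented danger word list; the collocates are words quoted in the talk.
danger_words = ["anschlag", "bombe"]
print(danger_level({"Berlin": ["Slum", "arm", "abscheulich"]}, danger_words))
# -> {'Berlin': (1, [])}
```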

The analysis seems to be finished, and the result is danger level 1 of 5, so nothing really dramatic.

The software suggests checking the danger level regarding Berlin,

the location flagged for donalphonso, the blogger of Rebellmarkt.

A potential target of Fefe is the SPD (Social Democratic Party) <i>laughter</i> and that of Mädchenmannschaft is "Kristina Schröder" (Minister of Family Affairs).

As an example, we have now been given the order to see what bad things donalphonso writes about Berlin and whether he is planning something.

Now we can display collocation graphs or geo-collocations

This means that we have a map, and the places donalphonso writes about are labelled with the corresponding collocations.

For America he writes about Boyd and culture, lone perpetrators, "confused", "hate mail" and so on.

Germany and Central Europe are the focus, of course. It extends down to Italy.

There you can also see what donalphonso writes about

We're approaching Berlin. There are too many collocations to evaluate

So we look at our collocation graph and look for references to terror that could take place

I'll read out some: „Berlin“, „Slum“, „Reichshauptslum“, „arm“, „Transferleistung“, „abscheulich“, „Berliner Hipster“ <i>laughter</i>

While this may show quite a negative attitude towards the subject, it doesn't exactly raise suspicion of terror.

The other potential target was the organisation "SPD", for Fefe.

We'll look at the collocation graph. Fefe and the SPD. <i>laughter</i><i>applause</i>

Hey, „betrayer party“, „fall-over party“, let's go back briefly.

In total, in the entire list we really found words such as:

„hang“, „force“, „top candidate“, „betrayer party“, „fall-over party“, „pest“, „cholera“ <i>laughter</i><i>applause</i>

If we look at the collocation graph, we can already see that those are accusations

But Fefe is not planning to finish the top candidate off

Let's continue with the ideology monitor. We'd want to take some measurements now...

It has been shown that the NSA has filed many software patents for Named Entity Recognition algorithms.

Quite a lot of research on this was done some time ago.

But first you find out what the interesting targets are and what is said about them.

You can certainly improve that by measuring ideologies.

What we want to calculate now is the similarity of blog texts to certain ideologies.

We have the possibility of measuring extreme-left, extreme-right or Islamist attitudes.

We do this by calculating typical collocations... for a certain corpus.

We learn from this corpus, so it is our model of comparison.
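
One simple way to compare a blog against such a model is to measure how much their sets of typical collocations overlap. The Python sketch below uses the Jaccard index for that. This is a swapped-in assumption, not the measure used by the software in the talk, and the function names and the placeholder collocations are hypothetical.

```python
def jaccard(a, b):
    """Share of collocations that two sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def ideology_scores(blog_collocations, reference_models):
    """Compare a blog's collocations with each reference corpus's collocations."""
    return {name: jaccard(blog_collocations, typical)
            for name, typical in reference_models.items()}

# Placeholder data: each collocation is a (word, collocate) pair.
reference_models = {
    "model_a": {("w1", "w2"), ("w1", "w3")},
    "model_b": {("w4", "w5")},
}
blog = {("w1", "w2"), ("w4", "w5"), ("w6", "w7")}
print(ideology_scores(blog, reference_models))
```

A higher score only says that the blog shares more typical word combinations with a reference corpus; it says nothing about what the author actually intends.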

subtitles created by c3subtitles.de