Welcome to my talk. Thanks for the nice introduction and the warm welcome from all of you!

As you can see, the talk has the allusive name "Surveillance and Language", which obviously alludes to Foucault's "Surveillance and Punishment" ("Discipline and Punish" in English). However, long before Foucault presented the genesis of the disciplinary society, you could find a lovely moral tale in a children's book: "The Kid in the Glass House" by Heinrich Oswalt, written in 1877 and remarkably prescient.

In Frankfurt lives a master glazier,
Mr. Lebrecht Scheibenmann by name;
He had a little daughter,
Who never wanted to be washed.
And when Gretchen came with sponge and soap,
The bad girl ran away;
She even knocked over the washstand,
And the water flooded the house.
So Mr. Lebrecht Scheibenmann
Began to build a strange house,
A house made only of glass,
That, alas, was transparent throughout.
And in this glass house
The bad daughter was then seated,
So that, in order to see,
People stopped on the street.
So the kid was ashamed and ran around
The entire house and screamed:
"Where can I hide?
You can see me from everywhere!
The roof, the cellar, every room
Is made of glass! You can always see me!"
The mother said: "My dear child!
There is a quick fix for that:
If people see you behaving decently,
They will pass by; [...]"
The daughter remembered that
And tried to be seemly.
And because she no longer screamed while being washed,
Other people no longer laughed;
Since everyone who peeks into the house
Sees a kid that's very seemly.
And if you people have a child of your own
That always screams while being washed,
Just tell Mr. Lebrecht Scheibenmann:
He will deliver you a glass house immediately.

Yes, there...
[tentative applause, laughter]
[applause]

Yes, an interesting story, and certainly fitting for our times, because today Lebrecht Scheibenmann is named Keith Alexander and works for the NSA. The NSA has made glass houses out of all our homes. We can all be seen in these glass houses, and, though you can't know for certain, I at least am quite sure that this pursues educational purposes: that certain actions are no longer acceptable, and that we internalize this observation.

In this observation, language obviously plays a very important role, since many of our statements take place in the medium of language. This has also given hackers the idea of tricking the NSA with a site like "Hello NSA": a website which, like a "bullshitter", assembles suspicious words into messages that are then tweeted, mailed or chatted, in order to achieve something like "Operation Troll the NSA". The idea is that you can jam the NSA's scanners, that is, execute a DDoS attack simply by sending too much content that is basically suspicious on the basis of keywords.

The point of my presentation is to show that this image of the NSA is wrong. We cannot assume that people at the NSA really print something out as soon as a keyword shows up [laughter], start to analyse everything, look at it more closely and do a qualitative evaluation. That would certainly be a very labour-intensive task, and so a keyword-spam DDoS would certainly be ineffective.

You have probably all read the NSA's Thanksgiving talking points. I don't know if you noticed, but under the fourth point there is something utterly important: "NSA brings together the best linguists, analysts, mathematicians, engineers and computer scientists in the United States." And the linguists are named first. [slight laughter] So you can see that the NSA is definitely aware of language as an important medium, one that matters a great deal to them.
So it surely makes sense to engage with this. As it happens, the Minister of the Interior has leaked the most recent analysis software, the "Advanced Security Toolkit", developed by the Von-Leitner-Institute for Distributed Realtime Java. [laughter]

First, we'll look at today's mission. Today's task is to check out the German blogosphere, which seems to have been radicalizing since the grand coalition took over the government. It's important to check whether actions are being prepared and, if necessary, to identify radical subjects who are especially conspicuous.

To start, we choose our targets; some are of course suggested to us. Unfortunately I can only present a small selection of possible targets; I would have loved to include more. There are a few socio-critical blogs and news sites like blog.fefe.de, Indymedia, Mädchenmannschaft, Netzpolitik.org and rebellmarkt.blogger.de, and religiously motivated websites like kreuz.net, islambruderschaft.com, a Salafist blog and discussion board. And of course we confirm the selection. This is a very sensitive selection.

The following analyses are possible. Naturally, I can only show a selection of the possible analytic tools today. I wish I could show many more, but there won't be enough time. First we'll look at what authors write about possible sensitive targets, meaning we'll run a target analysis: on the basis of Named Entity Recognition, it examines collocations for possible terror targets.

We have to... what is this? Let's have a look in the manual at what Named Entities are, since today is our first day. First of all, Named Entities are expressions which clearly distinguish one entity from other entities with similar attributes. Spontaneously one thinks of names, but it's not trivial to say what a name is. Accordingly, Named Entity Recognition is the procedure by which one identifies such Named Entities. There are, of course, different classes of Named Entities, e.g.
people, organisations, places. Sometimes it's not very clear which class a certain Named Entity belongs to, e.g. "der Bundestag" (Lower House of German Parliament): this can be a geographical place as well as an organisation.

Now we still need to know what collocations are. They are word combinations that occur more frequently than chance would predict. So "we define a collocation as a combination of two words that exhibit a tendency to occur near each other in natural language, that is, to cooccur", like "take a road" or "go down a road". Those are typical connections between the word "road" and "go down" or "take", and these connections form collocations if they occur more often than chance, as we can determine with statistical tests, and if we can observe them in natural language.

One example (you don't need to read all of that now): I wanted to show an example for the word "Spezialexperte". You can see the "keyword in context" view here, with the requested keyword and the contexts of this word; so apparently they haven't found a "chosen special expert for internet issues". We won't have to make a quiz game out of which blog this could come from.

What you do then for a collocation analysis is examine contexts, e.g. here five words to the left and five words to the right, up to the beginning or end of the sentence. You just count the words that are in the blue area and compare their relative frequency with that of the words to the left and right in the white area. If a word appears significantly more frequently in the blue area, you can say it is a collocate of the word "Spezialexperte". What is striking here, for example, is "kriegen" or "Adobe-Spezialexperten". [laughter]

You can visualize collocations as graphs. [laughter] The nodes denote lexemes (I'm not sure what there is to laugh about; that's serious linguistics!) and the edges denote "is a collocation of". So here you see "the best of the best, sir"; Sarrazin and Mehdorn belong there. It proliferates a little more: "Adobe-Backup", "Backup-Spezialexperten"... interesting. OK.
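The window-counting procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the talk: the function names (`window_counts`, `log_likelihood`, `collocates`) are hypothetical, the corpus is assumed to be already split into sentences and tokens, and Dunning's log-likelihood ratio stands in for the unspecified "statistical tests". The threshold 3.84 is the chi-square critical value for p = 0.05 with one degree of freedom.

```python
import math
from collections import Counter

def window_counts(sentences, node, span=5):
    """Count tokens inside ("blue area") and outside ("white area") a
    +/-span window around `node`. Each sentence is a list of tokens, so
    the window never crosses a sentence boundary."""
    inside, outside = Counter(), Counter()
    for tokens in sentences:
        hits = [i for i, tok in enumerate(tokens) if tok == node]
        for i, tok in enumerate(tokens):
            if tok == node:
                continue  # the keyword itself is not counted
            near = any(abs(i - h) <= span for h in hits)
            (inside if near else outside)[tok] += 1
    return inside, outside

def log_likelihood(a, b, c, d):
    """Dunning's G2 for a 2x2 table (assumed test, not named in the talk).
    a/b: this word / other words inside the window,
    c/d: this word / other words outside the window."""
    total = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    g2 = 0.0
    for observed, row_sum, col_sum in cells:
        if observed > 0:
            g2 += observed * math.log(observed * total / (row_sum * col_sum))
    return 2.0 * g2

def collocates(sentences, node, span=5, threshold=3.84):
    """Return (word, score) pairs that are significantly more frequent
    inside the window than outside it, strongest first."""
    inside, outside = window_counts(sentences, node, span)
    n_in, n_out = sum(inside.values()), sum(outside.values())
    found = []
    for word, a in inside.items():
        c = outside[word]
        # Skip words that are not relatively more frequent near the keyword.
        if n_out and a / n_in <= c / n_out:
            continue
        score = log_likelihood(a, n_in - a, c, n_out - c)
        if score >= threshold:
            found.append((word, score))
    return sorted(found, key=lambda pair: -pair[1])
```

Called as `collocates(sentences, "Spezialexperte")` on a tokenized blog corpus, this would return candidates of the kind shown on the slide; a real system would more likely work on lemmas than on raw tokens, and might use a different association measure.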
Now we are in the target analysis area. Let's start the analysis. What is it we are doing there? What we're doing is recognizing all Named Entities in all corpora. We first compute this with machine learning methods, meaning you examine certain contexts in which the Named Entities occur. We have a training corpus in which it is already known what the Named Entities are, e.g. that "Bundestag" is an organisation, and the software learns from these contexts what typical contexts for such Named Entities look like and tries to apply them to new corpora.

So what we're doing here: in all corpora, in all the blogs that we examine, we identify the Named Entities. We categorize these Named Entities into people, organisations, geographical locations and other, and then we calculate the collocations of the relevant Named Entities; "Angela Merkel" could be interesting, for example. And then we also check whether the collocations contain any danger words, meaning words that indicate terror plans or the like. Now we'll do that.

The analysis seems to be finished, and the result is that we have danger level 1 of 5, so it's not really tragic. The software suggests checking the danger level regarding the location Berlin for donalphonso, the blogger of Rebellmarkt. A potential target of Fefe is the SPD (Social Democratic Party) [laughter], and that of Mädchenmannschaft is "Kristina Schroeder" (Minister of Family Affairs).

As an example, we have now received an order to see what bad things donalphonso writes about Berlin and whether he is planning something. We can display collocation graphs or geo-collocations. This means that we have a map, and at the places which donalphonso writes about there are the corresponding collocations. In America he writes about Boyd and culture, lone perpetrators, confused and "hate mail" and so on. Germany and Central Europe are in focus, of course. It goes down to Italy; there, too, you can see what donalphonso writes about. We're approaching Berlin.
There are too many collocations to evaluate, so we look at our collocation graph and look for references to terror that could take place. I'll read out some: "Berlin", "Slum", "Reichshauptslum", "arm", "Transferleistung", "abscheulich", "Berliner Hipster". [laughter] While this may show quite a negative attitude towards the subject, it doesn't exactly suggest terrorism.

The other potential target was the organisation "SPD", for Fefe. We'll look at the collocation graph: Fefe and the SPD. [laughter, applause] Hey, "betrayer party", "fall-over party"... let's go back briefly. In total, in the entire list, we really did find words such as "hang", "force", "top candidate", "betrayer party", "fall-over party", "pest" and "cholera". [laughter, applause] If we look at the collocation graph, we can already see that those are accusations, but Fefe is not planning to finish off the top candidate.

Let's continue with the ideology monitor. We'd want to take some measurements now... It has been proven that the NSA has filed many software patents for Named Entity Recognition algorithms; there was quite a lot of research going on there some time ago. But with that you first find out what the interesting targets are and what is said about them. You can certainly improve on that by measuring ideologies.

What we want to calculate now is the similarity of texts from blogs to certain ideologies. We have the possibility of measuring far-left, far-right or Islamist attitudes. We do this by calculating typical collocations... for a certain corpus. From this corpus we learn, so that's our model of comparison.

subtitles created by c3subtitles.de