Each week I come across
an article or a report
that asserts that data is the new oil,
that the use of data will lead
to a new era of knowledge,
or even that it can predict the future.
This has been particularly true since
everyone started talking about Big Data.
You know, the use of large-scale data,
mega data.
For example, Sergey Brin,
the co-founder of Google,
is focusing on the use of medical data
to cure Parkinson's disease,
for which he is at risk.
During the World Cup, many people said
that the German team was able to
beat the Brazilian team 7-1
thanks to the use of match data.
It's clear
that there is no field or
type of organization
for which Big Data
isn't supposed to be
a magic wand capable of solving
extremely complex problems.
And I must admit that I feel uneasy
about these kinds
of simplistic statements,
which I see as overshadowing
a number of issues: the economics,
the environmental impact,
the politics,
and the ethics
of the massive production of data.
Please don't think that I am skeptical
or doubtful about data,
or that I am opposed
to all forms of quantification.
On the contrary,
I live surrounded by data.
During the day, I'm working on a thesis
in sociology at Telecom ParisTech
where I study Open Data,
the major effort
to provide open access to public data.
And I study the consequences of Open Data
for the operation of government.
At night, I am an administrator
of an association,
Open Knowledge Foundation France,
which campaigns for open knowledge
and for data that benefits everyone.
Today, I would like to persuade you
that, at a time when data
is becoming so pervasive,
we need to take a step back.
The crowning of data
that we are witnessing
in this era of Open Data and Big Data
demands a new culture of
critical thought about data.
We must be able to understand
how it is produced and used,
and how we can become
independent from it.
I also want to share
the results of an experiment
that we did at
Open Knowledge Foundation France
called "the School of Data."
I hope to show that,
through the use of data,
we can develop
this culture of critical thought
and build
new checks and balances.
So, what are the problems with data?
The first problem is that
data is always assumed to be right.
Now, don't believe that this is
anything new.
Historically, the word "data"
comes from the Latin word "datum,"
which, in mathematics and theology
in the 15th century, referred to
the facts taken as given in an argument
and which were not to be
called into question.
Today, as you know, data refers
to everything that flows
through your computer.
That is to say, the 1s and 0s
that pass from USB stick to hard disk
are considered data.
Yet
the sense that data is a given,
that it is factual,
that it is not to be questioned,
has remained.
The second problem with data
is that we don't really know
where it comes from.
In general, when someone uses data,
he or she has very little information
about the way
in which it was produced.
At best, you will have access to metadata,
that is to say, data about the data,
which will tell you the contents
of the file and, occasionally,
how the data was produced.
However, that data has a long history.
It was collected.
It was processed, formatted,
aggregated, run through algorithms,
and visualized before reaching you.
This is why sociologist
Bruno Latour asserts
that we should speak of "obtained"
rather than "data," that is, "given,"
to reflect this long history,
which constrains how the data can be used.
Finally, the third problem with data
is that we can't really see it.
Have you ever seen a data center,
even if only from outside,
or from the road?
Do you have any idea
where your data is stored?
I mean, physically, where it is stored?
Do you have any idea what will happen
to it in 10 years?
In any case, I have no answer
to these three questions.
However, even if we can't see our data,
we can measure its effects.
At the individual level,
when Facebook changes its terms of service
or modifies its algorithm, it has
consequences for your private life
and for the way in which you present
yourself as an individual.
And at the most macroscopic level,
the Snowden affair has shown
that the massive production of data
can have consequences
for the sovereignty of the State
or for diplomacy.
This is why we must develop a culture
of critical thought about data.
For encouragement,
I turned to a book
called "Statactivism."
Statactivism is a neologism
proposed by researchers and artists
that refers to experiments
that allow people to free themselves
from the power of data.
The fundamental basis of statactivism
is that data controls us,
and that it imposes itself on us
like an argument from authority.
The goal of statactivism
is almost revolutionary.
It asserts that other kinds of data
must be possible.
There is no need to oppose
all data.
Instead, we should use the power of data
to propose other realities,
to critique data more effectively,
or to propose other measures.
In short, to propose other data.
There is a motif in the book
which I find particularly meaningful,
that of the judoka.
Judoka use the strength of their opponents
in order to turn it back on them.
That is what I want to invite you
to do today:
think about how to use data
to better analyze it.
I think that precisely at this moment
in the development of Open Data,
the need for a culture
of critical thought about data
is becoming crucial.
Don't be misled: Open Data represents
an extraordinary opportunity.
The volume of data is exploding
and data is no longer
the privilege of the powerful.
Today, you can use data
without asking anyone's permission.
And that is a good thing:
public data is available.
But I think there is a risk
in thinking
that simply releasing data
will be enough to emancipate society,
that individuals can emancipate
themselves from the power of data
just because they have access to data.
There is a Canadian sociologist
named Michael Gurstein
who has proposed an expression
that sums up a risk of Open Data,
namely, "Empowering the Empowered,"
meaning to give more power
to those who already have it.
That is why it's crucial
to develop a culture of critical thought
to be able to understand how data
is produced and used,
and how you can use it yourself
to take a step back.
Well, that's the theory.
I would like to share with you
the first results from an experiment
that we did in my association:
Open Knowledge Foundation France.
We are part of a worldwide network
dedicated to open knowledge and open data.
We have groups in more than 50 countries.
And the idea of our association
and of this worldwide movement
is that everyone can benefit from,
and make use of,
works, scientific articles,
and other content,
to create, play, educate,
or start a business.
Open Knowledge has
a large number of projects.
I'm going to talk about one project,
the "School of Data."
Together, we took part
in translating this project,
the School of Data.
The School of Data consists
of online resources
that are free and accessible to all,
and also of events.
We first offered classes.
In these classes, you don't even need
to know what data is,
or how to use a spreadsheet,
which is really the tool of choice.
You will learn that
in our class.
No expertise is required:
you are guided step by step
in the use of data.
We also use another format
which is particularly educational,
namely, the recipe.
Recipes work just like in cooking:
you have ingredients
and steps.
The ingredients are data
and software, free software if possible,
that lets you work with the data.
The idea is that making a map
of electoral results,
or a graph of the results
of the French soccer team,
should be as easy as making
a tarte Tatin or a béchamel sauce.
You find the resources online
and we walk you through the project
step by step.
We have also tried to develop
another format for in-person sessions,
which we call expeditions.
An expedition is like
mountain climbing:
you have a guide, a "data sherpa,"
who accompanies you,
roped to the group.
There are 10 or 20 participants
who work together over a weekend
or sometimes for just a few hours.
Our first data expedition
focused on the question of air pollution
in Île-de-France.
I don't know if you have seen
these images of Paris
with black clouds of pollution.
They left their mark on us,
and we said to ourselves:
"Well, let's dig into this set of data."
The first step, when we undertook
this data expedition,
was to identify the available data.
We realized that, on this crucial question,
there is no available data
that is freely reusable, that is to say,
that you have the right to reuse
without asking for permission.
Therefore, we had to extract data
from websites,
reports, or even charts.
Imagine what a mess it is to extract
data from a chart.
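Extracting numbers from a chart usually means "digitizing" it: you note the pixel position of each point of interest, then convert it to a data value using two tick marks whose values are known. The Python sketch below shows only that conversion step. It assumes a linear axis (not a logarithmic one) and that the pixel positions were read off the image by hand; the function name `pixel_to_value` and the tick positions in the example are hypothetical, not taken from the expedition.

```python
def pixel_to_value(pixel, pixel_a, value_a, pixel_b, value_b):
    """Convert a pixel coordinate on a chart axis to a data value.

    (pixel_a, value_a) and (pixel_b, value_b) are two calibration points:
    tick marks whose pixel position and printed value are both known.
    Assumes a linear axis.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must be at different pixel positions")
    # Data units per pixel. On a y axis this comes out negative, because
    # image rows are numbered from the top while values grow upward.
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return value_a + (pixel - pixel_a) * scale


# Hypothetical y axis: the "0" tick is at pixel row 400,
# the "100" tick at pixel row 100.
for row in (400, 250, 130):
    value = pixel_to_value(row, 400, 0.0, 100, 100.0)
    print(f"row {row} -> {value:.1f}")
```

The same function is applied separately to the horizontal and vertical axes; a logarithmic axis would need the interpolation done on the logarithm of the values instead.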
We also realized that Airparif,
the organization responsible
for producing data
on air pollution in Île-de-France,
does not allow you to use
its data freely.
You must ask permission, or pay.
We were able to overcome
these constraints
and to conduct this expedition
guided by our sherpa, Pierre.
During this data expedition,
we broke into small groups,
and each group was assigned an angle.
That is one of the principles of expeditions:
you have an angle, as in journalism;
we ask ourselves questions that could be
the headline of an article.
The first group asked itself
whether cycling had led to a decrease
in air pollution in Paris.
The second group,
since the expedition took place during a strike,
asked itself whether public transport strikes
cause air pollution in Île-de-France
to increase.
And the third group asked whether
all areas are equally affected
by air pollution,
or whether geography and environment
have an effect, and if so,
whether it can be seen in the data.
The results of this expedition,
I am sorry to say, are a bit
disappointing.
We did not find any correlation
or causal connection,
with nice data points
lined up along a fitted curve or a straight line,
proving that our hypotheses
are correct.
We did not succeed at that,
but then, we only worked for four hours.
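For readers curious about what "finding a correlation" would mean in practice, here is a minimal Python sketch. It assumes two aligned series of equal length (for instance, a daily pollution reading and a hypothetical daily count of bicycle trips), computes the Pearson correlation coefficient, and fits a straight line by ordinary least squares. The function names are illustrative, and the numbers in the example are invented, not expedition results.

```python
from math import sqrt


def _means(xs, ys):
    """Validate two aligned series and return their means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, with at least 2 points")
    return sum(xs) / len(xs), sum(ys) / len(ys)


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    mean_x, mean_y = _means(xs, ys)
    # Sums of cross-products and squares around the means; the 1/n
    # factors of covariance and variance cancel out in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y = _means(xs, ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy series, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
slope, intercept = fit_line(xs, ys)
print(f"r = {pearson_r(xs, ys):.3f}")
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

A coefficient close to +1 or -1 means the points line up along a straight line; a value near 0 means they do not. Even a strong correlation would not, on its own, establish the causal connection the groups were looking for.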
What we did manage to show,
on the other hand,
is that on a question
as crucial as air pollution,
it is extremely difficult
to understand how the data is produced
and to use it,
that the simplest measurements
are not accessible,
and that you do not necessarily have
the right to reuse them.
That is exactly what we tried
to do at this event:
to develop a culture of critical thought
about the way in which data
on air pollution is used.
We also tried to develop this format
of expeditions and training events
with another, less expected group:
children.
We first explored this idea
during an event that we held with Etalab,
the government institution
in charge of data.gouv.fr,
the French government's
open data portal.
We proposed imagining
radically different open data portals.
They were fictional projects,
just prototypes.
One group came up
with a prototype called Tada.gouv.fr.
Tada.gouv.fr is a fictional portal,
a bit idealistic, designed for children.
The data is presented
not by ministry,
but by school subject, that is to say
that you have data
about history and geography,
physics and chemistry,
or life and Earth sciences.
On this occasion, we realized
that open data
can be a fantastic resource
for schools,
because it enables
interdisciplinary work
and this culture of critical thought
about data that I have mentioned.
We did not stop at observation.
We ran a first experiment,
and I would like to tell you
about its first results.
We joined with Silicon Banlieue,
which is a site dedicated
to data in Argenteuil,
and we held an event
for children between 8 and 14 years old
who came to the Open World Forum,
an event in Paris dedicated
to free and open source software.
There, you can see me from the back.
With the children, aged 8 to 14,
we worked on the topic of cinema,
because it interested them,
and it is a simple enough subject.
First we collected data,
nothing very complicated,
just a paper form.
We asked them how many times a month
they went to the cinema,
and which movies on a list they had seen;
then we compared their answers with data
from the survey
of French cultural practices,
which contains
exactly the same type of data.
With the children, we produced
an infographic on this occasion.
Now, I am really bad at math:
I got 7.5 out of 20 on the Bac.
Yet I found myself explaining
the concept and calculation
of averages using a spreadsheet,
which was rather surprising.
I explained how it works.
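The calculation itself is short. This Python sketch mirrors what a spreadsheet formula such as `=AVERAGE(B2:B20)` computes (the cell range is hypothetical), assuming each paper form was entered as a single number of cinema visits per month. The survey figure is left as a parameter because the talk does not quote it, and the answers in the example are invented.

```python
def average(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)


def compare_to_survey(class_answers, survey_average):
    """Return the class average and its gap with a reference average.

    survey_average is a placeholder for the figure published in the
    survey of French cultural practices; it is not filled in here.
    """
    class_avg = average(class_answers)
    return class_avg, class_avg - survey_average


# Invented answers: cinema visits per month, one number per paper form.
answers = [1, 0, 2, 1, 3, 1]
print(f"class average: {average(answers):.2f} visits per month")
```

Keeping the survey figure as an explicit input makes the comparison step visible: the class average only means something next to a reference whose origin you know.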
We came away with an infographic,
and on this occasion we were able,
and I think this is the important point,
to develop a culture of critical thought.
I explained to them what data is,
how it is used,
how they can use it,
how it controls us in a certain way,
and that we can also take back
power over data.
I assure you that with a topic
as appealing as cinema,
you can get this kind of message across
and have a discussion about these questions.
I hope that I have convinced you
that it is necessary today
to take a step back with regard to data,
to develop a culture of critical thought,
to understand
how data is produced
and how you can use it,
so that data is not forced on you.
So, starting today,
get your hands dirty,
find a sherpa
(all of the resources are online),
and go on a data expedition.
Thank you.
(Applause)