23 Jan 2012


The International Data Journalism Award competition was launched last week. It is the first ever contest to recognise outstanding work in the growing field of data journalism worldwide.

Organised by the Global Editors Network (GEN) in collaboration with Google and the European Journalism Centre, the Data Journalism Awards aim at “setting standards and highlighting the best practices in data journalism.”

“We’d like to enhance collaboration between journalists, developers and designers,” Bertrand Pecquerie, CEO of GEN, announced at a press conference in London last week.

“But we also want to inspire people in the newsrooms by showcasing outstanding data journalism work,” he said.

A jury of data journalism experts and editors from all over the world will grant a total of 45,000€ to six winners.

There are three award categories:

– data-driven investigations

– interactive data-visualizations

– data-driven mobile or web applications / services.

Two subcategories will be defined: one for national and international media organisations and one for regional and hyperlocal organisations. “It is very important to us to have that three levels so that students or freelancers don’t have to compete with big organisations,” says Antoine Laurent, DJA Project Manager.

The DJA website is now live and media companies, non-profit organisations, freelancers or individuals have until 10 April 2012 to submit their application by filling in this online form. Only entries published between 11 April 2011 and 10 April 2012 will be considered.

“We are convinced there is a bright future for journalism. At the moment, only a few organisations are working on data. There is a lot of it available online so what are the journalists waiting for? It is a good idea to define standards for data journalism and that’s what these awards set to do,” argued Bertrand Pecquerie, CEO of GEN.

Paul Steiger, CEO of ProPublica, is the president of the jury for this competition. Other big figures from the world of data journalism, such as Aron Pilhofer from the New York Times and Wolfgang Blau from Zeit Online, are also part of the jury.

“So many data journalists are alone in their newsrooms, we are building a network where they can meet,” Wilfried Ruetten, Director of the European Journalism Centre, said during last week’s press conference.

The selection process will start in April and the winners will be announced during the 2012 News World Summit in Paris on 30 May 2012. Good luck!


16 Nov 2011



This post is by Lucy Chambers, community coordinator at the Open Knowledge Foundation, and Friedrich Lindenberg, Developer on OpenSpending. They recently attended the Global Investigative Journalism Conference 2011 in Kyiv, Ukraine, and in this post, bring home their thoughts on journalist-programmer collaboration…

The conference

The Global Investigative Journalism Conference must be one of the most intense yet rewarding experiences either of us has had since joining the OKF. With topics ranging from human trafficking to offshore companies, the meeting highlighted the importance of long-term, investigative reporting with great clarity.

With around 500 participants from all over the globe with plenty of experience in evidence gathering, we used this opportunity to ask many of them how platforms like OpenSpending can contribute, not only to the way in which data is presented, but also to how it is gathered and analyzed in the course of an investigation.

Spending Stories – the brainstorm

As many of you will be aware, earlier this year we won a Knight News Challenge award to help journalists contextualise and build narratives around spending data. Research for the project, Spending Stories, was one of the main reasons for our trip to Ukraine…

During the data clinic session as well as over drinks in the bar of “Hotel President” we asked the investigators what they would like to see in a spend analysis platform targeted at data journalists. Cutting to the chase, they immediately raised the key questions:


It was clear that the platform should support the existing journalistic workflow through publishing embargos, private datasets and note making. At the same time, the need for statistical and analytical heuristics to dissect the data, find outliers and visualize distributions was highlighted as a means to enable truly data-driven investigations of datasets. The goal in this is to distinguish anomalies from errors and patterns of corruption from policies.
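To make the idea of an outlier heuristic concrete, here is a minimal sketch in Python. It is not how OpenSpending works; the payment records, field names and the threshold of 3.5 are all assumptions made for illustration. It scores each payment by its distance from the median, scaled by the median absolute deviation, which is less distorted by a single huge payment than a plain average would be.

```python
from statistics import median


def flag_outliers(payments, threshold=3.5):
    """Return the payments whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD) instead of the mean and standard deviation.
    The 3.5 default is a common rule of thumb, not a fixed standard.
    """
    amounts = [p["amount"] for p in payments]
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # At least half the amounts equal the median; the score is undefined.
        return []
    flagged = []
    for p in payments:
        score = 0.6745 * (p["amount"] - med) / mad
        if abs(score) > threshold:
            flagged.append(p)
    return flagged


# Hypothetical payment records, invented for this example.
payments = [
    {"supplier": "A", "amount": 1200.0},
    {"supplier": "B", "amount": 1350.0},
    {"supplier": "C", "amount": 1100.0},
    {"supplier": "D", "amount": 1275.0},
    {"supplier": "E", "amount": 98000.0},
]

for p in flag_outliers(payments):
    print(p["supplier"], p["amount"])
```

A flagged payment is only a lead. Whether it is a data entry error, a legitimate one-off or something worth a story still has to be checked against the source documents.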


With the data loaded and analyzed, the next question is what value can be added to published articles. Just like DocumentCloud enabled the easy embedding of source documents and excerpts, OpenSpending should allow journalists to visualize distributions of funds, embed search widgets and data links, as well as information about how the data was acquired and cleaned.


Many of those we spoke to were concerned about the complexity required to contribute data. The recurring question was: should I even try myself or hire help? It’s clear that for the platform to be accessible to journalists, a large variety of data cleansing tutorials, examples and tools needs to be at their disposal.

We’ve listed the full brainstorm on the OpenSpending wiki.

You can also see the mind map with concrete points below:

Hacks & Scrapers – How technical need data journalists be?

In a second session, “Data Camp”, we went through the question of how to generate structured data from unstructured sources such as web pages and PDF documents.
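As a rough illustration of what turning a web page into structured data can look like, the sketch below pulls the rows out of an HTML table using only Python’s standard library. It is not the material from the session; the sample table, the class name and the function name are made up for this example, and real pages are usually messier (nested tables, merged cells, missing headers).

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first row as the header and return one dict per data row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment, invented for this example.
SAMPLE = """
<table>
  <tr><th>Recipient</th><th>Amount</th></tr>
  <tr><td>Example Ltd</td><td>1,200</td></tr>
  <tr><td>Sample GmbH</td><td>950</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
for record in records:
    print(record)
```

The amounts come out as text, thousands separators included, so a cleaning step is still needed before any arithmetic.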

15 Nov 2011

Nicola Hughes from ScraperWiki shared this video on Twitter recently and we thought it would be a shame not to share it with you too.

Experts in data mining gathered at the Paley Center for Media on 10 November 2011 to discuss the future of journalism and how to sustain a journalism watchdog in the digital age. This session is about data mining and the new tools available online.

Watch the video and let us know what you think. If you’ve used some of them, tell us how good, or how bad, you think they are…

Next Big Thing: New Tools for Digital Digging from The Paley Center For Media on FORA.tv

Presenters include:

Bill Allison

Bill Allison is the Editorial Director at the Sunlight Foundation. A veteran investigative journalist and editor for nonprofit media, Bill worked for the Center for Public Integrity for nine years, where he co-authored The Cheating of America with Charles Lewis, was senior editor of The Buying of the President 2000 and co-editor of the New York Times bestseller The Buying of the President 2004.

He edited projects on topics ranging from the role of international arms smugglers and private military companies in failing states around the world to the rise of section 527 organizations in American politics. Prior to joining the Center, Bill worked for eight years for The Philadelphia Inquirer — the last two as researcher for Pulitzer Prize winning reporters Donald L. Barlett and James B. Steele.


David Donald

David Donald, United States, is data editor at the Center for Public Integrity, where he oversees data analysis and computer-assisted reporting at the Washington-based investigative journalism nonprofit.


Sheila Krumholz

Sheila Krumholz is the Center for Responsive Politics’ executive director, serving as the organization’s chief administrator, the liaison to its board and major funders and its primary spokesperson.

Sheila became executive director in 2006, having served for eight years as the Center’s research director, supervising data analysis for OpenSecrets.org and the Center’s clients. She first joined the Center in 1989, serving as assistant editor of the very first edition of Open Secrets, the Center’s flagship publication.

In 2010, Fast Company magazine named Sheila to its “Most Influential Women in Technology” list. Sheila has a degree in international relations and political science from the University of Minnesota.

Jennifer 8. Lee

Jennifer 8. Lee is the author of The Fortune Cookie Chronicles and a New York Times reporter.


Nadi Penjarla

Nadi Penjarla is the chief architect and designer of the Ujima Project. The Ujima Project (www.ujima-project.org) is a collection of databases, documents and other resources that aims to bring transparency to the workings of governments, multinational non-governmental organizations and business enterprises.

Nadi’s work demonstrates that data analysis provides unique insights into international and local political controversies and brings the facts of the world into sharper focus. He has spoken and conducted workshops on computer assisted reporting at international forums such as the ABRAJI Conference in Sao Paulo, Brazil, the GLMC Investigative Journalism Forum in Kigali, Rwanda, and at the Annual Investigative Reporters & Editors (IRE) Conference.

Nadi possesses a strong background in data analysis and data mining, including work as an investment banker, and a strategy and business analytics consultant. Past projects include consulting for Fortune 500 companies on how to improve strategic decision-making, enhance operations, conduct complementary marketing and transform related business processes by properly analyzing data and its implications. In 2003 Nadi was the founding editor of Global Tryst, an online magazine focusing on international issues from a grassroots perspective.

Nadi holds an MBA from the University of Chicago, an M.S. in Engineering and Computer Science, and a B.S. in Engineering. He can be reached at 202-531-9300 or at nadi.penjarla@gmail.com.

14 Nov 2011

This video is cross posted on DataDrivenJournalism.net, the Open Knowledge Foundation blog and on the Data Journalism Blog.

The Data Journalism Handbook is a project coordinated by the European Journalism Centre and the Open Knowledge Foundation, launched at the Mozilla Festival in London on 5 November 2011.

Journalists and data experts gathered to create the first ever handbook on data journalism during a two-day challenge.

Read more about the Data Journalism Handbook in this article by Federica Cocco.

What data tool or great example of data journalism would you add to the handbook? Let’s make this comments section useful!

Every contribution, big or small, to the Data Journalism Handbook is very much appreciated. So use this space to give us links and examples to what you think should be included in the manual.

And if you feel more chatty, email us at editor@datajournalismblog.com

14 Nov 2011

By Federica Cocco

This article is cross posted on DataDrivenJournalism.net, the Open Knowledge Foundation blog and on the Data Journalism Blog.

Ravensbourne college is an ultramodern cubist design school which abuts the O2 arena on the Greenwich peninsula. It is perhaps an unusual and yet apt setting for journalists to meet.

Members of the Open Knowledge Foundation and the European Journalism Centre saw this as a perfect opportunity to herd a number of prominent journalists and developers who, fuelled by an unlimited supply of mochaccinos, started work on the first Data Journalism Handbook.

The occasion was the yearly Mozilla Festival, which acts as an incubator to many such gatherings. This year the focus was on media, freedom and the web.

The manual aims to address one crucial problem: “There are a lot of useful resources on the web,” Liliana Bounegru of the EJC said, “but they are all scattered in different places. So what we’re trying to do is put everything together and have a comprehensive step-by-step guide”.

In data journalism, most people are self-taught, and many find it hard to keep up-to-date with every tool produced by the industry. “It could be vital having a handbook that really explains to journalists how you can approach data journalism from scratch with no prior knowledge,” says Caelainn Barr of the Bureau of Investigative Journalism.

Friedrich Lindenberg of the OKF believes there is a real urgency in making newsrooms data-literate: “If journalists want to keep up with the information they need to learn coding, and some bits of data analysis and data-slicing techniques. That will make much better journalism and increase accountability.”

And who better than the New York Times’ Interactive Editor Aron Pilhofer, The Guardian Data Blog’s Simon Rogers and others to lead the ambitious efforts?
In charge of sorting the wheat from the chaff, around 40 people joined them on the sixth floor of the college for a 48-hour session.

The first draft of the handbook should be ready in the coming months, as other contributors from every corner of the web are still working on their input.
Of course the first data journalism handbook had to be open source. How else would it be able to age gracefully and be relevant in years to come?

Workshops of this sort represent a decisive break from the past. Aspiring data journalists will know that hands-on sessions are a cut above the usual lectures featuring knowledgeable speakers and PowerPoint presentations. Discussing the topic and citing examples is not enough. After all, if you give a man a fish you have fed him for a day. But if you teach a man how to fish, you have him fed for a lifetime.

Jonathan Gray concurs: “Rather than just provide examples of things that have been done with data, we want to make it easier for journalists to understand what data is available, what tools they can use to work with data, how they can visualise data sets and how they can integrate that with the existing workflows of their news organisations.”

At the event itself, after a brief introduction, the crowd split into five groups and began collaborating on each chapter of the handbook. Some were there to instill knowledge, others were there to absorb and ask questions.

“I like the fact that everyone is bringing a different skillset to the table, and we’re all challenging each other”, one participant said.

Francis Irving, CEO of ScraperWiki, led the session on new methods of data acquisition. He believes the collaboration between journalists, programmers, developers and designers, though crucial, can generate a culture clash: “When working with data, there’s a communication question, how do you convey what you need to someone more technical and how do they then use that to find it in a way that’s useful.”

“A project like this is quite necessary,” noted Pilhofer, “It’s kind of surprising someone hasn’t tried to do this until now.”

The free e-book will be downloadable from the European Journalism Centre’s DataDrivenJournalism.net/handbook in the coming months. If you want to follow our progress or contribute to the handbook you can get in touch via the data journalism mailing list, the Twitter hashtags #ddj and #ddjbook, or email bounegru@ejc.net.

Watch the full video report from the Data Journalism Handbook session at the Mozilla Festival, 4-6 November in London, here.

The organisers would like to thank everyone who is contributing to the handbook for their input, and Kate Hudson for the beautiful graphics.

About the author: Federica Cocco is a freelance journalist and the former editor of Owni.eu, a data-driven investigative journalism site based in Paris. She has also worked with Wired, Channel 4 and the Guardian. 


01 Nov 2011

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

With the Mozilla Festival approaching fast, we’re getting really excited about getting stuck into drafting the Data Journalism Handbook, in a series of sessions run by the Open Knowledge Foundation and the European Journalism Centre.

As we blogged about last month, a group of leading data journalists, developers and others are meeting to kickstart work on the handbook, which will aim to get aspiring data journalists started with everything from finding and requesting the data they need and using off-the-shelf tools for data analysis and visualisation to hunting for stories in big databases, using data to augment stories, and plenty more.

We’ve got a stellar line up of contributors confirmed, including:

Here’s a sneak preview of our draft table of contents:

  • Introduction
    • What is data journalism?
    • Why is it important?
    • How is it done?
    • Examples, case studies and interviews
      • Data powered stories
      • Data served with stories
      • Data driven applications
    • Making the case for data journalism
      • Measuring impact
      • Sustainability and business models
    • The purpose of this book
    • Add to this book
    • Share this book
  • Getting data
    • Where does data live?
      • Open data portals
      • Social data services
      • Research data
    • Asking for data
      • Freedom of Information laws
      • Helpful public servants
      • Open data initiatives
    • Getting your own data
      • Scraping data
      • Crowdsourcing data
      • Forms, spreadsheets and maps
  • Understanding data
    • Data literacy
    • Working with data
    • Tools for analysing data
    • Putting data into context
    • Annotating data
  • Delivering data
    • Knowing the law
    • Publishing data
    • Visualising data
    • Data driven applications
    • From datasets to stories
  • Appendix
    • Further resources

If you’re interested in contributing you can either:

  1. Come and find us at the Mozilla Festival in London this weekend!
  2. Contribute material virtually! You can pitch in your ideas via the public data-driven-journalism mailing list, via the #ddj hashtag on Twitter, or by sending an email to bounegru@ejc.net.

We hope to see you there!

18 Oct 2011

” – Hey! Are you coming to the free seminar on data visualisation for journalists this Thursday?

– Where is it?

– Everywhere! I mean, anywhere you like, it’s broadcast live on the internet at 4pm UK time.

– Hell, yeah, I’ll come! Who’s talking?

– Only some big names in data journalism: Xaquín G.V. from The New York Times, Annamarie Cumiskey from the Bureau of Investigative Journalism, Mar Cabra of the International Consortium of Investigative Journalists – ICIJ, and David Cabo of Pro Bono Público

– Pro Bono Publico? Is that held in Spain then?

– Yep, it’s happening in Madrid at Medialab-Prado, a program of the Department of Arts of the City Council. You should check out their website, they have some really interesting stuff in terms of arts and visualisations.

– Great!

– If you want more information, take a look at the schedule here. The conference will be conducted in Spanish and English and will be translated live.

– That’s gonna be interesting 😉 Will I be able to ask some questions at the end?

– There will be some discussion afterwards but I don’t know whether the online audience will be able to join in. A workgroup on data journalism will also be launched during the event, seeking to bring together professionals interested in data visualisations, from journalists to graphic designers, who will then meet regularly at Medialab-Prado.

– Looking forward to seeing how it turns out… Thanks for the info, speak to you on Thursday! You will write something on the Data Journalism Blog about this, right?

– Sure! I might just copy and paste this conversation though.. 🙂

– You should! ”


18 Oct 2011



The annual IEEE Visualization, IEEE Information Visualization and IEEE Visual Analytics Science and Technology conferences – together known as IEEE Visweek – will be held in Providence, RI from October 23rd to October 28th. The detailed conference program is spectacular and can be downloaded here. Some of the new events this year are under the Professional’s Compass category. It includes a Blind Date Lunch (where you can meet a researcher you have never met and learn about each other’s research), Meet the Editors (where you can meet editors from the top graphics and visualization journals), a Lunch with the Leaders session (an opportunity to meet famous researchers in the field) and Meet the Faculty/Postdoc Candidates (especially geared towards individuals looking for a postdoctoral or faculty position). I think this is an excellent idea and hope that the event is a hit at the conference.

I am also eagerly looking forward to the two co-located symposia – IEEE Biological Data Visualization (popularly known as biovis) and IEEE LDAV (Large Data Analysis and Visualization). Their excellent programs are out and I’d encourage you to take a look at them.

The tutorials this year look great and I am particularly looking forward to the tutorial on Perception and Cognition for Visualization, Visual Data Analysis and Computer Graphics by Bernice Rogowitz. Here is an outline for the tutorial that can be found on her website. She was one of the first people to recommend that people STOP using the rainbow color map.

The Telling Stories with Data workshop also looks great and will be a continuation of the great tutorial held by the same group last year. I am eagerly looking forward to it.

01 Oct 2011

SPLUNK BLOG – By Paul Wilke

Last week, I was fortunate enough to attend the Strata Big Data Conference in New York. With the conference spanning four days, two hotels, and over 400 attendees, one thing stood out… big data is a hot topic!

Splunk was featured in two sessions. On Tuesday, Splunk CIO Doug Harr was part of a panel discussion on the changing role of the CIO, where he and the panel (which included CIOs from Batchtags, Accenture and Revolution Analytics) pointed out that the CIO role is changing and expanding. The function has evolved into one of the most crucial positions in corporations focusing on sustainable growth.

On Friday, Splunk Product Manager Jake Flomenberg took the stage with Denise Hemke from Salesforce.com to talk about gleaning new insights from massive amounts of machine data. Denise highlighted how at Salesforce a Chatter group is devoted to sharing ideas on how they work with Splunk so they can make the most of Splunk solutions. To highlight the usefulness of big data in a way that just about everyone could relate to, Jake showed how Splunk could be used to find the average price of pizza in New York City – definitely an example of using data for food, not evil!

Jake also gave a great interview at the conference, which you can see here:

Overall, a great crowd and very strong topics. One of my favorite sessions was current New York Mets’ executive Paul DePodesta talking about the big data behind Moneyball. It’s a shame the Mets aren’t taking it to heart this season. As the Splunk t-shirts we handed out at Strata say, “A petabyte of data is a terrible thing to waste”.

Read the original post on Splunk Blog here.

29 Sep 2011

OKF – By Lucy Chambers

This post is by Lucy Chambers, Community Coordinator at the Open Knowledge Foundation. The post contains a link to a report on the OKF / EJC Data Driven Journalism workshop on EU Spending, which took place in Utrecht, the Netherlands, on 8th-9th September.


The report was written by Nicolas Kayser-Bril who attended the workshop, and may be helping to run the next in the series in Warsaw in October… stay tuned to the data-driven journalism mailing list for more on the upcoming workshops…



“Data journalism is hard, but that’s precisely what makes it worthwhile… Not every journalist has the skills, knowledge or the commitment to dig into the data…so the ones who do are at a massive advantage” – Chris Taggart [paraphrased], closing remarks




For the first in what we hope will become a series of data-driven journalism events, the European Journalism Centre and the OKF teamed up with a crack team of experts to help tackle some of the technical & research-based challenges facing the modern journalist.


I have no intention of re-inventing the wheel here by giving a full rundown; Nicolas sums up the workshop & gives his insightful ideas for future workshops in his report on the Data Driven Journalism Blog from the EJC far better than I would. You can read the full report here. But just to whet your appetite here and now, here is a snippet:


“As Friedrich Lindenberg was writing this abstruse code on his MacBook plugged on the beamer at the workshop on EU spending on 9 September, 20 journalists listened attentively as data started to speak before their eyes. In a conference room in Utrecht University’s 15th-century Faculty Club, the group from across Europe watched as Lindenberg compared a list of lobbying firms with the list of accredited experts at the European Commission: Any overlap would clearly suggest a conflict of interest.”

“More than watching, the audience actually followed in Lindenberg’s steps on Google Refine, an Excel-like tool, and was taming the data on their own laptops. At this point in time, more journalists were engaging in data-mining in Utrecht than in any other newsroom. This practical exercise was the climax of two days of learning to investigate the mountains of data produced by European institutions. Besides Lindenberg, the coder behind OpenSpending, EU datajournalist Caelainn Barr, OpenCorporates founder Chris Taggart and Erik Wesselius of Corporate Europe shared expertise with participants…”
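For readers wondering what comparing two lists for overlap involves in practice, here is a small Python sketch of the general idea. It is not the code shown at the workshop, where the group worked in Google Refine; the names, the list of legal suffixes and the function names are invented for illustration. The trick is to reduce each name to a normalised key before comparing, so that differences in capitalisation, punctuation or legal form do not hide a match.

```python
import re

# Hypothetical, deliberately short list of legal-form suffixes to ignore.
LEGAL_SUFFIXES = {"ltd", "limited", "gmbh", "sa", "bv", "inc", "llp"}


def normalise(name):
    """Lowercase, drop punctuation and remove legal-form suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def find_overlap(lobbyists, experts):
    """Return (expert name, matching lobbyist names) pairs sharing a key."""
    by_key = {}
    for name in lobbyists:
        by_key.setdefault(normalise(name), []).append(name)
    matches = []
    for name in experts:
        key = normalise(name)
        if key in by_key:
            matches.append((name, by_key[key]))
    return matches


# Invented names, for illustration only.
lobbyists = ["Acme Consulting Ltd.", "Example Advisors GmbH", "Northwind B.V."]
experts = ["ACME Consulting", "Jane Doe", "Northwind BV"]

for expert, firms in find_overlap(lobbyists, experts):
    print(expert, "<->", ", ".join(firms))
```

Exact matching on a normalised key misses misspellings and abbreviations, and it can also pair up unrelated organisations that happen to share a name, so every match needs checking by hand before it goes anywhere near a story.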





The workshop clearly indicated that there is a great demand for practical skill-based workshops amongst journalists to help them to reap maximum benefit from all the data that is available. One person even asked for a week-long version of the workshop, covering everything in more detail!


We’ll see about the week-long session, but if you are sorry to have missed the last short workshop, don’t despair, there are more workshops coming soon!


Data-journalist? Data-wrangler? Tech geek? New to the field?


Will you be in or around Warsaw on 19th October?


We will be holding a one-day workshop in Warsaw in the run-up to Open Government Data Camp. The important thing to stress about this workshop is that we are looking to have a good ratio of technical people (e.g. programmers & data wranglers) to journalists, so that we can create smaller groups to really go into detail to get the results, fast!


We will post more information about the workshop in the coming days, but places will be limited, so if you are keen (& organised) request an invitation by contacting us now.