06 May 2012

After studying Data Journalism for a year at City University I have come to appreciate the importance of having the skillset to make the most out of numbers and statistics. Many aspiring journalists still see data as something that is separate from journalism, and as something that does not interest them. In response, I have compiled some reasons why data is increasingly important:

1.       Make sense of Mass Information

Having the skills to scrape, analyse, clean and present data allows journalists to present complicated and otherwise incomprehensible information in a clear way. It is an essential part of journalism to find material and present it to the public. Understanding data allows journalists to do this with large amounts of information, which would otherwise be impossible to understand.

2.       New Approaches to Storytelling

Able to create infographics and visualisations, data journalists can see and present information in a new and interesting way. Stories no longer need to be linear and based solely on text. Data can be grafted into a narrative which people can read visually. Interactive elements of data visualisations allow people to explore the information presented and make sense of it in their own way.

3.       Data Journalism is the Future

Understanding data now will put journalists ahead of the game. Information is increasingly being sourced and presented using data. Journalists who refuse to adapt to the modern, increasingly technological world will be unable to get the best stories, by-lines and scoops and their careers will suffer as a result.

4.       Save Time

No longer must journalists pore over spread-sheets and numbers for hours when there could be a simpler way to organise the information. Being technologically savvy and knowing the skills to apply to data sets can save journalists time when cleaning, organising and making sense of data. Not making mistakes due to lack of knowledge can also save a journalist time.

5.       A way to see things you might otherwise not see

Understanding large data sets can allow journalists to see significant information that they might otherwise have overlooked. Equally, some stories are best told using data visualisations as this enables people to see things that they might otherwise have been unable to understand.

 6.       A way to tell richer stories

Combining traditional methods of storytelling with data visualisations, infographics, video or photographs, creates richer, more interesting and detailed stories.

7.       Data is an essential part of Journalism

Many journalists do not see data as a specialist and separate area of journalism, but an interwoven, essential and important element of it. It is not there to replace traditional methods of finding information, but to enhance them. The journalist that can combine a good contact book and an understanding of data will be invaluable in the future.

24 Jan 2012




Brian Boyer, news applications editor at the Chicago Tribune, describes PANDA in a video by Jon Vidar.

Earlier this month PANDA, which helps news organizations better use public information by creating new software that cleans up and helps analyze it, went beta.

Users can now test PANDA Project Alpha and give feedback on how it’s doing.

Above, Brian Boyer describes the project’s latest developments.

Boyer is news applications editor at the Chicago Tribune, where he was the 2010 employee of the year. [Read more…]

Tool of the week: Playground, by PeopleBrowsr.

This post was first published on Journalism.co.uk

What is it? A social analytics platform which contains over 1,000 days of tweets (all 70 billion of them), Facebook activity and blog posts.

How is it of use to journalists? “Journalists can easily develop real-time insights into any story from Playground,” PeopleBrowsr UK CEO Andrew Grill explains.

Complex keyword searches can be divided by user influence, geolocation, sentiment, and virtual communities of people with shared interests and affinities.

These features – and many more – let reporters and researchers easily drill down to find the people and content driving the conversation on social networks on any subject.

Playground lets you use the data the way you want to use it. You can either export the graphs and tables that the site produces automatically or export the results in a CSV file to create your own visualisations, which could potentially make it the next favourite tool of data journalists.

Grill added:

The recent launch of our fully transparent Kred influencer platform will make it faster and easier for journalists to find key influencers in a particular community.

You can give Playground a try for the first 14 days before signing up for one of their subscriptions ($19 a month for students and journalists, $149 for organisations and companies).

Jodee Rich, the founder of PeopleBrowsr, gave an inspiring speech at the Strata Summit in September on how a TV ratings system such as Nielsen could soon be replaced by social media data thanks to the advanced online analytics that PeopleBrowsr offers.


Playground’s development is based on feedback from its community of users, which has been very responsive. Ideas can be sent to contact[@]peoplebrowsr.com or by tweeting@peoplebrowsr.

16 Nov 2011



This post is by Lucy Chambers, community coordinator at the Open Knowledge Foundation, and Friedrich Lindenberg, Developer on OpenSpending. They recently attended the Global Investigative Journalism Conference 2011 in Kyiv, Ukraine, and in this post, bring home their thoughts on journalist-programmer collaboration…

The conference

The Global Investigative Journalism Conference must be one of the most intense yet rewarding experiences either of us have attended since joining the OKF. With topics ranging from human trafficking to offshore companies, the meeting highlighted the importance of long-term, investigative reporting in great clarity.

With around 500 participants from all over the globe with plenty of experience in evidence gathering, we used this opportunity to ask many of them how platforms like OpenSpending can contribute, not only to the way in which data is presented, but also to how it is gathered and analyzed in the course of an investigation.

Spending Stories – the brainstorm

As many of you will be aware, earlier this year we won a Knight News Challenge award to help journalists contextualise and build narratives around spending data. Research for the project, Spending Stories, was one of the main reasons for our trip to Ukraine…

During the data clinic session as well as over drinks in the bar of “Hotel President” we asked the investigators what they would like to see in a spend analysis platform targeted at data journalists. Cutting to the chase, they immediately raised the key questions:


It was clear that the platform should support the existing journalistic workflow through publishing embargos, private datasets and note making. At the same time, the need for statistical and analytical heuristics to dissect the data, find outliers and visualize distributions was highlighted as a means to enable truly data-driven investigations of datasets. The goal in this is to distinguish anomalies from errors and patterns of corruption from policies.


With the data loaded and analyzed, the next question is what value can be added to published articles. Just like DocumentCloud enabled the easy embedding of source documents and excerpts, OpenSpending should allow journalists to visualize distributions of funds, embed search widgets and data links, as well as information about how the data was acquired and cleaned.


Many of those we spoke to were concerned about the complexity required to contribute data. The recurring question was: should I even try myself or hire help? It’s clear that for the platform to be accessible to journalists, a large variety of data cleansing tutorials, examples and tools need to be at their disposal.

We’ve listed the full brainstorm on the OpenSpending wiki

You can also see the mind map with concrete points below:

Hacks & Scrapers – How technical need data journalists be?

In a second session, “Data Camp” we went through the question of how to generate structured data from unstructured sources such as web pages and PDF documents. [Read more…]

15 Nov 2011

Nicola Hughes from ScraperWiki shared this video on Twitter recently and we thought it would be a shame not to share it with you too.

Experts in data mining gathered at the Paley Center for Media on 10 November 2011 to discuss the future of journalism and how to sustain a journalism watchdod in the digital age. This session is about data mining and the new tools available online.

Watch the video and let us know what you think. If you’ve used some of them, tell us how good -or how bad- you think they are…

Next Big Thing: New Tools for Digital Digging from The Paley Center For Media on FORA.tv

Presenters include:

Bill Allison

Bill Allison is the Editorial Director at the Sunlight Foundation. A veteran investigative journalist and editor for nonprofit media, Bill worked for the Center for Public Integrity for nine years, where he co-authored The Cheating of America with Charles Lewis, was senior editor of The Buying of the President 2000 and co-editor of the New York Times bestseller The Buying of the President 2004.

He edited projects on topics ranging from the role of international arms smugglers and private military companies in failing states around the world to the rise of section 527 organizations in American politics. Prior to joining the Center, Bill worked for eight years for The Philadelphia Inquirer — the last two as researcher for Pulitzer Prize winning reporters Donald L. Barlett and James B. Steele.


David Donald

David Donald, United States , is data editor at the Center for Public Integrity, where he oversees data analysis and computer-assisted reporting at the Washington-based investigative journalism nonprofit.


Sheila Krumholz

Sheila Krumholz is the Center for Responsive Politics’ executive director, serving as the organization’s chief administrator, the liaison to its board and major funders and its primary spokesperson.

Sheila became executive director in 2006, having served for eight years as the Center’s research director, supervising data analysis for OpenSecrets.org and the Center’s clients. She first joined the Center in 1989, serving as assistant editor of the very first edition of Open Secrets, the Center’s flagship publication.

In 2010, Fast Company magazine named Sheila to its “Most Influential Women in Technology” list. Sheila has a degree in international relations and political science from the University of Minnesota.

Jennifer 8. Lee

Jennifer 8. Lee authors The Fortune Cookie Chronicles ($24.99). Also, she’s a New York Times reporter.


Nadi Penjarla

Nadi Penjarla is the chief architect and designer of the Ujima Project. The Ujima Project (www.ujima-project.org) is a collection of databases, documents and other resources that aims to bring transparency to the workings of governments, multinational non-governmental organizations and business enterprises.

Nadi’s work demonstrates that data analysis provides unique insights into international and local political controversies and brings the facts of the world into sharper focus. He has spoken and conducted workshops on computer assisted reporting at international forums such as the ABRAJI Conference in Sao Paulo, Brazil, the GLMC Investigative Journalism Forum in Kigali, Rwanda, and at the Annual Investigative Reporters & Editors (IRE) Conference.

Nadi possesses a strong background in data analysis and data mining, including work as an investment banker, and a strategy and business analytics consultant. Past projects include consulting for Fortune 500 companies on how to improve strategic decision-making, enhance operations, conduct complementary marketing and transform related business processes by properly analyzing data and its implications. In 2003 Nadi was the founding editor of Global Tryst, an online magazine focusing on international issues from a grassroots perspective.

Nadi holds an MBA from the University of Chicago, an M.S in Engineering and Computer Science, and a B.S. in Engineering. He can be reached at 202-531-9300 or at nadi.penjarla@gmail.com

14 Nov 2011

This video is cross posted on DataDrivenJournalism.net, the Open Knowledge Foundation blog and on the Data Journalism Blog.

The Data Journalism Handbook is a project coordinated by the European Journalism Centre and the Open Knowledge Foundation, launched at the Mozilla Festival in London on 5 November 2011.

Journalists and experts in data gathered to create the first ever handbook to data journalism over a two-days challenge.

Read more about the Data Journalism Handbook in this article by Federica Cocco.

What data tool or great example of data journalism would you add to the handbook? Let’s make this comments section useful!

Every contribution, big or small, to the Data Journalism Handbook is very much appreciated. So use this space to give us links and examples to what you think should be included in the manual.

And if you feel more chatty, email us at editor@datajournalismblog.com

14 Nov 2011

By Federica Cocco

This article is cross posted on DataDrivenJournalism.net, the Open Knowledge Foundation blog and on the Data Journalism Blog.

Ravensbourne college is an ultramodern cubist design school which abuts the O2 arena on the Greenwich peninsula. It is perhaps an unusual and yet apt setting for journalists to meet.

Members of the Open Knowledge Foundation and the European Journalism Centre saw this as a perfect opportunity to herd a number of prominent journalists and developers who, fuelled by an unlimited supply of mocacchinos, started work on the first Data Journalism Handbook.

The occasion was the yearly Mozilla Festival, which acts as an incubator to many such gatherings. This year the focus was on media, freedom and the web.

The manual aims to address one crucial problem: “There are a lot of useful resources on the web,” Liliana Bounegru of the EJC said, “but they are all scattered in different places. So what we’re trying to do is put everything together and have a comprehensive step-by-step guide”.

In data journalism, most people are self-taught, and many find it hard to keep up-to-date with every tool produced by the industry. “It could be vital having a handbook that really explains to journalists how you can approach data journalism from scratch with no prior knowledge, ” says Caelainn Barr of the Bureau of Investigative Journalism
Friedrich Lindenberg of the OKF believes there is a real urgency in making newsrooms data-literate: “If journalists want to keep up with the information they need to learn coding, and some bits of data analysis and data-slicing techniques. That will make much better journalism and increase accountability.”

And who better than the New York Times’ Interactive Editor Aron Pilhofer, The Guardian Data Blog’s Simon Rogers and others to lead the ambitious efforts?
In charge of sorting the wheat from the chaff, around 40 people joined them in the sixth floor of the college, for a 48 hour session.

The first draft of the handbook should be ready in the coming months, as other contributions from every corner of the web are still working on making an input.
Of course the first data journalism handbook had to be open source. How else would it be able to age gracefully and be relevant in years to come?

Workshops of this sort represent a decisively different break from the past. Aspiring data journalists will know that hands-on sessions are a cut above the usual lectures featuring knowledgeable speakers and PowerPoint presentations. Discussing the topic and citing examples is not enough. After all, if you give a man a fish you have fed him for a day. But if you teach a man ho w to fish, you have him fed for a lifetime.

Jonathan Gray concurs: “Rather than just provide examples of things that have been done with data, we want to make it easier for journalists to understand what data is available, what tools they can use to work with data, how they can visualise data sets and how they can integrate that with the existing workflows of their news organisations.”

At the event itself, after a brief introduction, the crowd split into five groups and began collaborating on each chapter of the handbook. Some were there to instill knowledge, others were there to absorb and ask questions.

“I like the fact that everyone is bringing a different skillset to the table, and we’re all challenging each other”, one participant said.

Francis Irving, CEO of ScraperWiki, led the session on new methods of data acquisitions. He believes the collaboration between journalists, programmers, developers and designers, though crucial, can generate a culture clash: “When working with data, there’s a communication question, how do you convey what you need to someone more technical and how do they then use that to find it in a way that’s useful.”

“A project like this is quite necessary,” noted Pilhofer, “It’s kind of surprising someone hasn’t tried to do this until now.”

The free e-book will be downloadable from the European Journalism Centre’s DataDrivenJournalism.net/handbook in the coming months. If you want to follow our progress or contribute to the handbook you can get in touch via the data journalism mailing list, the Twitter hashtags #ddj and #ddjbook, or email bounegru@ejc.net.

Watch here the full video report from the Data Journalism Handbook session at the Mozilla Festival, 4-6 November in London.

The organisers would like to thank everyone who is contributing to the handbook for their input and to Kate Hudson for the beautiful graphics.

About the author: Federica Cocco is a freelance journalist and the former editor of Owni.eu, a data-driven investigative journalism site based in Paris. She has also worked with Wired, Channel 4 and the Guardian. 


01 Nov 2011

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

With the Mozilla Festival approaching fast, we’re getting really excited about getting stuck into drafting the Data Journalism Handbook, in a series of sessions run by the Open Knowledge Foundation and the European Journalism Centre.

As we blogged about last month, a group of leading data journalists, developers and others are meeting to kickstart work on the handbook, which will aim to get aspiring data journalists started with everything from finding and requesting data they need, using off the shelf tools for data analysis and visualisation, how to hunt for stories in big databases, how to use data to augment stories, and plenty more.

We’ve got a stellar line up of contributors confirmed, including:

Here’s a sneak preview of our draft table of contents:

  • Introduction
    • What is data journalism?
    • Why is it important?
    • How is it done?
    • Examples, case studies and interviews
      • Data powered stories
      • Data served with stories
      • Data driven applications
    • Making the case for data journalism
      • Measuring impact
      • Sustainability and business models
    • The purpose of this book
    • Add to this book
    • Share this book
  • Getting data
    • Where does data live?
      • Open data portals
      • Social data services
      • Research data
    • Asking for data
      • Freedom of Information laws
      • Helpful public servants
      • Open data initiatives
    • Getting your own data
      • Scraping data
      • Crowdsourcing data
      • Forms, spreadsheets and maps
  • Understanding data
    • Data literacy
    • Working with data
    • Tools for analysing data
    • Putting data into context
    • Annotating data
  • Delivering data
    • Knowing the law
    • Publishing data
    • Visualising data
    • Data driven applications
    • From datasets to stories
  • Appendix
    • Further resources

If you’re interested in contributing you can either:

  1. Come and find us at the Mozilla Festival in London this weekend!
  2. Contribute material virtually! You can pitch in your ideas via the public data-driven-journalismmailing list, via the #ddj hashtag on Twitter, or by sending an email to bounegru@ejc.net.

We hope to see you there!

05 Oct 2011

OJB – By Paul Bradshaw

I was very excited recently to read on the Scraperwiki mailing list that the website was working on making it possible to create an RSS feed from a SQL query.

Yes, that’s the sort of thing that gets me excited these days.

But before you reach for a blunt object to knock some sense into me, allow me to explain…

Scraperwiki has, until now, done very well at trying to make it easier to get hold of hard-to-reach data. It has done this in two ways: firstly by creating an environment which lowers the technical barrier to creating scrapers (these get hold of the data); and secondly by lowering the social barrier to creating scrapers (by hosting a space where journalists can ask developers for help in writing scrapers).

This move, however, does something different.

It allows you to ask questions – of any dataset on the site. Not only that, but it allows you to receive updates as those answers change. And those updates come in an RSS feed, which opens up all sorts of possibilities around automatically publishing those answers.

The blog post explaining the development already has a couple of examples of this in practice:

Anna, for example, has scraped data on alcohol licence applications. The new feature not only allows her to get a constant update of new applications in her RSS reader – but you could also customise that feed to tell you about licence applications on a particular street, or from a particular applicant, and so on. [Read more…]