08 May 2012

After spending the past seven months studying data, I’ve learnt at least one important lesson: data journalism doesn’t have to be impossibly hard. While there are plenty of things about data journalism which will always go straight over my head – there are also a lot of easy techniques, formulas and programmes. So here are my top 5 data recommendations, for dummies.

1)      Pdf to Excel

Scraping doesn’t have to be hard. Don’t spend hours faffing about with Scraperwiki if there’s an easier alternative. There are plenty of scraping programmes such as Pdf to Excel or GCal to Excel (which can extract data from Google Calendars).

They require absolutely no intelligence, and work like magic.

2)      =ImportHTML

This nifty formula requires next to no brain power. It also happens to be one of the most useful formulae for extracting data from webpages o spreadsheets. Just type =importHTML(‘URL containing table’, ‘table’, table index) into a cell and it will suck out all the data from that webpage, placing it on your spreadsheet.

Use this formulae in Google Docs and your spreadsheet will update automatically if/when the webpage changes.

3)      VLookups

VLookups are God’s gift to Excel. If you have two sheets on a spreadsheet and they relate to one another, you can use a VLookup formula to hook them up.

The formula is as follows:


If you’re not sure what all these words mean, have a look at the documentation.


You can use this to merge cells. So if, for example you have data which contains individuals’ names – and the first and last name appeared in different columns,  you could use this formula to merge them back together.

EG. If A2 contains ‘Joe’ and B2 contains ‘Blogs’. In cell C2 you can type this:


5)      Lastly, here’s the best website for finding KML files (the shape files needed to highlight areas on Google maps)

http://mapit.mysociety.org/ contains all the KMLs you could possibly need, as well lat/long and postcode lookups. All very easy to find.