Data Journalism Blog (http://www.datajournalismblog.com): get the latest news on data-driven journalism with interviews, reviews and news features.

The New Data Journalism Blog is live (1 October 2015)


Welcome to our new home. As you can see, we’ve redecorated the place.

I am excited to share with you the project that kept us busy for the past few months.

The new DJB is bolder, savvier, smarter, and packed with insights from the world of data journalism and innovative storytelling.

We have a lot of new content lined up for you: articles, reviews, how-to guides and interviews with experts from the fields of data visualisation, programming and investigative reporting, as well as a few specials.

“Data” is a big buzzword, but it’s also a great way to tell stories we couldn’t tell before.

We hope to launch an array of compelling web projects in the near future that will inform our audience in an engaging way, while becoming the prime destination for knowledge on data journalism and innovative storytelling.

 

Hei-Da.org: a not-for-profit fostering data journalism and web innovation

So we have this great new look and lots of new content. But that’s not the only change that we’ve made. There’s more…

The DJB is now part of the Hei-Da social enterprise for data journalism and web innovation, and we are very excited about it. But what does it mean exactly?

Hei-Da is a not-for-profit organisation fostering the future of data journalism, open data and innovative storytelling.

Its mission is to nurture the future of the field by building an innovation hub dedicated to research in data journalism and web innovation, where experiments, training and conferences can take place, unlikely collaborations can blossom, and startups working on technologies related to data journalism can get advice and support.

We believe it is important that knowledge, skills and ideas get shared and reflected upon. We also think that news is not the only place for data storytelling skills to be used. Many NGOs, charities, local communities, governments and other organisations have data at hand that could tell compelling stories, yet they rarely have the time or expertise to produce them. Hei-Da was also created to help them harness that data and create interactive storytelling projects on the web that support their mission.

For this to happen, we will need to gather the partners, sponsors and funding necessary for such an ambitious project. If you think you can help, please get in touch.

 

The DJB at TechFugees

Today is the start of the TechFugees conference in London, an exciting, absolutely free and nonprofit event organised by TechCrunch editor-at-large Mike Butcher to find technology solutions to the refugee crisis.

The Data Journalism Blog supports this event and I will be talking at the conference about our initiative, how data journalism has been used to cover the refugee crisis, what challenges news organisations face to get data on the crisis and what technology solutions there could be to facilitate data gathering, publishing and storytelling on the ground.

We will be covering the conference on the Data Journalism Blog (you can already see an introductory post here) and Andrew Rininsland, senior developer at The Times and The Sunday Times, will tell us about his experience of the Techfugees Hackathon happening on Friday, October 2nd in London (if you want to join, tickets are still available here).

 

We’ve only just begun

The Data Journalism Blog is built for a global audience of journalists, designers, developers and other data enthusiasts: people, both experts and amateurs, who are interested in the emergence of open data and want to better understand how it could change the future of information, as well as people who simply like fancy infographics and want to find more data visualisations from various sources. Some of the content is quite specialised and assumes knowledge of data journalism; other parts are much broader and will suit more novice readers.

We will strive to push innovation as far as we can and experiment with new techniques ourselves, team up with partners to create compelling and interactive storytelling projects, and deliver news and insights from the industry here on the DJB. So sit back, let us know what you think and let’s enjoy the journey. This is only the beginning.

For more info on Hei-Da.org, go and check out the website.

I hope you enjoy the new look and would love to hear your views. Catch us on Facebook and Twitter.

Marianne is the founder and director of Hei-Da.org, a not-for-profit organisation based in London, UK, that specialises in open data driven projects and innovative storytelling. She also created the Data Journalism Blog back in 2011 and used to work as the Web Producer EMEA, Graphics and Data Journalism Editor for Bloomberg News.
Passionate about innovative storytelling, she teaches data journalism at the University of Westminster and the University of the Arts London.

TechFugees conference hits London (1 October 2015)

Today is the day of the TechFugees conference in London, an exciting, absolutely free and nonprofit event organised by TechCrunch editor-at-large Mike Butcher to find technology solutions to the refugee crisis.


 

“Moved by the plight of refugees in Europe, a number of technology industry people have formed a small voluntary team to create the free, non-profit, “Techfugees” conference and hackathon.” — Mike Butcher

In just a few weeks, the Techfugees Facebook Group and Twitter account have exploded. Over 700 people signed up to the event, proving there is clearly a huge desire amongst the tech community to get involved.
Tech engineers, entrepreneurs and startups, together with NGOs and other agencies, will gather at SkillsMatter HQ in London to address the crisis in ways that let the technology world bring its considerable firepower to bear.
Hei-Da and the Data Journalism Blog support this event and I will be talking at the conference about our initiative, how data journalism has been used to cover the refugee crisis, what challenges news organisations face to get data on the crisis and what technology solutions there could be to facilitate data gathering, publishing and storytelling on the ground.
Andrew Rininsland, senior developer at The Times and Sunday Times and a contributor to the DJB, will also tell us about his experience of the Techfugees Hackathon happening on Friday, October 2nd in London (tickets are still available here).

Marianne is the founder and director of Hei-Da.org, a not-for-profit organisation based in London, UK, that specialises in open data driven projects and innovative storytelling. She also created the Data Journalism Blog back in 2011 and used to work as the Web Producer EMEA, Graphics and Data Journalism Editor for Bloomberg News.
Passionate about innovative storytelling, she teaches data journalism at the University of Westminster and the University of the Arts London.

3 Golden Rules to #ddj — Ændrew Rininsland (1 October 2015)

1. Tell the reader what the data means

Tools like Tableau make it really easy to make exploratory visualisations, giving the user the ability to sift through the data and localise it to themselves. However, as tempting as this can be, the role of the data journalist is to tell the reader what the data means — if you have a dataset that covers the entire country but only a handful of locations are relevant to your story, an exploratory map isn’t the best approach. Aim for explanatory visualisations.

 

2. Simple is usually better

A quick glance through the examples page of d3js.org reveals a wealth of different and unusual ways to visualise data. While there are definitely occasions where an exotic visualisation method communicates the data more effectively than a simple line or pie chart, these are really rather rare. The Economist’s use of series charts to efficiently summarise an entire article in a tiny space demonstrates how effective the “classic” visualisation types are — there’s a reason they’ve stood the test of time (The Economist’s incredibly clear descriptions and simple writing style also really help here). Meanwhile, I don’t think I’ve ever gained any insights from a streamgraph, pretty as they are.

 

3. Code for quality

News moves really quickly, which can make it exceptionally difficult to prioritise quality over speed. Nevertheless, all aspects of your data visualisation need to work — a bug causing a minor element like a tooltip to not update, or to report the wrong data, can at best reduce reader confidence and at worst taint a long and costly investigation, possibly even leading to libel proceedings.

This is made all the more difficult by the fact that JavaScript is what’s referred to as a “weakly typed” language, meaning that variable types (strings, numbers, objects, et cetera) can mutate over the course of a script’s execution without throwing errors — for instance, `a + b` will either return the sum of `a` and `b` or the concatenation of those two variables (e.g., `'1' + '2'` gives `'12'`), depending on whether they’re numbers or strings to begin with. This can be incredibly difficult to discover and troubleshoot. Fortunately, projects like Flow and TypeScript add type annotations to JavaScript, effectively solving this problem (my recent open source project, generator-strong-d3, makes it really easy to scaffold a D3 project using either of these).

Another way to improve code quality is to provide automated tests, which are a bit more work at the outset but will prevent bugs from cropping up as you get frantic towards deadline. “Test-Driven Development” (TDD) is a good practice to get into, as it encourages you to write tests at the very beginning and then develop until those pass. Once you get the hang of it, it’s also a lot faster than writing tests later (or not at all, i.e., “cowboy coding”), because you save a lot of time by avoiding the “make a change, refresh, manually execute a behaviour, evaluate output, repeat” cycle.
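To make the weak-typing pitfall concrete, here is a minimal sketch in TypeScript; the function names and values are invented for illustration and aren’t taken from any of the projects mentioned above.

// Without type information this bug fails silently: if either argument arrives
// as a string (as values parsed from a CSV often do), + concatenates instead of adding.
function addTotals(a: any, b: any) {
  return a + b; // addTotals(100, '250') gives '100250', not 350
}

// A type annotation lets TypeScript (or Flow) reject the bad call at compile time,
// before a wrong number can reach a published chart.
function addTotalsTyped(a: number, b: number): number {
  return a + b;
}

const fromCsv = '250';
// addTotalsTyped(100, fromCsv);         // compile-time error: string is not assignable to number
addTotalsTyped(100, Number(fromCsv));    // convert explicitly, then the sum really is 350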

 


 


Ændrew Rininsland is a senior newsroom developer at The Times and Sunday Times and all-around data visualisation enthusiast. In addition to Axis, he’s the lead developer for Doctop.js, generator-strong-d3, Github.js and a ludicrous number of other projects. His work has also been featured by The Guardian, The Economist and the Hackney Citizen, and he recently contributed a chapter to Data Journalism: Mapping the Future?, edited by John Mair and Damian Radcliffe and published by Abramis. Follow him on Twitter and GitHub at @aendrew.

Building a data journalism tools library (1 October 2015)

I’ve been working in data journalism since 2012, and one of the biggest personal challenges I still face is balancing learning new tools, becoming more proficient with older ones, and not missing deadlines because I am spending too much time learning how to use data journalism tools.

When I started as a data journalism student, I began filling in a spreadsheet with links to inspiring tools I wanted to use and learn. I collected these from mailing lists, tweets, blogs and friends’ suggestions. At first, the spreadsheet was simply an ugly dump of links that I used as a student, then as a freelancer, then as a data journalist and data expert at Silk. A month ago I decided to turn it into something useful for other data journalists as well: an interactive and searchable database of data journalism tools. I knew that there were already many resources listing hundreds of (data) journalism tools. But all the ones I saw were lacking the data structure that would make it easy (and beautiful) to sift through the information.


Silk.co is a platform for publishing, visualizing and sharing data on the Web. I realized that this was also the best tool to publish my data journalism tools database.

On Silk I could:

  • quickly upload a spreadsheet to organize the information in an interactive database
  • visualize information about the tools, either as individual entries in galleries or tables or as a chart showing types of tools and other data
  • have individual profiles for each tool
  • generate inline filters that would let me find the tool I needed each time.

The project went live two weeks ago. You can find it at data-journalism-tools.silk.co.  I am regularly updating the Data Journalism Tools Silk, adding about 10 new tools every week. You can go to the website to check it out, or you can also “follow” it to receive free updates via email every time something new is added.


Just as this Data Journalism Tools Silk is intended for the community, it will greatly benefit from the community’s input. For this, I’ve made a Google Form so that anyone can suggest a favourite tool.

The key thing for me is that adding real structure to data adds tremendous power to whatever presentation vector you choose to deploy. There are blogs and lists that contain many, many more journalism tools than this one. But by adding structure to each tool and putting it onto its own structured Web page, we can unlock the power of the data as a filtering, visualization and discovery tool. More structured data equals more discovery.

 


 

Alice Corona is an Italian data journalist. She received an MA in data journalism in the Netherlands and is currently a data journalism lead at the data and web publishing platform Silk.co. There she regularly creates data-driven projects like “Through The Gender Lens: Analysis of 6,000 Movies”, “Playboy, Then and Now”, “Women at the International Film Festivals” and “Patents by the National Security Agency”. You can email her at alice@silk.co.

Developers at The Times create Axis (30 September 2015)

When we started our Red Box politics vertical at The Times, we needed the ability to quickly generate charts for the web in a style that was consistent with the site’s design. There had been a few attempts to build things like this; we considered using Quartz’s Chartbuilder project for quite some time, but ultimately felt its focus on static charts was a bit limiting. From this, Axis was born, which is both a customisable chart building web application and a framework for building brand new applications that generate interactive charts. It’s also totally open source, and free for anyone to use.


Axismaker (use.axisjs.org)

Design considerations

From the outset, we set a few broader project goals, which have persisted over the last year as we’ve developed Axis:

  1. Enable easy creation of charts via a simple interface
  2. Accept a wide array of data input methods
  3. Be modular enough to allow chart frameworks to easily be replaced
  4. Allow for straightforward customisation and styling
  5. Allow for easy integration into existing content management systems
  6. Allow journalists to easily create charts that are embeddable across a wide array of devices and media

At the moment, the only D3-based charting framework Axis supports is C3.js (which I’m also a co-maintainer of), though work is underway to provide adapters for NVD3 and Vega. This means Axis supports all the basic chart types (line, bar, area, stacked, pie, donut and gauge charts) and will gain new functionality as C3 evolves. Of course, once other charting libraries are integrated and adding new ones is more straightforward than it currently is, the sky’s the limit in terms of the types of charts Axis will be able to produce.

 

This is all possible because Axis isn’t so much a standalone webapp as a chart application framework. In order to achieve this level of modularity, Axis was built as an AngularJS app making extensive use of services and providers, meaning it’s relatively simple to swap around various components. As a nice side effect of this, it’s really easy to embed Axis in a wide variety of content management systems — at present, we’ve created a WordPress plugin that integrates really nicely with the media library and is currently one of the more feature-rich chart plugins out there for WordPress, plus a Drupal implementation is being developed by the Axis community. Integrating Axis into a new content management system is as difficult as extending the default export service — for instance, Axisbuilder is a Node-based server app that saves charts directly to a GitHub repo supplied by the user and is intended more for general public use, whereas Axis Server saves chart configurations into a MongoDB database and is intended more for news organisations who want to use it as a centralised internal tool. It can also be used entirely without a server component, depending on the needs of the organisation using it.

 


Main interface for Axis

 

Output is king

Charts are used universally by news organisations, whether that be in print, on the website or in a mobile app. As such, Axis was built to provide for a very wide variety of use cases — you can save Axis charts as a standalone embed code that can be pasted into a blog or forum post, Axis charts can be exported to a CMS, they can be saved as PNG for the web or SVG for print. In fact, print output is an important feature we’ve been developing recently so that chart output is ready to be placed in InDesign or Methode with little-or-no further tweaking. At the moment, basic charts for print at The Times and Sunday Times are produced by hand in Adobe Illustrator. The hope is that we can save our talented illustrators countless hours by dramatically reducing the time it takes to produce the large number of simple line graphs or pie charts needed for a single edition. The extensible configuration system means that customising the design of the output for a new section or title is as difficult as copying a config and CSS file and then customising to suit.

 

Proudly open source

Although Axis has been in development for just over a year, it’s really feature-rich — mainly as a result of working directly with journalists across The Times and Sunday Times to create the functionality they need. There are still a few sundry features we want to implement here and there, but ultimately the rest of version 1 will focus on stability and performance improvements. Version 2 — release date rather far into the future; we’re only in the pre-planning stages — will break away from this, with a restructuring of the internals, a redesign of the interface, and a whole boatload of new features.

Although we’ve built Axis with Times journalists in mind, we truly want it to grow as an open source project and welcome contributions both large and small (for example, we recently added i18n support, and are currently looking for translators to help internationalise the interface into different languages). Though designed to be powerful enough to support major news organisations, Axis is simple enough for anyone to use, and we particularly hope that student newspapers running WordPress will be encouraged to explore data journalism and visualisation using Axis.

For more about Axis and its related projects, please visit axisjs.org or follow us on Twitter at @axisjs. To try using Axis, visit use.axisjs.org.

 


 


 

Ændrew Rininsland is a senior newsroom developer at The Times and Sunday Times and all-around data visualisation enthusiast. In addition to Axis, he’s the lead developer for Doctop.js, generator-strong-d3, Github.js and a ludicrous number of other projects. His work has also been featured by The Guardian, The Economist and the Hackney Citizen, and he recently contributed a chapter to Data Journalism: Mapping the Future?, edited by John Mair and Damian Radcliffe and published by Abramis. Follow him on Twitter and GitHub at @aendrew.

Imminent Relaunch (4 May 2015)

 

Hint hint… Do I hear a relaunch is in the works? Yes indeed. A new, fresher, bolder and savvier DJB will go live soon, alongside exciting new projects. We thought it was about time to redecorate the place and give it a good upgrade.

The support we’ve had since the launch of the DJB in 2011 has been phenomenal. Even at times when new content wasn’t published frequently, the number of visitors and followers kept on growing. 2015 is a great and exciting year for journalism and the thirst for innovation in the newsroom has never been greater. So we’ve seriously thought about things and decided it was time to take action and relaunch the Data Journalism Blog. 

The new DJB will be bolder, savvier, smarter, and packed with new reviews, how-to guides and interviews about data journalism and innovative storytelling for the web. We have exciting projects coming up, including our own compelling data journalism content and collaborations.

The great relaunch will happen in the next few weeks and we look forward to telling you more about it soon, but in the meantime here is a glimpse at our brand new logo…

[The new DJB logo]

[Watch this space]

Semantic Web and what it means for data journalism (8 May 2012)

I’ve found myself increasingly interested in the semantic web in recent months, particularly in how it could be applied to data journalism. While the concept is still somewhat in its infancy, the potential it holds to quickly find data — and abstract it into a format usable by visualizations — is something that all data journalists should take note of.

Imagine the Internet as one big decentralized database, with important information explicitly tagged — instead of just a big collection of linked text files, organized on the larger document level, such as it currently is. In the foreseeable future, journalists wanting to answer a question will simply have to supply this database with a SQL-like query, instead of digging through a boatload of content or writing scrapers. Projects like Freebase and Wikipedia’s burgeoning “Datapedia” provide some clues as to the power of this notion — already, the semantic components of Wikipedia make it incredibly easy to answer a wide variety of questions in this manner.

Take, for example, the following bit of SPARQL, a commonly used semantic web query language:

SELECT ?country ?competitors WHERE {
?s foaf:page ?country .
?s rdf:type <http://dbpedia.org/ontology/OlympicResult> .
?s dbpprop:games "2012"^^<http://www.w3.org/2001/XMLSchema#integer> .
?s dbpprop:competitors ?competitors
} order by desc(?competitors)

If used on DBPedia (a dataset cloning Wikipedia that attempts to make its data usable as semantic web constructs), this fairly straight-forward 6-line query will return a JSON object listing all countries participating in the London 2012 Olympics and the number of athletes they’re sending. Go ahead — try pasting the above snippet into a DBpedia SPARQL query editor, such as the one at live.dbpedia.org/sparql. To accomplish a similar feat would take hours of scraping or data gathering. Because it can provide results in JSON, CSV, XML or whatever strikes your fancy, the output can then be supplied to some piece of visualization, whether that’s simply a table or something more complex like a bar chart.
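As a rough sketch of that last step, here is how the results could be pulled into a script as JSON over the standard SPARQL protocol and looped over row by row. It assumes DBpedia’s public endpoint and the usual query/format URL parameters, so treat it as a starting point rather than a recipe.

const endpoint = 'https://dbpedia.org/sparql';
const query = `
  SELECT ?country ?competitors WHERE {
    ?s foaf:page ?country .
    ?s rdf:type <http://dbpedia.org/ontology/OlympicResult> .
    ?s dbpprop:games "2012"^^<http://www.w3.org/2001/XMLSchema#integer> .
    ?s dbpprop:competitors ?competitors
  } order by desc(?competitors)`;

async function fetchCompetitors(): Promise<void> {
  // The SPARQL protocol accepts the query as a URL parameter; asking for
  // application/sparql-results+json returns one binding object per result row.
  const url = `${endpoint}?query=${encodeURIComponent(query)}&format=application/sparql-results+json`;
  const response = await fetch(url);
  const data = await response.json();
  for (const row of data.results.bindings) {
    console.log(row.country.value, row.competitors.value); // ready to hand to a table or chart
  }
}

fetchCompetitors();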

One nice thing about SPARQL is that a lot of the terminology becomes self-evident once you get the hang of it. For instance, if you want to find the properties of the OlympicResult ontology, you merely have to visit the URL in the rdf:type declaration. That will also link to other related ontologies and you can thus find the definitions you need to construct a successful query. For instance, try going to dbpedia.org/page/Canada_at_the_2012_Summer_Olympics, which is the page I used to derive most of the ontologies and properties for the above query. From that page, you learn that entities in the “olympic result” ontology are assigned a “dbpprop:games” property (i.e., the year of the games) and a “dbpprop:competitors” property (i.e., the number of competitors, a.k.a. your pay dirt).

Here’s another, more complex SPARQL query, taken from DBpedia’s documentation:

SELECT DISTINCT ?player {
?s foaf:page ?player.
?s rdf:type <http://dbpedia.org/ontology/SoccerPlayer> .
?s dbpedia2:position ?position .
?s <http://dbpedia.org/property/clubs> ?club .
?club <http://dbpedia.org/ontology/capacity> ?cap .
?s <http://dbpedia.org/ontology/birthPlace> ?place .
?place ?population ?pop.
OPTIONAL {?s <http://dbpedia.org/property/number> ?tricot.}
Filter (?population in (<http://dbpedia.org/property/populationEstimate>, <http://dbpedia.org/property/populationCensus>, <http://dbpedia.org/property/statPop>))
Filter (xsd:int(?pop) > 10000000) .
Filter (xsd:int(?cap) < 40000) .
Filter (?position = "Goalkeeper"@en || ?position = <http://dbpedia.org/resource/Goalkeeper_(association_football)> || ?position = <http://dbpedia.org/resource/Goalkeeper_(football)>)
} Limit 1000

This selects all pages describing a “player”, of type “SoccerPlayer”, with position “goalkeeper”, playing for a club with a stadium capacity of less than 40,000 and born in a country with a population of greater than 10 million. Producing such a list without semantic web would be mind-numbingly difficult and would require a very complex scraping routine.

Some limitations

That said, there are some limitations to this. The first is that the amount of well-structured semantic web data out there is limited — at least in comparison with non-semantic web data — though that is growing all the time. Wikipedia/DBpedia seems to be the most useful resource for this by far at the moment, though it’s worth noting that semantic web data from Wikipedia suffers from the same problems that all data from Wikipedia suffers from — namely, the fact that it’s edited by anonymous users. In other words, if something’s incorrect on Wikipedia, it’ll also be wrong in the semantic web resource. Another aspect of this is that Wikipedia data changes really quickly, which means that the official DBpedia endpoint becomes outdated really quickly. As a result, it’s often better to use live.dbpedia.org, which enables a continuous synchronization between Wikipedia and DBpedia.

The other thing you have to watch out for is data tampering. If your visualization is hooked up to a data source with little editorial oversight and the ability of users to edit data, the possibility always exists that one of those users will realize that the data set is hooked up to your live visualization on a newspaper website somewhere, and will thus try to tamper with the data in order to make it full of profanity or whatnot. As such, while semantic web data from DBpedia might be a good way of getting the initial result, saving that result as a static object within your script afterwards might be the safest course of action.

Data for dummies (8 May 2012)

After spending the past seven months studying data, I’ve learnt at least one important lesson: data journalism doesn’t have to be impossibly hard. While there are plenty of things about data journalism which will always go straight over my head, there are also a lot of easy techniques, formulas and programmes. So here are my top 5 data recommendations, for dummies.

1) PDF to Excel

Scraping doesn’t have to be hard. Don’t spend hours faffing about with Scraperwiki if there’s an easier alternative. There are plenty of scraping programmes such as PDF to Excel or GCal to Excel (which can extract data from Google Calendars).

They require absolutely no intelligence, and work like magic.

2) =ImportHTML

This nifty formula requires next to no brain power. It also happens to be one of the most useful formulae for pulling data from webpages into spreadsheets. Just type =IMPORTHTML("URL containing table", "table", table index) into a cell and it will suck out all the data from that webpage, placing it on your spreadsheet.

Use this formula in Google Docs and your spreadsheet will update automatically if/when the webpage changes.
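For example (the Wikipedia URL and the table index here are purely illustrative; point the formula at whichever page and table you actually need):

=IMPORTHTML("http://en.wikipedia.org/wiki/List_of_countries_by_population", "table", 1)

This pulls the first HTML table on that page straight into your sheet.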

3) VLookups

VLookups are God’s gift to Excel. If you have two sheets on a spreadsheet and they relate to one another, you can use a VLookup formula to hook them up.

The formula is as follows:

VLOOKUP(lookup_value,table_array,col_index_num,range_lookup)

If you’re not sure what all these words mean, have a look at the documentation.
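A quick illustrative example (the sheet name and cell references are made up): if Sheet2 holds a lookup table with IDs in column A and names in column C, then

=VLOOKUP(A2, Sheet2!A:C, 3, FALSE)

takes the ID in A2, finds it in the first column of the range Sheet2!A:C and returns the matching value from the third column; FALSE forces an exact match.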

4) CONCATENATE

You can use this to merge cells. So if, for example, you have data containing individuals’ names and the first and last names appear in different columns, you could use this formula to merge them back together.

E.g. if A2 contains ‘Joe’ and B2 contains ‘Blogs’, in cell C2 you can type this:

=CONCATENATE(A2, " ", B2)

5) Lastly, here’s the best website for finding KML files (the shape files needed to highlight areas on Google maps)

http://mapit.mysociety.org/ contains all the KMLs you could possibly need, as well as lat/long and postcode lookups. All very easy to find.


“I’m still waiting” – border control queues (7 May 2012)

You can’t have failed to encounter the brouhaha about protracted and maddening border control queues at UK airports. As we edge closer and closer to the Olympics, and an influx of tourists, the issue has been gathering pace.

Picture by Tomy Pelluz via Flickr

At its core is data; in fact the whole story hinges on how the various parties in the debate including Immigration Minister Damian Green, count waiting times. It’s the perfect story to dissect to its numerical nuts and bolts.

I caught Radio 4’s More or Less programme (a must-listen for data fiends) to try and make sense of the statistics.

And I should preface all this by saying that the target to process non-European passport holders is 45 minutes 95% of the time. It’s 25 minutes for European passport holders.

Wildly exaggerated stats?

Last week Green told Parliament that the information had been wildly exaggerated and, quoting “internal management information” (i.e. not official statistics), said the longest wait for non-EU nationals at Heathrow’s Terminal 5 was 90 minutes, although he neglected to mention what timescale this longest wait relates to.

CEO of the International Airlines Group Willie Walsh stated, in reaction, that Green himself was misinformed – and highlighted evidence of people queuing for over 2.5 hours.

It seems that Green’s figures came from the Border Agency but relate to a period last year and not to April. More or Less explained that the Border Agency collate data by choosing one person per hour from the back of the immigration queue, and then timing their progress.

The problem with this is that the influx of people isn’t steady; there are peaks and troughs in the numbers arriving at immigration. The Border Agency’s claim that 98% of non-EUs were channelled through checks in 45 minutes (last year – I have to point that out again) is based on biased data. It would be more accurate to choose one person per 1000 people passing through – rather than per hour – to measure its performance.
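A toy simulation, with completely made-up numbers, shows how big the gap between the two sampling methods can be; it simply assumes the wait in any given hour grows with the number of arrivals in that hour, which is all that’s needed to see the bias.

// Hypothetical arrivals per hour across the day, with two peaks.
const arrivalsPerHour = [200, 150, 100, 900, 1200, 1000, 300, 250, 800, 1100, 400, 200];
// Assume the wait (in minutes) in each hour grows with arrivals in that hour.
const waitMinutes = arrivalsPerHour.map(a => a / 20); // 1,200 arrivals -> a 60-minute wait

// Border Agency-style sampling: one passenger per hour, so every hour counts equally.
const perHourAverage = waitMinutes.reduce((sum, w) => sum + w, 0) / waitMinutes.length;

// Per-passenger sampling (roughly what "one person per 1,000" measures):
// each hour is weighted by how many people actually arrived during it.
const totalPassengers = arrivalsPerHour.reduce((sum, a) => sum + a, 0);
const perPassengerAverage =
  arrivalsPerHour.reduce((sum, a, i) => sum + a * waitMinutes[i], 0) / totalPassengers;

console.log(perHourAverage.toFixed(1));      // 27.5 minutes
console.log(perPassengerAverage.toFixed(1)); // 41.9 minutes: most passengers arrive at the peaks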

BAA’s data release

BAA’s more up-to-date but sketchy data on Heathrow was released shortly after the Green and Walsh bust-up and sheds new light on the debate.

It seems that BAA monitor queues in a pretty similar way to the Border Agency but select one person every 15 minutes, so there’s potential to be slightly more accurate.

BAA was also able to specify exactly when Walsh’s 2.5 hours stat relates to (17 April, since you ask). It also plucked a three-hour wait out of its statistical arsenal (30 April).

Anyway, nitpicking aside, Heathrow’s performance was pretty shoddy over April. And as More or Less points out, the targets are pretty lenient anyway – a 45-minute passage through immigration isn’t exactly speedy.

To listen to More or Less, click here

Data Visualisation vs. Text (6 May 2012)

Simon Rogers has mapped data which ranked 754 beaches around Great Britain for the Guardian Data Store. The visualisation uses a satellite map of the UK, onto which Simon has marked every beach in its correct geographical location. The dots are colour coded to clearly denote the ranking the beach received from the 2012 Good Beach Guide, green representing ‘Recommended’, purple meaning ‘Guideline’, yellow meaning ‘Basic’ and red indicating that the beach failed to reach the Beach Guide’s standards. Users can click on individual dots to get the names of each beach and its ranking.

In this way an enormous mass of information is presented in a small space. It is also presented in a clear and comprehensible way. Users can spend as long as they like ‘reading’ the map and obtain as much or as little information as they wish from it.

Underneath the map, Simon has written out all 754 beaches, with their rankings alongside. Because he has done so, we can easily compare using text to tell a data story with using a visualisation. The text takes up significantly more room. It is much harder to find the individual beaches you are interested in, and it takes more energy and effort to scroll up and down to locate a particular beach. The sheer mass of information presented in the text makes the story feel like a drag, rather than the fun exploration of the British coastline that the visualisation offers.

However, underneath the map Simon has highlighted key features and findings of the data. He writes: “The report rated 516 out of 754 (68%) UK bathing beaches as having excellent water quality – up 8% on last year. That compares well to 2010, when it rated 421 of 769 beaches as excellent.”

It is not clear from the visualisation alone how many beaches received each rating and it would have been time consuming and difficult for the user to individually count this. Thus text is useful to provide a summary and to highlight key findings alongside a visualisation.

This is therefore a fine example of the way in which visualisations and text complement each other, and demonstrates that, with many data stories, combining visualisation and text creates the richest, most comprehensible and informative narrative.

 

 
