This is what the best of data journalism looks like

This article was originally published on the Data Journalism Awards Medium Publication managed by the Global Editors Network. You can find the original version right here.

________________________________________________________________________________________________

 

After a year of hard work, collecting and sifting through hundreds of data projects from around the world, the news is finally out. The thirteen winners (and one honourable mention) of the Data Journalism Awards 2018 competition were announced on 31 May in Lisbon. Together they are the best of what the world of data journalism had to offer in the past year. They also teach us a lot about the state of data journalism.

 

 

All of the work I have done over the past few months has given me a pretty good perspective on what’s going on in the world of data journalism. Managing the Data Journalism Awards competition is probably the best way to find out what everybody has been up to and to discover amazing projects from all over the world.

And today I want to share some of this with you! Most of the examples you will see in this article are projects that either won or got shortlisted for the Data Journalism Awards 2018 competition.

When a news organisation submits a project, they have to fill in a form describing their work: how they made it, what technology they used, what methodology they followed. All of this information is published on the website for everyone to see.

So if you‘re reading this article in the hope of finding some inspiration for your next project, as I am confident you are, then here is a good tip: on top of all of the examples I will show you here, you can take a look at all of the 630 projects from all over the world which were submitted this year, right on the competition website. You’re welcome.

So what have we learned this year by going through hundreds of data journalism projects from around the world? What are the trends we’ve spotted?

 

Data journalism is still spreading internationally

And this is great news. We see more and more projects from countries that have never entered before, which is a great indicator that journalists worldwide, regardless of their background, of how accessible data is in their country, or of how data literate they are, are trying to tell stories with data.

 

Some topics are more popular than others

One of the first things we look at when we get the list of projects each year is which topics people tackled. What we’ve learned from that is that some topics are more attractive than others.

Whether that’s because data on those topics is easier to find, because they lend themselves to visualisation, or because they are the big stories everyone expects to see covered with data each year, we can’t say for certain. It’s probably a mixture of all three.

 

 

The refugee crises

The first recurrent topic that we’ve seen this past year is the refugee crises. And a great example of that is this project by Reuters called ‘Life in the camps’, which won the award for Data visualisation of the year at the Data Journalism Awards 2018.

This graphic provided the first detailed look at the dire living conditions inside the Rohingya refugee camps in Cox’s Bazar. Using satellite imagery and data, the graphic documented the rapid expansion and lack of infrastructure in the largest camp cluster, Kutupalong. Makeshift toilets sit next to wells that are too shallow, contaminating the water supply.

This project incorporates data-driven graphics, photography and video. Reuters gained access to data from a group of aid agencies working together to document the location of infrastructure throughout the Kutupalong camp using handheld GPS devices on the ground. The graphics team recognised that parts of the data set could be used to analyse access to basic water and sanitation facilities. After some preliminary analysis, they could see that some areas had water pumps located too close to makeshift toilets, raising major health concerns.

They displayed this information in a narrative graphic format with each water pump and temporary latrine marked by a dot and overlaid on a diagram of the camp footprint. They compared these locations to the U.N.’s basic guidelines to illustrate the potential health risks. Reuters photographers then used these coordinates to visit specific sites and document real examples of latrines and water pumps in close proximity to each other.
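Reuters did its spatial work in QGIS; purely as an illustration of the underlying check, here is a minimal Python sketch that flags water pumps lying closer to a latrine than a chosen guideline distance. The coordinates, the threshold value and the overall structure are hypothetical, not Reuters’ code.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# Hypothetical GPS points of the kind collected by aid agencies in the camp.
pumps = [(21.2090, 92.1640), (21.2102, 92.1655)]
latrines = [(21.2091, 92.1641), (21.2150, 92.1700)]

GUIDELINE_M = 30  # placeholder: substitute the distance given in the relevant U.N. guidance

for pump in pumps:
    nearest = min(haversine_m(*pump, *latrine) for latrine in latrines)
    if nearest < GUIDELINE_M:
        print(f"Pump at {pump} is only {nearest:.0f} m from the nearest latrine")
```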

Technologies used for this project: HTML, CSS, Javascript, QGIS and Illustrator.

 

 

Elections/Politics

The next topic that came up a lot this year was politics, and more specifically anything related to recent elections, not just in the US but also in many other countries. One great example was the Data Journalism Awards 2018 ‘News data app of the year’ winner, ‘The Atlas of Redistricting’, by FiveThirtyEight in the US.

There’s a lot of complaining about gerrymandering (the process of manipulating the boundaries of an electoral constituency so as to favour one party or class) and its effects on US politics. But a fundamental question is often missing from the conversation: what should political boundaries look like? There are a number of possible approaches to drawing districts, and each involves tradeoffs. For this project, the team at FiveThirtyEight looked at seven different redistricting schemes, and to quantify their tradeoffs and evaluate their political implications, they actually redrew every congressional district in the US seven times. The Atlas of Redistricting allows readers to explore each of these approaches, both for the nation as a whole and for their home state.

The scope of this project really makes it unique. No other news organization covering gerrymandering has taken on a project of this size before.

To make it happen, they took precinct-level presidential election results from 2012 and 2016 and reallocated them to 2010 Census voting districts. That enabled them to add more up-to-date political data to a free online redistricting tool called Dave’s Redistricting App. Once the data was in the app, they started the long process of drawing and redrawing all the districts in the country. Then, they downloaded their district boundaries from the app, analysed their political, racial and geometric characteristics, and ultimately evaluated the tradeoffs of the different redistricting approaches. Sources for data included Ryne Rohla/Decision Desk HQ, U.S. Census Bureau, and Brian Olson.
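FiveThirtyEight’s own pipeline used Ruby and PostGIS; purely as an illustration of the reallocation step, here is a minimal Python sketch that distributes precinct-level votes to census voting districts using a crosswalk of overlap weights. The file and column names are assumptions, not FiveThirtyEight’s.

```python
import pandas as pd

# Hypothetical inputs: precinct-level vote totals and a crosswalk giving the
# share (0-1) of each precinct that falls inside each census voting district.
precincts = pd.read_csv("precinct_results.csv")    # precinct_id, dem_votes, rep_votes
crosswalk = pd.read_csv("precinct_to_vtd.csv")     # precinct_id, vtd_id, weight

merged = crosswalk.merge(precincts, on="precinct_id")
for col in ("dem_votes", "rep_votes"):
    merged[col] = merged[col] * merged["weight"]   # split each precinct's votes by overlap share

# Sum the weighted pieces to get estimated votes per voting district.
by_vtd = merged.groupby("vtd_id")[["dem_votes", "rep_votes"]].sum().round()
print(by_vtd.head())
```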

Technologies used for this project: Ruby, PostGIS, Dave’s Redistricting App, Node, D3

 

 

Another great example of how politics and elections were covered this year comes from the Financial Times. It is called ‘French election results: Macron’s victory in charts’ and was shortlisted for the Data Journalism Awards 2018 competition.

Let’s face it: elections are a must for all data news teams around the world. It is probably the topic where audiences are most used to seeing data combined with maps, graphics and analysis.

Throughout 2017 and 2018, the Financial Times became an expert in:

  • producing rapid-response overnight analyses of elections,
  • leveraging their data collection and visualisation skills to turn around insightful and visually striking reports on several elections across Europe,
  • responding faster than other news organisations, both those in the UK and those based in the countries where the elections took place.

Over and above simply providing the top-line results, they have focused on adding insight by identifying and explaining voting patterns, highlighting significant associations between the characteristics of people and places, and the political causes they support.

To deliver this, the team developed highly versatile skills in data scraping and cleaning. They also carried out ‘election rehearsals’: practice runs of election night to make sure their workflows for obtaining, cleaning and visualising data were polished and robust enough to withstand any glitches that might come up on the night of the count.

The work has demonstrably paid off, with readers from continental Europe outnumbering those from Britain and the United States — typically far larger audiences for the FT — for the data team’s analyses of the French, German and Italian elections.

For each election, the team identified official data sources at the most granular possible level, with the guidance of local academic experts and the FT’s network of correspondents.

R scripts were written in advance to scrape the electoral results services in real time and attach them to the static, pre-sourced demographic data.

Scraping and analysis were conducted primarily in R, with most final projection graphics created in D3, often adapting the Financial Times’ Visual Vocabulary library of data visualisation formats.
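The FT’s production scripts were written in R; the sketch below shows the same general pattern in Python: poll a results service on election night, then join the live figures to static, pre-sourced demographic data. The endpoint URL and column names are placeholders, not the FT’s.

```python
import pandas as pd
import requests

# Placeholder endpoint standing in for an official results service.
RESULTS_URL = "https://example.org/api/results.json"

# Static, pre-sourced demographic data keyed on an area code.
demographics = pd.read_csv("communes_demographics.csv")   # area_code, median_age, ...

resp = requests.get(RESULTS_URL, timeout=30)
resp.raise_for_status()
results = pd.DataFrame(resp.json()["results"])             # area_code, candidate, share

# Join live results to the demographics so voting patterns can be explored
# by the characteristics of people and places, as in the FT's analyses.
live = results.merge(demographics, on="area_code", how="left")
print(live.groupby("candidate")["share"].describe())
```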

Technologies used for this project: R, D3.

 

 

Crime

The last topic that I wanted to mention that was also recurrent this past year is crime. And to illustrate this, I’ve picked a project called ‘Deaths in custody’ by Malaysiakini in Malaysia.

This is an analysis of how deaths in police custody are reported, something that various teams around the world have been looking at recently. The team at Malaysiakini compared 15 years of official police statistics with data collected by a human rights organisation called Suaram, the sole and most comprehensive tracker of publicised deaths in police custody in the country.

The journalists behind this project found that deaths in Malaysian police custody are underreported overall, with only one in four deaths being reported in the media or to Suaram.
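As a rough illustration of that comparison (not Malaysiakini’s actual code), here is a minimal pandas sketch that joins official annual custody-death counts to a list of publicised cases and computes the share that ever became public. The file and column names are hypothetical.

```python
import pandas as pd

official = pd.read_csv("police_custody_deaths.csv")   # hypothetical columns: year, deaths
publicised = pd.read_csv("suaram_cases.csv")          # one row per publicised death, with a year column

reported = publicised.groupby("year").size().rename("publicised")
merged = official.set_index("year").join(reported).fillna(0)
merged["share_publicised"] = merged["publicised"] / merged["deaths"]

print(merged)
print("Overall share publicised:",
      round(merged["publicised"].sum() / merged["deaths"].sum(), 2))
```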

They also highlight the important role that victims’ families play in holding the police accountable and pushing for deaths to be investigated. To accompany the main article, they created an interactive news game and a guide on what to do if somebody is arrested, taking inspiration from The Uber Game that the Financial Times developed in 2017.

The game puts players in the shoes of a friend who is entangled in a custodial dilemma between a victim and the police. Along the way, there are fact boxes that teach players about their rights in custody. The real-life case that the game is based on is revealed at the end of the game.

Technologies used for this project: Tabula, OpenRefine, Google Sheets, HTML, CSS, Javascript, UI-Kit Framework, Adobe Photoshop.

 

We’ve changed the way we do maps

Another thing that we’ve learned by looking at all these data journalism projects is that we have changed the way we do maps.

Some newsrooms are really getting better at it. Maps are more interactive, more granular, prettier too, and integrated into a narrative rather than standing on their own, which suggests that more and more journalists are making maps not for the sake of it, but for good editorial reasons.

 

 

 

An example of how data journalists have made use of maps this past year is this piece by the BBC called ‘Is anything left of Mosul?’

It is a visually-led piece on the devastation caused to Mosul, Iraq, as a result of the battle to rid the city of Islamic State (IS). The piece not only gives people a full picture of the devastating scale of destruction, it also connects them to the real people who live in the city — essential when trying to tell stories from places people may not instantly relate to.

It was also designed mobile-first, giving users on small screens the full, in-depth experience. The feature uses the latest data from UNOSAT, allowing the BBC team to map in detail which buildings had suffered damage over time and to tell the narrative of the war through four maps.

The feature incorporates interactive sliders to show the contrast between life before the conflict and after, a way of giving the audience an element of control over the storytelling.

They also used the latest data from the UNHCR, which told them where and when displaced people in Iraq had fled to and from. They mapped this data using QGIS’s heatmap tool and visualised it with their in-house Google Maps Chrome extension. They produced three heatmaps of Mosul at different phases of the battle, again telling the narrative of how the fighting had shifted to residential targets as the war went on.
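The BBC produced its heatmaps with QGIS; for illustration only, here is a minimal Python sketch of the same idea, binning displacement points for one phase of the battle into a simple density grid. The input file and columns are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

points = pd.read_csv("idp_locations.csv")        # hypothetical columns: lon, lat, phase
phase1 = points[points["phase"] == 1]

# Bin the displacement points into a 100x100 grid to approximate a heatmap.
heat, xedges, yedges = np.histogram2d(phase1["lon"], phase1["lat"], bins=100)

plt.imshow(heat.T, origin="lower", cmap="hot",
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(label="points per cell")
plt.title("Displacement density, phase 1 (illustrative)")
plt.show()
```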

The project got nearly half a million page views in English over several days. The feature was also translated into 10 other languages for BBC World Service audiences around the world.

Technologies used for this project: QGIS mapping software, Microsoft Excel, Adobe Illustrator, HTML, CSS, Javascript, Planet satellite imagery, DigitalGlobe images

 

 

Another example of how the data journalism community has changed the way it does maps, is this interactive piece by the South China Morning Post called ‘China’s Belt and Road Initiative’.

The aim of this infographic is to provide context to the railway initiative linking China to the West.

They combined classic long-form storytelling with maps, graphs, diagrams of land elevations, infrastructure and risk-measurement charts, motion graphics, user interaction, and other media. This variety of techniques was selected to prevent the extensive data from feeling overwhelming. The split screen on the desktop version meant readers could refer to the route as they read the narrative.

We are not talking about boring static maps anymore. This is an example of how news teams around the world, and not just in Western countries, are aiming for more interactivity and a better user journey through data stories, even when the topic is complex. It is thanks to the interactivity of the piece and the diversity of elements put together that the experience becomes enticing.

They used data from the Economist Intelligence Unit (EIU). Using Google Earth, they plotted and traced the path of each route to obtain height profiles and elevations, in order to explain the extreme geographical environments and conditions along the way.

Technologies used for this project: Adobe Creative Suite (Illustrator, Photoshop…), QGIS, Brackets.io, Corel Painter, Microsoft Excel, Javascript, Canvas, jQuery, HTML, CSS/CSS3, JSON, CSV, SVG.

 

 

 

New innovative data storytelling practices have arrived

Another thing we saw was that data teams around the world are finding new ways to tell stories. New innovative storytelling practices have arrived and are being used more and more.

 

 

Machine learning

It is probably the most used term in current conversations about news innovation. It has also been used recently to help create data-driven projects, such as ‘Hidden Spy Planes’ by BuzzFeed News in the US, the winner of the JSK Fellowships award for innovation in data journalism at this year’s Data Journalism Awards.

This project revealed the activities of aircraft that their operators didn’t want to discuss, lifting the lid on a black box of covert aerial surveillance by agencies of the US government, the military and its contractors, and local law enforcement agencies.

Some of these spy planes employed sophisticated surveillance technologies including devices to locate and track cell phones and satellite phones, or survey Wi-Fi networks.

Before these stories came out, most Americans would have been unaware of the extent and sophistication of these operations. Without machine learning to identify aircraft engaged in aerial surveillance, the activities of many of the aircraft deploying these devices would have remained hidden.

In recent years, there has been much discussion about the potential of machine learning and artificial intelligence in journalism, largely centred on classifying and organising content within a CMS, or on fact-checking, for example.

There have been relatively few stories that have used machine learning as a core tool for reporting, which is why this project is an important landmark.
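As an illustration of the general approach (not BuzzFeed News’ own code, which was written in R, per the technologies listed below), here is a minimal Python sketch: train a classifier on flight-track features, using a handful of known surveillance aircraft as positive labels, then score the rest of the fleet. The feature file and column names are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature table: one row per aircraft, with summary statistics
# derived from its flight tracks. "label" is 1 for known surveillance
# aircraft, 0 for known ordinary aircraft, and empty for everything else.
flights = pd.read_csv("aircraft_features.csv")
features = ["mean_turn_rate", "mean_speed", "mean_altitude", "flight_hours"]

labelled = flights[flights["label"].notna()]
unlabelled = flights[flights["label"].isna()].copy()

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(labelled[features], labelled["label"].astype(int))

# Score every unlabelled aircraft by how much it resembles the known spy planes.
unlabelled["spy_score"] = model.predict_proba(unlabelled[features])[:, 1]
print(unlabelled.sort_values("spy_score", ascending=False).head(20))
```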

Technologies used for this project: R, RStudio, PostgreSQL, PostGIS, QGIS, OpenStreetMap

 

 

Drone journalism

Another innovative storytelling practice that we’ve noticed is drone journalism, and here is an example called ‘Roads to nowhere’ from The Guardian.

It is an investigation using drone technology, historical research and analysis, interviews, as well as photomosaic visualizations.

It was a project that specifically looked at infrastructure in the US and the root causes of how cities have been designed with segregation and separation as a fundamental principle. It shows, through a variety of means, how redlining and the interstate highway system were in part tools to disenfranchise African Americans.

People are still living with this segregation to this day.

Most of the photos and all of the videos were taken by drone in this project. This is innovative in that it is really the only way to truly appreciate some of the micro-scale planning decisions taken in urban communities throughout the US.

Technologies used for this project: DJI Mavic Pro drone, Canon 5D Mark III camera for the photos, Shorthand, Adobe Photoshop, and Knight Lab’s Juxtapose tool for the before-and-after sliders.

 

 

AR

Another innovative technique that has a lot of people talking at the moment is augmented reality, and to illustrate this in the context of data journalism, I am bringing you a project called Extrapol, by WeDoData in France.

Extrapol is an augmented reality app (iOS and Android) that was launched a month before the French presidential election of April 2017. Every day, the candidates’ official campaign posters could be turned into new, live data visualisations informing the audience about the candidates. The project covered around 30 data topics, such as the candidates’ travels around France during the campaign and the cumulative number of years each had held political office.

This is probably the first ephemeral, daily data journalism news app to use augmented reality. It was also the first time that real-life materials, the candidates’ official posters, were ‘hacked’ to deliver fact-based news about the politicians.

Technologies used for this project: Python, Javascript, HTML, CSS, PHP, jsFeat, TrackingWorker, Vuforia, GL Matrix, Open CV, Three.js, Adobe Illustrator, After Effect and Photoshop

 

 

Newsgames

Newsgames aren’t a new trend, but more and more newsrooms are playing with them. And this example, ‘The Uber Game’ by the Financial Times in the UK, has been a key player in the field this year, inspiring news teams around the world.

This game puts you into the shoes of a full-time Uber driver. Based on real reporting, including dozens of interviews with Uber drivers in San Francisco, it aims to convey an emotional understanding of what it is like to try to make a living in the gig economy.

It is an innovative attempt to present data reporting in a new, interactive format, and it was the FT’s third most-read piece by pageviews in 2017.

Roughly two-thirds of people who started the game finished it, even though doing so takes around 10 minutes and an average of 67 clicks.

Technologies used for this project: Ink to script the game, inkjs, anime.js, CSS, SCSS, NodeJS, Postgres database, Zeit Micro, Heroku 1X dynos, Standard-0 size Heroku Postgres database, Framer, Affinity Designer

 

 

Collaborations are still a big thing

And many organisations, in many regions of the world, have had a go at it.

Paradise Papers

Of course we have the Paradise Papers investigation, coordinated by the ICIJ with 380 journalists worldwide.

Based on a massive leak, it exposed the secret tax machinations of some of the world’s most powerful people and corporations. The project revealed the offshore interests and activities of more than 120 politicians and world leaders, including Queen Elizabeth II, and of 13 advisers, major donors and members of US President Donald J. Trump’s administration. It exposed the tax engineering of more than 100 multinational corporations, including Apple, Nike, Glencore and Allergan, and much more.

If you want to know more about how this was done, go to the Data Journalism Awards 2018 website where that information is published.

The leak, at 13.4 million records, was even bigger than the Panama Papers and technically even more complex to manage.

The records came from an array of sources across 19 secrecy jurisdictions. They included more than 110,000 files in database or spreadsheet formats (Excel, CSV and SQL). ICIJ’s data unit used reverse-engineering techniques to reconstruct corporate databases, scraping the records in the files and building a database of the companies and the individuals behind them.

The team then used ‘fuzzy matching’ techniques and other algorithms to compare the names of the people and companies in all these databases to lists of individuals and companies of interest, including prominent politicians and America’s 500 largest publicly traded corporations.
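For a sense of what that fuzzy matching step looks like, here is a minimal sketch using fuzzywuzzy, one of the Python libraries ICIJ lists below. The names and the threshold are illustrative, not drawn from the leak.

```python
from fuzzywuzzy import fuzz

# Illustrative names only; the real comparison ran leak entities against lists
# of politicians and large public companies.
leak_names = ["Appleby Global Holdings Ltd.", "Glencore Intl AG"]
watchlist = ["Glencore International AG", "Apple Inc."]

THRESHOLD = 80  # similarity score out of 100; tune against known matches

for leaked in leak_names:
    for target in watchlist:
        score = fuzz.token_sort_ratio(leaked, target)
        if score >= THRESHOLD:
            print(f"{leaked!r} looks like {target!r} (score {score})")
```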

 

Technologies used for this project:

  • For data extraction and analysis: Talend Open Studio for Big Data, SQL Server, PostgreSQL, Python (nltk, beautifulsoup, pandas, csvkit, fuzzywuzzy), Google Maps API, Open Street Maps API, Microsoft Excel, Tesseract, RapidMiner, Extract
  • For the collaborative platforms: Linkurious, Neo4j, Apache Solr, Apache Tika, Blacklight, Xemx, Oxwall, MySQL and Semaphor.
  • For the interactive products: JavaScript, Webpack, Node.js, D3.js, Vue.js, Leaflet.js and HTML.
  • For security and sources protection: GPG, VeraCrypt, Tor, Tails, Google Authenticator, SSL (client certificates) and OpenVPN.

 

 

 

Monitor da Violência

Now here is another collaborative project that you may not know of but which is also quite impressive. It is called ‘Monitor da Violência’, and it won the Microsoft award for public choice at this year’s Data Journalism Awards. It was produced by G1 in Brazil, in collaboration with the Center for the Study of Violence at the University of São Paulo (the largest university in Brazil) and the Brazilian Forum of Public Security (one of the most respected public security NGOs in Brazil).

This project is an unprecedented partnership to tackle violence in Brazil. To make it possible, G1 staff reporters all over Brazil kept track of violent deaths over the course of one week: cases of homicide, death during robbery, death by police intervention, and suicide, crimes that are generally soon forgotten. There were 1,195 deaths in this period, one every eight minutes on average.

All these stories were reported and written by more than 230 journalists spread throughout Brazil. It is a small sample compared with the roughly 60,000 homicides recorded in Brazil each year, but it paints a picture of the violence in the country.

The project aims to show the faces of the victims and to understand the causes of this epidemic of deaths. As a first step, a news piece was written for each one of the violent deaths. An interactive map, complete with search filters, showed the locations of the crimes as well as photos of the victims.

The second step was a collective and collaborative effort to find the names of unidentified victims. A campaign was launched online, on TV and on social media so that the public could help identify them.

A database was assembled from scratch, containing information such as each victim’s name, age, race and gender, as well as the day, time, weapon used and exact location of the crime.

Technologies used for this project: HTML, CSS, Javascript, Google Sheets, CARTO

 

 

 

 

Onwards and upwards for data journalism in 2018

The jury of the Data Journalism Awards, presided over by Paul Steiger, selected 13 winners (and one honorable mention) out of the 86 finalists for this year’s competition, and you can find the entire list, accompanied by comments from jury members, on the Data Journalism Awards website.

The insights I’ve listed in this article show that not only is the field ever-growing, it is also more impactful than ever, with many winning projects bringing about change in their countries.

Congratulations again to all of the winners and shortlisted teams, but also to all the journalists, news developers and NGOs pushing boundaries so that hard-to-reach data becomes engaging, impactful journalism for news audiences.


 

The competition, organised by the Global Editors Network, with support from the Google News Initiative, the John S. and James L. Knight Foundation, Microsoft, and in partnership with Chartbeat, received 630 submissions of the highest standards from 58 countries.

Now in its seventh year, the Data Journalism Awards competition was launched in 2012 and received close to 200 projects in its first edition. The first international award recognising outstanding work in the field of data journalism, it has grown over the years and received the highest number of submissions in its history in 2018.

 

 



Marianne Bouchart is the founder and director of HEI-DA, a nonprofit organisation promoting news innovation, the future of data journalism and open data. She runs data journalism programmes in various regions around the world as well as HEI-DA’s Sensor Journalism Toolkit project and manages the Data Journalism Awards competition.

Before launching HEI-DA, Marianne spent 10 years in London where she worked as a web producer, data journalism and graphics editor for Bloomberg News, amongst others. She created the Data Journalism Blog in 2011 and gives lectures at journalism schools, in the UK and in France.

 

Developers at The Times create Axis

When we started our Red Box politics vertical at The Times, we needed the ability to quickly generate charts for the web in a style that was consistent with the site’s design. There had been a few attempts to build things like this; we considered using Quartz’s Chartbuilder project for quite some time, but ultimately felt its focus on static charts was a bit limiting. From this, Axis was born, which is both a customisable chart building web application and a framework for building brand new applications that generate interactive charts. It’s also totally open source, and free for anyone to use.

Axismaker (use.axisjs.org)

Design considerations

From the outset, we set a few broader project goals, which have persisted over the last year as we’ve developed Axis:

  1. Enable easy creation of charts via a simple interface
  2. Accept a wide array of data input methods
  3. Be modular enough to allow chart frameworks to easily be replaced
  4. Allow for straightforward customisation and styling
  5. Allow for easy integration into existing content management systems
  6. Allow journalists to easily create charts that are embeddable across a wide array of devices and media

At the moment, the only D3-based charting framework Axis supports is C3.js (which I’m also a co-maintainer of), though work is underway to provide adapters for NVD3 and Vega. This means Axis supports all the basic chart types (line, bar, area, stacked, pie, donut and gauge charts) and will gain new functionality as C3 evolves. Of course, once other charting libraries are integrated and adding new ones is more straightforward than it currently is, the sky’s the limit in terms of the types of charts Axis will be able to produce.

 

This is all possible because Axis isn’t so much a standalone webapp as a chart application framework. To achieve this level of modularity, Axis was built as an AngularJS app making extensive use of services and providers, which means it’s relatively simple to swap out various components. As a nice side effect, it’s really easy to embed Axis in a wide variety of content management systems. At present, we’ve created a WordPress plugin that integrates nicely with the media library and is currently one of the more feature-rich chart plugins out there for WordPress, and a Drupal implementation is being developed by the Axis community. Integrating Axis into a new content management system is no more difficult than extending the default export service. For instance, Axisbuilder is a Node-based server app that saves charts directly to a GitHub repo supplied by the user and is intended more for general public use, whereas Axis Server saves chart configurations into a MongoDB database and is intended more for news organisations who want to use it as a centralised internal tool. Axis can also be used entirely without a server component, depending on the needs of the organisation using it.

 

Main interface for Axis

 

Output is king

Charts are used universally by news organisations, whether in print, on the website or in a mobile app. As such, Axis was built to serve a very wide variety of use cases: charts can be saved as a standalone embed code that can be pasted into a blog or forum post, exported to a CMS, or saved as PNG for the web or SVG for print. In fact, print output is an important feature we’ve been developing recently, so that chart output is ready to be placed in InDesign or Methode with little or no further tweaking. At the moment, basic charts for print at The Times and Sunday Times are produced by hand in Adobe Illustrator. The hope is that we can save our talented illustrators countless hours by dramatically reducing the time it takes to produce the large number of simple line graphs or pie charts needed for a single edition. The extensible configuration system means that customising the design of the output for a new section or title is no more difficult than copying a config and CSS file and then customising them to suit.

 

Proudly open source

Although Axis has been in development for just over a year, it’s really feature-rich — mainly as a result of working directly with journalists across The Times and Sunday Times to create the functionality they need. There are still a few sundry features we want to implement here and there, but ultimately the rest of version 1 will focus on stability and performance improvements. Version 2 — release date rather far into the future; we’re only in the pre-planning stages — will break away from this, with a restructuring of the internals, a redesign of the interface, and a whole boatload of new features.

Although we’ve built Axis with Times journalists in mind, we truly want it to grow as an open source project and welcome contributions both large and small (for example, we recently added i18n support, and are currently looking for translators to help internationalise the interface into different languages). Though designed to be powerful enough to support major news organisations, Axis is simple enough for anyone to use, and we particularly hope that student newspapers running WordPress will be encouraged to explore data journalism and visualisation using Axis.

For more about Axis and its related projects, please visit axisjs.org or follow us on Twitter at @axisjs. To try using Axis, visit use.axisjs.org.

 


 


 

Ændrew Rininsland is a senior newsroom developer at The Times and Sunday Times and all-around data visualisation enthusiast. In addition to Axis, he’s the lead developer for Doctop.js, generator-strong-d3, Github.js and a ludicrous number of other projects. His work has also been featured by The Guardian, The Economist and the Hackney Citizen, and he recently contributed a chapter to Data Journalism: Mapping the Future?, edited by John Mair and Damian Radcliffe and published by Abramis. Follow him on Twitter and GitHub at @aendrew.

Data Visualisation vs. Text

Simon Rogers has mapped data ranking 754 beaches around Great Britain for the Guardian Data Store. The visualisation uses a satellite map of the UK, onto which Simon has marked every beach in its correct geographical location. The dots are colour coded to denote the ranking each beach received from the 2012 Good Beach Guide: green representing ‘Recommended’, purple ‘Guideline’, yellow ‘Basic’ and red indicating that the beach failed to reach the Beach Guide’s standards. Users can click on individual dots to get the name of each beach and its ranking.

In this way an enormous mass of information is presented in a small space, and in a clear and comprehensible way. Users can spend as long as they like ‘reading’ the map and obtain as much or as little information as they wish from it.

Underneath the map, Simon has written out all 754 beaches with their rankings alongside them. Because he has done so, we can easily compare the use of text to tell a data story with the use of a visualisation. The text takes up significantly more room. It is much harder to find the individual beaches you are interested in, and it takes more energy and effort to scroll up and down to locate a particular beach. The sheer mass of information presented in the text makes the story feel like a drag, rather than the fun exploration of the British coastline that the visualisation offers.

However, underneath the map Simon has highlighted key features and findings of the data. He writes: “The report rated 516 out of 754 (68%) UK bathing beaches as having excellent water quality – up 8% on last year. That compares well to 2010, when it rated 421 of 769 beaches as excellent.”

It is not clear from the visualisation alone how many beaches received each rating and it would have been time consuming and difficult for the user to individually count this. Thus text is useful to provide a summary and to highlight key findings alongside a visualisation.

This is therefore a fine example of the way in which visualisations and text complement each other, and demonstrates that, for many data stories, combining visualisation and text creates the richest, most comprehensible and informative narrative.

 

 

Data visualisation and Art

Data is often seen as being as far removed from art as anything can be: a way of obtaining facts, information and statistics, it is technological rather than personal and beautiful.

Or is it?

Many designers and artists have straddled the line between art and information and used infographics and visualisations to create something that is not only relevant, but beautiful.

Bryan Christie, a New York-based designer who regularly produces visualisations for the New York Times, is doing just that. Christie has used a combination of medical textbooks and MRI scans to reproduce the human hand in a virtual 3D space. The image is both scientific and beautiful.

http://ngm.nationalgeographic.com/2012/05/hands/zimmer-text

Christie said: “The medium I work in is a new form of photography; it is both sculptural and photographic. I model the figures in digital 3D on the computer then use a virtual camera within the computer to take a picture of the piece. There’s an interesting process that occurs in that my work is sculptural and exists in virtual three-dimensional space yet in the end it is viewed in two dimensions much like a photograph.”

Another example of the crossover between data, art and popular culture is Radiohead’s music video for House of Cards, from their album In Rainbows. The video was created using data visualisations by Aaron Koblin. It uses 3D plotting technologies to collect information about the shapes and relative distances of objects, including lead singer Thom Yorke’s face, and then visualises the data. The result is eerily beautiful and surprisingly human, with the fragile nature of the lyrics and Yorke’s ethereal vocals perfectly complemented by the ghostly appearance of his face and the disconnected nature of the overall visual image.

http://www.youtube.com/watch?v=8nTFjVm9sTQ

Artists, too, have recognised the potential of data in art. In 2008 the Museum of Modern Art featured an exhibition by Harris and Kamvar entitled I Want You to Want Me, which made use of data visualisations to present the pieces. The exhibition, which included a fully interactive 56-inch touchscreen installation, chronicled the world’s long-term relationship with romance, gathering data from a variety of online dating sites to give viewers an insight into people’s personal lives. I Want You to Want Me was a beautiful collaboration of computer science, maths and art that used data to evoke viewers’ emotions at a very personal level. The piece also raised questions about the virtual nature of modern relationships conducted through dating sites.

http://iwantyoutowantme.org/statement.html

Data visualisations are thus not only an informative way to present a narrative, but can also be a beautiful one. In certain circumstances they can even be considered art.

 

 

Why Data Journalism is Important

After studying Data Journalism for a year at City University I have come to appreciate the importance of having the skillset to make the most out of numbers and statistics. Many aspiring journalists still see data as something that is separate from journalism, and as something that does not interest them. In response, I have compiled some reasons why data is increasingly important:

1. Make sense of Mass Information

Having the skills to scrape, analyse, clean and present data allows journalists to present complicated and otherwise incomprehensible information in a clear way. It is an essential part of journalism to find material and present it to the public. Understanding data allows journalists to do this with large amounts of information, which would otherwise be impossible to understand.

2. New Approaches to Storytelling

Able to create infographics and visualisations, data journalists can see and present information in a new and interesting way. Stories no longer need to be linear and based solely on text. Data can be grafted into a narrative which people can read visually. Interactive elements of data visualisations allow people to explore the information presented and make sense of it in their own way.

3. Data Journalism is the Future

Understanding data now will put journalists ahead of the game. Information is increasingly being sourced and presented using data. Journalists who refuse to adapt to the modern, increasingly technological world will be unable to get the best stories, by-lines and scoops and their careers will suffer as a result.

4. Save Time

No longer must journalists pore over spreadsheets and numbers for hours when there could be a simpler way to organise the information. Being technologically savvy and knowing which skills to apply to data sets can save journalists time when cleaning, organising and making sense of data. Not making mistakes due to a lack of knowledge also saves a journalist time.

5. A way to see things you might otherwise not see

Understanding large data sets can allow journalists to see significant information that they might otherwise have overlooked. Equally, some stories are best told using data visualisations as this enables people to see things that they might otherwise have been unable to understand.

6. A way to tell richer stories

Combining traditional methods of storytelling with data visualisations, infographics, video or photographs, creates richer, more interesting and detailed stories.

7. Data is an essential part of Journalism

Many journalists do not see data as a specialist and separate area of journalism, but an interwoven, essential and important element of it. It is not there to replace traditional methods of finding information, but to enhance them. The journalist that can combine a good contact book and an understanding of data will be invaluable in the future.

INTERACTIVE MAP + How-to: overcrowding in London

 

This heat map, produced with Google Fusion Tables, presents the levels of overcrowding in London as of 2010. The data was published by the London Datastore in March 2012 and gives both the base figure (the number of overcrowded households) and that number as a percentage of the whole borough. The information mapped here is the percentage figure; the map is interactive, however, and clicking on each borough will give you the base figure as well.

Overcrowding for the purposes of this data set is defined by the ‘bedroom standard’, explained on the London Datastore website:

‘Bedroom standard’ is used as an indicator of occupation density. A standard number of bedrooms is allocated to each household in accordance with its age/sex/marital status composition and the relationship of the members to one another. A separate bedroom is allocated to each married or cohabiting couple, any other person aged 21 or over, each pair of adolescents aged 10 – 20 of the same sex, and each pair of children under 10. Any unpaired person aged 10 – 20 is paired, if possible with a child under 10 of the same sex, or, if that is not possible, he or she is given a separate bedroom, as is any unpaired child under 10. This standard is then compared with the actual number of bedrooms (including bed-sitters) available for the sole use of the household, and differences are tabulated. Bedrooms converted to other uses are not counted as available unless they have been denoted as bedrooms by the informants; bedrooms not actually in use are counted unless uninhabitable.

Overcrowding figures include households with at least one bedroom too few.

The map clearly shows that the borough of Newham has the worst overcrowding, with 17.9 per cent of households in the borough overcrowded, a figure of 506 households. However, the borough with the highest base figure was Enfield, with 670 households overcrowded, 7.1 per cent of the borough.

In July 2011, the housing charity Shelter announced that its analysis of the English Housing Survey revealed that 391,000 children in London (24 per cent) suffered from overcrowding, which it said was an 18 per cent rise since 2008. Unfortunately, the Datastore data is not broken down by the characteristics of the people affected, but it is clear from the figures that overcrowding in London is a growing problem.

The Fusion map was created by importing data from a spreadsheet into Google Fusion Tables and then merging this data with KML shape files for London to get the heat map effect. Each borough has an identification code, which my original data did not have, so I had to input this manually for each borough in my original data set before merging the two files. This provides a point of reference for the merge, so that there are two identical pieces of information for each row to match up on.

I then chose the column I wanted the map to represent, in this case the percentage data. To get the map to show a range of colours related to the data, I set up what are called “buckets”: each bucket is the range of numbers represented by one colour. I then modified the colours using a tool called ColorBrewer, which lets you customise the colour ranges used in heat maps. Finally, I modified the information windows for each borough to show exactly the information I wanted, in the style I wanted, which takes a small amount of HTML know-how.
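For readers who would rather script the same merge-and-bucket idea, here is a minimal sketch using Python’s geopandas rather than Fusion Tables. The file names, column names and bucket boundaries are all hypothetical.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical inputs: borough boundaries with a GSS_CODE field, and a CSV of
# percentages keyed on the same code (the "point of reference" for the merge).
boroughs = gpd.read_file("london_boroughs.geojson")
figures = pd.read_csv("overcrowding.csv")           # columns: GSS_CODE, pct_overcrowded

merged = boroughs.merge(figures, on="GSS_CODE")

# "Buckets": assign each borough to a colour band by its percentage value.
bins = [0, 5, 10, 15, 20]                            # illustrative boundaries
merged["bucket"] = pd.cut(merged["pct_overcrowded"], bins=bins).astype(str)

merged.plot(column="bucket", cmap="OrRd", legend=True, edgecolor="grey")
plt.title("Overcrowded households in London, % by borough")
plt.axis("off")
plt.show()
```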

I hope you enjoy this data map, and are inspired to create your own maps using Google Fusion in the future.

 

Data and the London Mayoral elections – let the data help you decide

The Guardian’s Data Store have compiled a catalogue of all the data to do with London, to give readers an objective and overall view of what is happening in the capital.
Polly Curtis’ reality check on 12 April pits Boris Johnson and Ken Livingstone against each other and asks, “Who is right on police numbers?” She then outlines the Met police figures for London, highlighting what each party has claimed. Polly’s verdict:

Johnson’s campaign is correct in claiming that police officer numbers have risen over his term, albeit only by 2.4% if you take the baseline to be the March 2008, closest to when he was elected in May. But Labour is right that since 2009, the last year that Livingstone budgeted for, numbers have fallen overall by 1.17%.

An interactive map of the number of cyclist and pedestrian road casualties offers an insight into road traffic accident hotspots across London. The map, produced by ITO World for the campaign group See Me, Save Me, gathers together a decade of road casualty data from the Department for Transport’s Stats19 dataset. It distinguishes between pedestrians and cyclists and divides them between those who were hit by heavy goods vehicles (HGVs) and those who weren’t. It also shows the age of the pedestrian or cyclist and the year of the accident. The standfirst of the map asks, “With transport a key issue in the mayoral election, what patterns does this map show?”

In addition to a map of poverty and deprivation in London and the data behind health, education and crime, the Data Store has also created a map showing how London voted in the last mayoral election, to give an overview of the Conservative and Labour areas across the capital.

Looking into data and presenting it as maps and infographics can show the bigger context behind a story and make figures more accessible. The Guardian Data Store’s compilation of London data allows readers to get an objective view of their city before deciding who to vote for this coming Thursday, 3 May. Take a look at the data for yourself at http://www.guardian.co.uk/uk/series/london-the-data before voting begins in a couple of days!

Mapping election data for North Wales

This is how I, Andrew Stuart, did the data analysis behind the 2012 interactive election map on the website of the Daily Post, a regional daily newspaper in North Wales. I used Google Docs and Excel to work with the data we got hold of.

How the story appeared in the newspaper, with what we found through the data.

As a British citizen, I know that getting information for council elections is pretty difficult. How do you vote? Yes, you can vote along party lines, but they are generally dictated by national policy, wherever that may be. Generally, for local council elections, you have to wait for the information to drop through the letter box, or have a story about them.

However, local councils really are where the stuff we see and use on a day-to-day basis gets decided: rubbish collections, inspecting where we go to eat, repairing the roads, street lighting, and planning. So the people who decide this policy are important, and knowing what they’re for, against, or couldn’t give two hoots about matters.

Sadly, writing individual feature pieces on 243 wards, with over 753 residents putting their names forward, for a regional paper covering 6 counties (5 of which are to have elections) is next to impossible. I say next to, because nothing is impossible.

So, when I was at the Daily Post, we decided to use the web to show who was standing where. That way, they are a quick Google search or a reference away to find out more about them. This is what we came up with:

The election map. Click the image to go to the Fusion Table.

So, how did we do it?

First, you need to gather data. This sounds easier than it is. Some councils had a nice list of each statement of nomination that you could scroll through. Some had a decent Word doc for reference. Some had the nomination papers saved as individual PDF files scattered across the website. Some split the lists across three different areas of the council because the county is so big! And none of them used the same format.

So, we had to type them out. Not the best way, but the only way. These are new candidates, and the data was not online in any sort of format I could import into Google Docs. Claire Miller at WalesOnline had to do the same thing, for every council in Wales bar the five I did. I do not envy her job.

I typed all the names for a ward into one cell, in the format “Name Middle name Surname (party), etc”. The comma is important. I saved three files: the online version, the reference version, and a raw backup.
Using a uniform way of typing means I can parse easily at the comma, as the sketch below shows. It also allows the file to be shared around different journalists, so they can cover their patches and get the story behind the story. The single-cell version for online works in the info box.
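Here is a minimal Python sketch of that comma-parsing step, splitting each ward’s single cell into (name, party) pairs. The file name and column names are hypothetical, and the regular expression assumes the party always sits in brackets at the end of each entry.

```python
import csv
import re

def parse_candidates(cell: str):
    """Split 'Jane Ann Jones (Labour), John Smith (Plaid Cymru)' into (name, party) pairs."""
    candidates = []
    for entry in cell.split(","):
        match = re.match(r"\s*(?P<name>.+?)\s*\((?P<party>[^)]+)\)\s*$", entry)
        if match:
            candidates.append((match.group("name"), match.group("party")))
    return candidates

# Hypothetical reference file: one row per ward, all candidates in one cell.
with open("wards.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):            # assumes columns "ward" and "candidates"
        for name, party in parse_candidates(row["candidates"]):
            print(row["ward"], name, party)
```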

The next bit was to make the map work. For this, I need the KML files. There is an easy way of doing this using ScraperWiki. That would bring all the children of each County Council into a file. What I did, however, was to save each file from mapit.mysociety.org (not that strenuous), then create individual county shapefiles in Google Earth. I then have individual maps, and joining all the wards together allows me to create a whole North Wales map.

Then, merge the two tables (the one with all the prospective councillors’ details and the one with the shape files) into Google Fusion Tables, and check for errors. The one error which did flag up was Mostyn: there is a Mostyn ward in both Conwy and Flintshire. The way around it? Type “Mostyn, Conwy” and “Mostyn, Flintshire”. It worked.

All you need to do then is to colour the shapefiles by county. To do this, I put the HTML colour codes in a column on the councillor list, and selected that column as the one for the colours for the polygons, and you have coloured counties.

And to get around the fact that Anglesey wasn’t holding elections? In the Anglesey cells, I typed “no election”. The info box then shows “no election”.

That’s how you inform 243 wards of who’s standing where, in one fell swoop, and may I say so, quite beautifully too.

This was originally posted on andrewwgstuart.com. Trinity Mirror own copyright for the cuttings used in this piece. Andrew Stuart created the map. 

Bringing a news story alive with data

A batch of stats may ostensibly seem dry, but I was having a ponder about the benefits of using data to bring depth to a topical news story. And as if by magic, one cropped up that could definitely do with some spreadsheet-based wizardry to soup it up.

The Human Fertilisation and Embryology Authority (HFEA), which will soon be absorbed by the Care Quality Commission, is launching a new drive for donors.