Visweek 2011 is upon us!

VISUALIZATION BLOG

 

The annual IEEE Visualization, IEEE Information Visualization and IEEE Visual Analytics Science and Technology conferences – together known as IEEE Visweek – will be held in Providence, RI from October 23rd to October 28th. The detailed conference program is spectacular and can be downloaded here. Some of the new events this year fall under the Professional’s Compass category. These include a Blind Date Lunch (where one can meet a researcher they have never met and learn about each other’s research), Meet the Editors (where one can meet editors from the top graphics and visualization journals), a Lunch with the Leaders session (an opportunity to meet famous researchers in the field) and Meet the Faculty/Postdoc Candidates (geared especially towards individuals looking for a postdoctoral or faculty position). I think this is an excellent idea and hope that the event is a hit at the conference.

I am also eagerly looking forward to the two co-located symposia – IEEE Biological Data Visualization (popularly known as BioVis) and IEEE LDAV (Large Data Analysis and Visualization). Their excellent programs are out and I’d encourage you to take a look at them.

The tutorials this year look great and I am particularly looking forward to the one on Perception and Cognition for Visualization, Visual Data Analysis and Computer Graphics by Bernice Rogowitz. An outline of the tutorial can be found on her website. She was one of the first people to recommend that people STOP using the rainbow color map.
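Her point about the rainbow map is easy to demonstrate: a perceptually uniform colormap such as viridis increases steadily in lightness, while the rainbow (jet) map rises and falls, creating false visual boundaries in the data. A minimal sketch, using approximate hand-picked RGB samples of the two maps (the exact values would normally come from a plotting library such as matplotlib):

```python
# Relative luminance (Rec. 709) of a few control points along two colormaps.
# RGB samples are approximate, taken at evenly spaced positions t = 0 .. 1.
def luminance(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

jet = [(0.0, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 0.5),
       (1.0, 0.5, 0.0), (0.5, 0.0, 0.0)]
viridis = [(0.267, 0.005, 0.329), (0.230, 0.322, 0.546),
           (0.128, 0.567, 0.551), (0.369, 0.789, 0.383),
           (0.993, 0.906, 0.144)]

jet_lum = [luminance(c) for c in jet]
vir_lum = [luminance(c) for c in viridis]

def monotonic(xs):
    return all(a < b for a, b in zip(xs, xs[1:]))

print(monotonic(vir_lum))  # True: lightness increases steadily
print(monotonic(jet_lum))  # False: lightness rises then falls
```

The non-monotonic lightness of the rainbow map is exactly what makes viewers see structure in smooth data where none exists.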

The Telling Stories with Data workshop also looks great; it will be a continuation of the excellent tutorial held by the same group last year. I am eagerly looking forward to it.

The convergence of big data, baseball and pizza at Strata

SPLUNK BLOG – By Paul Wilke

Last week, I was fortunate enough to attend the Strata Big Data Conference in New York. With the conference spanning four days, two hotels, and over 400 attendees, one thing stood out… big data is a hot topic!

Splunk was featured in two sessions. On Tuesday, Splunk CIO Doug Harr was part of a panel discussion on the changing role of the CIO, where he and the panel (which included CIOs from Batchtags, Accenture and Revolution Analytics) pointed out that the CIO role is changing and expanding. The function has evolved into one of the most crucial positions in corporations focused on sustainable growth.

On Friday, Splunk Product Manager Jake Flomenberg took the stage with Denise Hemke from Salesforce.com to talk about gleaning new insights from massive amounts of machine data. Denise highlighted how at Salesforce a Chatter group is devoted to sharing ideas on how they work with Splunk so they can make the most of Splunk solutions. To highlight the usefulness of big data in a way that just about everyone could relate to, Jake showed how Splunk could be used to find the average price of pizza in New York City – definitely an example of using data for food, not evil!
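As a toy illustration of the kind of analysis Jake described (pulling a number out of unstructured machine data and aggregating it), here is a sketch in Python. The log lines and prices are invented; Splunk itself would do this with a search query and a field extraction rather than hand-written code:

```python
import re
from statistics import mean

# Invented "machine data": one log line per pizza-shop transaction.
logs = [
    '2011-09-22 order shop="Rays" item=pizza price=$2.75',
    '2011-09-22 order shop="Famous Original" item=pizza price=$2.50',
    '2011-09-23 order shop="Joes" item=calzone price=$6.00',
    '2011-09-23 order shop="Joes" item=pizza price=$3.00',
]

# Keep only pizza lines, extract the price field, then average.
prices = [float(m.group(1))
          for line in logs
          if "item=pizza" in line
          and (m := re.search(r"price=\$(\d+\.\d{2})", line))]

print(round(mean(prices), 2))  # 2.75
```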

Jake also gave a great interview at the conference, which you can see here:

[youtube RNGWPg27JVw]

Overall, a great crowd and very strong topics. One of my favorite sessions was current New York Mets executive Paul DePodesta talking about the big data behind Moneyball. It’s a shame the Mets aren’t taking it to heart this season. As the Splunk t-shirts we handed out at Strata say, “A petabyte of data is a terrible thing to waste”.

Read the original post on Splunk Blog here.

Data Driven Journalism: The Series Begins…

OKF – By Lucy Chambers

This post is by Lucy Chambers, Community Coordinator at the Open Knowledge Foundation. The post contains a link to a report on the OKF / EJC Data Driven Journalism workshop on EU Spending, which took place in Utrecht, the Netherlands, on 8th-9th September.

 

The report was written by Nicolas Kayser-Bril who attended the workshop, and may be helping to run the next in the series in Warsaw in October… stay tuned to the data-driven journalism mailing list for more on the upcoming workshops…


“Data journalism is hard, but that’s precisely what makes it worthwhile… Not every journalist has the skills, knowledge or the commitment to dig into the data…so the ones who do are at a massive advantage” – Chris Taggart [paraphrased], closing remarks


In the first of what we hope will become a series of data-driven journalism events, the European Journalism Centre and the OKF teamed up with a crack team of experts to help tackle some of the technical and research-based challenges facing the modern journalist.

 

I have no intention of re-inventing the wheel here by giving a full rundown; Nicolas sums up the workshop and gives his insightful ideas for future workshops in his report on the Data Driven Journalism Blog from the EJC far better than I would. You can read the full report here. But to whet your appetite, here is a snippet:

 

“As Friedrich Lindenberg was writing this abstruse code on his MacBook plugged on the beamer at the workshop on EU spending on 9 September, 20 journalists listened attentively as data started to speak before their eyes. In a conference room in Utrecht University’s 15th-century Faculty Club, the group from across Europe watched as Lindenberg compared a list of lobbying firms with the list of accredited experts at the European Commission: Any overlap would clearly suggest a conflict of interest.”

“More than watching, the audience actually followed in Lindenberg’s steps on Google Refine, an Excel-like tool, and was taming the data on their own laptops. At this point in time, more journalists were engaging in data-mining in Utrecht than in any other newsroom. This practical exercise was the climax of two days of learning to investigate the mountains of data produced by European institutions. Besides Lindenberg, the coder behind OpenSpending, EU datajournalist Caelainn Barr, OpenCorporates founder Chris Taggart and Erik Wesselius of Corporate Europe shared expertise with participants…”
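The core of the exercise Nicolas describes is a simple cross-reference: normalise two lists of organisation names and look for overlaps. A rough sketch of the idea in Python (the names and the normalisation rules are invented; Google Refine does this interactively, with much fuzzier matching):

```python
# Flag organisations appearing both as lobbying firms and as accredited
# experts. All names here are hypothetical.
def normalize(name):
    # Crude normalisation: lowercase, strip punctuation and legal suffixes.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ")
    for suffix in (" ltd", " gmbh", " sa", " bv"):
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
    return cleaned.strip()

lobbyists = ["Acme Consulting Ltd", "EuroPolicy GmbH", "Brussels Insight"]
experts = ["acme consulting", "Open Research Network", "europolicy"]

# Any overlap is a potential conflict of interest worth investigating.
overlap = {normalize(a) for a in lobbyists} & {normalize(b) for b in experts}
print(sorted(overlap))  # ['acme consulting', 'europolicy']
```

In practice the matching is the hard part: real organisation names vary in spelling, abbreviation and legal form, which is exactly why a tool with clustering support is so useful.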


The workshop clearly indicated that there is great demand among journalists for practical, skills-based workshops to help them reap maximum benefit from all the data that is available. One person even asked for a week-long version of the workshop, covering everything in more detail!

 

We’ll see about the week-long session, but if you are sorry to have missed the last short workshop, don’t despair: more workshops are coming soon!

 

Data-journalist? Data-wrangler? Tech geek? New to the field?

 

Will you be in or around Warsaw on 19th October?

 

We will be holding a one-day workshop in Warsaw in the run-up to Open Government Data Camp. The important thing to stress about this workshop is that we are looking for a good ratio of technical people (e.g. programmers and data wranglers) to journalists, so that we can split into smaller groups and really go into detail to get results, fast!

 

We will post more information about the workshop in the coming days, but places will be limited, so if you are keen (and organised), request an invitation by contacting us now.


Strata NY 2011 [Day 1]: The Human Scale of Big Data [VIDEO]


This post was written by Mimi Rojanasakul on Infosthetics.com. She is an artist and designer based in New York, currently pursuing her MFA in Communications Design at Pratt Institute. Say hello or follow her @mimiosity.

The 2011 Strata Conference in New York City kicked off on Thursday with a brief introduction by O’Reilly’s own Ed Dumbill. He ventures a bold assessment of the present social condition and how data science plays into it: our networks, our government, and our information feel as if they are slipping out of our control, evolving like a living organism. Despite this, Dumbill is optimistic, placing the hope of navigating this new “synthetic world” on the emerging role of the data scientist. And so he sets the stage for the speakers to follow.

The first keynote comes from Rachel Sterne, New York City’s first Chief Digital Officer and a fixture of the digital media world since her early twenties. Though there was some of the expected bureaucratic language, examples of what is being done with the city’s open data showed very real progress in making parts of government more accessible and allowing the public to engage more directly in their community. New York City is uniquely situated for a project of this nature, and its individual citizens are a key factor – densely packed in and cheerfully tagging, tweeting, and looking for someone to share their thoughts with (or perhaps gripe to). Through NYC Digital’s app-building competitions, hackathons, and more accessible web presence, New Yorkers are able to compose their own useful narratives and tools – from finding parking to spotting restaurants on the verge of closing from health-code violations. By the people and for the people — or at least an encouraging start.

strataNYCMap.jpg[ New York City evacuation zone map was shared with other parties to protect against heavy internet traffic taking down any individual site ]

On matters of a completely different spatial scale, we turn to Jon Jenkins of the SETI Institute, a Co-Investigator on NASA’s Kepler mission. The Kepler satellite, launched in March 2009, boasts a 95-megapixel camera that watches for tiny planets blocking part of a star’s light across more than 145,000 stars in its fixed gaze, snapping a photo every 30 minutes with bated breath for potential candidates. As of February 2011, over 1200 planetary candidates had been identified. Despite the cosmic scale of Kepler’s investigations, Jenkins communicates with a Carl-Sagan-like sense of wonder that is difficult not to get swept up in. Video renderings of distant solar-system fly-bys show worlds not unlike our own, a reminder that the motives for some of our greatest accomplishments come from an innate, irrepressible curiosity.
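For a sense of why Kepler’s photometry has to be so precise: the transit method measures the fractional dip in starlight when a planet crosses its star, which is roughly (R_planet / R_star)². Quick arithmetic for an Earth-size and a Jupiter-size planet around a Sun-like star:

```python
# Approximate radii in kilometres (standard astronomical values).
R_SUN_KM = 696_000
R_EARTH_KM = 6_371
R_JUPITER_KM = 69_911

def transit_depth(r_planet_km, r_star_km=R_SUN_KM):
    # Fraction of the star's disc covered by the planet during transit.
    return (r_planet_km / r_star_km) ** 2

print(f"{transit_depth(R_EARTH_KM):.6f}")    # ~0.000084, i.e. ~0.0084%
print(f"{transit_depth(R_JUPITER_KM):.6f}")  # ~0.01, i.e. ~1%
```

An Earth-size transit dims a Sun-like star by less than one part in ten thousand, which is why detecting small planets demands such steady, long-duration photometry.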

strataKeplerFOV.jpg[ Photo and graphic representation of Kepler’s field of vision ]
strataKeplerSuns.jpg[ Recently discovered planet with two suns ]

Amazon’s John Rauser begins his own talk with a different story about staring at the sky. It’s 1750, Germany, and Tobias Mayer is about to discover the libration (wobble) of the Moon. Rauser argues that it was Mayer’s combination of “engineering sense” and mathematical ability that allowed him to take the first baby steps toward establishing what we now know as data science. While an earlier presenter, Randy Lea of Teradata, focused mostly on the technological advancements made in the field of big data analytics, Rauser emphasized the human characteristics this career demands. Along with the more obvious need for programming fluency and applied math, he cites writing and communication as the first major difference between mediocrity and excellence, along with a strong, self-critical skepticism and passionate curiosity. These last three virtues could just as easily be transplanted into any other field, and judging from the applause and approving tweets, their relevance clearly struck a chord with the crowd.

From a design perspective, the obvious continuation of so many of these presentations was the successful visual communication of all this data. My aesthetic cravings immediately subside when Jer Thorp, current Data Artist in Residence at the New York Times, takes the stage. His presentation walks us through a commission to design an algorithm for Michael Arad’s 9/11 memorial that would place names according to the victims’ relationships to one another. Though clustering the 2900 names and 1400 adjacency requests was at first an issue of optimization-by-algorithm, manual typographic layout and human judgement were still necessary to achieve the aesthetic perfection needed. Thorp also made a great point that visualizations are not only an end product but also a valuable part of the creative process early on.
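One step of that process can be sketched simply: treat each adjacency request as an edge and merge names into connected clusters, for example with a union-find structure. The names and requests below are invented, and the real memorial algorithm also had to optimise the layout within and between clusters:

```python
# Group names that must be placed together, given pairwise adjacency
# requests. A hedged sketch using union-find with path compression.
def cluster(names, requests):
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in requests:          # merge the two names' clusters
        parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

names = ["Ann", "Ben", "Cara", "Dev", "Eli"]
requests = [("Ann", "Ben"), ("Ben", "Cara"), ("Dev", "Eli")]
print(cluster(names, requests))
# [['Ann', 'Ben', 'Cara'], ['Dev', 'Eli']]
```

The clustering is the easy half; as Thorp noted, placing those clusters into the physical bronze parapets still required human typographic judgement.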

strata911RelationshipViz.jpg[ Early visualization of density of relationships ]

 

[vimeo 23444105]

WTC Names Arrangement Tool from blprnt on Vimeo.

[ Processing tool built to arrange the name clusters by algorithm and by hand ]

 

To be honest, I was skeptical at first of the decision to cluster the names by association rather than simple alphabetization — an unnecessary gimmick, I thought, for what should be an uncomplicated, moving experience. Part of the power of the Vietnam Memorial is its expression of the enormous magnitude of human casualties with simple typography, while its logical organization provides a map and key for those purposefully looking for one name. But as Thorp explained these adjacencies in context, the beauty of the reasoning began to unfold. First, it reflects a new way of understanding: we do not browse, we search. And collecting and visualizing our identity based on our social networks has become second nature. The arrangement has the potential to tell stories about each individual’s life that go beyond individual experience, creating a physical and imagined space that extends this unifying connectivity.

Overall, it was a humanizing first experience with professional “big data.” Coming from a background in art and design, I had some apprehensions about my ability to understand the myriad technical disciplines represented at Strata. Despite this, the experience so far has been one of unexpected delights — a keenly curated look at where we are with data today.

I admit this first post was low on data visualizations, but there were plenty of interface and graphics talks in the afternoon sessions to share in the next posts. Stay tuned!

Strata Summit 2011: Generating Stories From Data [VIDEO]

As the world of data expands, new challenges arise. The complexity of some datasets can be overwhelming for journalists across the globe who “dig” for a story without specialised technical skills. Narrative Science’s Kristian Hammond addressed this challenge during last week’s Strata Summit in New York in a presentation about a software platform that helps write stories out of numbers…

[youtube P9hJJCOeIB4]


Strata Summit 2011: The US Government’s Big Data Opportunity [VIDEO]

So the Strata Summit happened last week and blew our data minds with new ideas and incredible speeches from the best people in the data world. One highlight we particularly liked was the conversation about the future of open government data in the US.

Here is a video where Aneesh Chopra, the US Federal Chief Technology Officer, deputy CTO Chris Vein, and Tim O’Reilly, founder and CEO of O’Reilly Media, discuss Obama’s latest visit to New York and the opportunities that big datasets could open up in the future…

[youtube 4wdkk9B7qec]

More info on the speakers (from O’Reilly website):

Photo of Aneesh Chopra

Aneesh Chopra

Federal Office of Science and Technology Policy

Chopra serves as the Federal Chief Technology Officer. In this role, Chopra promotes technological innovation to help the country meet its goals, from job creation, to reducing health-care costs, to protecting the homeland. Prior to his confirmation, he served as Virginia’s Secretary of Technology. He led the Commonwealth’s strategy to effectively leverage technology in government reform, to promote Virginia’s innovation agenda, and to foster technology-related economic development. Previously, he worked as Managing Director with the Advisory Board Company, leading the firm’s Financial Leadership Council and the Working Council for Health Plan Executives.

Photo of Chris Vein

Chris Vein

Office of Science and Technology Policy

Chris Vein is the Deputy U.S. Chief Technology Officer for Government Innovation in the White House Office of Science and Technology Policy. In this role, Chris searches for those with transformative ideas, convenes those inside and outside government to explore and test them, and catalyzes the results into a national action plan. Prior to joining the White House, Chris was the Chief Information Officer (CIO) for the City and County of San Francisco, where he led the City in becoming a national force in the application of new media platforms, use of open source applications, creation of new models for expanding digital inclusion, emphasizing “green” technology, and transforming government. This year, Chris was again named to the top 50 public sector CIOs by InformationWeek Magazine. He has been named to Government Technology Magazine’s Top 25: Dreamers, Doers, and Drivers and honored as the Community Broadband Visionary of the Year by the National Association of Telecommunications Officers and Advisors (NATOA). Chris is a sought-after commentator and speaker, quoted in a wide range of news sources from the Economist to Inc. Magazine. In past work lives, Chris has worked in the public sector at Science Applications International Corporation (SAIC), for the American Psychological Association, and, in a nonpolitical role, at the White House supporting three Presidents of the United States.

Photo of Tim O'Reilly

Tim O’Reilly

O’Reilly Media, Inc.

Tim O’Reilly is the founder and CEO of O’Reilly Media, Inc., thought by many to be the best computer book publisher in the world. O’Reilly Media also hosts conferences on technology topics, including the O’Reilly Open Source Convention, the Web 2.0 Summit, Strata: The Business of Data, and many others. O’Reilly’s Make: magazine and Maker Faire have been compared to the West Coast Computer Faire, which launched the personal computer revolution. Tim’s blog, O’Reilly Radar, “watches the alpha geeks” to determine emerging technology trends, and serves as a platform for advocacy about issues of importance to the technical community. Tim is also a partner at O’Reilly AlphaTech Ventures, O’Reilly’s early stage venture firm, and is on the board of Safari Books Online.

Data-Driven Journalism In A Box: what do you think needs to be in it?

The following post is from Liliana Bounegru (European Journalism Centre), Jonathan Gray (Open Knowledge Foundation), and Michelle Thorne (Mozilla), who are planning a Data-Driven Journalism in a Box session at the Mozilla Festival 2011, which we recently blogged about here. This is cross posted at DataDrivenJournalism.net and on the Mozilla Festival Blog.

We’re currently organising a session on Data-Driven Journalism in a Box at the Mozilla Festival 2011, and we want your input!

In particular:

  • What skills and tools are needed for data-driven journalism?
  • What is missing from existing tools and documentation?

If you’re interested in the idea, please come and say hello on our data-driven-journalism mailing list!

Following is a brief outline of our plans so far…

What is it?

The last decade has seen an explosion of publicly available data sources – from government databases, to data from NGOs and companies, to large collections of newsworthy documents. There is increasing pressure on journalists to be equipped with the tools and skills to bring value from these data sources to the newsroom and to their readers.

But where can you start? How do you know what tools are available, and what those tools are capable of? How can you harness external expertise to help to make sense of complex or esoteric data sources? How can you take data-driven journalism into your own hands and explore this promising, yet often daunting, new field?

A group of journalists, developers, and data geeks want to compile a Data-Driven Journalism In A Box, a user-friendly kit that includes the most essential tools and tips for working with data. What is needed to find, clean, sort, create, and visualize data — and ultimately produce a story out of data?

There are many tools and resources already out there, but we want to bring them together into one easy-to-use, neatly packaged kit, specifically catered to the needs of journalists and news organisations. We also want to draw attention to missing pieces and encourage sprints to fill in the gaps as well as tighten documentation.

What’s needed in the Box?

  • Introduction
    • What is data?
    • What is data-driven journalism?
    • Different approaches: Journalist coders vs. Teams of hacks & hackers vs. Geeks for hire
    • Investigative journalism vs. online eye candy
  • Understanding/interpreting data
    • Analysis: resources on statistics, university course material, etc. (OER)
    • Visualization tools & guidelines – Tufte 101, bubbles or graphs?
  • Acquiring data
    • Guide to data sources
    • Methods for collecting your own data
    • FOI / open data
    • Scraping
  • Working with data
    • Guide to tools for non-technical people
    • Cleaning
  • Publishing data
    • Rights clearance
    • How to publish data openly
    • Feedback loop on correcting, annotating, adding to data
    • How to integrate data story with existing content management systems

What bits are already out there?

What bits are missing?

  • Tools that are shaped to newsroom use
  • Guide to browser plugins
  • Guide to web-based tools

Opportunities with Data-Driven Journalism:

  • Reduce costs and time by building on existing data sources, tools, and expertise.
  • Harness external expertise more effectively
  • More trust in and accountability of journalistic outputs by publishing supporting data with stories. Towards a “scientific journalism” approach that appreciates transparent, empirically-backed sources.
  • News outlets can find their own story leads rather than relying on press releases
  • Increased autonomy when journalists can produce their own datasets
  • Local media can better shape and inform media campaigns. Information can be tailored to local audiences (hyperlocal journalism)
  • Increase traffic by making sense of complex stories with visuals.
  • Interactive data visualizations allow users to see the big picture & zoom in to find information relevant to them
  • Improved literacy. Better understanding of statistics, datasets, how data is obtained & presented.
  • Towards employable skills.