Jonathan Stray's background is in computer science. He used to write computer software for Adobe. Then he started traveling and ended up in Hong Kong as a freelance reporter. "I stopped writing software, but discovered that there were many connections between computing and journalism," Jonathan recalls. For the last two years, Jonathan has made those connections by leading interactive stories and data journalism efforts at the news giant AP. His new venture is called The Overview Project. We interviewed him in the lead-up to the Data, Stories & Co event.
What is your definition of data journalism?
To start, my definition of data is: a collection of similar items. For me, data is not just databases or numbers. It can come in many different forms: historical records, computer files, information about the world. If that data is systematic, it can tell us something that individual reports cannot. Data journalism is about looking at broader patterns, not just individual occurrences. In one sentence, data journalism is finding, verifying, reporting on and publishing data in the public interest.
There is this idea that data is truth. But just like in any story, in data journalism, there are editorial choices such as what to highlight and what to leave out. The key is to make good choices so that the story is not deceptive, especially if it includes a visualization.
What is The Overview Project?
The Overview Project is building a tool to help journalists make sense of very large sets of text documents. There are search engines out there, but with thousands of pages of human language, you need a tool that produces a visual presentation of the content. It basically provides you with a visual table of contents. The goal is to find things you didn't know to look for.
For example, public corporations in the US – and I'm sure it's the same in many other countries – have to file disclosure documents when they buy stocks, do a public offering or engage in other financial transactions. This generates 10,000 new documents a day. If journalists want to keep track, in theory, they need to read all these disclosures. The Overview Project wants journalists to be able to find trends and to discover things they did not know about beforehand. That's particularly journalistic.
We have recently made bold steps on transparency, especially with freedom of information laws. We sometimes forget how new these laws are, in many countries only ten or twenty years old. Then the open data movement came along. The result is that more data is collected today than there used to be, and access to data has improved. Now, the basic problem I'm trying to solve is how journalists can analyze all this information. The first prototype will be available by the end of September.
Why is data journalism not everywhere by now?
Data-driven journalism is found in more and more media. One reason why it has not yet become mainstream is that it requires special skills that newsrooms have not been looking for. It's not just about writers and reporters anymore. We need to encourage people who like statistics, mathematics and software to join media outlets. Knowledge about data is an essential part of today's journalism. This doesn't mean everyone needs to be a data expert, but everyone in the newsroom needs to understand what data is and when it should be part of the story.
If you want to know more about Jonathan Stray's practice of data journalism, attend Data, Stories & Co. on September 6 in Montreal. Register now: http://media.mcgill.ca