The theme of last week’s CIJ Summer Conference 2016 was Data Journalism. It focused on the importance of data and documents in investigative journalism and shed light on various types of documents and data, whether accounts and statistics in an Excel sheet or highly classified and sensitive data about corruption and tax avoidance. Veteran data journalists also shared their personal experiences of how the process of accessing, collecting, analysing, scraping, coding and encrypting data has changed.
The first speaker was James B Steele, a contributing editor at Vanity Fair and the winner of more than 50 national journalism awards in the USA. Mr. Steele emphasized the importance of documents in investigative journalism and told the audience of more than 100 at Goldsmiths University that there are no set rules on where to look for documents, because every story is different and so the documents each story needs are different. Sharing some of his experiences, he told the audience how one post box address led him to a breaking story: the US Government had signed a contract with a very low-level house rebuilding company, with no experience in security or financial dealings, to oversee the transfer of $12 billion in cash to the Iraqi government.
Using examples of how he has worked since high school and the lessons he learned throughout his career, he gave the following tips to journalists:
– Documents are crucial in investigative journalism, but documents by no means replace the interviews. “You just need to observe the documents before conducting the interviews.”
– Avoid using anonymous sources in investigative journalism as much as you can.
– Use the documents in stories when people don’t want to talk to you.
– Beware of doctored documents.
– There are no set rules on where to look for documents. Start looking for documents wherever possible.
– Don’t trust any source. Examine any documents you receive from anyone.
– Write as you go along. Don’t wait to complete everything first, then begin the process of writing.
The second speaker of the day was The Intercept’s Editor, Betsy Reed.
Ms. Reed’s topic was The Drone Papers. She spoke about surveillance, whistleblowers and governments’ collection of data, which in most cases is unlawful. According to Ms. Reed, a whistleblower is often an insider who is disturbed about the practices [of the government that are not in accordance with the rules of the country].
Using PowerPoint slides, Ms. Reed illustrated to the audience how the US government should have operated when it came to drone attacks and how it was operating in reality. The slides showed that the government was skipping many steps and that in most cases the US president did not authorize the drone attacks.
The afternoon session of the conference was organized in small groups, giving journalists the opportunity to meet speakers and editors and discuss data-related matters.
In the afternoon session, the participants had lots of questions for a team of data editors on various topics, including where to look for data, how to convert various file types, what data sources to use and what tools to use for scraping large amounts of data.
Below are some of the useful tips the data editors shared with the participants.
Converting PDF to text/excel
Optical character recognition
Tabula – free
ABBYY FineReader – free
Documentcloud.org – free, needs registration, made for journalists
Adobe Acrobat – paid, Creative Cloud account needed
Cometdocs.com – free
NICAR – go to ire.org, join NICAR and get access to tip sheets, including Rob Gebeloff’s PDF tip sheet
Data sources – international
US lobby docs
USAspending.gov – all US federal spending
Googleguide.com – cheat sheet on advanced operators
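As a rough illustration of the advanced operators mentioned above, the short Python sketch below combines a search phrase with the `site:` and `filetype:` operators and turns the result into a search URL. The function name `build_query` and the example search terms are hypothetical and chosen only for illustration; they were not part of the session.

```python
from urllib.parse import quote_plus


def build_query(terms, site=None, filetype=None):
    """Combine search terms with optional site: and filetype: operators."""
    parts = [terms]
    if site:
        # Restrict results to a single domain.
        parts.append(f"site:{site}")
    if filetype:
        # Restrict results to one file type, for example pdf or xls.
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# Hypothetical example: PDF documents on one government site.
query = build_query('"federal spending"', site="usaspending.gov", filetype="pdf")

# URL-encode the query so it can be pasted into a browser.
url = "https://www.google.com/search?q=" + quote_plus(query)
print(query)
print(url)
```

The same query string can simply be typed into the search box; the code only makes the structure of the operators explicit.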
Find like-minded people through meetups
Beautiful Soup – Python
Helium Scraper – add-on to browser
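For readers who have not used Beautiful Soup, here is a minimal sketch of what scraping a table with it can look like. It assumes the `beautifulsoup4` package is installed, and the HTML, the table id `contracts` and the company names are invented sample data rather than anything shown at the conference. A real scraper would first download the page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded web page.
SAMPLE_HTML = """
<table id="contracts">
  <tr><th>Contractor</th><th>Amount</th></tr>
  <tr><td>Example Builders Ltd</td><td>1,200</td></tr>
  <tr><td>Sample Security Inc</td><td>3,400</td></tr>
</table>
"""


def extract_rows(html):
    """Return the data rows of the 'contracts' table as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="contracts")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each row comes back as a plain list of strings, which can then be written to a CSV file or loaded into a spreadsheet for analysis.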
The last session was on the Panama Papers, but it was held under the Chatham House Rule: no recording or social media.