If you’re familiar with Microsoft Excel or Access, you might like Fusion Tables. It’s a free tool that allows you to create interactive maps and charts with data. For journalists, this is fantastic. Fusion Tables unlocks the data stuck on your hard drive and lets you easily share it with readers in a compelling format. Check out some great examples at Matt Stiles’ blog, the Daily Viz.
But which stations rack up the most complaints, flunk the most inspections and cost consumers the most money?
The answers to those questions lurk within inspection data collected by state employees. The information is public. But like many government agencies, Weights and Measures hasn’t been analyzing its own data to look for trends that could help consumers make informed decisions.
So Express-News Data Editor Joe Yerardi downloaded a publicly available copy of the inspection data and took a look at it for himself.
The result was an interesting Sunday story that told readers things that state officials probably should have known themselves.
Joe learned that one out of five stations in San Antonio had at least one pump that failed inspections. And the pumps most likely to shortchange customers were owned by one of the biggest players in town: Valero Energy Corp.
Joe mapped the locations of the stations and their inspection results, so anyone can check out the track record of their neighborhood gas station.
Joe told me he spent nearly four weeks working on the story. One of the difficulties he faced was sharing what he learned with state officials, who hadn’t analyzed their own database of inspection reports.
“It’s not their job,” Joe said, describing the bureaucratic mentality of some government workers. “It’s not what they’re paid to do.”
Not every government agency is like that, but it’s not an uncommon problem. When I found a San Antonio police database that documented every vehicle pursuit involving officers, I was a bit surprised to learn that SAPD had never analyzed the information, even though it shed light on an important public policy issue.
These agencies probably paid some poor data-entry monkey to go through each paper report and type the details into a spreadsheet or database. Why not go the extra step and analyze that information?
Joe described these kinds of stories as “low-hanging fruit” for journalists, who can step in and analyze databases that agencies aren’t scrutinizing.
“If they would go above and beyond their actual jobs, there’d be less of a need for reporters,” he said.
One of the golden rules of writing is show, don’t tell. The same holds true for stories based on public data. Check out this cool interactive map by Nolan Hicks and Yang Wang showing food stamp recipients by ZIP code for the whole state of Texas.
Check out this amazing presentation at Google I/O 2011 about Google Fusion Tables. The whole video is worth watching. But for a journalist’s perspective on the importance of making data accessible to readers, skip to the 34:50 mark, where Simon Rogers of the Guardian’s Data Blog offers some interesting examples of how journalists can bring “data to life” with Fusion Tables, a free online tool.
Anyone who cares about journalism should read Al Tompkins’ post examining the innovative storytelling techniques that empowered the Las Vegas Sun series “Do No Harm,” a project by reporters Marshall Allen and Alex Richards. The reporters analyzed 2.9 million hospital records that revealed systemic, preventable errors in the local health care system. They found more than 300 patients who died from such mistakes in 2008 and 2009.
Rather than rely on anecdotal sob stories that would be dismissed as scare-mongering by hospitals, the reporters used reader-friendly multimedia presentations to make the data come alive and show, in a powerful way, the scope and human toll of the problem. Thanks to the project, Tompkins writes, six pieces of legislation have been filed in the Nevada Legislature to reform and bring more transparency to the hospital system.
The project took two years — an eternity in journalism time. But it still offers important lessons for journalists. We’re no longer chained to simply telling a story with an 80-inch news article and a few pictures and graphics. We can use the Internet to let readers look over our shoulders and check out the raw documentation and data and videos for themselves. One of the most creative things the Sun did was make it incredibly easy for readers to offer feedback:
When the stories started running, the paper’s phones rang off the hook. Rather than let the calls fall into the digital abyss, the team edited some and provided a sampling of the public’s reaction. They also posted reader reactions on the website, allowing people to share their personal experiences with Vegas-area hospitals.
Marshall Allen invited readers to share their stories using an easy online form.
Because of these storytelling techniques, the project was impossible to ignore. It could prompt change — and save lives.
But where are all these newcomers moving to within Bexar County?
Kelly Guckian, database manager for the San Antonio Express-News, pulled together more detailed population figures from the 2010 Census to help show where Bexar County is gaining residents — and where it’s losing them.
Kelly focused on census tracts, which are geographic boundaries set by the Census Bureau that encompass, on average, about 4,000 people. This allowed her to zoom in on population changes at the neighborhood level. She did the tedious work of compiling and mapping the data, and I helped export it into this interactive Google map. It shows how the far West and North sides of the county saw explosive gains in the blue areas, while many inner-city neighborhoods in the yellow areas lost residents. Kelly and graphic artist Mark Blackwell also produced maps showing the population trends broken down by race and ethnicity, and MySA’s Mike Howell put it all together in an interesting package online.
The explosive growth on the county’s outskirts occurred during a decade when city officials emphasized the importance of living near downtown and limiting urban sprawl. Our news story about the Census numbers explored why many people either didn’t hear the city’s message — or ignored it.
Until recently, I had no idea this DPS database existed. I stumbled across it a few months ago, when I was working on this article about pursuits in San Antonio. SAPD keeps a database packed with details about each chase — the weather and road conditions, the pursuit speeds and durations, the injuries and fatalities. Since SAPD had this data, I figured other law enforcement agencies in Texas probably kept similar records. I asked around and sure enough, DPS was one of the agencies that collects details about pursuits.
Why is that a big deal? Well, when you find a previously unknown database with information about an important public safety issue and analyze those digital records, you’ll probably discover fresh, interesting information for your readers. Public databases empower journalists to do their own research and find surprising answers.
Brandi asked for a copy of the data and we received it from DPS with little trouble. It was a big spreadsheet documenting nearly 5,000 pursuits from 2005 to July 2010.
One detail jumped out at us: Hidalgo County, by far, had the most pursuits over the past five years — 656. Several other border counties also ranked high, suggesting smugglers were often fleeing DPS troopers. The database told us all kinds of things about these pursuits — how often people were injured, how often motorists escaped, and how they got away.
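The kind of finding that surfaced Hidalgo County’s lead is just a count of pursuits per county. Here’s a minimal sketch of that analysis in Python, using hypothetical sample rows and column names (`county`, `outcome`) standing in for the real DPS spreadsheet, whose layout isn’t shown here:

```python
import csv
from collections import Counter
from io import StringIO

# Hypothetical sample rows standing in for the DPS pursuit spreadsheet;
# the real file's columns and values may differ.
SAMPLE = """county,outcome
Hidalgo,escaped
Hidalgo,arrested
Bexar,arrested
Hidalgo,escaped
Bexar,escaped
"""

def pursuits_by_county(csv_text):
    """Count pursuits per county, most frequent first."""
    rows = csv.DictReader(StringIO(csv_text))
    return Counter(row["county"] for row in rows).most_common()

print(pursuits_by_county(SAMPLE))
# [('Hidalgo', 3), ('Bexar', 2)]
```

The same one-liner pattern — group the records, count them, sort by frequency — answers most of the “which county had the most?” questions a spreadsheet like this invites.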
When reporters dive into data-heavy topics, it’s important to find the real people behind the numbers. Early in the reporting process, we asked DPS for a ride-along with a trooper in Hidalgo County. Brandi and photographer Callie Richmond visited McAllen and went on a ride-along with DPS Trooper Johnny Hernandez. Their experience became the lede of our story. Brandi had some great interviews with Hernandez and other troopers in Hidalgo County, who openly talked about their continual struggles to catch smugglers from Mexico. The visit provided rich material for photos and an awesome online video that Callie produced.
Brandi wrote a big chunk of the article on the drive back from McAllen. We finished writing and editing the story in a Google Document, which really beats sending e-mails back and forth and losing track of differing versions of the story. Google Docs lets you see what each collaborator is adding to the document as they write. It’s like the Big Brother version of Microsoft Word, but less evil. It’s a useful tool for collaborating with people, especially if they work in a different organization in a different city. Plus, Google gives you a chat window in the document, which is nice if you want to mock the typing skills of your colleagues.
There were some interesting reactions to the story. Scott Henson at Grits for Breakfast was surprised so many suspects got away: “I would not have guessed that the number of chases ending with the suspect successfully eluding troopers on foot would have been so high, nor that the proportion who stop and surrender would be so low.”
KXXV TV localized the story by looking at the high number of pursuits in McLennan County.
That’s the great thing about news stories based on public data — people can take the information you found, talk about it, and look at the data themselves.
Google unveiled a free tool for journalists who are interested in analyzing public data. Google Refine is a “power tool for working with messy data.” It helps import information and clean up data-entry problems that lurk in many government databases.
It’s open to everyone, but it looks like Google created this tool with an eye on computer-assisted reporting. Google’s introductory video touts “Dollars for Docs,” a data-driven story by ProPublica that showed how drug companies paid doctors to promote their products.
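To see what “cleaning up data-entry problems” looks like in practice, here’s a rough Python sketch of key-collision clustering, similar in spirit to one of Refine’s clustering methods: normalize each name into a fingerprint, then group entries whose fingerprints collide. The messy company names are made-up examples, not from any real database:

```python
import re
from collections import defaultdict

def fingerprint(name):
    """Reduce a name to a comparison key: lowercase,
    strip punctuation, sort the unique tokens."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(names):
    """Group variant spellings that share a fingerprint."""
    groups = defaultdict(list)
    for n in names:
        groups[fingerprint(n)].append(n)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical messy entries of the kind that lurk in government databases
messy = ["Valero Energy", "VALERO ENERGY", "valero energy.", "Exxon Mobil"]
print(cluster(messy))
# [['Valero Energy', 'VALERO ENERGY', 'valero energy.']]
```

Once the variants are grouped, you pick one canonical spelling per cluster — which is exactly the chore that makes counting records by company or agency name so error-prone by hand.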
Analyzing databases is a niche skill in newsrooms. Not all reporters are comfortable doing queries in Microsoft Access or sifting through thousands of computerized records, but those skills can really empower reporters who are trying to make sense of a complicated world. Columbia Journalism Review published a great profile of Daniel Gilbert, a reporter for the Bristol Herald Courier who came across a potential blockbuster of a story about unpaid royalties from mineral rights. But the issue was so complex he didn’t know how to unlock it.
His editor persuaded the newspaper’s publisher to pay for Gilbert to attend a database boot camp at Investigative Reporters and Editors, and Gilbert learned skills that helped him piece together the gas royalties puzzle. The result: “Underfoot, out of reach,” a series of stories that showed how millions of dollars owed to landowners had been tied up in an “opaque state-run escrow fund, where it has accumulated with scant oversight for nearly 20 years.” Gilbert won the Pulitzer Prize.
I haven’t played around with Google Refine yet, but I hope it encourages more journalists to take the plunge into computer-assisted reporting. There are some amazing, data-driven stories to be told out there. We just need more people to tell them.
(h/t: Jennifer Peebles)
Two refineries owned by oil giant BP account for 97 percent of all flagrant violations found in the refining industry by government safety inspectors over the past three years, a Center for Public Integrity analysis shows. Most of BP’s citations were classified as “egregious willful” by the Occupational Safety and Health Administration and reflect alleged violations of a rule designed to prevent catastrophic events at refineries.
This package is a nice example of computer-assisted reporting that sheds light on an important issue. Plus, the compelling findings are easy to share — the map of violations is embeddable, which gives an incentive for blogs to pick up the story.
The state agency oversees the amusement-ride industry. When a patron is seriously injured, the ride owner is supposed to report the injury to the department of insurance, and the information is typed into a database.
“Tedesco has put together a searchable database of injuries from our data,” department spokesman Jerry Hagins informed his colleagues in the July 2009 e-mail.
“Can he do this?????????” replied Richard Baker, a manager at the agency.
A question with nine question marks deserves an answer: Yes, we can do this. In fact, news organizations and blogs ought to do this.
Government agencies collect reams of data about important issues. When journalists find that data, analyze it, and share it with the public, we help readers make sense of a complicated world. That’s our mission. And that’s why news sites are publishing “data centers” with unique and useful information.