Posts Tagged ‘Computer-Assisted Reporting’

How to create maps and charts with Google Fusion Tables

Monday, October 15th, 2012

The friendly folks at the Association of Health Care Journalists held a conference last week in San Antonio, and they invited me to give an introductory presentation on Google Fusion Tables.

If you’re familiar with Microsoft Excel or Access, you might like Fusion Tables. It’s a free tool that lets you create interactive maps and charts from your data. For journalists, this is fantastic. Fusion Tables unlocks the data stuck on your hard drive and lets you easily share it with readers in a compelling format. Check out some great examples at Matt Stiles’ blog, the Daily Viz.
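Getting data into Fusion Tables is usually as simple as uploading a spreadsheet or CSV file. One prep step that pays off: if your addresses are scattered across several columns, merge them into a single location column that Fusion Tables can geocode. Here’s a minimal Python sketch of that step; the file and column names are made up for illustration:

```python
import csv

# A minimal sketch: combine scattered address fields into one column
# that Fusion Tables can geocode. File and column names are hypothetical.
with open("stations_raw.csv", newline="") as src, \
     open("stations_for_fusion.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["Name", "Address"])
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "Name": row["station_name"].strip(),
            # A single combined column geocodes more reliably than
            # separate street, city and ZIP fields.
            "Address": "{}, {}, TX {}".format(
                row["street"].strip(), row["city"].strip(), row["zip"].strip()),
        })
```

Upload the cleaned file, tell Fusion Tables the Address column is a location, and it can plot the rows on a map for you.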

If you’re interested in learning more, check out this slideshow for a step-by-step tutorial about some of the basics.

Nickel and dimed: Find out which gas stations have faulty pumps that overcharge motorists

Sunday, June 3rd, 2012

Valero Station in San Antonio

If you’ve ever suspected your neighborhood gas station is stiffing you at the pump, you might already know you can file a complaint with the Weights and Measures Program at the Texas Department of Agriculture. The agency’s inspectors verify the accuracy of gas pumps.

But which stations rack up the most complaints, flunk the most inspections and cost consumers the most money?

The answers to those questions lurk within inspection data collected by state employees. The information is public. But like many government agencies, Weights and Measures hasn’t been analyzing its own data to look for trends that could help consumers make informed decisions.

So Express-News Data Editor Joe Yerardi downloaded a publicly available copy of the inspection data and took a look at it for himself.

The result was an interesting Sunday story that told readers things that state officials probably should have known themselves.

Joe learned that one out of five stations in San Antonio had at least one pump that failed inspections. And the pumps most likely to shortchange customers were owned by one of the biggest players in town: Valero Energy Corp.
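Joe’s exact steps aren’t described here, but the gist of that kind of summary is straightforward. Here’s a rough Python sketch using pandas, with hypothetical file and column names standing in for the real inspection data:

```python
import pandas as pd

# A rough sketch of the summary described above; the file and
# column names are hypothetical stand-ins for the real data.
df = pd.read_csv("pump_inspections.csv")

# What share of stations had at least one pump that failed?
failed = df[df["result"] == "FAIL"]
print(f"{failed['station_id'].nunique() / df['station_id'].nunique():.0%} "
      "of stations had at least one failed pump inspection")

# Which owners racked up the most failed inspections?
print(failed.groupby("owner").size().sort_values(ascending=False).head(10))
```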

Joe mapped the locations of the stations and their inspection results, so anyone can check out the track record of their neighborhood gas station.

Joe told me the story took nearly four weeks of work. One of the difficulties he faced was sharing what he learned with state officials, who hadn’t analyzed their own database of inspection reports.

“It’s not their job,” Joe said, describing the bureaucratic mentality of some government workers. “It’s not what they’re paid to do.”

Not every government agency is like that, but it’s not an uncommon problem. When I found a San Antonio police database that documented every vehicle pursuit involving officers, I was a bit surprised to learn that SAPD had never analyzed the information, even though it shed light on an important public policy issue.

These agencies probably paid some poor data-entry monkey to go through each paper report and type the details into a spreadsheet or database. Why not go the extra step and analyze that information?

Joe described these kinds of stories as “low-hanging fruit” for journalists, who can step in and analyze databases that agencies aren’t scrutinizing.

“If they would go above and beyond their actual jobs, there’d be less of a need for reporters,” he said.

(Photo credit: Derrich on Flickr)

Impact of the recession: Google map shows Texas food stamp recipients, by neighborhood

Sunday, November 20th, 2011

One of the golden rules of writing is “show, don’t tell.” The same holds true for stories based on public data. Check out this cool interactive map by Nolan Hicks and Yang Wang showing food stamp recipients by ZIP code for the whole state of Texas.

Transform a dull spreadsheet into a compelling, interactive map for readers

Tuesday, May 31st, 2011

Check out this amazing presentation at Google I/O 2011 about Google Fusion Tables. The whole video is worth watching, but for a journalist’s perspective on the importance of making data accessible to readers, skip to the 34:50 mark, where Simon Rogers of the Guardian’s Data Blog offers some compelling examples of how journalists can bring “data to life” with Fusion Tables, a free online tool.

How two Pulitzer finalists used public data and the Internet to connect with readers

Monday, April 25th, 2011

Anyone who cares about journalism should read Al Tompkins’ post examining the innovative storytelling techniques that empowered the Las Vegas Sun series “Do No Harm,” a project by reporters Marshall Allen and Alex Richards. The reporters analyzed 2.9 million hospital records that revealed systematic, preventable errors in the local health care system. They found more than 300 patients who died in 2008 and 2009 from mistakes that could have been prevented.

Rather than rely on anecdotal sob stories that would be dismissed as scare-mongering by hospitals, the reporters used reader-friendly multimedia presentations to make the data come alive and show, in a powerful way, the scope and human toll of the problem. Thanks to the project, Tompkins writes, six pieces of legislation have been filed in the Nevada Legislature to reform and bring more transparency to the hospital system.

The project took two years — an eternity in journalism time. But it still offers important lessons for journalists. We’re no longer chained to simply telling a story with an 80-inch news article and a few pictures and graphics. We can use the Internet to let readers look over our shoulders and check out the raw documentation and data and videos for themselves. One of the most creative things the Sun did was make it incredibly easy for readers to offer feedback:

When the stories started running, the paper’s phones rang off the hook. Rather than let the calls fall into the digital abyss, the team edited some and provided a sampling of the public’s reaction. They also posted reader reaction to the website, allowing people to share their personal experiences with Vegas-area hospitals.

Marshall Allen invited readers to share their stories using an easy online form.

Because of these storytelling techniques, the project was impossible to ignore. It could prompt change — and save lives.

Interactive Census map shows population trends in Bexar County and San Antonio

Friday, March 11th, 2011



Last month, the U.S. Census Bureau announced the latest population figures for Texas, and the numbers showed Bexar County had gained nearly 332,000 people in the past decade.

But where are all these newcomers moving to within Bexar County?

Kelly Guckian, database manager for the San Antonio Express-News, pulled together more detailed population figures from the 2010 Census to help show where Bexar County is gaining residents — and where it’s losing them.

Kelly focused on census tracts, which are geographic boundaries set by the Census Bureau that encompass, on average, about 4,000 people. This allowed her to zoom in on population changes at the neighborhood level.

She did the tedious work of compiling and mapping the data, and I helped export it into this interactive Google map, which shows how the far West and North sides of the county saw explosive gains in the blue areas, while many inner-city neighborhoods in the yellow areas lost residents. Kelly and graphic artist Mark Blackwell also produced maps showing the population trends broken down by race and ethnicity, and MySA’s Mike Howell put it all together in an interesting online package.
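The mechanics behind a map like this are simple once the data is compiled: match each tract’s 2010 population to its 2000 figure and compute the change. A rough Python sketch, assuming two hypothetical CSV files of population by tract (in reality, tract boundaries shift between censuses, which makes the matching messier):

```python
import pandas as pd

# A sketch of the tract-level comparison. File and column names are
# hypothetical; real tract boundaries change between censuses.
pop00 = pd.read_csv("bexar_tracts_2000.csv")  # columns: tract, pop
pop10 = pd.read_csv("bexar_tracts_2010.csv")  # columns: tract, pop

tracts = pop00.merge(pop10, on="tract", suffixes=("_2000", "_2010"))
tracts["change"] = tracts["pop_2010"] - tracts["pop_2000"]

# Label each tract so the map can color gains blue and losses yellow.
tracts["trend"] = tracts["change"].apply(lambda c: "gain" if c >= 0 else "loss")
tracts.to_csv("bexar_tract_changes.csv", index=False)
```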

The explosive growth on the county’s outskirts occurred during a decade when city officials emphasized the importance of living near downtown and limiting urban sprawl. Our news story about the Census numbers explored why many people either didn’t hear the city’s message — or ignored it.

Telling stories with data: Police chases and drug smugglers on the Texas-Mexico border

Friday, November 26th, 2010

After the Express-News and the Texas Tribune collaborated last month on a story about concealed handgun permits, Brandi, Matt and I were jazzed about the results and started talking about what to work on next. Here’s what we came up with: an analysis of nearly 5,000 vehicle-pursuit reports kept by the Texas Department of Public Safety.

Until recently, I had no idea this DPS database existed. But I stumbled across it a few months earlier while working on this article about pursuits in San Antonio. SAPD keeps a database packed with details about each chase: the weather and road conditions, the pursuit speeds and durations, the injuries and fatalities. Since SAPD had this data, I figured other law enforcement agencies in Texas probably kept similar records. I asked around, and sure enough, DPS was one of the agencies that collect details about pursuits.

Why is that a big deal? Well, when you find a previously unknown database with information about an important public safety issue and analyze those digital records, you’ll probably discover fresh, interesting information for your readers. Public databases empower journalists to do their own research and find surprising answers.

Brandi asked for a copy of the data and we received it from DPS with little trouble. It was a big spreadsheet documenting nearly 5,000 pursuits from 2005 to July 2010.

One detail jumped out at us: Hidalgo County, by far, had the most pursuits over the past five years — 656. Several other border counties also ranked high, suggesting smugglers were often fleeing DPS troopers. The database told us all kinds of things about these pursuits — how often people were injured, how often motorists escaped, and how they got away.
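Rankings like that take only a few lines once the spreadsheet is loaded. A minimal sketch, assuming hypothetical column names in the DPS data:

```python
import pandas as pd

# A minimal sketch of the county ranking; column names are hypothetical.
pursuits = pd.read_csv("dps_pursuits.csv")

# Which counties logged the most chases from 2005 through July 2010?
print(pursuits["county"].value_counts().head(10))

# How did the pursuits end: arrests, crashes, injuries, escapes?
print(pursuits["outcome"].value_counts(normalize=True))
```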

When reporters dive into data-heavy topics, it’s important to find the real people behind the numbers. Early in the reporting process, we asked DPS if we could ride along with a trooper in Hidalgo County. Brandi and photographer Callie Richmond visited McAllen and rode with DPS Trooper Johnny Hernandez. Their experience became the lede of our story. Brandi had some great interviews with Hernandez and other troopers in Hidalgo County, who openly talked about their continual struggles to catch smugglers from Mexico. The visit provided rich material for photos and an awesome online video that Callie produced.

Brandi wrote a big chunk of the article on the drive back from McAllen. We finished writing and editing the story in a Google Document, which really beats sending e-mails back and forth and losing track of differing versions of the story. Google Docs lets you see what each collaborator is adding to the document as they write. It’s like the Big Brother version of Microsoft Word, but less evil. It’s a useful tool for collaborating with people, especially if they work in a different organization in a different city. Plus, Google gives you a chat window in the document, which is nice if you want to mock the typing skills of your colleagues.

Why bother teaming with the Tribune? I blogged earlier about how I’m warming up to the touchy-feely trend of collaboration in journalism: how it helps overworked reporters tackle stories and broadens their reach when the final product is published. When our story ran Sunday, it was published in the Express-News, the Texas Tribune, the Houston Chronicle and the New York Times.

The collaboration also helped us post online goodies for readers hungry for more information. Matt Stiles made an interactive county map of Texas. I used DocumentCloud to post this annotated copy of a pursuit report that offered context from the pursuit data. Callie’s YouTube video was a very cool mini-documentary that explained the issue. We also posted the data online, allowing readers to learn about pursuits in their own counties.

There were some interesting reactions to the story. Scott Henson at Grits for Breakfast was surprised so many suspects got away: “I would not have guessed that the number of chases ending with the suspect successfully eluding troopers on foot would have been so high, nor that the proportion who stop and surrender would be so low.”

KXXV TV localized the story by looking at the high number of pursuits in McLennan County.

That’s the great thing about news stories based on public data — people can take the information you found, talk about it, and look at the data themselves.

Google Refine: A tool for journalists looking for great stories in data

Thursday, November 11th, 2010

Google unveiled a free tool for journalists who are interested in analyzing public data. Google Refine is a “power tool for working with messy data.” It helps import information and clean up data-entry problems that lurk in many government databases.
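Refine does its cleanup through a point-and-click interface, but the idea behind its clustering feature is easy to show in code. Here’s a rough Python approximation of the kind of “fingerprint” matching Refine uses to group variant spellings of the same name:

```python
import re
from collections import defaultdict

def fingerprint(value):
    # Lowercase, strip punctuation, then sort the unique words, so
    # "Valero Energy Corp." and "CORP VALERO ENERGY" share one key.
    words = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(words)))

# Hypothetical messy entries of the kind that lurk in government data.
names = ["Valero Energy Corp.", "VALERO ENERGY CORP", "Valero  Energy corp"]

clusters = defaultdict(list)
for name in names:
    clusters[fingerprint(name)].append(name)

print(dict(clusters))  # variants of one entity grouped under a single key
```

Refine wraps that idea in an interface that lets you review each cluster and merge the variants with a click.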

It’s open to everyone, but it looks like Google created the tool with an eye on computer-assisted reporting. Google’s introductory video touts “Dollars for Docs,” a data-driven story by ProPublica that showed how drug companies paid doctors to promote their products.

Analyzing databases is a niche skill in newsrooms. Not all reporters are comfortable doing queries in Microsoft Access or sifting through thousands of computerized records, but those skills can really empower reporters who are trying to make sense of a complicated world. Columbia Journalism Review published a great profile of Daniel Gilbert, a reporter for the Bristol Herald Courier who came across a potential blockbuster of a story about unpaid royalties from mineral rights. But the issue was so complex he didn’t know how to unlock it.

His editor persuaded the newspaper’s publisher to pay for Gilbert to attend a database boot camp at Investigative Reporters and Editors, and Gilbert learned skills that helped him piece together the gas royalties puzzle. The result: “Underfoot, out of reach,” a series of stories that showed how millions of dollars owed to landowners had been tied up in “an opaque state-run escrow fund, where it has accumulated with scant oversight for nearly 20 years.” Gilbert won the Pulitzer Prize.

I haven’t played around with Google Refine yet, but I hope it encourages more journalists to take the plunge into computer-assisted reporting. There are some amazing, data-driven stories to be told out there. We just need more people to tell them.

(h/t: Jennifer Peebles)

Data analysis shows BP suffered from a ‘systematic safety problem’ before oil spill

Tuesday, May 18th, 2010

The map says it all, but be sure to read the story by the Center for Public Integrity about the vast oil spill in the Gulf of Mexico:

Two refineries owned by oil giant BP account for 97 percent of all flagrant violations found in the refining industry by government safety inspectors over the past three years, a Center for Public Integrity analysis shows. Most of BP’s citations were classified as “egregious willful” by the Occupational Safety and Health Administration and reflect alleged violations of a rule designed to prevent catastrophic events at refineries.

This package is a nice example of computer-assisted reporting that sheds light on an important issue. Plus, the compelling findings are easy to share: the map of violations is embeddable, which gives blogs an incentive to pick up the story.
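For the curious, a finding like that 97 percent figure boils down to simple arithmetic once you have the violation records. A toy sketch, with hypothetical file and column names:

```python
import pandas as pd

# Toy sketch of the share calculation; file and column names are
# hypothetical stand-ins for OSHA inspection records.
v = pd.read_csv("refinery_violations.csv")

flagrant = v[v["classification"] == "egregious willful"]
# What fraction of flagrant violations did each company draw?
print(flagrant["company"].value_counts(normalize=True).head())
```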

The Knight Foundation offers some more information about how the story tapped into the resources of DocumentCloud to post government records about the oil spill.

Government official shocked — shocked! — when public data is posted online

Monday, May 17th, 2010

Karisa King and I were cleaning our corner of the newsroom last week, and I rediscovered this gem of an e-mail written by an official for the Texas Department of Insurance.

The state agency oversees the amusement-ride industry. When a patron is seriously injured, the ride owner is supposed to report the injury to the department of insurance, and the information is typed into a database.

For a story I wrote about the safety of amusement rides, we obtained a copy of the injury database, and the Express-News posted the data online.

That disturbed at least one state official.

“Tedesco has put together a searchable database of injuries from our data,” department spokesman Jerry Hagins informed his colleagues in the July 2009 e-mail.

“Can he do this?????????” replied Richard Baker, a manager at the agency.

A question with nine question marks deserves an answer: Yes, we can do this. In fact, news organizations and blogs ought to do this.

Government agencies collect reams of data about important issues. When journalists find that data, analyze it, and share it with the public, we help readers make sense of a complicated world. That’s our mission. And that’s why news sites are publishing “data centers” with unique and useful information.

Want to learn the salary of the city manager of San Antonio? Check a public-salary database.

Curious what litterbugs have been dumping on roadways? Check the state’s “Don’t Mess with Texas” database.

Wondering where it’s safe to drive in San Antonio during a downpour? Check a map of low-water crossings, which was created from a city database.

Can we post this data?

Yep.

Should we?

Absolutely.