Computer-Assisted Reporting

  • Up in Flames: Flares wasting natural gas in the Eagle Ford Shale

    If you drive through the bustling oil patch of the Eagle Ford Shale near San Antonio, it won’t take long to find the surreal sight of flares burning natural gas like perpetual bonfires.

    Natural gas is cheap. Pipelines are expensive. So instead of collecting the fossil fuel, many oil and gas operators build tall, metallic spires called flare stacks to burn the gas and release it into the Texas sky.

    For years, no one could say with any certainty how much natural gas was going to waste. Everyone knew flaring in shale country was a problem. But officials at the Railroad Commission of Texas, the state agency that oversees the oil and gas industry, had never released figures showing how much was being burned in the Eagle Ford.

    Instead, the agency released only statewide figures showing the overall volume of flaring was low compared to overall production — about one percent.

    Whenever a government agency touts rosy statistics, there’s probably a database behind those numbers. And if you obtain that raw data, you might be able to figure out what’s really going on.

    Today’s Express-News story about flares burning 20 billion cubic feet of natural gas so far in 2014 is a good reminder of the value of public databases — and why journalists need to get their hands on them to analyze the records for themselves.

    There’s no question analyzing data can be a lot of work. We filed an open records request with the Railroad Commission for a copy of the flaring data in the spring of 2013. It’s a huge database of monthly reports showing how much oil and gas is produced in Texas and where those hydrocarbons go. Flaring and venting make up one of the “disposition” categories in the data.

    I drove to the agency’s Austin headquarters with a flash drive that could handle the enormous database. It was a beast — more than 25 gigabytes of 85 million records. All that summer we used software to convert the Railroad Commission’s archaic data to CSV files, a format we could use in the newsroom. After that, it took weeks to crunch the numbers and uncover the hidden pitfalls.
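A common first step with agency data like this is converting fixed-width records into CSV. Here is a minimal Python sketch of the idea; the field names and column positions are invented for illustration, not the Railroad Commission's actual record layout:

```python
import csv

# Hypothetical field layout: name, start column, end column.
# The Railroad Commission's real record spec is different.
FIELDS = [
    ("lease_id", 0, 8),
    ("district", 8, 10),
    ("report_month", 10, 16),
    ("gas_flared_mcf", 16, 26),
]

def convert(fixed_width_path, csv_path):
    """Slice each fixed-width line into fields and write them out as CSV."""
    with open(fixed_width_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow([name for name, _, _ in FIELDS])
        for line in src:
            writer.writerow([line[start:end].strip() for _, start, end in FIELDS])
```

Each monthly report file gets run through the converter once, and the resulting CSVs load cleanly into a spreadsheet or database for analysis.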

    Why go through the hassle? Why should frazzled journalists take the time to learn how to analyze data? Don’t we have enough to do?

    The answer is that journalists need to know a lot of skills — how to interview people, how to write clearly, how to find information. Analyzing public data should be a part of that skill set. It opens doors to stories that couldn’t otherwise be told. This is what journalism is all about.

    When we were finished reviewing the flaring data, our analysis showed that the volume of flared gas in Texas had increased by 400 percent since 2009. And most of that gas came from the Eagle Ford Shale near San Antonio. This chart essentially told the story of flaring in the shale that no one had figured out — not even state officials:
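That 400 percent figure is ordinary percent-change arithmetic on the yearly totals. A quick sketch (the volumes below are placeholders, not the actual numbers from our analysis):

```python
def percent_increase(old, new):
    """Classic percent-change formula: (new - old) / old * 100."""
    return (new - old) / old * 100

# Placeholder yearly volumes in billion cubic feet, for illustration only.
flared_2009 = 4.0
flared_2012 = 20.0
change = percent_increase(flared_2009, flared_2012)  # 400.0
```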

    Quantifying the volume of flared gas opened up new questions and possibilities. When Projects Editor David Sheppard asked how much air pollution was created by all this flaring, we found out there was a way to calculate an estimate. We obtained emails from the state’s environmental agency, the Texas Commission on Environmental Quality, that showed how to estimate levels of air pollution created by gas flares. Those formulas were based on the volume of flared gas – which we had. So we plugged those numbers into Excel spreadsheets to come up with the amounts of sulfur, volatile organic compounds and other pollutants that came from flaring in the region.

    In August, the Express-News published the results of our investigation, Up in Flames. The total volume of wasted gas in the shale from 2009 to 2012 was almost 39 billion cubic feet — enough to meet the annual heating and cooking needs for all 335,700 residential customers who relied on gas last year in CPS Energy’s service area, which includes San Antonio.

    Sunday’s story is based on a fresh batch of flaring figures obtained by Express-News Data Editor Joseph Kokenge, who scraped the data directly from the Railroad Commission’s website.
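Scraping a government website usually boils down to pulling rows out of HTML tables. Here is a self-contained sketch using only Python's standard library; the table contents are invented, not the Railroad Commission's actual page:

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped into rows.
    Real agency pages are messier; this shows the basic pattern."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed("<table><tr><td>Karnes</td><td>5,200</td></tr></table>")
```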

    The new numbers for 2013 and 2014 show that flares burned and wasted even more of the fossil fuel. In the first seven months of 2014, more than 20 billion cubic feet of gas went up in smoke — enough to fuel CPS Energy’s 800 megawatt Rio Nogales power plant during the same time frame.

  • How to create maps and charts with Google Fusion Tables

    The friendly folks at the Association of Health Care Journalists held a conference last week in San Antonio, and they invited me to present an introduction about Google Fusion Tables.

    If you’re familiar with Microsoft Excel or Access, you might like Fusion Tables. It’s a free tool that allows you to create interactive maps and charts with data. For journalists, this is fantastic. Fusion Tables unlocks the data stuck in your hard drive and lets you easily share it with readers in a compelling format. Check out some great examples at Matt Stiles’ blog, the Daily Viz.

    If you’re interested in learning more, check out this slideshow for a step-by-step tutorial about some of the basics.

  • Nickel and dimed: Find out which gas stations have faulty pumps that overcharge motorists

    Valero Station in San Antonio

    If you’ve ever suspected your neighborhood gas station is stiffing you at the pump, you might already know you can file a complaint with the Weights and Measures Program at the Texas Department of Agriculture. The agency’s inspectors verify the accuracy of gas pumps.

    But which stations rack up the most complaints, flunk the most inspections and cost consumers the most money?

    The answers to those questions lurk within inspection data collected by state employees. The information is public. But like many government agencies, Weights and Measures hasn’t been analyzing its own data to look for trends that could help consumers make informed decisions.

    So Express-News Data Editor Joe Yerardi downloaded a publicly available copy of the inspection data and took a look at it for himself.

    The result was an interesting Sunday story that told readers things that state officials probably should have known themselves.

    Joe learned that one out of five stations in San Antonio had at least one pump that failed inspections. And the pumps most likely to shortchange customers were owned by one of the biggest players in town: Valero Energy Corp.

    Joe mapped the locations of the stations and their inspection results, so anyone can check out the track record of their neighborhood gas station.

    Joe told me it took nearly four weeks to work on the story. One of the difficulties he faced was sharing what he learned with state officials, who hadn’t analyzed their own database of inspection reports.

    “It’s not their job,” Joe said, describing the bureaucratic mentality of some government workers. “It’s not what they’re paid to do.”

    Not every government agency is like that, but it’s not an uncommon problem. When I found a San Antonio police database that documented every vehicle pursuit involving officers, I was a bit surprised to learn that SAPD had never analyzed the information, even though it shed light on an important public policy issue.

    These agencies probably paid some poor data-entry monkey to go through each paper report and type the details into a spreadsheet or database. Why not go the extra step and analyze that information?

    Joe described these kinds of stories as “low-hanging fruit” for journalists, who can step in and analyze databases that agencies aren’t scrutinizing.

    “If they would go above and beyond their actual jobs, there’d be less of a need for reporters,” he said.

    (Photo credit: Derrich on Flickr)

  • Transform a dull spreadsheet into a compelling, interactive map for readers

    Check out this amazing presentation at Google I/O 2011 about Google Fusion Tables.

    The whole video is interesting. But for a journalist’s perspective on the importance of making data accessible to readers, at the 34:50 mark Simon Rogers of the Guardian’s Data Blog offers some interesting examples of how journalists can bring “data to life” with Fusion Tables, a free online tool.

  • How two Pulitzer finalists used public data and the Internet to connect with readers

    Anyone who cares about journalism should read Al Tompkins’ post examining the innovative storytelling techniques that empowered the Las Vegas Sun series “Do No Harm,” a project by reporters Marshall Allen and Alex Richards. The reporters analyzed 2.9 million hospital records that revealed systematic, preventable errors at the local healthcare system. They found more than 300 patients who died from mistakes in 2008 and 2009 that could have been prevented.

    Rather than rely on anecdotal sob stories that would be dismissed as scare-mongering by hospitals, the reporters used reader-friendly multimedia presentations to make the data come alive and show, in a powerful way, the scope and human toll of the problem. Thanks to the project, Tompkins writes, six pieces of legislation have been filed in the Nevada Legislature to reform and bring more transparency to the hospital system.

    The project took two years — an eternity in journalism time. But it still offers important lessons for journalists. We’re no longer chained to simply telling a story with an 80-inch news article and a few pictures and graphics. We can use the Internet to let readers look over our shoulders and check out the raw documentation and data and videos for themselves. One of the most creative things the Sun did was make it incredibly easy for readers to offer feedback:

    When the stories started running, the paper’s phones rang off the hook. Rather than let the calls fall into the digital abyss, the team edited some and provided a sampling of the public’s reaction. They also posted reader reaction to the website, allowing people to share their personal experiences with Vegas-area hospitals.

    Marshall Allen invited readers to share their stories using an easy online form.

    Because of these storytelling techniques, the project was impossible to ignore. It could prompt change — and save lives.

  • Interactive Census map shows population trends in Bexar County and San Antonio

    Last month, the U.S. Census Bureau announced the latest population figures for Texas, and the numbers showed Bexar County had gained nearly 332,000 people in the past decade.

    But where are all these newcomers moving to within Bexar County?

    Kelly Guckian, database manager for the San Antonio Express-News, pulled together more detailed population figures from the 2010 Census to help show where Bexar County is gaining residents — and where it’s losing them.

    Stone Oak rooftops in San Antonio

    Kelly focused on census tracts, which are geographic boundaries set by the Census Bureau that encompass, on average, about 4,000 people. This allowed her to zoom in on population changes at the neighborhood level. She did the tedious work of compiling and mapping the data, and I helped export it into this interactive Google map that shows how the far West and North sides of the county saw explosive gains in the blue areas, while many inner city neighborhoods in the yellow areas lost residents. Kelly and graphic artist Mark Blackwell also produced maps showing the population trends broken down by race and ethnicity, and MySA’s Mike Howell put it all together in an interesting package online.

    The explosive growth on the county’s outskirts occurred during a decade when city officials emphasized the importance of living near downtown and limiting urban sprawl. Our news story about the Census numbers explored why many people either didn’t hear the city’s message — or ignored it.

  • Telling stories with data: Police chases and drug smugglers on the Texas-Mexico border

    After the Express-News and the Texas Tribune collaborated last month on a story about concealed handgun permits, Brandi, Matt and I were jazzed about the results and started talking about what to work on next. Here’s what we came up with: An analysis of nearly 5,000 vehicle-pursuit reports kept by the Texas Department of Public Safety.

    Drug runners drive into Rio Grande River
    Drug runners crash into the Rio Grande River (Source: Texas DPS)
    Until recently, I had no idea this DPS database existed. But I stumbled across it a few months earlier when I was working on this article about pursuits in San Antonio. SAPD keeps a database packed with details about each chase — the weather and road conditions, the pursuit speeds and durations, the injuries and fatalities. Since SAPD had this data, I figured other law enforcement agencies in Texas probably kept similar records. I asked around and sure enough, DPS was one of the agencies that collects details about pursuits.

    Why is that a big deal? Well, when you find a previously unknown database with information about an important public safety issue and analyze those digital records, you’ll probably discover fresh, interesting information for your readers. Public databases empower journalists to do their own research and find surprising answers.

    Brandi asked for a copy of the data and we received it from DPS with little trouble. It was a big spreadsheet documenting nearly 5,000 pursuits from 2005 to July 2010.

    One detail jumped out at us: Hidalgo County, by far, had the most pursuits over the past five years — 656. Several other border counties also ranked high, suggesting smugglers were often fleeing DPS troopers. The database told us all kinds of things about these pursuits — how often people were injured, how often motorists escaped, and how they got away.
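With the spreadsheet loaded, a ranking like that is a one-line aggregation. A Python sketch with made-up rows standing in for the real DPS columns:

```python
from collections import Counter

# Hypothetical rows mimicking the DPS spreadsheet; the real column set
# (weather, speeds, durations, injuries) is much richer.
pursuits = [
    {"county": "Hidalgo", "suspect_escaped": True},
    {"county": "Hidalgo", "suspect_escaped": False},
    {"county": "Webb", "suspect_escaped": False},
]

# Which counties rack up the most chases?
by_county = Counter(row["county"] for row in pursuits)
top = by_county.most_common(1)  # [('Hidalgo', 2)]
```

The same pattern answers the follow-up questions too: swap the counted field for injuries or escapes and re-run.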

    When reporters dive into data-heavy topics, it’s important to find the real people behind the numbers. We asked DPS early in the reporting process to go on a ride-along with a trooper in Hidalgo County. Brandi and photographer Callie Richmond visited McAllen and went on a ride-along with DPS Trooper Johnny Hernandez. Their experience became the lede of our story. Brandi had some great interviews with Hernandez and other troopers in Hidalgo County, who openly talked about their continual struggles to catch smugglers from Mexico. The visit provided rich material for photos and an awesome online video that Callie produced.

    Brandi wrote a big chunk of the article on the drive back from McAllen. We finished writing and editing the story in a Google Document, which really beats sending e-mails back and forth and losing track of differing versions of the story. Google Docs lets you see what each collaborator is adding to the document as they write. It’s like the Big Brother version of Microsoft Word, but less evil. It’s a useful tool for collaborating with people, especially if they work in a different organization in a different city. Plus, Google gives you a chat window in the document, which is nice if you want to mock the typing skills of your colleagues.

    Why bother teaming with the Tribune? I blogged earlier about how I’m warming up to the touchy feely trend of collaboration in journalism — how it helps overworked reporters tackle stories, and broadens their reach with a wider audience when the final product is published. When our story ran Sunday, it was published in the Express-News, the Texas Tribune, the Houston Chronicle and the New York Times.

    The collaboration also helped us post online goodies for readers hungry for more information. Matt Stiles made an interactive county map of Texas. I used DocumentCloud to post this annotated copy of a pursuit report that offered context from the pursuit data. Callie’s YouTube video was a very cool mini-documentary that explained the issue. We also posted the data online, allowing readers to learn about pursuits in their own counties.

    There were some interesting reactions to the story. Scott Henson at Grits for Breakfast was surprised so many suspects got away: “I would not have guessed that the number of chases ending with the suspect successfully eluding troopers on foot would have been so high, nor that the proportion who stop and surrender would be so low.”

    KXXV TV localized the story by looking at the high number of pursuits in McLennan County.

    That’s the great thing about news stories based on public data — people can take the information you found, talk about it, and look at the data themselves.

  • Google Refine: A tool for journalists looking for great stories in data

    Google unveiled a free tool for journalists who are interested in analyzing public data. Google Refine is a “power tool for working with messy data.” It helps import information and clean up data-entry problems that lurk in many government databases.
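To see the kind of data-entry problem Refine tackles, consider near-duplicate spellings of the same name scattered through a database. Here is a crude, hand-rolled Python equivalent of that cleanup (my own sketch, not Refine's clustering algorithm):

```python
def normalize(name):
    """Collapse stray whitespace and inconsistent capitalization so
    near-duplicate entries match up. A crude stand-in for the kind of
    cleanup Google Refine automates."""
    return " ".join(name.split()).title()

# Three spellings of one company, as they might appear in agency data.
messy = ["valero energy", "Valero  Energy", "VALERO ENERGY"]
cleaned = {normalize(n) for n in messy}  # one canonical spelling
```

Refine does this interactively, at scale, and with smarter matching than simple case-folding; the sketch just shows why the cleanup matters before you start counting anything.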

    It’s open to everyone but it looks like Google created this tool with an eye on computer-assisted reporting. Google’s introductory video touts “Dollars for Docs,” a data-driven story by ProPublica that showed how drug companies paid doctors to promote their products.

    Analyzing databases is a niche skill in newsrooms. Not all reporters are comfortable doing queries in Microsoft Access or sifting through thousands of computerized records, but those skills can really empower reporters who are trying to make sense of a complicated world. Columbia Journalism Review published a great profile of Daniel Gilbert, a reporter for the Bristol Herald Courier who came across a potential blockbuster of a story about unpaid royalties from mineral rights. But the issue was so complex he didn’t know how to unlock it.

    His editor persuaded the newspaper’s publisher to pay for Gilbert to attend a database boot camp at Investigative Reporters and Editors, and Gilbert learned skills that helped him piece together the gas royalties puzzle. The result: “Underfoot, out of reach“, a series of stories that showed how millions of dollars owed to landowners had been tied up in an “an opaque state-run escrow fund, where it has accumulated with scant oversight for nearly 20 years.” Gilbert won the Pulitzer Prize.

    I haven’t played around with Google Refine yet, but I hope it encourages more journalists to take the plunge into computer-assisted reporting. There are some amazing, data-driven stories to be told out there. We just need more people to tell them.

    (h/t: Jennifer Peebles)

  • Government official shocked — shocked! — when public data is posted online


    Karisa King and I were cleaning our corner of the newsroom last week, and I rediscovered this gem of an e-mail written by an official for the Texas Department of Insurance.

    The state agency oversees the amusement-ride industry. When a patron is seriously injured, the ride owner is supposed to report the injury to the department of insurance, and the information is typed into a database.

    For a story I wrote about the safety of amusement rides, we obtained a copy of the injury database, and the Express-News posted the data online.

    That disturbed at least one state official.

    “Tedeso has put together a searchable database of injuries from our data,” department spokesman Jerry Hagins informed his colleagues in the July 2009 e-mail.

    “Can he do this?????????” replied Richard Baker, a manager at the agency.

    A question with nine question marks deserves an answer: Yes, we can do this. In fact, news organizations and blogs ought to do this.

    Government agencies collect reams of data about important issues. When journalists find that data, analyze it, and share it with the public, we help readers make sense of a complicated world. That’s our mission. And that’s why news sites are publishing “data centers” with unique and useful information.

    Want to learn the salary of the city manager of San Antonio? Check a public-salary database.

    Curious what litterbugs have been dumping on roadways? Check the state’s “Don’t Mess with Texas” database.

    Wondering where it’s safe to drive in San Antonio during a downpour? Check a map of low-water crossings, which was created from a city database.

    Can we post this data?


    Should we?

