Unless you're perfect, expect some giggles when opening your (data) kimono...

<disclaimer>I'm going to try to leave the politics out of this one, but I might cross that fine line - consider yourself warned...</disclaimer>

Lots of press today about the "Data Quality" issues found on recovery.gov.

  • Obama Administration Defends Its Data Quality
  • which references a blog post on the White House's Web site, "Looking at the Big Picture on the Recovery Act"
  • another: "Some don't report how stimulus funds spent"

    The list unfortunately could go on and on and on...

    With all this said, I can't help but think to myself:
    "umm, what did they think was going to happen?"

    Increased visibility into your data lets people find more quality issues in it. It's a very simple concept. I learned this lesson many years ago when building applications as a junior software developer for a large organization. We created this enormous database containing information about products in our industry. Things were going along fine and we were fat/dumb/happy with ourselves UNTIL PEOPLE STARTED LOOKING AT IT! Each and every time you expose more data to more people, the sheer number of questions about that data is going to skyrocket.

    "more data + more visibility = more questions"

    Every single time I've been involved in a project to put data into the hands of the masses, we've gone into the project knowing that we were going to get some folks giggling at our "open kimono".

    It's a very difficult position to be in, and I kind of feel bad for these folks (just a little bit). In Mr. DeSeve's posting he implores people to look at the big picture. This must be his first time building anything like this, because anyone and everyone in the data world knows that if you build a really slick "look and feel" reporting application but people can't trust the data, they will not want to use the application.

    People will not and cannot get past obvious data quality issues in reports. Any junior data analyst in the industry should know this.

    With all that said, I'm shying away from stating that the program was or was not a good idea. I do, however, think that the government was premature in posting this data to the public without any quality assurance done on it. The fact that someone could type in an incorrect Congressional district in this day and age (it's called referential integrity and most databases have had it for over a decade) is inexcusable. The fact that the government posted data with incorrect or missing districts is inexcusable. The fact that they can't tell us who has reported and who hasn't is also significantly concerning.
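For anyone who hasn't run into referential integrity before, here is a minimal sketch of the idea using Python's built-in sqlite3 module. To be clear, I have no idea how recovery.gov is actually built; the table names, the recipient name and the district codes below are all made up for illustration. The point is only that the database itself can refuse a district that isn't in the lookup table, and refuse a blank one too.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a district lookup and an award table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE district (code TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE award ("
        " id INTEGER PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        # NOT NULL blocks missing districts; REFERENCES blocks unknown ones.
        " district_code TEXT NOT NULL REFERENCES district(code))"
    )
    # Hypothetical lookup values, for illustration only.
    conn.executemany(
        "INSERT INTO district (code) VALUES (?)", [("AZ-01",), ("AZ-02",)]
    )
    conn.commit()
    return conn


def try_insert(conn, recipient, district_code):
    """Return True if the row is accepted, False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO award (recipient, district_code) VALUES (?, ?)",
                (recipient, district_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
print(try_insert(conn, "Acme Paving", "AZ-01"))  # known district
print(try_insert(conn, "Acme Paving", "AZ-99"))  # district not in the lookup
print(try_insert(conn, "Acme Paving", None))     # district left blank
```

Only the first insert should get through. The other two never reach the table, which means they never reach a public report either.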

    The numbers wouldn't have been as glamorous, but why not call attention to those folks who have not reported or who have reported incorrectly?
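Calling those folks out does not take much. Here is a rough sketch, assuming you have a list of who was supposed to report, a set of valid districts, and whatever was actually submitted. The function name, recipient names and district codes are hypothetical; nothing here comes from the real recovery.gov data.

```python
def audit_reports(expected_recipients, valid_districts, reports):
    """Find recipients who never reported and recipients whose district is bad.

    reports maps recipient name -> district code as submitted
    (None if the district was left blank).
    """
    # Anyone we expected to hear from who has no entry in reports.
    not_reported = sorted(set(expected_recipients) - set(reports))
    # Anyone who reported a district that is blank or not in the valid set.
    bad_district = sorted(
        name
        for name, district in reports.items()
        if district not in valid_districts
    )
    return not_reported, bad_district


# Made-up sample data, for illustration only.
expected = ["Acme Paving", "Bolt Electric", "Cedar Schools"]
districts = {"AZ-01", "AZ-02"}
submitted = {"Acme Paving": "AZ-01", "Bolt Electric": "AZ-99"}

not_reported, bad_district = audit_reports(expected, districts, submitted)
print("Not reported:", not_reported)
print("Bad district:", bad_district)
```

With this sample data, Cedar Schools shows up as not reported and Bolt Electric shows up with a bad district. Put those two lists next to the headline numbers and people at least know how far to trust them.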

    At the end of the day the website had great intentions (show me my data) but will forever be associated with poor data. Here are some quotes that we've all heard variations of before:
  • Many of the mistakes "don't undermine information at the heart of the data"
  • "the mistakes are RELATIVELY few, and don’t change the fundamental conclusions one can draw from the data."
  • "Some of the mistakes are frustrating typos and coding errors that don’t undermine information at the heart of the data"

    If anyone out there is interested in building a reporting application like this for an organization (large or small), be prepared to be giggled at when you open that kimono, unless, that is, you're "perfect".

    Until next time...Rich