Data Quality issue in my new database - or so we thought...

I work for a neat little company that specializes in helping organizations with Business Resilience and Risk Management. One of the initiatives we're working on is building out a "Hotel" database that will let our clients understand how "safe" certain hotels are, particularly the fire and safety features they have and how "safe" the surrounding areas are. It's a very exciting project and I'm very proud of the work we've done so far.

So...

We have a person traveling to a city here in the US and staying in a "Hyatt by the airport". After this person made their arrangements, they decided to take a peek at the database and, lo and behold, there are TWO "Hyatts" by this airport in our database. The addresses were different (but VERY close, within 1 mile of each other) and the "full" names were almost the same: "Hyatt this" and "Hyatt that".

Bells started ringing and everyone was thinking like Scooby-Doo - "Ruh-roh!".

Well, after about 20 minutes of discussion on how this "couldn't be", I excused myself, picked up the phone, and called the number listed in our database for the first property. "Hey, have you recently moved from address XXX to address YYY?" The answer: "No, there is a different Hyatt right up the street. They specialize in... and we do ...".

Problem solved, there are two so we have no problem...

Perhaps we're too concerned about data quality? I'm sure there's a point of diminishing returns here, and perhaps we're dancing around that point or have "crossed that line in the sand" (use whichever metaphor you like better). Had we done some sort of cleanup, I'd be really concerned that someone might have merged these records. Sometimes slow and steady really does win the race.
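For what it's worth, here is a minimal sketch of how a check like this could flag look-alike records for a human to review rather than merging them automatically. It is not what our database actually does: the field names (`name`, `chain`, `lat`, `lon`), the 1-mile threshold, and the sample coordinates are all made up for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, good enough for a rough check


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flag_for_review(hotels, max_miles=1.0):
    """Return same-chain pairs within max_miles of each other.

    Nothing is merged or deleted here; the pairs are only candidates
    for someone to pick up the phone and verify.
    """
    flagged = []
    for a, b in combinations(hotels, 2):
        if a["chain"] != b["chain"]:
            continue
        distance = miles_between(a["lat"], a["lon"], b["lat"], b["lon"])
        if distance <= max_miles:
            flagged.append((a["name"], b["name"], distance))
    return flagged


# Invented sample records (hypothetical names and coordinates).
hotels = [
    {"name": "Hyatt This", "chain": "Hyatt", "lat": 40.0000, "lon": -75.0000},
    {"name": "Hyatt That", "chain": "Hyatt", "lat": 40.0080, "lon": -75.0000},
    {"name": "Other Inn", "chain": "Other", "lat": 40.0010, "lon": -75.0000},
]

for first, second, distance in flag_for_review(hotels):
    print(f"Check by phone: {first} / {second} ({distance:.2f} miles apart)")
```

The point of the design is that the output is a to-do list for a person, not a merge instruction. In our case both Hyatts were real, so an automatic merge would have destroyed a perfectly good record.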

Until next time...Rich

Comments

Thorsten said…
This is pretty typical of what I run into. There is something that looks strange, and you talk to someone who knows (like the hotel chain in your example, or business users for me).
Sometimes you come up with a problem (and depending on how you worded your question, they're happy you pointed it out), or there is some exception to the general rule (no two hotels of the same chain within x miles).
Maybe we should start talking about "quality smells" (similar to "code smells" in software dev), things that look fishy and need some more investigation if the smell is getting too bad.
Jim Harris said…
Yes, we data quality professionals can sometimes be all too eager to gather Fred, Daphne, Velma, Shaggy, and Scooby-Doo, jump into the Mystery Machine, and ride off to solve the case of:

“Scooby-Doo and the Hyatt Two”

Not every apparent “data problem” turns out to be an actual “data quality issue” and there is definitely a point of diminishing returns even when it is.

But sometimes, when you pull off the mask and learn that the true identity of “something suspicious in the data” really is a critical data quality problem, you can’t help but imagine hearing:

“And I would have gotten away with it, if it wasn't for you meddling data quality kids!”

Scooby-DQ!
