Dirty, dirty data
One of the lesser-known facets of Business Intelligence is dealing with Data
Quality. Data Quality, also called Information Quality (IQ), has a major effect
on the accuracy, usability and, most importantly, the credibility of any
reporting.
In our experience, the data is always dirtier than the customer believes. This
is because the application that manages the data either hides its dirtiness or
never uses the fields that we later rely on, so errors in them go unnoticed.
To address this, the temptation is to either:
- ignore the data issues, present the information to the user community and
  let them deal with it, or
- embark on a major data cleansing project.
The most pragmatic approach is somewhere in between. First, you need to
understand:
- how and where dirty data appears
- the consequential effects and their importance
- the cost of addressing these issues
- the time needed to address these issues
- the effectiveness of any mechanisms for addressing them.
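As a rough illustration of the first point, finding how and where dirty data appears, a simple profiling pass can count missing and invalid values per field. This is a minimal Python sketch, assuming records arrive as dictionaries; the `profile` function, the field names (`customer_id`, `postcode`, `email`) and the validation rules are all hypothetical examples, not a prescribed method.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable

# Hypothetical rule type: a predicate that returns True when a value is acceptable.
Rule = Callable[[object], bool]


def profile(records: Iterable[dict], rules: dict[str, Rule]) -> dict[str, Counter]:
    """Count ok, missing and invalid values per field to show where dirty data sits."""
    report = {field: Counter() for field in rules}
    for record in records:
        for field, is_valid in rules.items():
            value = record.get(field)
            # Treat absent, None and blank strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                report[field]["missing"] += 1
            elif not is_valid(value):
                report[field]["invalid"] += 1
            else:
                report[field]["ok"] += 1
    return report


# Hypothetical sample data and rules, for illustration only.
customers = [
    {"customer_id": "C001", "postcode": "2000", "email": "a@example.com"},
    {"customer_id": "C002", "postcode": "", "email": "not-an-email"},
    {"customer_id": "C003", "postcode": "ABCD", "email": None},
]
rules = {
    "postcode": lambda v: str(v).isdigit() and len(str(v)) == 4,
    "email": lambda v: "@" in str(v),
}

for field, counts in profile(customers, rules).items():
    print(field, dict(counts))
```

Counts like these only show where problems concentrate; weighing them against consequences, cost, time and effectiveness remains a judgment call.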
And remember the Golden Rules:
- fixing at the source is best
- if the users do not trust the data, they will not use it.