Quality data is essential to meet business needs

Data quality is such a simple concept. Except, it’s not.

As leaders and business executives, we tend to accept that other people and systems are collecting data. But here’s where your leadership can shine: examine the data that is being retrieved to find out how important it is to the business.

Let’s take a step back, and look at the assumptions made about data – and where good leaders can interject their expertise for guidance. 

We tend to think of data quality this way:

Bad data = wrong or incomplete data
Good data = correct and complete data

But these facile definitions do not reflect the complexity of data quality, notably that bad data can be valid data that gives you answers that seem good but truly aren’t. Here’s a simple example.

Counting the blades of grass in your yard will yield correct data, but it doesn’t tell you when you need to cut your grass. For that, you need different data altogether – the measurement of how high your grass is.

In other words, data is important to us only because it tells us something that we need to know. Relevance to an issue is therefore essential to defining data quality: while all bad data is alike, good data is good in its own way (apologies to Leo Tolstoy).

 Two additional things to consider here:

    1. It’s not always easy to spot bad data.

    2. It is VERY easy to create bad data.

It’s worth noting that having the skill to identify good data is an essential part of data literacy, which I define as the ability to:

  • recognize good (and bad) data,

  • understand how it relates to other data,

  • connect how data relates to your specific field or domain, and

  • interpret data in a way that lets you understand something that you need to know.

Quality data powers insight

To paraphrase Professor Richard W. Hamming’s famous quote, “The purpose of [data] is insight, not numbers.”

Analyzing data that is not meaningful is misleading: the data itself may be good, but it doesn’t answer the questions you’re asking. This has enormous implications for missions, business goals, budgets, and employees.

To get insight, the data you gather has to be correct, consistent, and fit for purpose – meaning that it is relevant to your data needs.

How do we get so much bad data?

To help us identify, gather, and use quality data, let’s first look at how we wind up with bad data. First, let me say that you could write a book about any one of the items I discuss below – and people have.

Let’s break this down into three categories: collecting, storing, and processing data.

Data collection creates bad data when you gather data that is incomplete, inconsistent, or unable to answer your questions or drive the function you need.

Chart illustrates how collection issues can cause bad data.
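To make the collection problems concrete, here is a minimal sketch of how a team might profile newly gathered records for missing values and inconsistent formats. The records, field names, and the two-letter state rule are all made up for illustration; a real check would follow whatever your own data is supposed to look like.

```python
from collections import Counter

# Hypothetical survey records; the field names are invented for this example.
records = [
    {"id": 1, "state": "VA", "lawn_height_in": 3.5},
    {"id": 2, "state": "Virginia", "lawn_height_in": 4.0},
    {"id": 3, "state": "MD", "lawn_height_in": None},
    {"id": 4, "state": "", "lawn_height_in": 2.0},
]

REQUIRED_FIELDS = ("id", "state", "lawn_height_in")


def missing_fields(record):
    """Return the required fields that are absent, None, or empty."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def profile(rows):
    """Summarize incomplete rows and the formats used for the state field."""
    incomplete = {r["id"]: missing_fields(r) for r in rows if missing_fields(r)}
    # Inconsistency check (assumed rule): states should be two-letter codes.
    state_formats = Counter(
        "code" if len(r["state"]) == 2 else "other"
        for r in rows
        if r.get("state")
    )
    return incomplete, state_formats


incomplete, state_formats = profile(records)
print(incomplete)           # rows with gaps, keyed by id
print(dict(state_formats))  # how many states follow the expected format
```

Note that this only catches incomplete and inconsistent data. Whether the data can answer your question is still a judgment call that no script makes for you.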

Data storage and systems issues cause bad data when you gather data and never touch it again, even as you gather new examples of the same type of data; store identical data in multiple places; work with bad system design (e.g., the system doesn’t let you query data easily, or lacks metadata to help you understand why you have this data); or run systems that don’t communicate with one another.

Chart describes how storage and systems issues create bad data.

Processing issues cause bad data when you migrate data without verifying its quality or what it’s used for; introduce formatting problems in the new system; or fail to de-duplicate data.

Chart illustrates how data processing issues can cause bad data.
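As one illustration of the processing side, here is a rough sketch of de-duplicating rows before a migration. It assumes that an email address identifies a customer and that differences in case and stray spaces are formatting noise. Both are assumptions for the example, and the rows and function names are hypothetical.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so formatting differences don't hide duplicates."""
    return email.strip().lower()


def deduplicate(rows, key="email"):
    """Keep the first row seen for each normalized key and report the rest."""
    seen = {}
    duplicates = []
    for row in rows:
        normalized = normalize_email(row[key])
        if normalized in seen:
            duplicates.append(row)  # set aside for review, not silently dropped
        else:
            seen[normalized] = row
    return list(seen.values()), duplicates


# Hypothetical customer rows pulled from an old system.
legacy_rows = [
    {"email": "Ana@Example.com", "name": "Ana"},
    {"email": " ana@example.com", "name": "Ana R."},
    {"email": "bo@example.com", "name": "Bo"},
]

kept, duplicates = deduplicate(legacy_rows)
print(len(kept), len(duplicates))
```

Keeping the first copy is only a placeholder rule here. Which copy is the right one is exactly the kind of decision that needs an owner and a policy, which is why the sketch returns the suspected duplicates instead of deleting them.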

Every issue we just discussed is influenced by one idea: data isn’t a one-and-done thing. When we find data that appears to be a duplicate, seemingly has no function, is badly formatted, or is suspect in any other way, we need a system for deciding whether it is bad, or for otherwise managing it.

What is good data, really?

Good data exists in as few copies as possible in your system, and is current, maintained, and annotated with metadata.

 We often discuss "a single source of truth" but I've yet to implement such a thing for a client. The idea is that you have each datum in one central data store that everyone can use for every application. The goal is to break down data silos or disconnected systems that don't talk to one another. It's a great goal, but the reality is more complex, so I'll say this: good data is data that does not get duplicated and passed from one system to the next to be used, and ensures "… that the organization provides stakeholders with consistent information throughout time and perspectives".

Adding metadata to your data helps greatly with this. Metadata is often unhelpfully described as data about data, but it's really details about data that tell you:

  • if data is important

  • how data is important

  • why data is important

A simple form of metadata is the names of rows in an Excel sheet.
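For something a little richer than labels in a spreadsheet, here is a minimal sketch of what a metadata record for a single field might hold. The attribute names, the sample values, and the one-year review window are all hypothetical; the point is only that the record says what the data is, why it matters, and who to ask about it.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class FieldMetadata:
    """Hypothetical metadata for one field; attribute names are illustrative."""
    name: str
    description: str     # what the field measures
    used_for: str        # why it matters: the question it helps answer
    owner: str           # who to ask about it
    last_reviewed: date  # when someone last confirmed it is still needed


# Sample values invented for this example.
lawn_height = FieldMetadata(
    name="lawn_height_in",
    description="Grass height in inches, measured weekly",
    used_for="Deciding when the lawn needs to be cut",
    owner="grounds-team",
    last_reviewed=date(2024, 1, 15),
)


def is_stale(meta, today, max_age_days=365):
    """Flag metadata that nobody has reviewed within max_age_days."""
    return (today - meta.last_reviewed).days > max_age_days


print(asdict(lawn_height)["used_for"])
print(is_stale(lawn_height, today=date(2025, 6, 1)))
```

A record like this answers the "if, how, and why" questions above in one place, and the review date gives you a simple way to spot data that has been gathered and never touched again.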

How leaders can support tech groups

The most important thing a leader can do to support data quality is to set aside budget and time for data wrangling and maintenance. To have good data, your teams need to know who owns the data, what it does, and what they need to keep for new processes. They also need to have and enforce data governance policies.

Data scientists often say that data wrangling takes up about 75% of their workday. That’s how important clean and useful data is – you spend 75% of your time cleaning it up, so that you can spend the other 25% using and asking questions of it.

That means you need to budget enough time to clean the data, and to start planning for that as soon as possible.

When moving from one system or app to another, your tech teams need to know who owns the data, what it does, and what they need to keep for new processes.

They also need to understand where data originated, when it was loaded, and how it moves through current systems.

Ideally, your tech folks already know this because they were given the time and money to map your data assets.

Understanding the importance of data quality is paramount for leadership, because without quality data, we cannot make sound business decisions.
