Building QA Into Business Intelligence
Quality Data Means Quality Business Intelligence
Data warehouse quality is generally defined not in terms of the
data itself, but in terms of the warehouse's broader ability to
satisfy its customer base.
The best time to begin building in quality is before
the warehouse is first developed.
Developing a data warehouse is an iterative process:
measure what's been done, see where you are, make adjustments,
and plan the next iteration using the measurement data.
Quality control means meeting customer expectations without exceeding
them. Exceeding needs is very costly and yields little, if any,
additional value or return.
Measuring is the only way to determine if you are improving over
time. Data warehousing is a process, hence process-oriented measures
should be used.
Measure the level of activity and its duration, rather than relying
on product measures, such as volumes of data or instances of access
to the warehouse.
Individual measures feed into larger sets of metrics, encompassed
by an overall data warehouse quality program.
Quality, Measurement, and Data Warehousing
Quality is not free; however, measurement does not cost as much
as bad quality.
The cost of a quality program includes:
- ongoing measurement
- re-planning, to accommodate changing business needs
The business value in data warehousing is in the right decisions
being taken and the right action being performed.
The primary DW measurement is therefore the business impact that
results from the warehouse.
Data Warehouse Success Measures
To understand quality (what you did right and wrong), you need
“meta data” about what you're doing. Successful measurement
is the key to warehouse quality.
There are three types of success as they relate to data warehousing:
- Economic success - the data warehouse has a
positive impact on the bottom line.
- Political success - people like what you've
done, and they use it.
- Technical success - this is the easiest to
accomplish. It means the chosen technologies are appropriate for
the task and are applied correctly.
Data Warehouse Quality Measures
Quality, defined in terms of degrees of excellence, is a very subjective
measure. The overall quality of a data warehouse is best measured
in terms of:
- Business Quality
- Information Quality
- Technical Quality
Business quality is directly related to economic success: the ability
of the data warehouse to provide information to those who need it,
so that it has a positive impact on the business.
Business quality is made up of business drivers that directly correlate
to items in the company's strategic plans. How well the data warehouse
helps accomplish these drivers is a key measure of the success
of the data warehouse.
For instance, does the data warehouse align with business strategy,
and how well does it support the process of strengthening core competencies
and improving competitive position? Does it enable business tactics,
such that it makes a positive day-to-day difference?
Information is only of value if it is used. Its value is therefore
based on how well it is integrated into business processes, not
on data quality itself.
Information quality is the key to political success: people actually
using the data warehouse. In turn, this success depends on promoting
awareness of the existence of the DW, providing access tools, and
building the knowledge and skills to use information outputs.
Use of BI tools has a large change management component: moving
users away from two-dimensional reports toward multidimensional reporting.
Information quality measures will also include:
- how well users understand the warehouse
- ease of getting data required
- user access to data - in office, from home, third party partners
- frequency of data access
- how and when data is used
Information quality also includes data quality and performance.
Expectations must be closely managed in this area in accordance
with technical capability.
Technical quality is the ability of the data warehouse to satisfy
users' information needs. There are four important technical quality
characteristics:
Reach - whether the data warehouse can be used
by those who are best served by its existence. This is typically
beyond the base of suppliers, customers, and a few managers.
Range - the range of services provided by
the data warehouse, including what data is available and what is
accessible. For instance, web services make data widely available
for extraction from multiple locations as well as accessible by
users in multiple locations.
Maneuverability - the ability of the data warehouse
to respond to changes in the business environment. The data warehouse
must continually evolve to conform with changes in:
- users and their expectations
- upper management
- the overall business
- technical platform
Capability - an organization's technical capability
to build, operate, maintain, and use a data warehouse.
A good way to measure data warehouse quality is the
goal-question-metric (GQM) approach. This is achieved by:
- Identifying the type of desired impact with the data warehouse
- business, information, or technical quality.
- Defining quality goals specific to the business - specific
statements that relate to the type of impact.
- Developing questions to ask to identify if goals have been
achieved in terms of usage, response time, meeting the needs of
users, errors, and on-time delivery of cubes.
- Identifying Quality Areas
- Creating Goals
Distinguishing characteristics that help define quality of the
data warehouse:
Business quality - focus on business drivers—those things
that help a business achieve its overall goals.
Information quality - users know when and how the data warehouse
can help them make business decisions.
Technical quality - this relates to “reach,” or the
ability to access the necessary information in the warehouse.
Tip: Don't try to achieve all goals at once - focus on the things
that make the most sense.
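The goal-question-metric steps above can be sketched as a small data structure. This is only an illustrative sketch: the class names, the example goal, and the metric names (`accesses_per_week`, `on_time_cube_deliveries`) are hypothetical and do not come from the session.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """A question that tells us whether a goal has been achieved."""
    text: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class Goal:
    """A quality goal tied to one of the three quality areas."""
    quality_area: str  # "business", "information", or "technical"
    statement: str
    questions: List[Question] = field(default_factory=list)

    def metrics(self) -> List[str]:
        """Return every metric attached to this goal, without duplicates."""
        seen: List[str] = []
        for question in self.questions:
            for metric in question.metrics:
                if metric not in seen:
                    seen.append(metric)
        return seen


# Hypothetical example goal for the information quality area.
goal = Goal(
    quality_area="information",
    statement="Users can get the data they need without help",
    questions=[
        Question("How often do users access the warehouse?",
                 ["accesses_per_week"]),
        Question("Are cubes delivered on time?",
                 ["on_time_cube_deliveries", "accesses_per_week"]),
    ],
)

print(goal.metrics())
```

Starting from one goal at a time, as the tip suggests, keeps the list of metrics to collect short.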
Metrics and Measures
The terms 'measures' and 'metrics' are often confused, and confusing!
Measures are the specific pieces of data you need to collect.
A metric is a set of measures, or a methodology used to measure.
In data warehousing, a metric would be the overall number of accesses
to the data warehouse. Measures would be the number of SQL
accesses, accesses to certain data tables, etc.
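As a rough illustration of this distinction, the sketch below collects specific measures from an access log and rolls them up into one general metric. The log format, user names, and table names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical access log: (user, kind of access, table accessed).
access_log = [
    ("ana", "sql", "sales"),
    ("ben", "sql", "sales"),
    ("ana", "report", "inventory"),
    ("cy", "sql", "customers"),
]

# Measures: the specific pieces of data collected.
accesses_by_kind = Counter(kind for _, kind, _ in access_log)
accesses_by_table = Counter(table for _, _, table in access_log)

# Metric: the general number of accesses, built from the measures.
total_accesses = sum(accesses_by_kind.values())

print(accesses_by_kind["sql"], accesses_by_table["sales"], total_accesses)
```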
Objective measures and subjective measures should be defined:
Objective measures can only measure those things which are tangible,
and as such 'countable' in the data warehouse process.
Subjective measures are people's perceptions, usually collected
using surveys or user interviews. They are not as 'countable' as
objective measures.
It is easy to get misleading data using subjective measures. It is
therefore best to integrate subjective measurement into the daily
user experience, without being overly intrusive to your users. For
example, gather responses during user login or logout. It's also important
to provide feedback to the participants. Make sure you are surveying
the right people, at the right time.
In addition to different types of measures, there are also different
levels of measurement:
Existence - does the warehouse exist or doesn't it? This sounds
overly simple, but it's important: Have users accessed the
database or not?
Quantity - this refers to “how much,” or how many times
the warehouse was used.
Quality - the most difficult level, assessing “How good did
we do?” Thomann warns this third level is the fuzziest until
you understand the first two levels.
Metrics also have a number of components, and for data warehousing
can be broken down in the following manner:
Objects - the “themes” in the data warehouse environment
which need to be assessed. Objects can include business drivers,
warehouse contents, refresh processes, accesses, and tools.
Subjects - things in the data warehouse to which we assign numbers,
or a quantity. For example, subjects include the cost or value of
a specific warehouse activity, access frequency, duration, and utilization.
Strata - a criterion for manipulating metric information. This
might include day of the week, specific tables accessed, location,
time, or accesses by department.
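A minimal sketch of how strata might be applied, assuming a hypothetical list of access records with a date and a department. Here the object is "accesses", the subject is access frequency, and the strata are day of the week and department.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical access records: (date of access, department).
accesses = [
    (date(2024, 1, 1), "finance"),
    (date(2024, 1, 1), "sales"),
    (date(2024, 1, 2), "finance"),
    (date(2024, 1, 8), "finance"),
]


def count_by(records, key):
    """Count records per stratum, where key(record) names the stratum."""
    counts = defaultdict(int)
    for record in records:
        counts[key(record)] += 1
    return dict(counts)


# Same subject (access frequency), sliced by two different strata.
by_weekday = count_by(accesses, lambda r: WEEKDAYS[r[0].weekday()])
by_department = count_by(accesses, lambda r: r[1])

print(by_weekday)
print(by_department)
```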
These metric components may be combined to define an “application,”
which states how the information will be applied. For example: “When
actual monthly refresh cost exceeds targeted monthly refresh cost,
the value of each data collection in the warehouse must be re-established.”
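The example application above can be written as a simple rule. This is a sketch only: the function name, the cost figures, and the collection names are hypothetical.

```python
def collections_to_revalue(actual_cost, target_cost, collections):
    """Apply the rule: when actual monthly refresh cost exceeds the
    targeted cost, every data collection must have its value
    re-established. Otherwise nothing needs review."""
    if actual_cost > target_cost:
        return sorted(collections)
    return []


# Hypothetical data collections in the warehouse.
collections = ["sales_history", "customer_master", "inventory_snapshots"]

over_budget = collections_to_revalue(12_500, 10_000, collections)
on_budget = collections_to_revalue(9_000, 10_000, collections)

print(over_budget)
print(on_budget)
```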
The Data Warehouse and Change
An important characteristic in data warehousing is the concept
of process; in this sense, the realization that the warehouse will
constantly change. Wells suggests that organizations anticipate
change in data warehousing and expect it. We're surprised by change,
he says, but we should just accept it—and manage it.
Growth is a form of change, but it's more predictable and thus
more manageable. For example, in data warehousing, growth can be
defined by the following:
the number of users,
how they use the warehouse,
the addition of new data, and
the addition of different types of data.
Wells suggests using a chart to help manage data warehouse growth.
The chart can state expectations of the warehouse (which can be
defined in measurable terms), and for each expectation, list the
goals, metrics, and measures to be used to manage those expectations.
Don't forget to add what you'll do to monitor the growth (is reality
matching the data?), and plan to update the goals, metrics, and
measures as the warehouse changes over time.
Thomann adds that change in data warehousing is desirable, because
the warehouse needs to grow. Otherwise you'll have tomorrow's legacy system.
But to keep the warehouse valuable you have to strive for continuous
improvement. To illustrate, he describes a “typical”
data warehousing curve: When the warehouse is first implemented,
after a week or so the usage level is very high because news about
the warehouse has spread and users are exploring. In a month, usage
drops off significantly as users learn what the data warehouse cannot
do. After the next release or feature addition, usage goes up slightly,
although there isn't as much interest as with the initial release.
Then, a few days later it dips again. This is a typical pattern,
but what you're ideally looking for is a curve with definite increases
and no dips. “You can't stop entropy,” he says, “but
you can delay it by being proactive in your management. So use data—like
you give your users—only it's for you.”
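One way to use measurement data on yourself is to watch the usage curve for dips. The sketch below assumes a hypothetical list of weekly access counts and flags each week where usage fell relative to the week before; the numbers are invented for illustration.

```python
def find_dips(weekly_usage):
    """Return the indices of weeks whose usage is lower than the
    previous week's usage."""
    return [
        week
        for week in range(1, len(weekly_usage))
        if weekly_usage[week] < weekly_usage[week - 1]
    ]


# Hypothetical weekly access counts following the "typical" curve:
# high at launch, a sharp drop, a small bump after a release, another dip.
weekly_usage = [420, 390, 150, 180, 140]

print(find_dips(weekly_usage))
```

An empty result would correspond to the ideal curve described above, with increases and no dips.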
Of course, the objective of measuring is to take the measurement
data and brainstorm possible future improvements. For example, once
you have measurement data you can do cost/benefit analyses for new
data warehouse projects. Thomann and Wells suggest building a set
of priorities, because you can't do everything. Then plan future
projects, packaging the good ideas together if they are compatible.
In addition, what you don't want to do with the data warehouse is
important, so put a boundary around your projects based on the organization's
specific needs. In the end, however, a measurement program isn't
about just getting data—you have to apply the knowledge and
take action to make it work.