Data Warehouse Glossary
Additive
Describes a fact (or measure) that can be summarized through addition.
An additive fact is the most common type of fact. Examples include
sales, cost, and profit. Contrast with nonadditive and semi-additive.
See Also: factadvisor, Summary Advisor.
Aggregate
Summarized data. For example, unit sales of a particular product
could be aggregated by day, month, quarter and yearly sales.
Aggregation
The process of consolidating data values into a single value.
For example, sales data collected daily may be aggregated to week
level, the week data aggregated to month level, and so on. Aggregation
may also be referred to as summarization or summary data.
Ancestor
A value that is at a level higher than another given value in
a hierarchy. For example, with the time value, 2007 is the ancestor
of the values Q1-2007 and Jan-2007.
Attribute
A descriptive characteristic of one or more levels. For example,
the product dimension for a cellphone manufacturer may contain a
level called 'item', with a number of attributes such as 'color',
'network', or 'mp3'. Attributes represent logical groupings that
enable end users to select data based on like characteristics. In
this case they may wish a red Verizon Cellphone that includes an
mp3 player.
Note:
- In relational modeling, an attribute is defined as a characteristic
of an entity.
- In Oracle9i, an attribute is a column in a dimension that characterizes
elements of a single level.
Cardinality
From an OLTP perspective, cardinality refers to the number of
rows in a table.
From a data warehousing perspective, it typically refers to the
number of distinct values in a column.
For most data warehouse DBAs, the more important element is the
degree of cardinality.
Child
A value at the level under a higher [ancestor] value in a hierarchy.
In the time dimension, the value Q1-2007 is the child of the value
2007. A value can be a child for more than one parent if the child
value belongs to multiple hierarchies.
The process of resolving inconsistencies and fixing the anomalies
in source data, typically as part of the ETL process.
A repository standard used by Oracle data warehousing, and decision
support. The CWM repository schema is a standalone product that
other products can share--each product owns only the objects within
the CWM repository that it creates.
cross product
A procedure for combining the elements in multiple sets. For example,
given two columns, each element of the first column is matched with
every element of the second column. A simple example is illustrated
as follows:
Col1 Col2 Cross Product
---- ---- -------------
a c ac
b d ad
bc
bd
Cross products are performed when grouping sets are concatenated,
as described in Chapter 18, "SQL for Aggregation in Data Warehouses".
Data Mart
A data warehouse that is designed for a particular line of business,
such as sales, marketing, or finance. A data mart may be an independent
data silo or the data may be derived from an enterprise-wide data
warehouse. In an independent data mart, data can be collected directly
from sources.
Data Source
A database, application, repository, or file that contributes
data to a warehouse.
Data Warehouse
A relational database designed for query and analysis rather than
transaction processing. A data warehouse gnerally contains historical
data derived from transaction data, but may also include data from
other sources. The DW separates analysis workload from transaction
workload and enables a business to consolidate data from several
sources.
In addition to a relational database, a data warehouse environment
often consists of an ETL solution, an OLAP engine, client analysis
tools, and other applications that manage the process of gathering
data and delivering it to business users.
Degree of Cardinality
The number of unique values of a column divided by the total number
of rows in the table. This is used when deciding which indexes to
build. It is recommended to use bitmap indexes on low degree [under
1%] of cardinality columns and B-tree indexes on high degree of
cardinality columns.
Denormalize
The process of allowing redundancy in a table. Contrast with normalize.
Derived Fact (or Measure)
A fact (or measure) that is generated from existing data using
a mathematical operation or a data transformation. For instance:
averages, totals, percentages, and differences.
Dimension
The term dimension is used in two ways:
- To specify the members of a data set. The 3 most common dimensions
in sales-oriented data warehouses are: time, geography, and product.
Most dimensions have hierarchies.
- As an object defined in a database to enable queries to navigate
dimensions.
- In Oracle9i, a dimension is a database object that defines
hierarchical (parent/child) relationships between pairs of
column sets.
- In Oracle Express, a dimension is a database object that
consists of a list of values.
Dimension Table
Dimension tables describe the business entities of an enterprise,
represented as hierarchical, categorical information such as time,
departments, locations, and products. Also called lookup or reference
tables.
Dimension Value
One element in the list that makes up a dimension. For instance,
a clothing company might have dimension values in the product dimension
called DRESS and SHIRT. Values in the geography dimension might
include California and Texas. Values in the time dimension might
include OCT2007 and NOV2007.
Drill
Drilling typically involves navigating up and down through the
levels in a hierarchy to navigate from one item to a set of related
items. When selecting data, users can expand or collapse a hierarchy
by drilling down or up in it, respectively.
Drill down
To expand the view to include child values that are associated
with parent values in the hierarchy.
Drill up
To collapse the list of descendant values that are associated
with a parent value in the hierarchy.
Element
An object or process. For example, a dimension is an object, a
mapping is a process, and both are elements.
Entity
Entity is used in database modeling. In relational databases,
it typically maps to a table.
ETL
Extraction, transformation, and loading. ETL refers to the methods
involved in accessing and manipulating source data and loading it
into a data warehouse. The order in which these processes are performed
varies.
Note that ETT (extraction, transformation, transportation) and
ETM (extraction, transformation, move) are sometimes used instead
of ETL.
Extraction
The process of taking data out of a source as part of an initial
phase of ETL.
Fact
See: Measure
Fact Table
A table in a star schema that contains facts. A fact table typically
has two types of columns: those that contain facts and those that
are foreign keys to dimension tables. The primary key of a fact
table is usually a composite key that is made up of all of its foreign
keys.
A fact table might contain either detail level facts or facts
that have been aggregated (fact tables that contain aggregated facts
are often instead called summary tables). A fact table usually contains
facts with the same level of aggregation.
Fast Refresh
An operation that applies only the data changes to a materialized
view, thus eliminating the need to rebuild the materialized view
from scratch.
File-to-Table Mapping
Maps data from flat files to tables in the warehouse.
Hierarchy
A logical structure that uses ordered levels as a means of organizing
data. A hierarchy can be used to define data aggregation; for example,
in a time dimension, a hierarchy might be used to aggregate data
from the Month level to the Quarter level to the Year level. Hierarchies
can be defined in Oracle9i as part of the dimension object. A hierarchy
can also be used to define a navigational drill path, regardless
of whether the levels in the hierarchy represent aggregated totals.
Level
A position in a hierarchy. For example, a time dimension might
have a hierarchy that represents data at the Month, Quarter, and
Year levels.
Level value table
A database table that stores the values or data for the levels
you created as part of your dimensions and hierarchies.
Mapping
The definition of the relationship and data flow between source
and target objects.
Materialized View
A pre-computed table comprising aggregated or joined data from
fact and possibly dimension tables. Also known as a summary or aggregate
table.
Measure
Data, usually numeric and additive, that can be examined and analyzed.
Examples include sales, cost, and profit. Fact and measure are synonymous;
fact is more commonly used with relational environments, measure
is more commonly used with multi-dimensional environments.
Metadata
Data that describes data and other structures, such as objects,
business rules, and processes. For example, the schema design of
a data warehouse is typically stored in a repository as metadata,
which is used to generate scripts used to build and populate the
data warehouse. A repository contains metadata.
Examples include: for data, the definition of a source to target
transformation that is used to generate and populate the data warehouse;
for information, definitions of tables, columns and associations
that are stored inside a relational modeling tool; for business
rules, discount by 10 percent after selling 1,000 items.
Model
An object that represents something to be made. A representative
style, plan, or design. Metadata that defines the structure of the
data warehouse.
Nonadditive
Describes a fact (or measure) that cannot be summarized through
addition. An example includes Average. Contrast with additive and
semi-additive.
Normalize
In a relational database, the process of removing redundancy in
data by separating the data into multiple tables. Contrast with
denormalize.
The process of removing redundancy in data by separating the data
into multiple tables.
OLAP [online analytical processing]
OLAP functionality is characterized by dynamic, multidimensional
analysis of historical data, which supports activities such as the
following:
• Calculating across dimensions and through hierarchies
• Analyzing trends
• Drilling up and down through hierarchies
• Rotating to change the dimensional orientation
OLAP tools can run against a multidimensional database or interact
directly with a relational database.
Online transaction processing [OLTP]
Online transaction processing. OLTP systems are optimized for
fast and reliable transaction handling. Compared to data warehouse
systems, most OLTP interactions will involve a relatively small
number of rows, but a larger group of tables.
Parallelism
Breaking down a task so that several processes do part of the
work. When multiple CPUs each do their portion simultaneously, very
large performance gains are possible.
Parallel Execution
Breaking down a task so that several processes do part of the
work. When multiple CPUs each do their portion simultaneously, very
large performance gains are possible.
Parent
A value at the level above a given value in a hierarchy. For example,
in a Time dimension, the value Q1-99 might be the parent of the
value Jan-99.
Partition
Very large tables and indexes can be difficult and time-consuming
to work with. To improve manageability, you can break your tables
and indexes into smaller pieces called partitions.
Pivoting
A transformation where each record in an input stream is converted
to many records in the appropriate table in the data warehouse.
This is particularly important when taking data from nonrelational
databases.
Publisher
Usually a database administrator who is in charge of creating
and maintaining schema objects that make up the Change Data Capture
system.
Refresh
The mechanism whereby materialized views are changed to reflect
new data.
Schema
A collection of related database objects. Relational schemas are
grouped by database user ID and include tables, views, and other
objects. Whenever possible, a sample schema called sh is used throughout
this Guide. See also: snowflake schema and star schema
Semi-additive
Describes a fact (or measure) that can be summarized through addition
along some, but not all, dimensions. Examples include headcount
and on hand stock. Contrast with additive and nonadditive.
Slice and Dice
An informal term referring to data retrieval and manipulation.
We can picture a data warehouse as a cube of data, where each axis
of the cube represents a dimension. To "slice" the data
is to retrieve a piece (a slice) of the cube by specifying measures
and values for some or all of the dimensions. When we retrieve a
data slice, we may also move and reorder its columns and rows as
if we had diced the slice into many small pieces. A system with
good slicing and dicing makes it easy to navigate through large
amounts of data.
Snowflake Schema
A type of star schema in which the dimension tables are partly
or fully normalized.
Source
A database, application, file, or other storage facility from
which the data in a data warehouse is derived.
Source System
A database, application, file, or other storage facility from
which the data in a data warehouse is derived.
Staging Area
A place where data is processed before entering the warehouse.
Staging File
A file used when data is processed before entering the warehouse.
Star Query
A join between a fact table and a number of dimension tables.
Each dimension table is joined to the fact table using a primary
key to foreign key join, but the dimension tables are not joined
to each other.
Star Schema
A relational schema whose design represents a multidimensional
data model. The star schema consists of one or more fact tables
and one or more dimension tables that are related through foreign
keys.
Subject Area
A classification system that represents or distinguishes parts
of an organization or areas of knowledge. A data mart is often developed
to support a subject area such as sales, marketing, or geography.
Subscribers
Consumers of the published change data. These are normally applications.
Summary
See: materialized view.
Summary Advisor
The Summary Advisor recommends which materialized views to retain,
create, and drop. It helps database administrators manage materialized
views. It is a GUI in Oracle Enterprise Manager, and has similar
capabilities to the DBMS_OLAP package.
Target
Holds the intermediate or final results of any part of the ETL
process. The target of the entire ETL process is the data warehouse.
See Also: data warehouse and ETL
Third Normal Form (3NF)
A classical relational database modeling technique that minimizes
data redundancy through normalization.
Third Normal Form Schema
A schema that uses the same kind of normalization as typically
found in an OLTP system. Third normal form schemas are sometimes
chosen for large data warehouses, especially environments with significant
data loading requirements that are used to feed data marts and execute
long-running queries.
Transformation
The process of manipulating data. Any manipulation beyond copying
is a transformation. Examples include cleansing, aggregating, and
integrating data from multiple sources.
Transportation
The process of moving copied or transformed data from a source
to a data warehouse.
Unique Identifier
An identifier whose purpose is to differentiate between the same
item when it appears in more than one place.
Update Window
The length of time available for updating a warehouse. For example,
you might have 8 hours at night to update your warehouse.
Update Frequency
How often a data warehouse is updated with new information. For
example, a warehouse might be updated nightly from an OLTP system.
Validation
The process of verifying metadata definitions and configuration
parameters.
Versioning
The ability to create new versions of a data warehouse project
for new requirements and changes.
Back To Top
Data Warehouse Index | OLAP | Developing
A Data Warehouse | Measuring
DW Quality | DW Glossary
|