Data Warehouse Tutorial

What is Data Warehousing?

A data warehouse can be referred to as a relational database that is designed for query and analysis. It often contains historical data derived from transaction data, but it can also contain data from other sources. The main advantage of a data warehouse is that the analysis workload is separated from the transaction workload, thereby enabling the organization to consolidate data from various sources. A data warehouse environment which includes the following:

  • Relational database
  • Extraction,transformation,loading solution
  • Online analytical processing engine
  • Client analysis tool

Other applications are concerned with applications that manage the process of gathering and delivering data to business users.

A data warehouse can consolidate and integrate information from various external and internal sources and arrange the data in a more meaningful format, which can be thereby used for generating business decisions. Thus, a data warehouse can be defined as a single, complete and consistent store of data obtained from a variety of sources and can be made available to end-users, which would help them understand and use it in a business context.

A data warehouse is:

  • Subject oriented
  • Integrated
  • Time-varying
  • Nonvolatile
  • Gathering of data which helps in carrying out organizational business decision

Characteristics of the data warehouse

  • Subject-oriented ? The primary purpose of designing a data warehouse is to analyze data. For example, if an organization is interested in knowing company sales data, the organization can build a separate data warehouse that concentrates only on sales. This data warehouse can provide an answer to the following questions:
  • Who was declared as the best customer for a particular item?
  • Who can be most likely to be their best customer for next year?

  (2) Integrated ? Integration is related to subject orientation. Data can be collected from various sources, but all the collected data must be converted into any consistent format. Then only should it be inserted into the data warehouse? All the collected data must be free from naming conflicts and inconsistencies among units of measure. When this objective is achieved, then the data warehouse is said to be integrated.

Data Warehouse Tutorial

(3) Time-variant ? all data warehouses must look for change with time. This is what is meant by time-variant. To discover trends, identify hidden patterns and relationships in business, there is a need for large amounts of data for carrying out analysis.

The time-variant nature which is present in the data warehouse enables the following:

  • Analysis of the past.
  • Relating information to present.
  • Forecasting future.

 (4) Non-volatile ? means that it should not change over time once you insert data into the data warehouse. It must remain constant. Data are extracted from various operational systems, and data gathered from outside sources are subject to transformation, integration. After that, it is stored inside the data warehouse.

We can transfer the data from operational sources into the data warehouse at some specific intervals. The data which are present in the data warehouse are considered to be less volatile when compared with the data which is present in the operational database. The data which is present in the data warehouse is mainly used for performing query and analysis.

Data Warehouse Tutorial

(5) Data granularity ? in a data warehouse, for efficiency data are summarized at various levels and are kept inside the data warehouse. According to the query, there is a facility to go to the particular level of data and satisfy those queries. Data granularity in the context of a data warehouse refers to the level of detail. When the level of detail is more minor, data granularity is said to be more. On the other hand, if data is present at the lowest level of detail, there is a need to store more data in the data warehouse. Data granularity levels are decided based on the data types and expected system performance for queries.

Data Warehouse Tutorial

Data warehouse and data marts.

One of the most critical issues IT managers face every year is that "which should be built first? Data warehouse or data marts?"

Before building a data warehouse, there is a need to consider various factors and address relevant issues.

  •  Which approach should be used? Top-down or bottom-up?.
  • How should data marts be built? Dependent or independent?.
  • Firstly build a pilot or go with a full-fledged implementation?.

Prerequisites

Before learning data warehouse and mining subjects, students need to have a piece of knowledge about RDBMS and basic knowledge about database components.

Audience.

This tutorial will help students understand all the components involved in the data warehouse, the basics of dimensional modeling, star schema, data cleaning, data mining algorithms, and project management.

Problem

The information provided here is correct written easy language for your better understanding. If any issue found, please post the problem in the contact form.