In the last two articles (1, 2), we discussed the overall BI system and its different components. The important part, we know, is how each component of BI interacts with each other and provides the desired results. In the third article of this series, we'll discuss the importance of data analysis in an end-to-end BI system and how it can be achieved by using different analytical tools.
"Analytics" is the term that is used to analyze the data and produce results. Data analysis is the heart of any BI system. This data analysis process converts the raw data in information that can be presented to business owners to support their decision making using any data visualization or reporting tool. In the BI world, the data analysis process is known as Online Analytical Processing (OLAP).
Online Analytical Processing (OLAP)
Online Analysis Processing is a data analysis process. In this process, data can be analyzed using any available analytical tool and either provides results directly or stores information for further use. Before we discuss OLAP in further detail, I would like to tell you about the Online Transaction Processing (OLTP) system and how it's different from OLAP systems.
OLTP vs OLAP: In industry, we deal mainly with two types of systems: Real time, or transactional, systems and Decision Support Systems.
All real time or transactional systems come under OLTP systems. These systems help the user to sufficient real time needs, such as a banking system, retail system, ticket reservation system, and so on. These real time systems generate data with very high frequency and are known as transactional, or OLTP, systems. Conversely, OLAP systems are the opposite of OLTP systems; OLAP systems deal with historical data, not with real time, but OLTP systems work as a data source for an OLAP system. OLAP allows a user to quickly analyze the huge amount of data that has been summarized into multidimensional views and hierarchies.
BI systems help users in decision making and to know the trends to better understand an overall organization's health. This can be achieved by using historical data instead of real time data. Now, we can understand the difference between OLTP and OLAP systems and their need; it will help you understand the behavior of systems and designing the solution to meet different business objectives.
Further, we'll discuss key terminologies of OLAP that give basic understanding to start OLAP design.
Key Terminology in OLAP
OLAP is a multidimensional structure that makes data available for analytics. Technically, OLAP pre-calculates, summarizes, and stores the data in compressed form; this makes the reporting and predictive analysis fast and interactive exploration of aggregated data from different perspective possible by using various tools, such as Business Objects, IBM Cognos, SQL Server Analysis Services, MicroStrategy, and so on.
We should be aware of key terminologies for OLAP system designing. Let's discuss them:
Cubes: A cube contains a whole set of data that is required and becomes a single source of truth for analysis. It's multidimensional in nature and has two main elements: Dimensions and Facts. In cubes, data (measures) are categorized by dimensions and cubes are pre-summarized across dimensions to drastically improve query time over relational databases. Also, cubes are designed for analytical purposes, so that they can report on millions of records at a time.
Dimensions: Dimensions are objects that reside in a cube and have a collection of related attributes to provide information about data (measure). They provide a base for data analysis on stored data (measures) and can be related to one or more measures of groups in the cube. Also, Dimensions define the structure of the cube that helps to slice and dice the data. There are many type of dimensions that can be available in any cube, based on need. A few of them are as follows:
- Role playing
- Parent child
- Slowly changing
- Fast changing
Facts: Another type of objects residing in a cube and stored the data to be analyzed by dimension data is called measure/s. Related measures reside in one table known as a Fact table. As with dimensions, there are many types of facts and can be available in cubes. They are as follows:
Members: Members are a related set of attributes that reside in a dimension.
Level: A level is a collection of members that have the same distance from the root of the hierarchy.
Hierarchies: A logical tree structure that organizes the members of a dimension such that each member has one parent member and zero or more child members.
Cube Storage: An OLAP system has three storage approaches; we need to choose any one of the storage strategies based on data availability and the need for successful implementation of a BI system.
- MOLAP: In this mode, when the cube process, the source data is pulled from the relational store, and the required aggregation is performed. Finally, the data is stored in a compressed and optimized multidimensional format.
- ROLAP: It does not pull data from the underlying relational database source to the OLAP cube server but rather both cube detail data and aggregation stay at the relational database source.
- HOLAP: It is a hybrid of MOLAP and ROLAP and attempts to provide the greater data capacity of ROLAP and the fast processing and high query performance of MOLAP. In the HOLAP storage mode, the cube detail data remains in the underlying relational data store and the aggregations are stored on the OLAP cube server.
Data mining is a way to predict the future. There are tools available in the market that help business owners to read/analyze the data and predict the future. Data mining tells you the probability about your customer base and their choices, as in what they do and don't like. It is the process of discovering actionable information from large sets of data; also, data mining uses mathematical analysis to derive patterns and trends that exist in the data.
In this article, we discussed OLAP and its key terminologies. It's very important to know all the key terminologies upfront before you start to design an OLAP system. OLAP helps in data analysis and predictive analysis. In next article of this series, we'll discuss information delivery or data visualization.
About the Author
Anoop Agarwal worked for Microsoft for almost six and half years and has 12+ years of IT experience. Currently, he is working as a DW/BI Architect in one of the top Fortune Companies. He has worked on end-to-end delivery of enterprise-scale DW/BI projects. He has strong knowledge about database, data warehouse, and business intelligence application design and development. Also, he worked extensively on SQL Server, designing of ETL using SSIS, SSAS, SSRS, and SQL Azure.