Prerequisite – Introduction to Big Data, Benefits of Big Data
A star schema is a type of data modeling technique used in data warehousing to represent data in a structured and intuitive way. In a star schema, the data is organized into a central fact table containing the measures of interest, surrounded by dimension tables that describe the attributes of the measures.
The fact table of a star schema contains the measures or metrics that are of interest to the user or organization. For example, in a sales data warehouse, the fact table might contain sales revenue, units sold, and profit margins. Each record in the fact table represents a specific event or transaction, such as a sale or order.
The dimension tables in a star schema contain the descriptive attributes of the measures in the fact table. These attributes are used to slice and dice the data in the fact table, allowing users to analyze the data from different perspectives. For example, in a sales data warehouse, the dimension tables can include product, customer, time, and location.
In a star schema, each dimension table is joined to the fact table through a foreign key relationship. This allows users to query the fact table data using attributes from the dimension tables. For example, a user might want to view sales revenue by product category or by region and time period.
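The foreign key join described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the table names (fact_sales, dim_product), the column names, and the sample rows are all hypothetical and chosen only to show revenue grouped by product category.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table and one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue    REAL
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'),
        (2, 'Phone',  'Electronics'),
        (3, 'Desk',   'Furniture');
    INSERT INTO fact_sales VALUES
        (1, 1, 1200.0),
        (2, 2,  800.0),
        (3, 3,  300.0),
        (4, 3,  200.0);
""")


def revenue_by_category(connection):
    """Join the fact table to the product dimension and sum revenue per category."""
    return connection.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(revenue_by_category(conn))
# [('Electronics', 2000.0), ('Furniture', 500.0)]
```

The query touches the fact table once and reaches the descriptive attribute (category) through a single join, which is the access pattern a star schema is designed around.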
The star schema is a popular data modeling technique in data warehousing because it is easy to understand and query. Its simple structure allows for fast query response times and efficient use of database resources. In addition, a star schema can be extended easily by adding new dimension tables or by adding new measures to the fact table, making it a scalable and flexible solution for data storage.
The star schema is the simplest and most fundamental of the data mart schemas. It is widely used to build dimensional data warehouses and data marts. It includes one or more fact tables that reference any number of dimension tables. The star schema is a special case of the snowflake schema, and it is efficient for handling basic queries.
It is called a star schema because its physical model resembles a star, with the fact table at the center and the dimension tables on the periphery forming the points of the star. Here is an example to demonstrate the star schema:
In this example, SALES is a fact table with the attributes Product ID, Order ID, Customer ID, Employee ID, Total, Quantity, and Discount. The ID attributes are foreign keys that reference the dimension tables, while Total, Quantity, and Discount are the measures. The employee dimension table contains the attributes: Emp ID, Emp Name, Title, Department, and Region. The product dimension table contains the attributes: Product ID, Product Name, Product Category, and Unit Price. The customer dimension table contains the attributes: Customer ID, Customer Name, Address, City, and Postal Code. The time dimension table contains the attributes: Order ID, Order Date, Year, Quarter, and Month.
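The SALES example can be sketched as table definitions using Python's built-in sqlite3 module. The attribute lists follow the description above, but the table names (fact_sales, dim_employee, and so on), the snake_case column names, the column types, and the sample rows are assumptions made for illustration. The helper insert_sale is likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE dim_employee (
        emp_id INTEGER PRIMARY KEY, emp_name TEXT, title TEXT,
        department TEXT, region TEXT
    );
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, product_name TEXT,
        product_category TEXT, unit_price REAL
    );
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY, customer_name TEXT,
        address TEXT, city TEXT, postal_code TEXT
    );
    CREATE TABLE dim_time (
        order_id INTEGER PRIMARY KEY, order_date TEXT,
        year INTEGER, quarter INTEGER, month INTEGER
    );
    -- Fact table: four foreign keys to the dimensions plus three measures.
    CREATE TABLE fact_sales (
        product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
        order_id    INTEGER NOT NULL REFERENCES dim_time(order_id),
        customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
        emp_id      INTEGER NOT NULL REFERENCES dim_employee(emp_id),
        total REAL, quantity INTEGER, discount REAL
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO dim_employee VALUES (1, 'Asha', 'Sales Rep', 'Sales', 'West');
    INSERT INTO dim_product  VALUES (10, 'Laptop', 'Electronics', 1200.0);
    INSERT INTO dim_customer VALUES (100, 'Acme Ltd', '1 Main St', 'Springfield', '12345');
    INSERT INTO dim_time     VALUES (1000, '2024-01-15', 2024, 1, 1);
    INSERT INTO fact_sales   VALUES (10, 1000, 100, 1, 2280.0, 2, 120.0);
""")


def insert_sale(connection, row):
    """Insert one fact row; return False if it references a missing dimension row."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?)", row
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_sale(conn, (10, 1000, 100, 1, 1140.0, 1, 60.0)))  # all keys exist
print(insert_sale(conn, (99, 1000, 100, 1, 50.0, 1, 0.0)))     # product 99 is missing
```

The first call succeeds and the second is rejected, because every fact row must point at an existing row in each dimension table.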
Star schema model: In a star schema, business process data, which holds quantitative data about a business, is stored in fact tables, while dimensions are descriptive attributes related to the fact data. Selling price, sale quantity, distance, speed, and weight measurements are some examples of fact data in a star schema. A star schema with many dimensions is often called a centipede schema. A star schema is easy to handle when its dimensions have few attributes.
Advantages of the star schema:
- Simpler queries: The join logic of a star schema is much simpler than the join logic needed to retrieve data from a highly normalized transactional schema.
- Simplified business reporting logic: Compared to a highly normalized transactional schema, the star schema simplifies common business reporting logic, such as as-of reporting and period-over-period reporting.
- Feeding cubes: The star schema is widely used by OLAP systems to build OLAP cubes efficiently. In fact, major OLAP systems offer a ROLAP mode of operation that can use a star schema as a source without building a cube structure.
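As one illustration of the reporting advantage, a period-over-period report can be sketched with Python's built-in sqlite3 module. The tables fact_sales and dim_time, their columns, and the sample figures are hypothetical. The change between quarters is computed in Python here to keep the SQL minimal, which is a design choice for this sketch rather than a requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_time (
        order_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER
    );
    CREATE TABLE fact_sales (
        order_id INTEGER REFERENCES dim_time(order_id),
        total    REAL
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO dim_time VALUES (1, 2024, 1), (2, 2024, 1), (3, 2024, 2), (4, 2024, 3);
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 150.0), (3, 300.0), (4, 270.0);
""")


def quarter_over_quarter(connection):
    """Return (year, quarter, total, change versus previous quarter) tuples."""
    rows = connection.execute("""
        SELECT t.year, t.quarter, SUM(f.total)
        FROM fact_sales AS f
        JOIN dim_time AS t ON t.order_id = f.order_id
        GROUP BY t.year, t.quarter
        ORDER BY t.year, t.quarter
    """).fetchall()

    report = []
    previous = None
    for year, quarter, total in rows:
        # The first period has nothing to compare against.
        change = None if previous is None else total - previous
        report.append((year, quarter, total, change))
        previous = total
    return report


for line in quarter_over_quarter(conn):
    print(line)
# (2024, 1, 250.0, None)
# (2024, 2, 300.0, 50.0)
# (2024, 3, 270.0, -30.0)
```

A single join to the time dimension is enough to group the measure by period, which is what keeps this kind of report simple compared with a highly normalized schema.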
Disadvantages of star schema:
- Data integrity is not enforced well, because the schema is highly denormalized.
- It is not as flexible in terms of analytical needs as a normalized data model.
- Star schemas do not readily support many-to-many relationships between business entities.