Data Warehousing involves structuring data for fast querying and reporting. Two common dimensional modeling techniques are:
- Star Schema
- Snowflake Schema
🔸 What is a Star Schema?
A Star Schema is a denormalized data model where:
- There is a central fact table
- Surrounded by dimension tables
- Dimension tables are not normalized
✅ Key Features:
- Simple and intuitive
- Fast query performance
- Less complex joins
📊 Example:
Fact Table: SalesFact
SaleID | DateKey | ProductKey | CustomerKey | StoreKey | Revenue |
---|---|---|---|---|---|
1 | 202301 | 101 | 1001 | 1 | 500 |
Dimension Tables:
DateDim
: DateKey, FullDate, Month, Quarter, YearProductDim
: ProductKey, ProductName, Category, BrandCustomerDim
: CustomerKey, Name, Gender, AgeGroupStoreDim
: StoreKey, StoreName, Region
🟨 No normalization inside dimension tables. All attributes live in a flat structure.
🔹 What is a Snowflake Schema?
A Snowflake Schema is a normalized data model where:
- Dimension tables are split into multiple related tables
- Reduces data redundancy
- Requires more joins
✅ Key Features:
- More normalized
- Saves storage
- Complex query structure
📊 Example:
Fact Table: SalesFact
(Same as Star)
Normalized Dimensions:
ProductDim
: ProductKey, ProductName, SubCategoryKeySubCategoryDim
: SubCategoryKey, SubCategoryName, CategoryKeyCategoryDim
: CategoryKey, CategoryName
🟦 Normalized dimensions form a snowflake-like structure due to multiple levels of hierarchy.
🆚 Key Differences
Feature | ⭐ Star Schema | ❄️ Snowflake Schema |
---|---|---|
Structure | Denormalized | Normalized |
Joins | Fewer | More |
Query Performance | Faster | Slower (due to joins) |
Data Redundancy | Higher | Lower |
Storage Efficiency | Lower | Higher |
Design Complexity | Simpler | More complex |
Use Case | Fast queries, dashboards | Complex ETL, large dimensions |
💡 When to Use What?
Use Case | Recommended Schema |
---|---|
Dashboards, Ad-hoc reports | Star Schema |
Data marts for fast OLAP queries | Star Schema |
Central EDW with reusable dimensions | Snowflake Schema |
Data model has large dimensions (like Customer, Product Hierarchies) | Snowflake Schema |
✅ Summary
Aspect | Star Schema | Snowflake Schema |
---|---|---|
Simplicity | Easy to understand and use | More structured but complex |
Performance | Better for OLAP/reporting | May slow down due to joins |
Maintenance | Easier | Harder due to multiple related tables |
Normalization | Denormalized | Normalized |
🎯 Real Example – Retail
Star Schema:
SalesFact
— total salesProductDim
— includes product name, brand, category all in one
Snowflake Schema:
ProductDim
— product-level infoBrandDim
— connected to ProductCategoryDim
— connected to Product through subcategory
Discover more from Technology with Vivek Johari
Subscribe to get the latest posts sent to your email.