Star Schema |
Snowflake Schema |
Simple Database Design. |
Very Complex Database Design. |
De-normalized Data structure. |
Normalized Data Structure. |
Query also run faster. |
Query runs slower comparatively star schema. |
It contains one fact table surrounded by
dimension tables. |
One fact table surrounded by dimension table
which are in turn surrounded by dimension table |
It is easy to understand and provides optimal disk usage |
It uses smaller disk space. |
Only single join creates the relationship
between the fact table and any dimension tables. |
It requires many joins to fetch the data. |
Single Dimension table contains aggregated
data. |
Data Split into different Dimension Tables. |
High level of data redundancy |
Very low-level data redundancy |
Cube processing is faster. |
Cube processing might be slow because of the
complex join. |
Hierarchies for the dimensions are stored in
the dimensional table. |
Hierarchies are divided into separate
tables. |
Offers higher performing queries using Star
Join Query Optimization. Tables may be connected with multiple dimensions. |
It is represented by
centralized fact table which unlikely connected with multiple dimensions. |
Tips and good resources for all.. Oracle, Big data, Hadoop, Unix, Linux
Thursday, June 17, 2021
compare star schema and snowflake schema in datawarehouse modeling
Sunday, May 30, 2021
Standard SQL vs Legacy SQL - functions
Standard
SQL |
Legacy
SQL |
#standardSQL SELECT repository.url FROM `bigquery-public-data.samples.github_nested` LIMIT 5; |
#legacySQL SELECT repository.url FROM [bigquery-public-data.samples.github_nested] LIMIT 5; |
Numeric |
|
SAFE_CAST(x
AS INT64) |
INTEGER(x) |
SAFE_CAST(x
AS INT64) |
CAST(x
AS INTEGER) |
APPROX_COUNT_DISTINCT(x) |
COUNT(DISTINCT
x) |
COUNT(DISTINCT
x) |
EXACT_COUNT_DISTINCT(x) |
APPROX_QUANTILES(x,
buckets) |
QUANTILES(x, buckets
+ 1) |
APPROX_TOP_COUNT(x,
num) |
TOP(x, num), COUNT(*) |
MOD(x,
y) |
x
% y |
Datetime |
|
TIMESTAMP_DIFF(t1,
t2, DAY) |
DATEDIFF(t1, t2) |
CURRENT_TIMESTAMP |
NOW |
FORMAT_TIMESTAMP(fmt,
t) |
STRFTIME_UTC_USEC(t, fmt) |
TIMESTAMP_TRUNC(t,
DAY) |
UTC_USEC_TO_DAY(t) |
REGEXP_CONTAINS(s,
pattern) |
REGEXP_MATCH(s, pattern) |
x
IS NULL |
IS_NULL(x) |
Strings |
|
SAFE_CAST(x
AS STRING) |
STRING(x) |
SAFE_CAST(x
AS STRING) |
CAST(x
AS STRING) |
SUBSTR(s,
0, len) |
LEFT(s, len) |
SUBSTR(s,
-len) |
RIGHT(s, len) |
STRPOS(s,
"abc") > 0 or s LIKE '%abc%' |
s
CONTAINS "abc" |
STRING_AGG(s,
sep) |
GROUP_CONCAT_UNQUOTED(s, sep) |
IFNULL(LOGICAL_OR(x), false) |
SOME(x) |
IFNULL(LOGICAL_AND(x), true) |
EVERY(x) |
Arrays |
|
ARRAY_AGG(x) |
NEST(x) |
ANY_VALUE(x) |
ANY(x) |
arr[SAFE_ORDINAL(index)] |
NTH(index, arr)
WITHIN RECORD |
ARRAY_LENGTH(arr) |
COUNT(arr)
WITHIN RECORD |
Url / IP Address Functions |
|
NET.HOST(url) |
HOST(url) |
NET.PUBLIC_SUFFIX(url) |
TLD(url) |
NET.REG_DOMAIN(url) |
DOMAIN(url) |
NET.IPV4_TO_INT64( NET.IP_FROM_STRING( addr_string)) |
PARSE_IP(addr_string) |
NET.IP_TO_STRING( NET.IPV4_FROM_INT64( addr_int64
& 0xFFFFFFFF)) |
FORMAT_IP(addr_int64) |
NET.IP_FROM_STRING(addr_string) |
PARSE_PACKED_IP(addr_string) |
NET.IP_TO_STRING(addr_bytes) |
FORMAT_PACKED_IP(addr_bytes) |