for the beginners and laterals . . .

Thursday, June 17, 2021

compare star schema and snowflake schema in datawarehouse modeling

Star Schema	Snowflake Schema
Simple Database Design.	Very Complex Database Design.
De-normalized Data structure.	Normalized Data Structure.
Query also run faster.	Query runs slower comparatively star schema.
It contains one fact table surrounded by dimension tables.	One fact table surrounded by dimension table which are in turn surrounded by dimension table
It is easy to understand and provides optimal disk usage	It uses smaller disk space.
Only single join creates the relationship between the fact table and any dimension tables.	It requires many joins to fetch the data.
Single Dimension table contains aggregated data.	Data Split into different Dimension Tables.
High level of data redundancy	Very low-level data redundancy
Cube processing is faster.	Cube processing might be slow because of the complex join.
Hierarchies for the dimensions are stored in the dimensional table.	Hierarchies are divided into separate tables.
Offers higher performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.	It is represented by centralized fact table which unlikely connected with multiple dimensions.

Sunday, May 30, 2021

Standard SQL vs Legacy SQL - functions

Standard SQL	Legacy SQL
*#standardSQL* *SELECT* *repository.url* *FROM* *`bigquery-public-data.samples.github_nested`* *LIMIT 5;*	*#legacySQL* *SELECT* *repository.url* *FROM* *[bigquery-public-data.samples.github_nested]* *LIMIT 5;*
Numeric
SAFE_CAST(x AS INT64)	INTEGER(x)
SAFE_CAST(x AS INT64)	CAST(x AS INTEGER)
APPROX_COUNT_DISTINCT(x)	COUNT(DISTINCT x)
COUNT(DISTINCT x)	EXACT_COUNT_DISTINCT(x)
APPROX_QUANTILES(x, buckets)	QUANTILES(x, buckets + 1)
APPROX_TOP_COUNT(x, num)	TOP(x, num), COUNT(*)
MOD(x, y)	x % y
Datetime
TIMESTAMP_DIFF(t1, t2, DAY)	DATEDIFF(t1, t2)
CURRENT_TIMESTAMP	NOW
FORMAT_TIMESTAMP(fmt, t)	STRFTIME_UTC_USEC(t, fmt)
TIMESTAMP_TRUNC(t, DAY)	UTC_USEC_TO_DAY(t)
REGEXP_CONTAINS(s, pattern)	REGEXP_MATCH(s, pattern)
x IS NULL	IS_NULL(x)
Strings
SAFE_CAST(x AS STRING)	STRING(x)
SAFE_CAST(x AS STRING)	CAST(x AS STRING)
SUBSTR(s, 0, len)	LEFT(s, len)
SUBSTR(s, -len)	RIGHT(s, len)
STRPOS(s, "abc") > 0 or s LIKE '%abc%'	s CONTAINS "abc"
STRING_AGG(s, sep)	GROUP_CONCAT_UNQUOTED(s, sep)
IFNULL(LOGICAL_OR(x), false)	SOME(x)
IFNULL(LOGICAL_AND(x), true)	EVERY(x)
Arrays
ARRAY_AGG(x)	NEST(x)
ANY_VALUE(x)	ANY(x)
arr[SAFE_ORDINAL(index)]	NTH(index, arr) WITHIN RECORD
ARRAY_LENGTH(arr)	COUNT(arr) WITHIN RECORD
Url / IP Address Functions
NET.HOST(url)	HOST(url)
NET.PUBLIC_SUFFIX(url)	TLD(url)
NET.REG_DOMAIN(url)	DOMAIN(url)
NET.IPV4_TO_INT64( NET.IP_FROM_STRING( addr_string))	PARSE_IP(addr_string)
NET.IP_TO_STRING( NET.IPV4_FROM_INT64( addr_int64 & 0xFFFFFFFF))	FORMAT_IP(addr_int64)
NET.IP_FROM_STRING(addr_string)	PARSE_PACKED_IP(addr_string)
NET.IP_TO_STRING(addr_bytes)	FORMAT_PACKED_IP(addr_bytes)