| Standard SQL | Legacy SQL |
| --- | --- |
| #standardSQL SELECT repository.url FROM `bigquery-public-data.samples.github_nested` LIMIT 5; | #legacySQL SELECT repository.url FROM [bigquery-public-data.samples.github_nested] LIMIT 5; |

Numeric

| Standard SQL | Legacy SQL |
| --- | --- |
| SAFE_CAST(x AS INT64) | INTEGER(x) |
| SAFE_CAST(x AS INT64) | CAST(x AS INTEGER) |
| APPROX_COUNT_DISTINCT(x) | COUNT(DISTINCT x) |
| COUNT(DISTINCT x) | EXACT_COUNT_DISTINCT(x) |
| APPROX_QUANTILES(x, buckets) | QUANTILES(x, buckets + 1) |
| APPROX_TOP_COUNT(x, num) | TOP(x, num), COUNT(*) |
| MOD(x, y) | x % y |

Datetime

| Standard SQL | Legacy SQL |
| --- | --- |
| TIMESTAMP_DIFF(t1, t2, DAY) | DATEDIFF(t1, t2) |
| CURRENT_TIMESTAMP | NOW |
| FORMAT_TIMESTAMP(fmt, t) | STRFTIME_UTC_USEC(t, fmt) |
| TIMESTAMP_TRUNC(t, DAY) | UTC_USEC_TO_DAY(t) |
| REGEXP_CONTAINS(s, pattern) | REGEXP_MATCH(s, pattern) |
| x IS NULL | IS_NULL(x) |

Strings

| Standard SQL | Legacy SQL |
| --- | --- |
| SAFE_CAST(x AS STRING) | STRING(x) |
| SAFE_CAST(x AS STRING) | CAST(x AS STRING) |
| SUBSTR(s, 0, len) | LEFT(s, len) |
| SUBSTR(s, -len) | RIGHT(s, len) |
| STRPOS(s, "abc") > 0 or s LIKE '%abc%' | s CONTAINS "abc" |
| STRING_AGG(s, sep) | GROUP_CONCAT_UNQUOTED(s, sep) |
| IFNULL(LOGICAL_OR(x), false) | SOME(x) |
| IFNULL(LOGICAL_AND(x), true) | EVERY(x) |

Arrays

| Standard SQL | Legacy SQL |
| --- | --- |
| ARRAY_AGG(x) | NEST(x) |
| ANY_VALUE(x) | ANY(x) |
| arr[SAFE_ORDINAL(index)] | NTH(index, arr) WITHIN RECORD |
| ARRAY_LENGTH(arr) | COUNT(arr) WITHIN RECORD |

URL / IP Address Functions

| Standard SQL | Legacy SQL |
| --- | --- |
| NET.HOST(url) | HOST(url) |
| NET.PUBLIC_SUFFIX(url) | TLD(url) |
| NET.REG_DOMAIN(url) | DOMAIN(url) |
| NET.IPV4_TO_INT64(NET.IP_FROM_STRING(addr_string)) | PARSE_IP(addr_string) |
| NET.IP_TO_STRING(NET.IPV4_FROM_INT64(addr_int64 & 0xFFFFFFFF)) | FORMAT_IP(addr_int64) |
| NET.IP_FROM_STRING(addr_string) | PARSE_PACKED_IP(addr_string) |
| NET.IP_TO_STRING(addr_bytes) | FORMAT_PACKED_IP(addr_bytes) |
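To make the mapping concrete, here is a minimal sketch of migrating one aggregate from Legacy SQL to Standard SQL, using the same public sample table referenced above (`bigquery-public-data.samples.github_nested`). Note the changed table quoting ([...] vs backticks) as well as the function rename:

```sql
-- Legacy SQL: approximate distinct count (Legacy COUNT(DISTINCT) is an estimate)
#legacySQL
SELECT COUNT(DISTINCT repository.url)
FROM [bigquery-public-data.samples.github_nested];

-- Standard SQL equivalent: the approximate version is named explicitly
#standardSQL
SELECT APPROX_COUNT_DISTINCT(repository.url)
FROM `bigquery-public-data.samples.github_nested`;
```

If an exact count is required, Standard SQL's plain `COUNT(DISTINCT x)` corresponds to Legacy's `EXACT_COUNT_DISTINCT(x)`, per the table above.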
Tips and good resources for all: Oracle, Big Data, Hadoop, Unix, Linux

Sunday, May 30, 2021
Standard SQL vs Legacy SQL - functions

Wednesday, May 5, 2021
Difference between Cloud Storage and HDFS
| Feature | Cloud Storage | HDFS |
| --- | --- | --- |
| I/O variance | Higher variance; can be mitigated with caching and read replicas, or by considering Cloud Bigtable or Cloud Datastore for latency-sensitive workloads | Lower variance |
| File appends or truncates | Not supported; objects can only be overwritten as a whole | Supported |
| POSIX compliance | Not compliant | Not fully compliant |
| File system information | No block or directory metadata exposed | `hadoop fsck -files -blocks` exposes directory and block information in HDFS |
| Request latency | Greater round-trip latency | Lower latency |
| Cost | Up to 57% lower total cost of ownership | Much higher compared to Cloud Storage |
| Separation of storage and compute | Multiple clusters can access the same location via gs:// paths; much better performance on Persistent Disk | Single-cluster access via hdfs:// from within the cluster; lower performance due to low-grade HDDs |
| Interoperability | Seamless connectivity between Spark and Hadoop instances; exported files can be imported into BigQuery; easier to manage Cloud Dataflow tasks | Standalone cluster operations; access between instances and storage must be provisioned manually |
| Storage management | Less overhead for the operations team | Much more overhead for the operations team |
| Startup time | Quicker job startup during job execution | Jobs take significant time to start up |
| Security | Built-in cloud security and access control via IAM | Manual security and access setup; rotating security keys is an overhead for operations teams |
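The "separation between Cloud Storage and HDFS" row can be illustrated with a few commands. This is a sketch with a placeholder bucket and namenode address (`my-dataproc-bucket`, `namenode:8020` are assumptions, not real endpoints): Hadoop tooling addresses Cloud Storage through the same `hadoop fs` interface, only the scheme changes.

```
# List a dataset in Cloud Storage directly (accessible from any cluster or machine)
gsutil ls gs://my-dataproc-bucket/warehouse/

# The same data through Hadoop tooling, via the Cloud Storage connector (gs:// scheme)
hadoop fs -ls gs://my-dataproc-bucket/warehouse/

# Equivalent HDFS listing, reachable only from inside the cluster that owns it
hadoop fs -ls hdfs://namenode:8020/warehouse/
```

Because `gs://` paths are cluster-independent, ephemeral Dataproc clusters can be deleted after a job without losing data, which is what drives the storage-management and cost rows above.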