Sunday, May 30, 2021
Standard SQL vs Legacy SQL - functions

Standard SQL | Legacy SQL
#standardSQL SELECT repository.url FROM `bigquery-public-data.samples.github_nested` LIMIT 5; | #legacySQL SELECT repository.url FROM [bigquery-public-data.samples.github_nested] LIMIT 5;

Numeric
SAFE_CAST(x AS INT64) | INTEGER(x)
SAFE_CAST(x AS INT64) | CAST(x AS INTEGER)
APPROX_COUNT_DISTINCT(x) | COUNT(DISTINCT x)
COUNT(DISTINCT x) | EXACT_COUNT_DISTINCT(x)
APPROX_QUANTILES(x, buckets) | QUANTILES(x, buckets + 1)
APPROX_TOP_COUNT(x, num) | TOP(x, num), COUNT(*)
MOD(x, y) | x % y

Datetime
TIMESTAMP_DIFF(t1, t2, DAY) | DATEDIFF(t1, t2)
CURRENT_TIMESTAMP | NOW
FORMAT_TIMESTAMP(fmt, t) | STRFTIME_UTC_USEC(t, fmt)
TIMESTAMP_TRUNC(t, DAY) | UTC_USEC_TO_DAY(t)
REGEXP_CONTAINS(s, pattern) | REGEXP_MATCH(s, pattern)
x IS NULL | IS_NULL(x)

Strings
SAFE_CAST(x AS STRING) | STRING(x)
SAFE_CAST(x AS STRING) | CAST(x AS STRING)
SUBSTR(s, 0, len) | LEFT(s, len)
SUBSTR(s, -len) | RIGHT(s, len)
STRPOS(s, "abc") > 0 or s LIKE '%abc%' | s CONTAINS "abc"
STRING_AGG(s, sep) | GROUP_CONCAT_UNQUOTED(s, sep)
IFNULL(LOGICAL_OR(x), false) | SOME(x)
IFNULL(LOGICAL_AND(x), true) | EVERY(x)

Arrays
ARRAY_AGG(x) | NEST(x)
ANY_VALUE(x) | ANY(x)
arr[SAFE_ORDINAL(index)] | NTH(index, arr) WITHIN RECORD
ARRAY_LENGTH(arr) | COUNT(arr) WITHIN RECORD

Url / IP Address Functions
NET.HOST(url) | HOST(url)
NET.PUBLIC_SUFFIX(url) | TLD(url)
NET.REG_DOMAIN(url) | DOMAIN(url)
NET.IPV4_TO_INT64(NET.IP_FROM_STRING(addr_string)) | PARSE_IP(addr_string)
NET.IP_TO_STRING(NET.IPV4_FROM_INT64(addr_int64 & 0xFFFFFFFF)) | FORMAT_IP(addr_int64)
NET.IP_FROM_STRING(addr_string) | PARSE_PACKED_IP(addr_string)
NET.IP_TO_STRING(addr_bytes) | FORMAT_PACKED_IP(addr_bytes)
Wednesday, May 5, 2021
Difference between Cloud Storage and HDFS
Feature | Cloud Storage | HDFS
I/O variance | Higher variance; can be mitigated with caching and read replicas, or by considering Cloud Bigtable or Cloud Datastore for such workloads | Lower variance
File appends or truncates | Not supported; objects can only be overwritten | Supported
POSIX compliance | Not compliant | Not fully compliant
File system information | No equivalent available | "hadoop fsck -files -blocks" exposes directory and block information in HDFS
Request latency | Greater round-trip latency | Lower latency
Cost | 57% lower total cost of ownership | Much higher compared to Cloud Storage
Separation of storage and compute | Multiple clusters can access the same location via gs:// paths; better performance when backed by Persistent Disk | Single-cluster access via hdfs:// from within the cluster; lower performance on low-grade HDDs
Interoperability | Seamless connectivity between Spark and Hadoop instances; exported files can be imported into BigQuery; easier to manage Cloud Dataflow tasks | Standalone cluster operations; access between instances and storage must be provisioned manually
Storage management | Less overhead for the operations team | Much more overhead for the operations team
Startup time | Quicker job startup during execution | Jobs take significant time to start up
Security | Built-in cloud security and access control via IAM | Manual security and access setup; rotating security keys is an operational overhead
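As a quick sketch of the interoperability and separation rows: with the Cloud Storage connector available on the cluster (it ships with Dataproc), the same Hadoop tooling addresses both storage layers and only the URI scheme changes. The bucket name my-bucket and the paths below are placeholders.

## same Hadoop tooling, different scheme: hdfs:// is cluster-local, gs:// is shared object storage
hadoop fs -ls hdfs:///user/hive/warehouse/
hadoop fs -ls gs://my-bucket/warehouse/
## copy a dataset out of the cluster so other clusters (or a BigQuery load) can reach it
hadoop distcp hdfs:///user/hive/warehouse/sales gs://my-bucket/warehouse/sales
## block/file details are only available for HDFS (no gs:// equivalent)
hadoop fsck /user/hive/warehouse/sales -files -blocks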
Saturday, August 24, 2019
create ssh keys and copy it on the remote server
Step 1: Create public and private keys using ssh-key-gen on local-host
mkdir -p $HOME/.ssh
chmod 0700 $HOME/.ssh
## -f sets a custom key file name, -C attaches a comment to the key
ssh-keygen -t rsa -b 4096 -f ~/.ssh/newremoteweb.key -C "My new remote web key"
## or generate the default key pair with an empty passphrase
ssh-keygen -t rsa -P ""
Step 2: Copy the public key to remote-host using ssh-copy-id
ssh-copy-id -i ~/.ssh/id_rsa.pub remote-host
ssh-copy-id -i ~/.ssh/id_rsa.pub remote-user@remote-host
## note: scp copies id_rsa.pub as the whole authorized_keys file, overwriting any existing keys on the remote host
scp $HOME/.ssh/id_rsa.pub remote-user@remote-host:~/.ssh/authorized_keys
Adding the public key on the remote server manually:
## First create .ssh directory on server
ssh remote-user@remote-host "umask 077; test -d .ssh || mkdir .ssh"
## cat the local id_rsa.pub file and pipe it over ssh to append the public key on the remote server
cat $HOME/.ssh/id_rsa.pub | ssh remote-user@remote-host "cat >> .ssh/authorized_keys"
Now try logging into the machine, with "ssh 'remote-host'", and check in:
.ssh/authorized_keys
Step 3: Login to remote-host without entering the password
ssh remote-host
It should connect without prompting for a password. That completes the test.
Some important points to consider:
$HOME/.ssh/id_rsa – contains your private key.
$HOME/.ssh/id_rsa.pub – contains your public key.
Restarting or reloading the ssh daemon/service on the server:
## on centos/rhel/fedora
sudo service sshd reload
## on systemd-based linux servers
sudo systemctl reload sshd
## on debian/ubuntu - older version
sudo /etc/init.d/ssh reload
## on debian/ubuntu - latest
sudo systemctl reload ssh
## common method to reload sshd
sudo kill -HUP `cat /var/run/sshd.pid`
or
sudo kill -HUP $(cat /var/run/sshd.pid)
## list all public key parameters of all identities from the ssh-agent:
ssh-add -L
## deleting all private keys from the ssh-agent:
ssh-add -D
## kill the ssh agent, which is running:
kill $SSH_AGENT_PID
trap "kill $SSH_AGENT_PID" 0
Thursday, July 18, 2019
When do HARD PARSE and SOFT PARSE occur, and how can we avoid them?
When a new SQL statement arrives, Oracle searches the library cache:
- If a suitable child cursor is found, a SOFT PARSE occurs.
- If no parent cursor is found, a HARD PARSE occurs.
- If a parent cursor is found but none of its existing child cursors can be reused by this call (reuse depends on bind variable sizes, optimizer settings and NLS settings), a HARD PARSE also occurs.
- If a parent cursor is found and an existing child cursor with a matching execution plan can be reused, only a SOFT PARSE occurs.
By using bind variables we can avoid unnecessary HARD PARSEs on the database.
If bind variables are not used, every statement with different literal values is a new SQL text and must be hard parsed.
This has a major impact on server performance: we see high wait times during SQL execution and a higher cost per SQL.
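A minimal sketch of the bind-variable point, using SQL*Plus syntax; the emp table and the :empno bind are illustrative only.

-- literals: every distinct value is a brand new statement, so each execution is hard parsed
SELECT ename FROM emp WHERE empno = 7369;
SELECT ename FROM emp WHERE empno = 7499;

-- bind variable: the SQL text stays identical, so after the first hard parse
-- subsequent executions reuse the cursor with a soft parse
VARIABLE empno NUMBER
EXEC :empno := 7369
SELECT ename FROM emp WHERE empno = :empno;
EXEC :empno := 7499
SELECT ename FROM emp WHERE empno = :empno;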