Friday, February 5, 2016

What are Surrogate Keys, Primary Keys and Candidate keys? Where its used?

A Surrogate key is any column or set of columns that can be declared as the primary key instead of a real or natural key. Sometimes there can be several natural keys that could be declared as the primary key, and these are all called candidate keys. So we can call a  surrogate key is a candidate key.

A Surrogate key is the alternate of primary key that allows duplication of datas/records. It is an immutable set of attributes that uniquely identify a row that were generated specifically and solely to identify this row which is not in case of natural key or primary key.

A table could actually have more than one surrogate keys, although this would be unusual. A natural key is an immutable set of attributes that uniquely identify a row that occur naturally with the row itself. The most common type of surrogate key is an incrementing integer, such as an auto_increment column in MySQL, or a sequence in Oracle, or an identity column in SQL Server.

Primary key and Surrogate key are same but surrogate key is a system generated numeric or integer value to identify each row uniquely, it has a define incremental value for each row in a table.

Surrogate key does not have any business importance for the value it holds but primary key has a significant business value.

OLTP Databases are called as of  Normalised Form  whereas  Data warehouses - DWHs  are called as of De-normalised form as DWH is used to maintain the historic data for analyzing. To remain de-normalised, duplication is allowed. When data inserting in DWH, Surrogate key a new column named serial number is introduced to allow duplication.

A Surrogate key in a data warehouse is more than just a substitute for a natural key. In a data warehouse, a surrogate key is a necessary generalization of the natural production key and is one of the basic elements of data warehouse design. Surrogate Key is the solution for critical column problems.

Ex.  A customer purchases different items from stores at different locations. Here, we have to maintain historical data, by using surrogate key which introduces the row in the data warehouse to maintain historical data. Another example of its, a single mobile number is used by other person if it is not in use for more than one year, it is possible just because of this surrogate key.

Differences between B*Tree and Bitmapped Indexes.


B*Tree indexes:

  • Where we need to maintain the sort order of the data, making it easy to look up range data.
  • Required to use multicolumn indexes, we can use the leading edge columns to resolve a query, even if that query doesn't reference all columns of the index.
  • As per its behavior, its automatically stay balanced.
  • Relatively constant performance of any query.
  • Can also specify reverse and unique
  • Recommended for OLTP databases.

Bitmapped indexes:

  • Use them to index columns with that contain a relatively small number of distinct values.
  • Very compact and using lesser space.
  • Designed for query intensive databases.
  • Not good for range scans.
  • Are available only in Enterprise Edition, in relational databases only.