Feature | Cloud Storage | HDFS |
I/O variance | Higher variance We can avoid it by using caching and read replica, considering cloud big table and cloud datastore. | Lower variance |
Support of file appends or truncates | No can do overwrite of files | Yes |
POSIX - compliant | not compliant | not fully compliant |
file system information | no information available | "hadoop fsck --files --blocks" - exposes all directory information's in HDFS. |
request latency | greater round trip latency | lesser latency |
cost | 57% lower total cost of ownership | Much higher compared to Cloud |
separation between cloud storage and HDFS | Multiple clusters can access same location usage of gs:// in cloud workloads much better performance on Persistent Disk | Single cluster access access using hdfs:// with in cluster lesser performance due to low grade HDD |
interoperability | seamless connectivity between spark and hadoop instances. exported files can import in big query. easier to manage cloud dataflow tasks | standalone cluster instances operations manual access provision between instances and storage |
storage management | Lesser overhead on operations team | Much overhead on operations team |
startup time | quicker job startup time during job execution | it takes significant time to start up jobs |
security | cloud inbuilt security and access using IAM | manual security and access setup. changing security key is an overhead to operations teams |
Tips and good resources for all.. Oracle, Big data, Hadoop, Unix, Linux