hadoop - Where Does HDFS Account for Triple Replication in Usage Reports?


In the latest versions of the Hadoop distributions, the HDFS usage reports seem to report on space without accounting for the replication factor, correct?

When one looks at the NameNode web UI and/or runs the 'hadoop dfsadmin -report' command, one can see a report that looks like this:

    Configured Capacity: 247699161084 (230.69 GB)
    Present Capacity: 233972113408 (217.9 GB)
    DFS Remaining: 162082414592 (150.95 GB)
    DFS Used: 71889698816 (66.95 GB)
    DFS Used%: 30.73%
    Under replicated blocks: 40
    Blocks with corrupt replicas: 6
    Missing blocks: 0

Based on the machine sizes of the cluster, it seems the report does not account for triple replication, i.e. if I place a file on HDFS, I should account for the triple replication myself.

For example, if I placed a 50 GB file on HDFS, would HDFS be dangerously close to full (since it seems that file would be replicated 3 times, using the 150 GB that remain)?

Let me define what each of these terms means.

  1. Configured capacity: the total capacity available to HDFS for storage. If you have 4 nodes and each node has 50 GB of capacity, the configured capacity is 200 GB. The replication factor is irrelevant in the case of configured capacity.

  2. DFS used: the amount of storage space that has been used by HDFS, counting every replica. Divide DFS used by the replication factor to get the actual size of the files stored, without replication. If DFS used is 60 GB and the replication factor is 3, the actual size of the files is 60/3 = 20 GB.

  3. DFS remaining: the amount of storage space still available to HDFS, again measured in raw bytes. If you have 150 GB of remaining storage space, that means you can store up to 150/3 = 50 GB of files without exceeding the configured capacity (assuming replication factor = 3).

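  4. Present capacity: the amount of storage space available to HDFS for storing data. In the report above it equals DFS used plus DFS remaining (71889698816 + 162082414592 = 233972113408). The difference (configured capacity - present capacity) is space on the DataNode disks that HDFS cannot use, such as reserved space and non-HDFS files, rather than room for user data.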
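The arithmetic behind these definitions can be checked against the numbers in the question. The following is a minimal sketch, not part of Hadoop: the helper names (`logical_bytes`, `raw_bytes_needed`, `fits`) are made up for illustration, and it assumes every file uses a replication factor of 3 and that the "GB" figures in the report are multiples of 1024^3 bytes.

```python
# Values copied from the report in the question (all in bytes).
CONFIGURED_CAPACITY = 247_699_161_084
PRESENT_CAPACITY = 233_972_113_408
DFS_REMAINING = 162_082_414_592
DFS_USED = 71_889_698_816

REPLICATION_FACTOR = 3  # assumption: all files use a replication factor of 3
GIB = 1024 ** 3         # assumption: the report's "GB" means 1024^3 bytes


def logical_bytes(raw_bytes, replication=REPLICATION_FACTOR):
    """Convert raw (replicated) bytes into the size of the files themselves."""
    return raw_bytes / replication


def raw_bytes_needed(file_bytes, replication=REPLICATION_FACTOR):
    """Raw space a file of the given size consumes once all replicas exist."""
    return file_bytes * replication


def fits(file_bytes, remaining=DFS_REMAINING, replication=REPLICATION_FACTOR):
    """True if the file and all of its replicas fit in the remaining raw space."""
    return raw_bytes_needed(file_bytes, replication) <= remaining


if __name__ == "__main__":
    print(f"Files currently stored: {logical_bytes(DFS_USED) / GIB:.2f} GB")
    print(f"Room for new files:     {logical_bytes(DFS_REMAINING) / GIB:.2f} GB")
    print(f"DFS Used%:              {100 * DFS_USED / PRESENT_CAPACITY:.2f}%")
    print(f"50 GB file fits:        {fits(50 * GIB)}")
```

Under those assumptions, a 50 GB file does fit, but it would consume almost all of the remaining raw space, which matches the concern in the question.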

Hope this clears things up.

