十一：Centralized Cache Management in HDFS 集中缓存管理

时间：2017-08-30 23:42:15 阅读：328 评论：0 收藏：0 [点我收藏+]

集中的HDFS缓存管理，该机制可以让用户缓存特定的hdfs路径，这些块缓存在堆外内存中。namenode指导datanode完成这个工作。

Centralized cache management in HDFS has many significant advantages.

Explicit pinning prevents frequently used data from being evicted from memory. This is particularly important when the size of the working set exceeds the size of main memory, which is common for many HDFS workloads. 阻止经常使用的数据被逐出内存。
Because DataNode caches are managed by the NameNode, applications can query the set of cached block locations when making task placement decisions. Co-locating a task with a cached block replica improves read performance.
When block has been cached by a DataNode, clients can use a new , more-efficient, zero-copy read API. Since checksum verification of cached data is done once by the DataNode, clients can incur essentially zero overhead when using this new API.可以使用更高效的无复制的api读这些块。
Centralized caching can improve overall cluster memory utilization. When relying on the OS buffer cache at each DataNode, repeated reads of a block will result in all nreplicas of the block being pulled into buffer cache. With centralized cache management, a user can explicitly pin only m of the n replicas, saving n-m memory.减少重复读时使用的
来源： http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html

适用的情况：

经常需要读的文件。比如一个小文件。

结构：