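Sequentially accessed data is contiguous. The disk head visits the platter in a fixed order and does not seek frequently, so throughput is high, because seek time is the main factor limiting disk read/write speed. Few everyday applications access the disk sequentially. Continuous backup of a large file is sequential, and dd is a typical example of sequential reads and writes.
Random access means the head moves frequently, because the data is not contiguous on disk, which in turn depends on how the data was written. Random access is much slower than sequential access, again because the head spends much of its time seeking and positioning. Most applications read and write the disk randomly.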
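Take Linux as an example of why. When writing data, the OS preallocates 8 blocks, so when you first write a file the OS tries hard to keep its data contiguous on disk, but in the long run it cannot. Suppose the disk is new and we write a 300 KB file: it is contiguous. Other files are then written to the disk, and they are contiguous too. After a while many files have been written, and files are modified often. When a file is modified, the blocks next to it are usually already occupied by other files, so the changed blocks have to be written somewhere else on the disk. Over time most files are no longer contiguous but scattered across the disk. When your program reads such a file, the head keeps seeking to collect the data from different locations, with no apparent pattern. Where the head moves is determined by the block locations recorded in the inode. At this point the program's disk access is random.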
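The effect of fragmentation described above can be sketched with a toy model. The block numbers below are hypothetical, and the rule "one seek whenever the next block is not adjacent to the previous one" is a simplifying assumption, not how a real disk scheduler works.

```python
def count_seeks(block_positions):
    """Count head repositionings needed to read blocks in file order.

    block_positions: on-disk block numbers in the order the file uses them.
    A seek is charged for the first block and whenever the next block is
    not the one immediately after the previous block.
    """
    seeks = 0
    previous = None
    for position in block_positions:
        if previous is None or position != previous + 1:
            seeks += 1
        previous = position
    return seeks


# Hypothetical layouts of the same five-block file.
fresh_disk = [100, 101, 102, 103, 104]   # written once to an empty disk
after_edits = [100, 101, 250, 251, 37]   # changed blocks landed elsewhere

print(count_seeks(fresh_disk))   # 1
print(count_seeks(after_edits))  # 3
```

The same file needs three times as many seeks once its blocks are scattered, even though the amount of data read is unchanged.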
A sequential access pattern reads data in order, often from start to finish.
Consider a book example. When reading a novel, you use sequential order: you start with page 1, then move to page 2 and so on.
When you access sequentially, you only need to seek once and then read until you're done with that data. When doing random access, you need to seek every time you want to switch to a different place in your file. This can be quite a performance hit on hard drives, because seeking is really expensive on magnetic drives.
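The difference in seek counts can be illustrated with ordinary file I/O. This is a minimal sketch: the chunk size and chunk count are arbitrary choices, and it counts logical `seek()` calls made by the program, not physical head movements, which also depend on caching and the drive itself.

```python
import os
import random
import tempfile

CHUNK_SIZE = 4096   # illustrative chunk size in bytes
NUM_CHUNKS = 64     # illustrative file length in chunks


def read_sequential(path):
    """Seek once to the start, then read chunk after chunk."""
    seeks = 0
    chunks = []
    with open(path, "rb") as f:
        f.seek(0)
        seeks += 1
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks), seeks


def read_random(path, order):
    """Visit chunks in the given order, seeking before every read."""
    seeks = 0
    chunks = {}
    with open(path, "rb") as f:
        for index in order:
            f.seek(index * CHUNK_SIZE)
            seeks += 1
            chunks[index] = f.read(CHUNK_SIZE)
    # Reassemble in file order so the result can be compared.
    return b"".join(chunks[i] for i in sorted(chunks)), seeks


def demo():
    payload = os.urandom(CHUNK_SIZE * NUM_CHUNKS)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        order = list(range(NUM_CHUNKS))
        random.Random(0).shuffle(order)  # fixed seed for repeatability
        seq_data, seq_seeks = read_sequential(path)
        rnd_data, rnd_seeks = read_random(path, order)
    finally:
        os.remove(path)
    return payload, seq_data, seq_seeks, rnd_data, rnd_seeks


payload, seq_data, seq_seeks, rnd_data, rnd_seeks = demo()
print("sequential seeks:", seq_seeks)
print("random seeks:", rnd_seeks)
```

Both functions return the same bytes; only the number of seeks differs (one versus one per chunk).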
Hadoop uses blocks to store a file or parts of a file. A Hadoop block is a file on the underlying filesystem. Since the underlying filesystem stores files as blocks, one Hadoop block may consist of many blocks in the underlying file system. Blocks are large. The default was 64 megabytes in Hadoop 1.x and is 128 megabytes in Hadoop 2.x and later, and many systems run with block sizes of 128 megabytes or larger.
Hadoop is designed for streaming or sequential data access rather than random access. Sequential data access means fewer seeks, since Hadoop only seeks to the beginning of each block and begins reading sequentially from there.
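A back-of-the-envelope model shows why one seek per large block keeps seek cost small. The 10 ms seek time and 100 MiB/s transfer rate below are assumed round figures for a magnetic disk, not measurements, and the function name is made up for this sketch.

```python
MIB = 1024 * 1024


def seek_overhead(block_size_bytes,
                  seek_seconds=0.010,               # assumed average seek time
                  throughput_bytes_per_s=100 * MIB  # assumed transfer rate
                  ):
    """Fraction of total read time spent seeking, with one seek per block."""
    transfer_seconds = block_size_bytes / throughput_bytes_per_s
    return seek_seconds / (seek_seconds + transfer_seconds)


for size in (4 * 1024, 64 * MIB, 128 * MIB):
    share = seek_overhead(size)
    print(f"{size:>12} bytes per seek: {share:.2%} of time spent seeking")
```

Under these assumptions, reading 4 KiB per seek spends almost all of the time seeking, while reading a 128 MiB block per seek spends less than 1% of the time seeking.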
Blocks are fixed in size. This makes it easy to calculate how many can fit on a disk.
Because a file is made up of blocks that can be spread over multiple nodes, a file can be larger than any single disk in the cluster.
HDFS blocks also don't waste space. If a file is not an even multiple of the block size, the block containing the remainder does not occupy the space of an entire block.
[Question: how is the appendToFile operation optimized?]
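The block layout described above is simple arithmetic. The sketch below assumes a 128 MiB block size (the value usually set through `dfs.blocksize`); the helper names are made up for illustration. The 314015127-byte size is the test.csv size reported by fsck later in this post.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 134217728 bytes, assumed dfs.blocksize


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block of a file."""
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)  # last block holds only the remainder
    return lengths


def wasted_if_padded(file_size, block_size=BLOCK_SIZE):
    """Bytes that would be lost if the last block were padded to full size."""
    lengths = split_into_blocks(file_size, block_size)
    return len(lengths) * block_size - file_size


small_file = 1024 * 1024  # hypothetical 1 MiB file
print(split_into_blocks(small_file))   # [1048576]
print(wasted_if_padded(small_file))    # 133169152
print(split_into_blocks(314015127))    # [134217728, 134217728, 45579671]
```

A 1 MiB file occupies 1 MiB of block data, not 128 MiB, which is the sense in which HDFS blocks do not waste space.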
hadoop fsck /test -blocks
---------------------------------
/test/test1.txt: Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742405_1583. Target Replicas is 2 but found 1 replica(s).
.
/test/test2.txt: Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742406_1584. Target Replicas is 2 but found 1 replica(s).

hadoop fs -appendToFile /home/hhh/log.txt /test/test1.txt

hadoop fsck /test -blocks
------------------------------------
/test/test1.txt: Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742405_1894. Target Replicas is 2 but found 1 replica(s).
.
/test/test2.txt: Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742406_1584. Target Replicas is 2 but found 1 replica(s).
/test/test.csv 314015127 bytes, 3 block(s):
Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742709_1896. Target Replicas is 2 but found 1 replica(s).
Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742710_1897. Target Replicas is 2 but found 1 replica(s).
Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742711_1898. Target Replicas is 2 but found 1 replica(s).
0. BP-1610905963-10.3.242.99-1494403766821:blk_1073742709_1896 len=134217728 repl=1
1. BP-1610905963-10.3.242.99-1494403766821:blk_1073742710_1897 len=134217728 repl=1
2. BP-1610905963-10.3.242.99-1494403766821:blk_1073742711_1898 len=45579671 repl=1

/test/test.csv 314015150 bytes, 3 block(s):
Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742709_1896. Target Replicas is 2 but found 1 replica(s).
Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742710_1897. Target Replicas is 2 but found 1 replica(s).
Under replicated BP-1610905963-10.3.242.99-1494403766821:blk_1073742711_1899. Target Replicas is 2 but found 1 replica(s).
0. BP-1610905963-10.3.242.99-1494403766821:blk_1073742709_1896 len=134217728 repl=1
1. BP-1610905963-10.3.242.99-1494403766821:blk_1073742710_1897 len=134217728 repl=1
2. BP-1610905963-10.3.242.99-1494403766821:blk_1073742711_1899 len=45579694 repl=1
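The two test.csv reports above differ by 23 bytes, and only the length of the last block changes. That is consistent with an append that fills the last partial block before allocating a new one. The sketch below is a toy model of that behaviour under this assumption, not HDFS code; `append_bytes` is a hypothetical helper.

```python
def append_bytes(block_lengths, extra, block_size=134217728):
    """Model an append: fill the last partial block, then add new blocks."""
    lengths = list(block_lengths)
    while extra > 0:
        # Start a new block if there is none or the last one is full.
        if not lengths or lengths[-1] == block_size:
            lengths.append(0)
        room = block_size - lengths[-1]
        written = min(room, extra)
        lengths[-1] += written
        extra -= written
    return lengths


# Block lengths and file sizes taken from the fsck reports above.
before = [134217728, 134217728, 45579671]
after = append_bytes(before, 314015150 - 314015127)
print(after)  # [134217728, 134217728, 45579694]
```

In this model the first two blocks are untouched by a small append, which matches the unchanged lengths of blocks 0 and 1 in the second report.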
Original article: http://www.cnblogs.com/wttttt/p/6918942.html