Hadoop File Decompression

Posted: 2017-07-29 15:09:07

Class
org.apache.hadoop.io.compress.CompressionCodecFactory
A factory that will find the correct codec for a given filename.

Method
CompressionCodec getCodec(Path file)
Find the relevant compression codec for the given file based on its filename suffix.
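In other words, it determines which compression algorithm was used to produce the compressed data file. It returns null if no registered codec matches the suffix.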
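
To make the suffix lookup concrete, here is a minimal JDK-only sketch of the idea. It is not Hadoop's implementation. The class SuffixCodecLookup, its register method, and the use of plain codec-name strings instead of CompressionCodec objects are hypothetical stand-ins. Preferring the longest matching suffix is an assumption made for illustration. The removeSuffix helper strips a suffix only when it is present, which is how the program below uses CompressionCodecFactory.removeSuffix.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified stand-in for suffix-based codec lookup.
 * Not Hadoop's CompressionCodecFactory: codecs are represented by name strings only.
 */
public class SuffixCodecLookup {

    // Maps a default extension (e.g. ".gz") to a codec name (e.g. "GzipCodec").
    private final Map<String, String> codecsByExtension = new TreeMap<>();

    public void register(String extension, String codecName) {
        codecsByExtension.put(extension, codecName);
    }

    /** Returns the codec whose extension is the longest suffix of the filename, or null if none matches. */
    public String getCodec(String filename) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : codecsByExtension.entrySet()) {
            String ext = e.getKey();
            if (filename.endsWith(ext) && ext.length() > bestLength) {
                best = e.getValue();
                bestLength = ext.length();
            }
        }
        return best;
    }

    /** Strips the suffix if the filename ends with it; otherwise returns the filename unchanged. */
    public static String removeSuffix(String filename, String suffix) {
        return filename.endsWith(suffix)
                ? filename.substring(0, filename.length() - suffix.length())
                : filename;
    }

    public static void main(String[] args) {
        SuffixCodecLookup lookup = new SuffixCodecLookup();
        lookup.register(".gz", "GzipCodec");
        lookup.register(".bz2", "BZip2Codec");
        lookup.register(".deflate", "DefaultCodec");

        System.out.println(lookup.getCodec("/liguodong/data.gz"));     // GzipCodec
        System.out.println(lookup.getCodec("/liguodong/data.txt"));    // null
        System.out.println(removeSuffix("/liguodong/data.gz", ".gz")); // /liguodong/data
    }
}
```

The null result for an unknown suffix is why real callers should check getCodec's return value before using it.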

package Compress;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.mapreduce.Job;

/**
 * Decompression
 * @author liguodong
 */
public class Decompression {

    final static String file = "/liguodong/data.gz";
    public static void main(String[] args) throws IOException {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "DeCodec");
        // Identifies the jar containing this class; carried over from MapReduce driver code
        // (no job is submitted here, so decompression itself does not depend on it)
        job.setJarByClass(Decompression.class);

        CompressionCodecFactory codecFactory = new CompressionCodecFactory(conf);
        // Look up the codec from the file's suffix (.gz -> GzipCodec); null if nothing matches
        CompressionCodec codec = codecFactory.getCodec(new Path(file));
        if (codec == null) {
            System.err.println("No compression codec found for " + file);
            System.exit(1);
        }
        // Wrap the raw file stream in a stream that decompresses with the codec
        CompressionInputStream inputStream = codec.createInputStream(
                new FileInputStream(new File(file)));
        // Write the decompressed bytes to the same path minus the extension (data.gz -> data)
        FileOutputStream outputStream = new FileOutputStream(
                new File(CompressionCodecFactory.removeSuffix(file, codec.getDefaultExtension())));
        // This copyBytes overload closes both streams when it finishes
        IOUtils.copyBytes(inputStream, outputStream, conf);

    }
}
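
If Hadoop is not on the classpath, you can sketch the same flow for a single .gz file with the JDK alone. The article does not use this approach; it is an alternative that rests on several assumptions. java.util.zip.GZIPInputStream stands in for the codec that getCodec would return. The suffix is hard-coded to .gz. The class name GzipDecompressSketch is hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical JDK-only sketch of the decompression flow for .gz files.
 * Not a replacement for CompressionCodecFactory: it handles gzip only.
 */
public class GzipDecompressSketch {

    /** Decompresses gzPath to the same path without ".gz" and returns that output path. */
    public static String decompress(String gzPath) throws IOException {
        if (!gzPath.endsWith(".gz")) {
            throw new IllegalArgumentException("Not a .gz file: " + gzPath);
        }
        String outPath = gzPath.substring(0, gzPath.length() - ".gz".length());
        // try-with-resources closes both streams even if copying fails
        try (InputStream in = new GZIPInputStream(new FileInputStream(gzPath));
             OutputStream out = new FileOutputStream(outPath)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return outPath;
    }

    public static void main(String[] args) throws IOException {
        String out = decompress(args.length > 0 ? args[0] : "/liguodong/data.gz");
        System.out.println("Wrote " + out);
    }
}
```

The Hadoop version's advantage is that the same code picks the matching codec for other registered suffixes (for example .bz2 or .deflate) without changes.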

Package the class into a jar, Decodec.jar, and run it:

[root@master liguodong]# yarn jar Decodec.jar
15/06/05 21:54:25 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
[root@master liguodong]# ll
total 524824
-rw-r--r-- 1 root root      1492 Jun  5 19:47 codec.jar
-rw-r--r-- 1 root root 536870912 Jun  5 21:54 data
-rw-r--r-- 1 root root    521844 Jun  5 21:40 data.gz

Original article: http://www.cnblogs.com/claireyuancy/p/7255676.html
