
How to Read Compressed Files in Hadoop


I have recently been working on importing offline data into HBase, which involves reading gz-compressed files from HDFS. I'm writing the approach down here for future reference. The code is as follows:

package org.dba.util;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionInputStream;

public class ReadHdfs {
    public static void ReadFile(String fileName) throws IOException {
        Configuration conf = new Configuration();
        Path file = new Path(fileName);
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream hdfsInStream = fs.open(file);
        // Pick the codec (gzip, bzip2, ...) based on the file name extension.
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(file);
        BufferedReader reader = null;
        try {
            if (codec == null) {
                // No matching codec: read the file as plain text.
                reader = new BufferedReader(new InputStreamReader(hdfsInStream));
            } else {
                // Wrap the HDFS stream so it is decompressed on the fly.
                CompressionInputStream comInStream = codec.createInputStream(hdfsInStream);
                reader = new BufferedReader(new InputStreamReader(comInStream));
            }
            // Print at most the first 100 characters of the first line.
            String line = reader.readLine();
            if (line != null) {
                System.out.println(line.substring(0, Math.min(100, line.length())));
            }
        } finally {
            if (reader != null) {
                reader.close();
            } else {
                hdfsInStream.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ReadFile(args[0]);
    }
}
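
For comparison, here is a minimal alternative sketch (my own, not from the original post) that streams the whole decompressed file to standard output with Hadoop's IOUtils.copyBytes instead of reading a single line. The class name CatHdfs is made up for illustration; it assumes the same Hadoop client classpath and that the codec can be inferred from the file extension:

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// Illustrative class name; not part of the original post.
public class CatHdfs {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path file = new Path(args[0]);
        FileSystem fs = FileSystem.get(conf);
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
        InputStream in = null;
        try {
            in = fs.open(file);
            if (codec != null) {
                // Decompress on the fly based on the extension (.gz, .bz2, ...).
                in = codec.createInputStream(in);
            }
            // Copy the (decompressed) bytes to stdout without closing System.out.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

It can be run the same way as ReadHdfs, e.g. via hadoop jar with the HDFS path of a .gz file as the only argument.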

 


Original article: http://www.cnblogs.com/ballwql/p/6616580.html
