码迷,mamicode.com
首页 > 编程语言 > 详细

Java网络爬虫 - 一个简单的爬虫例子

时间:2015-09-24 20:56:44      阅读:198      评论:0      收藏:0      [点我收藏+]

标签:

WikiScraper.java

package master.haku.scrape;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.net.*;
import java.io.*;

public class WikiScraper {
    public static void main(String[] args) {
        scrapeTopic("/wiki/Python");
    }

    public static void scrapeTopic(String url) {
        String html = getUrl("https://en.wikipedia.org" + url);
        Document doc = Jsoup.parse(html);
        String contentText = doc.select("#mw-content-text > p").first().text();
        System.out.println(contentText);
    }

    public static String getUrl(String url) {
        URL urlObj = null;
        try {
            urlObj = new URL(url);
        } catch (MalformedURLException e) {
            System.out.println("The url was malformed!");
            return "";
        }

        URLConnection urlCon = null;
        BufferedReader in = null;
        String outputText = "";

        try {
            urlCon = urlObj.openConnection();
            in = new BufferedReader(new InputStreamReader(urlCon.getInputStream()));
            String line = "";
            while ((line = in.readLine()) != null) {
                outputText += line;
            }
            in.close();
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
            return "";
        }

        return outputText;
    }
}

 

运行结果:

A python is a constricting snake belonging to the Python (genus), or, more generally, any snake in the family Pythonidae (containing the Python genus).

 

Java网络爬虫 - 一个简单的爬虫例子

标签:

原文地址:http://www.cnblogs.com/davidgu/p/4836305.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!