
Scraping web page data with Java HtmlUnit

Date: 2016-12-25 02:10:13


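import java.util.List;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTable;
import com.gargoylesoftware.htmlunit.html.HtmlTableCell;
import com.gargoylesoftware.htmlunit.html.HtmlTableRow;

// The statements below belong inside a method (declared to throw Exception)
// of a class whose field `path` holds the URL of the page to scrape.

// Emulate Chrome; limit each script to 5 s and accept untrusted SSL certificates.
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.setJavaScriptTimeout(5000);
webClient.getOptions().setUseInsecureSSL(true);

// Run the page's JavaScript, skip CSS, and do not abort on script errors.
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setTimeout(100000);
webClient.getOptions().setDoNotTrackEnabled(false);

// Load the page, then give background JavaScript (AJAX) time to finish.
HtmlPage page = webClient.getPage(this.path);
webClient.waitForBackgroundJavaScript(20000);

Thread.sleep(5000);

// getElementById returns null when the element is missing, so check before use.
HtmlDivision div = (HtmlDivision) page.getElementById("forecast");
if (div == null || div.asXml().contains("forecast-data-loading"))
{
    // The loading placeholder is still present: the data never arrived.
    System.out.println("HtmlUnit failed to parse the page");
}
else
{
    System.out.println("HtmlUnit parsed the page successfully");
    int[] aqis = new int[8];

    int i = 0;
    List<HtmlTable> tables = (List<HtmlTable>) div.getByXPath("./div[2]/center[1]/table");
    if (tables.size() == 8)
    {
        for (HtmlTable table : tables)
        {
            // The fourth row of each table holds the AQI cells.
            List<HtmlTableRow> trs = (List<HtmlTableRow>) table.getByXPath("./tbody/tr[4]");
            HtmlTableRow tr = trs.get(0);

            int aqi = 0;
            List<HtmlTableCell> cells = (List<HtmlTableCell>) tr.getByXPath("./td");
            for (HtmlTableCell cell : cells)
            {
                // Each cell's text is two integers on separate lines; take their midpoint.
                // Splitting on \r?\n accepts either line-ending style.
                String s = cell.asText();
                String[] values = s.split("\\r?\\n");
                aqi = aqi + (Integer.parseInt(values[0]) + Integer.parseInt(values[1])) / 2;
            }
            // Average the midpoints across the row's cells (integer division).
            aqi = aqi / cells.size();
            aqis[i] = aqi;
            i = i + 1;
        }
    }
}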
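The averaging step in the inner loop can be tried without HtmlUnit or a network connection. The following is a minimal sketch, assuming (as the snippet above implies) that each cell's text is two integers separated by a line break. The class name `AqiAverager` and the methods `cellMidpoint` and `rowAverage` are hypothetical names introduced here for illustration; they are not part of the original code or of HtmlUnit.

```java
import java.util.Arrays;
import java.util.List;

public class AqiAverager {

    /**
     * Parses one cell's text of the assumed form "first<newline>second"
     * and returns the midpoint of the two integers (integer division).
     */
    public static int cellMidpoint(String cellText) {
        // Accept both "\r\n" and "\n" as the separator between the two values.
        String[] values = cellText.trim().split("\\r?\\n");
        int first = Integer.parseInt(values[0].trim());
        int second = Integer.parseInt(values[1].trim());
        return (first + second) / 2;
    }

    /**
     * Averages the midpoints of all cells in one row,
     * mirroring the inner loop of the scraping snippet.
     */
    public static int rowAverage(List<String> cellTexts) {
        if (cellTexts.isEmpty()) {
            throw new IllegalArgumentException("row has no cells");
        }
        int sum = 0;
        for (String text : cellTexts) {
            sum += cellMidpoint(text);
        }
        return sum / cellTexts.size();
    }

    public static void main(String[] args) {
        // Made-up sample values, not data from the scraped page.
        List<String> row = Arrays.asList("40\r\n60", "50\n70", "90\r\n110");
        System.out.println(rowAverage(row)); // midpoints 50, 60, 100 -> prints 70
    }
}
```

Keeping the parsing in a small helper like this makes it easy to check how the code behaves on unexpected cell text (for example a cell with only one value, which would throw an `ArrayIndexOutOfBoundsException` here) before running it against the live page.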
 


Original article: http://www.cnblogs.com/tiandi/p/6218905.html
