Parsing HTML with the HtmlAgilityPack library

Date: 2014-09-08 02:09:36

General approach: http://www.cnblogs.com/kissdodog/archive/2013/02/28/2936950.html

 

Special case: the requested page is served with ContentEncoding=gzip, so the body must be decompressed before it can be parsed.

// Get the ContentEncoding

static void getch(string url)
{
    WebRequest rebRequest = WebRequest.Create(url);
    // Dispose the response so the underlying connection is released.
    using (HttpWebResponse web = (HttpWebResponse)rebRequest.GetResponse())
    {
        string chart = web.CharacterSet;         // character set of the response
        string conending = web.ContentEncoding;  // e.g. "gzip"
        string contenttype = web.ContentType;    // Content-Type header value
        Console.WriteLine(chart);
        Console.WriteLine(conending);
        Console.WriteLine(contenttype);
    }
}

1. Add the following header to the HttpWebRequest object:

request.Headers.Add("Accept-Encoding", "gzip");

2. Decompress the received stream:

// Requires: using System.IO; using System.IO.Compression;
//           using System.Net; using System.Text;
private string GetResponseBody(HttpWebResponse response)
{
    string responseBody = string.Empty;

    if (response.ContentEncoding.ToLower().Contains("gzip"))
    {
        // gzip-compressed body: decompress while reading.
        using (GZipStream stream = new GZipStream(
            response.GetResponseStream(), CompressionMode.Decompress))
        {
            using (StreamReader reader =
                new StreamReader(stream, Encoding.UTF8))
            {
                responseBody = reader.ReadToEnd();
            }
        }
    }
    else if (response.ContentEncoding.ToLower().Contains("deflate"))
    {
        // deflate-compressed body.
        using (DeflateStream stream = new DeflateStream(
            response.GetResponseStream(), CompressionMode.Decompress))
        {
            using (StreamReader reader =
                new StreamReader(stream, Encoding.UTF8))
            {
                responseBody = reader.ReadToEnd();
            }
        }
    }
    else
    {
        // Uncompressed body: read the stream directly.
        using (Stream stream = response.GetResponseStream())
        {
            using (StreamReader reader =
                new StreamReader(stream, Encoding.UTF8))
            {
                responseBody = reader.ReadToEnd();
            }
        }
    }
    return responseBody;
}
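
The decoding step can be tried without any network call. The following is a minimal sketch, not the original author's code: `GzipSketch`, `DecodeBody` and `Gzip` are hypothetical names, and the helper takes the encoding and charset as plain strings (the values that `response.ContentEncoding` and `response.CharacterSet` would supply) so that it can run against an in-memory stream. Falling back to UTF-8 for a missing or unknown charset is an assumption of this sketch.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipSketch
{
    // Hypothetical helper: wraps the raw stream according to the
    // Content-Encoding value, then decodes the text with the given charset.
    public static string DecodeBody(Stream raw, string contentEncoding, string charset)
    {
        string enc = (contentEncoding ?? string.Empty).ToLowerInvariant();
        Stream body = raw;
        if (enc.Contains("gzip"))
            body = new GZipStream(raw, CompressionMode.Decompress);
        else if (enc.Contains("deflate"))
            body = new DeflateStream(raw, CompressionMode.Decompress);

        // Assumption: use UTF-8 when the charset is missing or not recognized.
        Encoding textEncoding;
        try
        {
            textEncoding = string.IsNullOrEmpty(charset)
                ? Encoding.UTF8
                : Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            textEncoding = Encoding.UTF8;
        }

        // Disposing the reader also disposes the wrapped streams.
        using (StreamReader reader = new StreamReader(body, textEncoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Compresses text with gzip in memory, standing in for a server response.
    public static byte[] Gzip(string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                gzip.Write(bytes, 0, bytes.Length);
            }
            // ToArray still works after the stream has been closed.
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        byte[] compressed = Gzip("<html><body><div>hello</div></body></html>");
        string html = DecodeBody(new MemoryStream(compressed), "gzip", "utf-8");
        Console.WriteLine(html);
    }
}
```

HttpWebRequest also exposes an AutomaticDecompression property (set to DecompressionMethods.GZip | DecompressionMethods.Deflate) that has the framework decompress the body itself. If that property is set, the response stream arrives already decompressed and must not be wrapped in a GZipStream again.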

Parsing:

            HtmlDocument doc = new HtmlDocument();
            //string html = wc.DownloadString("agenthome/-i31-j310-kw/");

            // responseBody is the string returned by GetResponseBody.
            doc.LoadHtml(responseBody);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("/html/body/div[1]");
            // SelectSingleNode returns null when the XPath matches nothing.
            if (node != null)
            {
                Console.WriteLine(node.InnerText);
                Console.WriteLine(node.InnerHtml);
                Console.WriteLine(node.Name);
            }
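
When more than one node is wanted, SelectNodes can be used in place of SelectSingleNode. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; `LinkSketch`, `ExtractLinks` and the sample HTML string are hypothetical and only serve as illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class LinkSketch
{
    // Hypothetical helper: returns the href of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        List<string> links = new List<string>();
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode a in anchors)
            links.Add(a.GetAttributeValue("href", string.Empty));
        return links;
    }

    public static void Main()
    {
        // Sample markup made up for this sketch.
        string html = "<html><body><div><a href=\"/a\">A</a><a href=\"/b\">B</a></div></body></html>";
        foreach (string link in ExtractLinks(html))
            Console.WriteLine(link);
    }
}
```

The null check matters because SelectNodes, like SelectSingleNode, signals "no match" with null, so iterating the result directly would throw on pages that lack the expected elements.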

Reference: http://www.csharpwin.com/csharpspace/13345r5893.shtml

HtmlAgilityPack, a powerful tool for parsing HTML

Reference: http://zhoufoxcn.blog.51cto.com/792419/595344/

 


Original article: http://www.cnblogs.com/henry-it/p/3961021.html
