Fetching Data from a Website

Date: 2014-10-17 16:50:24

This example uses HtmlAgilityPack to parse HTML source and extract the data you need.

using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

1. Fetch the page source with the C# WebRequest, WebResponse, and StreamReader classes

// url and encoding (for example Encoding.UTF8) are supplied by the caller
string result;
WebRequest request = WebRequest.Create(url);
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream(), encoding))
{
    result = reader.ReadToEnd();
}
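The fragment above can be wrapped in a reusable method. What follows is only a minimal sketch: the class name PageFetcher and the method name GetHtml are hypothetical and do not come from the original article, and the caller is assumed to know the page's encoding. Note also that WebRequest is marked obsolete in recent .NET versions in favor of HttpClient; it is used here only to stay close to the article's approach.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: downloads the resource at `url`
    // and decodes the response body with `encoding`.
    public static string GetHtml(string url, Encoding encoding)
    {
        WebRequest request = WebRequest.Create(url);

        // The using statements dispose the response, stream, and reader
        // even if reading throws.
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as PageFetcher.GetHtml(url, Encoding.UTF8) returns the whole page as one string. GetResponse throws a WebException when the request fails, so real code would likely wrap the call in a try/catch.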

2. Load the page source into HtmlAgilityPack's HtmlDocument class to get the root HtmlNode

HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(htmlSource);
HtmlNode rootNode = document.DocumentNode;
return rootNode;

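3. Call HtmlNode's SelectSingleNode method to get the content you need. In the code below, path is an XPath expression that follows the HTML tag hierarchy, for example: path = "//div[@class='article_title']/h1/span/a"; // XPath of the article title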

which corresponds to

<div class="article_title">
<h1>
<span>
<a>the content to extract</a>
</span>
</h1>
</div>

Reference code:

HtmlNode temp = srcNode.SelectSingleNode(path);
if (temp == null)
return null;
return temp.InnerText;

Return value: the content to extract

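To get the HTML markup of the node itself, use temp.OuterHtml, which returns: <a>the content to extract</a>. temp.InnerHtml returns only the markup inside the node; because this <a> element contains nothing but text, it gives the same result as InnerText.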
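Steps 2 and 3 can be tried without any network access by parsing an inline HTML string. The sketch below assumes the HtmlAgilityPack package is referenced; the class name XPathDemo and the helper GetText are hypothetical names used only for illustration, and the sample markup is the fragment shown above written on a single line.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Hypothetical helper mirroring the reference code:
    // returns null when the XPath expression matches nothing.
    public static string GetText(HtmlNode srcNode, string path)
    {
        HtmlNode temp = srcNode.SelectSingleNode(path);
        return temp == null ? null : temp.InnerText;
    }

    public static void Main()
    {
        string htmlSource =
            "<div class='article_title'><h1><span><a>the content to extract</a></span></h1></div>";

        // Step 2: parse the source and take the document's root node.
        var document = new HtmlDocument();
        document.LoadHtml(htmlSource);
        HtmlNode rootNode = document.DocumentNode;

        // Step 3: select the <a> element with an XPath expression.
        HtmlNode link = rootNode.SelectSingleNode("//div[@class='article_title']/h1/span/a");

        Console.WriteLine(link.InnerText);  // the content to extract
        Console.WriteLine(link.InnerHtml);  // the content to extract
        Console.WriteLine(link.OuterHtml);  // <a>the content to extract</a>

        // A path that matches nothing yields null rather than an exception.
        Console.WriteLine(GetText(rootNode, "//div[@class='missing']") == null);  // True
    }
}
```

SelectSingleNode returns null when there is no match, which is why the reference code checks for null before reading InnerText.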

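With the steps above you can extract the content you need from a website. I hope this is helpful. The source code is adapted from this article: http://blog.csdn.net/gdjlc/article/details/11620915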


Original article: http://www.cnblogs.com/yinjianjing/p/4031483.html
