
Web Crawler (Scraping) with Regular Expressions

Posted: 2016-01-09 00:51:09

using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

namespace WebApplication8
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // The keyword is sent in a query string, so URL-encode it
            // rather than HTML-encode it.
            string keyword = HttpUtility.UrlEncode("湖南艾华集团股份有限公司", Encoding.UTF8);
            string url = "http://bgcheck.cn/MemberCenter/FirmCredit/Search.html?Keywords=" + keyword;

            string content;
            using (WebClient wc = new WebClient())
            {
                wc.Encoding = Encoding.UTF8; // decode the response body as UTF-8
                content = wc.DownloadString(url);
            }

            // Each pattern finds a label, skips lazily to the next <a> or <span>
            // opening tag, and captures that element's inner text in group 1.
            string ratingClassPattern = @"\[信用等级:[\s\S]*?<a\b[^>]*>([\s\S]*?)</a>";
            string ratingSequencePattern = @"信用排名:[\s\S]*?<span\b[^>]*>([\s\S]*?)</span>";
            string ratingStatePattern = @"信用状况:[\s\S]*?<span\b[^>]*>([\s\S]*?)</span>";

            // Only the first occurrence of each field is needed.
            string ratingClass = FirstGroup(content, ratingClassPattern);       // credit rating
            string ratingSequence = FirstGroup(content, ratingSequencePattern); // credit ranking
            string ratingState = FirstGroup(content, ratingStatePattern);       // credit status
        }

        // Returns capture group 1 of the first match, or an empty string if nothing matches.
        private static string FirstGroup(string input, string pattern)
        {
            Match match = Regex.Match(input, pattern);
            return match.Success ? match.Groups[1].Value : string.Empty;
        }
    }
}
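
The post does not show the HTML that bgcheck.cn returns, so it is hard to see what the three patterns are meant to match. The following is a minimal sketch, assuming a hypothetical fragment in which each label is followed by an `<a>` or `<span>` element holding the value. The class `RatingExtractDemo`, the helper `ExtractAfterLabel`, and the sample markup are illustrative assumptions, not part of the original code or the real page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RatingExtractDemo
{
    // Hypothetical helper: returns the inner text of the first <tag>...</tag>
    // element that follows the given label, or an empty string if none is found.
    public static string ExtractAfterLabel(string html, string label, string tag)
    {
        // Regex.Escape protects literal characters such as '[' in the label.
        string pattern = Regex.Escape(label)
            + @"[\s\S]*?<" + tag + @"\b[^>]*>([\s\S]*?)</" + tag + ">";
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        // Assumed markup for illustration only; the real page may differ.
        string html =
            "<div>[信用等级:<a href=\"#\" class=\"lv\">AAA</a>]</div>\n" +
            "<div>信用排名:<span class=\"rank\">12</span></div>\n" +
            "<div>信用状况:<span>良好</span></div>";

        Console.WriteLine(ExtractAfterLabel(html, "[信用等级:", "a"));
        Console.WriteLine(ExtractAfterLabel(html, "信用排名:", "span"));
        Console.WriteLine(ExtractAfterLabel(html, "信用状况:", "span"));
    }
}
```

For this assumed fragment the three calls return the rating, ranking, and status text respectively. Regex-based extraction like this is tied to the exact markup, so any change to the page layout would likely require adjusting the patterns.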

Original article: http://www.cnblogs.com/kexb/p/5115233.html
