1. Some sites respond slowly (全球内衣 is one example). Because connections to such a site were not closed promptly, open connections piled up and page fetches timed out badly.
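Fix: call HttpWebResponse.Close() and HttpWebRequest.Abort() when a request is finished, set HttpWebRequest.KeepAlive = false, and raise the timeout a bit. After that, connection timeouts became much less frequent. You can also map the domain straight to an IP in the hosts file on drive C: (C:\Windows\System32\drivers\etc\hosts), which saves the trip to the DNS server.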
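2. When scraping 中国供应商 with multiple threads, the crawler ran too fast, and a slider-style CAPTCHA showed up after only a few dozen records. The only way around this is to lower the request rate.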
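One way to lower the request rate without giving up multithreading is to make all worker threads share a single throttle. The following is only a minimal sketch: the RequestThrottle class and its Wait method are hypothetical names that do not appear in the original code, and the right interval for any given site is something you would have to find by trial.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not part of the original post): enforces a minimum
// gap between requests, even when several threads share one instance.
public sealed class RequestThrottle
{
    private readonly object _gate = new object();
    private readonly TimeSpan _minInterval;
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private TimeSpan _nextAllowed = TimeSpan.Zero;

    public RequestThrottle(TimeSpan minInterval)
    {
        _minInterval = minInterval;
    }

    // Blocks the calling thread until its reserved time slot arrives.
    public void Wait()
    {
        TimeSpan delay;
        lock (_gate)
        {
            TimeSpan now = _clock.Elapsed;
            // Take the next free slot: either right now or the reserved time.
            TimeSpan slot = now > _nextAllowed ? now : _nextAllowed;
            _nextAllowed = slot + _minInterval;
            delay = slot - now;
        }
        // Sleep outside the lock so other threads can reserve later slots.
        if (delay > TimeSpan.Zero)
        {
            Thread.Sleep(delay);
        }
    }
}
```

Each worker thread would call Wait() on the shared instance right before issuing its request, for example with an instance created as new RequestThrottle(TimeSpan.FromSeconds(2)). The 2-second figure is an arbitrary starting point, not a value from the original post.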
Here is the helper method that wraps the request:
// Needed at the top of the file:
using System.IO;
using System.Net;
using System.Text;

// Place this method inside a class.
// Fetches a page with GET and returns the body decoded with the given charset.
// Returns an empty string if the request fails.
public static string getRequest(string url, string charset = "utf-8")
{
    HttpWebRequest myreq = null;
    HttpWebResponse myres = null;
    StreamReader reader = null;
    Stream stream = null;
    string result = "";
    string code = charset; //charset.ToLower()
    try
    {
        myreq = (HttpWebRequest)WebRequest.Create(url);
        myreq.Timeout = 20000; // milliseconds
        myreq.Method = "GET";
        // Do not keep the connection alive, so slow sites cannot pile up open connections.
        myreq.KeepAlive = false;
        myreq.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
        //myreq.UserAgent = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)";
        // Non-standard header kept from the original code; servers generally ignore it.
        myreq.Headers.Add("content", "text/html; charset=" + code);
        //myreq.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
        myreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36";
        myres = (HttpWebResponse)myreq.GetResponse();
        stream = myres.GetResponseStream();
        reader = new StreamReader(stream, Encoding.GetEncoding(code));
        result = reader.ReadToEnd();
    }
    catch
    {
        // Any failure is swallowed here, so the caller just gets "" back.
    }
    finally
    {
        // Release everything, even when GetResponse or ReadToEnd throws.
        if (reader != null)
        {
            reader.Close();
        }
        if (stream != null)
        {
            stream.Close();
        }
        if (myres != null)
        {
            myres.Close();
        }
        if (myreq != null)
        {
            myreq.Abort();
        }
    }
    return result;
}
Original post: http://www.cnblogs.com/zhian/p/5810048.html