[robot] Sending an HttpWebRequest and receiving the Response (GET, POST)
Using IE as the browser, here are examples based on what Fiddler captured.
GET:
Raw data captured by Fiddler:
GET http://w2.land.taipei.gov.tw/land4/loina.asp HTTP/1.1
C#:
HttpWebRequest request;
CookieContainer cookies = new CookieContainer();
string url = "http://w2.land.taipei.gov.tw/land4/loina.asp";
string html = "";
request = WebRequest.Create(url) as HttpWebRequest;
//If you need to go through a proxy...
WebProxy _proxy = new WebProxy("http://myproxy.com.tw:8888", true);
_proxy.Credentials = CredentialCache.DefaultCredentials;
request.Proxy = _proxy;
//end of proxy
request.Method = "GET";
request.Accept = "text/html, application/xhtml+xml, */*";
request.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
//The client tells the server it accepts gzip compression; whether it is actually used depends on the server's configuration
request.Headers.Set("Accept-Encoding", "gzip, deflate");
//If the response body is gzip-compressed, this decompresses it automatically; without this line the content may come back unreadable
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.Host = "w2.land.taipei.gov.tw";
request.CookieContainer = cookies;
//The default is true; if you deliberately set it to false you may not get the html back
request.KeepAlive = true;
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream, Encoding.Default))
{
html = reader.ReadToEnd();
}
}
}
PS. 2016-10-09 supplement: reference code using Chrome's headers:
string result = "";
HttpWebRequest request;
CookieContainer cookies = new CookieContainer();
string url = "www.yoururl.com";
request = WebRequest.Create(url) as HttpWebRequest;
request.Method = "GET";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.Headers.Set("Accept-Encoding", "gzip, deflate, sdch");
request.Headers.Set("Accept-Language", "zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4,zh-CN;q=0.2");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";
request.CookieContainer = cookies;
request.Headers.Set("Upgrade-Insecure-Requests", "1");
//The default is true; if you deliberately set it to false you may not get the html back
request.KeepAlive = true;
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream, Encoding.UTF8))
{
result = reader.ReadToEnd();
}
}
}
return result;
POST:
Raw data captured by Fiddler:
POST http://w2.land.taipei.gov.tw/land4/loina.asp HTTP/1.1
C#:
HttpWebRequest requestPost;
CookieContainer cookiesPost = new CookieContainer();
string url = "http://w2.land.taipei.gov.tw/land4/loina.asp";
requestPost = WebRequest.Create(url) as HttpWebRequest;
string html = "";
string postData = "destrict=03§ion=&land_mom=&land_son=";//行政区选择中正区, 有特殊符记得HttpUtility.UrlEncode
requestPost.Method = "POST";
requestPost.Accept = "text/html, application/xhtml+xml, */*";
requestPost.Referer = "http://w2.land.taipei.gov.tw/land4/loina.asp";
requestPost.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
requestPost.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
requestPost.ContentType = "application/x-www-form-urlencoded";
requestPost.Headers.Set("Accept-Encoding", "gzip, deflate");
requestPost.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
requestPost.ContentLength = Encoding.UTF8.GetByteCount(postData);//byte length of the body; postData.Length only matches it while the data is pure ASCII
requestPost.Host = "w2.land.taipei.gov.tw";
requestPost.Headers.Set("Pragma", "no-cache");
requestPost.CookieContainer = cookiesPost;
//If you hit a (417) Expectation Failed error, uncomment the line below
//System.Net.ServicePointManager.Expect100Continue = false;
using (var stream = requestPost.GetRequestStream())
using (var writer = new StreamWriter(stream))
{
writer.Write(postData);
}
using (var response = (HttpWebResponse)requestPost.GetResponse())
{
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream, Encoding.Default))
{
html = reader.ReadToEnd();
}
}
}
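About the UrlEncode reminder in the postData comment above, here is a minimal sketch of building the form body with HttpUtility.UrlEncode; the land_mom value is a made-up example, and the call needs a reference to System.Web:
//using System.Web;  (HttpUtility lives there, so the project must reference System.Web)
string district = "03";
string landMom = "A段 1-2";//made-up value with a space and a dash, just to show the encoding
string encodedPostData = "destrict=" + HttpUtility.UrlEncode(district)
+ "&section="
+ "&land_mom=" + HttpUtility.UrlEncode(landMom)
+ "&land_son=";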
PS. Remember that if you reuse the request variable, you must set Method, Accept, Referer, Accept-Language and the rest all over again for every request, because .NET resets those headers after each request goes out.
Supplement 2015-11-16:
Scraping a site usually takes several consecutive request/response round trips before the target data is reached, and the relevant headers have to be set again on every single request. You cannot, for example, lazily set request.Accept = "text/html, application/xhtml+xml, */*"; on the first request only; you will get no data back that way, because .NET appears to clear the header values set on the previous request (the debugger shows the variables as still populated, but if you only configure the first request's headers, in practice nothing comes out at all). On top of that, the cookies must also be attached on every request, because the state linking consecutive requests is sometimes carried in ViewState and sometimes in cookies. A minimal sketch of this pattern follows below.
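Here is that sketch; the helper method Fetch and the second URL are made up for illustration, but the headers are the same IE-style ones used above. Every call builds a brand-new HttpWebRequest, re-applies all the headers, and shares a single CookieContainer:
static CookieContainer sharedCookies = new CookieContainer();//one cookie jar for the whole sequence
static string Fetch(string url)
{
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
//headers do not carry over between requests, so set every one of them again on each call
req.Method = "GET";
req.Accept = "text/html, application/xhtml+xml, */*";
req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
req.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
req.CookieContainer = sharedCookies;//same container on every request keeps the session alive
using (var resp = (HttpWebResponse)req.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream(), Encoding.Default))
{
return reader.ReadToEnd();
}
}
//usage: the first request collects the cookies, the later ones reuse them
//string page1 = Fetch("http://w2.land.taipei.gov.tw/land4/loina.asp");
//string page2 = Fetch("http://w2.land.taipei.gov.tw/land4/loina.asp?step=2");//made-up second step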
PS. Supplement 2015-11-18: the ViewState, ViewStateGenerator, and EventValidation parameters show up on classic ASP.NET Web Forms pages; if they appear, all three must be updated together on every post (see the sketch below).
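A rough sketch of carrying those three values forward, assuming the page renders them as the usual __VIEWSTATE / __VIEWSTATEGENERATOR / __EVENTVALIDATION hidden inputs (the regex and the extra field name are my own illustration):
//using System.Text.RegularExpressions; plus System.Web for HttpUtility
string viewState = Regex.Match(html, "id=\"__VIEWSTATE\" value=\"([^\"]*)\"").Groups[1].Value;
string viewStateGenerator = Regex.Match(html, "id=\"__VIEWSTATEGENERATOR\" value=\"([^\"]*)\"").Groups[1].Value;
string eventValidation = Regex.Match(html, "id=\"__EVENTVALIDATION\" value=\"([^\"]*)\"").Groups[1].Value;
//all three go back into the next POST body together, url-encoded
string nextPostData = "__VIEWSTATE=" + HttpUtility.UrlEncode(viewState)
+ "&__VIEWSTATEGENERATOR=" + HttpUtility.UrlEncode(viewStateGenerator)
+ "&__EVENTVALIDATION=" + HttpUtility.UrlEncode(eventValidation)
+ "&yourOtherField=yourValue";//made-up form field for the rest of the post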
PS. Supplement 2016-03-25: to switch to another browser (Chrome, for example), use Fiddler to compare that browser's request and response against the ones above, then change the headers to match.
PS. Supplement 2017-07-20: if the site is HTTPS and requires TLS 1.2 or a higher grade of transport encryption, you also need to add the lines below (.NET Framework 4.5 must be installed on the machine for this to run; the project itself can target either 4.0 or 4.5):
ServicePointManager.Expect100Continue = true;
ServicePointManager.SecurityProtocol = (SecurityProtocolType)3072;
ServicePointManager.DefaultConnectionLimit = 9999;
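For reference, (SecurityProtocolType)3072 is simply the numeric value of Tls12, which a project targeting framework 4.0 cannot name directly; a project targeting 4.5 or later can write the same line with the named member:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;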
Source: https://www.cnblogs.com/petewell/p/11526686.html