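Simulating a form submission programmatically to log in to Baidu.
From a practical standpoint, this kind of problem is of limited value and is not really suited to a blog post: Baidu's pages keep changing, this post will not be updated to match, and so its correctness cannot be guaranteed. From a learning standpoint, though, it makes a good exercise. I have been learning Python for the past few days and it has left a strong impression. Python really is more flexible than Java, and its syntax has many elegant features. Multi-line strings and raw strings (strings that need no escaping), for example, were both missing from Java at the time of writing, which is painful. This kind of problem takes patience. Like cracking a password, it means trying, understanding, and guessing; it eats time and energy and the payoff is low, so the effort might be better spent learning something else. As Xunzi put it: 终日而思,不如须臾之所学也, that is, a whole day of thinking is worth less than a moment of study.
In Chrome, Ctrl+U opens the page source and F12 opens the developer tools. Focus on the Network panel, enable Preserve log, and clear old cookies and cache before each experiment to rule out interference. Then watch the Network panel while logging in, logging out, posting, commenting, and so on, and look at how the cookies change and what comes back. Use some imagination on the data: guess boldly and look for patterns.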
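Three steps to a successful login:
1. First, visiting any Baidu page yields a Baidu ID (BAIDUID), which is a cookie.
2. Next, request https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login to obtain a token.
3. Finally, submit the form to https://passport.baidu.com/v2/api/?login
The v2 in the path stands for version 2. What this page returns depends on the parameters of the GET request. When I first captured the traffic, the browser showed these parameters:
- getapi:
- tpl: mn
- apiver: v3
- tt: 1461752974956 (login time)
- class: login (the action is a login rather than something else)
- gid: 36400D4-3078-460D-ABD8-9DEFBA99604B
- logintype: dialogLogin (login type: via the dialog box)
- callback: bd__cbs__gyljq1 (callback)
The response looked like this: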
bd__cbs__gyljq1({
"errInfo": {
"no": "0"
},
"data": {
"rememberedUserName": "1661686074@qq.com",
"codeString": "",
"token": "dda98eb93a3011ca4165a01b342a4622",
"cookie": "1",
"usernametype": "2",
"spLogin": "newuser",
"disable": "",
"loginrecord": {
'email': [],
'phone': []
}
}
})
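This is not pure JSON; it carries some extra wrapping. An errInfo.no of 0 means everything went fine. data is a JSON object, and the only useful piece in it is token. Many of the GET parameters are redundant. Different parameters produce different responses, which can be tested directly in the browser address bar. With callback removed, the parameters come down to getapi&apiver=v3 and the returned JSON becomes nicely regular: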
{
"errInfo": {
"no": "0"
},
"data": {
"rememberedUserName": "",
"codeString": "",
"token": "dda98eb93a3011ca4165a01b342a4622",
"cookie": "1",
"usernametype": "",
"spLogin": "newuser",
"disable": "",
"loginrecord": {
'email': [],
'phone': []
}
}
}
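The two response shapes above (with and without the callback wrapper) can be handled by one small parser. This is a minimal sketch, not the author's code: the function name `parse_token_response` is hypothetical, and the blanket replacement of single quotes assumes that no value contains an apostrophe, which holds for the samples shown here but is not guaranteed.

```python
import json
import re


def parse_token_response(text):
    """Return the parsed object from a getapi response (JSONP or plain)."""
    text = text.strip()
    # Strip an optional JSONP wrapper such as bd__cbs__gyljq1({...})
    match = re.fullmatch(r"[\w$.]+\((.*)\)\s*;?", text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    # Some keys are single-quoted, which json.loads rejects
    return json.loads(text.replace("'", '"'))


sample = """bd__cbs__gyljq1({
    "errInfo": {"no": "0"},
    "data": {
        "token": "dda98eb93a3011ca4165a01b342a4622",
        "loginrecord": {'email': [], 'phone': []}
    }
})"""

parsed = parse_token_response(sample)
print(parsed["errInfo"]["no"], parsed["data"]["token"])
```

The same function accepts the wrapper-free response, because the wrapper pattern simply does not match text that starts with a brace.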
Removing apiver=v3 as well (apiver means API version, here 3), so that the parameters become getapi&class=login, turns the response into key-value assignments:
var bdPass=bdPass||{};
bdPass.api=bdPass.api||{};
bdPass.api.params=bdPass.api.params||{};
bdPass.api.params.login_token='dda98eb93a3011ca4165a01b342a4622';
bdPass.api.params.login_tpl='mn';
document.write('<script type="text/javascript" charset="UTF-8" src="https://passport.baidu.com/js/pass_api_login.js?v=20131115"></script>');
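So version 2 needs the two parameters getapi and class=login, and version 3 needs getapi and apiver=v3. Without tpl=mn the later login fails: the JSON is still returned, but a token that is going to be used for logging in must be requested with tpl=mn. The version 2 response looks rather like a log4j properties file. How should the token be parsed out? A regular expression works, the version 3 response can be parsed as JSON, and the version 2 response can be parsed as properties. A regular expression is probably the most robust choice.
Two principles apply to this kind of problem (they pull in opposite directions, so a balance has to be found):
* Delete every parameter that can be deleted.
* Do not waste effort testing parameters; send them all, since extras never hurt (although the response format may be uglier).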
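The regular-expression approach can be sketched as follows. The pattern is an assumption drawn only from the samples in this post: it expects the token to be 32 lowercase hexadecimal digits, and the names `TOKEN_RE` and `extract_token` are hypothetical.

```python
import re

# Matches login_token='...' (version 2) as well as "token": "..." (version 3)
TOKEN_RE = re.compile(r"""token['"]?\s*[:=]\s*['"]([0-9a-f]{32})['"]""")


def extract_token(text):
    """Return the first 32-hex-digit token found in text, or None."""
    match = TOKEN_RE.search(text)
    return match.group(1) if match else None


v2 = "bdPass.api.params.login_token='dda98eb93a3011ca4165a01b342a4622';"
v3 = '{"data": {"codeString": "", "token": "dda98eb93a3011ca4165a01b342a4622"}}'
print(extract_token(v2))
print(extract_token(v3))
```

One pattern covering both formats means the code keeps working whichever parameter set is sent, at the cost of being tied to the assumed token shape.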
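The form shown in the browser has many fields, and many of them are useless: the login succeeds without them. After repeated rounds of deleting and testing, only the following fields turned out to matter: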
{
"token": token,
"tpl": "mn",
"loginmerge": True,
"username": username,
"password": password
}
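Remove any more and the login breaks; tpl=mn is still indispensable. This step uses the token obtained in step two. Logging in to Baidu does not require a browser-like User-Agent, a forged Referer, or any other header tricks: GET, GET, POST, and the login succeeds. Cookies need no attention either, because the HTTP library's session handles them and each request automatically carries the cookies received earlier. Once logged in to the Baidu home page, every part of Baidu (Tieba, Zhidao, and so on) is accessible. How to verify that the login succeeded? There are two ways:
1. Visit www.baidu.com and check whether your own user name appears in the page.
2. Check whether the cookies include key ones such as PTOKEN and STOKEN.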
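The cookie-based check could look like the sketch below. It works on plain cookie names so that it can be tried without a network connection. Treating both PTOKEN and STOKEN as mandatory is an assumption, since the text only lists them as examples of key cookies, and `looks_logged_in` is a hypothetical name.

```python
REQUIRED_COOKIES = ("PTOKEN", "STOKEN")


def looks_logged_in(cookie_names, required=REQUIRED_COOKIES):
    """Return True if every cookie name in required is present."""
    present = set(cookie_names)
    return all(name in present for name in required)


print(looks_logged_in(["BAIDUID"]))
print(looks_logged_in(["BAIDUID", "PTOKEN", "STOKEN"]))
```

With a requests session, the names can be taken from the cookie jar, for example `looks_logged_in(s.cookies.keys())`.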
A note on Baidu's return values: no is the error code (0 means OK), err_code is also an error code, error is the error message, and data is the payload:
"no": 40,
"err_code": 40,
"error": null,
"data"
import json
from requests import Session

username = 'xxxxx'
password = 'xxxxx'
s = Session()
# Python 2.x and Python 3.x differ a lot:
# older code used urllib and urllib2, this code uses the requests package
def showCookie(cookies):
    for i in cookies:
        print(i)
        i.domain = '*'  # kept from the original; note that this mutates the cookie
    print('*' * 20)
# Step 1: visit Baidu to obtain the BAIDUID cookie
s.get("http://www.baidu.com")
# Step 2: request the passport API to obtain the token; the page returns a JSON string.
# Different query parameters give different responses; many useless ones were removed by trial
resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
# json.loads rejects single-quoted strings, so convert them to double quotes first
t = json.loads(resp.text.replace('\'', '\"'))
token = t['data']['token']
# Step 3: submit the form. Testing showed that only the five fields below are required
data = {
"token": token,
"tpl": "mn",
"loginmerge": True,
"username": username,
"password": password
}
resp = s.post("https://passport.baidu.com/v2/api/?login", data)
The Java version uses the third-party library fastjson for JSON parsing and Apache HttpClient for network requests.
static HttpClient client = HttpClients.createDefault();
static void login(String username, String password)
throws ClientProtocolException, IOException {
HttpGet homePage = new HttpGet("http://tieba.baidu.com");
client.execute(homePage);
HttpGet getToken = new HttpGet(
"https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
HttpResponse resp = client.execute(getToken);
String json = EntityUtils.toString(resp.getEntity());
JSONObject obj = JSON.parseObject(json);
String token = obj.getJSONObject("data").getString("token");
HttpPost loginPost = new HttpPost(
"https://passport.baidu.com/v2/api/?login");
List<NameValuePair> list = new ArrayList<>();
list.add(new BasicNameValuePair("token", token));
list.add(new BasicNameValuePair("username", username));
list.add(new BasicNameValuePair("password", password));
list.add(new BasicNameValuePair("tpl", "mn"));
list.add(new BasicNameValuePair("loginmerge", "true"));
UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list);
loginPost.setEntity(loginData);
client.execute(loginPost);
}
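Login finally works; the next step is posting. Tieba's data model matters a great deal, and things get messy unless the levels are kept apart.
* Forum (贴吧): a Tieba is a forum, one big topic and the broadest category. Under it sit many branches that refine the big topic. For example, the Jin Yong forum (金庸吧) contains many threads: who has the strongest martial arts? Who would win, the Sweeper Monk or Dugu Qiubai? ... Every thread draws many posts, and every post in turn draws comments.
* Thread (话题): starting a thread is like getting ready to put up a building.
* Post (帖子): every post adds one floor to the thread.
* Comment (评论): each post can have comments beneath it, which makes replies more targeted. Adding a floor to state your own view is a heavy weapon, a long spear or halberd saying what nobody has said before; a comment is more like a dagger, short and direct.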
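The four levels can be written down as nested data classes. This is only an illustrative model: all class and field names are hypothetical, and attaching fid to the forum and tid to the thread is an assumption based on the identifiers that appear later in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    content: str


@dataclass
class Post:
    content: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Thread:
    tid: int
    title: str
    posts: List[Post] = field(default_factory=list)


@dataclass
class Forum:
    fid: int
    name: str
    threads: List[Thread] = field(default_factory=list)


forum = Forum(fid=1847502, name="大学生励志")
thread = Thread(tid=4135933166, title="example thread")
forum.threads.append(thread)
thread.posts.append(Post(content="first floor"))
thread.posts[0].comments.append(Comment(content="a short reply"))
print(len(forum.threads), len(thread.posts), len(thread.posts[0].comments))
```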
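Common error types:
* Missing tbs returns 230308, where 308 is the error code and 230 is a prefix.
* 265 is an error code with the same fixed 230 prefix; it probably means posting too frequently, so posting is blocked.
* 40 means a captcha: posting too frequently triggers a captcha requirement. Being able to crack the captcha would of course be great. Below is the returned JSON, in which str_reason reads "请点击验证码完成发贴" (please click the captcha to finish posting). Baidu designed this nicely: the captcha is clicked rather than typed, which is convenient for mouse users.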
{
"no": 40,
"err_code": 40,
"error": null,
"data": {
"autoMsg": "",
"fid": 1847502,
"fname": "\u5927\u5b66\u751f\u52b1\u5fd7",
"tid": 0,
"is_login": 1,
"content": "",
"vcode": {
"need_vcode": 1,
"str_reason": "\u8bf7\u70b9\u51fb\u9a8c\u8bc1\u7801\u5b8c\u6210\u53d1\u8d34",
"captcha_vcode_str": "captchaservice303662633978724c7655642b44707538683879667741516b6c2f4262726d36777477486b356749525449362b39495a426642746d6d744d5178716236766c4a575650742f6f4b4b57534d576656385534766158757678644979672b56742f56776237523631766d6e33754a567a654d62767a7238646a6632703447653477673568695544454a2b6146695a6651525763705657396b6c45614a334d6a75375664425452684977702b306d3866306a346350365755634f763835614f72426d4c5478596e41587749525773372b38746c66443949764156423478776a644d37476a746a674b4374396348574636644d617a457043714d796d48644641676c466a55716e5841587162646465624e4b6171356a733041502f456c7649636f5879326177514c67473164636638482f76487a55",
"captcha_code_type": 4,
"userstatevcode": 0
},
"mute_text": null
}
}
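The error codes listed above can be decoded with a small helper. This is a sketch under assumptions: it supposes that a prefixed code such as 230308 arrives in the `no` field as a number, and the descriptions in `KNOWN_CODES` merely restate the guesses made in this post. The names `describe_error` and `KNOWN_CODES` are hypothetical.

```python
import json

ERROR_PREFIX = 230  # prefix described in the text, e.g. 230308 -> 308

KNOWN_CODES = {
    0: "ok",
    40: "captcha required",
    265: "probably posting too frequently",
    308: "tbs missing",
}


def describe_error(response_text):
    """Return (code, description) for a posting response."""
    body = json.loads(response_text)
    code = int(body["no"])
    # Strip the 230 prefix when present: 230308 -> 308
    if code >= 1000 and code // 1000 == ERROR_PREFIX:
        code %= 1000
    return code, KNOWN_CODES.get(code, "unknown")


print(describe_error('{"no": 230308, "err_code": 230308, "error": null}'))
print(describe_error('{"no": 40, "err_code": 40, "error": null, "data": {}}'))
```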
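tbs is obtained from http://tieba.baidu.com/dc/common/tbs: a single GET, then parse tbs out of the returned JSON string. tbs works like a Tieba pass: every comment and every floor you submit must carry it. They can all share the same value, namely the tbs you fetched most recently; the server presumably keeps a hash map from user id to tbs value.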
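Many fields are unnecessary. After the first success, strip things down by deleting and testing in turn: the headers turn out to be useless, and many form fields are optional too. The form fields are:
* kw: the name of the forum (the Tieba) being posted to; in the code below it is the forum name 大学生励志.
* tid: the thread id, visible in the address bar.
* fid: it stays the same no matter how many posts are made in a thread. Open a thread page such as http://tieba.baidu.com/p/4195311174, press Ctrl+U to view the source, and search for fid with Ctrl+F: every fid on the page is identical. Since fid appears next to fname (the forum name) in the JSON response above, it is most likely the forum id rather than anything tied to a single thread.
What does Baidu use as primary keys? Long integers. Neither Baidu nor cnblogs uses UUIDs, which is plain to see from their pages.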
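Pulling tid out of a thread URL and fid out of page source can be sketched with two regular expressions. The HTML fragments in the example are made up for illustration, and the fid pattern only covers the shapes assumed in the comment; the function names are hypothetical.

```python
import re


def tid_from_url(url):
    """Return the thread id from a URL like http://tieba.baidu.com/p/4195311174."""
    match = re.search(r"/p/(\d+)", url)
    return match.group(1) if match else None


def fid_from_html(html):
    """Return the first fid found in page source, or None."""
    # Assumed shapes: fid=123 in query strings, or fid:'123' in scripts
    match = re.search(r"""fid["']?\s*[=:]\s*["']?(\d+)""", html)
    return match.group(1) if match else None


print(tid_from_url("http://tieba.baidu.com/p/4195311174"))
print(fid_from_html("<a href='/f/like?fid=1847502&tbs=x'>"))
print(fid_from_html("PageData.forum = {fid:'1847502', name:'x'};"))
```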
static void newPost() throws ClientProtocolException, IOException {
HttpPost post = new HttpPost(
"http://tieba.baidu.com/f/commit/post/add");
HashMap<String, String> paramMap = new HashMap<String, String>();
paramMap.put("kw", "大学生励志");
paramMap.put("fid", "1847502");
paramMap.put("tid", "4135933166");
paramMap.put("tbs", tbs);
paramMap.put("content", "天下大势为我所控");
List<NameValuePair> list = new ArrayList<>();
for (Map.Entry<String, String> i : paramMap.entrySet()) {
BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
i.getValue());
System.out.println(pair.getName() + ":" + pair.getValue());
list.add(pair);
}
RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
.setConnectTimeout(5000).build();
post.setConfig(config);
post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
HttpResponse resp = client.execute(post);
System.out.println(EntityUtils.toString(resp.getEntity()));
}
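Java really is long-winded, and Apache's libraries are often not designed all that sensibly (something I only recently came to feel strongly), so let us look at Python instead: short and punchy, better suited to describing the problem. Java is not always verbose; a concise API is easy to design when simplicity is the goal. Java's complexity comes from two sources:
* Poor library design: each step does exactly one thing, which gets long, and library authors disdain syntactic sugar.
* The problem itself is complex: it needs a lot of configuration, which is more flexible. Python is short, but some things may be impossible because the wrapping is too tight and too few hooks are exposed.
* Poor library design combined with a complex problem. There is a matter of probability here: complex cases are rare, yet everyone is made to spend a lot of time thinking about them, when a simple, imperfect interface designed up front would serve better. Better a simple interface with gaps than a complete one that is complicated. For example, when several lines are selected and Tab is pressed, should the editor indent or replace? Indent, of course: someone who wanted to replace would not do it that way, and the convenience of indenting is considerable.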
resp = s.get("http://tieba.baidu.com/dc/common/tbs")
tbs = json.loads(resp.text)['tbs']
data = {
    "kw": "大学生励志",  # forum name
    "fid": "1847502",  # forum id
    "tid": "4135933166",  # thread id
    "tbs": tbs,  # required
    "content": "现在下午两点四十二"
}
resp = s.post("http://tieba.baidu.com/f/commit/post/add",data)
print(resp.text)
print("over")
static void newReply() throws ClientProtocolException, IOException {
String data = "kw=大学生励志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
+ tbs
+ "&content=五楼的也可以评论floornum不管用吗&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
HttpPost post = new HttpPost(
"http://tieba.baidu.com/f/commit/post/add");
List<NameValuePair> list = new ArrayList<>();
for (String i : data.split("&")) {
int p = i.indexOf('=');
BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
i.substring(p + 1));
System.out.println(pair.getName() + ":" + pair.getValue());
list.add(pair);
}
post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
HttpResponse resp = client.execute(post);
System.out.println(EntityUtils.toString(resp.getEntity()));
}
Strung together, the code above looks like this:
String username = "xxxxxx";
String password = "xxxxxx";
login(username, password);
tbs = getTbs();
newPost();
newReply();
The key point is that tbs only needs to be fetched once and can then live in a global variable; there is no need to fetch it repeatedly.
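Fetching tbs once and reusing it can also be expressed as a small cache instead of a global variable. This is a sketch, not the author's code: `TbsCache` is a hypothetical name, and the fetcher is injected so that the example runs without a network connection; in real use it would wrap the GET to http://tieba.baidu.com/dc/common/tbs.

```python
import json


class TbsCache:
    """Fetch tbs once and reuse it; fetch_text is a callable returning JSON text."""

    def __init__(self, fetch_text):
        self._fetch_text = fetch_text
        self._tbs = None

    def get(self):
        if self._tbs is None:
            self._tbs = json.loads(self._fetch_text())["tbs"]
        return self._tbs

    def invalidate(self):
        """Force the next get() to fetch again."""
        self._tbs = None


calls = []


def fake_fetch():
    # Stand-in for s.get("http://tieba.baidu.com/dc/common/tbs").text
    calls.append(1)
    return '{"tbs": "abc123"}'


cache = TbsCache(fake_fetch)
print(cache.get(), cache.get(), len(calls))
```

Calling `invalidate()` is a possible reaction to the missing-tbs error described earlier; whether the server ever expires a tbs is not established in this post.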
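The link tieba.baidu.com/f/index/feedlist?tagid=all&limit=2000000&offset=0 is very important. Some links only work when pasted into the address bar and cannot be followed directly, probably because the server rejects cross-site access. The parameter tagid=like | all selects the type of list requested: like returns only the items I follow, all returns everything. limit is the number of items and offset is the starting offset. There are many other parameters as well, such as last_tid, the time of the last item (used when loading more), as well as &_. How was this link discovered? Visit tieba.baidu.com and load more items, and a request goes out to this feedlist. Parsing the returned HTML with jsoup yields many threads and their tid values; opening one of them gives the fid, and with tid and fid in hand you can add floors.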
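Building the feedlist URL from its three documented parameters can be sketched as follows. The `http://` scheme is an assumption (the link above is given without one), the function names are hypothetical, and the paging helper assumes that offset counts items.

```python
from urllib.parse import urlencode

FEEDLIST_URL = "http://tieba.baidu.com/f/index/feedlist"


def feedlist_url(tagid="all", limit=20, offset=0):
    """Build a feedlist URL; tagid is 'all' or 'like' as described above."""
    if tagid not in ("all", "like"):
        raise ValueError("tagid must be 'all' or 'like'")
    query = urlencode({"tagid": tagid, "limit": limit, "offset": offset})
    return FEEDLIST_URL + "?" + query


def pages(total, page_size):
    """Yield (limit, offset) pairs that cover total items."""
    for offset in range(0, total, page_size):
        yield min(page_size, total - offset), offset


print(feedlist_url("all", 2000000, 0))
print(list(pages(50, 20)))
```

Requesting modest pages instead of a single limit=2000000 call is a design choice of this sketch, not something the post tested.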
import json
from requests import Session

username = 'xxxxx'
password = 'xxxxx'
s = Session()
# Python 2.x and Python 3.x differ a lot:
# older code used urllib and urllib2, this code uses the requests package
def showCookie(cookies):
    for i in cookies:
        print(i)
        i.domain = '*'  # kept from the original; note that this mutates the cookie
    print('*' * 20)
# Step 1: visit Baidu to obtain the BAIDUID cookie
s.get("http://www.baidu.com")
# Step 2: request the passport API to obtain the token; the page returns a JSON string.
# Different query parameters give different responses; many useless ones were removed by trial
resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
# Single quotes must be converted to double quotes, otherwise json.loads cannot parse the text
t = json.loads(resp.text.replace('\'', '\"'))
token = t['data']['token']
# Step 3: submit the form. Testing showed that only the five fields below are required
data = {
"token": token,
"tpl": "mn",
"loginmerge": True,
"username": username,
"password": password
}
resp = s.post("https://passport.baidu.com/v2/api/?login", data)
resp = s.get("http://tieba.baidu.com/dc/common/tbs")
tbs = json.loads(resp.text)['tbs']
data = {
    "kw": "大学生励志",  # forum name
    "fid": "1847502",  # forum id
    "tid": "4135933166",  # thread id
    "tbs": tbs,  # required
    "content": "现在下午两点四十二"
}
resp = s.post("http://tieba.baidu.com/f/commit/post/add",data)
print(resp.text)
print("over")
public class Main {
static HttpClient client = HttpClients.createDefault();
static String tbs;
static String host = "Host: tieba.baidu.com";
static String useragent = "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36";
static void login(String username, String password)
throws ClientProtocolException, IOException {
HttpGet homePage = new HttpGet("http://tieba.baidu.com");
client.execute(homePage);
HttpGet getToken = new HttpGet(
"https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
HttpResponse resp = client.execute(getToken);
String json = EntityUtils.toString(resp.getEntity());
JSONObject obj = JSON.parseObject(json);
String token = obj.getJSONObject("data").getString("token");
HttpPost loginPost = new HttpPost(
"https://passport.baidu.com/v2/api/?login");
List<NameValuePair> list = new ArrayList<>();
list.add(new BasicNameValuePair("token", token));
list.add(new BasicNameValuePair("username", username));
list.add(new BasicNameValuePair("password", password));
list.add(new BasicNameValuePair("tpl", "mn"));
list.add(new BasicNameValuePair("loginmerge", "true"));
UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list);
loginPost.setEntity(loginData);
client.execute(loginPost);
}
static String getTbs() throws ClientProtocolException, IOException {
HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
HttpResponse resp = client.execute(get);
String s = EntityUtils.toString(resp.getEntity());
JSONObject json = JSON.parseObject(s);
return json.getString("tbs");
}
static void newReply() throws ClientProtocolException, IOException {
String data = "kw=大学生励志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
+ tbs
+ "&content=五楼的也可以评论floornum不管用吗&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
HttpPost post = new HttpPost(
"http://tieba.baidu.com/f/commit/post/add");
List<NameValuePair> list = new ArrayList<>();
for (String i : data.split("&")) {
int p = i.indexOf('=');
BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
i.substring(p + 1));
System.out.println(pair.getName() + ":" + pair.getValue());
list.add(pair);
}
post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
HttpResponse resp = client.execute(post);
System.out.println(EntityUtils.toString(resp.getEntity()));
}
static void newPost() throws ClientProtocolException, IOException {
HttpPost post = new HttpPost(
"http://tieba.baidu.com/f/commit/post/add");
HashMap<String, String> paramMap = new HashMap<String, String>();
paramMap.put("kw", "大学生励志");
paramMap.put("fid", "1847502");
paramMap.put("tid", "4135933166");
paramMap.put("tbs", tbs);
paramMap.put("content", "魏印福");
List<NameValuePair> list = new ArrayList<>();
for (Map.Entry<String, String> i : paramMap.entrySet()) {
BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
i.getValue());
System.out.println(pair.getName() + ":" + pair.getValue());
list.add(pair);
}
RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
.setConnectTimeout(5000).build();
post.setConfig(config);
post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
HttpResponse resp = client.execute(post);
System.out.println(EntityUtils.toString(resp.getEntity()));
}
public static void main(String[] args)
throws ClientProtocolException, IOException {
String username = "xxxxxx";
String password = "xxxxxx";
login(username, password);
tbs = getTbs();
newPost();
newReply();
}
}
This code really is well written:
class HttpUtils {
/**
 * Convert a map into a form entity.
 *
 * @param map
 * the form fields to encode
 * @return the URL-encoded form entity
 */
public static HttpEntity mapToEntity(HashMap<String, String> map)
throws Exception {
BasicNameValuePair pair = null;
List<BasicNameValuePair> params = new ArrayList<BasicNameValuePair>();
for (Map.Entry<String, String> m : map.entrySet()) {
pair = new BasicNameValuePair(m.getKey(), m.getValue());
params.add(pair);
}
HttpEntity entity = new UrlEncodedFormEntity(params, "UTF-8");
return entity;
}
/**
 * Extract the substring between two markers.
 *
 * @param string
 * the source string
 * @param start
 * the marker that precedes the substring
 * @param end
 * the marker that follows the substring
 * @return the substring in between, or null on failure
 */
public static String mid(String string, String start, String end) {
int i = string.indexOf(start);
if (i < 0)
return null; // start marker not found
int s = i + start.length();
int e = string.indexOf(end, s);
if (e > s)
return string.substring(s, e);
return null;
}
/**
 * Collect every match of a regular expression.
 *
 * @param regex
 * the regular expression
 * @param input
 * the string to search
 * @return the list of matches (there may be several, depending on the expression)
 */
public static ArrayList<String> myRegex(String regex, String input) {
ArrayList<String> list = new ArrayList<String>();
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while (m.find()) {
list.add(m.group());
}
return list;
}
}
public class Baidu {
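private CloseableHttpClient httpClient; // the HTTP client
private String postFid; // fid used when posting
private String postName = ""; // name of the forum being posted to
private CloseableHttpResponse response; // holds the response of the last request
private String html; // holds the returned HTML page
private boolean isQL = false; // whether the second-floor grabber is running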
public boolean isQL() {
return isQL;
}
public void setQL(boolean isQL) {
this.isQL = isQL;
if (isQL == true)
System.out.println("开始抢二楼了。");
else
System.out.println("关闭抢二楼了。");
}
/**
 * Log in.
 **/
public boolean login(String username, String password) {
// flag: whether the login succeeded
boolean isLogin = false;
httpClient = HttpClients.createDefault();
try {
/** 1,BAIDUID **/
String baiduId = null;
HttpGet get_main = new HttpGet(
"http://tieba.baidu.com/dc/common/tbs/");
response = httpClient.execute(get_main);
get_main.abort();
HeaderIterator it = response.headerIterator("Set-Cookie");
while (it.hasNext())
baiduId = it.next().toString();
baiduId = HttpUtils.mid(baiduId, ":", ";");
System.out.println("1,BAIDUID:" + baiduId);
/** 2,token **/
HttpGet get_token = new HttpGet(
"https://passport.baidu.com/v2/api/?getapi&tpl=mn");
response = httpClient.execute(get_token);
String token = EntityUtils.toString(response.getEntity(), "utf-8");
get_token.abort();
token = HttpUtils.mid(token, "_token='", "'");
System.out.println("2,TOKEN:" + token);
/** 3,Login **/
HashMap<String, String> map = new HashMap<String, String>();
map.put("username", username);
map.put("password", password);
map.put("token", token);
map.put("isPhone", "false");
map.put("quick_user", "0");
map.put("tt", System.currentTimeMillis() + "");
map.put("loginmerge", "true");
map.put("logintype", "dialogLogin");
map.put("splogin", "rate");
map.put("mem_pass", "on");
map.put("tpl", "mn");
map.put("apiver", "v3");
map.put("u", "http://www.baidu.com/");
map.put("safeflg", "0");
map.put("ppui_logintime", "43661");
map.put("charset", "utf-8");
// wrap the form fields into an entity
HttpEntity entity = HttpUtils.mapToEntity(map);
HttpPost http_login = new HttpPost(
"https://passport.baidu.com/v2/api/?login");
http_login.setEntity(entity);
response = httpClient.execute(http_login);
http_login.abort();
it = response.headerIterator();
while (it.hasNext()) {
// the login counts as successful if a BDUSS cookie was set
if (it.next().toString().contains("BDUSS")) {
isLogin = true;
break;
}
}
System.out.println("3,登录状态" + isLogin);
return isLogin;
} catch (Exception e) {
throw new RuntimeException("未知错误", e); // keep the original cause
}
}
/**
 * Start a new thread in a forum.
 *
 * @throws Exception
 */
public String writeTiebaItem(String tiebaName, String title, String content)
throws Exception {
String tbs = null;
HashMap<String, String> paramMap = new HashMap<String, String>();
String nowTime = System.currentTimeMillis() + "";
// fetch the fid only when the forum differs from the last one used; a forum's fid never changes
if (!postName.equals(tiebaName)) {
postFid = getFid(tiebaName);
postName = tiebaName;
}
System.out.println("fid:" + postFid);
if (postFid == null) {
System.err.println("未知错误");
return "未知错误";
}
/** fetch the tbs */
tbs = getTbs();
System.out.println("tbs:" + tbs);
paramMap.put("ie", "utf-8");
paramMap.put("kw", postName);
paramMap.put("fid", postFid);
paramMap.put("tid", "0");
paramMap.put("vcode_md5", "");
paramMap.put("floor_num", "0");
paramMap.put("rich_text", "1");
paramMap.put("tbs", tbs);
paramMap.put("content", content);
paramMap.put("title", title);
paramMap.put("prefix", "");
paramMap.put("files", URLEncoder.encode("[]", "utf-8"));
paramMap.put("sign_id", "24179251");
paramMap.put("mouse_pwd",
"45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
+ "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
+ nowTime + "0");
paramMap.put("mouse_pwd_t", nowTime);
paramMap.put("mouse_pwd_isclick", "0");
paramMap.put("__type__", "thread");
HttpEntity entity = HttpUtils.mapToEntity(paramMap);
HttpPost post = new HttpPost(
"http://tieba.baidu.com/f/commit/thread/add");
post.setEntity(entity);
response = httpClient.execute(post);
html = EntityUtils.toString(response.getEntity());
if (html.contains("\"no\":0,\"err_code\":0")) {
return "在" + tiebaName + "吧发帖成功";
} else {
return "发帖失败了,错误码信息:" + html;
}
}
private String getFid(String tiebaName) throws Exception {
HttpResponse response = null;
String fid = null;
ArrayList<String> urllist = getTieziUrl(tiebaName);
if (urllist.size() == 0) {
return null;
}
// open any thread to read the fid from its page
HttpGet get = new HttpGet(urllist.get(0));
response = httpClient.execute(get);
html = EntityUtils.toString(response.getEntity());
fid = HttpUtils.myRegex("fid(=|:')[0-9].+?(&|',)", html).get(0);
if (fid.contains("=")) {
fid = HttpUtils.mid(fid, "=", "&");
} else if (fid.contains(":")) {
fid = HttpUtils.mid(fid, ":'", "',");
}
return fid;
}
private ArrayList<String> get0Answer(String tiebaName) throws Exception {
HttpResponse response = null;
ArrayList<String> urlList = new ArrayList<String>();
ArrayList<String> topTidList;
String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
+ URLEncoder.encode(tiebaName, "UTF-8");
HttpGet get = new HttpGet(tiebaUrl);
response = httpClient.execute(get);
html = EntityUtils.toString(response.getEntity());
if (html.contains("抱歉,根据相关法律法规和政策,本吧暂不开放")) {
return urlList;
}
Document doc = Jsoup.parse(html);
// select every thread on the first page except the pinned ones
Elements els = doc.select("li[class= j_thread_list clearfix]");
for (Element e : els) {
String str = e.text().toString();
// System.out.println(str);
// a leading 0 means zero replies; str looks like: 0 测试 陌生人左右丶 00:36
if (str.startsWith("0")) {
Elements els1 = e.getElementsByTag("a");
for (Element e1 : els1) {
String url = e1.attr("href");
topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
for (int i = 0; i < topTidList.size(); i++) {
url = topTidList.get(i);
urlList.add("http://tieba.baidu.com" + url);
}
}
}
}
return urlList;
}
/**
 * @param tiebaName
 * name of the forum whose thread URLs are wanted
 * @return URLs of the threads on the forum's first page
 * @throws Exception
 */
private ArrayList<String> getTieziUrl(String tiebaName) throws Exception {
HttpResponse response = null;
ArrayList<String> urlList = new ArrayList<String>();
ArrayList<String> topTidList;
try {
String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
+ URLEncoder.encode(tiebaName, "UTF-8");
HttpGet get = new HttpGet(tiebaUrl);
response = httpClient.execute(get);
html = EntityUtils.toString(response.getEntity());
if (html.contains("抱歉,根据相关法律法规和政策,本吧暂不开放")) {
return urlList;
}
Document doc = Jsoup.parse(html);
Elements els = doc.select("li[class= j_thread_list clearfix]");
for (Element e : els) {
Elements els1 = e.getElementsByTag("a");
for (Element e1 : els1) {
// collect every thread link on the forum's first page, e.g. "/p/2777392166", then build the full URL
String url = e1.attr("href");
topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
for (int i = 0; i < topTidList.size(); i++) {
url = topTidList.get(i);
urlList.add("http://tieba.baidu.com" + url);
}
}
}
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return urlList;
}
/** Fetch the tbs (the URL below is an API that returns it) */
private String getTbs() throws Exception {
HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
response = httpClient.execute(get);
html = EntityUtils.toString(response.getEntity());
return HttpUtils.mid(html, ":\"", "\",");
}
/** Reply to a thread */
public String replyPost(String tid, String content, String tiebaName)
throws Exception {
/** no way found yet to obtain floor_num **/
String floor_num = "1";
if (!postName.equals(tiebaName)) {
postFid = getFid(tiebaName);
postName = tiebaName;
}
System.out.println("fid:" + postFid);
if (postFid == null) {
System.err.println("未知错误");
return "未知错误";
}
String tbs = getTbs();
String nowTime = System.currentTimeMillis() + "";
// build the reply form as a map
HashMap<String, String> paramMap = new HashMap<String, String>();
paramMap.put("ie", "utf-8");
paramMap.put("kw", postName);
paramMap.put("fid", postFid);
paramMap.put("tid", tid);
paramMap.put("vcode_md5", "");
paramMap.put("floor_num", floor_num);
paramMap.put("rich_text", "1");
paramMap.put("tbs", tbs);
paramMap.put("content", content);
paramMap.put("files", "[]");
paramMap.put("mouse_pwd",
"45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
+ "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
+ nowTime + "0");
paramMap.put("mouse_pwd_t", nowTime);
paramMap.put("mouse_pwd_isclick", "0");
paramMap.put("__type__", "reply");
HttpEntity entity = HttpUtils.mapToEntity(paramMap);
HttpPost post = new HttpPost(
"http://tieba.baidu.com/f/commit/post/add");
// set 5-second socket and connect timeouts; note that this does not slow down posting
RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
.setConnectTimeout(5000).build();
post.setConfig(config);
post.setEntity(entity);
response = httpClient.execute(post);
html = EntityUtils.toString(response.getEntity());
System.out.println(html);
if (html.contains("\"no\":0,\"err_code\":0")) {
return "在" + tiebaName + "吧成功抢到一个二楼";
} else {
return "回帖失败了,错误码信息:" + html;
}
}
/** Grab the second floor (be the first to reply) **/
public void TakeTheSecondFloor(final String tiebaName,
final String contents[], final int time) {
final int len = contents.length;
new Thread(new Runnable() {
@Override
public void run() {
while (isQL) {
try {
Random random = new Random();
int index = random.nextInt(len);
String tid;
ArrayList<String> linksList = get0Answer(tiebaName);
for (int i = 0; i < linksList.size()
&& linksList.size() != 0; i++) {
tid = linksList.get(i).substring(25);
String message = replyPost(tid, contents[index],
tiebaName);
System.out.println(message);
}
Thread.sleep(time);
} catch (Exception e) {
e.printStackTrace();
}
}
}
}).start();
}
}
Original source: http://www.cnblogs.com/weidiao/p/5440363.html