IDF Lab: 初探乾坤 (First Look) -- Simple Programming: Character Count

Posted: 2015-06-02 09:20:44

Tags: IDF Lab, crawler, python

Address:

ctf.idf.cn/index.php?g=game&m=article&a=index&id=37

Challenge:

Here → http://ctf.idf.cn/game/pro/37

Writeup:

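(The second listing is adapted from someone else's Sina blog post: blog.sina.com.cn/s/blog_e53f38130102vjlz.html)

The task is clear: write code that counts how many times each of the five letters in "woldy" appears, then submit the result. The answer has to be submitted within 2 seconds, so a crawler is needed. With Python!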
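The counting step can be tried offline before any network code is involved. The snippet below is a minimal sketch, not the challenge's official solution: `count_woldy` is a hypothetical helper name and the sample string is made up for illustration. It counts each of the five letters and concatenates the counts in the order w, o, l, d, y, which is the answer format both attempts below assume.

```python
def count_woldy(text, letters="woldy"):
    """Return the count of each letter in `letters`, concatenated as one string."""
    # str.count is case-sensitive, so only lowercase occurrences are counted.
    return "".join(str(text.count(ch)) for ch in letters)


# Made-up sample input, not real challenge data.
sample = "wwo1ld-yy+dl"
print(count_woldy(sample))  # prints 21222
```

Keeping the counting in a pure function makes it easy to check by hand, separately from the 2-second submission window.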

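First attempt: my own code, written for Python 3.4 (source below).

I even forged the cookies and the headers.

But it always replied, "Was your math taught by a primary-school PE teacher?"

I was speechless!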

import urllib.request
import urllib.parse

url = "http://ctf.idf.cn/game/pro/37/index.php"

# Fetch the challenge page.
response = urllib.request.urlopen(url)
html = response.read().decode('utf-8')

# The string to analyse sits between the first two <hr /> tags.
A = html.find('<hr />') + 6
B = html.find('<hr />', A)
f = html[A:B]

# Count each letter and concatenate the counts in the order w, o, l, d, y.
w = f.count('w')
o = f.count('o')
l = f.count('l')
d = f.count('d')
y = f.count('y')
ans1 = '%d%d%d%d%d' % (w, o, l, d, y)

length = str(len(ans1))

# Headers and cookie copied from a browser session.
head = {}
head['Host'] = "ctf.idf.cn"
head['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
head['Accept-Language'] = "zh-CN,en-US;q=0.7,en;q=0.3"
head['Accept-Encoding'] = "gzip, deflate"
head['Referer'] = "http://ctf.idf.cn/game/pro/37/"
head['Cookie'] = "Hm_lvt_184d7dcce9f76d1f5ab23d66e447d9a8=1432209840,1432307014,1433080747,1433157423; PHPSESSID=23ro01bddb6a7604ovumie8nr7; Hm_lpvt_184d7dcce9f76d1f5ab23d66e447d9a8=1433157799"
head['Connection'] = "keep-alive"
head['Cache-Control'] = "max-age=0"
head['Content-Type'] = "application/x-www-form-urlencoded"
head['Content-Length'] = length

# URL-encode the form field expected by the page.
xdata = {'anwser': ans1}
xdata = urllib.parse.urlencode(xdata).encode('utf-8')

# Submit the answer as a form POST.
req = urllib.request.Request(url, data=xdata)
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')

# The server's reply sits between <body> and the first <hr />.
A = html.find('<body>') + 6
B = html.find('<hr />', A)
f = html[A:B]

print(f)
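A side note on how `urllib.request` treats headers, which may be relevant to the listing above: a header dict only reaches the server if it is passed to `Request` through the `headers=` argument (or added with `add_header`). The offline check below is a small sketch with a shortened, made-up header dict and a placeholder answer; it builds two requests without sending anything and shows which one actually carries the headers.

```python
import urllib.parse
import urllib.request

url = "http://ctf.idf.cn/game/pro/37/index.php"
# Shortened header dict for illustration only.
head = {"User-Agent": "Mozilla/5.0", "Referer": "http://ctf.idf.cn/game/pro/37/"}
# Placeholder answer, not a real solution.
body = urllib.parse.urlencode({"anwser": "12345"}).encode("utf-8")

# Without headers=, the Request object carries none of the entries in `head`.
bare = urllib.request.Request(url, data=body)
# With headers=, they are attached; urllib stores names capitalized, e.g. "User-agent".
full = urllib.request.Request(url, data=body, headers=head)

print(bare.has_header("User-agent"), full.has_header("User-agent"))  # False True
print(full.get_method())  # POST, because data is not None
```

Nothing is sent over the network here; the `Request` objects are only inspected.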

Second attempt: written for Python 2.7 (source below).

First, install BeautifulSoup.

The code follows a Sina blog post: blog.sina.com.cn/s/blog_e53f38130102vjlz.html

#!/usr/bin/env python
# coding: utf-8
import requests
from bs4 import BeautifulSoup  # imported in the referenced code; not used below

url = "http://ctf.idf.cn/game/pro/37/"  # page address
s = requests.session()
content = s.get(url).text  # fetch the page content
test = content.split('<hr />')  # split the page on <hr /> into 3 parts
print test[1]
w = 0
o = 0
l = 0
d = 0
y = 0
for ch in test[1]:  # walk through the second part of the split string
    if ch == "w":
        w = w + 1
    elif ch == "o":
        o = o + 1
    elif ch == "l":
        l = l + 1
    elif ch == "d":
        d = d + 1
    elif ch == "y":
        y = y + 1
tem = '%d%d%d%d%d' % (w, o, l, d, y)  # join the counts into one string
print tem
values = {'anwser': tem}  # fill in the form
result = s.post(url, data=values)  # submit the form through the same session
print(result.text)

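The second attempt produces the result.

Here I would like to ask: what went wrong in the code I wrote myself?

I am a Python beginner and would be very grateful if an expert could explain.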
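One plausible explanation, offered as a guess since the server side is not visible: the second script does its GET and its POST through a single `requests` session, so any session cookie set by the GET is sent back with the POST. The first script issues two unrelated `urlopen` calls, and its `head` dict (including the `Cookie` entry) is never passed to `Request`. If the server ties the generated string to a session, the answer would then be checked against a different string. The sketch below assumes that explanation and reuses the URL, the `<hr />` delimiters and the `anwser` field from the listings above; `make_opener`, `extract_between`, `build_answer` and `solve` are hypothetical helper names. It has not been run against the live challenge.

```python
import http.cookiejar
import urllib.parse
import urllib.request

URL = "http://ctf.idf.cn/game/pro/37/index.php"


def make_opener():
    """Build an opener whose requests all share one cookie jar."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def extract_between(html, marker="<hr />"):
    """Return the text between the first two occurrences of `marker`."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("opening marker not found")
    start += len(marker)
    end = html.find(marker, start)
    if end == -1:
        raise ValueError("closing marker not found")
    return html[start:end]


def build_answer(text, letters="woldy"):
    """Concatenate the counts of w, o, l, d, y into one string."""
    return "".join(str(text.count(ch)) for ch in letters)


def solve(opener, url=URL):
    """GET the page and POST the answer through the same opener (same cookies)."""
    page = opener.open(url, timeout=5).read().decode("utf-8")
    answer = build_answer(extract_between(page))
    data = urllib.parse.urlencode({"anwser": answer}).encode("utf-8")
    # Passing data makes this a POST; the jar re-sends any cookie set by the GET.
    return opener.open(url, data=data, timeout=5).read().decode("utf-8")
```

Calling `print(solve(make_opener()))` would perform the round trip. The design choice is simply to let one opener own the cookies instead of copying a `PHPSESSID` value by hand from the browser.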

Original post: http://blog.csdn.net/shinukami/article/details/46319363
