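The script sends an HTTP request to each URL and checks whether the response body contains a set of signature strings. With several million URLs, the job still had not finished after running for a full day. I tried Ruby multithreading and found that it did not speed things up, so I switched to Java. The Ruby code is posted below: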
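# coding: gbk
require 'rubygems'
require 'net/http'
require 'uri'
require 'zlib'
require 'stringio'  # StringIO wraps the gzip body below

threads = []
urls = IO.readlines "test.txt"
pFile = File.open("errortest.txt", "a")
urls.each do |page|
  # url = "http://www.baidu.com"
  begin
    # strip removes the trailing newline left by IO.readlines
    threads << Thread.new(page.strip) do |url|
      uri = URI.parse(url)
      res = Net::HTTP.get_response(uri)
      data = res.body
      mes = res.code
      # Decompress manually if the body is still gzip-encoded
      if res["content-encoding"] == "gzip"
        body_io = StringIO.new(data)
        data = Zlib::GzipReader.new(body_io).read
      end
      # puts data
      # Treat the body as raw bytes so include? works whatever the page encoding is
      data = data.force_encoding("ASCII-8BIT")
      if mes == "200" || mes == "404" || mes == "500" || mes == "502"
        # Log the URL unless all three signature strings are present
        unless data.include?("_dctc._account") && data.include?("UA-") && data.include?("count/load.min.js")
          # puts url
          pFile.puts mes + "," + url
        end
      end
    end
    # Note: joining inside the loop waits for the thread just created,
    # so the requests still run one at a time
    threads.each { |t| t.join }
  rescue
    # url only exists inside the thread block; print page here instead
    puts page
  end
end
pFile.close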
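
One likely reason the threads did not help: the script calls join inside the urls.each loop, so each thread is waited on before the next one starts. Ruby (MRI) releases its global lock during blocking network I/O, so threads can overlap HTTP requests when they are actually allowed to run concurrently. Below is a minimal sketch of a fixed-size worker pool, assuming Ruby 2.3 or later (for Queue#close). The names check_urls, missing_signature?, worker_count and fetcher are invented for illustration and are not part of the original script; it has not been benchmarked against the Java version.

```ruby
require 'net/http'
require 'uri'

SIGNATURES = ["_dctc._account", "UA-", "count/load.min.js"].freeze
CHECKED_CODES = %w[200 404 500 502].freeze

# True when the status code is one we check and at least one
# signature string is missing from the body.
def missing_signature?(code, body)
  return false unless CHECKED_CODES.include?(code)
  data = body.to_s.dup.force_encoding("ASCII-8BIT")
  !SIGNATURES.all? { |sig| data.include?(sig) }
end

# Hypothetical helper: checks every URL using a fixed number of threads.
# fetcher must return [status_code, body]; the default performs a real
# HTTP GET, and a fake can be passed in for offline testing.
def check_urls(urls, worker_count: 20, fetcher: nil)
  fetcher ||= lambda do |url|
    res = Net::HTTP.get_response(URI.parse(url))
    [res.code, res.body]
  end

  jobs = Queue.new
  urls.each do |line|
    url = line.strip
    jobs << url unless url.empty?
  end
  jobs.close              # after close, pop returns nil once the queue is empty

  results = Queue.new     # Queue is thread-safe, so no explicit lock is needed
  workers = Array.new(worker_count) do
    Thread.new do
      while (url = jobs.pop)
        begin
          code, body = fetcher.call(url)
          results << "#{code},#{url}" if missing_signature?(code, body)
        rescue StandardError
          results << "error,#{url}"   # record the failure and keep going
        end
      end
    end
  end
  workers.each(&:join)    # wait once, after all workers have been started

  Array.new(results.size) { results.pop }
end
```

To reproduce the original behaviour, the returned lines could be appended to errortest.txt, for example with File.open("errortest.txt", "a") { |f| f.puts check_urls(IO.readlines("test.txt")) }. The order of the lines depends on thread scheduling, and for millions of URLs the input would be better processed in batches so the results do not all sit in memory.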
Original article: http://www.cnblogs.com/hudiefeifei/p/4316444.html