标签:
通过 WebClient 的内置浏览器,可以执行页面抓取工作,有时可能需要设置代理,
WebClient webClient = new WebClient(BrowserVersion.x);
webClient.setProxyConfig(ProxyConfig pc);
在单线程情况下,使用这样创建的webClient不会有问题:客户端到代理服务器的连接能够很有次序的建立、关闭。
考虑这样的情况:多个线程并发地访问 WebClient,可能就会报下面的异常:
[Thread-7] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Closing connection [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]][null]
........
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Total connections kept alive: 0
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Total issued connections: 0
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Total allocated connection: 0 out of 20
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - No free connections [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]][null]
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Available capacity: 2 out of 2 [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]][null]
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Creating new connection [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]]
2012-04-19 10:31:32,926 [Thread-6] DEBUG org.apache.http.impl.client.DefaultHttpClient - socket closed
java.net.SocketException: socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:130)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:127)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:233)
at org.apache.http.impl.conn.LoggingSessionInputBuffer.readLine(LoggingSessionInputBuffer.java:100)
at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:98)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:210)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:271)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:227)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:209)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:292)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:126)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:483)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:641)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:597)
at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:134)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1406)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1460)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1325)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:304)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:370)
异常信息显示:thread-6使用webclient时,检测到 socket closed异常,查看上面的异常,存在 socket (http://192.168.5.29:3128->http://58.223.139.151:8080)被 thread-7 关闭的情况,thread-8 创建了新 socket,可能之后某个时间点,socket又被关闭,导致 thread-6 报socket closed异常。
通过使用ThreadLocal为不同的线程创建各自独立的 WebClient 对象,就能避免上述问题:
WebClient在多线程、使用代理情况下 socket closed 问题的一个解决办法[htmlunit]
标签:
原文地址:http://www.cnblogs.com/rr163/p/4205935.html