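Today I finished the book search feature. It turned out to be fairly involved, because the HTML of the library's search results page is not well structured and parsing it takes a lot of patience.
The first step is to fetch the result HTML for a given set of search conditions. Many conditions are possible; to keep things practical and simple, I deliberately restricted them to: keyword, East Campus, and available for loan.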
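To make the three restrictions concrete, here is a minimal sketch of how such a query string can be assembled with nothing but the JDK. It is not the method used in this post, which relies on Apache HttpClient's URLEncodedUtils. The class name SearchQuerySketch and the helper buildQuery are made up for illustration, and only a subset of the form fields is included. A LinkedHashMap is used so that the fields keep their insertion order.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only
final class SearchQuerySketch {

    /** Builds the query string; LinkedHashMap preserves the order in which fields are added. */
    static String buildQuery(String keyword) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("searchtype", "X");
        params.put("searcharg", keyword); // search keyword
        params.put("searchscope", "1");   // 1 = East Campus
        params.put("SORT", "DZ");         // newest first
        params.put("availlim", "1");      // only items available for loan

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    /** URL-encodes a single name or value as UTF-8. */
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every JVM is required to support UTF-8
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Java"));
    }
}
```

For the keyword Java this prints searchtype=X&searcharg=Java&searchscope=1&SORT=DZ&availlim=1. A space in the keyword is encoded as a plus sign.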
The method that fetches the result HTML is as follows:
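// Imports needed by this method (Apache HttpClient 4.x)
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URLEncodedUtils;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;
import org.apache.http.util.EntityUtils;

/**
 * Searches for books by keyword.
 *
 * The search works both with and without login. Currently a new HttpClient is
 * created here, so no login is required. To allow searching only after login,
 * use the global HttpClient instead of creating a new one.
 *
 * @param keyword
 *            the search keyword
 * @return the HTML of the search results, or an empty string on failure
 */
public static String serchBook(String keyword) {
    HttpGet httpGet = null;
    String searchResultHtml = null;
    HttpClient httpclient = new DefaultHttpClient();
    HttpResponse response;
    /**
     * The order of the fields matters.
     *
     * Search conditions: keyword, East Campus, available for loan
     */
    List<NameValuePair> params = new ArrayList<NameValuePair>();
    params.add(new BasicNameValuePair("searchtype", "X"));
    params.add(new BasicNameValuePair("searcharg", keyword));// search keyword
    params.add(new BasicNameValuePair("searchscope", "1"));// 1 = East Campus
    params.add(new BasicNameValuePair("sortdropdown", "-"));
    params.add(new BasicNameValuePair("SORT", "DZ"));// sort by date, newest first
    params.add(new BasicNameValuePair("extended", "0"));
    params.add(new BasicNameValuePair("SUBMIT", "检索"));// the search (submit) button
    params.add(new BasicNameValuePair("availlim", "1"));// restrict to items available for loan
    params.add(new BasicNameValuePair("searchlimits", ""));
    params.add(new BasicNameValuePair("searchorigarg", ""));// keyword and sort order of the previous search
    // Encode the parameters
    String param = URLEncodedUtils.format(params, "UTF-8");
    System.out.println(param);
    try {
        // Append the parameters to the URL
        // http://innopac.lib.xjtu.edu.cn/search~S1*chx/
        String test_url = "http://innopac.lib.xjtu.edu.cn/search~S1*chx/";
        httpGet = new HttpGet(test_url + "?" + param);
        httpGet.setHeader("Host", "innopac.lib.xjtu.edu.cn");
        httpGet.setHeader("Referer", test_url + "?" + param);
        response = httpclient.execute(httpGet);
        int code = response.getStatusLine().getStatusCode();
        System.out
                .println("---------------searchbook------------------------");
        System.out.println(response.getStatusLine());
        if (code == 200) {
            searchResultHtml = EntityUtils.toString(
                    response.getEntity(), HTTP.UTF_8);
            return searchResultHtml;
        }
    } catch (ClientProtocolException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // httpGet is still null if the request object could not be created
        if (httpGet != null) {
            httpGet.abort();
        }
    }
    return "";
}
First, let's look at how the results appear on the page: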
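Based on the information to be parsed, we need two wrapper classes.
One holds the bibliographic information of an entry, the other its holdings information. The two classes are as follows:
1. Class BookInfo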
package com.ali.login.bean;

import java.util.List;

/**
 * Detailed information about one entry in the search results
 *
 * @author shuyan
 *
 */
public class BookInfo {
    private String imgLink;// cover image link
    private String briefTitle;// brief title, e.g. "Java JDK 7实例宝典 Java JDK 7 shi li bao dian / 韩雪,
    // 郭天娇编著"
    private String year;// year and material type, e.g. "2014 文字印刷资料" (printed text)
    private List<BookAddress> bookAddresses;// holdings information of the entry
    private String reserveLink;// reserve link

    public BookInfo() {
        super();
    }

    public BookInfo(String imgLink, String briefTitle, String year,
            List<BookAddress> bookAddresses, String reserveLink) {
        super();
        this.imgLink = imgLink;
        this.briefTitle = briefTitle;
        this.year = year;
        this.bookAddresses = bookAddresses;
        this.reserveLink = reserveLink;
    }

    public String getImgLink() {
        return imgLink;
    }

    public void setImgLink(String imgLink) {
        this.imgLink = imgLink;
    }

    public String getBriefTitle() {
        return briefTitle;
    }

    public void setBriefTitle(String briefTitle) {
        this.briefTitle = briefTitle;
    }

    public String getYear() {
        return year;
    }

    public void setYear(String year) {
        this.year = year;
    }

    public List<BookAddress> getBookAddresses() {
        return bookAddresses;
    }

    public void setBookAddresses(List<BookAddress> bookAddresses) {
        this.bookAddresses = bookAddresses;
    }

    public String getReserveLink() {
        return reserveLink;
    }

    public void setReserveLink(String reserveLink) {
        this.reserveLink = reserveLink;
    }

    @Override
    public String toString() {
        return "BookInfo [imgLink=" + imgLink + ", briefTitle=" + briefTitle
                + ", year=" + year + ", bookAddresses=" + bookAddresses
                + ", reserveLink=" + reserveLink + "]";
    }
}
2. Class BookAddress
package com.ali.login.bean;

/**
 * Holdings information of an entry
 *
 * @author shuyan
 *
 */
public class BookAddress {
    private String holdLand;// holding location
    private String callNumber;// call number
    private String status;// status

    public BookAddress() {
        super();
    }

    public BookAddress(String holdLand, String callNumber, String status) {
        this.holdLand = holdLand;
        this.callNumber = callNumber;
        this.status = status;
    }

    public String getHoldLand() {
        return holdLand;
    }

    public void setHoldLand(String holdLand) {
        this.holdLand = holdLand;
    }

    public String getCallNumber() {
        return callNumber;
    }

    public void setCallNumber(String callNumber) {
        this.callNumber = callNumber;
    }

    public String getStatus() {
        return status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

    @Override
    public String toString() {
        return "BookAddress [holdLand=" + holdLand + ", callNumber="
                + callNumber + ", status=" + status + "]";
    }
}
This is where things get a bit tedious, because the tags on the results page are quite irregular.
The parsing code is as follows:
// Imports needed by this method (jsoup)
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Parses the HTML of the search results.
 *
 * @param searchResultHtml
 *            the HTML string
 *
 * @return the list of parsed entries
 */
public static List<BookInfo> getSearchResult(String searchResultHtml) {
    List<BookInfo> bookInfos = new ArrayList<BookInfo>();
    Document document = Jsoup.parse(searchResultHtml);
    Elements items = document.getElementsByClass("briefCitRow");// all result entries
    for (Element item : items) {
        BookInfo bookInfo = null;
        List<BookAddress> bookAddresses = new ArrayList<>();
        Element ele_par = item.select("a[href]").get(0);
        // e.g. http://202.117.24.227/bibimage/zycover.php?isbn=9787121217074
        String imgLink = ele_par.child(0).attr("src");// cover image link
        Element ele_reserve = item.getElementsByClass("briefcitRequest")
                .get(0);// element that holds the reserve link
        Element ele_ahref = ele_reserve.select("a[href]").get(0);
        String reserveLink = ele_ahref.attr("href");// reserve link
        // The link is relative, so the host has to be prepended, e.g.
        // /availlim/search~S1*chx?/XJava&searchscope=1&SORT=DZ/XJava&searchscope=1&SORT=DZ&extended=0&SUBKEY=Java/1%2C2973%2C2973%2CC/requestbrowse~b3838346&FF=XJava&searchscope=1&SORT=DZ&1%2C1%2C
        // Note: there are two elements with class = briefcitDetail
        Elements ele_briefcitDetails = item
                .getElementsByClass("briefcitDetail");
        // Handle the first one
        String briefTitle = ele_briefcitDetails.get(0)
                .getElementsByClass("briefcitTitle").get(0).text();// brief description of the entry
        // Handle the second one
        String year = ele_briefcitDetails.get(1).text();// year
        Elements ele_addresses = item.getElementsByClass("briefcitItems")
                .get(0).getElementsByClass("bibItems").get(0)
                .getElementsByClass("bibItemsEntry");// holdings information of the entry
        /**
         * Reservations are handled in here as well
         */
        for (Element ele_add : ele_addresses) {
            BookAddress address = null;
            // Each row has 3 td cells
            Elements ele_tds = ele_add.getElementsByTag("td");
            String bookStore = ele_tds.get(0).text();
            String callNumber = ele_tds.get(1).text();
            String status = ele_tds.get(2).text();
            address = new BookAddress(bookStore, callNumber, status);
            bookAddresses.add(address);
        }
        bookInfo = new BookInfo(imgLink, briefTitle, year, bookAddresses,
                reserveLink);
        bookInfos.add(bookInfo);
    }
    return bookInfos;
}
This yields the list of populated BookInfo objects. A quick test:
public static void main(String[] args) {
    String searchResultHtml = LibraryUtil.serchBook("Java");
    List<BookInfo> bookInfos = getSearchResult(searchResultHtml);
    // Print only the first five results
    int i = 0;
    for (BookInfo bookInfo : bookInfos) {
        if (i < 5) {
            System.out.println(bookInfo.toString());
        }
        i++;
    }
}
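One caveat before wrapping up: getSearchResult indexes into the parsed elements with get(0), get(1) and get(2), so a row that is missing a cell or a class throws IndexOutOfBoundsException and aborts the whole parse. Given how irregular the markup is, a small guard may be worth having. The following is only a sketch under that assumption; the class SafeAccessSketch and the helper getOrDefault are hypothetical names, and plain strings stand in for the td cells. Since jsoup's Elements is itself a List<Element>, the same helper could in principle be applied to it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only
final class SafeAccessSketch {

    /** Returns the element at index, or fallback when the list is null or too short. */
    static <T> T getOrDefault(List<T> list, int index, T fallback) {
        if (list == null || index < 0 || index >= list.size()) {
            return fallback;
        }
        return list.get(index);
    }

    public static void main(String[] args) {
        // Made-up stand-in for the text of the td cells of one holdings row;
        // the third cell (status) is missing on purpose
        List<String> cells = Arrays.asList("location", "call-number");
        String status = getOrDefault(cells, 2, "");
        System.out.println("status=[" + status + "]");
    }
}
```

With a guard like this, an incomplete row produces an empty field instead of an exception, and the caller can decide whether to skip the entry.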
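That completes the book search feature.
Of course, there is still plenty left to consider, such as the total number of results, how many records to show per page, filtering out meaningless results, and the reservation feature.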
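Two of these follow-ups are simple enough to sketch with the JDK alone. The first turns the relative reserve link into an absolute one by prepending the host, which the parsing code notes is necessary. The second cuts the result list into pages. Both are assumptions about how this could be done rather than code from the project: the class FollowUpSketch, the helpers toAbsolute and page, and the HOST constant (taken from the search URL used in this post) are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, for illustration only
final class FollowUpSketch {

    // Assumed host, taken from the search URL used in this post
    static final String HOST = "http://innopac.lib.xjtu.edu.cn";

    /** Prepends the host when href is site-relative (starts with a single "/"). */
    static String toAbsolute(String href) {
        if (href == null || href.isEmpty()) {
            return "";
        }
        if (href.startsWith("/") && !href.startsWith("//")) {
            return HOST + href;
        }
        return href; // already absolute, or some other form: leave untouched
    }

    /** Returns one page of results (pageNumber is 1-based); empty when out of range. */
    static <T> List<T> page(List<T> all, int pageNumber, int pageSize) {
        if (all == null || pageNumber < 1 || pageSize < 1) {
            return Collections.emptyList();
        }
        long from = (long) (pageNumber - 1) * pageSize; // long avoids int overflow
        if (from >= all.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min((long) all.size(), from + pageSize);
        // Copy the view so the page does not depend on the original list
        return new ArrayList<>(all.subList((int) from, to));
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("/search~S1*chx/"));
        System.out.println(page(Arrays.asList(1, 2, 3, 4, 5), 2, 2));
    }
}
```

Showing the total number of results would additionally require locating the count in the result HTML, which depends on markup not covered here.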
Original article: http://blog.csdn.net/leokelly001/article/details/42042355