Implementing a Web Page Screenshot Service with PhantomJS

Posted: 2016-01-12 01:13:41

2015-12-12 | Source: Java教程 (Java Tutorials)

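This was a small requirement I ran into in the first half of the year: fetch a web page and save it as an image. I looked at quite a few tools and none gave satisfactory results. Either the rendering was too poor (Canvas, Html2Image, Cobra) or the performance was not good enough (for example, SWT's Browser). I later found that a headless browser fits the bill, and took a quick look at PhantomJS and CutyCapt. Both are built on WebKit, and PhantomJS is the more convenient of the two, especially on Windows. On Linux, from version 2.0 onward you have to compile it on the machine yourself (about 3 hours in my case; I have to say g++ is painfully slow, the same project builds far faster under VC, but never mind, it is a free, open-source compiler after all). Below is how to take web page screenshots with PhantomJS combined with Java code: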

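I. Environment Setup

1. Path of the PhantomJS executable: D:/xxx/phantomjs-2.0.0-windows/bin/phantomjs

2. Screenshot script: D:/xxx/phantomjs-2.0.0-windows/bin/rasterize.js

The screenshot script is provided on the official site, but I need to explain how it handles width and height: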

page.viewportSize = { width: 600, height: 600 };

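This is the default size, i.e. 600x600. I suggest setting the height to something smaller; I use width: 800, height: 200. The reason is that when width and height are not both specified, the image automatically grows until the whole page is shown whenever the real page is taller than the configured value (and if you want a small image, a default that is too large can leave a big blank area in it). If both width and height are specified, the following code runs and only part of the page is captured: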

page.clipRect = { top: 0, left: 0, width: pageWidth, height: pageHeight };

3. Test it from the command line first:

D:/xxx/phantomjs-2.0.0-windows/bin/phantomjs D:/xxx/phantomjs-2.0.0-windows/bin/rasterize.js http://www.QQ.com D:/test.png

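If everything is set up correctly, you should see the generated image. You can also pass a size argument by appending " 1000px" or " 1000px*400px" to the command above; both work.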
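The size rules described above can be sketched in Java. This is only an illustration of the behaviour as described in this post, not the code of rasterize.js itself: the class name `SizeSpec` and its fields are hypothetical, the 800x200 default is the value I use, and the 9/16 minimum height is taken from the constructor comment in the server code later in this post.

```java
/** Hypothetical value type: the viewport size and whether the output is clipped. */
final class SizeSpec {
    final int width;
    final int height;
    final boolean clip; // true only when both width and height were given

    private SizeSpec(int width, int height, boolean clip) {
        this.width = width;
        this.height = height;
        this.clip = clip;
    }

    /** Parses "800px" or "800px*600px"; null or blank falls back to 800x200, unclipped. */
    static SizeSpec parse(String arg) {
        if (arg == null || arg.trim().isEmpty()) {
            return new SizeSpec(800, 200, false); // assumed defaults, see lead-in
        }
        String[] parts = arg.trim().split("\\*");
        if (parts.length > 2) {
            throw new IllegalArgumentException("Expected WIDTHpx or WIDTHpx*HEIGHTpx: " + arg);
        }
        int width = toPixels(parts[0]);
        if (parts.length == 2) {
            // Both dimensions given: the page is clipped to exactly this rectangle.
            return new SizeSpec(width, toPixels(parts[1]), true);
        }
        // Width only: minimum height is width * 9 / 16, and the image may grow taller.
        return new SizeSpec(width, width * 9 / 16, false);
    }

    /** Strips an optional "px" suffix and parses the number. */
    private static int toPixels(String s) {
        String t = s.trim();
        if (t.endsWith("px")) {
            t = t.substring(0, t.length() - 2);
        }
        return Integer.parseInt(t.trim());
    }
}
```

For example, `SizeSpec.parse("1000px*400px")` yields a clipped 1000x400 capture, while `SizeSpec.parse("1600px")` yields a minimum height of 900 with no clipping.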

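II. Server Code

As a web page screenshot service, this code is meant to be deployed on a server. There is no need to copy it wholesale, of course; adapt it to your own needs: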

package lekkoli.test;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import org.apache.log4j.Logger;

/**
 * Converts a web page to an image by invoking an external command.
 * @author lekkoli
 */
public class PhantomTools {

    private static final Logger _logger = Logger.getLogger(PhantomTools.class);

    // Linux equivalents:
    // private static final String _tempPath = "/data/temp/phantom_";
    // private static final String _shellCommand = "/usr/local/xxx/phantomjs /usr/local/xxx/rasterize.js ";
    private static final String _tempPath = "D:/data/temp/phantom_";
    private static final String _shellCommand = "D:/xxx/phantomjs-2.0.0-windows/bin/phantomjs D:/xxx/phantomjs-2.0.0-windows/bin/rasterize.js ";

    private String _file;
    private String _size;

    /**
     * Creates a screenshot helper.
     * @param hash used to make the temporary file name unique
     */
    public PhantomTools(int hash) {
        _file = _tempPath + hash + ".png";
    }

    /**
     * Creates a screenshot helper.
     * @param hash used to make the temporary file name unique
     * @param size image size, e.g. 800px*600px (the height is clipped) or 800px
     *             (the height is at least width*9/16 and is not clipped)
     */
    public PhantomTools(int hash, String size) {
        this(hash);
        if (size != null)
            _size = " " + size;
    }

    /**
     * Renders the target page and returns the image bytes.
     * @param url address of the target page
     * @return image bytes; empty if the command failed, null if no file was produced
     */
    public byte[] getByteImg(String url) throws IOException {
        File file = new File(_file);
        try {
            if (!exeCmd(_shellCommand + url + " " + _file + (_size != null ? _size : ""))) {
                return new byte[] {};
            }
            return file.exists() ? Files.readAllBytes(file.toPath()) : null;
        } finally {
            // Always remove the temporary file, whether or not reading succeeded.
            if (file.exists()) {
                file.delete();
            }
        }
    }

    /**
     * Runs the external command.
     * @return true only if the process exited with code 0
     */
    private static boolean exeCmd(String commandStr) {
        try {
            Process p = Runtime.getRuntime().exec(commandStr);
            return p.waitFor() == 0;
        } catch (IOException e) {
            _logger.error("Failed to start command: " + commandStr, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            _logger.error("Interrupted while waiting for command: " + commandStr, e);
        }
        return false;
    }
}
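One thing to keep in mind about exeCmd: Runtime.exec(String) splits the command string on whitespace, so a URL or path containing a space would be broken into several arguments. A possible alternative, sketched below under my own assumptions, is to pass the arguments as a list to ProcessBuilder and add a timeout so a stuck render cannot block the server thread forever. The class name `PhantomCommand`, its method names, and the idea of a timeout are illustrative and not part of the class above; `waitFor(long, TimeUnit)` requires Java 8 or later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: builds and runs the PhantomJS command as an argument list. */
final class PhantomCommand {

    /** Builds the argument list: executable, script, URL, output file, optional size. */
    static List<String> build(String phantomjs, String script, String url,
                              String outFile, String size) {
        List<String> cmd = new ArrayList<>();
        cmd.add(phantomjs);
        cmd.add(script);
        cmd.add(url);
        cmd.add(outFile);
        if (size != null && !size.trim().isEmpty()) {
            cmd.add(size.trim()); // e.g. "1000px" or "1000px*400px"
        }
        return cmd;
    }

    /** Runs the command; returns true only if it exits with code 0 within the timeout. */
    static boolean run(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        // inheritIO() sends the child's output to this process's console,
        // so the child cannot stall on a full, unread output pipe.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly(); // give up on a render that takes too long
            return false;
        }
        return p.exitValue() == 0;
    }
}
```

A call might look like `PhantomCommand.run(PhantomCommand.build(exe, script, url, file, "800px"), 30)`, where the 30-second limit is an arbitrary value chosen for the example.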

With the PhantomTools class above, you can simply call the getByteImg method to generate the image and get its contents.
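Because the URL ends up on a command line, a service exposed to other callers may want to check it before calling getByteImg. The sketch below is one way to do that and to derive the hash passed to the constructor. The class `ScreenshotRequest` and both of its methods are hypothetical additions of mine, and restricting requests to http and https is an assumption about what the service should accept.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical pre-check performed before handing a URL to PhantomTools. */
final class ScreenshotRequest {

    /** Returns true for absolute http/https URLs that have a host. */
    static boolean isAcceptable(String url) {
        if (url == null || url.isEmpty()) {
            return false;
        }
        try {
            // URI parsing rejects raw spaces, which would otherwise split the command line.
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Derives a value for the constructor's hash parameter; mixes in the clock
     *  so two concurrent requests for the same URL are unlikely to share a file. */
    static int tempFileHash(String url) {
        return (url + System.nanoTime()).hashCode();
    }
}
```

With PhantomTools on the classpath, a call might then look like `new PhantomTools(ScreenshotRequest.tempFileHash(url), "800px").getByteImg(url)`; a null or empty result means no image was produced.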

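Attached is my screenshot configuration script, rasterize.js. As for PhantomJS itself, please download it from the official site.

If you repost this article, please credit the original: http://www.cnblogs.com/lekko/p/4796062.html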

Reposted from: http://www.cnblogs.com/firstdream/p/5123007.html
