
Capturing User Logs with Servlets

Date: 2015-02-16 10:14:09

Tags: clickstream, user behavior logs

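Some time ago, our lab needed a user-logging module to monitor the lab's web projects and capture user behavior logs. My first thought was that this should be done mainly in JavaScript, but my JavaScript skills were not up to it, so I ended up implementing it with servlets.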

I first found a related project on GitHub, clickstream, so let me start by describing how that project is implemented.

How Clickstream is implemented

It first uses a listener to observe the ServletContext and the HttpSession. The code is as follows:

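import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.http.HttpSession;
import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ClickstreamListener implements ServletContextListener, HttpSessionListener {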

	private static final Log log = LogFactory.getLog(ClickstreamListener.class);

	/** The servlet context attribute key. */
	public static final String CLICKSTREAMS_ATTRIBUTE_KEY = "clickstreams";

	/**
	 * The click stream (individual) attribute key: this is
	 * the one inserted into the HttpSession.
	 */
	public static final String SESSION_ATTRIBUTE_KEY = "clickstream";

	/** The current clickstreams, keyed by session ID. */
	private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

	public ClickstreamListener() {
		log.debug("ClickstreamLogger constructed");
	}

	/**
	 * Notification that the ServletContext has been initialized.
	 *
	 * @param sce The context event
	 */
	public void contextInitialized(ServletContextEvent sce) {
		log.debug("ServletContext initialised");
		sce.getServletContext().setAttribute(CLICKSTREAMS_ATTRIBUTE_KEY, clickstreams);
	}

	/**
	 * Notification that the ServletContext has been destroyed.
	 *
	 * @param sce The context event
	 */
	public void contextDestroyed(ServletContextEvent sce) {
		log.debug("ServletContext destroyed");
		// help gc, but should be already clear except when exception was thrown during sessionDestroyed
		clickstreams.clear();
	}

	/**
	 * Notification that a Session has been created.
	 *
	 * @param hse The session event
	 */
	public void sessionCreated(HttpSessionEvent hse) {
		final HttpSession session = hse.getSession();
		if (log.isDebugEnabled()) {
			log.debug("Session " + session.getId() + " was created, adding a new clickstream.");
		}

		Object attrValue = session.getAttribute(SESSION_ATTRIBUTE_KEY);
		if (attrValue != null) {
			log.warn("Session " + session.getId() + " already has an attribute named " +
					SESSION_ATTRIBUTE_KEY + ": " + attrValue);
		}

		final Clickstream clickstream = new Clickstream();
		session.setAttribute(SESSION_ATTRIBUTE_KEY, clickstream);
		clickstreams.put(session.getId(), clickstream);
	}

	/**
	 * Notification that a session has been destroyed.
	 *
	 * @param hse The session event
	 */
	public void sessionDestroyed(HttpSessionEvent hse) {
		final HttpSession session = hse.getSession();

		// check if the session is not null (expired)
		if (session == null) {
			return;
		}

		if (log.isDebugEnabled()) {
			log.debug("Session " + session.getId() + " was destroyed, logging the clickstream and removing it.");
		}

		final Clickstream stream = clickstreams.get(session.getId());
		if (stream == null) {
			log.warn("Session " + session.getId() + " doesn't have a clickstream.");
			return;
		}

		try {
			if (stream.getSession() != null) {
				ClickstreamLoggerFactory.getLogger().log(stream);
			}
		}
		catch (Exception e) {
			log.error(e.getMessage(), e);
		}
		finally {
			clickstreams.remove(session.getId());
		}
	}
}

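Here the reader should be clear about the difference between a session and a request: one session can correspond to many requests, and those requests can be bundled into a single Clickstream. That is why the listener uses
private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

to store the mapping from session ID to Clickstream.

Every time a session is created, a Clickstream is bound to it.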
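
The session-to-requests relationship can be illustrated without the servlet API. The following is only a minimal sketch: `SessionClickRegistry` and its method names are hypothetical and not part of the clickstream project, and plain strings stand in for `Clickstream` and `ClickstreamRequest`.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for the listener's map: session id -> ordered request URIs. */
final class SessionClickRegistry {

    private final Map<String, List<String>> streams = new ConcurrentHashMap<>();

    /** Mirrors sessionCreated: bind an empty stream to a new session id. */
    void sessionCreated(String sessionId) {
        streams.putIfAbsent(sessionId, new CopyOnWriteArrayList<>());
    }

    /** Mirrors the filter step: append one request to the stream of its session. */
    void addRequest(String sessionId, String requestUri) {
        streams.computeIfAbsent(sessionId, id -> new CopyOnWriteArrayList<>()).add(requestUri);
    }

    /** Mirrors sessionDestroyed: remove the stream and return it so it can be persisted. */
    List<String> sessionDestroyed(String sessionId) {
        List<String> removed = streams.remove(sessionId);
        return removed == null ? Collections.<String>emptyList() : removed;
    }

    /** Number of sessions that currently have a stream. */
    int activeSessions() {
        return streams.size();
    }

    public static void main(String[] args) {
        SessionClickRegistry registry = new SessionClickRegistry();
        registry.sessionCreated("s1");
        registry.addRequest("s1", "/index.jsp");
        registry.addRequest("s1", "/detail.jsp");
        System.out.println(registry.sessionDestroyed("s1"));
    }
}
```

One session id maps to one ordered list, and the list is handed back only when the session ends, which is the point at which the real listener logs the stream.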


Clickstream is defined as follows:

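import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class Clickstream implements Serializable {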

	private static final long serialVersionUID = 1;

	/** The stream itself: a list of click events. */
	private List<ClickstreamRequest> clickstream = new CopyOnWriteArrayList<ClickstreamRequest>();

	/** The attributes. */
	private Map<String, Object> attributes = new HashMap<String, Object>();

	/** The host name. */
	private String hostname;

	/** The original referer URL, if any. */
	private String initialReferrer;

	/**  The stream start time. */
	private Date start = new Date();

	/** The time of the last request made on this stream. */
	private Date lastRequest = new Date();

	/** Flag indicating this is a bot surfing the site. */
	private boolean bot = false;

	/**
	 * The session itself.
	 *
	 * Marked as transient so that it does not get serialized when the stream is serialized.
	 * See JIRA issue CLK-14 for details.
	 */
	private transient HttpSession session;

	/**
	 * Adds a new request to the stream of clicks. The HttpServletRequest is converted
	 * to a ClickstreamRequest object and added to the clickstream.
	 *
	 * @param request The servlet request to be added to the clickstream
	 */
	public void addRequest(HttpServletRequest request) {
		lastRequest = new Date();

		if (hostname == null) {
			hostname = request.getRemoteHost();
			session = request.getSession();
		}

		// if this is the first request in the click stream
		if (clickstream.isEmpty()) {
			// setup initial referrer
			if (request.getHeader("REFERER") != null) {
				initialReferrer = request.getHeader("REFERER");
			}
			else {
				initialReferrer = "";
			}

			// decide whether this is a bot
			bot = BotChecker.isBot(request);
		}

		clickstream.add(new ClickstreamRequest(request, lastRequest));
	}

	/**
	 * Gets an attribute for this clickstream.
	 *
	 * @param name
	 */
	public Object getAttribute(String name) {
		return attributes.get(name);
	}

	/**
	 * Gets the attribute names for this clickstream.
	 */
	public Set<String> getAttributeNames() {
		return attributes.keySet();
	}

	/**
	 * Sets an attribute for this clickstream.
	 *
	 * @param name
	 * @param value
	 */
	public void setAttribute(String name, Object value) {
		attributes.put(name, value);
	}

	/**
	 * Returns the host name that this clickstream relates to.
	 *
	 * @return the host name that the user clicked through
	 */
	public String getHostname() {
		return hostname;
	}

	/**
	 * Returns the bot status.
	 *
	 * @return true if the client is bot or spider
	 */
	public boolean isBot() {
		return bot;
	}

	/**
	 * Returns the HttpSession associated with this clickstream.
	 *
	 * @return the HttpSession associated with this clickstream
	 */
	public HttpSession getSession() {
		return session;
	}

	/**
	 * The URL of the initial referer. This is useful for determining
	 * how the user entered the site.
	 *
	 * @return the URL of the initial referer
	 */
	public String getInitialReferrer() {
		return initialReferrer;
	}

	/**
	 * Returns the Date when the clickstream began.
	 *
	 * @return the Date when the clickstream began
	 */
	public Date getStart() {
		return start;
	}

	/**
	 * Returns the last Date that the clickstream was modified.
	 *
	 * @return the last Date that the clickstream was modified
	 */
	public Date getLastRequest() {
		return lastRequest;
	}

	/**
	 * Returns the actual List of ClickstreamRequest objects.
	 *
	 * @return the actual List of ClickstreamRequest objects
	 */
	public List<ClickstreamRequest> getStream() {
		return clickstream;
	}
}

ClickstreamRequest is a simplified wrapper around HttpServletRequest, defined as follows:

import java.io.Serializable;
import java.util.Date;

import javax.servlet.http.HttpServletRequest;

public class ClickstreamRequest implements Serializable {

	private static final long serialVersionUID = 1;

	private final String protocol;
	private final String serverName;
	private final int serverPort;
	private final String requestURI;
	private final String queryString;
	private final String remoteUser;
	private final long timestamp;

	public ClickstreamRequest(HttpServletRequest request, Date timestamp) {
		protocol = request.getProtocol();
		serverName = request.getServerName();
		serverPort = request.getServerPort();
		requestURI = request.getRequestURI();
		queryString = request.getQueryString();
		remoteUser = request.getRemoteUser();
		this.timestamp = timestamp.getTime();
	}

	public String getProtocol() {
		return protocol;
	}

	public String getServerName() {
		return serverName;
	}

	public int getServerPort() {
		return serverPort;
	}

	public String getRequestURI() {
		return requestURI;
	}

	public String getQueryString() {
		return queryString;
	}

	public String getRemoteUser() {
		return remoteUser;
	}

	public Date getTimestamp() {
		return new Date(timestamp);
	}

	/**
	 * Returns a string representation of the HTTP request being tracked.
	 * Example: <b>www.opensymphony.com/some/path.jsp?arg1=foo&arg2=bar</b>
	 * 
	 * @return a string representation of the HTTP request being tracked.
	 */
	@Override
	public String toString() {
		return serverName + (serverPort != 80 ? ":" + serverPort : "") + requestURI
				+ (queryString != null ? "?" + queryString : "");
	}
}

So on every request, a Filter intercepts the request and adds it to the Clickstream:

	public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
		// Ensure that filter is only applied once per request.
		if (req.getAttribute(FILTER_APPLIED) == null) {
			log.debug("Applying clickstream filter to request.");

			req.setAttribute(FILTER_APPLIED, true);

			HttpServletRequest request = (HttpServletRequest)req;

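			HttpSession session = request.getSession();
			Clickstream stream = (Clickstream) session.getAttribute(ClickstreamListener.SESSION_ATTRIBUTE_KEY);
			// Guard against a missing stream, e.g. when ClickstreamListener is not registered.
			if (stream != null) {
				stream.addRequest(request);
			}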
		}
		else {
			log.debug("Clickstream filter already applied, ignoring it.");
		}

		// pass the request on
		chain.doFilter(req, res);
	}

When the session is destroyed, the Clickstream simply needs to be persisted.


Improvements

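1. The Clickstream project stores the Map in the ServletContext, which means it only works with a single web container; otherwise the ordering of the ClickstreamRequest entries cannot be guaranteed, which makes it hard to scale out. In a clustered deployment, such as a Tomcat cluster, Redis can be used to store the relevant objects.

Split the Clickstream into three parts:

a Redis List, in which each element is the string form of one serialized ClickstreamRequest;

a Redis Hash that stores private Map<String, Object> attributes = new HashMap<String, Object>();

a Redis Hash that stores fields such as hostname, initialReferrer, start, lastRequest, bot, and the HttpSession id.

A set named session.ids stores the session ids, and each string session:{id} corresponds to one Clickstream.

With that in place, the persistence step can be carried out.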
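
The layout above can be sketched with plain Java collections standing in for Redis. This is an in-memory illustration only: no Redis client is used, the class and method names are hypothetical, and the `:requests` and `:attributes` key suffixes are assumptions made here, since the text only names `session.ids` and `session:{id}`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for the proposed Redis layout (single-threaded sketch). */
final class ClickstreamStoreSketch {

    // Stand-in for the Redis Set "session.ids": ids of live sessions.
    private final Set<String> sessionIds = new LinkedHashSet<>();
    // Stand-in for Redis Lists, keyed by Redis key.
    private final Map<String, List<String>> lists = new HashMap<>();
    // Stand-in for Redis Hashes, keyed by Redis key.
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    static String fieldsKey(String id) { return "session:" + id; }
    // The two suffixes below are assumptions, not names given in the text.
    static String requestsKey(String id) { return "session:" + id + ":requests"; }
    static String attributesKey(String id) { return "session:" + id + ":attributes"; }

    void startSession(String id, String hostname, String initialReferrer, long startMillis) {
        sessionIds.add(id);
        Map<String, String> fields = hashes.computeIfAbsent(fieldsKey(id), k -> new HashMap<>());
        fields.put("hostname", hostname);
        fields.put("initialReferrer", initialReferrer);
        fields.put("start", Long.toString(startMillis));
        fields.put("lastRequest", Long.toString(startMillis));
    }

    /** Appends one serialized request and updates the lastRequest field. */
    void appendRequest(String id, String serializedRequest, long timestampMillis) {
        lists.computeIfAbsent(requestsKey(id), k -> new ArrayList<>()).add(serializedRequest);
        hashes.computeIfAbsent(fieldsKey(id), k -> new HashMap<>())
              .put("lastRequest", Long.toString(timestampMillis));
    }

    void setAttribute(String id, String name, String value) {
        hashes.computeIfAbsent(attributesKey(id), k -> new HashMap<>()).put(name, value);
    }

    List<String> requests(String id) {
        List<String> stored = lists.getOrDefault(requestsKey(id), Collections.<String>emptyList());
        return Collections.unmodifiableList(stored);
    }

    String field(String id, String name) {
        Map<String, String> fields = hashes.get(fieldsKey(id));
        return fields == null ? null : fields.get(name);
    }

    /** Size of the session id set, i.e. the number of live sessions. */
    int onlineCount() {
        return sessionIds.size();
    }

    /** Drops everything stored for a session, after it has been persisted. */
    void endSession(String id) {
        sessionIds.remove(id);
        lists.remove(requestsKey(id));
        hashes.remove(fieldsKey(id));
        hashes.remove(attributesKey(id));
    }
}
```

Against a real Redis server these operations would roughly correspond to the SADD, RPUSH, HSET, SCARD and DEL commands. Because the list append happens on a shared server, requests from different Tomcat nodes end up in a single ordered list.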


Statistics and analysis


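Persistence uses the following two tables (simplified).

session table:
id   ip   referer   is_bot   request_count   start_time   end_time

referer: the referring page through which the user entered the site;
is_bot: whether the visitor is a search-engine crawler;
request_count: the number of requests in the session. A value of 1 means the user viewed one page and then left, so it can be used to compute the bounce rate.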
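
As a rough illustration of the bounce-rate calculation, the sketch below takes the `request_count` values of a set of sessions and returns the fraction equal to 1. The class name `BounceRate` is hypothetical, and whether bot sessions should be filtered out beforehand is left open here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: bounce rate from per-session request_count values. */
final class BounceRate {

    /** Fraction of sessions with exactly one request; 0.0 when there are no sessions. */
    static double of(List<Integer> requestCounts) {
        if (requestCounts.isEmpty()) {
            return 0.0;
        }
        long bounced = requestCounts.stream().filter(count -> count == 1).count();
        return (double) bounced / requestCounts.size();
    }

    public static void main(String[] args) {
        // Four sessions, two of which made a single request.
        System.out.println(of(Arrays.asList(1, 5, 1, 3)));
    }
}
```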


request table:
id   ip   referer   uri   query   is_ajax   start_time   last_time (duration)   refresh_count   sid

refresh_count: the number of times the page was refreshed;


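In the List-typed clickstream, take two adjacent ClickstreamRequest elements A and B.

B's timestamp minus A's timestamp can be treated as the time the user spent on page A.

If A and B are the same page, the user is refreshing page A, which can be used to compute refresh_count.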
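
Both rules can be sketched over a simplified click list. The `Click` type below is a hypothetical stand-in for ClickstreamRequest that carries only a URI and a timestamp, and comparing URIs for equality is an assumption about what "the same page" means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of dwell-time and refresh counting over adjacent clicks. */
final class DwellTimeSketch {

    /** Simplified stand-in for ClickstreamRequest. */
    static final class Click {
        final String uri;
        final long timestampMillis;

        Click(String uri, long timestampMillis) {
            this.uri = uri;
            this.timestampMillis = timestampMillis;
        }
    }

    /** For every click except the last, the time in milliseconds until the next click. */
    static List<Long> dwellTimes(List<Click> clicks) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i + 1 < clicks.size(); i++) {
            result.add(clicks.get(i + 1).timestampMillis - clicks.get(i).timestampMillis);
        }
        return result;
    }

    /** Number of adjacent pairs that hit the same URI, counted as refreshes. */
    static int refreshCount(List<Click> clicks) {
        int refreshes = 0;
        for (int i = 0; i + 1 < clicks.size(); i++) {
            if (clicks.get(i).uri.equals(clicks.get(i + 1).uri)) {
                refreshes++;
            }
        }
        return refreshes;
    }

    public static void main(String[] args) {
        List<Click> clicks = Arrays.asList(
                new Click("/a", 0L), new Click("/a", 1000L), new Click("/b", 4000L));
        System.out.println(dwellTimes(clicks) + " refreshes=" + refreshCount(clicks));
    }
}
```

The last click in a session has no successor, so this approach cannot assign it a dwell time.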

To find the number of users currently online, check the number of elements in session.ids.

Other features are omitted here.

To find out the user's browser, operating system, and so on, parse the User-Agent header.
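
A deliberately naive sketch of that parsing follows. In a servlet the string would come from `request.getHeader("User-Agent")`; the class `UserAgentSketch` and its substring rules are assumptions for illustration only, and real User-Agent strings vary far more than this handles.

```java
/** Hypothetical, naive User-Agent classification based on substring checks. */
final class UserAgentSketch {

    /** Order matters: Edge and Opera also contain "Chrome", and Chrome contains "Safari". */
    static String browser(String ua) {
        if (ua == null) return "Unknown";
        if (ua.contains("Edg/") || ua.contains("Edge/")) return "Edge";
        if (ua.contains("OPR/")) return "Opera";
        if (ua.contains("Chrome/")) return "Chrome";
        if (ua.contains("Firefox/")) return "Firefox";
        if (ua.contains("Safari/")) return "Safari";
        if (ua.contains("MSIE") || ua.contains("Trident/")) return "Internet Explorer";
        return "Unknown";
    }

    /** Order matters: Android contains "Linux", and iOS contains "like Mac OS X". */
    static String os(String ua) {
        if (ua == null) return "Unknown";
        if (ua.contains("Windows NT")) return "Windows";
        if (ua.contains("Android")) return "Android";
        if (ua.contains("iPhone") || ua.contains("iPad")) return "iOS";
        if (ua.contains("Mac OS X")) return "macOS";
        if (ua.contains("Linux")) return "Linux";
        return "Unknown";
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0";
        System.out.println(browser(ua) + " on " + os(ua));
    }
}
```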

Because no JavaScript is used, client-side details such as the browser's screen resolution cannot be obtained.


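Related projects

Google Analytics

Uses JavaScript. Reportedly, to protect user privacy, it does not expose the order in which a user browsed pages.


Piwik

Uses JavaScript on the front end, with a PHP back end.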






Original article: http://blog.csdn.net/u010786672/article/details/43850465
