Duplicate submissions with Apache + Tomcat caused by mod_jk2

Date: 2014-08-24 11:28:42

This is a very strange problem I ran into on an earlier project. I am writing it down to share it.

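The production environment of a very old project ran Apache httpd + Tomcat, integrated through the mod_jk2 connector. That connector stopped being updated long ago, whereas mod_jk (1.x) is still maintained.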

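The symptom was as follows. The project has some pages whose processing takes a long time, for example the business screen that creates a new project: after the front-end page is submitted, the back end has to process a series of files and write records to the database. Duplicate submission had already been considered at design time, so the screen guards against it (after submission the buttons are disabled until the back-end operation completes). No problem ever showed up during development and testing. In the customer's production environment, however, after the system had been running for a while, data was submitted twice on several occasions, and the customer handed the problem to us.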

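After setting up a test environment, we found that if the browser was closed right after the submit button was pressed, or another URL was entered in the address bar and navigated to, the server processed the submission twice. After repeated investigation we first established that nothing was wrong on the browser side: the request was sent only once. Yet the application deployed in Tomcat did, in some cases, receive two requests, and this had nothing to do with the application code. That finally drew our attention to Apache httpd.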

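At first we suspected httpd itself, but a Google search turned up no similar reports, and a real httpd bug of this kind would surely have been noticed by many users. So suspicion fell on mod_jk2, and after switching to the mod_jk (1.x) connector the problem disappeared.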

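That should have been the end of it. Unfortunately the customer was very exacting: in their view, replacing the connector with mod_jk meant the whole system would have to be retested before going live. They refused and demanded to know why the original setup failed. With no alternative, I went to read the mod_jk2 source code.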

The mod_jk2 source can be downloaded here: http://archive.apache.org/dist/tomcat/tomcat-connectors/jk2/source/jakarta-tomcat-connectors-jk2-2.0.4-src.zip

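Since I am not very familiar with C, I started from the httpd log. Whenever a duplicate submission occurred, the log contained a matching entry: "ajp13.service() ajpGetReply recoverable error 3". Searching the source shows that it is written at line 547 of jk_worker_ajp13.c. After analyzing the enclosing function, jk2_worker_ajp13_forwardStream, the rough mechanism appears to be this: once the server has finished processing the request it sends back the response, but because the browser has been closed or has moved to another page, the attempt to deliver the response fails. mod_jk2 then tries to recover from this error (which obviously cannot be recovered in this situation), and its way of recovering is to send the original request to Tomcat again. That is the second submission, while from the point of view of both the browser and Tomcat nothing is wrong. (This is only my own understanding. I do not know this area well, so it may be inaccurate, but that hardly affects the fix.)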

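The function in question is excerpted below. Near the top, the macro JK_RETRIES is defined as 2. Changing it to 1 makes the problem go away, but the customer would obviously not accept that fix either, so I kept looking for another solution.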

/** There is no point of trying multiple times - each channel may
    have built-in recovery mechanisms
*/
#define JK_RETRIES 2


static int JK_METHOD
jk2_worker_ajp13_forwardStream(jk_env_t *env, jk_worker_t *worker,
                               jk_ws_service_t *s, jk_endpoint_t *e)
{
    int err = JK_OK;
    int attempt;
    int has_post_body = JK_FALSE;

    e->recoverable = JK_TRUE;
    s->is_recoverable_error = JK_TRUE;

    /*
     * Try to send the request on a valid endpoint. If one endpoint
     * fails, close the channel and try again ( maybe tomcat was restarted )
     *
     * XXX JK_RETRIES could be replaced by the number of workers in
     * a load-balancing configuration
     */
    for (attempt = 0; attempt < JK_RETRIES; attempt++) {

        if (e->sd == -1) {
            err = jk2_worker_ajp13_connect(env, e);
            if (err != JK_OK) {
                env->l->jkLog(env, env->l, JK_LOG_ERROR,
                              "ajp13.service() failed to connect endpoint errno=%d %s\n",
                              errno, strerror(errno));
                e->worker->in_error_state = JK_TRUE;
                return err;
            }
            if (worker->mbean->debug > 0)
                env->l->jkLog(env, env->l, JK_LOG_DEBUG,
                              "ajp13.service() connecting to endpoint \n");
        }

        err = e->worker->channel->send(env, e->worker->channel, e,
                                       e->request);

        if (e->worker->mbean->debug > 10)
            e->request->dump(env, e->request, "Sent");

        if (err != JK_OK) {
            /* Can't send - bad endpoint, try again */
            env->l->jkLog(env, env->l, JK_LOG_ERROR,
                          "ajp13.service() error sending, reconnect %s %d %d %s\n",
                          e->worker->channelName, err, errno,
                          strerror(errno));
            jk2_close_endpoint(env, e);
            continue;
        }

        /* We should have a channel now, send the post data */

        /* Prepare to send some post data ( ajp13 proto ). We do that after the
           request was sent ( we're receiving data from client, can be slow, no
           need to delay - we can do that in paralel. ( not very sure this is
           very usefull, and it brakes the protocol ) ! */

        /* || s->is_chunked - this can't be done here. The original protocol sends the first
           chunk of post data ( based on Content-Length ), and that's what the java side expects.
           Sending this data for chunked would break other ajp13 serers.

           Note that chunking will continue to work - using the normal read.
         */
        if (has_post_body || s->left_bytes_to_send > 0
            || s->reco_status == RECO_FILLED) {
            /* We never sent any POST data and we check it we have to send at
             * least of block of data (max 8k). These data will be kept in reply
             * for resend if the remote Tomcat is down, a fact we will learn only
             * doing a read (not yet)
             */

            /* If we have the service recovery buffer FILLED and we're in first attempt */
            /* recopy the recovery buffer in post instead of reading it from client */
            if (s->reco_status == RECO_FILLED && (attempt == 0)) {
                /* Get in post buf the previously saved POST */

                if (s->reco_buf->copy(env, s->reco_buf, e->post) < 0) {
                    s->is_recoverable_error = JK_FALSE;
                    env->l->jkLog(env, env->l, JK_LOG_ERROR,
                                  "ajp13.service() can't use the LB recovery buffer, aborting\n");
                    return JK_ERR;
                }

                env->l->jkLog(env, env->l, JK_LOG_DEBUG,
                              "ajp13.service() using the LB recovery buffer\n");
            }
            else {
                if (attempt == 0)
                    err = jk2_serialize_postHead(env, e->post, s, e);
                else
                    err = JK_OK;        /* We already have the initial body chunk */

                if (e->worker->mbean->debug > 10)
                    e->request->dump(env, e->request, "Post head");

                if (err != JK_OK) {
                    /* the browser stop sending data, no need to recover */
                    /* e->recoverable = JK_FALSE; */
                    s->is_recoverable_error = JK_FALSE;
                    env->l->jkLog(env, env->l, JK_LOG_ERROR,
                                  "ajp13.service() Error receiving initial post %d %d %d\n",
                                  err, errno, attempt);

                    /* BR #27281 : Should we return HTTP 500 since its the user who stop the sending ? */
                    /* may be not, so return another HTTP code -> use PARTIAL CONTENT, 206 instead */
                    s->status = 206;
                    return JK_ERR;
                }

                /* If a recovery buffer exist (LB mode), save here the post buf */
                if (s->reco_status == RECO_INITED) {
                    /* Save the post for recovery if needed */
                    if (e->post->copy(env, e->post, s->reco_buf) < 0) {
                        s->is_recoverable_error = JK_FALSE;
                        env->l->jkLog(env, env->l, JK_LOG_ERROR,
                                      "ajp13.service() can't save the LB recovery buffer, aborting\n");
                        return JK_ERR;
                    }
                    else
                        s->reco_status = RECO_FILLED;
                }
            }

            has_post_body = JK_TRUE;
            err = e->worker->channel->send(env, e->worker->channel, e,
                                           e->post);
            if (err != JK_OK) {
                /* e->recoverable = JK_FALSE; */
                /*                 s->is_recoverable_error = JK_FALSE; */
                env->l->jkLog(env, env->l, JK_LOG_ERROR,
                              "ajp13.service() Error sending initial post %d %d %d\n",
                              err, errno, attempt);
                jk2_close_endpoint(env, e);
                continue;
                /*  return JK_ERR; */
            }
        }

        err =
            e->worker->workerEnv->processCallbacks(env, e->worker->workerEnv,
                                                   e, s);

        /* if we can't get reply, check if no recover flag was set
         * if is_recoverable_error is cleared, we have started received
         * upload data and we must consider that operation is no more recoverable
         */
        if (err != JK_OK && !e->recoverable) {
            s->is_recoverable_error = JK_FALSE;
            env->l->jkLog(env, env->l, JK_LOG_ERROR,
                          "ajp13.service() ajpGetReply unrecoverable error %d\n",
                          err);
            /* The connection is compromised, need to close it ! */
            e->worker->in_error_state = 1;
            return JK_ERR;
        }

        if (err != JK_OK) {
            env->l->jkLog(env, env->l, JK_LOG_ERROR,
                          "ajp13.service() ajpGetReply recoverable error %d\n",
                          err);
            jk2_close_endpoint(env, e);
        }

        if (err == JK_OK)
            return err;
    }
    return err;
}

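The test if (err != JK_OK && !e->recoverable) shows that when sending the response fails and e->recoverable is false, the function returns instead of going through the loop again. Judging by the results, though, this value is evidently not false by default, otherwise the problem would not occur. I no longer remember exactly how I traced the assignments to e->recoverable. A development tool (Visual Studio, for example) would probably have made it easier, but I had no C tooling at hand when writing this article, so it took some time with a text editor. I give the result directly here to save you the time.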

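The value is assigned at lines 550~563 of jk_workerEnv.c. Roughly, if the configuration file requests the relevant handling, recoverable ends up false. Otherwise it is set to true by default (I once compared the corresponding code in mod_jk 1.x, where the default is false). So this appears to be where the problem lies. Most annoying of all, next to the line that sets the default to true the author left the comment "/* Should we do this ? not sure */". No explanation needed, and after all the effort this cost me it was infuriating.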

case JK_HANDLER_ERROR:
    /* Normal error ( for example writing to the client failed ).
     * The ajp connection is still in a stable state but if we ask in configuration
     * to abort when header has been send to client, mark as unrecoverable.
     */
    if (wEnv->options & JK_OPT_RECO_ABORTIFTCSENDHEADER) {
        req->is_recoverable_error = JK_FALSE;
        env->l->jkLog(env, env->l, JK_LOG_ERROR,
                      "workerEnv.processCallbacks() by configuration, avoid recovery when tomcat has started to send headers to client\n");
    }
    else
        ep->recoverable = JK_TRUE;      /* Should we do this ? not sure */

    return rc;

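The rest was easy. Following the trail, the code that determines wEnv->options & JK_OPT_RECO_ABORTIFTCSENDHEADER is at line 98 of the same file: it reads a property named "noRecoveryIfHeaderSent" from the configuration file. A Google search shows that this property can be set in workers2.properties. An example follows (found on the web, not taken from my actual project. It is here only to show how "noRecoveryIfHeaderSent" is used).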

[workerEnv]
logger=logger.apache2
sslEnable=1
timing=1
forwardURICompatUnparsed
noRecoveryIfRequestSent
noRecoveryIfHeaderSent
disabled=0
debug=5

[logger.apache2]
level=DEBUG

[shm]
file=${serverRoot}/logs/shm.file
size=1048576
disabled=0
debug=5

[channel.socket:192.168.13.4:8009]
tomcatId=server2
keepalive=0
timeout=0
disabled=0
debug=5
#---LB---
lb_factor=1
……

Note: writing just noRecoveryIfHeaderSent on its own line is enough. If the property is omitted, the default applies.

All of the above has been tested. If anyone runs into a similar problem, feel free to use it as a reference.

Original article: http://www.cnblogs.com/listen1984/p/3932572.html
