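For a long time I believed that only non-blocking sockets could produce the EAGAIN error, meaning the socket is not currently readable or writable and the call should simply be retried. An error reported recently by our redis module corrected that misconception.
What happened: we use hiredis's redisConnectWithTimeout and redisContextSetTimeout, which leave the socket connected to redis-server in blocking mode and set read/write timeouts on it; our project uses a timeout of 50ms. Over the following days the logs showed hiredis reporting "Resource temporarily unavailable". This was puzzling at first, because that message corresponds to EAGAIN, which I thought could only be reported in non-blocking mode. Here is how I worked it out: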
1. Look at hiredis's read/write functions

    /* Use this function to handle a read event on the descriptor. It will try
     * and read some bytes from the socket and feed them to the reply parser.
     *
     * After this function is called, you may use redisContextReadReply to
     * see if there is a reply available. */
    int redisBufferRead(redisContext *c) {
        char buf[1024*16];
        int nread;

        /* Return early when the context has seen an error. */
        if (c->err)
            return REDIS_ERR;

        nread = read(c->fd,buf,sizeof(buf));
        if (nread == -1) {
            if ((errno == EAGAIN && !(c->flags & REDIS_BLOCK)) || (errno == EINTR)) {
                /* Try again later */
            } else {
                __redisSetError(c,REDIS_ERR_IO,NULL);
                return REDIS_ERR;   // branch A1
            }
        } else if (nread == 0) {
            __redisSetError(c,REDIS_ERR_EOF,"Server closed the connection");
            return REDIS_ERR;       // branch A2
        } else {
            if (redisReaderFeed(c->reader,buf,nread) != REDIS_OK) {
                __redisSetError(c,c->reader->err,c->reader->errstr);
                return REDIS_ERR;
            }
        }
        return REDIS_OK;
    }

    /* Write the output buffer to the socket.
     *
     * Returns REDIS_OK when the buffer is empty, or (a part of) the buffer was
     * successfully written to the socket. When the buffer is empty after the
     * write operation, "done" is set to 1 (if given).
     *
     * Returns REDIS_ERR if an error occurred trying to write and sets
     * c->errstr to hold the appropriate error string. */
    int redisBufferWrite(redisContext *c, int *done) {
        int nwritten;

        /* Return early when the context has seen an error. */
        if (c->err)
            return REDIS_ERR;

        if (sdslen(c->obuf) > 0) {
            nwritten = write(c->fd,c->obuf,sdslen(c->obuf));
            if (nwritten == -1) {
                if ((errno == EAGAIN && !(c->flags & REDIS_BLOCK)) || (errno == EINTR)) {
                    /* Try again later */
                } else {
                    __redisSetError(c,REDIS_ERR_IO,NULL);   // branch B1
                    return REDIS_ERR;
                }
            } else if (nwritten > 0) {
                if (nwritten == (signed)sdslen(c->obuf)) {
                    sdsfree(c->obuf);
                    c->obuf = sdsempty();
                } else {
                    sdsrange(c->obuf,nwritten,-1);
                }
            }
        }
        if (done != NULL) *done = (sdslen(c->obuf) == 0);
        return REDIS_OK;
    }

Note that the error log shows hiredis set the error type to REDIS_ERR_IO with errstr "Resource temporarily unavailable", so the error can only have come from branch A1 or branch B1. Digging further:
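The condition shared by both functions can be isolated into a small standalone sketch. The names `io_verdict` and `classify_io_errno` are hypothetical and not part of hiredis; the function only mirrors the `if` test quoted above, with `is_blocking` standing in for `c->flags & REDIS_BLOCK`.

```c
#include <errno.h>

/* Hypothetical result codes for illustration; not part of hiredis. */
enum io_verdict { IO_RETRY, IO_FATAL };

/* Mirrors the test in redisBufferRead/redisBufferWrite:
 * EAGAIN means "try again later" only when the context is non-blocking,
 * EINTR is always retryable, and anything else is treated as an I/O error. */
enum io_verdict classify_io_errno(int err, int is_blocking) {
    if ((err == EAGAIN && !is_blocking) || err == EINTR)
        return IO_RETRY;
    return IO_FATAL;
}
```

With a blocking context, EAGAIN therefore falls into the error branch (A1 or B1) instead of being silently retried.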
2. Look at how the error is set

    void __redisSetError(redisContext *c, int type, const char *str) {
        size_t len;

        c->err = type;
        if (str != NULL) {
            len = strlen(str);
            len = len < (sizeof(c->errstr)-1) ? len : (sizeof(c->errstr)-1);
            memcpy(c->errstr,str,len);
            c->errstr[len] = '\0';
        } else {
            /* Only REDIS_ERR_IO may lack a description! */
            assert(type == REDIS_ERR_IO);
            strerror_r(errno,c->errstr,sizeof(c->errstr));
        }
    }

Branches A1 and B1 pass str == NULL, so the else branch fills c->errstr from errno via strerror_r. Since c->errstr was "Resource temporarily unavailable", errno must have been EAGAIN.
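The inference from message text back to errno can be checked with a few lines of C. This is only a sketch: `message_is_eagain` is a hypothetical helper, and the exact wording returned by `strerror` depends on the C library and locale.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `msg` matches the text the C library
 * reports for EAGAIN, which is how the errno was inferred from the log. */
int message_is_eagain(const char *msg) {
    return strcmp(msg, strerror(EAGAIN)) == 0;
}
```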
3. Why would a blocking read or write set errno to EAGAIN?
1) What did we do to the socket? We set timeouts on it.

    int redisContextSetTimeout(redisContext *c, const struct timeval tv) {
        if (setsockopt(c->fd,SOL_SOCKET,SO_RCVTIMEO,&tv,sizeof(tv)) == -1) {
            __redisSetErrorFromErrno(c,REDIS_ERR_IO,"setsockopt(SO_RCVTIMEO)");
            return REDIS_ERR;
        }
        if (setsockopt(c->fd,SOL_SOCKET,SO_SNDTIMEO,&tv,sizeof(tv)) == -1) {
            __redisSetErrorFromErrno(c,REDIS_ERR_IO,"setsockopt(SO_SNDTIMEO)");
            return REDIS_ERR;
        }
        return REDIS_OK;
    }

2) How do SO_RCVTIMEO and SO_SNDTIMEO affect read/write on the socket? Let's see what the man page says.
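According to socket(7), if the timeout expires and no data has been transferred, the call returns -1 with errno set to EAGAIN or EWOULDBLOCK, just as if the socket had been marked non-blocking. So it is finally clear: with SO_RCVTIMEO and SO_SNDTIMEO set, read/write on a blocking socket fail with EAGAIN when they time out.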
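The behavior can be reproduced without redis at all. The following is a minimal sketch, assuming Linux and a local `socketpair` in place of the real connection; `errno_after_read_timeout` is a hypothetical helper name, and the 50ms value matches the timeout used in our project.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: does a blocking read with SO_RCVTIMEO set on a socket
 * that never receives data. Returns the errno of the failed read, 0 if the
 * read unexpectedly succeeded, or -1 if setup failed.
 * timeout_ms must be > 0; a zero timeval would mean "no timeout". */
int errno_after_read_timeout(int timeout_ms) {
    int sv[2];
    char buf[16];
    struct timeval tv;
    ssize_t n;
    int saved;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* sv[0] is still in blocking mode and nothing is ever written to sv[1],
     * so read() blocks until the receive timeout expires. */
    n = read(sv[0], buf, sizeof(buf));
    saved = (n == -1) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return saved;
}
```

Calling `errno_after_read_timeout(50)` from `main` should return EAGAIN (the same value as EWOULDBLOCK on Linux) even though O_NONBLOCK was never set on the socket.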
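In addition, while we were tracking this down, a colleague pointed out that O_NDELAY can also cause write to report EAGAIN. That is true in principle, but the details depend on the platform: on Linux, O_NDELAY is a synonym for O_NONBLOCK, so a write that cannot proceed returns -1 with errno set to EAGAIN, whereas on some older System V systems O_NDELAY made the call return 0 rather than -1. Either way it does not apply to this case, because hiredis does not set O_NDELAY on this connection.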
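For comparison, here is a minimal sketch of the non-blocking case, assuming Linux (where O_NDELAY and O_NONBLOCK are the same flag) and a local `socketpair`. The helper name `errno_when_nonblocking_write_is_full` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fills a non-blocking socket until write() stops
 * succeeding. Stores the last return value of write() in *ret and returns
 * the errno of that call, or -1 if setup failed. */
int errno_when_nonblocking_write_is_full(ssize_t *ret) {
    int sv[2], flags, saved;
    char chunk[4096];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof(chunk));
    /* Nobody reads from sv[1], so the send buffer eventually fills up. */
    do {
        n = write(sv[0], chunk, sizeof(chunk));
    } while (n > 0);
    saved = errno;   /* capture before close() can overwrite it */
    *ret = n;

    close(sv[0]);
    close(sv[1]);
    return saved;
}
```

On Linux the final write is expected to return -1 with errno set to EAGAIN, which is the same errno a blocking socket reports when its SO_SNDTIMEO expires.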
Why read/write on a blocking socket can fail with errno set to EAGAIN
Original article: http://blog.csdn.net/cleanfield/article/details/41649985