TCP
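1. Background
- What is TCP?
TCP is a protocol that, like UDP, belongs to the transport layer, the fourth layer of the seven-layer OSI model.
- What does TCP do?
TCP provides a connection-oriented, reliable, byte stream service.
-
How should TCP's connection orientation (connection-oriented) be understood?
-
How should TCP's reliability (reliable) be understood?
-
Active open and passive open
active open: the end that sends the first SYN performs the active open.
passive open: the end that receives that SYN and replies with its own SYN (together with an ACK) performs the passive open.
- Active close and passive close
The end that sends the first FIN performs the active close; the end that receives that FIN performs the passive close.
Either the client or the server can actively close the connection. In general, the client decides when to terminate it.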
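The open and close roles described above can be tried out on the loopback interface. The following is only a minimal sketch: the function names `passive_side` and `demo` are made up for illustration, and the kernel, not this code, sends the actual SYN and FIN segments.

```python
import socket
import threading


def passive_side(listener, received):
    # accept() returns once the kernel has completed the handshake
    # that the peer started with its SYN (passive open).
    conn, _peer = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the peer's FIN has arrived
                break
            received.append(chunk)
    # Leaving the "with" block closes conn: this side does the passive close.


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)               # socket is now in LISTEN
    port = listener.getsockname()[1]

    received = []
    worker = threading.Thread(target=passive_side, args=(listener, received))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))  # active open: first SYN comes from here
    client.sendall(b"hello")
    client.close()                       # active close: first FIN comes from here

    worker.join()
    listener.close()
    return b"".join(received)


if __name__ == "__main__":
    print(demo())
```

Because the server loop only ends when it reads end-of-stream, the client is always the side that sends the first FIN in this sketch.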
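2. Terminology
reordering: packets arriving in a different order from the one in which they were sent
acknowledgement: confirmation of receipt
synchronize: to bring both ends into sync
Sequence Number: numbers the bytes a segment carries, so the receiver can put reordered segments back in order.
Acknowledgement Number: the value carried with ACK. It confirms receipt and is used to deal with packet loss.
SYN stands for Synchronize Sequence Numbers.
ISN stands for Initial Sequence Number.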
socket ?
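How the Sequence Number and the Acknowledgement Number work together can be illustrated with a toy receiver. This is a simplified sketch, not how a real TCP stack is written: `reassemble` is a hypothetical helper, segments are plain `(seq, payload)` tuples, and the only TCP rules it relies on are that the SYN consumes one sequence number and that the ACK names the next byte expected.

```python
MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def reassemble(segments, isn):
    """Rebuild the byte stream from segments given in arrival order.

    Returns (data, ack), where ack is the next sequence number expected.
    """
    expected = (isn + 1) % MOD  # the SYN itself consumes one sequence number
    pending = dict(segments)    # duplicates of the same seq collapse here
    data = bytearray()
    while expected in pending:
        payload = pending.pop(expected)
        data += payload
        expected = (expected + len(payload)) % MOD
    return bytes(data), expected


if __name__ == "__main__":
    # "world" arrives first, "hello" arrives twice (a retransmission).
    arrived = [(106, b"world"), (101, b"hello"), (101, b"hello")]
    print(reassemble(arrived, isn=100))
```

If a segment is missing, the returned ACK stays at the first gap, which is what tells the sender what to retransmit.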
3. TCP state transitions
-
TCP states
ESTABLISHED -- The socket has an established connection.
SYN_SENT -- The socket is actively attempting to establish a connection.
SYN_RECV -- A connection request has been received from the network.
FIN_WAIT1 -- The socket is closed, and the connection is shutting down.
FIN_WAIT2 -- Connection is closed, and the socket is waiting for a shutdown from the remote end.
TIME_WAIT -- The socket is waiting after close to handle packets still in the network.
CLOSE -- The socket is not being used.
CLOSE_WAIT -- The remote end has shut down, waiting for the socket to close.
LAST_ACK -- The remote end has shut down, and the socket is closed. Waiting for acknowledgement.
LISTEN -- The socket is listening for incoming connections. Such sockets are not included in the output unless you specify the --listening (-l) or --all (-a) option.
CLOSING -- Both sockets are shut down but we still don't have all our data sent.
UNKNOWN -- The state of the socket is unknown.
-
Normal TCP state transitions on the client:
CLOSED -> SYN_SENT -> ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED
- Normal TCP state transitions on the server:
CLOSED -> LISTEN -> SYN_RCVD -> ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
- Server performs the active close
client
CLOSED -> SYN_SENT -> ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
server
CLOSED -> LISTEN -> SYN_RCVD -> ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED
-
Client and server close simultaneously
-
TCPIP_State_Transition_Diagram
link http://www.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf
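The transition paths listed in this section can be written down as a small lookup table. The sketch below covers only the normal open and close paths. The event names (`active_open`, `recv_syn_ack`, and so on) are labels invented for this example, and simultaneous close is deliberately left out.

```python
# (current state, event) -> next state, for the normal paths only.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RCVD",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",     # active close
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",  # passive close
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def walk(events, state="CLOSED"):
    """Return the list of states visited while applying events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError means an unmodelled step
        path.append(state)
    return path


ACTIVE_CLOSER = ["active_open", "recv_syn_ack", "close",
                 "recv_ack", "recv_fin", "timeout_2msl"]
PASSIVE_CLOSER = ["passive_open", "recv_syn", "recv_ack",
                  "recv_fin", "close", "recv_ack"]

if __name__ == "__main__":
    print(" -> ".join(walk(ACTIVE_CLOSER)))
    print(" -> ".join(walk(PASSIVE_CLOSER)))
```

Swapping which side sees `close` first reproduces the case where the server performs the active close.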
4. TCP three-way handshake
The three-way handshake that sets up a connection mainly serves to initialize the Sequence Number, that is, the ISN.
Both ends must tell each other their own initial Sequence Number.
- TCP three-way handshake
client SYN seq=x ---> server
client <--- SYN seq=y, ACK=x+1 server
client ACK=y+1 ---> server
-
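State transitions during the three-way handshake
-
When the client starts to connect, the server is still in the LISTEN state.
-
After sending the first SYN, the client is in the SYN_SENT state.
-
On receiving the client's SYN, the server enters the SYN_RCVD state and replies with an ACK together with its own SYN.
-
On receiving the server's ACK and SYN, the client enters the ESTABLISHED state and returns an ACK to the server.
-
On receiving the client's ACK, the server enters the ESTABLISHED state.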
-
client status                                           server status
CLOSED                                                  LISTEN
    |        client  SYN seq=x  --->  server                |
SYN_SENT                                                    |
    |                                                   SYN_RCVD
    |        client  <---  SYN seq=y, ACK=x+1  server       |
ESTABLISHED                                                 |
    |        client  ACK=y+1  --->  server                  |
    |                                                   ESTABLISHED
- Three-way handshake in tcpdump
$ sudo tcpdump -S -nn -i p33p1 tcp and host www.baidu.com
16:10:57.732489 IP 10.240.155.182.49884 > 115.239.211.110.80: Flags [S], seq 2696447430, win 29200, options [mss 1460,sackOK,TS val 2695718698 ecr 0,nop,wscale 7], length 0
16:10:57.734058 IP 115.239.211.110.80 > 10.240.155.182.49884: Flags [S.], seq 282159576, ack 2696447431, win 29200, options [mss 1440,sackOK,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,wscale 7], length 0
16:10:57.734099 IP 10.240.155.182.49884 > 115.239.211.110.80: Flags [.], ack 282159577, win 229, length 0
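The seq=x / ACK=x+1 rule can be checked against the capture above. The sketch below copies the three tcpdump lines with the `options [...]` part trimmed for brevity. `LINE`, `parse` and `is_valid_handshake` are names made up for this example, and the regular expression only handles the line shapes shown here.

```python
import re

# The three handshake lines from the capture, with options trimmed.
CAPTURE = """\
16:10:57.732489 IP 10.240.155.182.49884 > 115.239.211.110.80: Flags [S], seq 2696447430, win 29200, length 0
16:10:57.734058 IP 115.239.211.110.80 > 10.240.155.182.49884: Flags [S.], seq 282159576, ack 2696447431, win 29200, length 0
16:10:57.734099 IP 10.240.155.182.49884 > 115.239.211.110.80: Flags [.], ack 282159577, win 229, length 0
"""

LINE = re.compile(
    r"Flags \[(?P<flags>[^\]]+)\]"
    r"(?:, seq (?P<seq>\d+))?"
    r"(?:, ack (?P<ack>\d+))?"
)


def parse(line):
    """Return (flags, seq, ack); seq or ack is None when absent."""
    match = LINE.search(line)
    seq, ack = match.group("seq"), match.group("ack")
    return (match.group("flags"),
            int(seq) if seq is not None else None,
            int(ack) if ack is not None else None)


def is_valid_handshake(lines):
    """Check flags and the +1 acknowledgement rule for SYN, SYN+ACK, ACK."""
    (f1, x, _), (f2, y, ack2), (f3, _, ack3) = (parse(l) for l in lines)
    return ((f1, f2, f3) == ("S", "S.", ".")
            and ack2 == (x + 1) % 2 ** 32
            and ack3 == (y + 1) % 2 ** 32)


if __name__ == "__main__":
    print(is_valid_handshake(CAPTURE.splitlines()))
```

Here x is 2696447430 and y is 282159576, so the two ACK values in the capture are exactly x+1 and y+1.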
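- How sequence numbers are generated
In the three-way handshake, the client must listen for the ISN that the server sends back (ISNs). The sequence number TCP uses is a 32-bit counter ranging from 0 to 4294967295. TCP chooses an initial sequence number (ISN) for each connection. To keep delayed or retransmitted segments from disrupting the handshake, the ISN cannot be chosen arbitrarily; different systems use different algorithms.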
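Because the counter is 32 bits wide, arithmetic on sequence numbers has to wrap around at 4294967295. A minimal sketch of that modular arithmetic follows. `seq_add` and `seq_lt` are hypothetical helper names, and the comparison assumes the two values are less than 2**31 apart, the usual convention for this kind of serial-number comparison.

```python
MOD = 2 ** 32  # size of the sequence number space


def seq_add(seq, n):
    """Advance a sequence number by n, wrapping at 2**32."""
    return (seq + n) % MOD


def seq_lt(a, b):
    """True if a comes before b in wrap-around order.

    Assumes a and b are less than 2**31 apart.
    """
    return a != b and (b - a) % MOD < 2 ** 31


if __name__ == "__main__":
    print(seq_add(4294967295, 1))   # wraps to the start of the space
    print(seq_lt(4294967290, 5))    # 5 is "after" 4294967290 once wrapped
```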
5. TCP four-way teardown
- TCP four-way teardown
client FIN ---> server
client <--- ACK server
client <--- FIN server
client ACK ---> server
-
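State transitions during the four-way teardown
-
When the client asks to close the connection, it sends a FIN to the server, enters the FIN_WAIT_1 state, and waits for the server's acknowledgement.
-
On receiving the client's FIN, the server returns an ACK to the client and enters the CLOSE_WAIT state.
-
On receiving the server's ACK, the client leaves FIN_WAIT_1 and enters the FIN_WAIT_2 state, waiting for the server's own close request.
-
Once the server considers the data transfer complete, it sends a FIN to the client and enters the LAST_ACK state.
-
On receiving the server's FIN, the client leaves FIN_WAIT_2, returns an ACK to the server, and enters the TIME_WAIT state.
-
On receiving the client's ACK, the server leaves LAST_ACK and enters the CLOSED state.
-
At this point the server has really closed the connection, but the client is still in TIME_WAIT; it returns to CLOSED after waiting 2MSL.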
Note 2MSL
-
client status                                   server status
ESTABLISHED                                     ESTABLISHED
    |        client  FIN  --->  server              |
FIN_WAIT_1                                      CLOSE_WAIT
    |        client  <---  ACK  server              |
FIN_WAIT_2                                          |
    |        client  <---  FIN  server              |
    |                                           LAST_ACK
    |        client  ACK  --->  server              |
TIME_WAIT                                       CLOSED
    |
CLOSED
- Four-way teardown in tcpdump
16:10:57.737305 IP 10.240.155.182.49884 > 115.239.211.110.80: Flags [F.], seq 2696447509, ack 282160286, win 240, length 0
16:10:57.738883 IP 115.239.211.110.80 > 10.240.155.182.49884: Flags [.], ack 2696447510, win 193, length 0
16:10:57.738931 IP 115.239.211.110.80 > 10.240.155.182.49884: Flags [F.], seq 282160286, ack 2696447510, win 193, length 0
16:10:57.738964 IP 10.240.155.182.49884 > 115.239.211.110.80: Flags [.], ack 282160287, win 240, length 0
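The teardown capture can be combined with the handshake capture to work out how much data each side sent, assuming both captures belong to the same connection (the matching addresses and port 49884 suggest so). The sketch relies on two facts: the SYN and the FIN each consume one sequence number, and everything in between is payload. `payload_bytes` and `fin_ack` are names invented for this example.

```python
MOD = 2 ** 32

# Values copied from the two captures.
CLIENT_ISN, CLIENT_FIN_SEQ = 2696447430, 2696447509
SERVER_ISN, SERVER_FIN_SEQ = 282159576, 282160286


def payload_bytes(isn, fin_seq):
    """Bytes sent between the SYN and the FIN (the SYN uses one number)."""
    return (fin_seq - isn - 1) % MOD


def fin_ack(fin_seq):
    """ACK value expected for a FIN that carries no data."""
    return (fin_seq + 1) % MOD


if __name__ == "__main__":
    print("client sent", payload_bytes(CLIENT_ISN, CLIENT_FIN_SEQ), "bytes")
    print("server sent", payload_bytes(SERVER_ISN, SERVER_FIN_SEQ), "bytes")
    print("ACK for client FIN:", fin_ack(CLIENT_FIN_SEQ))
    print("ACK for server FIN:", fin_ack(SERVER_FIN_SEQ))
```

The two ACK values computed this way, 2696447510 and 282160287, are the ones that appear in the second and fourth lines of the teardown capture.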
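6. The TIME_WAIT state
- MSL (Maximum Segment Lifetime)
The maximum time any segment can exist in the network before it is discarded.
Note: for the MSL value, refer to RFC 793
- TIME_WAIT state
The TIME_WAIT state is also called the 2MSL wait state.
- How the MSL is applied
When TCP performs an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the MSL. This lets TCP resend the final ACK in case it is lost (the other end times out and retransmits its final FIN).
6.1. Too many sockets in TIME_WAIT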
link http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
-
About tcp_tw_reuse
-
About tcp_tw_recycle
-
About tcp_max_tw_buckets
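Before tuning any of these settings it helps to know how many sockets are actually in TIME_WAIT. On Linux, one way is to read `/proc/net/tcp`, where the fourth column holds the state as a hex code (06 is TIME_WAIT). The sketch below is only an illustration: `count_states` is a hypothetical helper, and the two rows in `SAMPLE` are made-up data in the file's layout, not output from the hosts in this document.

```python
from collections import Counter

# Hex state codes used in the "st" column of /proc/net/tcp on Linux.
STATE_NAMES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 3:
            counts[STATE_NAMES.get(fields[3], "UNKNOWN")] += 1
    return counts


# Made-up sample in the same layout, for illustration only.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345
   1: BCB43A7B:38B2 7BB43A7B:0050 06 00000000:00000000 03:00000BB8 00000000     0        0 0
"""

if __name__ == "__main__":
    print(count_states(SAMPLE))
```

On a live Linux host the same function can be fed the real file, for example `count_states(open("/proc/net/tcp").read())`; IPv6 sockets are listed separately in `/proc/net/tcp6`.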
7. Packet-capture analysis of a web application behind LVS
7.1. Setup
client 123.58.180.188
lvs vip 123.58.180.123
realserver
lvs --> realserver DR
7.2. Packet captures
- client
curl -I -H Host:photo.163.com 123.58.180.123
sudo netstat -anlp|grep 123.58.180.123
tcp 0 0 123.58.180.188:14514 123.58.180.123:80 TIME_WAIT -
sudo tcpdump -i eth0 tcp and host 123.58.180.123 -nnn
14:42:51.151998 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
14:42:51.152299 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [S.], seq 122778619, ack 1660001801, win 2896, options [mss 1460,sackOK,TS val 2328633369 ecr 3477472664,nop,wscale 9], length 0
14:42:51.152346 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
14:42:51.152508 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [P.], seq 1660001801:1660001878, ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 77
14:42:51.152701 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [.], ack 1660001878, win 6, options [nop,nop,TS val 2328633376 ecr 3477472664], length 0
14:42:51.280587 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [P.], seq 122778620:122779256, ack 1660001878, win 6, options [nop,nop,TS val 2328633408 ecr 3477472664], length 636
14:42:51.280616 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.280773 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [F.], seq 1660001878, ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.280930 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [F.], seq 122779256, ack 1660001879, win 6, options [nop,nop,TS val 2328633408 ecr 3477472696], length 0
14:42:51.280962 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779257, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
- LVS
Capture on eth0
sudo tcpdump -i eth0 tcp and host 123.58.180.188 -nnn -c 50
14:42:51.152382 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
14:42:51.152672 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
14:42:51.152848 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [P.], seq 1660001801:1660001878, ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 77
14:42:51.280959 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.281096 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [F.], seq 1660001878, ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.281291 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779257, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
Capture on eth1
sudo tcpdump -i eth1 tcp and host 123.58.180.188 and port 80 -nnn -c 500 -S
14:42:51.152422 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
14:42:51.152690 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
14:42:51.152864 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [P.], seq 1660001801:1660001878, ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 77
14:42:51.280975 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.281112 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [F.], seq 1660001878, ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.281309 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779257, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
- realserver
Capture on bond0
sudo tcpdump -i bond0 tcp and host 123.58.180.188 -nnn -c 50
14:42:51.152419 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [S.], seq 122778619, ack 1660001801, win 2896, options [mss 1460,sackOK,TS val 2328633369 ecr 3477472664,nop,wscale 9], length 0
14:42:51.152853 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [.], ack 1660001878, win 6, options [nop,nop,TS val 2328633376 ecr 3477472664], length 0
14:42:51.280718 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [P.], seq 122778620:122779256, ack 1660001878, win 6, options [nop,nop,TS val 2328633408 ecr 3477472664], length 636
14:42:51.281079 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [F.], seq 122779256, ack 1660001879, win 6, options [nop,nop,TS val 2328633408 ecr 3477472696], length 0
Capture on bond1
sudo tcpdump -i bond1 tcp and host 123.58.180.188 and port 80 -nnn -c 500 -S
14:42:51.152356 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
14:42:51.152619 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
14:42:51.152836 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [P.], seq 1660001801:1660001878, ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 77
14:42:51.280922 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.281048 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [F.], seq 1660001878, ack 122779256, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
14:42:51.281240 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122779257, win 9, options [nop,nop,TS val 3477472696 ecr 2328633408], length 0
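7.3. Three-way handshake analysis
- Client capture analysis
The client opens a connection to lvs. The capture shows the complete three-way handshake between the client and the lvs vip, and no exchange at all between the client and the realserver.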
14:42:51.151998 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
14:42:51.152299 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [S.], seq 122778619, ack 1660001801, win 2896, options [mss 1460,sackOK,TS val 2328633369 ecr 3477472664,nop,wscale 9], length 0
14:42:51.152346 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
So from the client's point of view, lvs is the web server providing the whole service; the client cannot tell whether any realserver sits behind lvs.
-
LVS capture analysis
-
Receiving the first SYN from the client
SYN from the client, as seen on lvs:
14:42:51.152382 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
SYN as sent by the client to lvs:
14:42:51.151998 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
-
Receiving the ACK returned by the client
Final ACK of the three-way handshake, as seen on lvs:
14:42:51.152672 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
Final ACK of the three-way handshake, as sent by the client to lvs:
14:42:51.152346 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
-
-
realserver capture analysis
-
The realserver receives the client's SYN on NIC bond1
SYN from the client, as received on bond1:
14:42:51.152356 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
SYN as sent by the client to lvs:
14:42:51.151998 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, win 2920, options [mss 1460,sackOK,TS val 3477472664 ecr 0,nop,wscale 9], length 0
-
The realserver sends its own SYN to the client through NIC bond0, together with the ACK for the client's SYN
SYN + ACK as sent by the realserver to the client:
14:42:51.152419 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [S.], seq 122778619, ack 1660001801, win 2896, options [mss 1460,sackOK,TS val 2328633369 ecr 3477472664,nop,wscale 9], length 0
The same packet as seen on the client:
14:42:51.152299 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [S.], seq 122778619, ack 1660001801, win 2896, options [mss 1460,sackOK,TS val 2328633369 ecr 3477472664,nop,wscale 9], length 0
-
The realserver receives the client's ACK on NIC bond1
ACK from the client, as received on bond1:
14:42:51.152619 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
ACK as returned by the client to lvs:
14:42:51.152346 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, win 6, options [nop,nop,TS val 3477472664 ecr 2328633369], length 0
-
The realserver returns an ACK to the client; this ACK acknowledges the segment carrying the HTTP request headers that the client sent to the web server
realserver:
10:42:29.079043 IP 123.58.180.123.80 > 123.58.180.188.14514: Flags [.], ack 78, win 6, options [nop,nop,TS val 2325027858 ecr 3473867146], length 0
client:
10:42:29.079175 IP 123.58.180.123.80 > 123.58.180.188.14514: Flags [.], ack 78, win 6, options [nop,nop,TS val 2325027858 ecr 3473867146], length 0
-
7.4. Packet flow in LVS DR mode
-
Path of the SYN sent by the client
client --> lvs-vip-pub-ip-eth0 --> lvs-pri-ip-eth1 --> rs-pri-ip-bond1
-
The realserver sends its SYN and returns the ACK
rs-pub-bond0 --> client
-
The client returns the ACK
client --> lvs-vip-pub-ip-eth0 --> lvs-pri-ip-eth1 --> rs-pri-ip-bond1
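The one-way nature of each interface in DR mode can be confirmed by classifying the captured lines by direction. This is a rough sketch: `directions` is a hypothetical helper, the regular expression only understands the IPv4 `src.port > dst.port:` form that tcpdump prints here, and the sample lines are shortened copies of lines from section 7.2.

```python
import re

ADDR = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+:")


def directions(capture_lines, client_ip):
    """Return the set of directions seen in a capture, relative to the client."""
    seen = set()
    for line in capture_lines:
        match = ADDR.search(line)
        if not match:
            continue
        src, dst = match.groups()
        if src == client_ip:
            seen.add("client->vip")
        elif dst == client_ip:
            seen.add("vip->client")
    return seen


CLIENT = "123.58.180.188"

# Shortened lines from the LVS eth0 capture.
LVS_ETH0 = [
    "14:42:51.152382 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [S], seq 1660001800, length 0",
    "14:42:51.152672 IP 123.58.180.188.32935 > 123.58.180.123.80: Flags [.], ack 122778620, length 0",
]

# Shortened lines from the realserver bond0 capture.
RS_BOND0 = [
    "14:42:51.152419 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [S.], seq 122778619, ack 1660001801, length 0",
    "14:42:51.152853 IP 123.58.180.123.80 > 123.58.180.188.32935: Flags [.], ack 1660001878, length 0",
]

if __name__ == "__main__":
    print("lvs eth0:", directions(LVS_ETH0, CLIENT))
    print("rs bond0:", directions(RS_BOND0, CLIENT))
```

A capture that yields only `client->vip` on lvs and only `vip->client` on the realserver's bond0 matches the DR flow above: requests pass through lvs, replies go straight back to the client.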
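7.5. NIC traffic analysis
7.5.1. LVS node
- Public NIC traffic
The public NIC only receives; it sends nothing.
- Private NIC traffic
The private NIC only sends; it receives nothing.
eth0 traffic = eth1 traffic
In DR mode, LVS traffic consists mainly of HTTP request headers, which come in on the public NIC eth0 and are forwarded to the backend realserver through the private NIC eth1. LVS itself does not process any request, so the inbound traffic on eth0 equals the outbound traffic on eth1.
NIC selection: the public NIC of LVS receives a large number of small SYN and ACK packets. Total bandwidth is not very high, but the packet count is, so the network interrupt rate is high and the NIC must handle small packets well. A multi-queue Intel NIC is the best choice; use Broadcom NICs with caution.
7.5.2. realserver node
-
Inbound traffic on the private NIC
-
Traffic from lvs to the realserver
-
Traffic from backend nodes to the upstream server
-
-
Outbound traffic on the private NIC
- Traffic from the upstream server to the upstream node
-
Public NIC traffic
- Traffic from the realserver to nodes on the public network
Last edited by root, 2015-03-16 15:25:49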