TCP Protocol: Flow Control
In the last two posts here and here, we discussed how the TCP stack initiates the 3-way handshake and creates the appropriate Transmission Control Block so that data can flow reliably. We saw that the main function of the 3-way handshake is to exchange sequence numbers, MSS, receive window and other parameters between the two endpoints.
In this post, we will discuss flow control and how TCP uses sequence numbers, acknowledgement numbers and sliding windows to make sure data segments reach the other end and can be reassembled in the correct order.
To better demonstrate what happens, I used an FTP client on my laptop to download a 10MB file from a test FTP server running a Debian distribution of Linux, while capturing the traffic between us. The figure below is taken from my Wireshark. I have added some columns that make it easier to read what’s happening: TCP Segment Length, which shows how much payload is in the TCP segment without the header, plus the Sequence number, ACK number and Delta columns.
In the figure, the first packet, No.26, is the SYN packet sent from my client to the server, advertising an MSS of 1260, a window of 65535 and some other parameters which will be explained later.
In normal TCP operation, the ACK number sent in response to segments that carry a data payload represents the number of bytes the receiver has received.
For example, assume a sender puts 100 bytes of data into a TCP segment and starts its SEQ number at 0. It will send a TCP packet with SEQ=0 and TCP Segment Len=100. The receiver will receive the 100 bytes (numbered 0 through 99) and confirm that it has them all by responding with ACK=100. The sender then knows that the next byte to send is the 101st byte, which carries sequence number 100.
However, this method of "counting bytes" is not the whole story because of something called the “Phantom Byte”. In the TCP 3-way handshake shown in the figure above, the first SYN packet is sent with an initial sequence number of 0 and has no data payload, so the Segment Length is 0. Yet in response to the SYN, the server replies with a SYN/ACK, packet No.27, which has a sequence number of 0 and an ACK of 1.
Although the first TCP packet in the stream has no data, the ACK sent in response to it confirms that 1 byte has been received. This is because the SYN flag itself consumes one sequence number. The phantom byte is distracting and confuses many people who do the math between the SEQ and ACK numbers in TCP.
So, in reality, once a handshake has taken place, the receiver in the 100-byte example above will respond with ACK=101, because the count of data bytes starts at 1, not 0. Some people put it more simply and say that the ACK number is the sequence number of the next byte the receiver expects.
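To make the arithmetic concrete, here is a minimal Python sketch of how the expected ACK number follows from the SEQ number, the payload length and the SYN flag. The function name next_ack is my own, purely for illustration, and the numbers are the relative sequence numbers Wireshark displays.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def next_ack(seq: int, payload_len: int, syn: bool = False, fin: bool = False) -> int:
    """Return the ACK number a receiver sends back for a single segment.

    The SYN and FIN flags each consume one sequence number even though
    they carry no payload. That is the "phantom byte".
    """
    return (seq + payload_len + int(syn) + int(fin)) % SEQ_SPACE


print(next_ack(0, 0, syn=True))  # SYN with no data -> 1
print(next_ack(1, 100))          # 100 data bytes starting at SEQ=1 -> 101
```

The first call reproduces the ACK of 1 seen in packet No.27, and the second reproduces the ACK=101 from the 100-byte example.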
Getting back to the figure, we can see that the client window size is 65K while the server's is only 14K. The maximum segment size the client can receive is 1260, while the server can receive 1460. In this case the lower MSS value, 1260, will be used.
Let’s do the math. My client told the server that it can receive 65K bytes, divided into segments of 1260 bytes each, before the server needs an ACK from me (this is the definition of the window size, which I hope you remember from the previous posts). In other words, the server is free to send me 65535/1260 = 52 segments before getting an ACK from me.
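The same division as a short Python sketch. The helper name segments_per_window is hypothetical; the values are the ones from the capture.

```python
def segments_per_window(window_bytes: int, mss: int) -> int:
    """Number of full-sized segments that fit into the advertised window."""
    return window_bytes // mss


client_window = 65535  # receive window advertised by my client
mss = 1260             # the lower of the two MSS values

print(segments_per_window(client_window, mss))  # 52
```

Integer division is used because only whole MSS-sized segments are counted; the 15 bytes left over do not make up another full segment.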
That being said, the server will not fire all 52 segments at me right away, because this might congest the path between the server and my client. This is the beauty of TCP: it always presumes that other TCP sessions are already carrying traffic somewhere over the same path, and that I might collide with them.
So, instead, TCP starts slowly and sends only a few segments rather than all 52. This is called TCP slow start, and the number of segments that the server can send me is determined by another window called the “Congestion Window”.
The Congestion Window works in the opposite direction to the "Receive Window": it is maintained by the sender, and it is the number of bytes the sender allows itself to send without congesting the link, even when the receiver's "Receive Window" is larger. The sender is bound by whichever of the two is smaller. So, in my case, my client's Receive Window is 65K and the server's Congestion Window is about 12K (1260*10 = 12600 bytes).
How did I calculate the "Congestion Window" of the server from the TCP capture? I didn’t. The TCP Congestion Window is initially set by the operating system, and is then adapted based on throughput and network conditions. As I said before, the server is Debian Linux, and after some searching, I found that its Initial Congestion Window is 10 times the MSS.
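Here is a small Python sketch of how the two windows combine. It relies on the standard rule that a sender may have at most min(congestion window, receive window) bytes outstanding; the function names are my own, and the factor of 10 is the Linux default mentioned above.

```python
def initial_cwnd(mss: int, segments: int = 10) -> int:
    """Initial congestion window in bytes (10 segments, as on this Debian server)."""
    return mss * segments


def send_limit(cwnd: int, rwnd: int) -> int:
    """Bytes the sender may have unacknowledged: the smaller of the two windows."""
    return min(cwnd, rwnd)


mss = 1260
cwnd = initial_cwnd(mss)  # server side
rwnd = 65535              # advertised by my client

print(cwnd)                           # 12600
print(send_limit(cwnd, rwnd))         # 12600
print(send_limit(cwnd, rwnd) // mss)  # 10 segments
```

At the start of the transfer the congestion window is the tighter limit, which is why only 10 segments arrive in the first burst instead of 52.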
So, continuing our explanation, the server will send only 10 segments at a time before its Congestion Window is exhausted. The server starts sending, and we can see that packets No.31 through No.44 arrived at nearly the same time (15:40:31.862 ~ 864). Keep in mind that I'm capturing these packets on the client. If we look at the RTT between the first SYN and ACK packets, from which TCP derives its initial RTO, we find it is around 60ms. So, assuming the delay is the same in both directions (about 30ms each way), the server actually sent those packets at around 15:40:31.832 ~ 834.
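The timestamp estimate can be written as a few lines of Python. This is only a sketch: it assumes the path delay is symmetric, so the one-way delay is half the RTT, which is not guaranteed on a real network. The function name estimate_send_time is made up for this example.

```python
from datetime import datetime, timedelta


def estimate_send_time(capture_time: str, rtt_ms: float) -> str:
    """Estimate when the peer sent a packet that we captured at capture_time.

    Assumes a symmetric path, so the one-way delay is rtt_ms / 2.
    """
    captured = datetime.strptime(capture_time, "%H:%M:%S.%f")
    sent = captured - timedelta(milliseconds=rtt_ms / 2)
    return sent.strftime("%H:%M:%S.%f")[:-3]  # trim microseconds to milliseconds


print(estimate_send_time("15:40:31.862", 60))  # 15:40:31.832
```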
In response to those packets, 5 ACK packets were sent back. The first of them is packet No.33. Again, doing the math: 1260 bytes in the first segment + 1260 bytes in the second gives us 2520, so we expect an ACK number of 2521 because of the extra phantom byte.
Notice that packet No.33 from the client to the server was sent with a SEQ number of 1, and every later ACK packet in this direction also has a SEQ number of 1. This is because no actual data is sent from the client to the server. The client only ACKs the data sent by the server, so its SEQ stays at 1 and does not increment.
Something else deserves notice here: every ACK packet acknowledges two TCP segments. In early drafts of TCP operation, every TCP segment sent had to be acknowledged with its own ACK; however, this would be a waste of bandwidth. So the concept of delayed ACK came into effect. A delayed ACK is an ACK packet that can acknowledge more than one TCP segment. Most operating systems enable delayed ACK by default and send one ACK packet for every two segments received.
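The ACK numbers for the first burst can be worked out with a short Python sketch. It assumes all 10 segments carry a full 1260 bytes and that the receiver ACKs exactly every second segment; real stacks also use a delayed-ACK timer, which is left out here. The function name delayed_acks is hypothetical.

```python
def delayed_acks(first_seq: int, segment_len: int, segment_count: int,
                 ack_every: int = 2) -> list[int]:
    """ACK numbers a receiver sends when it ACKs every `ack_every` segments."""
    acks = []
    next_expected = first_seq
    for i in range(1, segment_count + 1):
        next_expected += segment_len
        # ACK on every second segment, and always on the last one
        if i % ack_every == 0 or i == segment_count:
            acks.append(next_expected)
    return acks


print(delayed_acks(first_seq=1, segment_len=1260, segment_count=10))
# [2521, 5041, 7561, 10081, 12601]
```

The first value matches the ACK of 2521 in packet No.33, and there are 5 ACKs in total for the 10 segments.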
Considering the RTT, the server will receive those 5 consecutive ACK packets at around 15:40:31.892 ~ 894. Once the server receives those ACKs, it will do two things:
1) It will move the 10 ACKed segments out of its Congestion Window, making room for new segments. This is what is called the sliding window. The window size is constant at the moment (it holds only 10 segments), but it can drop old ACKed segments in favor of new ones.
2) TCP always keeps a copy of every byte sent until it gets an ACK in return, because the bytes might get lost in transit and have to be sent again. Once the window has slid past them, the system throws these copies away.
As shown in the figure above, the window has slid to the right. The 10 segments to the left are bytes that have been acknowledged. Segments 11 to 20 are called bytes in flight, because they have been sent but haven't been acknowledged yet. The server will send those at 15:40:31.892 and they will be captured at our client at 15:40:31.922, again because of the 30ms one-way trip time.
I will end this long post here, and will continue to use this capture file in future posts. In the coming posts, I will show you how TCP slow start worked in this example, how TCP reacts to packet loss, and so on.
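The sliding behaviour described above can be sketched as a toy Python sender. It keeps the window fixed at 10 segments, as in this post; real slow start grows the window as ACKs arrive, which is ignored here. The class and method names are my own and do not come from any TCP implementation.

```python
class SendWindow:
    """Toy sender with a fixed window counted in segments and cumulative ACKs."""

    def __init__(self, window_segments: int, mss: int, first_seq: int = 1):
        self.window_segments = window_segments
        self.mss = mss
        self.next_seq = first_seq
        self.unacked = {}  # SEQ -> payload length; copies kept for retransmission

    def send_allowed(self) -> list[int]:
        """Send new segments until the window is full; return their SEQ numbers."""
        sent = []
        while len(self.unacked) < self.window_segments:
            self.unacked[self.next_seq] = self.mss
            sent.append(self.next_seq)
            self.next_seq += self.mss
        return sent

    def on_ack(self, ack: int) -> None:
        """Slide the window: drop the copy of every segment the ACK fully covers."""
        for seq in [s for s, n in self.unacked.items() if s + n <= ack]:
            del self.unacked[seq]

    def bytes_in_flight(self) -> int:
        return sum(self.unacked.values())


window = SendWindow(window_segments=10, mss=1260)
first_burst = window.send_allowed()           # segments 1 to 10
for ack in (2521, 5041, 7561, 10081, 12601):  # the 5 delayed ACKs
    window.on_ack(ack)
second_burst = window.send_allowed()          # segments 11 to 20

print(first_burst[0], second_burst[0])  # 1 12601
print(window.bytes_in_flight())         # 12600
```

After the 5 ACKs, the copies of the first 10 segments are gone and segments 11 to 20 become the bytes in flight.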