Abbreviation of Transmission Control Protocol, and pronounced as separate letters. TCP is one of the main protocols in TCP/IP networks. Whereas the IP protocol deals only with packets, TCP enables two hosts to establish a connection and exchange streams of data. TCP guarantees delivery of data and also guarantees that packets will be delivered in the same order in which they were sent.
TCP stands for Transmission Control Protocol. It is described in STD-7/RFC-793. TCP is a connection-oriented protocol that is responsible for reliable communication between two end processes. The unit of data transferred is called a stream, which is simply a sequence of bytes.
Being connection-oriented means that before actually transmitting data, you must open the connection between the two end points. The data can be transferred in full duplex (send and receive on a single connection). When the transfer is done, you have to close the connection to free system resources. Both ends know when the session is opened (begin) and is closed (end). The data transfer cannot take place before both ends have agreed upon the connection. The connection can be closed by either side; the other is notified. Provision is made to close gracefully or just abort the connection.
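The open, exchange, close life cycle described above can be sketched with ordinary stream sockets. This is a minimal illustration in Python rather than TWSocket code; the function names `serve_one` and `round_trip`, the loopback address and the upper-casing "service" are assumptions made up for the example.

```python
import socket
import threading

def serve_one(listener):
    """Accept one connection, echo everything back in upper case, then close."""
    conn, _peer = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:              # empty read: the peer closed its sending side
                break
            conn.sendall(chunk.upper())

def round_trip(message):
    """Open a connection, send message, close gracefully, return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        listener.listen(1)
        worker = threading.Thread(target=serve_one, args=(listener,))
        worker.start()
        with socket.create_connection(listener.getsockname(), timeout=5) as client:
            client.sendall(message)            # data flows client -> server
            client.shutdown(socket.SHUT_WR)    # graceful: "no more data from me"
            reply = b""
            while True:                        # data flows server -> client
                chunk = client.recv(4096)
                if not chunk:                  # server closed: session is over
                    break
                reply += chunk
        worker.join()
    return reply

if __name__ == "__main__":
    print(round_trip(b"hello"))
```

No data moves until `accept` and `create_connection` have both completed, and each side learns that the other has finished through the empty read.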
Being stream oriented means that the data is an anonymous sequence of bytes. There is nothing to make data boundaries apparent. The receiver has no means of knowing how the data was actually transmitted. The sender can send many small data chunks and the receiver may receive only one big chunk, or the sender can send a big chunk and the receiver may receive it in a number of smaller chunks. The only thing that is guaranteed is that all data sent will be received without any error and in the correct order. Should any error occur, it will automatically be corrected (retransmitted as needed) or the error will be reported if it can't be corrected.
At the program level, the TCP stream looks like a flat file. When you write data to a flat file, and read it back later, you are absolutely unable to know if the data has been written in only one chunk or in several chunks. Unless you write something special to identify record boundaries, there is nothing you can do to learn it afterward. You can, for example, use CR or CR LF to delimit your records just like a flat text file.
At the programming level, TWSocket is fairly simple to use. To send data, you just need to call the Send method (or any variation such as SendStr) to give the data to be transmitted. TWSocket will put it in a buffer until it can be actually transmitted. Eventually the data will be sent in the background (the Send method returns immediately without waiting for the data to be transmitted) and the OnDataSent event will be generated once the buffer is emptied.
To receive data, a program must wait until it receives the OnDataAvailable event. This event is triggered each time a data packet comes from the lower level. The application must call the Receive method to actually get the data from the low-level buffers. You have to Receive all the data available, or your program will go into an endless loop, because TWSocket will trigger OnDataAvailable again if you didn't Receive all the data.
As the data is a stream of bytes, your application must be prepared to receive data as sent from the sender, fragmented in several chunks or merged in bigger chunks. For example, if the sender sent "Hello " and then "World!", it is possible to get only one OnDataAvailable event and receive "Hello World!" in one chunk, or to get two events, one for "Hello " and the other for "World!". You can even receive more smaller chunks like "Hel", "lo wo" and "rld!". What happens depends on traffic load, router algorithms, random errors and many other parameters you can't control.
On the subject of client/server applications, most applications need to know command boundaries before being able to process data. As data boundaries are not always preserved, you cannot assume your server will receive a single complete command in one OnDataAvailable event. You can receive only part of a request, or two or more requests merged in one chunk. To overcome this difficulty, you must use delimiters.
Most TCP/IP protocols, like SMTP, POP3, FTP and others, use CR/LF pair as command delimiter. Each client request is sent as is with a CR/LF pair appended. The server receives the data as it arrives, assembles it in a receive buffer, scans for CR/LF pairs to extract commands from the received stream, and removes them from the receive buffer.
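The receive-buffer technique just described can be sketched as follows. This is a minimal Python illustration, not TWSocket code; the class name `LineAssembler` and its `feed` method are hypothetical, and `feed` stands in for whatever your receive handler does with each chunk.

```python
class LineAssembler:
    """Collects stream chunks and hands back complete CR/LF-terminated commands."""

    def __init__(self):
        self._buffer = b""     # bytes received so far that do not end a command

    def feed(self, chunk):
        """Add one received chunk; return the list of commands it completed."""
        self._buffer += chunk
        # Everything before the last CR/LF is complete; the tail stays buffered.
        *commands, self._buffer = self._buffer.split(b"\r\n")
        return commands

if __name__ == "__main__":
    assembler = LineAssembler()
    for chunk in (b"HELO exa", b"mple.org\r\nNOOP\r", b"\nQUIT\r\n"):
        print(chunk, "->", assembler.feed(chunk))
```

The result is the same however the stream is fragmented, including the case where the CR and the LF arrive in different chunks.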
Short for User Datagram Protocol, a connectionless protocol that, like TCP, runs on top of IP networks. Unlike TCP/IP, UDP/IP provides very few error recovery services, offering instead a direct way to send and receive datagrams over an IP network. It's used primarily for broadcasting messages over a network.
UDP stands for User Datagram Protocol. It is described in STD-6/RFC-768 and provides a connectionless host-to-host communication path. UDP has minimal overhead: each packet on the network is composed of a small header and user data. It is called a UDP datagram.
UDP preserves datagram boundaries between the sender and the receiver. It means that the receiver socket will receive an OnDataAvailable event for each datagram sent and the Receive method will return a complete datagram for each call. If the buffer is too small, the datagram will be truncated. If the buffer is too large, only one datagram is returned and the remaining buffer space is not touched.
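Boundary preservation can be observed with two plain UDP sockets on the loopback interface. This is a hedged Python sketch, not TWSocket code; the function name `send_and_receive`, the 2048-byte buffer and the 2-second timeout are assumptions chosen for the example, and it relies on loopback delivery being dependable in practice.

```python
import socket

def send_and_receive(datagrams):
    """Send each payload as one UDP datagram and read them back one per call."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        receiver.settimeout(2.0)           # do not hang forever if one is lost
        for payload in datagrams:
            sender.sendto(payload, receiver.getsockname())
        # Each recvfrom returns exactly one datagram, never a merged pair,
        # even though the 2048-byte buffer could hold both.
        return [receiver.recvfrom(2048)[0] for _ in datagrams]

if __name__ == "__main__":
    print(send_and_receive([b"Hello ", b"World!"]))
```

Contrast this with the TCP case, where the same two sends could arrive as a single chunk.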
UDP is connectionless. It means that a datagram can be sent at any moment without prior advertising, negotiation or preparation. Just send the datagram and hope the receiver is able to handle it.
UDP is an unreliable protocol. There is absolutely no guarantee that the datagram will be delivered to the destination host. But to be honest, the failure rate is very low on the Internet and nearly null on a LAN unless the bandwidth is full.
Not only can a datagram go undelivered, it can also be delivered out of order. It means you can receive a packet before another one, even if the second was sent before the first you just received. You can also receive the same packet twice.
Your application must be prepared to handle all those situations: missing datagram, duplicate datagram or datagram in the incorrect order. You must program error detection and correction. For example, if you need to transfer some file, you'd better set up a kind of zmodem protocol.
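One common way to detect all three situations is to number the datagrams. The sketch below is a minimal Python illustration under that assumption; the function name `reassemble` and the `(sequence_number, payload)` layout are hypothetical, and a real protocol would also need acknowledgments and retransmission requests for the missing numbers.

```python
def reassemble(datagrams, total):
    """Rebuild a message from numbered datagrams in arrival order.

    datagrams: iterable of (sequence_number, payload) pairs as received.
    total:     how many datagrams the sender said it would send.
    Returns (data, missing): data is None while anything is still missing.
    """
    received = {}
    for seq, payload in datagrams:
        if 0 <= seq < total and seq not in received:   # ignore duplicates
            received[seq] = payload
    missing = [seq for seq in range(total) if seq not in received]
    if missing:
        return None, missing            # caller should ask for these again
    # Sorting by sequence number repairs out-of-order arrival.
    return b"".join(received[seq] for seq in sorted(received)), []

if __name__ == "__main__":
    arrived = [(1, b"lo "), (0, b"Hel"), (1, b"lo "), (3, b"ld!")]
    print(reassemble(arrived, 4))
    print(reassemble(arrived + [(2, b"Wor")], 4))
```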
The main advantages of UDP are that datagram boundaries are respected, you can broadcast, and it is fast.
The main disadvantage is unreliability, which makes UDP complicated to program at the application level.
TCP and UDP use the same addressing scheme: an IP address (a 32-bit number, always written as four 8-bit numbers, each expressed as an unsigned decimal number of up to three digits, separated by dots, such as 18.104.22.168) and a port number (a 16-bit number expressed as an unsigned decimal number).
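The relationship between the dotted notation and the underlying 32-bit number can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `dotted_to_int` and `is_valid_port` are hypothetical, and the standard `ipaddress` module is used purely as a cross-check.

```python
import ipaddress

def dotted_to_int(dotted):
    """Convert 'a.b.c.d' to the 32-bit number it stands for."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet    # each part fills the next 8 bits
    return value

def is_valid_port(port):
    """A port is an unsigned 16-bit number."""
    return 0 <= port <= 0xFFFF

if __name__ == "__main__":
    address = "18.104.22.168"
    print(dotted_to_int(address))
    print(dotted_to_int(address) == int(ipaddress.IPv4Address(address)))
    print(is_valid_port(80), is_valid_port(70000))
```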
The IP address is used by the low-level protocol (IP) to route the datagram to the correct host on the specified network. Then the port number is used to route the datagram to the correct host process (a program on the host).
For a given protocol (TCP or UDP), only a single host process at a time can receive data sent to a given port. Usually one port is dedicated to one process.
tcp vs. udp for file transfer
i think tcp is an overused protocol; i think udp is an underused protocol. this is an argument i've been having quite a bit with people lately, so i've decided i'll lay out my reasoning here so i don't have to type or recite it at people over and over.
since i first wrote this (jan 1998 or so), tcp extensions that fix many of my complaints (selective acknowledgment, large windows, tcp for transactions) have seen more widespread implementation. while they are a step in the right direction (well, t/tcp was a terrible security blunder...), tcp will always be a stream protocol, and thus will never be an optimal transport for some applications.
advantages of tcp
- the operating system does all the work. you just sit back and watch the show. no need to have the same bugs in your code that everyone else did on their first try; it's all been figured out for you.
- since it's in the os, handling incoming packets has fewer context switches from kernel to user space and back; all the reassembly, acking, flow control, etc is done by the kernel.
- tcp guarantees three things: that your data gets there, that it gets there in order, and that it gets there without duplication. (the truth, the whole truth, and nothing but the truth...)
- routers may notice tcp packets and treat them specially. they can buffer and retransmit them, and in limited cases preack them.
- tcp has good relative throughput on a modem or a lan.
disadvantages of tcp
- the operating system may be buggy, and you can't escape it. it may be inefficient, and you have to put up with it. it may be optimized for conditions other than the ones you are facing, and you may not be able to retune it.
- tcp makes it very difficult to try harder; you can set a few socket options, but beyond that you have to tolerate the built in flow control.
- tcp may have lots of features you don't need. it may waste bandwidth, time, or effort on ensuring things that are irrelevant to the task at hand.
- tcp has no block boundaries; you must create your own.
- routers on the internet today are out of memory. they can't pay much attention to tcp flying by, and try to help it. design assumptions of tcp break down in this environment.
- tcp has relatively poor throughput on a lossy, high bandwidth, high latency link, such as a satellite connection or an overfull t1.
- tcp cannot be used for broadcast or multicast transmission.
- tcp cannot conclude a transmission without all data in motion being explicitly acked.
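creating your own block boundaries on a tcp stream does not have to mean delimiters; a length prefix is another common choice. the python sketch below is only an illustration of that idea, and the names `frame` and `unframe` and the 4-byte big-endian header are assumptions made for the example, not part of any standard.

```python
import struct

HEADER = struct.Struct("!I")    # assumed header: 4-byte big-endian block length

def frame(block):
    """Prefix one block with its length so the receiver can find its end."""
    return HEADER.pack(len(block)) + block

def unframe(buffer):
    """Split a receive buffer into (complete_blocks, leftover_bytes)."""
    blocks = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):           # block not fully received yet
            break
        blocks.append(buffer[offset + HEADER.size:end])
        offset = end
    return blocks, buffer[offset:]

if __name__ == "__main__":
    stream = frame(b"abc") + frame(b"") + frame(b"defgh")
    print(unframe(stream))
    print(unframe(stream[:-2]))         # last block cut short by the network
```

the leftover bytes are kept and prepended to whatever arrives next.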
disadvantages of udp
- there are no guarantees with udp. a packet may not be delivered, or delivered twice, or delivered out of order; you get no indication of this unless the listening program at the other end decides to say something. tcp is really working in the same environment; you get roughly the same services from ip and udp. however, tcp makes up for it fairly well, and in a standardized manner.
- udp has no flow control. implementation is the duty of user programs.
- routers are quite careless with udp. they never retransmit it if it collides, and it seems to be the first thing dropped when a router is short on memory. udp suffers from worse packet loss than tcp.
advantages of udp
- it doesn't restrict you to a connection based communication model, so startup latency in distributed applications is much lower, as is operating system overhead.
- all flow control, acking, transaction logging, etc is up to user programs; a broken os implementation is not going to get in your way. additionally, you only need to implement and use the features you need.
- the recipient of udp packets gets them unmangled, including block boundaries.
- broadcast and multicast transmission are available with udp.
disadvantages of tcp for file transfer
- startup latency is significant. it takes at least twice rtt to start getting data back.
- tcp allows a window of at most 64k, and the acking mechanism means that packet loss is misdetected. tcp stalls easily under packet loss. tcp is more throttled by rtt than bandwidth.
- tcp transfer servers have to maintain a separate socket (and often separate thread) for each client.
- load balancing is crude and approximate. especially on local networks that allow collisions, two simultaneous tcp transfers have a tendency to fight with each other, even if the sender is the same.
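the "throttled by rtt" point is simple arithmetic: a sender that may have at most one window of unacknowledged data in flight cannot exceed window / rtt, whatever the link bandwidth. the python sketch below works that out; the function name and the three round-trip times are made-up illustrative values, and the window is the 65535-byte maximum that applies without the large-window extension.

```python
def max_tcp_throughput(window_bytes, rtt_seconds):
    """Upper bound in bytes per second: one full window per round trip."""
    return window_bytes / rtt_seconds

WINDOW = 65535    # largest window without window scaling, in bytes

# Illustrative round-trip times only, not measurements.
for label, rtt in [("lan", 0.001), ("long haul", 0.070), ("satellite", 0.550)]:
    rate = max_tcp_throughput(WINDOW, rtt)
    print(f"{label:>10}: rtt {rtt * 1000:6.1f} ms -> at most {rate * 8 / 1e6:8.2f} Mbit/s")
```

the bound falls as rtt grows, which is why a fast but distant link can sit mostly idle.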
advantages of udp for file transfer
- latency can be as low as rtt if the protocol is suitably designed.
- flow control is up to user space; windows can be infinite, artificial stalls nonexistent, latency well tolerated, and maximum speeds enforced only by real network bandwidth, yet actual speeds chosen by agreement of sender and receiver.
- receiving an image simultaneously from multiple hosts is much easier with udp, as is sending one to multiple hosts, especially if they happen to be part of the same broadcast or multicast group.
- a single sending host with multiple transfers proceeding can balance them with excellent precision.
The Internet runs on a hierarchical protocol stack. A simplified version of this is shown in figure 1. The layer common to all Internet applications is the IP (Internet Protocol) layer. This layer provides a connectionless, unreliable, packet-based delivery service. It can be described as connectionless because packets are treated independently of all others. The service is unreliable because there is no guarantee of delivery. Packets may be silently dropped, duplicated or delayed and may arrive out of order. The service is also called a best-effort service: all attempts to deliver a packet will be made, with unreliability caused only by hardware faults or exhausted resources.
As there is no sense of a connection at the IP level, there are no simple methods to provide a quality of service (QoS). QoS is a request from an application to the network to provide a guarantee on the quality of a connection. This allows an application to request a fixed amount of bandwidth from the network, and assume it will be provided, once the QoS request has been accepted. A fixed delay (i.e. no jitter) and in-order delivery can also be assumed. A network that supports QoS will be protected from congestion problems, as the network will refuse connections that request larger resources than can be supplied. An example of a network that supports QoS is the current telephone network, where every call is guaranteed the bandwidth for the call. Most users at some point have heard the overloaded signal where the network cannot provide the resource required to make a call.
The application decides which transport protocol is used. The two protocols shown here, TCP and UDP are the most commonly used ones. TCP provides a reliable connection and is used by the majority of current Internet applications. TCP, besides being responsible for error checking and correcting, is also responsible for controlling the speed at which this data is sent. TCP is capable of detecting congestion in the network and will back off transmission speed when congestion occurs. These features protect the network from congestion collapse.
As discussed in the introduction, VoIP is a real-time service. For real-time properties to be guaranteed to be met, a network with QoS must be used to provide fixed delay and bandwidth. It has already been said that IP cannot provide this. This then presents a choice. If IP is a requirement, which transport protocol should be used to provide a system that is most likely to meet real-time constraints?
As TCP provides features such as congestion control, it would be the preferred protocol to use. Unfortunately, because TCP is a reliable service, delays will be introduced whenever a bit error or packet loss occurs. This delay is caused by retransmission of the broken packet, along with any successive packets that may have already been sent. This can be a large source of jitter.
TCP uses a combination of four algorithms to provide congestion control: slow start, congestion avoidance, fast retransmit and fast recovery. These algorithms all use packet loss as an indication of congestion, and all alter the number of packets TCP will send before waiting for acknowledgments of those packets. These alterations affect the bandwidth available and also change delays seen on a link, providing another source of jitter.
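The effect of these algorithms on the sending window can be illustrated with a deliberately simplified model. The Python sketch below is a toy, not a TCP implementation: the function name `simulate_cwnd`, the per-round-trip granularity, the starting threshold and the choice to resume at half the window after a loss (a rough stand-in for fast retransmit and fast recovery) are all assumptions made for the example.

```python
def simulate_cwnd(rounds, loss_rounds, ssthresh=8):
    """Return the congestion window, in segments, at the start of each round trip.

    rounds:      number of round trips to simulate.
    loss_rounds: set of round indices in which a packet loss is detected.
    ssthresh:    initial slow-start threshold (assumed value).
    """
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            ssthresh = max(cwnd // 2, 2)     # loss is read as congestion
            cwnd = ssthresh                  # simplified fast recovery: halve
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every round trip
        else:
            cwnd += 1                        # congestion avoidance: linear growth
    return history

if __name__ == "__main__":
    print(simulate_cwnd(10, {6}))
```

The sawtooth in the returned window sizes is the varying send rate that a real-time stream would experience as jitter.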