Comparative analysis - TCP - UDP
TCP
Abbreviation of Transmission
Control Protocol, and pronounced
as separate letters. TCP is
one of the main protocols
in TCP/IP networks.
Whereas the IP
protocol deals only with packets,
TCP enables two hosts
to establish a connection and
exchange streams of data. TCP
guarantees delivery of data
and also guarantees that packets
will be delivered in the same
order in which they were sent.
TCP stands for Transmission
Control Protocol. It is described
in STD-7/RFC-793. TCP is a connection-oriented
protocol that is responsible
for reliable communication between
two end processes. The unit
of data transferred is called
a stream, which is simply a
sequence of bytes.
Being connection-oriented means
that before actually transmitting
data, you must open the connection
between the two end points.
The data can be transferred
in full duplex (send and receive
on a single connection). When
the transfer is done, you have
to close the connection to free
system resources. Both ends
know when the session is opened
(begin) and is closed (end).
The data transfer cannot take
place before both ends have
agreed upon the connection.
The connection can be closed
by either side; the other is
notified. Provision is made
to close gracefully or just
abort the connection.
Being stream oriented means
that the data is an anonymous
sequence of bytes. There is
nothing to make data boundaries
apparent. The receiver has no
means of knowing how the data
was actually transmitted. The
sender can send many small data
chunks and the receiver may receive
only one big chunk, or the sender
can send one big chunk and the
receiver may receive it as a number of
smaller chunks. The only thing
that is guaranteed is that all
data sent will be received without
any error and in the correct
order. Should any error occur,
it will automatically be corrected
(retransmitted as needed), or
the error will be reported to the
application if it can't be corrected.
At the program level, the TCP
stream looks like a flat file.
When you write data to a flat
file, and read it back later,
you are absolutely unable to
know if the data has been written
in only one chunk or in several
chunks. Unless you write something
special to identify record boundaries,
there is nothing you can do
to learn it afterward. You can,
for example, use CR or CR LF
to delimit your records just
like a flat text file.
At the programming level, TWSocket
is fairly simple to use. To
send data, you just need to
call the Send method (or any
variation such as SendStr) to
give the data to be transmitted.
TWSocket will put it in a buffer
until it can be actually transmitted.
Eventually the data will be
sent in the background (the
Send method returns immediately
without waiting for the data
to be transmitted) and the OnDataSent
event will be generated once
the buffer is emptied.
To receive data, a program
must wait until it receives
the OnDataAvailable event. This
event is triggered each time
a data packet comes from the
lower level. The application
must call the Receive method
to actually get the data from
the low-level buffers. You have
to Receive all the data available
or your program will go into an
endless loop, because TWSocket
will trigger the OnDataAvailable
event again if you didn't Receive
all the data.
As the data is a stream of
bytes, your application must
be prepared to receive data
as sent from the sender, fragmented
in several chunks or merged
in bigger chunks. For example,
if the sender sent "Hello "
and then "World!",
it is possible to get only one
OnDataAvailable event and receive
"Hello World!" in
one chunk, or to get two events,
one for "Hello " and
the other for "World!".
You can even receive several smaller
chunks like "Hel",
"lo Wo" and "rld!".
What happens depends on traffic
load, router algorithms, random
errors and many other parameters
you can't control.
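The chunking behaviour described above can be sketched in Python rather than with TWSocket. This is a minimal illustration, not the TWSocket API: `recv_exact` is a hypothetical helper, and `socket.socketpair()` stands in for a real TCP connection so the example runs in a single process. The receiver keeps reading until it has the number of bytes it expects, however the stream happens to be split.

```python
import socket


def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed before sending everything
            raise ConnectionError("connection closed early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Two connected stream sockets in one process (assumption: this
# stands in for a TCP connection between two hosts).
sender, receiver = socket.socketpair()
sender.sendall(b"Hello ")
sender.sendall(b"World!")

# The two sends may arrive as one chunk, two chunks, or more;
# the loop in recv_exact hides that from the caller.
message = recv_exact(receiver, 12)
print(message)

sender.close()
receiver.close()
```

The caller has to know the expected length in advance here; when it does not, a delimiter is needed.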
On the subject of client/server
applications, most applications
need to know command boundaries
before being able to process
data. As data boundaries are
not always preserved, you cannot
suppose your server will receive
a single complete command in
one OnDataAvailable event. You
may receive only part of a request,
or two or more requests
merged in one chunk. To overcome
this difficulty, you must use
delimiters.
Most TCP/IP protocols, like
SMTP, POP3, FTP and others,
use a CR/LF pair as the command delimiter.
Each client request is sent
as is with a CR/LF pair appended.
The server receives the data
as it arrives, assembles it
in a receive buffer, scans for
CR/LF pairs to extract commands
from the received stream, and
removes them from the receive
buffer.
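The receive-buffer technique just described can be sketched as follows. This is an illustration only: the class name `LineBuffer` and the sample commands are made up for the example, and a real server would feed the buffer from its receive handler.

```python
class LineBuffer:
    """Accumulates received bytes and extracts CR/LF-terminated commands."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add a received chunk; return the complete commands found so far."""
        self._buffer += data
        commands = []
        while True:
            line, sep, rest = self._buffer.partition(b"\r\n")
            if not sep:  # no complete command in the buffer yet
                break
            commands.append(line)
            self._buffer = rest  # remove the command from the buffer
        return commands


buf = LineBuffer()
# Chunks arrive split at arbitrary points, as TCP allows.
first = buf.feed(b"HELO exa")
second = buf.feed(b"mple.org\r\nMAIL FR")
third = buf.feed(b"OM:<a@example.org>\r\nQUIT\r\n")
print(first, second, third)
```

Each call returns only the commands that are complete; a partial command stays in the buffer until the rest of it arrives.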
UDP
Short for User Datagram Protocol,
a connectionless protocol
that, like TCP, runs on top
of IP networks. Unlike TCP/IP,
UDP/IP provides very few error
recovery services, offering
instead a direct way to send
and receive datagrams over an
IP network. It's used primarily
for broadcasting
messages over a network.
UDP stands for User Datagram
Protocol. It is described in
STD-6/RFC-768 and provides a
connectionless host-to-host
communication path. UDP has
minimal overhead: each packet
on the network is composed of
a small header and user data.
It is called a UDP datagram.
UDP preserves datagram boundaries
between the sender and the receiver.
It means that the receiver socket
will receive an OnDataAvailable
event for each datagram sent
and the Receive method will
return a complete datagram for
each call. If the buffer is
too small, the datagram will
be truncated. If the buffer
is too large, only one datagram
is returned and the remaining buffer
space is not touched.
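A minimal Python sketch of this boundary-preserving behaviour is shown below. It runs over the loopback interface, where loss and reordering are not expected, and uses the standard `socket` module rather than TWSocket. The truncation case is left out because what happens to an oversized datagram differs between operating systems.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)         # do not wait forever if a datagram is lost
address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"Hello ", address)
sender.sendto(b"World!", address)

# Each recvfrom() call returns exactly one datagram, even though the
# 1024-byte buffer could hold both.
first, _ = receiver.recvfrom(1024)
second, _ = receiver.recvfrom(1024)
print(first, second)

sender.close()
receiver.close()
```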
UDP is connectionless. It means
that a datagram can be sent
at any moment without prior
advertising, negotiation or
preparation. Just send the datagram
and hope the receiver is able
to handle it.
UDP is an unreliable protocol.
There is absolutely no guarantee
that the datagram will be delivered
to the destination host. But
to be honest, the failure rate
is very low on the Internet
and nearly null on a LAN unless
the bandwidth is full.
Not only can a datagram go
undelivered, it can also be delivered
out of order. It means
you can receive a packet before
another one, even if the second
was sent before the first
you just received. You can also
receive the same packet twice.
Your application must be prepared
to handle all those situations:
missing datagram, duplicate
datagram or datagram in the
incorrect order. You must program
error detection and correction.
For example, if you need to
transfer some file, you'd better
set up a kind of zmodem protocol.
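One common way to handle these three situations is to prefix each datagram with a sequence number. The sketch below assumes a made-up 4-byte header format and a hypothetical `Reassembler` class. It shows only the receiver-side bookkeeping, not a complete transfer protocol: there are no acknowledgments, timers, or retransmissions.

```python
import struct

# Hypothetical header: a 4-byte big-endian sequence number.
HEADER = struct.Struct("!I")


def pack_datagram(seq, payload):
    return HEADER.pack(seq) + payload


class Reassembler:
    """Delivers payloads in sequence order and drops duplicates."""

    def __init__(self):
        self.next_seq = 0   # next sequence number to deliver
        self.pending = {}   # datagrams that arrived ahead of a gap

    def receive(self, datagram):
        (seq,) = HEADER.unpack_from(datagram)
        payload = datagram[HEADER.size:]
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate datagram: ignore it
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self):
        """Sequence numbers the sender would have to retransmit."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]


r = Reassembler()
out = []
out += r.receive(pack_datagram(0, b"a"))
out += r.receive(pack_datagram(2, b"c"))  # arrives before datagram 1
gap = r.missing()
out += r.receive(pack_datagram(2, b"c"))  # duplicate
out += r.receive(pack_datagram(1, b"b"))  # fills the gap
print(out, gap)
```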
The main advantages of UDP
are that datagram boundaries
are respected, you can broadcast,
and it is fast.
The main disadvantage is unreliability,
which makes UDP complicated to
program at the application level.
ADDRESSING
TCP and UDP use the same addressing
scheme: an IP address (a 32-bit
number, always written as four
8-bit numbers expressed as unsigned
decimal numbers of up to three digits,
separated by dots, such as 193.174.25.26)
and a port number (a 16-bit
number expressed as an unsigned
decimal number).
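The relationship between the dotted notation and the underlying 32-bit number can be checked with Python's standard `ipaddress` and `struct` modules. The port value 80 below is an arbitrary example.

```python
import ipaddress
import struct

address = ipaddress.IPv4Address("193.174.25.26")

# The four dotted fields are the four bytes of one 32-bit number.
octets = list(address.packed)
as_int = int(address)
rebuilt = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

# A port is a 16-bit unsigned number: two bytes in network byte order.
port_bytes = struct.pack("!H", 80)

print(octets, as_int, rebuilt == as_int, len(port_bytes))
```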
The IP address is used by the
low-level protocol (IP) to route
the datagram to the correct
host on the specified network.
Then the port number is used
to route the datagram to the
correct host process (a program
on the host).
For a given protocol (TCP or
UDP), only a single host process
can exist at a time to receive
data sent to a given port.
Usually one port is dedicated
to one process.
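A small sketch of this rule, assuming default socket options: a second socket that tries to bind to an address and port already in use is refused by the operating system. Options such as SO_REUSEADDR can relax this on some systems, which is one reason for the "usually" above.

```python
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    # Same protocol, same address, same port: expected to fail.
    second.bind(("127.0.0.1", port))
    second_bound = True
except OSError:
    second_bound = False
finally:
    second.close()

first.close()
print(second_bound)
```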
tcp vs. udp for file
transfer
i think tcp is an overused protocol;
i think udp is an underused
protocol. this is an argument
i've been having quite a bit
with people lately, so i've
decided i'll lay out my reasoning
here so i don't have to type
or recite it at people over
and over.
since i first wrote this (jan
1998 or so), tcp extensions
that fix many of my complaints
(selective acknowledgment, large
windows, tcp for transactions)
have seen more widespread implementation.
while they are a step in the
right direction (well, t/tcp
was a terrible security blunder...),
tcp will always be a stream
protocol, and thus will never
be an optimal transport for
some applications.
advantages of tcp
· the operating system
does all the work. you just
sit back and watch the show.
no need to have the same bugs
in your code that everyone else
did on their first try; it's
all been figured out for you.
· since it's in the
os, handling incoming packets
has fewer context switches from
kernel to user space and back;
all the reassembly, acking,
flow control, etc is done by
the kernel.
· tcp guarantees three
things: that your data gets
there, that it gets there in
order, and that it gets there
without duplication. (the truth,
the whole truth, and nothing
but the truth...)
· routers may notice
tcp packets and treat them specially.
they can buffer and retransmit
them, and in limited cases preack
them.
· tcp has good relative
throughput on a modem or a lan.
disadvantages of tcp
· the operating system
may be buggy, and you can't
escape it. it may be inefficient,
and you have to put up with
it. it may be optimized for
conditions other than the ones
you are facing, and you may
not be able to retune it.
· tcp makes it very
difficult to try harder; you
can set a few socket options,
but beyond that you have to
tolerate the built in flow control.
· tcp may have lots
of features you don't need.
it may waste bandwidth, time,
or effort on ensuring things
that are irrelevant to the task
at hand.
· tcp has no block boundaries;
you must create your own.
· routers on the internet
today are out of memory. they
can't pay much attention to
tcp flying by, or try to help
it. design assumptions of tcp
break down in this environment.
· tcp has relatively
poor throughput on a lossy,
high bandwidth, high latency
link, such as a satellite connection
or an overfull t1.
· tcp cannot be used
for broadcast or multicast transmission.
· tcp cannot conclude
a transmission without all data
in motion being explicitly acked.
disadvantages of udp
· there are no guarantees
with udp. a packet may not be
delivered, or delivered twice,
or delivered out of order; you
get no indication of this unless
the listening program at the
other end decides to say something.
tcp is really working in the
same environment; you get roughly
the same services from ip and
udp. however, tcp makes up for
it fairly well, and in a standardized
manner.
· udp has no flow control.
implementation is the duty of
user programs.
· routers are quite
careless with udp. they never
retransmit it if it collides,
and it seems to be the first
thing dropped when a router
is short on memory. udp suffers
from worse packet loss than
tcp.
advantages of udp
· it doesn't restrict
you to a connection based communication
model, so startup latency in
distributed applications is
much lower, as is operating
system overhead.
· all flow control,
acking, transaction logging,
etc is up to user programs;
a broken os implementation is
not going to get in your way.
additionally, you only need
to implement and use the features
you need.
· the recipient of udp
packets gets them unmangled,
including block boundaries.
· broadcast and multicast
transmission are available with
udp.
disadvantages of tcp
for file transfer
· startup latency is
significant. it takes at least
twice rtt to start getting data
back.
· tcp allows a window
of at most 64k (unless the large
windows extension is in use), and the acking
mechanism means that packet
loss is misdetected. tcp stalls
easily under packet loss. tcp
is more throttled by rtt than
bandwidth.
· tcp transfer servers
have to maintain a separate
socket (and often separate thread)
for each client.
· load balancing is
crude and approximate. especially
on local networks that allow
collisions, two simultaneous
tcp transfers have a tendency
to fight with each other, even
if the sender is the same.
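The remark above that tcp is throttled more by rtt than by bandwidth follows from simple arithmetic: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput cannot exceed window / rtt. The sketch below works this out for a 65535-byte window. The rtt values are arbitrary examples, and the function name is made up for the illustration.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


WINDOW = 65535  # largest window without the large windows extension

for rtt_ms in (1, 50, 500):
    bound = max_throughput_bps(WINDOW, rtt_ms / 1000)
    print(f"rtt {rtt_ms:3d} ms -> at most {bound / 1e6:.2f} Mbit/s")
```

With a 500 ms round trip, typical of a satellite link, the bound is roughly 1 Mbit/s no matter how fast the link itself is.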
advantages of udp for
file transfer
· latency can be as low
as rtt if the protocol is suitably
designed.
· flow control is up
to user space; windows can be
infinite, artificial stalls
nonexistent, latency well tolerated,
and maximum speeds enforced
only by real network bandwidth,
yet actual speeds chosen by
agreement of sender and receiver.
· receiving an image
simultaneously from multiple
hosts is much easier with udp,
as is sending one to multiple
hosts, especially if they happen
to be part of the same broadcast
or multicast group.
· a single sending host with
multiple transfers proceeding
can balance them with excellent
precision.
--------------------------------------------------------------------------------
The Internet runs on a hierarchical
protocol stack. A simplified
version of this is shown in
figure 1. The layer common
to all Internet applications
is the IP (Internet Protocol)
layer. This layer provides a
connectionless, unreliable packet
based delivery service. It can
be described as connectionless
because packets are treated
independently of all others.
The service is unreliable because
there is no guarantee of delivery.
Packets may be silently dropped,
duplicated or delayed and may
arrive out of order. The service
is also called a best-effort
service: all attempts to deliver
a packet will be made, with
unreliability caused only by
hardware faults or exhausted
resources.
As there is no sense of a connection
at the IP level there are no
simple methods to provide a
quality of service (QoS). QoS
is a request from an application
to the network to provide a
guarantee on the quality of
a connection. This allows an
application to request a fixed
amount of bandwidth from the
network, and assume it will
be provided, once the QoS request
has been accepted. A fixed
delay (i.e. no jitter) and in-order
delivery can also be assumed.
A network that supports QoS
will be protected from congestion
problems, as the network will
refuse connections that request
larger resources than can be
supplied. An example of a network
that supports QoS is the current
telephone network, where every
call is guaranteed the bandwidth
for the call. Most users have at
some point heard the overloaded
signal, given when the network cannot
provide the resources
required to make a call.
The application decides which
transport protocol is used.
The two protocols shown here,
TCP and UDP, are the most commonly
used ones. TCP provides a reliable
connection and is used by the
majority of current Internet
applications. TCP, besides being
responsible for error checking
and correcting, is also responsible
for controlling the speed at
which this data is sent. TCP
is capable of detecting congestion
in the network and will back
off transmission speed when
congestion occurs. These features
protect the network from congestion
collapse.
As discussed in the introduction,
VoIP is a real-time service.
For real-time properties to
be guaranteed to be met, a network
with QoS must be used to provide
fixed delay and bandwidth. It
has already been said that IP
cannot provide this. This then
presents a choice: if IP is
a requirement, which transport
layer should be used to provide
a system that is most likely
to meet real-time constraints?
As TCP provides features such
as congestion control, it would
be the preferred protocol to
use. Unfortunately, because
TCP is a reliable
service, delays are introduced
whenever a bit error or packet
loss occurs. This delay is caused
by retransmission of the damaged
packet, along with any successive
packets that may have already
been sent. This can be a large
source of jitter.
TCP uses a combination of four
algorithms to provide congestion
control: slow start, congestion
avoidance, fast retransmit and
fast recovery. These algorithms
all use packet loss as an indication
of congestion, and all alter
the number of packets TCP will
send before waiting for acknowledgments
of those packets. These alterations
affect the bandwidth available
and also change delays seen
on a link, providing another
source of jitter.
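How these algorithms change the amount of data in flight can be illustrated with a heavily simplified simulation. This is a sketch under stated assumptions, not a faithful TCP implementation: the window is counted in whole segments, it is updated once per round trip, every loss is assumed to be detected by fast retransmit (no timeouts), and the function name and parameters are invented for the example.

```python
def simulate_cwnd(rounds, loss_rounds, ssthresh=16):
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Loss seen in this round: fast retransmit and fast recovery.
            # Halve the window instead of restarting from one segment.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one extra segment per round trip.
            cwnd += 1
    return history


history = simulate_cwnd(rounds=10, loss_rounds={6}, ssthresh=8)
print(history)
```

The sudden drop after the loss, followed by a slow climb, is the kind of variation in sending rate that shows up as jitter for a real-time application.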

Figure 1: Simplified IP protocol
stack
Combined, these effects raise jitter
to an unacceptable level, rendering
TCP unusable for real-time services.
Voice communication has the
advantage of not requiring a
completely reliable transport
level. The loss of a packet
or bit error will often only
introduce a click or a minor
break into the output.
For these reasons most VoIP
applications use UDP for the
voice data transmission. UDP
is a thin layer on top of IP
that provides a way to distinguish
among multiple programs running
on a single machine. UDP also
inherits all of the properties
of IP that TCP attempts to hide.
UDP is therefore also a packet
based, connectionless, best-effort
service. It is up to the application
to split data into packets,
and provide any necessary error
checking that is required.
Because of this, UDP allows
the fastest and simplest
way of transmitting data to
the receiver. There is no interference
in the stream of data that could
possibly be avoided. This provides
the way for an application to
get as close to meeting real-time
constraints as possible.
UDP, however, provides no congestion
control. A congested
link that is only running TCP
will be approximately fair to
all users. When UDP data is
introduced into this link, there
is no requirement for the UDP
data rates to back off, forcing
the remaining TCP connections
to back off even further. This
can be thought of as UDP data
not being a "good citizen".
The aim of this project is to
characterise the quantity of
this drop off in TCP performance.
TCP vs. UDP
| TCP                                    | UDP                                                         |
|----------------------------------------|-------------------------------------------------------------|
| · Connection-oriented                  | · Connectionless                                            |
| · Reliability in delivery of messages  | · No attempt to fragment messages                           |
| · Splitting messages into datagrams    | · No reassembly and synchronization                         |
| · Keep track of order (or sequence)    | · In case of error, the application retransmits the message |
| · Use checksums for detecting errors   | · No acknowledgment                                         |
|----------------------------------------|-------------------------------------------------------------|
| o Remote procedures are not idempotent | o Remote procedures are idempotent                          |
| o Reliability is a must                | o Server and client messages fit completely within a packet |
| o Messages exceed UDP packet size      | o The server handles multiple clients (UDP is stateless)    |
TCP

Server Process                     Client Process

socket()
   |
bind()
   |
listen()
   |
accept()                           socket()
   |                                  |
Get a blocked client    <-1->      connect()
   |                                  |
read()                  <-2--      write()
   |                                  |
process request                       |
   |                                  |
write()                 --3->      read()

UDP

Server Process                     Client Process

socket()
   |
bind()                             socket()
   |                                  |
recvfrom()                         bind()
   |                                  |
Get a blocked client    <----      sendto()
   |                                  |
process request                       |
   |                                  |
sendto()                ---->      recvfrom()
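
The two call sequences listed above can be sketched in Python, whose `socket` module exposes the same calls under the same names (`recv`/`sendall` play the role of `read`/`write`). This is a minimal single-process illustration over the loopback interface. The "process request" step, upper-casing the request, is made up for the example, and the TCP side uses a CR/LF delimiter so that the server knows when the request is complete.

```python
import socket
import threading


def tcp_server(listener):
    conn, _ = listener.accept()                # accept(): get a blocked client
    with conn, conn.makefile("rb") as reader:
        request = reader.readline()            # read() up to the CR/LF delimiter
        conn.sendall(request.upper())          # process request, then write()


def udp_server(sock):
    request, client = sock.recvfrom(1024)      # recvfrom(): get a blocked client
    sock.sendto(request.upper(), client)       # process request, then sendto()


# TCP server: socket() -> bind() -> listen() -> accept()
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))                # port 0: the OS picks a free port
listener.listen(1)
tcp_thread = threading.Thread(target=tcp_server, args=(listener,))
tcp_thread.start()

# TCP client: socket() -> connect() -> write() -> read()
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
with client, client.makefile("rb") as reader:
    client.sendall(b"hello tcp\r\n")
    tcp_reply = reader.readline()
tcp_thread.join()
listener.close()

# UDP server: socket() -> bind() -> recvfrom()
udp_srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_srv.bind(("127.0.0.1", 0))
udp_srv.settimeout(5.0)
udp_thread = threading.Thread(target=udp_server, args=(udp_srv,))
udp_thread.start()

# UDP client: socket() -> bind() -> sendto() -> recvfrom()
udp_client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_client.bind(("127.0.0.1", 0))
udp_client.settimeout(5.0)
udp_client.sendto(b"hello udp", udp_srv.getsockname())
udp_reply, _ = udp_client.recvfrom(1024)
udp_thread.join()
udp_client.close()
udp_srv.close()

print(tcp_reply, udp_reply)
```

The UDP client needs no connect() call: the server learns where to reply from the address returned by recvfrom().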