Networking and socket programming is one of the important areas of the Java programming language, especially for programmers who work on client-server applications. Detailed knowledge of important protocols, e.g. TCP and UDP, is essential, especially if you are in the business of writing high-frequency trading applications, which communicate via the FIX protocol or a native exchange protocol. In this article, we will look at some of the frequently asked questions on networking and socket programming, mostly based around the TCP/IP protocol. This article is light on NIO, though: it doesn't include dedicated questions on multiplexing, selectors, ByteBuffer and FileChannel, but it does include classic questions like the difference between IO and NIO. The main focus of this post is to make Java developers familiar with the low-level parts, e.g. how the TCP and UDP protocols work, socket options, and writing multi-threaded servers in Java. The questions discussed here are not really tied to the Java programming language and apply to any language that lets programmers write client-server applications. By the way, if you are going for a core Java developer interview at an investment bank, you had better prepare well on Java NIO, socket programming, TCP, UDP and networking, along with other popular topics, e.g. multi-threading, the Collections API and garbage collection tuning.
You are also welcome to contribute any socket programming or networking question you have been asked, or one you think would be useful for Java interviews.
Java Networking and Socket Programming Questions and Answers
Here is my list of 15 interview questions related to networking basics, internet protocols and socket programming in Java. Though it doesn't contain basic questions on the API, e.g. Socket and ServerSocket, it focuses on the high-level concepts of writing a scalable server in Java using NIO selectors, how to implement one using threads, and their limitations and issues. I will probably add a few more questions based on best practices for writing socket-based applications in Java. If you know a good question on this topic, feel free to suggest it.
1) Difference between TCP and UDP protocol?
There are many differences between TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), but the main one is that TCP is connection-oriented, while UDP is connectionless. TCP provides guaranteed delivery of messages in the order they are sent, while UDP doesn't provide any delivery guarantee. Because of this guarantee, TCP is slower than UDP, as it needs to do more work. TCP is best suited for messages you can't afford to lose, e.g. order and trade messages in electronic trading, or wire transfers in banking and finance. UDP is better suited to media transmission, where the loss of an occasional packet (known as a datagram) is affordable and doesn't noticeably affect the quality of service.
This answer is enough for most interviews, but you need to be more detailed when you are interviewing as a Java developer for a high-frequency trading desk. Two points many candidates forget to mention are ordering and data boundaries. In TCP, messages are guaranteed to be delivered in the same order as they are sent, but data boundaries are not preserved: TCP is a byte stream, so multiple messages can be combined and sent together, or the receiver may get one part of a message in one read and the rest in the next. TCP reassembles the bytes in the correct order, but it is up to the application to work out where one message ends and the next begins. UDP, on the other hand, sends each message as a single datagram; if the receiver gets the datagram, it is guaranteed to get the whole message, but there is no guarantee that datagrams arrive in the order they were sent. In short, you should mention the following differences between TCP and UDP when answering in an interview:
- TCP guarantees delivery, UDP doesn't.
- TCP guarantees order of messages, UDP doesn't.
- Data boundary is not preserved in TCP, but UDP preserves it.
- TCP is slower compared to UDP.
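To make the data-boundary point concrete, here is a minimal sketch of length-prefix framing, one common way for an application to restore message boundaries on a TCP byte stream. The class and method names are hypothetical, and in-memory streams stand in for a socket's getInputStream()/getOutputStream() so the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Restores message boundaries on a byte stream by prefixing each message with its length. */
final class LengthPrefixFraming {

    /** Writes one message as a 4-byte big-endian length followed by the UTF-8 payload. */
    static void writeMessage(OutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    /** Reads one message; readFully() keeps reading until the whole payload has arrived. */
    static String readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for socket.getOutputStream() / socket.getInputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(wire, "NEW ORDER 100@42.5");
        writeMessage(wire, "CANCEL 7");
        // Both messages are now one undifferentiated run of bytes, as on a TCP stream.
        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        List<String> received = new ArrayList<>();
        received.add(readMessage(in));
        received.add(readMessage(in));
        System.out.println(received); // [NEW ORDER 100@42.5, CANCEL 7]
    }
}
```

UDP needs no such framing, since each datagram already arrives whole or not at all.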
2) How does the TCP handshake work?
Three messages are exchanged as part of the TCP handshake: the initiator sends a SYN, on receiving it the listener replies with a SYN-ACK, and finally the initiator responds with an ACK, at which point the TCP connection moves to the ESTABLISHED state.
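Java code never sends SYN or ACK itself; the operating system performs the handshake. The sketch below (class name hypothetical) connects over the loopback interface to show where it happens: Socket.connect() returns once the three messages have been exchanged, and ServerSocket.accept() hands over a connection that is already ESTABLISHED.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Shows where the kernel's three-way handshake happens from a Java program's point of view. */
final class HandshakeDemo {

    /** Connects to a local listener and reports whether both ends see an established connection. */
    static boolean connectLoopback() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; a backlog of 1 is enough for one client.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // connect() sends SYN and returns only after SYN-ACK arrives and the ACK is sent.
            client.connect(new InetSocketAddress(loopback, listener.getLocalPort()), 2_000);
            // accept() returns a connection the kernel has already completed.
            try (Socket accepted = listener.accept()) {
                return client.isConnected() && accepted.isConnected();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("established: " + connectLoopback());
    }
}
```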
3) How do you implement reliable transmission over the UDP protocol?
This is usually a follow-up to the TCP vs UDP question. Though UDP doesn't provide a delivery guarantee at the protocol level, you can add your own logic for reliable messaging, e.g. by introducing sequence numbers and retransmission. If the receiver finds that it has missed a sequence number, it can ask the sender to replay that message. The TRDP protocol used by Tibco Rendezvous (a popular high-speed messaging middleware) uses UDP for faster messaging and provides a reliability guarantee by using sequence numbers and retransmission.
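A minimal sketch of the receiver side of such a scheme, assuming each datagram carries a sequence number starting at 1. The class, its methods and the idea of returning a list of numbers to request (a negative acknowledgement, or NAK) are illustrative assumptions, not how TRDP actually works; sending the request back to the sender is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Receiver side of a hypothetical reliable-UDP scheme based on sequence numbers. */
final class SequencedReceiver {
    private long nextExpected = 1;                                 // next sequence number to deliver
    private final TreeMap<Long, String> pending = new TreeMap<>(); // arrived early, waiting for a gap
    private final List<String> delivered = new ArrayList<>();

    /** Handles one datagram and returns the sequence numbers to ask the sender to replay. */
    List<Long> onDatagram(long seq, String payload) {
        if (seq < nextExpected || pending.containsKey(seq)) {
            return List.of();                                      // duplicate: already seen
        }
        pending.put(seq, payload);
        // Deliver everything that is now contiguous, in order.
        while (pending.containsKey(nextExpected)) {
            delivered.add(pending.remove(nextExpected));
            nextExpected++;
        }
        // Any number between nextExpected and the highest buffered one is missing.
        List<Long> missing = new ArrayList<>();
        if (!pending.isEmpty()) {
            for (long s = nextExpected; s < pending.lastKey(); s++) {
                if (!pending.containsKey(s)) {
                    missing.add(s);
                }
            }
        }
        return missing;
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        SequencedReceiver rx = new SequencedReceiver();
        rx.onDatagram(1, "A");
        System.out.println(rx.onDatagram(3, "C")); // [2]: ask the sender to replay 2
        rx.onDatagram(2, "B");                      // gap filled, B and C are delivered
        System.out.println(rx.delivered());         // [A, B, C]
    }
}
```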
4) What is Network Byte Order? How do two hosts communicate if they have different byte ordering?
There are two ways to store a multi-byte value in memory: little endian (least significant byte at the starting address) and big endian (most significant byte at the starting address). The order a particular machine uses is known as its host byte order. For example, take a 32-bit integer made of the bytes 1-2-3-4, where 1 is the most significant byte: an Intel x86 processor, which is little endian, stores it in memory in the order 4-3-2-1, while a big-endian processor such as the IBM PowerPC stores it as 1-2-3-4. Networking protocols such as TCP/IP are based on a specific network byte order, which uses big-endian byte ordering. If two machines with different byte orders communicate, each converts values to network byte order before sending and back to its host byte order after receiving. Therefore, a little-endian microcontroller sending to a UDP/IP network must swap the order of the bytes within multi-byte values before sending them onto the network, and must swap the bytes of multi-byte values received from the network before using them. In short, network byte order is the standard byte order used during transmission, and it is big endian.
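In Java the conversion is mostly handled for you: DataOutputStream and a newly created ByteBuffer use big-endian (network) order regardless of the CPU. The sketch below (class name hypothetical) uses ByteBuffer.order() to lay out the same 32-bit value both ways, and shows what goes wrong if one side reads with the wrong order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Encodes the same int in network (big-endian) and little-endian byte order. */
final class ByteOrderDemo {

    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        // Network byte order: most significant byte first.
        System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));    // [1, 2, 3, 4]
        // How a little-endian CPU such as x86 lays the int out in memory.
        System.out.println(Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN))); // [4, 3, 2, 1]
        // Reading big-endian bytes as little-endian yields the byte-swapped value.
        int wrong = decode(encode(value, ByteOrder.BIG_ENDIAN), ByteOrder.LITTLE_ENDIAN);
        System.out.printf("0x%08x%n", wrong);                                        // 0x04030201
        System.out.println("native order here: " + ByteOrder.nativeOrder());
    }
}
```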
5) What is Nagle's algorithm?
If the interviewer is testing your knowledge of TCP/IP, it's very rare for them not to ask this question. Nagle's algorithm is a way of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent. Small packets carrying only 1 or 2 bytes of data have a large overhead, because every packet carries 40 bytes of TCP/IP headers (20 bytes of TCP header plus 20 bytes of IPv4 header), and lots of them can also lead to congestion on slow networks. Nagle's algorithm addresses this by buffering small writes while earlier data is still unacknowledged, and sending them together as one larger segment once the ACK arrives or enough data has accumulated to fill a Maximum Segment Size (MSS). Nagle's algorithm can also hurt larger writes: if a write does not end on an MSS boundary, its last partial segment may be held back until the previous data is acknowledged, so if you are writing large messages and care about latency, it's better to disable Nagle's algorithm.
In general, Nagle's algorithm is a defence against careless applications that send lots of small packets to the network; it will not benefit, and may even hurt, a well-written application that properly takes care of its own buffering.
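The core rule can be illustrated with a deliberately simplified model (class name hypothetical): full-sized segments are sent at once, while a small segment is sent only when nothing earlier is still waiting for an acknowledgement. Segment sizes stand in for real packets; windows, timers and retransmission are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of Nagle's rule; not a real TCP implementation. */
final class NagleModel {
    private final int mss;
    private int buffered;        // bytes written by the application but not yet sent
    private boolean unacked;     // is any sent data still waiting for an ACK?
    private final List<Integer> sentSegments = new ArrayList<>();

    NagleModel(int mss) {
        this.mss = mss;
    }

    /** The application writes {@code bytes} bytes. */
    void write(int bytes) {
        buffered += bytes;
        flushAllowed();
    }

    /** An ACK covering everything sent so far arrives. */
    void onAck() {
        unacked = false;
        flushAllowed();
    }

    private void flushAllowed() {
        while (buffered >= mss) {           // full segments always go out immediately
            send(mss);
        }
        if (buffered > 0 && !unacked) {     // a small segment only if nothing is in flight
            send(buffered);
        }
    }

    private void send(int size) {
        sentSegments.add(size);
        buffered -= size;
        unacked = true;
    }

    List<Integer> sentSegments() {
        return sentSegments;
    }

    public static void main(String[] args) {
        NagleModel tcp = new NagleModel(1460);
        for (int i = 0; i < 5; i++) {
            tcp.write(1);                   // five 1-byte writes, e.g. keystrokes
        }
        tcp.onAck();
        System.out.println(tcp.sentSegments()); // [1, 4]: the last four bytes were coalesced
    }
}
```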
6) What is TCP_NODELAY?
TCP_NODELAY is an option, provided by various TCP implementations, to disable Nagle's algorithm. Since Nagle's algorithm interacts badly with TCP's delayed acknowledgement algorithm, it's better to disable it when you are doing a write-write-read operation: the second write is held back until the ACK for the first write arrives, and the receiver may delay that ACK by up to 500 milliseconds, so the read that follows the two writes stalls. If latency is more of a concern than bandwidth usage, e.g. in a network-based multi-player game where users want to see other players' actions immediately, it's better to bypass Nagle's delay by using the TCP_NODELAY flag.
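In Java the option is exposed as Socket.setTcpNoDelay() (or the StandardSocketOptions.TCP_NODELAY socket option). The sketch below, whose class name is hypothetical, sets it on a loopback client and sends one request. Note that it still writes the whole request in one call: TCP_NODELAY removes the delay, but many tiny writes still mean many tiny packets.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Turns off Nagle's algorithm on a client socket via the TCP_NODELAY option. */
final class NoDelayDemo {

    /** Opens a loopback connection with TCP_NODELAY set and sends one request in a single write. */
    static boolean sendWithNoDelay() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.setTcpNoDelay(true); // small segments go out without waiting for earlier ACKs
            client.connect(new InetSocketAddress(loopback, listener.getLocalPort()), 2_000);
            try (Socket server = listener.accept()) {
                byte[] request = "PING\n".getBytes(StandardCharsets.US_ASCII);
                // One write per complete request, rather than several tiny writes.
                client.getOutputStream().write(request);
                byte[] received = server.getInputStream().readNBytes(request.length);
                return client.getTcpNoDelay() && Arrays.equals(request, received);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("request delivered with TCP_NODELAY: " + sendWithNoDelay());
    }
}
```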
7) What is multicasting or multicast transmission? Which Protocol is generally used for multicast? TCP or UDP?
Multicasting, or multicast transmission, is one-to-many distribution, where a message is delivered to a group of subscribers simultaneously in a single transmission from the publisher. Copies of the message are created automatically by network elements such as routers, but only where the network topology requires it. Tibco Rendezvous supports multicast transmission.
Multicasting is implemented using UDP rather than TCP, because UDP sends the full data as a datagram packet, which can be replicated and delivered to other subscribers. Since TCP is a point-to-point protocol, it cannot deliver messages to multiple subscribers unless it has a separate connection to each of them. However, UDP is not reliable, and messages may be lost or delivered out of order. Reliable multicast protocols such as Pragmatic General Multicast (PGM) have been developed to add loss detection and retransmission on top of IP multicast. IP multicast is widely deployed in enterprises, commercial stock exchanges and multimedia content delivery networks. A common enterprise use of IP multicast is for IPTV applications.
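On the receiving side, Java supports IP multicast through java.net.MulticastSocket. The sketch below is a hypothetical subscriber: isGroupAddress() checks whether an address is in the multicast range, and receiveOne() joins a group and waits for one datagram. The group address, port and network interface in the commented-out usage are placeholders, and receiving only works on a network that actually routes multicast traffic.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.nio.charset.StandardCharsets;

/** Subscriber side of IP multicast: join a group, then receive datagrams sent to it. */
final class MulticastSubscriber {

    /** True for IPv4 addresses in 224.0.0.0 to 239.255.255.255 (and IPv6 ff00::/8). */
    static boolean isGroupAddress(String host) throws IOException {
        return InetAddress.getByName(host).isMulticastAddress();
    }

    /** Joins {@code group} on the given interface and blocks until one datagram arrives. */
    static String receiveOne(String group, int port, NetworkInterface nif) throws IOException {
        InetSocketAddress groupAddress = new InetSocketAddress(InetAddress.getByName(group), port);
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(groupAddress, nif);   // for IPv4 this triggers an IGMP membership report
            byte[] buffer = new byte[1500];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);                // a whole datagram, or nothing
            socket.leaveGroup(groupAddress, nif);
            return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isGroupAddress("230.0.0.1"));    // true
        System.out.println(isGroupAddress("192.168.1.10")); // false
        // Hypothetical usage on a network where multicast is routed:
        // receiveOne("230.0.0.1", 4446, NetworkInterface.getByName("eth0"));
    }
}
```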
8) What is the difference between a Topic and a Queue in JMS?
The main difference between a Topic and a Queue in the Java Message Service shows up when there are multiple consumers. If we set up multiple listener threads to consume messages from a Queue, each message is dispatched to only one thread, not to all of them. With a Topic, on the other hand, each subscriber gets its own copy of every message.
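The difference in delivery semantics can be modelled without a JMS provider. The sketch below is plain Java, not the javax.jms API; its class name is hypothetical, and its round-robin choice of queue consumer is an assumption, since a real provider decides which consumer receives each queued message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of point-to-point (queue) versus publish-subscribe (topic) delivery. */
final class DeliveryModel {

    /** Queue: each message goes to exactly one consumer (round-robin here, by assumption). */
    static Map<String, List<String>> queue(List<String> messages, List<String> consumers) {
        Map<String, List<String>> inbox = emptyInboxes(consumers);
        for (int i = 0; i < messages.size(); i++) {
            String consumer = consumers.get(i % consumers.size());
            inbox.get(consumer).add(messages.get(i));
        }
        return inbox;
    }

    /** Topic: every subscriber receives its own copy of every message. */
    static Map<String, List<String>> topic(List<String> messages, List<String> subscribers) {
        Map<String, List<String>> inbox = emptyInboxes(subscribers);
        for (String message : messages) {
            for (String subscriber : subscribers) {
                inbox.get(subscriber).add(message);
            }
        }
        return inbox;
    }

    private static Map<String, List<String>> emptyInboxes(List<String> names) {
        Map<String, List<String>> inbox = new LinkedHashMap<>();
        for (String name : names) {
            inbox.put(name, new ArrayList<>());
        }
        return inbox;
    }

    public static void main(String[] args) {
        List<String> msgs = List.of("m1", "m2", "m3");
        List<String> listeners = List.of("t1", "t2");
        System.out.println(queue(msgs, listeners)); // {t1=[m1, m3], t2=[m2]}
        System.out.println(topic(msgs, listeners)); // {t1=[m1, m2, m3], t2=[m1, m2, m3]}
    }
}
```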
9) What is the difference between IO and NIO?
The main difference between IO and NIO is that NIO provides non-blocking, multiplexed IO, which is critical for writing fast and scalable networking systems, while most of the classic IO classes are blocking, so each connection ties up a thread while it waits. NIO takes advantage of readiness-selection system calls in UNIX systems, such as the select() system call for network sockets. Using select(), an application can monitor several resources at the same time and can also poll for network activity without blocking. The select() system call identifies whether data is pending or not, and then read() or write() may be used knowing that they will complete immediately.
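The core difference shows up in a single read() call. In this sketch (class name hypothetical), a SocketChannel in non-blocking mode returns 0 immediately when no data is pending, where a classic InputStream.read() would block the calling thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Contrasts NIO's non-blocking read with classic IO, where read() would wait for data. */
final class NonBlockingReadDemo {

    /** Returns how many bytes a non-blocking read gets from a connection nobody has written to. */
    static int readWithNothingPending() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                ByteBuffer buffer = ByteBuffer.allocate(64);
                // A blocking InputStream.read() would hang here; this returns 0 straight away.
                return accepted.read(buffer);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes read without blocking: " + readWithNothingPending()); // 0
    }
}
```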
10) How do you write a multi-threaded server in Java?
A multi-threaded server is one that can serve multiple clients without blocking. Java provides excellent support for writing such servers. Before Java 1.4, you could write a multi-threaded server using traditional socket IO and threads. This had severe limitations on scalability, because it creates a new thread for each connection, and you can only create a limited number of threads, depending on the machine's and platform's capability. Though this design can be improved by using thread pools and worker threads, it is still resource-intensive. Since JDK 1.4 introduced NIO, writing a scalable server has become a bit easier: you can serve many connections from a single thread by using a Selector, which takes advantage of Java NIO's non-blocking IO model.
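A minimal sketch of the single-threaded, Selector-based design, assuming a simple echo protocol; the class name is hypothetical. One thread registers the listening channel for OP_ACCEPT and every client for OP_READ, and select() tells it which channels are ready. To stay short, it writes replies directly instead of registering OP_WRITE, which a production server would do when a socket's send buffer is full.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

/** Single-threaded echo server: one Selector multiplexes the listening socket and every client. */
final class SelectorEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    SelectorEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // any free port
        server.configureBlocking(false);               // required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void stop() {
        running = false;
        selector.wakeup();                             // makes a blocked select() return
    }

    @Override
    public void run() {
        try {
            while (running) {
                selector.select();                     // block until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();                       // the selected-key set is not cleared for us
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                    } else if (key.isReadable()) {
                        echo(key);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            closeAll();
        }
    }

    private static void echo(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buffer = (ByteBuffer) key.attachment();
        if (client.read(buffer) == -1) {               // peer closed its end
            key.cancel();
            client.close();
            return;
        }
        buffer.flip();
        client.write(buffer);  // sketch: anything not written now stays buffered for the next round
        buffer.compact();
    }

    private void closeAll() {
        try {
            for (SelectionKey key : List.copyOf(selector.keys())) {
                key.channel().close();                 // listening channel and all clients
            }
            selector.close();
        } catch (IOException ignored) {
            // best effort during shutdown
        }
    }

    public static void main(String[] args) throws Exception {
        SelectorEchoServer echoServer = new SelectorEchoServer();
        Thread loop = new Thread(echoServer, "selector-loop");
        loop.start();
        try (Socket client = new Socket(InetAddress.getLoopbackAddress(), echoServer.port())) {
            client.getOutputStream().write("hello".getBytes(StandardCharsets.US_ASCII));
            byte[] reply = client.getInputStream().readNBytes(5);
            System.out.println(new String(reply, StandardCharsets.US_ASCII));
        }
        echoServer.stop();
        loop.join();
    }
}
```

Because a single thread owns every channel, no locking is needed around per-client state; the trade-off is that slow work inside the loop delays every other client, which is why real servers often hand decoded messages off to a worker pool.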
11) What is an ephemeral port?
A TCP/IP connection is usually identified by four things: server IP, server port, client IP and client port. Of these four, three are usually known in advance; what is not known is the client port, and this is where ephemeral ports come into the picture. Ephemeral ports are dynamic ports assigned by your machine's IP stack from a specified range, known as the ephemeral port range, when a client connection doesn't explicitly specify a port number. They are short-lived, temporary ports that can be reused once the connection is closed, although many IP stacks cycle through the range instead of immediately reusing a freed port. Like TCP, UDP also uses ephemeral ports when sending datagrams. On Linux, the ephemeral port range is typically 32768 to 60999 (older kernels used 32768 to 61000) and can be changed through /proc/sys/net/ipv4/ip_local_port_range, while older Windows versions used 1025 to 5000 by default and Windows Vista and later use 49152 to 65535. Other operating systems have their own ephemeral port ranges.
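You can watch an ephemeral port being assigned from Java. In this sketch (class name hypothetical), the client socket is created without a local port, so the OS picks one from the ephemeral range; the server sees the same number as the client's remote port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Shows the ephemeral port the OS assigns when a client does not pick its own local port. */
final class EphemeralPortDemo {

    /** Returns {client's local port, remote port as seen by the server} for one loopback connection. */
    static int[] connectAndReport() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The demo lets the OS pick the listening port too; a real server uses a fixed port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort()); // no local port requested
             Socket accepted = listener.accept()) {
            System.out.printf("client %s:%d -> server %s:%d%n",
                    client.getLocalAddress().getHostAddress(), client.getLocalPort(),
                    loopback.getHostAddress(), listener.getLocalPort());
            return new int[] {client.getLocalPort(), accepted.getPort()};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ports = connectAndReport();
        System.out.println("same port seen from both ends: " + (ports[0] == ports[1]));
    }
}
```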
12) What is sliding window protocol?
Sliding window protocol is a technique for controlling the flow of data packets between two networked computers where reliable and sequential delivery of packets is required, such as that provided by the Transmission Control Protocol (TCP). In the sliding window technique, each packet includes a unique consecutive sequence number, which the receiving computer uses to place the data in the correct order. The sender may have only a limited number of unacknowledged packets in flight at a time, known as the window, and as acknowledgements arrive, the window slides forward to let new packets be sent. The objective of the sliding window technique is to use the sequence numbers to avoid duplicate data and to request missing data.
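A minimal sender-side sketch, simplified to whole numbered packets and a fixed window size (real TCP counts bytes and lets the receiver advertise its window); the class name is hypothetical. The sender may have at most windowSize unacknowledged packets in flight, and each cumulative acknowledgement slides the window forward.

```java
import java.util.ArrayList;
import java.util.List;

/** Sender side of a packet-based sliding window with cumulative acknowledgements. */
final class SlidingWindowSender {
    private final int windowSize;
    private int base = 0;        // oldest unacknowledged sequence number
    private int nextSeq = 0;     // next sequence number to send

    SlidingWindowSender(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Sends as many new packets as the window allows and returns their sequence numbers. */
    List<Integer> sendAvailable(int totalPackets) {
        List<Integer> sent = new ArrayList<>();
        while (nextSeq < totalPackets && nextSeq < base + windowSize) {
            sent.add(nextSeq++);
        }
        return sent;
    }

    /** Cumulative ACK: everything below {@code ackedUpTo} has arrived, so the window slides. */
    void onAck(int ackedUpTo) {
        base = Math.max(base, ackedUpTo);
    }

    public static void main(String[] args) {
        SlidingWindowSender sender = new SlidingWindowSender(3);
        System.out.println(sender.sendAvailable(10)); // [0, 1, 2]: window full
        System.out.println(sender.sendAvailable(10)); // []: must wait for an ACK
        sender.onAck(2);                              // packets 0 and 1 acknowledged
        System.out.println(sender.sendAvailable(10)); // [3, 4]
    }
}
```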
13) When do you get the "Too many open files" error?
Just like an open file, every socket connection uses a file descriptor. Since every process has a limited number of file descriptors, it may run out of them, and when that happens you will see the "Too many open files" error. On UNIX-based systems, you can check how many file descriptors are allowed per process by executing the ulimit -n command, or count a process's open descriptors by listing the entries in
/proc//fd/
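From inside a JVM on a Unix-like system, the open and maximum descriptor counts are available through the JDK-specific com.sun.management.UnixOperatingSystemMXBean. The sketch below (class name hypothetical) returns null on platforms where that bean is not available, such as Windows.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Reports how many file descriptors (files and sockets alike) this JVM has open. */
final class FdUsage {

    /** Returns {open, max}, or null where the JVM does not expose these counts. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fds = openAndMax();
        // The maximum should correspond to the per-process limit that ulimit -n reports.
        System.out.println(fds == null ? "not available on this platform"
                : "open=" + fds[0] + " max=" + fds[1]);
    }
}
```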
14) What is the TIME_WAIT state in the TCP protocol? When does a socket connection go into the TIME_WAIT state?
When one end of a TCP connection closes it first by making the close system call (the active close), that end goes into the TIME_WAIT state once the closing handshake finishes, and stays there for twice the Maximum Segment Lifetime. Since TCP packets can be delayed or arrive out of order, the connection must not be forgotten immediately: TIME_WAIT allows late packets from the old connection to arrive and be discarded. That's why that end of the TCP connection goes into the TIME_WAIT state. For example, if the client closes the socket connection first, the client side goes into TIME_WAIT; similarly, if the server closes the connection first, you will see TIME_WAIT on the server. You can check the status of your TCP and UDP sockets with networking commands such as netstat in UNIX.
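On Linux, you can also count how many connections sit in each state by reading the kernel's socket table directly. This Linux-only sketch (class name hypothetical) parses /proc/net/tcp, which covers IPv4 sockets only (/proc/net/tcp6 lists IPv6), and maps the hexadecimal state column to names; in the kernel's numbering, 06 is TIME_WAIT and 08 is CLOSE_WAIT.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Counts IPv4 TCP sockets per state from Linux's /proc/net/tcp. */
final class TcpStateCounter {
    // Hex codes of the "st" column, following the Linux kernel's TCP state numbering.
    private static final Map<String, String> STATE_NAMES = Map.of(
            "01", "ESTABLISHED", "06", "TIME_WAIT", "08", "CLOSE_WAIT", "0A", "LISTEN");

    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 1; i < lines.size(); i++) {              // line 0 is the column header
            String[] fields = lines.get(i).trim().split("\\s+");
            if (fields.length < 4) {
                continue;                                    // skip blank or malformed lines
            }
            String state = STATE_NAMES.getOrDefault(fields[3], "OTHER_" + fields[3]);
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path table = Path.of("/proc/net/tcp");
        if (Files.isReadable(table)) {
            System.out.println(count(Files.readAllLines(table)));
        } else {
            System.out.println("/proc/net/tcp not available; try netstat -an instead");
        }
    }
}
```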
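15) What will happen if you have too many socket connections in TIME_WAIT state on the server?
Once the application has closed a socket, a connection in the TIME_WAIT state no longer holds a file descriptor; the kernel keeps the TIME_WAIT state itself until the timer expires. Too many connections in TIME_WAIT still cost kernel memory and, more importantly, each one ties up its combination of addresses and ports until it expires, so a host that opens many short-lived connections can run out of ephemeral ports and fail to create new connections. If a server is throwing the "Too many open files" error and has stopped accepting new connections, the more likely culprit is sockets the application never closed, which show up in the CLOSE_WAIT state.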
That's all for this list of networking and socket programming interview questions and answers. Though I originally intended this list for Java programmers, it is equally useful for any programmer. In fact, this is the bare minimum knowledge of sockets and protocols every programmer should have. I have found that C and C++ programmers are better at answering these questions than the average Java programmer. One reason may be that Java programmers have so many useful libraries, e.g. Apache MINA, which do all the low-level work for them. Anyway, knowledge of fundamentals is very important and everything else is just an excuse, but at the same time I also recommend using tried and tested libraries like Apache MINA for production code.