Network Working Group                                       J. Ioannidis
Request for Comments:  1235                              G. Maguire, Jr.
                                                     Columbia University
                                          Department of Computer Science
                                                               June 1991

The Coherent File Distribution Protocol

Status of this Memo

   This memo describes the Coherent File Distribution Protocol (CFDP).
   This is an Experimental Protocol for the Internet community.
   Discussion and suggestions for improvement are requested.  Please
   refer to the current edition of the "IAB Official Protocol Standards"
   for the standardization state and status of this protocol.
   Distribution of this memo is unlimited.

Introduction

   The Coherent File Distribution Protocol (CFDP) has been designed to
   speed up one-to-many file transfer operations that exhibit traffic
   coherence on media with broadcast capability.  Examples of such
   coherent file transfers are identical diskless workstations booting
   simultaneously, software upgrades being distributed to more than one
   machine at a site, a certain "object" (bitmap, graph, plain text,
   etc.) that is being discussed in a real-time electronic conference or
   class being sent to all participants, and so on.

   In all these cases, we have a limited number of servers, usually only
   one, and <n> clients (where <n> can be large) that are being sent the
   same file.  If these files are sent via multiple one-to-one
   transfers, the load on both the server and the network is greatly
   increased, as the same data are sent <n> times.

   We propose a file distribution protocol that takes advantage of the
   broadcast nature of the communications medium (e.g., fiber, ethernet,
   packet radio) to drastically reduce the time needed for file transfer
   and the impact on the file server and the network.  While this
   protocol was developed to allow the simultaneous booting of diskless
   workstations over our experimental packet-radio network, it can be
   used in any situation where coherent transfers take place.

   CFDP was originally designed as a back-end protocol; a front-end
   interface (to convert file names and requests for them to file
   handles) is still needed, but a number of existing protocols can be
   adapted for use with CFDP.  Two such reference applications have been
   developed: a simplified BOOTP [3] daemon for diskless booting of
   workstations (which we call sbootpd), and a simple, TFTP-like front
   end (which we call vtftp).  In addition, our CFDP server has been
   extended to provide this front-end interface.  We do not consider
   this front end part of the CFDP protocol; however, we present it in
   this document to provide a complete example.

   The two clients and the CFDP server are available as reference
   implementations for anonymous ftp from the site CS.COLUMBIA.EDU
   (128.59.16.20) in directory pub/cfdp/.  Also, a companion document
   ("BOOTP extensions to support CFDP") lists the "vendor extensions"
   for BOOTP (a-la RFC-1084 [4]) that apply here.

Overview

   CFDP is implemented as a protocol on top of UDP [5], but it can be
   implemented on top of any protocol that supports broadcast datagrams.
   Moreover, when IP multicast [6] implementations become more
   widespread, it would make more sense to use a multicast address to
   distribute CFDP packets, in order to reduce the overhead of non-
   participating machines.

   A CFDP client that wants to receive a file first contacts a server to
   acquire a "ticket" for the file in question.  This server could be a
   suitably modified BOOTP server, the equivalent of the tftpd daemon,
   etc. The server responds with a 32-bit ticket that will be used in
   the actual file transfers, the block size sent with each packet
   (which we shall call "BLKSZ" from now on), and the size (in bytes) of
   the file being transferred ("FILSZ").  BLKSZ should be a power of
   two.  A good value for BLKSZ is 512.  This way the total packet size
   (IPheader+UDPheader+CFDPheader+data=20+8+12+512=552) is kept well
   under the magic number 576, the datagram size that every IP host
   must be prepared to accept [7].
   Note that this choice of BLKSZ supports transfers of files that are
   up to 32 Mbytes in size.  At this point, the client should allocate
   enough buffer space (in memory, or on disk) so that received packets
   can be placed directly where they belong, in a way similar to the
   NetBLT protocol [8].
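
   The packet-size arithmetic above can be checked with a short sketch.
   The constant and function names below are illustrative assumptions,
   not part of the protocol; the header sizes are the ones used in the
   sum above (an IP header without options is assumed).

```python
# Header sizes in octets, as used in the packet-size sum in the text.
IP_HEADER = 20     # IPv4 header without options (assumption)
UDP_HEADER = 8
CFDP_HEADER = 12   # ticket (4) + checksum (4) + block number (2) + length (2)
SAFE_DATAGRAM = 576  # the 576-octet figure cited in the text


def total_packet_size(blksz: int) -> int:
    """Octets in one CFDP data packet, including IP and UDP headers."""
    return IP_HEADER + UDP_HEADER + CFDP_HEADER + blksz


def largest_safe_blksz() -> int:
    """Largest power-of-two BLKSZ whose packet stays within 576 octets."""
    blksz = 1
    while total_packet_size(blksz * 2) <= SAFE_DATAGRAM:
        blksz *= 2
    return blksz
```

   With these numbers, total_packet_size(512) is 552, and doubling
   BLKSZ to 1024 would exceed 576 octets, which is why 512 is the
   natural choice when the path MTU is unknown.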

   It is assumed that the CFDP server will also be informed about the
   ticket so that it can respond to requests.  This can be done, for
   example, by having the CFDP server and the ticket server keep the
   table of ticket-to-filename mappings in shared memory, or having the
   CFDP server listening on a socket for this information.  To reduce
   overhead, it is recommended that the CFDP server be the same process
   as the front-end (ticket) server.

   After the client has received the ticket for the file, it starts
   listening for (broadcast) packets with the same ticket, that may
   exist due to an in-progress transfer of the same file.  If it cannot
   detect any traffic, it sends to the CFDP server a request to start
   transmitting the whole file.  The server then sends the entire file
   in small, equal-sized packets consisting of the ticket, the packet
   sequence number, the actual length of data in this packet (equal to
   BLKSZ, except for the last packet in the transfer), a 32-bit
   checksum, and the BLKSZ bytes of data.  Upon receipt of each packet,
   the client checksums it, marks the corresponding block as received
   and places its contents in the appropriate place in the local file.
   If the client does not receive any packets within a timeout period,
   it sends to the CFDP server a request indicating which packets it has
   not yet received, and then goes back to the receiving mode.  This
   process is repeated until the client has received all blocks of the
   file.
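
   A minimal sketch of the client-side bookkeeping described above: a
   preallocated buffer of FILSZ octets, with each block stored at its
   final position and a per-block "received" flag.  The class and method
   names are hypothetical, and rejecting blocks of unexpected length is
   an assumption of this sketch rather than a stated requirement.

```python
from typing import List


class Reassembly:
    """Buffer for one incoming file (a sketch; names are hypothetical)."""

    def __init__(self, filsz: int, blksz: int) -> None:
        self.filsz = filsz
        self.blksz = blksz
        self.buf = bytearray(filsz)
        self.nblocks = (filsz + blksz - 1) // blksz
        self.received = [False] * self.nblocks

    def store(self, block: int, data: bytes) -> None:
        """Place one block at its final offset and mark it as received."""
        if not 0 <= block < self.nblocks:
            raise ValueError("block number out of range")
        offset = block * self.blksz
        expected = min(self.blksz, self.filsz - offset)
        if len(data) != expected:
            raise ValueError("unexpected data length")
        self.buf[offset:offset + expected] = data
        self.received[block] = True

    def missing(self) -> List[int]:
        """Block numbers not yet received (candidates for a request)."""
        return [i for i, got in enumerate(self.received) if not got]

    def complete(self) -> bool:
        return all(self.received)
```

   Because every block has a fixed position, duplicate packets simply
   overwrite identical data, and the list returned by missing() is what
   the client would ask for after a timeout.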

   The CFDP server accepts requests for an entire file ("full" file
   requests, "FULREQ"s), or requests for a set of BLKSZ blocks
   ("partial" file requests, "PARREQ"s).  In the first case, the server
   subsequently broadcasts the entire file, whereas in the second it
   only broadcasts the blocks requested.  If a FULREQ or a PARREQ
   arrives while a transfer (of the same file) is in progress, the
   requests are ignored.  When the server has sent all the requested
   packets, it returns to its idle state.

   The CFDP server listens for requests on UDP/IP port "cfdpsrv". The
   clients accept packets on UDP/IP port "cfdpcln" (both to be defined
   by the site administrator), and this is the destination of the
   server's broadcasts.  Those two port numbers are sent to the client
   with the initial handshake packet, along with the ticket.  If the
   minimal ticket server is implemented as described later in this
   document, it is recommended (for interoperability reasons) that it
   listen for requests on UDP/IP port 120 ("cfdptkt").

   Let us now examine the protocol in more detail.

Protocol Specification

Initial Handshake (not strictly part of the protocol):

   The client must acquire a ticket for the file it wishes to transfer,
   and the CFDP server should be informed of the ticket/filename
   mapping.  Again, this can be done inside a BOOTP server, a modified
   TFTP server, etc., or it can be part of the CFDP server itself.  We
   present here a suggested protocol for this phase.

   The client sends a "Request Ticket" (RQTK) request to the CFDP
   Ticket server, using UDP port "cfdptkt".  If the address of the
   server is unknown, the packet can be sent to the local broadcast
   address.  Figure 1 shows the format of this packet.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      'R'      |      'Q'      |      'T'      |      'K'      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      /                                                               /
      \     Filename, null-terminated, up to 512 octets               \
      /                                                               /
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Fig. 1: "ReQuest TicKet" packet.

   The filename is limited to 512 octets.  This should not cause a
   problem in most, if not all, cases.
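
   As an illustration, the request of Figure 1 could be assembled as
   follows.  The function name is hypothetical.  Two points are
   assumptions of this sketch, since the text does not settle them: the
   filename is encoded as ASCII, and the terminating null counts toward
   the 512-octet limit.

```python
MAX_FILENAME_FIELD = 512  # octets; assumed here to include the null


def build_rqtk(filename: str) -> bytes:
    """Build a ticket request: the tag 'RQTK' plus a null-terminated name."""
    if "\x00" in filename:
        raise ValueError("filename must not contain a null octet")
    name = filename.encode("ascii") + b"\x00"
    if len(name) > MAX_FILENAME_FIELD:
        raise ValueError("filename field exceeds 512 octets")
    return b"RQTK" + name
```

   For example, build_rqtk("vmunix") yields the 11 octets
   b"RQTKvmunix\x00", which would then be sent as a UDP datagram to the
   ticket server or to the local broadcast address.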

   The ticket server replies with a "This is Your Ticket" (TIYT) packet
   containing the ticket.  Figure 2 shows the format of this packet.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      'T'      |      'I'      |      'Y'      |      'T'      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           "ticket"                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       BLKSZ (by default 512)                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             FILSZ                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            IP address of CFDP server (network order)          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   client UDP port# (cfdpcln)  |   server UDP port# (cfdpsrv)  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Fig. 2: "This Is Your Ticket" packet.

   The reply is sent to the UDP port that the RQTK request came from.
   The IP address of the CFDP server is provided because the ticket
   (handshake) server is not necessarily on the same machine as the
   CFDP server, let alone the same process.  Similarly, the cfdpcln and
   cfdpsrv port numbers (in network order) are communicated to the
   client.  If the client does not use this ticket server, but rather
   uses BOOTP or something else, that other server should be responsible
   for providing the values of cfdpcln and cfdpsrv.  The ticket server
   also communicates this ticket/filename/filesize to the real CFDP
   server.  It is recommended that the ticket requests be handled by the
   regular CFDP server, in which case informing the CFDP server of the
   ticket/filename binding is trivial (as it is internal to the
   process).
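
   The reply of Figure 2 is a fixed 24-octet structure in network order,
   so a client could decode it along the following lines.  The type and
   function names are hypothetical; only the field layout comes from the
   figure.

```python
import socket
import struct
from typing import NamedTuple

# Tag, ticket, BLKSZ, FILSZ, server address, cfdpcln, cfdpsrv.
TIYT_FORMAT = ">4sIII4sHH"   # network order, 24 octets in total


class Ticket(NamedTuple):
    ticket: int
    blksz: int
    filsz: int
    server_ip: str
    cfdpcln: int
    cfdpsrv: int


def parse_tiyt(packet: bytes) -> Ticket:
    """Decode a 'This Is Your Ticket' reply laid out as in Figure 2."""
    if len(packet) < struct.calcsize(TIYT_FORMAT):
        raise ValueError("TIYT packet too short")
    tag, ticket, blksz, filsz, addr, cln, srv = struct.unpack_from(
        TIYT_FORMAT, packet)
    if tag != b"TIYT":
        raise ValueError("not a TIYT packet")
    return Ticket(ticket, blksz, filsz, socket.inet_ntoa(addr), cln, srv)
```

   The client would then listen on the cfdpcln port and direct its
   requests to the returned server address on the cfdpsrv port.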

   Once the client has received the ticket for the filename it has
   requested, the file distribution can proceed.

Client Protocol:

   Once the ticket has been established, the client starts listening for
   broadcast packets on the cfdpcln/udp port that have the same "ticket"
   as the one it is interested in.  In the state diagram below, the
   client is in the CLSTART state.  If the client can detect no packets
   with that ticket within a specified timeout period, "TOUT-1", it
   assumes that no transfer is in progress.  It then sends a FULREQ
   packet (see discussion above) to the CFDP server, asking it to start
   transmitting the file, and goes back to the CLSTART state (so that it
   can time out again if the FULREQ packet is lost).  Figure 3 shows the
   format of the FULREQ packet.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           "ticket"                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           checksum                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      'F'      |       0       |         length == 0           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Fig. 3: FULREQ (FULl file REQuest) packet.
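
   Since a FULREQ is exactly three 32-bit words, its checksum can be
   computed directly: it is the value that makes the three words sum to
   zero modulo 2^32.  The sketch below assumes that reading of the
   checksum rule; the function name is hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF
FULREQ_WORD = ord("F") << 24   # 'F', a zero octet, and a 16-bit length of 0


def build_fulreq(ticket: int) -> bytes:
    """Build the 12-octet full-file request of Figure 3."""
    ticket &= MASK32
    # Two's complement of the sum of the other two words.
    checksum = (-(ticket + FULREQ_WORD)) & MASK32
    return struct.pack(">III", ticket, checksum, FULREQ_WORD)
```

   A receiver that adds the three words of the result, ignoring
   overflow, obtains zero.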

   When the first packet arrives, the client moves to the RXING state
   and starts processing packets.  Figure 4 shows the format of a data
   packet.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           "ticket"                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           checksum                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          block number         |          data length          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      /                                                               /
      \      up to BLKSZ octets of data                               \
      /                                                               /
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                             Fig. 4: Data Packet

   The format is self-explanatory.  "Block number" is the offset (in
   multiples of BLKSZ) from the beginning of the file; "data length" is
   always BLKSZ except for the very last packet, where it can be less
   than that; the rest is data.
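
   A sketch of how a client might decode the header of Figure 4 and turn
   the block number into a file offset.  The names are hypothetical, and
   checksum verification is deliberately left out of this fragment.

```python
import struct
from typing import Tuple

HEADER = struct.Struct(">IIHH")   # ticket, checksum, block number, data length


def parse_data_packet(packet: bytes, blksz: int) -> Tuple[int, int, bytes]:
    """Return (ticket, file offset, data); checksum checking is omitted."""
    if len(packet) < HEADER.size:
        raise ValueError("short packet")
    ticket, _checksum, block, length = HEADER.unpack_from(packet)
    if length > blksz or HEADER.size + length > len(packet):
        raise ValueError("bad data length")
    data = packet[HEADER.size:HEADER.size + length]
    # The block number counts BLKSZ-sized units from the start of the file.
    return ticket, block * blksz, data
```

   With BLKSZ = 512, a packet carrying block number 3 is written at
   offset 1536 in the local file.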

   As each packet arrives, the client verifies the checksum and places
   the data in the appropriate position in the file.  While the file is
   incomplete and packets keep arriving, the client stays in the RXING
   state, processing them.  If the client does not receive any packets
   within a specified period of time, "TOUT-2", it times out and moves
   to the INCMPLT state.  There, it determines which packets have not
   yet been received and transmits a PARREQ request to the server.  This
   request consists of as many block numbers as will fit in the data
   area of a data packet.  If one such request is not enough to request
   all missing packets, more will be requested when the server has
   finished sending this batch and the client times out.  Also, if the
   client has sent a PARREQ and has not received any data packets within
   a timeout period, "TOUT-3", it retransmits the same PARREQ.  Figure 5
   shows the format of the PARtial REQuest packet.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           "ticket"                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           checksum                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      'P'      |       0       |      data length (2*N)        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Block #0            |           Block #1            |
      |           Block #2            |           Block #3            |
      /                                                               /
      \      data  (block numbers requested)                          \
      /                                                               /
      |           Block #N-2          |           Block #N-1          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Fig. 5: PARREQ (PARtial file REQuest) packet.
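
   The request of Figure 5 could be built from a list of missing block
   numbers as sketched below.  At most BLKSZ/2 block numbers fit in the
   data area of one request; the rest wait for a later request.  The
   function names are hypothetical, and padding an odd number of 16-bit
   entries with zeros before summing is an assumption, since the text
   does not say how a packet that is not a multiple of four octets is
   summed.

```python
import struct
from typing import List

MASK32 = 0xFFFFFFFF


def sum32(data: bytes) -> int:
    """Unsigned 32-bit sum of big-endian words, overflow discarded."""
    data += b"\x00" * (-len(data) % 4)      # zero padding is an assumption
    return sum(struct.unpack(f">{len(data) // 4}I", data)) & MASK32


def build_parreq(ticket: int, missing: List[int], blksz: int = 512) -> bytes:
    """Build one partial-file request for up to BLKSZ/2 missing blocks."""
    blocks = missing[: blksz // 2]          # 2 octets per block number
    body = struct.pack(f">{len(blocks)}H", *blocks)
    head = struct.pack(">IIBBH", ticket, 0, ord("P"), 0, len(body))
    checksum = (-sum32(head + body)) & MASK32
    return head[:4] + struct.pack(">I", checksum) + head[8:] + body
```

   With BLKSZ = 512, one request names at most 256 blocks, so a client
   missing more than that needs several request/timeout rounds.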

   When all packets have been received the client enters the CLEND state
   and stops listening.

   Figure 6 summarizes the client's operations in a state diagram.

                           +-----------+
                           |  CLSTART  |
                           |           | <---.
                           |   send    |     | timeout TOUT-1
                           |  FULREQ   | ----'
                           |           |
                           +-----------+
                                 |
             received packet     | received packet
      .-----------------------.  |
      |                       V  V
     +---------+             +---------+
     | INCMPLT |             |  RXING  |
     |         |   timeout   |         | <---.
     |  send   |<------------| process |     | received packet
     | PARREQ  |    TOUT-2   | packet  | ----'
     |         |             |         |
     +---------+             +---------+
        ^   |                     |
        |   |                     |finished
        `---'                     |
       timeout                    V
        TOUT-3               +---------+
                             |  CLEND  |
                             +---------+

                Fig. 6: Client State Transition Diagram
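
   The transitions of Figure 6 can also be written as a table.  The
   sketch below is one reading of the diagram; the event names
   ("packet", "timeout", "finished") and the action strings are
   illustrative labels, not protocol elements.

```python
# (state, event) -> (next state, action); state names follow Figure 6.
CLIENT_FSM = {
    ("CLSTART", "timeout"):  ("CLSTART", "send FULREQ"),     # TOUT-1
    ("CLSTART", "packet"):   ("RXING",   "process packet"),
    ("RXING",   "packet"):   ("RXING",   "process packet"),
    ("RXING",   "timeout"):  ("INCMPLT", "send PARREQ"),     # TOUT-2
    ("RXING",   "finished"): ("CLEND",   "stop listening"),
    ("INCMPLT", "timeout"):  ("INCMPLT", "send PARREQ"),     # TOUT-3
    ("INCMPLT", "packet"):   ("RXING",   "process packet"),
}


def step(state, event):
    """Return (next state, action) for one event."""
    return CLIENT_FSM[(state, event)]


def run(events, state="CLSTART"):
    """Feed a sequence of events through the table; collect the actions."""
    actions = []
    for event in events:
        state, action = step(state, event)
        actions.append(action)
    return state, actions
```

   For instance, the event sequence timeout, packet, packet, timeout,
   packet, finished takes a client from CLSTART to CLEND with one FULREQ
   and one PARREQ sent along the way.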

Server Protocol:

   As described above, the CFDP server accepts two kinds of requests: a
   request for a full file transfer, "FULREQ", and a request for a
   partial (some blocks only) file transfer, "PARREQ".  For the first,
   it is instructed to start sending out the contents of a file.  For
   the second, it will only send out the requested blocks.  The server
   should know at all times which files correspond to which "tickets",
   and handle them appropriately.  Note that this may run into
   implementation limits on some Unix systems (e.g., on older systems, a
   process could only have 20 files open at any one time), but that
   should not normally pose a problem.

   The server is initially in the SIDLE state, idling (see diagram
   below).  When it receives a FULREQ packet, it goes to the FULSND
   state, whence it broadcasts the entire contents of the file whose
   ticket was specified in the FULREQ packet.  When it is done, it goes
   back to the SIDLE state. When it receives a PARREQ packet, it goes to
   the PARSND state and broadcasts the blocks specified in the PARREQ
   packet. When it has finished processing the block request, it goes
   once again back to the SIDLE state.

                     receive    +-------+    receive
                .---------------| SIDLE |---------------.
                |    FULREQ     +-------+     PARREQ    |
                |                 ^   ^                 |
                |                 |   |                 |
                V                 |   |                 V
            +--------+            |   |            +--------+
            | FULSND |            |   |            | PARSND |
            |        |    done    |   |    done    |        |
            |  send  |------------'   `------------|  send  |
            | entire |                             | req'ed |
            |  file  |                             | blocks |
            +--------+                             +--------+

                Fig. 7: Server State Transition Diagram
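
   A sketch of the server's decision for a single file, following
   Figure 7: which blocks to broadcast in response to a request.  The
   function name and the string labels are hypothetical.  Dropping block
   numbers beyond the end of the file is an assumption of this sketch,
   and per-ticket bookkeeping for several files is not shown.

```python
from typing import List, Sequence


def blocks_to_broadcast(state: str, request: str, requested: Sequence[int],
                        filsz: int, blksz: int) -> List[int]:
    """Block numbers the server would send for one request."""
    if state != "SIDLE":
        return []                      # request arrived mid-transfer: ignored
    nblocks = (filsz + blksz - 1) // blksz
    if request == "FULREQ":
        return list(range(nblocks))    # FULSND: the entire file
    if request == "PARREQ":
        # PARSND: only the requested blocks (out-of-range ones dropped).
        return [b for b in requested if 0 <= b < nblocks]
    return []
```

   After broadcasting the returned blocks, the server goes back to
   SIDLE and is ready for the next request.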

Packet Formats

   The structure of the packets has already been described.  In all
   packet formats, numbers are assumed to be in network order ("big-
   endian"), including the ticket and the checksum.

   The checksum is the two's complement of the unsigned 32-bit sum with
   no end-around-carry (to facilitate implementation) of the rest of the
   packet.  Thus, to compute the checksum, the sender sets that field to
   zero and adds the contents of the packet including the header.  Then
   it takes the two's complement of that sum and uses it as the
   checksum.  Similarly, the receiver just adds the entire contents of
   the packet, ignoring overflows, and the result should be zero.
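
   The checksum rule can be sketched as follows for the packets that
   begin with a ticket (Figures 3, 4 and 5), where the checksum occupies
   octets 4 through 7.  The function names are hypothetical.  The text
   does not say how a packet whose length is not a multiple of four
   octets is summed; padding with zero octets is an assumption here.

```python
import struct

MASK32 = 0xFFFFFFFF


def sum32(packet: bytes) -> int:
    """Unsigned 32-bit sum of big-endian words, with no end-around carry."""
    packet += b"\x00" * (-len(packet) % 4)  # zero padding is an assumption
    words = struct.unpack(f">{len(packet) // 4}I", packet)
    return sum(words) & MASK32


def seal(packet: bytes) -> bytes:
    """Fill in the checksum field (octets 4..7) of a ticket-first packet."""
    zeroed = packet[:4] + b"\x00\x00\x00\x00" + packet[8:]
    checksum = (-sum32(zeroed)) & MASK32    # two's complement of the sum
    return packet[:4] + struct.pack(">I", checksum) + packet[8:]


def verify(packet: bytes) -> bool:
    """A packet is accepted when all of its words sum to zero."""
    return sum32(packet) == 0
```

   Because overflow is simply discarded, the sender and receiver need
   only ordinary 32-bit addition, which is the stated reason for not
   using an end-around carry.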

Tuneable Parameters: Packet Size, Delays and Timeouts

   It is recommended that the packet size be less than the minimum MTU
   on the connected network where the file transfers are taking place.
   We want this so that there is no fragmentation; one UDP packet should
   correspond to one hardware packet.  It is further recommended that
   the packet size be a power of two, so that offsets into the file can
   be computed from the block number by a simple logical shift
   operation.  Also, it is usually the case that page-aligned transfers
   are faster on machines with a paged address space.  Small packet
   sizes are inefficient, since the header will be a larger fraction of
   the packet, and packets larger than the MTU will be fragmented.  A
   good selection for BLKSZ is 512 or 1024. Using that BLKSZ, one can
   transfer files up to 32MB or 64MB respectively (since the limit is
   the 16-bit packet sequence number).  This is adequate for all but
   copying complete disks, and it allows twice as many packets to be
   requested in a PARREQ request than if the sequence number were 32
   bits.  If larger files must be transferred, they could be treated as
   multiple logical files, each with a size of 32MB (or 64MB).
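
   The limits discussed above follow directly from the 16-bit sequence
   number, as the following sketch shows.  The function names are
   illustrative, and BLKSZ is assumed to be a power of two.

```python
SEQ_BITS = 16   # width of the block (sequence) number field


def max_file_size(blksz: int) -> int:
    """Largest file, in octets, addressable with 16-bit block numbers."""
    return (1 << SEQ_BITS) * blksz


def block_offset(block: int, blksz: int) -> int:
    """File offset of a block, computed with a shift (BLKSZ a power of 2)."""
    shift = blksz.bit_length() - 1     # log2(BLKSZ)
    return block << shift


def parreq_capacity(blksz: int, seq_bits: int = SEQ_BITS) -> int:
    """Block numbers that fit in the data area of one PARREQ."""
    return blksz // (seq_bits // 8)
```

   For BLKSZ = 512 this gives a 32-Mbyte limit and 256 block numbers per
   PARREQ; with a 32-bit sequence number the same request would hold
   only 128.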

   Since most UDP/IP implementations do not buffer enough UDP datagrams,
   the server should not transmit packets faster than its clients can
   consume them.  Since this is a one-to-many transfer, it is not
   desirable to use flow-control to ensure that the server does not
   overrun the clients.  Rather, we insert a small delay between packets
   transmitted.  A good estimate of the proper delay between two
   successive packets is twice the amount of time it takes for the
   interface to transmit a packet.  On Unix implementations, the ping
   program can be used to provide an estimate of this, by specifying the
   same packet length on the command line as the expected CFDP packet
   length (usually 524 bytes).

   The timeouts for the client are harder to compute. While there is a
   provision for the three timeouts (TOUT-1, TOUT-2 and TOUT-3) to be
   different, there is no compelling reason not to make them the same.
   Experimentally, we have determined that a timeout of 6-8 times the
   transfer time for a packet works best.  A timeout of less than that
   runs the risk of mistaking a transient network problem for a timeout,
   and more than that delays the transfer too much.
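
   The rules of thumb above for the inter-packet delay and the client
   timeouts can be expressed as a small calculation.  The function
   names and the link-rate parameter are assumptions of this sketch; in
   practice the per-packet time would be measured (for example with
   ping) rather than derived from a nominal bit rate.

```python
from typing import Tuple


def transmit_time(packet_octets: int, link_bps: float) -> float:
    """Seconds needed to clock one packet onto a link of the given rate."""
    return packet_octets * 8 / link_bps


def inter_packet_delay(packet_octets: int, link_bps: float) -> float:
    """Suggested server delay: twice the per-packet transmit time."""
    return 2 * transmit_time(packet_octets, link_bps)


def timeout_range(transfer_time: float) -> Tuple[float, float]:
    """Suggested client timeout: 6 to 8 times the per-packet transfer time."""
    return 6 * transfer_time, 8 * transfer_time
```

   On a nominal 10 Mbit/s link, a 524-octet packet takes about 0.42 ms
   to transmit, which suggests a delay of roughly 0.84 ms between
   packets.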

Summary

   To summarize, here is the timeline of a sample file distribution
   using CFDP to three clients.  Here we request a file with eight
   blocks.  States are capitalized, requests are preceded with a '<'
   sign, replies are followed by a '>' sign, block numbers are preceded
   with a '#' sign, and actions are in parentheses:

SERVER       CLIENT-1    CLIENT-2      CLIENT-3      comments

IDLE                                                everybody idle
             CLSTART                                CL1 wants a file
             <RQTK                                  requests ticket
TIYT>                                               server replies
             (timeout)                              listens for traffic
             <FULREQ                                full request
#0           RXING                                  CL1 starts receiving
             (rx 0)
#1           (rx 1)      CLSTART                    CL2 decides to join
                         <RQTK
#2           (rx 2)                                 SRV still sending
TIYT>                                               responds to RQTK
#3           (rx 3)      (listens)                  CL2 listens
                         RXING                      found traffic

#4           (rx 4)      (rx 4)        CLSTART      CL3 joins in
                                       <RQTK
#5           (missed)    (rx 5)                     CL1 missed a packet
TIYT>                                  (listens)
#6           (rx 6)      (rx 6)        RXING        CL3 found traffic

#7           (rx 7)      (rx 7)        (rx 7)       Server finished
IDLE
             (wait)      (wait)        (wait)       CL1 managed to
             (timeout)   (wait)        (wait)       timeout
             <PARREQ[5]  (timeout)     (timeout)    CL1 blockrequests...
#5           (rx 5)      <PARREQ[0123] <PARREQ[0123456] ignored by SRV
             CLEND                                  CL1 has all packets
IDLE                     (wait)        (wait)       CL2+3 missed #5
                         (timeout)     (timeout)
                         <PARREQ[0123] <PARREQ[0123456] CL2's req gets
#0                       (rx 0)        (rx 0)       through, CL3 ignored
#1                       (rx 1)        (rx 1)       moving along
#2                       (rx 2)        (rx 2)
#3                       (rx 3)        (rx 3)
IDLE                     CLEND         (wait)       CL2 finished
                                       (timeout)
                                       <PARREQ[456]
#4                                     (rx 4)
#5                                     (rx 5)
#6                                     (rx 6)
IDLE                                   CLEND        CL3 finished

References

   [1] Sollins, K., "The TFTP Protocol (Revision 2)", RFC 783, MIT, June
       1981.

   [2] Finlayson, R., "Bootstrap Loading Using TFTP", RFC 906, Stanford,
       June 1984.

   [3] Croft, W., and J. Gilmore, "Bootstrap Protocol", RFC 951,
       Stanford and SUN Microsystems, September 1985.

   [4] Reynolds, J., "BOOTP Vendor Information Extensions", RFC 1084,
       USC/Information Sciences Institute, December 1988.

   [5] Postel, J., "User Datagram Protocol", RFC 768, USC/Information
       Sciences Institute, August 1980.

   [6] Deering, S., "Host Extensions for IP Multicasting", RFC 1112,
       Stanford University, August 1989.

   [7] Postel, J., "Internet Protocol - DARPA Internet Program Protocol
       Specification", RFC 791, DARPA, September 1981.

   [8] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk Data
       Transfer Protocol", RFC 998, MIT, March 1987.

Security Considerations

   Security issues are not discussed in this memo.

Authors' Addresses

   John Ioannidis
   Columbia University
   Department of Computer Science
   450 Computer Science
   New York, NY 10027

   EMail:  ji@cs.columbia.edu

   Gerald Q. Maguire, Jr.
   Columbia University
   Department of Computer Science
   450 Computer Science
   New York, NY 10027

   Phone:  (212) 854-2736

   EMail:  maguire@cs.columbia.edu