Abstract: This document provides information for the novice Internet user about using the File Transfer Protocol (FTP). It explains what FTP is, what anonymous FTP is, and what an anonymous FTP archive site is. It shows a sample anonymous FTP session. It also discusses common ways files are packaged for efficient storage and transmission.
It is common for a user with files on more than one host to use the FTP program to transfer files from one host to another. In this case, the user has an account on both hosts involved, so he has passwords for both hosts.
However, Internet users may also take advantage of a wealth of information available from archive sites by using a general purpose account called "anonymous FTP".
Traditionally, this special anonymous user account accepts any string as a password, although it is common to use either the password "guest" or one's electronic mail (e-mail) address. Some archive sites now explicitly ask for the user's e-mail address and will not allow login with the "guest" password. Providing an e-mail address is a courtesy that allows archive site operators to get some idea of who is using their services.
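As a rough sketch of the login convention just described, the following Python uses the standard ftplib module to log in as "anonymous", give an e-mail address as the password, and fetch one file. The function name fetch_anonymous, its parameters, and the ftp_factory hook are illustrative assumptions and not part of any archive site's interface.

```python
import ftplib
import io


def fetch_anonymous(host, remote_name, email, ftp_factory=ftplib.FTP):
    """Retrieve remote_name from host via anonymous FTP and return its bytes.

    ftp_factory is a hypothetical hook so the function can be exercised
    without a network connection; by default it is ftplib.FTP.
    """
    buffer = io.BytesIO()
    ftp = ftp_factory(host)          # ftplib.FTP(host) connects immediately
    try:
        # The user name is "anonymous"; the password is, by courtesy,
        # the caller's e-mail address.
        ftp.login("anonymous", email)
        # retrbinary sends the RETR command and transfers in binary mode.
        ftp.retrbinary("RETR " + remote_name, buffer.write)
    finally:
        ftp.close()
    return buffer.getvalue()
```

A call might look like fetch_anonymous("archive.example", "pub/README", "user@example.org"), where the host, path, and address are placeholders.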
You may also need to know whether your machine uses an ASCII, EBCDIC, or other character set in order to judge how likely a transfer of binary information is to work, or whether such a transfer will require other keywords, as is the case for TENEX.
In the general case, you may assume that an ASCII transfer will always do the right thing for plain text files. However, more and more information is being stored in various compressed formats (which are discussed later in this document), so knowing the binary characteristics of your machine may be important.
The following is an example of connecting to the nic.ddn.mil host to retrieve RFC 959, "File Transfer Protocol (FTP)."
Note several things about the session.
Also note that some FTP client implementations (e.g., MVS systems) may not echo the reply codes or text as transmitted from the remote host. They may generate their own status lines or just hide the non-fatal replies from you. For the purposes of this document, the more popular UNIX interface to the FTP client will be presented.
If you are not sure what format a file is in, you may need to transfer it a second time in the other mode (BINARY or ASCII) if your first guess is wrong. The extension at the end of the file name may give you a clue. File name extensions are described below.
Because some machines store text files differently than others, you may have to try your luck if you're not sure what format a file is in. A good guess is to try ASCII mode first, if you have grounds to suspect the file is a text file. Otherwise, try BINARY mode. Try TENEX mode as a last resort.
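The guessing procedure above can be sketched in Python as a function that returns the transfer modes in the order to try them. The suffix lists are drawn from the naming conventions described in this document, except ".txt", which is an added assumption; the function name modes_to_try is hypothetical.

```python
# Suffixes that suggest binary content (compressed or bundled files).
# Names are lower-cased before matching, so ".z" also covers ".Z".
BINARY_SUFFIXES = (".z", ".gz", ".tgz", ".taz", ".tar", ".zip", ".arc", ".zoo")

# Suffixes that suggest printable ASCII content.
# ".txt" is an assumption; the others appear in this document.
ASCII_SUFFIXES = (".atob", ".atox", ".uu", ".hqx", ".shar", ".txt")


def modes_to_try(filename):
    """Return transfer modes in the order worth trying for filename."""
    name = filename.lower()
    if name.endswith(BINARY_SUFFIXES):
        return ["BINARY", "ASCII", "TENEX"]
    if name.endswith(ASCII_SUFFIXES):
        # There are grounds to suspect a text file, so try ASCII first.
        return ["ASCII", "BINARY", "TENEX"]
    # No clue from the name: try BINARY, then ASCII, with TENEX last.
    return ["BINARY", "ASCII", "TENEX"]
```

The extension is only a hint, so a failed first guess still means transferring the file again in the next mode.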
Full details on the commands and options available are in the FTP documentation that comes with your system. You can also type "help" at the FTP command prompt for a list of command options.
A copy of the UNIX version of the FTP documentation is available from the online manual. If your UNIX site has the manuals installed, type the following at the UNIX prompt:
Information stored on archive sites is often "transformed" in three common ways. "Compressing" (reducing the size of) the stored information makes more space available on the archive, and reduces the amount of data actually transferred across the network. "Bundling" several files into one larger file maintains the internal directory structure of the components, and allows users to transfer only one larger object rather than several (sometimes hundreds) of smaller files.
In addition, binary data is often converted into an ASCII format for transmission, a process referred to in this document as "transformation." Traditionally, Internet RFC 822-based electronic mail and USENET protocols did not allow the transmission of "binary" (8-bit) data; therefore, files in binary format had to be transformed into printable 7-bit ASCII before transmission.
On many systems, various file naming conventions are used to help the remote user to determine the format of the stored information without first having to retrieve the files. Below we list the more common compression, bundling, and transformation conventions used on the Internet. This list is not intended to be exhaustive. In all cases public domain or freely-available implementations of the programs associated with these mechanisms are available on the network.
Filenames terminating in ".Z" normally signify files that have been compressed by the standard UNIX Lempel-Ziv "compress" utility. There is an equivalent program called "uncompress" to reverse the process and return the file to its original state. No bundling mechanism is provided, and the resulting files are always in binary format, regardless of the original format of the input data.
Performs a transformation of ASCII to binary (atob) and the reverse (btoa) in a standard format. Files so transformed often have filenames terminated with ".atob". No bundling or compression mechanisms are used.
A data transformation standard used to convert binary files to transferable ASCII format. Sometimes used in preference to other similar mechanisms because it is more space efficient; however, it is not a compression mechanism per se. It is just more efficient in the transformation from one format to the other. Filenames of files in this format often have the ".atox" extension.
Transforms binary to ASCII ("uuencode") and performs the reverse ("uudecode") transformation in a standard manner. Originally used in the UUCP ("Unix to Unix CoPy") mail/USENET system. No bundling or compression mechanisms are used. Naming conventions often add a ".uu" at the end of the file name.
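To illustrate what this kind of transformation does to the data, here is a minimal Python sketch using the standard binascii module, whose b2a_uu and a2b_uu functions convert one line of data at a time (at most 45 input bytes per line). The helper names are hypothetical, and the sketch produces only the encoded body lines; a file written by the real uuencode program also carries "begin" and "end" lines, which are omitted here.

```python
import binascii

LINE_BYTES = 45  # largest input chunk binascii.b2a_uu accepts per line


def uu_encode_lines(data):
    """Encode bytes as a list of uuencoded body lines (each ends in a newline)."""
    return [
        binascii.b2a_uu(data[start:start + LINE_BYTES])
        for start in range(0, len(data), LINE_BYTES)
    ]


def uu_decode_lines(lines):
    """Reverse uu_encode_lines and return the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


# Every possible byte value, including those above 127 (8-bit data).
payload = bytes(range(256))
encoded = uu_encode_lines(payload)
restored = uu_decode_lines(encoded)
```

The encoded lines contain only 7-bit characters, at the cost of being larger than the input, which is why the transformation is not a compression mechanism.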
Originally a UNIX based utility for bundling (and unbundling) several files and directories into (and from) a single file (the acronym stands for "Tape ARchive"). Standard format provides no compression mechanism. The resulting bundled file is always in binary format regardless of whether the constituent files are binary or not. Naming conventions usually hold that the filename of a "tarfile" contain the sequence ".tar" or "-tar".
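Bundling can be sketched with Python's standard tarfile module, which reads and writes the same format. The example below, which is an illustration and not a replacement for the tar program, bundles two in-memory files into one uncompressed archive and lists its members; the file names and helper names are hypothetical.

```python
import io
import tarfile


def bundle(files):
    """Bundle a {path: bytes} mapping into an uncompressed tar archive."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as archive:
        for path, content in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return out.getvalue()


def list_members(tar_bytes):
    """Return the member paths stored in an uncompressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
        return archive.getnames()


tar_bytes = bundle({
    "patch/README": b"How to build patch\n",
    "patch/src/patch.c": b"/* source code */\n",
})
names = list_members(tar_bytes)
```

The member names keep their directory components, which is how the bundle preserves the internal directory structure of what was packed.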
Often used in IBM PC environments, these complementary programs provide both bundling and compression mechanisms. The resulting files are always in binary format. Files resulting from the "zip" program are by convention terminated with the ".zip" filename extension.
Often used in IBM PC environments, these complementary programs provide both bundling and compression mechanisms. The resulting files are always in binary format. Files stored in this format often have a ".arc" filename extension.
Used in the Apple Macintosh environment, the binhex process provides bundling as well as binary to ASCII data transformations. Files in this format by convention have a filename extension of ".hqx".
Bourne shell archives package text or binary files into a single longer file which, when executed, will create the component files. Because this format is vulnerable to misuse, most users use a special tool called unshar to decode these archives. By convention, files in this format have a filename extension of ".shar".
DCL archives package text or binary files into a single longer file which, when executed, will create the component files. Because this format is vulnerable to misuse, care must be taken to examine such an archive before executing it. By convention, files in this format have a filename extension of ".shar".
Sometimes these shell archive files are broken into multiple small parts to simplify their transfer over other forms of fileservers that share the same archive tree. In such cases, the parts of the files are usually suffixed with a part number (e.g. xyz.01 xyz.02 xyz.03 ... or even .01-of-05). Collect all the parts, concatenate them on your local system, and then apply the procedure listed above for a simple shar or vms_share file to the concatenated file you just made.
The zoo program implements compression/decompression and bundling/unbundling in a single program. Utilities supporting the zoo format exist on a wide variety of systems, including Unix, MS-DOS, Macintosh, OS/2, Atari ST, and VAX VMS. Files created by the "zoo" programs by convention end with the ".zoo" filename extension. Zoo is a popular distribution format due to the availability of free implementations (both source and executable code) on a wide variety of operating systems.
The Free Software Foundation GNU project adopted a variant of the zip compression mechanism as a substitute for the compress/uncompress commands. The resulting files are always in binary format. Files resulting from the "gzip" program are by convention terminated with the ".z" or ".gz" filename extensions. The gunzip program also recognizes ".tgz" and ".taz" as shorthands for ".tar.z" or ".tar.Z". Also, gunzip can recognize and decompress files created by the gzip, zip, compress, or pack commands.
The GNU project recently began distributing and using the gzip/gunzip utilities. Even more recently they changed the default suffix from .z to .gz, in an attempt to (1) reduce confusion with .Z, and (2) eliminate a problem with case-insensitive file systems such as MS-DOS. The gzip software is freely redistributable and has been ported to most UNIX systems, as well as Amiga, Atari, MS-DOS, OS/2, and VMS systems.
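As a small illustration of the point that gzip output is binary regardless of the input, the following Python sketch uses the standard gzip module to compress plain text and restore it. The helper names pack and unpack are hypothetical. Unlike the gunzip program described above, this sketch handles only data in gzip format, not files made by zip, compress, or pack.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of any gzip stream


def pack(data):
    """Compress bytes into gzip format."""
    return gzip.compress(data)


def unpack(data):
    """Decompress gzip data, refusing input that lacks the gzip signature."""
    if not data.startswith(GZIP_MAGIC):
        raise ValueError("not gzip data")
    return gzip.decompress(data)


text = b"The quick brown fox jumps over the lazy dog.\n" * 200
packed = pack(text)
restored = unpack(packed)
```

Because the compressed result contains 8-bit values even when the input was plain text, such files should be transferred in BINARY mode.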
Some operating systems cannot handle multiple periods in a file name; in such cases the periods are often replaced by a hyphen ( - ) or an underscore ( _ ), or the "read me" files in the directories give detailed instructions.
Suppose "patch" is a useful public domain program for applying program patches and updates. You find this file at an archive site as "patch.tar.Z". Now you know that the ".Z" indicates that the file was compressed with the UNIX "compress" command, and the ".tar" indicates that it was tar'ed using the UNIX "tar" tape archive command.
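Reading the suffixes from right to left, as in the paragraph above, gives the order in which the transformations must be undone. A minimal Python sketch of that reasoning follows; the table covers only some of the suffixes described in this document, and the names UNDO_PROGRAM and unpack_steps are hypothetical.

```python
# Suffix -> program that undoes it, per the conventions in this document.
# Matching is case-sensitive because ".Z" (compress) and ".z" (gzip) differ.
UNDO_PROGRAM = {
    ".Z": "uncompress",
    ".gz": "gunzip",
    ".z": "gunzip",
    ".tar": "tar",
    ".uu": "uudecode",
}


def unpack_steps(filename):
    """Return (programs to run in order, base name) for a packaged file."""
    steps = []
    name = filename
    while True:
        for suffix, program in UNDO_PROGRAM.items():
            if name.endswith(suffix):
                steps.append(program)
                name = name[:-len(suffix)]   # strip the suffix just handled
                break
        else:
            # No known suffix remains, so the outermost layers are all listed.
            return steps, name


steps, base = unpack_steps("patch.tar.Z")
```

For "patch.tar.Z" the steps come out as uncompress first and tar second, leaving the base name "patch".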
First retrieve the file onto your machine using anonymous FTP. To unpack this program, you would first uncompress it by typing:
In the example of patch.tar, you could invoke the command as:
Remember that Internet site administrators for archive sites have made their systems available out of a sense of community. Rarely are they fully compensated for the time and effort it takes to administer such a site. There are some things users can do to make their jobs somewhat easier, such as checking with local support personnel first if problems occur before asking the archive administrator for help.
Most archive machines perform other functions as well. Please respect the needs of their primary users and restrict your FTP access to non-prime hours (generally between 1900 and 0600 hours local time for that site) whenever possible. It is especially important to remember this for sites located on another continent or across a significant body of water because most such links are relatively slow and heavily loaded.
In addition, some sites offering anonymous FTP limit the number of concurrent anonymous FTP logins. If your attempt to log onto such a site results in an error message to the effect that too many anonymous FTP users are online, you should wait a while before attempting another connection rather than retrying immediately.
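For users who script their transfers, the advice to wait before reconnecting might be sketched in Python as follows. The function name connect_politely, the ten-minute default wait, and the choice of which ftplib exceptions to treat as "site is busy" are all assumptions; sites differ in the reply they send when the login limit is reached.

```python
import ftplib
import time


def connect_politely(connect, attempts=3, wait_seconds=600, sleep=time.sleep):
    """Call connect(); if the site refuses, wait before trying again.

    connect is any zero-argument callable that opens the session and
    raises an ftplib error when the site turns the login away.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except (ftplib.error_temp, ftplib.error_perm):
            # Assumption: a refusal here means too many anonymous users.
            if attempt == attempts:
                raise                 # give up after the last attempt
            sleep(wait_seconds)       # wait a while, do not retry at once
```

The sleep parameter exists only so the waiting can be replaced when trying the function out; in normal use the default time.sleep applies.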
To reduce redundant storage, you should find out how to make useful files that you fetch using FTP available to your entire organization. If you retrieve and test a program that turns out to be useful, you should probably ask your administrator to consider making the program generally available, which will reduce the redundant effort and disk space resulting from multiple individuals installing the same package in their personal directories.
If you find an interesting file or program on an archive site, tell others about it. You should not copy the file or program to your own archive unless you are willing to keep your copy current.
To learn the current status of any Internet-Draft, please check the 1id-abstracts.txt listing contained in the Internet-Drafts Shadow Directories on ds.internic.net, nic.nordu.net, venera.isi.edu, or munnari.oz.au.
Please send comments to April Marine, amarine@atlas.arc.nasa.gov.
This Internet Draft expires August 30, 1994.