This is tar.info, produced by Makeinfo version 3.12f from tar.texi.

START-INFO-DIR-ENTRY
* tar: (tar). Making tape (or disk) archives.
END-INFO-DIR-ENTRY

This file documents GNU `tar', a utility used to store, back up, and
transport files.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

This file documents GNU `tar', which is a utility used to store,
back up, and transport files. `tar' is a tape (or disk) archiver. This
manual documents release 1.13.

File: tar.info, Node: Attributes, Next: Standard, Prev: Compression, Up: Formats

Handling File Attributes
========================

_(This message will disappear once this node is revised.)_

When `tar' reads files, their access times are updated. To have
`tar' attempt to set the access times back to what they were before
the files were read, use the `--atime-preserve' option. This doesn't
work for files that you don't own, unless you're root, and it doesn't
interact nicely with incremental dumps (*note Backups::.), but it is
good enough for some purposes.

Handling of file attributes

`--atime-preserve'
Do not change access times on dumped files.

`-m'
`--touch'
Do not extract file modification times.

When this option is used, `tar' leaves the modification time of each
extracted file set to the time of extraction, instead of setting it
to the time recorded in the archive.

This option is meaningless with `--list' (`-t').

`--same-owner'
Create extracted files with the same ownership they have in the
archive.

When `tar' runs as the super-user at extraction time, ownership is
always restored. So this option is meaningful only for non-root
users, and only when `tar' runs on systems that allow users to give
files away. Many people consider this ability a security flaw, not
least because it makes it quite difficult to account correctly for
the disk space users occupy. Also, the `suid' and `sgid' attributes
of files are easily and silently lost when files are given away.

When writing an archive, `tar' writes the user id and user name
separately. If it can't find a user name (because the user id is
not in `/etc/passwd'), then it does not write one. When restoring
ownership of extracted files, it tries to look up the name (if one
was written) in `/etc/passwd'. If that fails, it uses the user id
stored in the archive instead.

`--numeric-owner'
The `--numeric-owner' option allows (ANSI) archives to be written
without user/group name information, or lets such information be
ignored when extracting. It effectively disables the generation
and/or use of user/group name information: on extraction, `tar'
uses the numeric ids from the archive and ignores the names.

This is useful in certain circumstances, for example when restoring
a backup from an emergency floppy with different passwd/group files.
Otherwise, it is impossible to extract files with the right
ownership if the password file in use during the extraction does
not match the one belonging to the file system(s) being extracted.
This occurs, for example, if you are restoring your files after a
major crash and have booted from an emergency floppy with no password
file, or have put your disk into another machine to do the restore.

The numeric ids are _always_ saved into `tar' archives. The
identifying names are added at create time when provided by the
system, unless `--old-archive' (`-o') is used. Numeric ids are
useful when moving archives among a collection of machines that use
centralized management to assign numeric ids to users and groups.
This is often done using NIS.

When making a `tar' file for distribution to other sites, it is
sometimes cleaner to use a single owner for all files in the
distribution, and nicer to specify the write permission bits of the
files as stored in the archive independently of their actual values
on the file system. The usual way to prepare a clean distribution is
to have a Makefile rule that creates a directory, copies all needed
files into it, sets ownership and permissions as wanted (there are
many possible schemes), and only then makes a `tar' archive from
this directory, before cleaning everything out. Of course, we could
add a lot of options to GNU `tar' for fine-tuning permissions and
ownership, but I don't think that is a good approach. GNU `tar' is
already crowded with options, and the approach just explained
already gives you a great deal of control.

`-p'
`--same-permissions'
`--preserve-permissions'
Extract all protection information.

This option causes `tar' to set the modes (access permissions) of
extracted files exactly as recorded in the archive. If this option
is not used, the current `umask' setting limits the permissions on
extracted files.

This option is meaningless with `--list' (`-t').

`--preserve'
Same as both `--same-permissions' (`--preserve-permissions', `-p')
and `--same-order' (`--preserve-order', `-s').

The `--preserve' option has no equivalent short option name.
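The ownership rules described under `--same-owner' and
`--numeric-owner' can be summarized in a small C sketch using the
standard `getpwnam' call: prefer the archived user name if it resolves
on the extracting system, otherwise fall back to the archived numeric
id, and skip the name entirely under `--numeric-owner'. The function
`choose_owner' is hypothetical, not GNU `tar''s own code; group names
would be handled the same way with `getgrnam'.

```c
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: pick the uid to give an extracted file.
   UNAME is the user name stored in the archive (may be empty),
   ARCHIVED_UID is the numeric id stored alongside it, and
   NUMERIC_OWNER is nonzero when --numeric-owner was given. */
uid_t choose_owner(const char *uname, uid_t archived_uid, int numeric_owner)
{
  if (numeric_owner || uname == NULL || uname[0] == '\0')
    return archived_uid;            /* names ignored or absent */

  struct passwd *pw = getpwnam(uname);
  if (pw == NULL)
    return archived_uid;            /* name unknown on this system */

  return pw->pw_uid;                /* local id for the archived name */
}
```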

File: tar.info, Node: Standard, Next: Extensions, Prev: Attributes, Up: Formats

The Standard Format
===================

While an archive may contain many files, the archive itself is a
single ordinary file. Like any other file, an archive file can be
written to a storage device such as a tape or disk, sent through a pipe
or over a network, saved on the active file system, or even stored in
another archive. An archive file is not easy to read or manipulate
without using the `tar' utility or Tar mode in GNU Emacs.

Physically, an archive consists of a series of file entries
terminated by an end-of-archive entry, which consists of 512 zero
bytes. A file entry usually describes one of the files in the archive
(an "archive member"), and consists of a file header and the contents
of the file. File headers contain file names and statistics, checksum
information which `tar' uses to detect file corruption, and information
about file types.

Archives are permitted to have more than one member with the same
member name. One way this situation can occur is if more than one
version of a file has been stored in the archive. For information
about adding new versions of a file to an archive, see *Note update::,
and to learn more about having more than one archive member with the
same name, see .

In addition to entries describing archive members, an archive may
contain entries which `tar' itself uses to store information.
*Note label::, for an example of such an archive entry.

A `tar' archive file contains a series of blocks. Each block
contains `BLOCKSIZE' bytes. Although this format may be thought of as
being on magnetic tape, other media are often used.

Each file archived is represented by a header block which describes
the file, followed by zero or more blocks which give the contents of
the file. At the end of the archive file there may be a block filled
with binary zeros as an end-of-file marker. A reasonable system should
write a block of zeros at the end, but must not assume that such a
block exists when reading an archive.
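As a worked example of this layout, a plain member takes one header
block plus its contents rounded up to a whole number of blocks. The
sketch assumes `BLOCKSIZE' is 512, as in `src/tar.h', and ignores any
extra headers (long names, sparse maps) a member might need;
`member_blocks' is a hypothetical name.

```c
#define BLOCKSIZE 512

/* Blocks occupied by a plain member whose contents are SIZE bytes:
   one header block plus the contents rounded up to whole blocks. */
unsigned long member_blocks(unsigned long size)
{
  return 1 + (size + BLOCKSIZE - 1) / BLOCKSIZE;
}
```

For instance, a 1000-byte file occupies 1 + 2 = 3 blocks (1536
bytes), and an empty file occupies just its header block.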

The blocks may be "blocked" for physical I/O operations. Each
record of N blocks (where N is set by the `--blocking-factor=512-SIZE'
(`-b 512-SIZE') option to `tar') is written with a single `write ()'
operation. On magnetic tapes, the result of such a write is a single
record. When writing an archive, the last record of blocks should be
written at the full size, with blocks after the zero block containing
all zeros. When reading an archive, a reasonable system should
properly handle an archive whose last record is shorter than the rest,
or which contains garbage records after a zero block.
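The record arithmetic can be sketched the same way. Given the number
of blocks actually written (members plus the end-of-archive zero block)
and the blocking factor N, these hypothetical helpers compute how many
records are written and how many all-zero blocks pad out the last one.

```c
/* Records of BLOCKING_FACTOR blocks needed to hold BLOCKS_USED blocks;
   the last record may be only partly filled with real data. */
unsigned long records_needed(unsigned long blocks_used,
                             unsigned long blocking_factor)
{
  return (blocks_used + blocking_factor - 1) / blocking_factor;
}

/* All-zero blocks to append so that the last record is written at
   full size. */
unsigned long padding_blocks(unsigned long blocks_used,
                             unsigned long blocking_factor)
{
  unsigned long rem = blocks_used % blocking_factor;
  return rem == 0 ? 0 : blocking_factor - rem;
}
```

For example, 25 blocks written with a blocking factor of 20 need 2
records, the second completed with 15 zero blocks.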

The header block is defined in C as follows. In the GNU `tar'
distribution, this is part of file `src/tar.h':

/* GNU tar Archive Format description. */

/* If OLDGNU_COMPATIBILITY is not zero, tar produces archives which, by
default, are readable by older versions of GNU tar. This can be
overridden by using --posix; in this case, POSIXLY_CORRECT in the
environment may be set to enforce stricter conformance. If
OLDGNU_COMPATIBILITY is zero or undefined, tar will eventually produce
archives which are, by default, POSIX compatible; then either using
--posix or defining POSIXLY_CORRECT enforces stricter conformance.

This #define will disappear in a few years. FP, June 1995. */
#define OLDGNU_COMPATIBILITY 1

/*---------------------------------------------.
| `tar' Header Block, from POSIX 1003.1-1990. |
`---------------------------------------------*/

/* POSIX header. */

struct posix_header
{                               /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
                                /* 500 */
};

#define TMAGIC "ustar" /* ustar and a null */
#define TMAGLEN 6
#define TVERSION "00" /* 00 and no null */
#define TVERSLEN 2

/* Values used in typeflag field. */
#define REGTYPE '0' /* regular file */
#define AREGTYPE '\0' /* regular file */
#define LNKTYPE '1' /* link */
#define SYMTYPE '2' /* reserved */
#define CHRTYPE '3' /* character special */
#define BLKTYPE '4' /* block special */
#define DIRTYPE '5' /* directory */
#define FIFOTYPE '6' /* FIFO special */
#define CONTTYPE '7' /* reserved */

/* Bits used in the mode field, values in octal. */
#define TSUID 04000 /* set UID on execution */
#define TSGID 02000 /* set GID on execution */
#define TSVTX 01000 /* reserved */
/* file permissions */
#define TUREAD 00400 /* read by owner */
#define TUWRITE 00200 /* write by owner */
#define TUEXEC 00100 /* execute/search by owner */
#define TGREAD 00040 /* read by group */
#define TGWRITE 00020 /* write by group */
#define TGEXEC 00010 /* execute/search by group */
#define TOREAD 00004 /* read by other */
#define TOWRITE 00002 /* write by other */
#define TOEXEC 00001 /* execute/search by other */

/*-------------------------------------.
| `tar' Header Block, GNU extensions. |
`-------------------------------------*/

/* In GNU tar, SYMTYPE is for symbolic links, and CONTTYPE is for
contiguous files, so maybe disobeying the `reserved' comment in POSIX
header description. I suspect these were meant to be used this way, and
should not have really been `reserved' in the published standards. */

/* *BEWARE* *BEWARE* *BEWARE* that the following information is still
boiling, and may change. Even if the OLDGNU format description should be
accurate, the so-called GNU format is not yet fully decided. It is
surely meant to use only extensions allowed by POSIX, but the sketch
below repeats some ugliness from the OLDGNU format, which should rather
go away. Sparse files should be saved in such a way that they do *not*
require two passes at archive creation time. Huge files cause some POSIX
fields to overflow; alternate solutions have to be sought for this. */

/* Descriptor for a single file hole. */

struct sparse
{                               /* byte offset */
  char offset[12];              /*   0 */
  char numbytes[12];            /*  12 */
                                /*  24 */
};

/* Sparse files are not supported in POSIX ustar format. For sparse files
with a POSIX header, a GNU extra header is provided which holds overall
sparse information and a few sparse descriptors. When an old GNU header
replaces both the POSIX header and the GNU extra header, it holds some
sparse descriptors too. Whether POSIX or not, if more sparse descriptors
are still needed, they are put into as many successive sparse headers as
necessary. The following constants tell how many sparse descriptors fit
in each kind of header able to hold them. */

#define SPARSES_IN_EXTRA_HEADER 16
#define SPARSES_IN_OLDGNU_HEADER 4
#define SPARSES_IN_SPARSE_HEADER 21

/* The GNU extra header contains some information GNU tar needs, but not
foreseen in POSIX header format. It is only used after a POSIX header
(and never with old GNU headers), and immediately follows this POSIX
header, when typeflag is a letter rather than a digit, so signaling a GNU
extension. */

struct extra_header
{                               /* byte offset */
  char atime[12];               /*   0 */
  char ctime[12];               /*  12 */
  char offset[12];              /*  24 */
  char realsize[12];            /*  36 */
  char longnames[4];            /*  48 */
  char unused_pad1[68];         /*  52 */
  struct sparse sp[SPARSES_IN_EXTRA_HEADER];
                                /* 120 */
  char isextended;              /* 504 */
                                /* 505 */
};

/* Extension header for sparse files, used immediately after the GNU extra
header, and used only if all sparse information cannot fit into that
extra header. There might even be many such extension headers, one after
the other, until all sparse information has been recorded. */

struct sparse_header
{                               /* byte offset */
  struct sparse sp[SPARSES_IN_SPARSE_HEADER];
                                /*   0 */
  char isextended;              /* 504 */
                                /* 505 */
};

/* The old GNU format header conflicts with POSIX format in such a way that
POSIX archives may fool old GNU tar's, and POSIX tar's might well be
fooled by old GNU tar archives. An old GNU format header uses the space
used by the prefix field in a POSIX header, and cumulates information
normally found in a GNU extra header. With an old GNU tar header, we
never see any POSIX header nor GNU extra header. Supplementary sparse
headers are allowed, however. */

struct oldgnu_header
{                               /* byte offset */
  char unused_pad1[345];        /*   0 */
  char atime[12];               /* 345 */
  char ctime[12];               /* 357 */
  char offset[12];              /* 369 */
  char longnames[4];            /* 381 */
  char unused_pad2;             /* 385 */
  struct sparse sp[SPARSES_IN_OLDGNU_HEADER];
                                /* 386 */
  char isextended;              /* 482 */
  char realsize[12];            /* 483 */
                                /* 495 */
};

/* OLDGNU_MAGIC uses both magic and version fields, which are contiguous.
Found in an archive, it indicates an old GNU header format, which will
hopefully become obsolescent. With OLDGNU_MAGIC, uname and gname are
valid, though the header is not truly POSIX conforming. */
#define OLDGNU_MAGIC "ustar  " /* 7 chars and a null */

/* The standards committee allows only capital A through capital Z for
user-defined expansion. */

/* This is a dir entry that contains the names of files that were in the
dir at the time the dump was made. */
#define GNUTYPE_DUMPDIR 'D'

/* Identifies the *next* file on the tape as having a long linkname. */
#define GNUTYPE_LONGLINK 'K'

/* Identifies the *next* file on the tape as having a long name. */
#define GNUTYPE_LONGNAME 'L'

/* This is the continuation of a file that began on another volume. */
#define GNUTYPE_MULTIVOL 'M'

/* For storing filenames that do not fit into the main header. */
#define GNUTYPE_NAMES 'N'

/* This is for sparse files. */
#define GNUTYPE_SPARSE 'S'

/* This file is a tape/volume header. Ignore it on extraction. */
#define GNUTYPE_VOLHDR 'V'

/*--------------------------------------.
| tar Header Block, overall structure. |
`--------------------------------------*/

/* tar files are made in basic blocks of this size. */
#define BLOCKSIZE 512

enum archive_format
{
  DEFAULT_FORMAT,               /* format to be decided later */
  V7_FORMAT,                    /* old V7 tar format */
  OLDGNU_FORMAT,                /* GNU format as per before tar 1.12 */
  POSIX_FORMAT,                 /* restricted, pure POSIX format */
  GNU_FORMAT                    /* POSIX format with GNU extensions */
};

union block
{
  char buffer[BLOCKSIZE];
  struct posix_header header;
  struct extra_header extra_header;
  struct oldgnu_header oldgnu_header;
  struct sparse_header sparse_header;
};

/* End of Format description. */
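Because the format depends on these byte offsets, it can be worth
letting the compiler check them. The sketch below copies
`struct posix_header' from the listing and checks several offsets with
C11 `_Static_assert' and `offsetof'. It relies on the compiler
inserting no padding between `char' members; common compilers behave
this way, but the C standard does not strictly guarantee it, so the
assertions document an assumption rather than a certainty.
`posix_header_size' is a hypothetical name.

```c
#include <stddef.h>

/* Copy of struct posix_header from the listing above. */
struct posix_header
{                               /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
                                /* 500 */
};

/* Compile-time checks that the offsets match the comments. */
_Static_assert(offsetof(struct posix_header, mode) == 100, "mode");
_Static_assert(offsetof(struct posix_header, size) == 124, "size");
_Static_assert(offsetof(struct posix_header, chksum) == 148, "chksum");
_Static_assert(offsetof(struct posix_header, typeflag) == 156, "typeflag");
_Static_assert(offsetof(struct posix_header, magic) == 257, "magic");
_Static_assert(offsetof(struct posix_header, prefix) == 345, "prefix");
_Static_assert(sizeof(struct posix_header) == 500, "total size");

/* Run-time accessor, so the size can also be reported or tested. */
size_t posix_header_size(void)
{
  return sizeof(struct posix_header);
}
```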

All characters in header blocks are represented by using 8-bit
characters in the local variant of ASCII. Each field within the
structure is contiguous; that is, there is no padding used within the
structure. Each character on the archive medium is stored contiguously.

Bytes representing the contents of files (after the header block of
each file) are not translated in any way and are not constrained to
represent characters in any character set. The `tar' format does not
distinguish text files from binary files, and no translation of file
contents is performed.

The `name', `linkname', `magic', `uname', and `gname' fields are
null-terminated character strings. All other fields are zero-filled
octal numbers in ASCII. Each numeric field of width W contains W minus
2 digits, a space, and a null, except `size' and `mtime', which do not
contain the trailing null.
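A minimal sketch of writing and reading such a field, following the
layout just described (W minus 2 zero-filled octal digits, a space, a
null). The function names are hypothetical. The writer does not handle
the `size' and `mtime' variant without the trailing null, while the
reader skips leading spaces and stops at the first non-octal byte, so
it accepts either form.

```c
#include <stddef.h>

/* Write VALUE into FIELD of width W: W-2 zero-filled octal digits,
   a space, then a null.  Returns 0, or -1 if VALUE needs more than
   W-2 digits (FIELD contents are then unspecified). */
int to_octal_field(char *field, size_t w, unsigned long value)
{
  if (w < 3)
    return -1;
  size_t digits = w - 2;
  /* Fill the digit positions from the right. */
  for (size_t i = digits; i-- > 0; )
    {
      field[i] = (char) ('0' + (value & 7));
      value >>= 3;
    }
  if (value != 0)
    return -1;                  /* value did not fit */
  field[digits] = ' ';
  field[digits + 1] = '\0';
  return 0;
}

/* Read a numeric field of width W, skipping leading spaces and
   stopping at the first byte that is not an octal digit. */
unsigned long from_octal_field(const char *field, size_t w)
{
  size_t i = 0;
  unsigned long value = 0;
  while (i < w && field[i] == ' ')
    i++;
  while (i < w && field[i] >= '0' && field[i] <= '7')
    {
      value = (value << 3) | (unsigned long) (field[i] - '0');
      i++;
    }
  return value;
}
```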

The `name' field is the file name of the file, with directory names
(if any) preceding the file name, separated by slashes.

The `mode' field provides nine bits specifying file permissions and
three bits to specify the Set UID, Set GID, and Save Text ("sticky")
modes. Values for these bits are defined above. When special
permissions are required to create a file with a given mode, and the
user restoring files from the archive does not hold such permissions,
the mode bit(s) specifying those special permissions are ignored.
Modes which are not supported by the operating system restoring files
from the archive will be ignored. Unsupported modes should be faked up
when creating or updating an archive; e.g. the group permission could
be copied from the _other_ permission.
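To illustrate how the mode bits defined in the listing combine, this
sketch renders a decoded `mode' value as an `ls'-style permission
string. The `s'/`S' and `t'/`T' letters follow the usual `ls'
convention for set-UID, set-GID and sticky bits with or without the
matching execute bit; the function names are hypothetical.

```c
/* Mode bits from src/tar.h (values in octal). */
#define TSUID   04000
#define TSGID   02000
#define TSVTX   01000
#define TUREAD  00400
#define TUWRITE 00200
#define TUEXEC  00100
#define TGREAD  00040
#define TGWRITE 00020
#define TGEXEC  00010
#define TOREAD  00004
#define TOWRITE 00002
#define TOEXEC  00001

/* Character for an execute position: plain 'x'/'-', or, when the
   special bit is on, LOWER if execute is also set, else UPPER. */
static char exec_char(unsigned mode, unsigned exec_bit,
                      unsigned special_bit, char lower, char upper)
{
  if (mode & special_bit)
    return (mode & exec_bit) ? lower : upper;
  return (mode & exec_bit) ? 'x' : '-';
}

/* Render the 12 permission bits of MODE as a string such as
   "rwsr-xr-x" into OUT, which must have room for 10 characters. */
void mode_string(unsigned mode, char out[10])
{
  out[0] = (mode & TUREAD)  ? 'r' : '-';
  out[1] = (mode & TUWRITE) ? 'w' : '-';
  out[2] = exec_char(mode, TUEXEC, TSUID, 's', 'S');
  out[3] = (mode & TGREAD)  ? 'r' : '-';
  out[4] = (mode & TGWRITE) ? 'w' : '-';
  out[5] = exec_char(mode, TGEXEC, TSGID, 's', 'S');
  out[6] = (mode & TOREAD)  ? 'r' : '-';
  out[7] = (mode & TOWRITE) ? 'w' : '-';
  out[8] = exec_char(mode, TOEXEC, TSVTX, 't', 'T');
  out[9] = '\0';
}
```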

The `uid' and `gid' fields are the numeric user and group ID of the
file owners, respectively. If the operating system does not support
numeric user or group IDs, these fields should be ignored.

The `size' field is the size of the file in bytes; linked files are
archived with this field specified as zero. (See also the
`--incremental' (`-G') option.)

The `mtime' field is the modification time of the file at the time
it was archived. It is the ASCII representation of the octal value of
the last time the file was modified, represented as an integer number of
seconds since January 1, 1970, 00:00 Coordinated Universal Time.

The `chksum' field is the ASCII representation of the octal value of
the simple sum of all bytes in the header block. Each 8-bit byte in
the header is added to an unsigned integer, initialized to zero, the
precision of which shall be no less than seventeen bits. When
calculating the checksum, the `chksum' field is treated as if it were
all blanks.
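The checksum rule translates directly into C. This sketch assumes the
`chksum' field sits at byte offset 148 with width 8, as in
`struct posix_header' above. At most 512 * 255 = 130560 can be summed,
which is below 2^17 = 131072, hence the seventeen-bit requirement; an
`unsigned long' (at least 32 bits) is more than enough.
`header_checksum' is a hypothetical name.

```c
#include <stddef.h>

#define BLOCKSIZE 512
#define CHKSUM_OFFSET 148   /* offset of chksum in struct posix_header */
#define CHKSUM_LEN 8

/* Unsigned sum of all header bytes, counting the chksum field itself
   as if it held eight ASCII blanks. */
unsigned long header_checksum(const unsigned char block[BLOCKSIZE])
{
  unsigned long sum = 0;
  for (size_t i = 0; i < BLOCKSIZE; i++)
    {
      if (i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN)
        sum += ' ';
      else
        sum += block[i];
    }
  return sum;
}
```

A reader would compare this value against the octal number stored in
`chksum'. For an all-zero block the sum is 256 (eight blanks), while
its stored field reads as zero, so such a block never passes as a
valid header.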

The `typeflag' field specifies the type of file archived. If a
particular implementation does not recognize or permit the specified
type, the file will be extracted as if it were a regular file. As this
action occurs, `tar' issues a warning to the standard error.

The `atime' and `ctime' fields are used in making incremental
backups; they store, respectively, the particular file's access time
and last inode-change time.

The `offset' is used by the `--multi-volume' (`-M') option, when
making a multi-volume archive. The offset is the number of bytes into
the file at which to resume when continuing the file on the next tape,
i.e., it records the point at which a continued file picks up.

The following fields were added to deal with sparse files. A file
is "sparse" if it contains unallocated blocks which read back as
zeros, i.e., no useful data. One test of whether a file is sparse is
to compare the number of blocks allocated for it with the number of
bytes in the file; if fewer blocks are allocated than would normally
be allocated for a file of that size, then the file is sparse. This is
the method `tar' uses to detect a sparse file, and once such a file is
detected, it is treated differently from non-sparse files.
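The allocation test can be sketched with `stat'. This assumes
`st_blocks' counts 512-byte units, which is the common Unix convention
but is not required by POSIX; file systems with transparent
compression can also report fewer blocks than the size suggests. The
names `looks_sparse' and `file_looks_sparse' are hypothetical.

```c
#include <sys/stat.h>

/* A file looks sparse if fewer bytes are allocated to it than its
   length.  ST_BLOCKS is assumed to be in 512-byte units. */
int looks_sparse(long long st_blocks, long long st_size)
{
  return st_blocks * 512 < st_size;
}

/* Apply the test to a file on disk: returns 1 or 0, or -1 if the
   file cannot be examined. */
int file_looks_sparse(const char *path)
{
  struct stat st;
  if (stat(path, &st) != 0)
    return -1;
  return looks_sparse((long long) st.st_blocks, (long long) st.st_size);
}
```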

Sparse files are often `dbm' files, or other database-type files
which have data at some points and emptiness in the greater part of the
file. Such files can appear to be very large when an `ls -l' is done
on them, when in truth, there may be a very small amount of important
data contained in the file. It is thus undesirable to have `tar' think
that it must back up this entire file, as great quantities of room are
wasted on empty blocks, which can lead to running out of room on a tape
far earlier than is necessary. Thus, sparse files are dealt with so
that these empty blocks are not written to the tape. Instead, what is
written to the tape is a description, of sorts, of the sparse file:
where the holes are, how big the holes are, and how much data is found
at the end of the hole. This way, the file takes up potentially far
less room on the tape, and when the file is extracted later on, it will
look exactly the way it looked beforehand. The following is a
description of the fields used to handle a sparse file:

The `sp' is an array of `struct sparse'. Each `struct sparse'
contains two 12-character strings which represent an offset into the
file and a number of bytes to be written at that offset. The offset is
absolute, not relative to the offset in the preceding array element.
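To make the absolute-offset convention concrete, here is a
hypothetical sketch of how an extractor could rebuild a sparse file in
memory. It assumes the `struct sparse' entries have already been
decoded from octal into the illustrative `struct sparse_entry', and
that the archived data regions are stored back to back in map order;
the holes between regions come out as zeros.

```c
#include <stddef.h>
#include <string.h>

/* Decoded form of one `struct sparse' descriptor (hypothetical). */
struct sparse_entry
{
  size_t offset;                /* absolute offset in the file */
  size_t numbytes;              /* bytes of data stored for this region */
};

/* Rebuild a file of REALSIZE bytes into OUT from N map entries and the
   concatenated region DATA.  Returns 0, or -1 if a region would fall
   outside OUT. */
int expand_sparse(const struct sparse_entry *map, size_t n,
                  const char *data, char *out, size_t realsize)
{
  memset(out, 0, realsize);     /* holes read back as zeros */
  size_t consumed = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (map[i].offset > realsize
          || map[i].numbytes > realsize - map[i].offset)
        return -1;
      /* Offsets are absolute, so each region goes straight to its place. */
      memcpy(out + map[i].offset, data + consumed, map[i].numbytes);
      consumed += map[i].numbytes;
    }
  return 0;
}
```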

The header can hold four of these `struct sparse' at the moment; if
more are needed, they are not stored in the header.

The `isextended' flag is set when an `extended_header' is needed to
deal with a file. This means the flag can only be set for a sparse
file, and only when the description of the file does not fit in the
room allotted for sparse structures in the header.

The `extended_header' structure is used for sparse files which need
more sparse structures than can fit in the header. The header can fit
4 such structures; if more are needed, the flag `isextended' gets set
and the next block is an `extended_header'.

Each `extended_header' structure contains an array of 21 sparse
structures, along with a similar `isextended' flag that the header had.
There can be an indeterminate number of such `extended_header's to
describe a sparse file.
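With four descriptors in the header and 21 in each extension header
(this appears to correspond to `struct sparse_header' in the listing,
which the text calls `extended_header'), the number of extension
headers follows from ceiling division. A hypothetical helper, assuming
the old GNU header layout:

```c
#define SPARSES_IN_OLDGNU_HEADER 4
#define SPARSES_IN_SPARSE_HEADER 21

/* Extension headers needed after an old GNU header for a file whose
   map has N sparse descriptors. */
unsigned long sparse_extension_headers(unsigned long n)
{
  if (n <= SPARSES_IN_OLDGNU_HEADER)
    return 0;                   /* everything fits in the main header */
  n -= SPARSES_IN_OLDGNU_HEADER;
  return (n + SPARSES_IN_SPARSE_HEADER - 1) / SPARSES_IN_SPARSE_HEADER;
}
```

A file described by 25 descriptors needs one extension header
(4 + 21); a 26th descriptor requires a second. The `isextended' flag
would then be set on the main header and on every extension header
except the last.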

`REGTYPE'
`AREGTYPE'
These flags represent a regular file. In order to be compatible
with older versions of `tar', a `typeflag' value of `AREGTYPE'
should be silently recognized as a regular file. New archives
should be created using `REGTYPE'. Also, for backward
compatibility, `tar' treats a regular file whose name ends with a
slash as a directory.

`LNKTYPE'
This flag represents a file linked to another file, of any type,
previously archived. Such files are identified in Unix by each
file having the same device and inode number. The linked-to name
is specified in the `linkname' field with a trailing null.

`SYMTYPE'
This represents a symbolic link to another file. The linked-to
name is specified in the `linkname' field with a trailing null.

`CHRTYPE'
`BLKTYPE'
These represent character special files and block special files
respectively. In this case the `devmajor' and `devminor' fields
will contain the major and minor device numbers respectively.
Operating systems may map the device specifications to their own
local specification, or may ignore the entry.

`DIRTYPE'
This flag specifies a directory or sub-directory. The directory
name in the `name' field should end with a slash. On systems where
disk allocation is performed on a directory basis, the `size' field
will contain the maximum number of bytes (which may be rounded to
the nearest disk block allocation unit) which the directory may
hold. A `size' field of zero indicates no such limiting. Systems
which do not support limiting in this manner should ignore the
`size' field.

`FIFOTYPE'
This specifies a FIFO special file. Note that the archiving of a
FIFO file archives the existence of this file and not its contents.

`CONTTYPE'
This specifies a contiguous file, which is the same as a normal
file except that, in operating systems which support it, all its
space is allocated contiguously on the disk. Operating systems
which do not allow contiguous allocation should silently treat this
type as a normal file.

`A' ... `Z'
These are reserved for custom implementations. Some of these are
used in the GNU modified format, as described below.

Other values are reserved for specification in future revisions of
the P1003 standard, and should not be used by any `tar' program.
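The typeflag rules above, including the fallback to a regular file
with a warning and the trailing-slash convention for regular-file
entries, can be summarized in one switch. `entry_kind' is a
hypothetical helper that only classifies entries; it assumes the
reading system has no contiguous-file support, so `CONTTYPE' is
silently treated as regular, and it does not know about the GNU letters
described in the next section.

```c
#include <stdio.h>
#include <string.h>

/* Typeflag values from src/tar.h. */
#define REGTYPE  '0'
#define AREGTYPE '\0'
#define LNKTYPE  '1'
#define SYMTYPE  '2'
#define CHRTYPE  '3'
#define BLKTYPE  '4'
#define DIRTYPE  '5'
#define FIFOTYPE '6'
#define CONTTYPE '7'

/* Describe how an entry with typeflag TYPE and member NAME is treated. */
const char *entry_kind(char type, const char *name)
{
  size_t len = strlen(name);
  switch (type)
    {
    case REGTYPE:
    case AREGTYPE:
      /* Backward compatibility: a regular file whose name ends in a
         slash is a directory. */
      return (len > 0 && name[len - 1] == '/') ? "directory" : "regular";
    case LNKTYPE:  return "hard link";
    case SYMTYPE:  return "symbolic link";
    case CHRTYPE:  return "character special";
    case BLKTYPE:  return "block special";
    case DIRTYPE:  return "directory";
    case FIFOTYPE: return "fifo";
    case CONTTYPE: return "regular";  /* no contiguous allocation here */
    default:
      fprintf(stderr, "warning: unknown typeflag '%c', "
              "extracting as a regular file\n", type);
      return "regular";
    }
}
```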

The `magic' field indicates that this archive was output in the
P1003 archive format. If this field contains `TMAGIC', the `uname' and
`gname' fields will contain the ASCII representation of the owner and
group of the file respectively. If these names are found on the
extracting system, the corresponding user and group IDs are used
rather than the values in the `uid' and `gid' fields.
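A sketch of telling header flavors apart from the contiguous `magic'
and `version' fields at offsets 257 and 263, using the constants from
the listing. Treating a header with no recognized magic as a pre-POSIX
(V7) header is an assumption of this sketch, as is skipping any check
of the `version' field; the names are hypothetical.

```c
#include <string.h>

/* Values from src/tar.h. */
#define TMAGIC       "ustar"    /* ustar and a null */
#define TMAGLEN      6
#define TVERSLEN     2
#define OLDGNU_MAGIC "ustar  "  /* 7 chars and a null */

enum header_kind { HDR_V7, HDR_POSIX, HDR_OLDGNU };

/* Classify the 512-byte header at BLOCK by its magic field. */
enum header_kind classify_header(const char *block)
{
  const char *magic = block + 257;
  /* OLDGNU_MAGIC spans both magic and version (8 bytes with its null). */
  if (memcmp(magic, OLDGNU_MAGIC, TMAGLEN + TVERSLEN) == 0)
    return HDR_OLDGNU;
  /* TMAGIC plus its null fills the 6-byte magic field. */
  if (memcmp(magic, TMAGIC, TMAGLEN) == 0)
    return HDR_POSIX;
  return HDR_V7;
}
```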

For references, see ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990,
pages 169-173 (section 10.1) for `Archive/Interchange File Format'; and
IEEE Std 1003.2-1992, pages 380-388 (section 4.48) and pages 936-940
(section E.4.48) for `pax - Portable archive interchange'.

File: tar.info, Node: Extensions, Next: cpio, Prev: Standard, Up: Formats

GNU Extensions to the Archive Format
====================================

The GNU format uses additional file types to describe new types of
files in an archive. These are listed below.

`GNUTYPE_DUMPDIR'
`'D''
This represents a directory and a list of files created by the
`--incremental' (`-G') option. The `size' field gives the total
size of the associated list of files. Each file name is preceded
by either a `Y' (the file should be in this archive) or an `N'
(the file is a directory, or is not stored in the archive). Each
file name is terminated by a null. There is an additional null
after the last file name.

`GNUTYPE_MULTIVOL'
`'M''
This represents a file continued from another volume of a
multi-volume archive created with the `--multi-volume' (`-M')
option. The original type of the file is not given here. The
`size' field gives the maximum size of this piece of the file
(assuming the volume does not end before the file is written out).
The `offset' field gives the offset from the beginning of the
file where this part of the file begins. Thus `size' plus
`offset' should equal the original size of the file.

`GNUTYPE_SPARSE'
`'S''
This flag indicates that we are dealing with a sparse file.
Archiving a sparse file requires special operations to find the
holes in the file; the archive records the position of each hole,
along with the number of bytes of data found after it.

`GNUTYPE_VOLHDR'
`'V''
This file type is used to mark the volume header that was given
with the `--label=ARCHIVE-LABEL' (`-V ARCHIVE-LABEL') option when
the archive was created. The `name' field contains the `name'
given after the `--label=ARCHIVE-LABEL' (`-V ARCHIVE-LABEL')
option. The `size' field is zero. Only the first file in each
volume of an archive should have this type.
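The `GNUTYPE_DUMPDIR' body described above is simple to walk. This
hypothetical helper assumes the buffer holds exactly the `size' bytes
of the member, and reports each entry's `Y'/`N' flag and name through
an optional callback.

```c
#include <stddef.h>
#include <string.h>

/* Walk a dumpdir body of SIZE bytes: each entry is 'Y' or 'N' followed
   by a null-terminated name, and an extra null ends the list.  Calls
   VISIT (if non-null) per entry; returns the entry count, or -1 if the
   data is malformed. */
int walk_dumpdir(const char *buf, size_t size,
                 void (*visit)(char flag, const char *name, void *ctx),
                 void *ctx)
{
  size_t pos = 0;
  int count = 0;
  while (pos < size && buf[pos] != '\0')
    {
      char flag = buf[pos];
      if (flag != 'Y' && flag != 'N')
        return -1;
      const char *name = buf + pos + 1;
      const void *end = memchr(name, '\0', size - pos - 1);
      if (end == NULL)
        return -1;              /* name runs past the buffer */
      if (visit != NULL)
        visit(flag, name, ctx);
      count++;
      pos = (size_t) ((const char *) end - buf) + 1;
    }
  return count;
}
```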

You may have trouble reading a GNU format archive on a non-GNU
system if the options `--incremental' (`-G'), `--multi-volume' (`-M'),
`--sparse' (`-S'), or `--label=ARCHIVE-LABEL' (`-V ARCHIVE-LABEL') were
used when writing the archive. In general, if `tar' does not use the
GNU-added fields of the header, other versions of `tar' should be able
to read the archive. Otherwise, the `tar' program will give an error,
the most likely one being a checksum error.

File: tar.info, Node: cpio, Prev: Extensions, Up: Formats

Comparison of `tar' and `cpio'
==============================

The `cpio' archive formats, like `tar', do have maximum pathname
lengths. The binary and old ASCII formats have a max path length of
256, and the new ASCII and CRC ASCII formats have a max path length of
1024. GNU `cpio' can read and write archives with arbitrary pathname
lengths, but other `cpio' implementations may crash inexplicably
trying to read them.

`tar' handles symbolic links in the form in which it comes in BSD;
`cpio' doesn't handle symbolic links in the form in which it comes in
System V prior to SVR4, and some vendors may have added symlinks to
their system without enhancing `cpio' to know about them. Others may
have enhanced it in a way other than the way I did it at Sun, and which
was adopted by AT&T (and which is, I think, also present in the `cpio'
that Berkeley picked up from AT&T and put into a later BSD release--I
think I gave them my changes).

(SVR4 does some funny stuff with `tar'; basically, its `cpio' can
handle `tar' format input, and write it on output, and it probably
handles symbolic links. They may not have bothered doing anything to
enhance `tar' as a result.)

`cpio' handles special files; traditional `tar' doesn't.

`tar' comes with V7, System III, System V, and BSD source; `cpio'
comes only with System III, System V, and later BSD (4.3-tahoe and
later).

`tar''s way of handling multiple hard links to a file can handle
file systems that support 32-bit inumbers (e.g., the BSD file system);
`cpio's way requires you to play some games (in its "binary" format,
i-numbers are only 16 bits, and in its "portable ASCII" format, they're
18 bits--it would have to play games with the "file system ID" field of
the header to make sure that the file system ID/i-number pairs of
different files were always different), and I don't know which `cpio's,
if any, play those games. Those that don't might get confused and
think two files are the same file when they're not, and make hard links
between them.

`tar's way of handling multiple hard links to a file places only one
copy of the link on the tape, but the name attached to that copy is the
_only_ one you can use to retrieve the file; `cpio's way puts one copy
for every link, but you can retrieve it using any of the names.

What type of checksum (if any) is used, and how is it
calculated?

See the attached manual pages for `tar' and `cpio' format. `tar'
uses a checksum which is the sum of all the bytes in the `tar' header
for a file; `cpio' uses no checksum.

If anyone knows why `cpio' was made when `tar' was present at the
unix scene,

It wasn't. `cpio' first showed up in PWB/UNIX 1.0; no
generally-available version of UNIX had `tar' at the time. I don't
know whether any version that was generally available _within AT&T_ had
`tar', or, if so, whether the people within AT&T who did `cpio' knew
about it.

On restore, if there is a corruption on a tape `tar' will stop at
that point, while `cpio' will skip over it and try to restore the rest
of the files.

The main difference is just in the command syntax and header format.

`tar' is a little more tape-oriented in that everything is blocked
to start on a record boundary.

Are there any differences between the two of them in the ability to
recover crashed archives? (Is there any chance of recovering crashed
archives at all?)

Theoretically it should be easier under `tar' since the blocking
lets you find a header with some variation of `dd skip=NN'. However,
modern `cpio''s and variations have an option to just search for the
next file header after an error with a reasonable chance of re-syncing.
However, lots of tape driver software won't allow you to continue past
a media error, which should be the only reason for getting out of sync
unless a file changed size while you were writing the archive.
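The `tar' side of that argument can be sketched: because every header
starts on a block boundary, a recovery tool can step through a damaged
image 512 bytes at a time and look for a block whose stored `chksum'
matches its contents. This hypothetical helper assumes the image is
already in memory and uses the layout of `struct posix_header'; a match
is good evidence of a header, not proof, since a data block can match
by chance.

```c
#include <stddef.h>

#define BLOCKSIZE 512

/* Header sum with the chksum field (offset 148, 8 bytes) as blanks. */
static unsigned long block_sum(const unsigned char *b)
{
  unsigned long sum = 0;
  for (size_t i = 0; i < BLOCKSIZE; i++)
    sum += (i >= 148 && i < 156) ? ' ' : b[i];
  return sum;
}

/* Octal value stored in the chksum field, skipping leading spaces;
   0 if the field holds no digits. */
static unsigned long stored_sum(const unsigned char *b)
{
  size_t i = 148;
  unsigned long v = 0;
  while (i < 156 && b[i] == ' ')
    i++;
  while (i < 156 && b[i] >= '0' && b[i] <= '7')
    v = (v << 3) | (unsigned long) (b[i++] - '0');
  return v;
}

/* Index of the first block at or after START whose stored checksum
   matches its contents, or -1.  Blocks whose field holds no digits
   (such as all-zero blocks) are skipped. */
long next_header_block(const unsigned char *image, size_t nblocks,
                       size_t start)
{
  for (size_t k = start; k < nblocks; k++)
    {
      const unsigned char *b = image + k * BLOCKSIZE;
      unsigned long want = stored_sum(b);
      if (want != 0 && want == block_sum(b))
        return (long) k;
    }
  return -1;
}
```

The block index found could then be passed to something like
`dd bs=512 skip=K' to read the archive from that point.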

If anyone knows why `cpio' was made when `tar' was present at the
unix scene, please tell me about this too.

Probably because it is more media-efficient (it does not block
everything, and its headers take only the space they need, whereas
`tar' always uses 512 bytes per file header), and because it knows how
to archive special files.

You might want to look at the freely available alternatives. The
major ones are `afio', GNU `tar', and `pax', each of which has its
own extensions with some backward compatibility.

Sparse files were `tar'red as sparse files (which you can easily
test, because the resulting archive gets smaller, and GNU `cpio' can no
longer read it).

File: tar.info, Node: Media, Next: Index, Prev: Formats, Up: Top

Tapes and Other Archive Media
*****************************

A few special cases of tape handling warrant a more detailed
description; they are discussed below.

Many complexities surround the use of `tar' on tape drives. Since
the creation and manipulation of archives located on magnetic tape was
the original purpose of `tar', it contains many features that make such
manipulation easier.

Archives are usually written on dismountable media--tape cartridges,
mag tapes, or floppy disks.

The amount of data a tape or disk holds depends not only on its size,
but also on how it is formatted. A 2400 foot long reel of mag tape
holds 40 megabytes of data when formatted at 1600 bits per inch. The
physically smaller EXABYTE tape cartridge holds 2.3 gigabytes.
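The effect of formatting on capacity can be worked out with a little arithmetic. The Python sketch below assumes 9-track tape, where each of the 1600 frames per inch carries one byte, and an inter-record gap of 0.6 inch (an assumed value for 1600 bpi tape); the function names are hypothetical.

```python
from fractions import Fraction

INCHES_PER_FOOT = 12


def raw_capacity(length_ft: int, density_bpi: int) -> int:
    """Bytes if the whole tape were data: one byte per frame, no gaps."""
    return length_ft * INCHES_PER_FOOT * density_bpi


def blocked_capacity(length_ft: int, density_bpi: int, record_bytes: int,
                     gap_in: Fraction = Fraction(6, 10)) -> int:
    """Bytes that fit when every record is followed by an inter-record gap.

    The 0.6 inch default gap is an assumption for 1600 bpi tape.
    """
    record_in = Fraction(record_bytes, density_bpi) + gap_in
    records = int((length_ft * INCHES_PER_FOOT) // record_in)
    return records * record_bytes


print(raw_capacity(2400, 1600))              # 46080000
print(blocked_capacity(2400, 1600, 10240))   # 42127360
print(blocked_capacity(2400, 1600, 512))     # 16027648
```

With 10240-byte records the estimate is roughly 42 million bytes (about 40 MiB), close to the figure above, while writing single 512-byte blocks wastes more than half of the reel on gaps. Exact fractions are used so the record count is not disturbed by floating-point rounding.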

Magnetic media are re-usable: once the archive on a tape is no longer
needed, the archive can be erased and the tape or disk used again.
Media quality does deteriorate with use, however. Most tapes or disks
should be discarded when they begin to produce data errors. EXABYTE
tape cartridges should be discarded when they generate an "error count"
(number of non-usable bits) of more than 10k.

Magnetic media are written and erased using magnetic fields, and
should be protected from such fields to avoid damage to stored data.
Sticking a floppy disk to a filing cabinet using a magnet is probably
not a good idea.

* Menu:

* Device:: Device selection and switching
* Remote Tape Server::
* Common Problems and Solutions::
* Blocking:: Blocking
* Many:: Many archives on one tape
* Using Multiple Tapes:: Using Multiple Tapes
* label:: Including a Label in the Archive
* verify::
* Write Protection::

File: tar.info, Node: Device, Next: Remote Tape Server, Prev: Media, Up: Media

Device Selection and Switching
==============================

`-f [HOSTNAME:]FILE'
`--file=[HOSTNAME:]FILE'
Use archive file or device FILE on HOSTNAME.

This option is used to specify the file name of the archive `tar'
works on.

If the file name is `-', `tar' reads the archive from standard input
(when listing or extracting), or writes it to standard output (when
creating). If the `-' file name is given when updating an archive,
`tar' will read the original archive from its standard input, and will
write the entire new archive to its standard output.

If the file name contains a `:', it is interpreted as `hostname:file
name'. If the HOSTNAME contains an "at" sign (`@'), it is treated as
`user@hostname:file name'. In either case, `tar' will invoke the
command `rsh' (or `remsh') to start up an `/etc/rmt' on the remote
machine. If you give an alternate login name, it is passed to `rsh'.
Naturally, the remote machine must have an executable `/etc/rmt'. This
program is free software from the University of California, and a copy
of its source code can be found with the sources for `tar'; it is
compiled and installed by default.
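The naming rules for `--file' can be summarized in a short Python sketch. `parse_archive_name', `rmt_command' and `ArchiveSpec' are hypothetical names; `force_local' mirrors `--force-local', and the `rsh' parameter plays the role of `--rsh-command'. The sketch treats the first colon as the host separator exactly as stated above and assumes the BSD `rsh -l USER HOST COMMAND' calling convention; actual `tar' releases may apply additional checks or pass arguments in a different order.

```python
from typing import List, NamedTuple, Optional


class ArchiveSpec(NamedTuple):
    user: Optional[str]   # alternate login name for rsh, if any
    host: Optional[str]   # None means the archive is local
    path: str             # file or device name ("-" means stdin/stdout)


def parse_archive_name(name: str, force_local: bool = False) -> ArchiveSpec:
    """Split `[user@]host:file' as described in the text (sketch only)."""
    if force_local or ":" not in name:
        return ArchiveSpec(None, None, name)
    host, path = name.split(":", 1)
    user = None
    if "@" in host:
        user, host = host.split("@", 1)
    return ArchiveSpec(user, host, path)


def rmt_command(spec: ArchiveSpec, rsh: str = "rsh") -> List[str]:
    """Command line that would start /etc/rmt on the remote host."""
    if spec.host is None:
        raise ValueError("archive is local; no remote tape server needed")
    argv = [rsh]
    if spec.user is not None:
        argv += ["-l", spec.user]   # BSD rsh syntax for an alternate login
    return argv + [spec.host, "/etc/rmt"]


print(parse_archive_name("alice@tapehost:/dev/rmt0"))
# ArchiveSpec(user='alice', host='tapehost', path='/dev/rmt0')
```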

If this option is not given, but the environment variable `TAPE' is
set, its value is used; otherwise, old versions of `tar' used a default
archive name (which was picked when `tar' was compiled). The default
is normally set up to be the "first" tape drive or other transportable
I/O medium on the system.

Starting with version 1.11.5, GNU `tar' uses standard input and
standard output as the default device, and I will no longer try to
support automatic device detection at installation time. It failed in
far too many cases; it was hopeless. It is now entirely up to the
installer to override standard input and standard output as the default
device, if that seems preferable. Moreover, I think _most_ actual uses
of `tar' involve pipes or disks, not tapes, cartridges or diskettes.

Some users think that using standard input and output is asking for
trouble. It could lead to a nasty surprise on your screen if you forget
to specify an output file name, especially if you are going through a
network or terminal server capable of buffering large amounts of
output. We had so many bug reports about configuring default tapes
automatically, and so many contradictory requests, that we finally
concluded the problem cannot be solved portably. We could of course use
something like `/dev/tape' as a default, but this is _also_ asking for
various kinds of trouble, ranging from hung processes to accidental
destruction of real tapes. Having seen all this mess, using standard
input and output as the default really seems like the only clean choice
left, and a very useful one too.
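The order in which the archive name is chosen can be summarized with a small Python sketch. `default_archive' and its parameters are hypothetical; `compiled_default' stands for whatever the installer configured, which is `-' (standard input or output) unless it was overridden.

```python
import os
from typing import Mapping, Optional


def default_archive(file_option: Optional[str],
                    environ: Mapping[str, str] = os.environ,
                    compiled_default: str = "-") -> str:
    """Pick the archive name: -f/--file first, then $TAPE, then the default.

    file_option      -- value given to -f/--file, or None if absent
    environ          -- environment consulted for TAPE
    compiled_default -- installer's choice; "-" means stdin/stdout
    """
    if file_option is not None:
        return file_option
    tape = environ.get("TAPE")
    if tape is not None:
        return tape
    return compiled_default


print(default_archive(None, {"TAPE": "/dev/rmt0"}))   # /dev/rmt0
```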

GNU `tar' reads and writes archives in records; I suspect this is the
main reason why block devices are preferred over character devices.
Block devices are most likely more efficient, too. The installer could
also check for `DEFTAPE' in `<sys/mtio.h>'.

`--force-local'
Archive file is local even if it contains a colon.

`--rsh-command=COMMAND'
Use remote COMMAND instead of `rsh'. This option exists so that
people who use something other than the standard `rsh' (e.g., a
Kerberized `rsh') can access a remote device.

When this option is not used, the shell command found when the
`tar' program was installed is used instead. This is the first one
found among `/usr/ucb/rsh', `/usr/bin/remsh', `/usr/bin/rsh',
`/usr/bsd/rsh' and `/usr/bin/nsh'. The installer may have
overridden this by defining the environment variable `RSH' _at
installation time_.

`-[0-7][lmh]'
Specify drive and density.

`-M'
`--multi-volume'
Create/list/extract multi-volume archive.

This option causes `tar' to write a "multi-volume" archive--one
that may be larger than will fit on the medium used to hold it.
*Note Multi-Volume Archives::.

`-L NUM'
`--tape-length=NUM'
Change tape after writing NUM x 1024 bytes.

This option might be useful when your tape drivers do not properly
detect the end of physical tapes. By being slightly conservative
about the maximum tape length, you might avoid the problem entirely.

`-F FILE'
`--info-script=FILE'
`--new-volume-script=FILE'
Execute FILE at the end of each tape. This implies `--multi-volume'
(`-M').
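Because `--tape-length' counts in units of 1024 bytes while `tar' writes whole records, a quick calculation shows how much of each volume is used. This Python sketch assumes the volume is switched on a record boundary and ignores the extra header blocks written at the start of each continuation volume, so its volume count is a lower-bound estimate; the function names are hypothetical.

```python
BLOCK_SIZE = 512
DEFAULT_BLOCKING_FACTOR = 20


def records_per_volume(tape_length: int,
                       blocking_factor: int = DEFAULT_BLOCKING_FACTOR) -> int:
    """Whole records that fit in `--tape-length=tape_length' (1024-byte units)."""
    return (tape_length * 1024) // (blocking_factor * BLOCK_SIZE)


def volumes_needed(archive_bytes: int, tape_length: int,
                   blocking_factor: int = DEFAULT_BLOCKING_FACTOR) -> int:
    """Lower-bound estimate; ignores headers repeated on continuation volumes."""
    per_volume = (records_per_volume(tape_length, blocking_factor)
                  * blocking_factor * BLOCK_SIZE)
    if per_volume == 0:
        raise ValueError("tape length is shorter than one record")
    return -(-archive_bytes // per_volume)     # ceiling division


print(records_per_volume(40000))   # 4000
```

For example, `-L 40000' allows 40,960,000 bytes, or exactly 4000 records at the default blocking factor of 20; choosing a value somewhat below the nominal capacity of the medium leaves the safety margin suggested above.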

File: tar.info, Node: Remote Tape Server, Next: Common Problems and Solutions, Prev: Device, Up: Media

The Remote Tape Server
======================

In order to access the tape drive on a remote machine, `tar' uses
the remote tape server written at the University of California at
Berkeley. The remote tape server must be installed as `/etc/rmt' on
any machine whose tape drive you want to use. `tar' calls `/etc/rmt'
by running an `rsh' or `remsh' to the remote machine, optionally using
a different login name if one is supplied.

A copy of the source for the remote tape server is provided. It is
Copyright (C) 1983 by the Regents of the University of California, but
can be freely distributed. Instructions for compiling and installing
it are included in the `Makefile'.

Unless you use the `--absolute-names' (`-P') option, GNU `tar' will
not allow you to create an archive that contains absolute file names (a
file name beginning with `/'). If you try, `tar' will automatically
remove the leading `/' from the file names it stores in the archive.
It will also print a warning message telling you what it is doing.

When reading an archive that was created with a different `tar'
program, GNU `tar' automatically extracts entries in the archive which
have absolute file names as if the file names were not absolute. This
is an important feature. A visitor here once gave a `tar' tape to an
operator to restore; the operator used Sun `tar' instead of GNU `tar',
and the result was that it replaced large portions of our `/bin' and
friends with versions from the tape; needless to say, we were unhappy
about having to recover the file system from backup tapes.

For example, if the archive contained a file `/usr/bin/computoy',
GNU `tar' would extract the file to `usr/bin/computoy', relative to the
current directory. If you want to extract the files in an archive to
the same absolute names that they had when the archive was created, you
should either do a `cd /' before extracting the files from the archive,
use the `--absolute-names' (`-P') option, or use the command
`tar -C / ...'.
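The name handling described above can be modeled in a few lines of Python. This is a simplified sketch assuming POSIX-style paths: `stored_name' drops leading slashes unless `absolute_names' (standing in for `-P') is true, and `extraction_target' resolves the result against a directory, as `cd' or `-C' would. Both function names are hypothetical, and real `tar' performs further checks not shown here.

```python
import posixpath


def stored_name(name: str, absolute_names: bool = False) -> str:
    """Member name as written to the archive; -P keeps a leading `/'."""
    if absolute_names:
        return name
    return name.lstrip("/")


def extraction_target(member: str, directory: str = ".",
                      absolute_names: bool = False) -> str:
    """Path a member lands on when tar runs in (or uses -C for) `directory'."""
    name = stored_name(member, absolute_names)
    if posixpath.isabs(name):          # only possible with absolute_names=True
        return posixpath.normpath(name)
    return posixpath.normpath(posixpath.join(directory, name))


print(extraction_target("/usr/bin/computoy"))                  # usr/bin/computoy
print(extraction_target("/usr/bin/computoy", directory="/"))   # /usr/bin/computoy
```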

Some versions of Unix (Ultrix 3.1 is known to have this problem) can
claim that a short write near the end of a tape succeeded when it
actually failed. This will keep the `-M' option from working
correctly. The best workaround at the moment is to use a significantly
larger blocking factor than the default of 20.

In order to update an archive, `tar' must be able to backspace the
archive in order to reread or rewrite a record that was just read (or
written). This is currently possible only on two kinds of files: normal
disk files (or any other file that can be backspaced with `lseek'), and
industry-standard 9-track magnetic tape (or any other kind of tape that
can be backspaced with the `MTIOCTOP' `ioctl').

This means that the `--append' (`-r'), `--update' (`-u'),
`--concatenate' (`--catenate', `-A'), and `--delete' commands will not
work on any other kind of file. Some media simply cannot be
backspaced, which means these commands and options will never be able
to work on them. These non-backspacing media include pipes and
cartridge tape drives.
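Whether a given descriptor can be backspaced the `lseek' way is easy to test. In the Python sketch below, `can_backspace' and `rewrite_last_record' are hypothetical helpers: seeking on a pipe fails with `ESPIPE', while a disk file lets a record be reread and rewritten in place. Tape drives are a separate case that this sketch does not handle: they are backspaced with the `MTIOCTOP' `ioctl' rather than `lseek', and a successful `lseek' on a character device does not prove it really moved.

```python
import errno
import os
import tempfile

RECORD_SIZE = 20 * 512   # default record: 20 blocks of 512 bytes


def can_backspace(fd: int) -> bool:
    """True if fd accepts lseek(), as disk files do; False for pipes."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
    except OSError as err:
        if err.errno == errno.ESPIPE:   # "illegal seek": pipes, FIFOs, sockets
            return False
        raise
    return True


def rewrite_last_record(fd: int, record: bytes) -> None:
    """Back up over the record just written, then write a replacement."""
    os.lseek(fd, -len(record), os.SEEK_CUR)
    os.write(fd, record)


# A pipe cannot be backspaced ...
read_end, write_end = os.pipe()
pipe_ok = can_backspace(write_end)
os.close(read_end)
os.close(write_end)

# ... but a disk file can, so the last record can be rewritten in place.
with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.write(fd, b"A" * RECORD_SIZE)
    os.write(fd, b"B" * RECORD_SIZE)
    rewrite_last_record(fd, b"C" * RECORD_SIZE)
    file_ok = can_backspace(fd)
    os.lseek(fd, 0, os.SEEK_SET)
    contents = os.read(fd, 3 * RECORD_SIZE)

print(pipe_ok, file_ok, contents[-1:])   # False True b'C'
```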

Some other media can be backspaced, and `tar' will work on them once
`tar' is modified to do so.

Archives created with the `--multi-volume' (`-M'),
`--label=ARCHIVE-LABEL' (`-V ARCHIVE-LABEL'), and `--incremental'
(`-G') options may not be readable by other versions of `tar'. In
particular, restoring a file that was split over a volume boundary will
require some careful work with `dd', if it can be done at all. Other
versions of `tar' may also create an empty file whose name is that of
the volume header. Some versions of `tar' may create normal files
instead of directories archived with the `--incremental' (`-G') option.

File: tar.info, Node: Common Problems and Solutions, Next: Blocking, Prev: Remote Tape Server, Up: Media

Some Common Problems and their Solutions
========================================

errors from system:
permission denied
no such file or directory
not owner

errors from `tar':
directory checksum error
header format error

errors from media/system:
i/o error
device busy