This file describes the format produced by 'svnadmin dump' and
consumed by 'svnadmin load'.
The format has undergone revisions over time. They are presented in
reverse chronological order here. You may wish to start with the
VERSION 1 description in order to get a baseline understanding first.
===== SVN DUMPFILE VERSION 3 FORMAT =====
(generated by SVN versions 1.1.0-present, if requested by the user)
This format is equivalent to the VERSION 2 format except for the
following:
1.) The format starts with the new version number of the dump format
("SVN-fs-dump-format-version: 3\n").
2.) There are several new optional headers for node changes:
[Text-delta: true|false]
[Prop-delta: true|false]
[Text-delta-base-md5: blob]
[Text-delta-base-sha1: blob]
[Text-copy-source-sha1: blob]
[Text-content-sha1: blob]
The default value for the boolean headers is "false". If Text-delta
or Prop-delta is set to "true", then the text or property contents,
respectively, are treated as a delta against the previous contents of
the node (as determined by copy history for adds with history, or by
the value in the previous revision for changes--just as with commits).
Property deltas have the same format as regular property lists except
that (1) properties with the same value as in the previous contents of
the node are not printed, and (2) deleted properties will be written
out as
D <name length>
<name>
just as a regular property is printed, but with the "K " changed to a
"D " and with no value part.
Text deltas are written out as a series of svndiff0 windows. If
Text-delta-base-md5 is provided, it is the checksum of the base to
which the text delta is applied; note that older versions (pre-1.5) of
'svnadmin load' may ignore the checksum.
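As a rough illustration of the base checksum check, here is a minimal
Python sketch. It assumes the header value is the lowercase hex MD5
digest of the base text (this document only calls it a "blob"), and the
function name and the dict of parsed headers are hypothetical, not part
of Subversion.

```python
import hashlib

def delta_base_matches(base_text: bytes, headers: dict) -> bool:
    """Compare the delta base against Text-delta-base-md5, if present.

    Assumption: the header carries a hex MD5 digest of the base text.
    """
    expected = headers.get("Text-delta-base-md5")
    if expected is None:
        # The header is optional, so there is nothing to verify.
        return True
    actual = hashlib.md5(base_text).hexdigest()
    return actual == expected.strip().lower()
```

A loader built this way would refuse to apply the svndiff windows when
the function returns False, since the delta would be applied to the
wrong base.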
Text-delta-base-sha1, Text-copy-source-sha1, and Text-content-sha1 are not
currently used by the loader. They are written by 1.6-and-later versions of
Subversion so that future loaders can optionally choose which checksum to
use for checking for corruption.
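To make the version 3 property delta format concrete, here is a small
Python sketch that parses a property delta (K/V records plus D records,
ending with "PROPS-END\n") and applies it to a node's previous
properties. The function names are hypothetical, and the sketch assumes
the block is well formed; it is not the loader's actual code.

```python
def parse_prop_delta(data: bytes):
    """Return (changed, deleted) parsed from a property delta block."""
    changed, deleted = {}, set()
    pos = 0
    while True:
        eol = data.index(b"\n", pos)
        line = data[pos:eol]
        pos = eol + 1
        if line == b"PROPS-END":
            return changed, deleted
        kind, length = line.split(b" ")
        name = data[pos:pos + int(length)]
        pos += int(length) + 1          # skip the name and its newline
        if kind == b"D":                # deleted property: no value part
            deleted.add(name)
            continue
        if kind != b"K":
            raise ValueError("unexpected record type: %r" % kind)
        eol = data.index(b"\n", pos)
        vkind, vlength = data[pos:eol].split(b" ")
        if vkind != b"V":
            raise ValueError("expected a V line after the key")
        pos = eol + 1
        changed[name] = data[pos:pos + int(vlength)]
        pos += int(vlength) + 1         # skip the value and its newline

def apply_prop_delta(old: dict, data: bytes) -> dict:
    """Apply a property delta to the previous properties of a node."""
    changed, deleted = parse_prop_delta(data)
    new = {k: v for k, v in old.items() if k not in deleted}
    new.update(changed)
    return new
```

Properties that are absent from the delta keep their previous value,
which is why the old properties are copied before the changes are
applied.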
===== SVN DUMPFILE VERSION 2 FORMAT =====
(generated by SVN versions 0.18.0-present, by default)
This format is equivalent to the VERSION 1 format in every respect,
except for the following:
1.) The format starts with the new version number of the dump format
("SVN-fs-dump-format-version: 2\n").
2.) In addition to "Revision Records", another sort of record is supported:
the "UUID" record, which should be of the form:
UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e
This should be used to indicate the UUID of the originating repository.
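A reader might pick up the version line and the optional UUID record
along these lines. This is only a sketch: the function name is
hypothetical, it needs a seekable binary stream, and it assumes that
records are separated by blank lines.

```python
import io

def read_preamble(stream):
    """Return (format_version, uuid_or_None) from the start of a dump."""
    first = stream.readline().rstrip(b"\n")
    key, _, value = first.partition(b": ")
    if key != b"SVN-fs-dump-format-version":
        raise ValueError("not an svn dumpfile")
    version = int(value)
    uuid = None
    while True:
        pos = stream.tell()
        line = stream.readline()
        if line == b"\n":               # assumed blank separator line
            continue
        if line.startswith(b"UUID: "):
            uuid = line[len(b"UUID: "):].rstrip(b"\n").decode("ascii")
        else:
            stream.seek(pos)            # not a UUID record; put it back
        return version, uuid

dump = io.BytesIO(b"SVN-fs-dump-format-version: 2\n\n"
                  b"UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e\n\n")
print(read_preamble(dump))
```

A version 1 stream has no UUID record, so the function returns None for
the UUID and leaves the stream positioned at the first revision record.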
===== SVN DUMPFILE VERSION 1 FORMAT =====
(generated by SVN versions prior to 0.18.0)
The binary format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records. Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision. Fields in [brackets] are optional, and unknown headers are
always ignored, for backwards compatibility.
Revision-number: N
Prop-content-length: P
Content-length: L
...P bytes of property data. Properties are stored in the same
human-readable hashdump format used by working copy property files,
except that they end with "PROPS-END\n" for better readability.
Node-path: absolute/path/to/node/in/filesystem
Node-kind: file | dir (1)
Node-action: change | add | delete | replace
[Node-copyfrom-rev: X]
[Node-copyfrom-path: path]
[Text-copy-source-md5: blob] (2)
[Text-content-md5: blob]
[Text-content-length: T]
[Prop-content-length: P]
Content-length: Y (3)
... Y bytes of content data, divided into P bytes of "property"
data and T bytes of "text" data. The properties come first; their
total length (including formatting) is Prop-content-length, and is
included in Content-length. The "PROPS-END\n" line always
terminates the property section if there are props. The remainder
of the Y bytes (expected to be equivalent to Text-content-length)
represents the contents of the node.
Notes:
(1) If the node represents a deletion, this field is optional.
(2) This is a checksum of the source of the copy. A loader process
can use this checksum to verify that the copyfrom path/rev
already present in a filesystem is really the *correct* one to
use.
(3) The Content-length header is technically unnecessary, since the
information it holds (and more) can be found in the
Prop-content-length and Text-content-length fields. Though
Subversion itself does not make use of the header when reading
a dumpfile, we include it for compatibility with generic RFC822
parsers.
(4) There are actually 2 types of version 1 dump streams. The
regular ones have been generated since r2634 (svn 0.14.0). Older
ones also claim to be version 1, but lack the Prop-content-length
and Text-content-length fields in the block header. In those
days there *always* was a properties block.
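The record layout above can be read with a short loop like the
following Python sketch. It assumes that the header block ends at a
blank line (in the RFC822 style) and that blank lines may separate
records; read_record is a hypothetical name, and the sketch does not
handle the older header-less streams from note (4).

```python
import io

def read_record(stream):
    """Read one record; return (headers, props, text), or None at EOF."""
    line = stream.readline()
    while line == b"\n":                # skip blank separator lines
        line = stream.readline()
    if not line:
        return None
    headers = {}
    while line not in (b"\n", b""):     # assumed: blank line ends headers
        key, _, value = line.rstrip(b"\n").partition(b": ")
        headers[key.decode("utf-8")] = value.decode("utf-8")
        line = stream.readline()
    prop_len = int(headers.get("Prop-content-length", 0))
    text_len = int(headers.get("Text-content-length", 0))
    # Per note (3), Content-length is redundant; fall back to the sum.
    total = int(headers.get("Content-length", prop_len + text_len))
    body = stream.read(total)
    # Properties come first, then the text.
    return headers, body[:prop_len], body[prop_len:]
```

Unknown headers simply end up in the dict and are never looked at,
which matches the rule that unknown headers are ignored.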
EXAMPLE:
Here's an example of revision 1422, in which I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":
Revision-number: 1422
Prop-content-length: 80
Content-length: 80
K 6
author
V 7
sussman
K 3
log
V 33
Added two files, changed a third.
PROPS-END
Node-path: bar/baz
Node-kind: dir
Node-action: add
Prop-content-length: 35
Content-length: 35
K 10
svn:ignore
V 4
TAGS
PROPS-END
Node-path: bar/baz/bop
Node-kind: file
Node-action: add
Prop-content-length: 76
Text-content-length: 54
Content-length: 130
K 14
svn:executable
V 2
on
K 12
svn:keywords
V 15
LastChangedDate
PROPS-END
Here is the text of the newly added 'bop' file.
Whee.
Node-path: bar/foo.c
Node-kind: file
Node-action: change
Text-content-length: 102
Content-length: 102
Here is the fulltext of my change to an existing /bar/foo.c.
Notice that this file has no properties.
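The length headers in the example can be checked by serializing the
properties yourself. The Python sketch below writes a property block
and a revision record; the function names are hypothetical, and the
blank lines after the headers and after the content are an assumption
about the on-disk layout.

```python
def dump_props(props: dict) -> bytes:
    """Serialize properties in the K/V hashdump format."""
    out = []
    for name, value in props.items():
        out.append(b"K %d\n%s\nV %d\n%s\n"
                   % (len(name), name, len(value), value))
    out.append(b"PROPS-END\n")
    return b"".join(out)

def revision_record(number: int, props: dict) -> bytes:
    """Build a revision record; a revision has properties but no text."""
    body = dump_props(props)
    header = (b"Revision-number: %d\n"
              b"Prop-content-length: %d\n"
              b"Content-length: %d\n\n" % (number, len(body), len(body)))
    return header + body + b"\n"

props = {b"author": b"sussman",
         b"log": b"Added two files, changed a third."}
print(len(dump_props(props)))  # 80, as in the example above
```

The lengths count every byte of the block, including the "K"/"V" lines,
the newlines, and the closing "PROPS-END\n" line.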
-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-
Old discussion:
(This file started as a proposal, preserved here for posterity.)
A proposal for an svn filesystem dump/restore format.
Two problems we want to solve
=============================
1. When we change our node-id schema, we need to migrate all of our
data (by dumping and restoring).
2. We need a backup format. It could be read by other software tools
someday.
Design Goals
============
A. Written as two new public functions in svn_fs.h. To be invoked
by new 'svnadmin' subcommands.
B. Format uses only timeless fs concepts.
The dump format needs to reference concepts that we *know* are
general enough to never change. These concepts must exist
independently of any internal node-id schema, or any DB storage
backend. In other words, we're talking about the basic ideas in
our original "design spec" from May 2000.
Format Semantics
================
Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.
- A filesystem is an array of trees.
Each tree is called a "revision" and has unversioned properties attached.
- A revision has a tree of "nodes" hanging off of it.
Actually, the nodes in the filesystem form a DAG. A revision
always points to an initial node that represents the 'root' of some tree.
- The majority of a tree's nodes are hard-links (references) to
nodes that were created in earlier trees.
- A node contains
- versioned text
- versioned properties
- predecessor history: "which node am I a variant of?"
- copy history: "which node am I a copy of?"
The history values can be non-existent (meaning the node is
completely new), or can have a value of {revision, path}.
------------------------------------------------------------------------
Refinement of proposal #2: (after discussion with gstein)
=========================
Each node starts with RFC822-style headers at the top. The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.
The content section has two implicit parts: a property hash, and the
fulltext. The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash. In the case of a
directory node or a revision, only the prophash is present.