Latest revision as of 18:17, 26 June 2018

WoW uses bsdiff for updating files. For background on the tool, have a look at "http://www.daemonology.net/bsdiff/" or "http://www.pokorra.de/coding/bsdiff.html"

Depending on the client version, the patch format might not be BSDIFF40 but ZBSDIFF1. ZBSDIFF1 is a variant of BSDIFF40 that seems to differ only in replacing the BZ2 library calls with their libz inflate equivalents.

Format

bsdiff_int64_t

For unknown reasons, bsdiff reimplements signed integers: instead of two's complement it stores a 63-bit magnitude plus a separate sign bit. It also uses this type for all values, even those that will never be negative (literally all but seek_in_input), because bsdiff is horrible code.

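// Bit-field layout is implementation-defined; this assumes the low bits come first.
struct bsdiff_int64_t {
  uint64_t value : 63;                                    // magnitude, never negative
  uint64_t sign : 1;                                      // 1 if the number is negative
  operator int64_t() const { return sign ? -int64_t (value) : int64_t (value); }
  // The sign has to be taken from x: value already holds the magnitude.
  bsdiff_int64_t (int64_t x) : value (x < 0 ? 0 - uint64_t (x) : uint64_t (x)), sign (x < 0) {}
};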

int64_t alternative_manual_implementation (uint64_t raw) {
  int64_t const value = raw & 0x7FFFFFFFFFFFFFFF;
  return                raw & 0x8000000000000000 ? -value : value;
}
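uint64_t alternative_manual_implementation (int64_t raw) {
  // Compute the magnitude in unsigned arithmetic: abs() may resolve to the int overload.
  uint64_t const value = raw < 0 ? 0 - uint64_t (raw) : uint64_t (raw);
  return value | (raw < 0 ? 0x8000000000000000 : 0);
}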
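
The snippets above work on an already loaded 64-bit value. As a minimal sketch of the byte-level view, assuming the on-disk layout used by the reference bspatch (8 bytes, little-endian, top bit of the last byte as sign flag), reading and writing could look like this. The names read_bsdiff_int64 and write_bsdiff_int64 are made up for this example.

```cpp
#include <array>
#include <cstdint>

// Decode one on-disk bsdiff integer: 8 bytes, least significant byte first,
// with the top bit of the last byte acting as the sign flag.
inline std::int64_t read_bsdiff_int64 (std::uint8_t const* bytes) {
  std::uint64_t raw = 0;
  for (int i = 7; i >= 0; --i) {
    raw = (raw << 8) | bytes[i];
  }
  auto const value = static_cast<std::int64_t> (raw & 0x7FFFFFFFFFFFFFFFull);
  return (raw >> 63) ? -value : value;
}

// Encode the other way round. INT64_MIN has no 63-bit magnitude
// and is not handled by this sketch.
inline std::array<std::uint8_t, 8> write_bsdiff_int64 (std::int64_t x) {
  std::uint64_t raw = x < 0
    ? (0 - static_cast<std::uint64_t> (x)) | 0x8000000000000000ull
    : static_cast<std::uint64_t> (x);
  std::array<std::uint8_t, 8> bytes{};
  for (auto& b : bytes) {
    b = static_cast<std::uint8_t> (raw & 0xFF);
    raw >>= 8;
  }
  return bytes;
}
```

Doing the conversion byte by byte avoids depending on host endianness or on how the compiler lays out bit-fields.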

File

The files themselves are a rather simple format:

struct {
  char magic[8];                                          // "ZBSDIFF1" or "BSDIFF40"
  bsdiff_int64_t control_block_size;
  bsdiff_int64_t diff_block_size;
  bsdiff_int64_t output_file_size;
} header;
char compressed_control_block[header.control_block_size]; // format as given in #Control_block
char compressed_diff_block[header.diff_block_size];       // raw data
char compressed_extra_block[0];                           // to the end of the file

where each compressed block is either BZ2 compressed (BSDIFF40) or zlib compressed (ZBSDIFF1), depending on header.magic.
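
As a rough sketch of how the layout above can be turned into block offsets, the following parses the 32-byte header from a buffer that holds the whole patch file. The type patch_layout and the function parse_header are hypothetical names for this example, the integer decoding assumes the little-endian sign-and-magnitude form read by the reference bspatch, and the actual BZ2 or zlib decompression is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class compression { bzip2, zlib };

// Hypothetical result type: where each compressed block sits in the file.
struct patch_layout {
  compression method;
  std::size_t control_offset, control_size;
  std::size_t diff_offset, diff_size;
  std::size_t extra_offset, extra_size;
  std::int64_t output_file_size;
};

// Sign-and-magnitude, little-endian (assumption, see lead-in).
static std::int64_t read_bsdiff_int64 (unsigned char const* p) {
  std::uint64_t raw = 0;
  for (int i = 7; i >= 0; --i) raw = (raw << 8) | p[i];
  auto const value = static_cast<std::int64_t> (raw & 0x7FFFFFFFFFFFFFFFull);
  return (raw >> 63) ? -value : value;
}

// Returns false if the buffer is not a plausible patch file.
bool parse_header (unsigned char const* data, std::size_t size, patch_layout& out) {
  constexpr std::size_t header_size = 8 + 3 * 8;
  if (size < header_size) return false;

  if (std::memcmp (data, "BSDIFF40", 8) == 0) out.method = compression::bzip2;
  else if (std::memcmp (data, "ZBSDIFF1", 8) == 0) out.method = compression::zlib;
  else return false;

  std::int64_t const control = read_bsdiff_int64 (data + 8);
  std::int64_t const diff = read_bsdiff_int64 (data + 16);
  std::int64_t const output = read_bsdiff_int64 (data + 24);
  if (control < 0 || diff < 0 || output < 0) return false;

  // Both sized blocks have to fit into the rest of the file.
  std::uint64_t const remaining = size - header_size;
  if (static_cast<std::uint64_t> (control) > remaining) return false;
  if (static_cast<std::uint64_t> (diff) > remaining - static_cast<std::uint64_t> (control)) return false;

  out.control_offset = header_size;
  out.control_size = static_cast<std::size_t> (control);
  out.diff_offset = out.control_offset + out.control_size;
  out.diff_size = static_cast<std::size_t> (diff);
  out.extra_offset = out.diff_offset + out.diff_size;
  out.extra_size = size - out.extra_offset;  // extra block runs to the end of the file
  out.output_file_size = output;
  return true;
}
```

The extra block has no size field of its own, which is why its size is derived from the total file size.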

Control block

While the header gives the size of the compressed control block in bytes, the decompressed control block is a sequence of entries that always have the same structure:

struct {
  bsdiff_int64_t bytes_from_diff_block;
  bsdiff_int64_t bytes_from_extra_block;
  bsdiff_int64_t seek_in_input;
};
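
A minimal sketch of turning a decompressed control block into a list of entries, assuming each entry is three 8-byte sign-and-magnitude integers in little-endian order (24 bytes per entry). The function name parse_control_block is invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct control_entry {
  std::int64_t bytes_from_diff_block;
  std::int64_t bytes_from_extra_block;
  std::int64_t seek_in_input;
};

// Sign-and-magnitude, little-endian (assumption, see lead-in).
static std::int64_t read_bsdiff_int64 (unsigned char const* p) {
  std::uint64_t raw = 0;
  for (int i = 7; i >= 0; --i) raw = (raw << 8) | p[i];
  auto const value = static_cast<std::int64_t> (raw & 0x7FFFFFFFFFFFFFFFull);
  return (raw >> 63) ? -value : value;
}

std::vector<control_entry> parse_control_block (std::vector<unsigned char> const& decompressed) {
  constexpr std::size_t entry_size = 3 * 8;
  if (decompressed.size() % entry_size != 0) {
    throw std::runtime_error ("control block is not a whole number of entries");
  }
  std::vector<control_entry> entries;
  entries.reserve (decompressed.size() / entry_size);
  for (std::size_t pos = 0; pos < decompressed.size(); pos += entry_size) {
    unsigned char const* p = decompressed.data() + pos;
    control_entry const e = { read_bsdiff_int64 (p)
                            , read_bsdiff_int64 (p + 8)
                            , read_bsdiff_int64 (p + 16)
                            };
    // Only seek_in_input may legitimately be negative.
    if (e.bytes_from_diff_block < 0 || e.bytes_from_extra_block < 0) {
      throw std::runtime_error ("negative byte count in control entry");
    }
    entries.push_back (e);
  }
  return entries;
}
```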

How to patch

To patch a file, first decompress the blocks, then iterate over the entries of control_block. For each entry:

Read bytes_from_diff_block bytes from input, add the corresponding bytes from diff_block to them bytewise (modulo 256) and write the result to output: o[x] = i[x] + d[x]
Copy bytes_from_extra_block bytes from extra_block to the output: o[x] = e[x]
Seek by seek_in_input (which may be negative) in input. The offset in output is not affected.
Repeat with the next entry.
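
The steps above can be sketched as follows, assuming all three blocks are already decompressed and the control block is already split into entries. This is an illustration rather than the reference code: apply_patch and the bytes alias are names chosen here, and treating input reads outside the file as 0 mirrors what the reference bspatch does.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using bytes = std::vector<std::uint8_t>;

struct control_entry {
  std::int64_t bytes_from_diff_block;
  std::int64_t bytes_from_extra_block;
  std::int64_t seek_in_input;
};

bytes apply_patch ( bytes const& input, std::vector<control_entry> const& control
                  , bytes const& diff, bytes const& extra, std::size_t output_file_size
                  ) {
  bytes output;
  output.reserve (output_file_size);
  std::int64_t in_pos = 0;
  std::size_t diff_pos = 0;
  std::size_t extra_pos = 0;

  for (control_entry const& c : control) {
    if (c.bytes_from_diff_block < 0 || c.bytes_from_extra_block < 0) {
      throw std::runtime_error ("corrupt patch: negative byte count");
    }
    auto const n_diff = static_cast<std::size_t> (c.bytes_from_diff_block);
    auto const n_extra = static_cast<std::size_t> (c.bytes_from_extra_block);
    if (n_diff > diff.size() - diff_pos || n_extra > extra.size() - extra_pos) {
      throw std::runtime_error ("corrupt patch: block too short");
    }

    // Step 1: o[x] = i[x] + d[x], wrapping around at 256.
    for (std::size_t x = 0; x < n_diff; ++x) {
      std::int64_t const i = in_pos + static_cast<std::int64_t> (x);
      bool const in_range = i >= 0 && i < static_cast<std::int64_t> (input.size());
      std::uint8_t const in_byte = in_range ? input[static_cast<std::size_t> (i)] : 0;
      output.push_back (static_cast<std::uint8_t> (in_byte + diff[diff_pos + x]));
    }
    diff_pos += n_diff;
    in_pos += c.bytes_from_diff_block;

    // Step 2: o[x] = e[x].
    auto const first = extra.begin() + static_cast<std::ptrdiff_t> (extra_pos);
    output.insert (output.end(), first, first + static_cast<std::ptrdiff_t> (n_extra));
    extra_pos += n_extra;

    // Step 3: seek in the input only, the output position stays where it is.
    in_pos += c.seek_in_input;
  }

  if (output.size() != output_file_size) {
    throw std::runtime_error ("corrupt patch: unexpected output size");
  }
  return output;
}
```

For example, patching the input "abcdef" with the entries {3, 1, 1} and {2, 1, 0}, the diff bytes {0, 0, 0, 0, 1} and the extra bytes "X!" copies "abc", adds "X", skips "d", copies "e", turns "f" into "g" and adds "!".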

This means that edits are encoded as follows:

copying without modification: diff block filled with 0 (relying on compression to make it small)
copying with modification: diff block filled with the bytewise difference
addition: extra bytes
removal: seek over the removed bytes
tuples of up to three operations (diff copy, extra copy, seek) are collapsed into one control block entry
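
To make the mapping above concrete, here is a small sketch that builds the uncompressed blocks for two trivial edits: inserting bytes at a position and removing a range. It is not how bsdiff searches for differences. The helpers make_insertion_patch and make_removal_patch and the uncompressed_patch type are hypothetical and only show which fields end up holding what.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using bytes = std::vector<std::uint8_t>;

struct control_entry {
  std::int64_t bytes_from_diff_block;
  std::int64_t bytes_from_extra_block;
  std::int64_t seek_in_input;
};

// Hypothetical container for the three blocks before compression.
struct uncompressed_patch {
  std::vector<control_entry> control;
  bytes diff;
  bytes extra;
};

// Insert `inserted` before offset `position` of an old file of `old_size` bytes.
uncompressed_patch make_insertion_patch (std::size_t old_size, std::size_t position, bytes const& inserted) {
  if (position > old_size) throw std::invalid_argument ("position past end of file");
  uncompressed_patch p;
  // One entry covers two operations: copy the prefix unchanged, then add the new bytes.
  p.control.push_back ({ static_cast<std::int64_t> (position)
                       , static_cast<std::int64_t> (inserted.size())
                       , 0
                       });
  // Second entry: copy the rest unchanged.
  p.control.push_back ({static_cast<std::int64_t> (old_size - position), 0, 0});
  p.diff.assign (old_size, 0);  // unmodified copies are all-zero diff bytes
  p.extra = inserted;
  return p;
}

// Remove `count` bytes starting at offset `position`.
uncompressed_patch make_removal_patch (std::size_t old_size, std::size_t position, std::size_t count) {
  if (position > old_size || count > old_size - position) {
    throw std::invalid_argument ("range past end of file");
  }
  uncompressed_patch p;
  // Copy the prefix unchanged, then seek over the removed bytes.
  p.control.push_back ({ static_cast<std::int64_t> (position)
                       , 0
                       , static_cast<std::int64_t> (count)
                       });
  p.control.push_back ({static_cast<std::int64_t> (old_size - position - count), 0, 0});
  p.diff.assign (old_size - count, 0);
  return p;
}
```

In both cases the first control entry carries two operations at once, which is the collapsing mentioned in the last item of the list.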

For an implementation, consult bspatch from BSDIFF4.

@@ Line 2: / Line 2: @@
 Have a look at "http://www.daemonology.net/bsdiff/" or "http://www.pokorra.de/coding/bsdiff.html"
---------------
-I tried bsdiff, but it doesn't work.
-. extract achievement.dbc from enUS\DBFilesClient in Patches\WoW-3.1.3-to-3.2.0-enUS-Win-patch\wow-partial-1.MPQ
-. rename it to achievement.dbc.patch (45K)
-. extract achievement.dbc from DBFilesClient in Data\enUS\patch-enUS-2.MPQ
-. rename it to achievement.dbc.old
-. bsdiff
-    usage: bspatch.exe oldfile newfile patchfile
-. bsdiff achievement.dbc.old achievement.dbc achievement.dbc.patch
-    Corrupt patch
-. extract achievement.dbc from DBFilesClient in PTR:Data\enUS\patch-enUS-2.MPQ
-. rename it to achievement.dbc.ptr
-. bsdiff achievement.dbc.old achievement.dbc.ptr achievement.dbc.patch-ptr (20K)
-Although it shows "BSDIFF40" in its header, is it not a bsdiff patch?
--- chuanhsing
-----------------
-I got some information about the files in the pre-download MPQ. All files inside it are "patch" files.
-The following is the binary/new patch header:
-x0000 2 bytes: Size of the patch header (I've only seen 0x18)
-x0002 2 bytes: signature,
-          * 0x0104 means binary patch
-x0004 8 bytes: unknown, always 8 bytes 0x00
-x000C 4 bytes: file size n
-x0010 8 bytes: timestamp
-  ---- body ----
-x0018 n bytes: the real thing
-The following is the plain text patch header:
-x0000 2 bytes: Size of the patch header (I've only seen 0x18)
-x0002 2 bytes: signature,
-          * 0x0404 means plain text patch
-x0004 8 bytes: maybe timestamp
-x000C 4 bytes: file size
-x0010 8 bytes: timestamp
-  ---- body ----
-x0018 4 bytes: file size
-x001C 1 byte: unknown
-x001D 8 bytes: signature, "BSDIFF40"
-x0025 n bytes: unknown
-For the new files, I can skip the leading 24 bytes and get the real thing! But I still can't understand the things after BSDIFF40.
--- chuanhsing
------------------------------
-As for patch files in MPQs, I have found this structure so far:
-x0000 (2 bytes) Size of the patch header (I've only seen 0x18)
-x0002 (2 bytes) Flags.
-  - 0x0004 - Seems to be always set
-  - 0x0100 - Unknown
-  - 0x0400 - The patch file is made by the bin patch generator
-x0004 (4 bytes) Looks like CRC
-x0008 (4 bytes) Unknown
-  (looks like the size of something when the 0x0100 flag is set)
-x000C (4 bytes) File size (only if Flags = 0x0104)
-x0010 (8 bytes) Time stamp as FILETIME
-  ---- body ----
--- chuanhsing

Patching Files: Difference between revisions
