Patching Files: Difference between revisions

From wowdev
--------------

I tried bsdiff, but it doesn't work.

  1. extract achievement.dbc from enUS\DBFilesClient in Patches\WoW-3.1.3-to-3.2.0-enUS-Win-patch\wow-partial-1.MPQ
  2. rename it to achievement.dbc.patch (45K)
  3. extract achievement.dbc from DBFilesClient in Data\enUS\patch-enUS-2.MPQ
  4. rename it to achievement.dbc.old
  5. bsdiff
    usage: bspatch.exe oldfile newfile patchfile
  6. bsdiff chievement.dbc.old achievement.dbc achievement.dbc.patch
    Corrupt patch
  7. extract achievement.dbc from DBFilesClient in PTR:Data\enUS\patch-enUS-2.MPQ
  8. rename it to achievement.dbc.ptr
  9. bsdiff achievement.dbc.old achievement.dbc.ptr achievement.dbc.patch-ptr (20K)

Although it shows "BSDIFF40" in its header, is it not a bsdiff patch?

-- chuanhsing

----------------

I got some information about the files in the pre-download MPQ. All files inside it are "patch" files.

The following is the binary/new patch header:

  0x0000 2 bytes: Size of the patch header (I've only seen 0x18)
  0x0002 2 bytes: signature,
          * 0x0104 means binary patch
  0x0004 8 bytes: unknown, always 8 bytes 0x00
  0x000C 4 bytes: file size n
  0x0010 8 bytes: timestamp
  ---- body ----
  0x0018 n bytes: the real thing

The following is the plain text patch header:

  0x0000 2 bytes: Size of the patch header (I've only seen 0x18)
  0x0002 2 bytes: signature,
          * 0x0404 means plain text patch
  0x0004 8 bytes: maybe timestamp
  0x000C 4 bytes: file size
  0x0010 8 bytes: timestamp
  ---- body ----
  0x0018 4 bytes: file size
  0x001C 1 byte: unknown
  0x001D 8 bytes: signature, "BSDIFF40"
  0x0025 n bytes: unknown

For the new files, I can skip the leading 24 bytes and get the real thing! But I still can't understand what comes after BSDIFF40.

-- chuanhsing

-----------------------------

As for patch files in MPQs, I have found this structure so far:

  0x0000 (2 bytes) Size of the patch header (I've only seen 0x18)
  0x0002 (2 bytes) Flags.
  - 0x0004 - Seems to be always set
  - 0x0100 - Unknown
  - 0x0400 - The patch file is made by the bin patch generator
  0x0004 (4 bytes) Looks like CRC
  0x0008 (4 bytes) Unknown
  (looks like size of something when 0x0100 flag set)
  0x000C (4 bytes) File size (only if Flags = 0x0104)
  0x0010 (8 bytes) Time stamp as FILETIME
  ---- body ----

-- chuanhsing
Latest revision as of 18:17, 26 June 2018

WoW uses bsdiff to update files. See http://www.daemonology.net/bsdiff/ or http://www.pokorra.de/coding/bsdiff.html for the original tool.

Depending on the version, the patch format might not be BSDIFF40 but ZBSDIFF1, a variant of BSDIFF40 that seemingly differs only in replacing the BZ2 library calls with their libz inflate equivalents.

Format

bsdiff_int64_t

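For unknown reasons, bsdiff does not store signed integers in two's complement but implements its own sign-magnitude encoding: the low 63 bits hold the absolute value and the top bit holds the sign. It uses this type for all values, even those that are never negative (all of them except seek_in_input).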

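#include <cstdint>

// Relies on implementation-defined bit-field layout: value must occupy the
// low 63 bits, as is typical on little-endian targets.
struct bsdiff_int64_t {
  uint64_t value : 63;  // magnitude
  uint64_t sign : 1;    // set when the number is negative
  operator int64_t() const { return sign ? -int64_t (value) : int64_t (value); }
  // The sign must be taken from x, not from the already stored magnitude.
  bsdiff_int64_t (int64_t x) : value (x < 0 ? 0 - uint64_t (x) : uint64_t (x)), sign (x < 0) {}
};

// Alternative without bit-fields, working on the raw 64-bit value.
int64_t alternative_manual_implementation (uint64_t raw) {
  int64_t const value = raw & 0x7FFFFFFFFFFFFFFF;
  return                raw & 0x8000000000000000 ? -value : value;
}
uint64_t alternative_manual_implementation (int64_t raw) {
  uint64_t const magnitude = raw < 0 ? 0 - uint64_t (raw) : uint64_t (raw);
  return magnitude | (raw < 0 ? 0x8000000000000000 : 0);
}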
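
Both variants above depend on the host reading the eight bytes as a little-endian 64-bit value. A minimal byte-wise sketch that avoids that dependency is given below. The function names read_bsdiff_int64 and write_bsdiff_int64 are made up for this example, and the little-endian byte order is assumed to match the reference bspatch.

```cpp
#include <cstdint>

// Hypothetical helper: decode one bsdiff integer from 8 raw bytes.
// Assumed layout: little-endian magnitude, top bit of the last byte is the sign.
inline std::int64_t read_bsdiff_int64(const unsigned char* p) {
  std::uint64_t magnitude = p[7] & 0x7Fu;  // drop the sign bit
  for (int i = 6; i >= 0; --i) {
    magnitude = (magnitude << 8) | p[i];
  }
  const std::int64_t value = static_cast<std::int64_t>(magnitude);
  return (p[7] & 0x80u) ? -value : value;
}

// Hypothetical helper: encode the inverse into 8 bytes.
// INT64_MIN cannot be represented in 63 magnitude bits and is not handled here.
inline void write_bsdiff_int64(std::int64_t x, unsigned char* p) {
  std::uint64_t magnitude =
      x < 0 ? 0 - static_cast<std::uint64_t>(x) : static_cast<std::uint64_t>(x);
  for (int i = 0; i < 8; ++i) {
    p[i] = static_cast<unsigned char>(magnitude & 0xFFu);
    magnitude >>= 8;
  }
  if (x < 0) {
    p[7] |= 0x80u;  // set the sign flag
  }
}
```

Because it only indexes single bytes, this version behaves the same on big-endian and little-endian hosts.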

File

The files themselves have a rather simple format:

struct {
  char magic[8];                                          // "ZBSDIFF1" or "BSDIFF40"
  bsdiff_int64_t control_block_size;
  bsdiff_int64_t diff_block_size;
  bsdiff_int64_t output_file_size;
} header;
char compressed_control_block[header.control_block_size]; // format as given in #Control_block
char compressed_diff_block[header.diff_block_size];       // raw data
char compressed_extra_block[0];                           // to the end of the file

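The blocks are compressed with BZ2 ("BSDIFF40") or zlib ("ZBSDIFF1"), depending on header.magic. The two block sizes in the header are the compressed sizes; only output_file_size refers to uncompressed data.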
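
As an illustration, reading the 32-byte header could look like the sketch below. PatchHeader, Compression, decode_i64 and parse_patch_header are hypothetical names introduced for this example, and the integer decoding assumes the little-endian sign-magnitude layout described above. Decompression itself is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

enum class Compression { BZip2, Zlib };  // hypothetical names

struct PatchHeader {  // hypothetical, mirrors the header struct above
  Compression compression;
  std::int64_t control_block_size;  // compressed size in bytes
  std::int64_t diff_block_size;     // compressed size in bytes
  std::int64_t output_file_size;    // size of the patched file
};

// Sign-magnitude decode, assuming little-endian byte order.
inline std::int64_t decode_i64(const unsigned char* p) {
  std::uint64_t m = p[7] & 0x7Fu;
  for (int i = 6; i >= 0; --i) m = (m << 8) | p[i];
  const std::int64_t v = static_cast<std::int64_t>(m);
  return (p[7] & 0x80u) ? -v : v;
}

inline PatchHeader parse_patch_header(const std::vector<unsigned char>& file) {
  constexpr std::size_t header_size = 32;  // 8 bytes magic + 3 * 8 bytes
  if (file.size() < header_size) throw std::runtime_error("patch too short");

  PatchHeader h{};
  if (std::memcmp(file.data(), "BSDIFF40", 8) == 0) {
    h.compression = Compression::BZip2;
  } else if (std::memcmp(file.data(), "ZBSDIFF1", 8) == 0) {
    h.compression = Compression::Zlib;
  } else {
    throw std::runtime_error("unknown magic");
  }
  h.control_block_size = decode_i64(&file[8]);
  h.diff_block_size = decode_i64(&file[16]);
  h.output_file_size = decode_i64(&file[24]);

  // None of these may be negative, and both blocks must fit into the file.
  if (h.control_block_size < 0 || h.diff_block_size < 0 || h.output_file_size < 0)
    throw std::runtime_error("negative size in header");
  const std::uint64_t payload = file.size() - header_size;
  const std::uint64_t ctrl = static_cast<std::uint64_t>(h.control_block_size);
  const std::uint64_t diff = static_cast<std::uint64_t>(h.diff_block_size);
  if (ctrl > payload || diff > payload - ctrl)
    throw std::runtime_error("block sizes exceed file size");
  return h;
}
```

With the header parsed, the compressed control block starts at offset 32, the diff block at 32 + control_block_size, and the extra block at 32 + control_block_size + diff_block_size.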

Control block

The size of the control block is given in (compressed) bytes rather than as an entry count. Once decompressed, it is an array of fixed-size entries of 24 bytes each, all with the same structure:

struct {
  bsdiff_int64_t bytes_from_diff_block;
  bsdiff_int64_t bytes_from_extra_block;
  bsdiff_int64_t seek_in_input;
};
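
A small sketch of turning the decompressed control block into a list of entries follows. ControlEntry, decode_i64 and parse_control_block are names chosen for this example, and rejecting a trailing partial entry is an assumption of the sketch, not something the format description states.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct ControlEntry {  // hypothetical, mirrors the struct above
  std::int64_t bytes_from_diff_block;
  std::int64_t bytes_from_extra_block;
  std::int64_t seek_in_input;
};

// Sign-magnitude decode, assuming little-endian byte order.
inline std::int64_t decode_i64(const unsigned char* p) {
  std::uint64_t m = p[7] & 0x7Fu;
  for (int i = 6; i >= 0; --i) m = (m << 8) | p[i];
  const std::int64_t v = static_cast<std::int64_t>(m);
  return (p[7] & 0x80u) ? -v : v;
}

inline std::vector<ControlEntry> parse_control_block(
    const std::vector<unsigned char>& decompressed) {
  constexpr std::size_t entry_size = 24;  // three 8-byte integers
  if (decompressed.size() % entry_size != 0)
    throw std::runtime_error("control block has a partial entry");

  std::vector<ControlEntry> entries;
  entries.reserve(decompressed.size() / entry_size);
  for (std::size_t off = 0; off < decompressed.size(); off += entry_size) {
    const unsigned char* p = decompressed.data() + off;
    entries.push_back({decode_i64(p), decode_i64(p + 8), decode_i64(p + 16)});
  }
  return entries;
}
```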

How to patch

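To patch a file, first decompress the three blocks, then process the control_block entries in order. For each entry:

  • Read bytes_from_diff_block bytes from the input at the current input offset, add the corresponding bytes from diff_block bytewise (modulo 256) and append the result to the output: o[x] = i[x] + d[x]. The input offset and the diff_block offset both advance by that amount.
  • Append bytes_from_extra_block bytes from extra_block to the output: o[x] = e[x]. The input offset does not move.
  • Move the input offset by seek_in_input bytes relative to its current position (the value may be negative). The output offset is unaffected.
  • Repeat with the next entry until the output has reached output_file_size bytes.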
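
The steps above can be sketched as follows, operating on blocks that are already decompressed. ControlEntry, Bytes and apply_patch are hypothetical names, input bytes outside the input file are treated as 0 here, and only basic bounds checks are done, so this is an illustration rather than a hardened implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct ControlEntry {  // hypothetical, mirrors the control block struct
  std::int64_t bytes_from_diff_block;
  std::int64_t bytes_from_extra_block;
  std::int64_t seek_in_input;
};

using Bytes = std::vector<unsigned char>;

inline Bytes apply_patch(const Bytes& input,
                         const std::vector<ControlEntry>& control,
                         const Bytes& diff, const Bytes& extra,
                         std::int64_t output_size) {
  if (output_size < 0) throw std::runtime_error("negative output size");
  const std::uint64_t out_size = static_cast<std::uint64_t>(output_size);
  const std::int64_t in_size = static_cast<std::int64_t>(input.size());

  Bytes output;
  output.reserve(static_cast<std::size_t>(out_size));
  std::int64_t in_pos = 0;  // may temporarily leave [0, in_size)
  std::size_t diff_pos = 0, extra_pos = 0;

  for (const ControlEntry& c : control) {
    if (c.bytes_from_diff_block < 0 || c.bytes_from_extra_block < 0)
      throw std::runtime_error("negative length in control entry");
    const std::uint64_t n_diff = static_cast<std::uint64_t>(c.bytes_from_diff_block);
    const std::uint64_t n_extra = static_cast<std::uint64_t>(c.bytes_from_extra_block);
    const std::uint64_t room = out_size - output.size();
    if (n_diff > diff.size() - diff_pos || n_extra > extra.size() - extra_pos ||
        n_diff > room || n_extra > room - n_diff)
      throw std::runtime_error("corrupt patch");

    // Step 1: o[x] = i[x] + d[x], wrapping modulo 256.
    for (std::uint64_t i = 0; i < n_diff; ++i) {
      const std::int64_t src = in_pos + static_cast<std::int64_t>(i);
      const unsigned char base =
          (src >= 0 && src < in_size) ? input[static_cast<std::size_t>(src)] : 0;
      output.push_back(static_cast<unsigned char>(base + diff[diff_pos + i]));
    }
    diff_pos += static_cast<std::size_t>(n_diff);
    in_pos += c.bytes_from_diff_block;

    // Step 2: o[x] = e[x]; the input offset stays where it is.
    const auto first = extra.begin() + static_cast<std::ptrdiff_t>(extra_pos);
    output.insert(output.end(), first, first + static_cast<std::ptrdiff_t>(n_extra));
    extra_pos += static_cast<std::size_t>(n_extra);

    // Step 3: relative seek in the input only.
    in_pos += c.seek_in_input;
  }
  if (output.size() != out_size) throw std::runtime_error("output size mismatch");
  return output;
}
```

For example, with the input "abcdef", the control entries {3, 2, 1} and {2, 0, 0}, the diff block {0, 0, 1, 0, 0} and the extra block "XY", the first entry copies "ab", turns 'c' into 'd', appends "XY" and skips one input byte; the second entry copies "ef" unchanged.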

This means that the usual edit operations are encoded as follows:

  • copying without modification: diff block filled with 0 (relying on compression to make it small)
  • copying with modification: diff block filled with the bytewise difference
  • addition: bytes taken from the extra block
  • removal: seek over the removed bytes in the input
  • up to three of these operations (diff copy, extra copy, seek) are collapsed into one control block entry


For an implementation, consult bspatch from BSDIFF4.