DB2

From wowdev
Revision as of 19:15, 6 September 2016 by Barncastle (talk | contribs)

Jump to: navigation, search
This section only applies to versions ≥ Cata.

DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. They are similar in many ways to DBC files, so you may want to also look at those. They both have headers with similar data, records containing fields of various types with various data, and a string block where 4-byte string references are made. The difference is that DB2 files have become wildly more complex than their simple DBC brethren, especially in recent times. The structure described here is also used in ADB files, which are a cache of dynamically streamed database entries, typically used hotfixes and content that Blizzard wants to hide from dataminers until the last minute.

Table content structures

This page describes the structure of DB2 and ADB files. For a list of existing DB2 files and their contents see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria and Warlords of Draenor. If you add documentation for a file, please add the correct categories (also the build number) as well.

Field Types

WDB2 / WCH3 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)

Additionally, WDB3 / WCH4 added the following possible field type:

16-bit Integers*

Most recently, WDB5 added the following possible field type:

24-bit Integers

Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.

As an addendum to the above paragraph, in the binary's DBCMeta for WDB5-ready Legion clients, there is a flag for each field that can designate 'do not sign-extend when decompressing'. However, there are fields which obviously contain unsigned data that are not marked with this flag, so it is not exhaustive.

64-bit integers have only made an appearance in a very tiny number of DB2s (usually 1 DB2 at any given time) and only for short periods of time, so any tool supporting DB2 files can likely put 64-bit integer support on the backburner.

Determining Field Types

In WDB2/WCH3, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2/WCH3 format with minimal effort.

In WDB3, WDB4, and WCH4+, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBCs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBCMeta structure. In that structure you will find the field types for all fields.

String Block

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

Since Cataclysm, all DB files contain only localized string values, including DBCs.

The rest of the string block is equivalent to DBC version. See documentation there.

WDB2 (.db2) / WCH3 (.adb)

This section only applies to versions ≤ WoD.

This file format was introduced in Cataclysm. It was phased out in favor of WDB3/WCH4 in Legion (Patch 7.0.1 build 20740).

Structure

See WCH3-specific Concerns for an issue with the 'field_count' header variable in .ADB files.

struct db2_header
{
  uint32_t magic;                                               // 'WDB2' for .db2 (database), 'WCH3' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB2 but the entire array only counts as '1' field for WCH3
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;                                     // always zero in WDB2 (?) - see WDB3 for information on how to parse this
};

template<typename record_type>
struct db2_file
{
  db2_header header;
  // static_assert (header.record_size == sizeof (record_type));

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];*
  char string_table[header.string_table_size];
};
  • Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one.

WCH3-specific Concerns

Take for example, the following structure (which I made up):

uint32_t ID;
uint32_t ItemID[8];
uint32_t NPCID[2];

The field count in the header of the DB2 file will be 11, which you would expect since a record would contain 11 uint32_t values. However, the field count in the header of the counterpart ADB file will be 3. It counts arrays as one field, regardless of how long that 'field' would have to be. This is very likely a serialization bug in WoW's DB2 or ADB writing, but it has gone unchanged for over half a decade, so you need to plan for it if you rely on field count. If you're reading and parsing DBCMeta directly, you don't really have to ever check or care about field count, though. Blizzard certainly doesn't, which is why this bug has gone unnoticed.*

  • WDB5 actually addressed this problem many years later. This was because Blizzard actually started using field count in parsing DB2s, so they finally noticed and fixed the bug.

WDB3 (.db2) / WCH4 (.adb)

This section only applies to versions Legion (7.0.1.20740) … < Legion (7.0.1.20810).

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.

The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.

It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always (?) non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.

Structure for WDB3

See WCH4-specific Concerns for how to adapt this structure for the .ADB counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offset_map_entry
  {
    uint32_t offset;                                             // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
    uint16_t length;                                             // this is the length of the record located at the specified offset
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];*
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  uint32_t IDs[header.record_count];*
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string block usually still exists, just as size 2 with two blank entries.

  • This part of the structure is optional.

WCH4-specific Concerns

In .adb files with the WCH4 signature, the issue with field_count described above in WCH3-specific Concerns still exists. Additionally, the optional offset_map structure changes. Specifically, this part defined in the structure above is changed:

struct offset_map_entry
{
  uint32_t ID;
  uint32_t offset;
  uint16_t length;
};

Note that, in most cases, the non-inline IDs structure still exists, even if it is made completely redundant by the existence of the offset map. After all, why do work once when you could do it twice?

WDB4 (.db2) / WCH5/WCH6 (.adb)

This section only applies to versions Legion (7.0.1.20810) … < Legion (7.0.3.21414).

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today. It was partially phased out in favor of WDB5 just a few months later, also in Legion (Patch 7.0.3 build 21414). As such, support for WDB4 and WCH5/WCH6 is likely important for almost nobody.

The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table (this change does not apply to WCH5 - see WCH5-specific Concerns). Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.

The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.

Structure for WDB4

See WCH5-specific Concerns for how to adapt this structure for the .ADB counterpart.

struct wdb4_db2_header
{
  uint32_t magic;                                               // 'WDB4' for .db2 (database), 'WCH5' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB4 but the entire array only counts as '1' field for WCH5
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint32_t flags;                                               // in WDB3/WCH4, this field was in the WoW executable's DBCMeta; possible values are listed in Known Flag Meanings
};

template<typename record_type>
struct wdb4_file
{
  wdb4_db2_header header;
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                           // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                           // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WCH5/WCH6-specific Concerns

struct wch5_adb_pre_21737_header
{
  uint32_t magic;                                               // 'WCH5' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB4 but the entire array only counts as '1' field for WCH5
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
};

The issue with field counts from WCH3 remains in effect (WCH3-specific Concerns), and the offset map changes from WCH4 also remain in effect (WCH4-specific Concerns). Note that the offset map changes from WCH4 (to include an 'ID' field for each entry) override the offset map changes for WDB4 (to move the offset map after the string table), so the offset map's position remains unchanged compared to WCH4. Additionally, WCH5 files are missing two header fields compared to its WDB4 counterpart (the last two fields - copy_table_size and flags).

Since the deduplication technology we call 'copy_table' is not employed in any of the .ADB formats, copy_table_size variable was always zero anyway. As for flags, Blizzard always loads the .DB2 file before loading the .ADB file, so the flags field being in both files is just redundant (for them); any .ADB file will be read using the flags field from the .DB2 file of the same name. This is inconvenient for us, as most (if not all) of us load .ADB files standalone at the moment, but there are workarounds (such as having a configuration file to remember the last used flags value for each named .DB2 file).

struct wch5_adb_post_21737_header
{
  uint32_t magic;                                               // 'WCH5' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB4 but the entire array only counts as '1' field for WCH5
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // from build 21737 onward, this field is present
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
};

After Legion build 21737, an additional header field was added (right before 'build') called 'layout_hash'. This field's appearance is similar to the change to the WDB5 header made in the same build, except in that case, the 'build' field was removed when the 'layout_hash' field was added. For the post-21737 WCH5 header, both fields are present.

WCH5 was phased out in favor of in Legion (Patch 7.0.3 build 22345). However, literally nothing changed in WCH6. The introduction of this incremented signature was probably a temporary countermeasure to combat a problem Blizzard was having with 'build' not being updated successfully. This would leave ADBs around basically forever (since they are only wiped when current build is greater than the one in the ADBs being loaded), which could cause conflicts if you downloaded an updated client with new DB2s.

WDB5 (.db2)

This section only applies to versions ≥ Legion (7.0.3.21479).

The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.3 build 21479) and is still in use today. There have been a variety of ADB formats used simultaneously with WDB5, including WCH5, WCH6, and WCH7.

There are very significant changes to the format in WDB5 which will require substantial effort for existing tools to support. The main changes are that the field_count now counts arrays as '1' (mirroring very old ADB behavior), a field_structure block has been added directly after the header, and the addition of compressed fields (and 24-bit integers). The introduction of the field_structure block, in particular, is hugely positive for us and should improve the accuracy of DB2 parsers or at least reduce their dependence on the WoW binary's DBCMeta. Note that the WoW binary's DBCMeta will often disagree with the 'field_structure' (DBCMeta might say 'int32' but the field_structure block says the size is '3-bytes'). In those cases, the field_structure block takes priority. Additionally, the DBCMeta is still relevant for parsing ADBs, as they do not support compression.

The header has lost a field compared to WDB4, timestamp_last_written. This field was useless in DB2s and always 0, so its removal is understandable.

In build 21737, a few builds after the introduction of WDB5, more changes to the header were made. The 'flags' header field was split into two shorts - 'flags' and 'id_index'. 'id_index' is very valuable since the inline ID fields no longer have to be the first field in WDB5; they can appear anywhere in the record. This index lets you know which field is ID, so you can, for example, move it to the beginning of the record for the ease of the viewer. Additionally, the 'build' field was changed to be 'layout_hash'. This hash is unique to the specific column layout, including (at least) position and size (possibly names and array size as well). This replacement is actually very beneficial to us because when layout_hash changes, you know the structure changed.

Structure for WDB5

struct wdb5_db2_header
{
  uint32_t magic;                                               // 'WDB5' for .db2 (database)
  uint32_t record_count;
  uint32_t field_count;                                         // for the first time, this counts arrays as '1'; in the past, only the WCH* variants have counted arrays as 1 field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but after build 21737, this is a new hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // in WDB3/WCH4, this field was in the WoW executable's DBCMeta; possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
};

template<typename record_type>
struct wdb5_file
{
  wdb5_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WCH7 (.adb)

This section only applies to versions ≥ Legion (7.0.3.22451).

The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.3 build 22451). It was used alongside WDB5.

A new header field has been added between 'record_count' and 'field_count'. This field designates the size of a new table which is located between the string table and the non-inline IDs table. All records are padded to 'record_size'.

WCH8 (.adb)

This section only applies to versions ≥ Legion (7.0.3.22484).

The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.3 build 22484) and is still in use today. It is used alongside WDB5.

The only difference to WCH7 is the padding of files with an offset map. These are padded as per the WCH5 specification again; presumably as this was unnecessarily changed in WCH7.

Known Flag Meanings for WDB4 / WDB5

flags & 0x01 = 'Has offset map'
flags & 0x02 = 'Has secondary key'
flags & 0x04 = 'Has non-inline IDs'