DB2

From wowdev
Revision as of 20:12, 8 December 2017 by Simca (talk | contribs) (Restructured page greatly, moving ADB content out to its own page.)
Jump to navigation Jump to search
This section only applies to versions ≥ Cata.

DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. They are similar in many ways to DBC files, so you may want to also look at those. They both have headers with similar data, records containing fields of various types with various data, and a string block where 4-byte string references are made. The difference is that DB2 files have become wildly more complex than their simple DBC brethren, especially in recent times. The structure described here was also used in ADB files for many years, which are a cache of dynamically streamed database entries, typically used hotfixes and content that Blizzard wants to hide from dataminers until the last minute. Recently the file structures of DB2 files and ADB files have diverged greatly.

Table content structures

This page describes the structure of DB2 files. For a list of existing DB2 files and their contents see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria and Warlords of Draenor. If you add documentation for a file, please add the correct categories (also the build number) as well.

Field Types

WDB2 / WCH3 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)

Additionally, WDB3 / WCH4 added the following possible field type:

16-bit Integers*

Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.

As an addendum to the above paragraph, in the binary's DBCMeta for WDB5-ready Legion clients, there is a flag for each field that can designate 'do not sign-extend when decompressing'. However, there are fields which obviously contain unsigned data that are not marked with this flag, so it is not exhaustive.

Traditionally, 64-bit integers have only made an appearance in a very tiny number of DB2s (usually 1 DB2 at any given time) and only for short periods of time. This meant that 64-bit integer support was not necessary. However, the addition of Allied Races in Legion's Patch 7.3.5 Build 25600 caused this to change. There were no longer enough bits for 'RaceMask', a field used in half a dozen different DB2s, and the field was expanded to be a 64-bit integer.

Determining Field Types

In WDB2, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2/WCH3 format with minimal effort.

In WDB3 and WDB4, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBCs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBCMeta structure. In that structure you will find the field types for all fields.

In the newer formats such as WDB5 and WDB6, determining field types on the fly is much more doable. Any field being compressed cannot be a string or a float as they are always 4 bytes, meaning you can just assume they are integers. You can use the 'field structure block' present in WDB5+ in order to determine the size of the fields and the distance between the offsets in the field structure block in order to determine array length for all fields except the last one (the last field will be padded out to 'record_size').

String Block

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

Since Cataclysm, all DB files contain only localized string values, including DBCs.

The rest of the string block is equivalent to DBC version. See documentation there.

WDB2

This section only applies to versions Cata … < WoD (7.0.1.20740).

This file format was introduced in Cataclysm. It was phased out in favor of WDB3 in Legion (Patch 7.0.1 build 20740).

Structure

See ADB#WCH3 for how to adapt this structure for its .ADB counterpart.

struct db2_header
{
  uint32_t magic;                                               // 'WDB2'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB2
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;                                     // always zero in WDB2 (?) - see WDB3 for information on how to parse this
};

template<typename record_type>
struct db2_file
{
  db2_header header;
  // static_assert (header.record_size == sizeof (record_type));

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];*
  char string_table[header.string_table_size];
};
  • Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one. 'record_size' should account for this (it would be '8' in the example given in the previous sentence).

WDB3

This section only applies to versions Legion (7.0.1.20740) … < Legion (7.0.1.20810).

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.

The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.

It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.

Structure

See ADB#WCH4 for how to adapt this structure for its .ADB counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offset_map_entry
  {
    uint32_t offset;                                             // This offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
    uint16_t length;                                             // This is the length of the record located at the specified offset
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];*
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  uint32_t IDs[header.record_count];*
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string block usually still exists, just as size 2 with two blank entries.

  • This part of the structure is optional.

WDB4

This section only applies to versions Legion (7.0.1.20810) … < Legion (7.0.3.21414).

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today. It was partially phased out in favor of WDB5 just a few months later, also in Legion (Patch 7.0.3 build 21414). As such, support for WDB4 and WCH5/WCH6 is likely important for almost nobody.

The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table (this change does not apply to WCH5 - see WCH5-specific Concerns). Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.

The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.

Structure for WDB4

See ADB#WCH5/WCH6 for how to adapt this structure for its .ADB counterpart.

struct wdb4_db2_header
{
  uint32_t magic;                                               // 'WDB4'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB4
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint32_t flags;                                               // in WDB3, this field was in the WoW executable's DBCMeta instead; possible values are listed in Known Flag Meanings
};

template<typename record_type>
struct wdb4_file
{
  wdb4_db2_header header;
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                           // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                           // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB5

This section only applies to versions Legion (7.0.3.21479) … < Legion (7.2.0.23436).

The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.3 build 21479) and was replaced by WDB6 later in Legion (Patch 7.2.0 build 23436). There have been a variety of ADB formats used simultaneously with WDB5, including WCH5, WCH6, WCH7, and WCH8.

There are very significant changes to the format in WDB5 which will require substantial effort for existing tools to support. The main changes are that the field_count now counts arrays as '1' (mirroring very old ADB behavior), a field_structure block has been added directly after the header, and the addition of compressed fields (and 24-bit integers). The introduction of the field_structure block, in particular, is hugely positive for us and should improve the accuracy of DB2 parsers or at least reduce their dependence on the WoW binary's DBCMeta. Note that the WoW binary's DBCMeta will often disagree with the 'field_structure' (DBCMeta might say 'int32' but the field_structure block says the size is '3-bytes'). In those cases, the field_structure block takes priority. Additionally, the DBCMeta is still relevant for parsing ADBs, as they do not support compression.

The header has lost a field compared to WDB4, timestamp_last_written. This field was useless in DB2s and always 0, so its removal is understandable.

In build 21737, a few builds after the introduction of WDB5, more changes to the header were made. The 'flags' header field was split into two shorts - 'flags' and 'id_index'. 'id_index' is very valuable since the inline ID fields no longer have to be the first field in WDB5; they can appear anywhere in the record. This index lets you know which field is ID, so you can, for example, move it to the beginning of the record for the ease of the viewer. Additionally, the 'build' field was changed to be 'layout_hash'. This hash is unique to the specific column layout, including (at least) position, size, and column name (we don't have most of these names, but we can tell this is true based on layout_hash changes that shouldn't have happened otherwise). It is worth noting that array size is actually not unique per hash. In some cases, the size of an array can change without the hash changing. This replacement for BuildID is often beneficial to us because when layout_hash changes, you know the structure changed (usually - there have been some instances of recalculations).

Structure

See ADB#WCH5/WCH6 for how to adapt this structure for its .ADB counterpart.
struct wdb5_db2_header
{
  uint32_t magic;                                               // 'WDB5' for .db2 (database)
  uint32_t record_count;
  uint32_t field_count;                                         // for the first time, this counts arrays as '1'; in the past, only the ADB variants have counted arrays as 1 field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but after build 21737, this is a new hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
};

template<typename record_type>
struct wdb5_file
{
  wdb5_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB6

This section only applies to versions Legion (7.2.0.23436) … < Legion (7.3.5.25600).

The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.2.0 build 23436) and was replaced by WDC1 later in Legion (Patch 7.3.5 build 25600).

Two new header fields and a new data block were added in WDB6. Both of the header fields relate to the new block, which we have named 'common_data_table'. Its purpose is to drastically reduce db2 filesize and memory footprint (affected db2s have had their filesizes halved) by letting the client assume that many columns are always 0, unless there is an entry in the proper 'common_data_table' that is mapped to the ID of the row in question. For example, in build 23436, only 10 columns are in the 'normal' row data section for SpellEffect.db2. However, the 'common_data_table' supports up to 26 columns (designated by the new header field named 'total_field_count'). To find the value for one of the latter 16 columns in SpellEffect.db2, you look up the column in the 'common_data_table' and then the ID in the 'common_data_map' for that column. If there is an entry, use the value paired with the ID. If there is not an entry, the value 'defaults'.

Default values are stored in the WoW binary, in DBMeta. It's worth noting that almost every field's default value is '0', with a few exceptions. For example, the 'Alpha' byte field in CreatureDisplayInfo defaults to 255, and a couple of floats in SpellEffect.db2 default to 1.

Neither strings or arrays are supported in the 'common_data_table'.

Starting from Patch 7.3.0 Build 24473, values in the 'common_data_table' are always padded out to 4 bytes. Detecting this change is very annoying as there were no other accompanying changes. If you wish to support WDB6 both before and after this build, you will need to attempt to navigate the common_data_table once without padding. Compare the distance you just navigated against the common_data_table_size field from the header. If they match, it is either not padded (pre-7.3) or does not matter (because all of the common data fields are 4 bytes). If it does not match, then it is padded (post-7.3). After determining this, you can then parse the table again with the knowledge that you are doing it correctly.

It is worth a minor mention here that from WDB6 onwards, standalone ADB files were discarded in favor of 'ADB#DBCache.bin', a new format that does not mirror DB2 structure at all.

Structure

struct wdb6_db2_header
{
  uint32_t magic;                                               // 'WDB6'
  uint32_t record_count;
  uint32_t field_count;                                         // this counts arrays as '1' field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but now this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;                                   // new in WDB6, includes columns only expressed in the 'common_data_table', unlike field_count
  uint32_t common_data_table_size;                              // new in WDB6, size of new block called 'common_data_table'
};

template<typename record_type>
struct wdb6_file
{
  wdb6_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
  if (header.common_data_table_size > 0)
  {
    uint32_t num_columns_in_table;
    struct common_data_map_entry
    {
      uint32_t id;
      uint32_t value;                                            // Calling this 'uint32_t' is an oversimplification - the size of this field depends on the 'type' from the enum 
                                                                 // (From Patch 7.3.0 Build 24473 onwards, this is no longer true. Values are always padded out to 4 bytes.)
    };
    struct common_data_table_entry
    {
      uint32_t count;
      uint8_t type;                                              // New enum: string = 0, short = 1, byte = 2, float = 3, int = 4 (int64 = 5??)
      common_data_map_entry common_data_map[count];
    };
    common_data_table_entry common_data_table[num_columns_in_table];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDC1

This section only applies to versions ≥ Legion (7.3.5.25600).

The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.3.5 build 25600) and is still in use today (as of Patch 7.3.5 Build 25632).

This format is like 'woah'.

Structure

Coming soon to theaters near you.

Known Flag Meanings for WDB4+

flags & 0x01 = 'Has offset map'
flags & 0x02 = 'Has secondary key'
flags & 0x04 = 'Has non-inline IDs'