DB2

From wowdev
Jump to navigation Jump to search
This section only applies to versions ≥ Cata.

DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. They are similar in many ways to DBC files, so you may want to also look at those. They both have headers with similar data, records containing fields of various types with various data, and a string block where 4-byte string references are made. The difference is that DB2 files have become wildly more complex than their simple DBC brethren, especially in recent times. The structure described here was also used in ADB files for many years, which are a cache of dynamically streamed database entries, typically used hotfixes and content that Blizzard wants to hide from dataminers until the last minute. Recently the file structures of DB2 files and ADB files have diverged greatly.

Table content structures

This page describes the structure of DB2 files. For a list of existing DB2 files and their contents see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria, Warlords of Draenor and Legion. If you add documentation for a file, please add the correct categories (also the build number) as well.

Field Types

WDB2 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)

Additionally, WDB3 added the following possible field type:

16-bit Integers*

Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.

As an addendum to the above paragraph, in the binary's DBCMeta for WDB5-ready Legion clients, there is a flag for each field that can designate 'do not sign-extend when decompressing'. However, there are fields which obviously contain unsigned data that are not marked with this flag, so it is not exhaustive.

Traditionally, 64-bit integers have only made an appearance in a very tiny number of DB2s (usually 1 DB2 at any given time) and only for short periods of time. This meant that 64-bit integer support was not necessary. However, the addition of Allied Races in Legion's Patch 7.3.5 Build 25600 caused this to change. There were no longer enough bits for 'RaceMask', a field used in half a dozen different DB2s, and the field was expanded to be a 64-bit integer.

Determining Field Types

In WDB2, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2 format with minimal effort.

In WDB3 and WDB4, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBMeta structure. In that structure you will find the field types for all fields.

In the newer formats such as WDB5 and WDB6, determining field types on the fly is much more doable. Any field being compressed cannot be a string or a float as they are always 4 bytes, meaning you can just assume they are integers. You can use the 'field structure block' present in WDB5+ in order to determine the size of the fields and the distance between the offsets in the field structure block in order to determine array length for all fields except the last one (the last field will be padded out to 'record_size').

String Block

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

Since Cataclysm, all DB files contain only localized string values, including DBCs.

The rest of the string block is equivalent to DBC version. See documentation there.

WDB2

This section only applies to versions Cata (4.0.1.12911) … < Legion (7.0.1.20740).

This file format was introduced in Cataclysm. It was phased out in favor of WDB3 in Legion (Patch 7.0.1 build 20740).

Structure

See ADB#WCH3 for how to adapt this structure for its .ADB counterpart.

struct db2_header
{
  uint32_t magic;                                               // 'WDB2'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB2
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;                                     // always zero in WDB2 (?) - see WDB3 for information on how to parse this
};

template<typename record_type>
struct db2_file
{
  db2_header header;

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];*
  char string_table[header.string_table_size];
};
  • Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one. 'record_size' should account for this (it would be '8' in the example given in the previous sentence).

WDB3

This section only applies to versions Legion (7.0.1.20740) … < Legion (7.0.1.20810).

This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.

The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.

It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.

Structure

See ADB#WCH4 for how to adapt this structure for its .ADB counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offset_map_entry
  {
    uint32_t offset;                                             // This offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
    uint16_t length;                                             // This is the length of the record located at the specified offset
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];*
  uint32_t relationship_IDs[header.max_id - header.min_id + 1];* // only ever used by wmominimaptexture.db2 to record WMOIDs
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  uint32_t IDs[header.record_count];*
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string block usually still exists, just as size 2 with two blank entries.

  • This part of the structure is optional.

WDB4

This section only applies to versions Legion (7.0.1.20810) … < Legion (7.0.3.21414).

This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today. It was partially phased out in favor of WDB5 just a few months later, also in Legion (Patch 7.0.3 build 21414). As such, support for WDB4 is likely important for almost nobody.

The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table. Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.

The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.

Structure

See ADB#WCH5 for how to adapt this structure for its .ADB counterpart.

struct wdb4_db2_header
{
  uint32_t magic;                                               // 'WDB4'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB4
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint32_t flags;                                               // in WDB3, this field was in the WoW executable's DBCMeta instead; possible values are listed in Known Flag Meanings
};

template<typename record_type>
struct wdb4_file
{
  wdb4_db2_header header;
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                           // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                           // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x02 != 0)
  {
    uint32_t relationship_IDs[header.max_id - header.min_id + 1]; // only ever used by wmominimaptexture.db2 to record WMOIDs
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB5

This section only applies to versions Legion (7.0.3.21479) … < Legion (7.2.0.23436).

This file format was introduced in Legion (Patch 7.0.3 build 21479) and was replaced by WDB6 later in Legion (Patch 7.2.0 build 23436). There have been a variety of ADB formats used simultaneously with WDB5, including WCH5, WCH6, WCH7, and WCH8.

There are very significant changes to the format in WDB5 which will require substantial effort for existing tools to support. The main changes are that the field_count now counts arrays as '1' (mirroring very old ADB behavior), a field_structure block has been added directly after the header, and the addition of compressed fields (and 24-bit integers). The introduction of the field_structure block, in particular, is hugely positive for us and should improve the accuracy of DB2 parsers or at least reduce their dependence on the WoW binary's DBCMeta. Note that the WoW binary's DBCMeta will often disagree with the 'field_structure' (DBCMeta might say 'int32' but the field_structure block says the size is '3-bytes'). In those cases, the field_structure block takes priority. Additionally, the DBCMeta is still relevant for parsing ADBs, as they do not support compression.

The header has lost a field compared to WDB4, timestamp_last_written. This field was useless in DB2s and always 0, so its removal is understandable.

In build 21737, a few builds after the introduction of WDB5, more changes to the header were made. The 'flags' header field was split into two shorts - 'flags' and 'id_index'. 'id_index' is very valuable since the inline ID fields no longer have to be the first field in WDB5; they can appear anywhere in the record. This index lets you know which field is ID, so you can, for example, move it to the beginning of the record for the ease of the viewer. Additionally, the 'build' field was changed to be 'layout_hash'. This hash is unique to the specific column layout, including (at least) position, size, and column name (we don't have most of these names, but we can tell this is true based on layout_hash changes that shouldn't have happened otherwise). It is worth noting that array size is actually not unique per hash. In some cases, the size of an array can change without the hash changing. This replacement for BuildID is often beneficial to us because when layout_hash changes, you know the structure changed (usually - there have been some instances of recalculations).

Structure

See ADB#WCH5 for how to adapt this structure for its .ADB counterpart.
struct wdb5_db2_header
{
  uint32_t magic;                                               // 'WDB5' for .db2 (database)
  uint32_t record_count;
  uint32_t field_count;                                         // for the first time, this counts arrays as '1'; in the past, only the ADB variants have counted arrays as 1 field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but after build 21737, this is a new hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
};

template<typename record_type>
struct wdb5_file
{
  wdb5_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x02 != 0)
  {
    uint32_t relationship_IDs[header.max_id - header.min_id + 1]; // only ever used by wmominimaptexture.db2 to record WMOIDs
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB6

This section only applies to versions Legion (7.2.0.23436) … < Legion (7.3.5.25600).

This file format was introduced in Legion (Patch 7.2.0 build 23436) and was replaced by WDC1 later in Legion (Patch 7.3.5 build 25600).

Two new header fields and a new data block were added in WDB6. Both of the header fields relate to the new block, which we have named 'common_data_table'. Its purpose is to drastically reduce db2 filesize and memory footprint (affected db2s have had their filesizes halved) by letting the client assume that many columns are always 0, unless there is an entry in the proper 'common_data_table' that is mapped to the ID of the row in question. For example, in build 23436, only 10 columns are in the 'normal' row data section for SpellEffect.db2. However, the 'common_data_table' supports up to 26 columns (designated by the new header field named 'total_field_count'). To find the value for one of the latter 16 columns in SpellEffect.db2, you look up the column in the 'common_data_table' and then the ID in the 'common_data_map' for that column. If there is an entry, use the value paired with the ID. If there is not an entry, the value 'defaults'.

Default values are stored in the WoW binary, in DBMeta. It's worth noting that almost every field's default value is '0', with a few exceptions. For example, the 'Alpha' byte field in CreatureDisplayInfo defaults to 255, and a couple of floats in SpellEffect.db2 default to 1.

Neither strings or arrays are supported in the 'common_data_table'.

Starting from Patch 7.3.0 Build 24473, values in the 'common_data_table' are always padded out to 4 bytes. Detecting this change is very annoying as there were no other accompanying changes. If you wish to support WDB6 both before and after this build, you will need to attempt to navigate the common_data_table once without padding. Compare the distance you just navigated against the common_data_table_size field from the header. If they match, it is either not padded (pre-7.3) or does not matter (because all of the common data fields are 4 bytes). If it does not match, then it is padded (post-7.3). After determining this, you can then parse the table again with the knowledge that you are doing it correctly.

It is worth a minor mention here that from WDB6 onwards, standalone ADB files were discarded in favor of 'ADB#DBCache.bin', a new format that does not mirror DB2 structure at all.

Structure

struct wdb6_db2_header
{
  uint32_t magic;                                               // 'WDB6'
  uint32_t record_count;
  uint32_t field_count;                                         // this counts arrays as '1' field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but now this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;                                   // new in WDB6, includes columns only expressed in the 'common_data_table', unlike field_count
  uint32_t common_data_table_size;                              // new in WDB6, size of new block called 'common_data_table'
};

template<typename record_type>
struct wdb6_file
{
  wdb6_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (header.flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (header.flags & 0x02 != 0)
  {
    uint32_t relationship_IDs[header.max_id - header.min_id + 1]; // only ever used by wmominimaptexture.db2 to record WMOIDs
  }
  if (header.flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
  if (header.common_data_table_size > 0)
  {
    uint32_t num_columns_in_table;
    struct common_data_map_entry
    {
      uint32_t id;
      uint32_t value;                                            // Calling this 'uint32_t' is an oversimplification - the size of this field depends on the 'type' from the enum 
                                                                 // (From Patch 7.3.0 Build 24473 onwards, this is no longer true. Values are always padded out to 4 bytes.)
    };
    struct common_data_table_entry
    {
      uint32_t count;
      uint8_t type;                                              // New enum: string = 0, short = 1, byte = 2, float = 3, int = 4 (int64 = 5??)
      common_data_map_entry common_data_map[count];
    };
    common_data_table_entry common_data_table[num_columns_in_table];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDC1

This section only applies to versions Legion (7.3.5.25600) … Battle (8.0.1.26231).

This file format was introduced in Legion (Patch 7.3.5 build 25600) and was replaced by WDC2 in Battle for Azeroth (Patch 8.0.1 Build 26231).

WDC1 is an expanded version of WDB6. If you've been following along with the format changes in Legion thus far, you will notice that the theme has been saving space. Most recently, in WDB6, a 'common_data_block' was added in order to handle columns that are usually always one value but rarely are some other value.

WDC1 takes this idea and adds in even more types of space-saving techniques. It adds in bitpacking (for fields who would be best suited to a field size that is not a multiple of 8), indexed values (for fields with values that are commonly always one of a small number of values but who do not have a common 'default' amongst those values), and indexed array values (for array fields where all fields in an array are commonly linked together, for example, with common spell flag values). It also keeps the 'common_data_block' technique from WDB6, but it streamlines the implementation of it, removing the need for a separate header block for 'common_data'. This is done by adding in a new block we call 'field_storage_info'.

'field_storage_info' contains the size in bits and the offset in bits of every column, along with the size of the column's data in the different data blocks (currently the data blocks are 'pallet_data' and 'common_data'). The structure below explains most of this. Compression type '0' is normal in-field data. Compression types '1', '3', and '4' begin as bitpacked data in the record. For compression type '1', you can stop there, as that is the proper value. For compression types '3' and '4', you need to take the bitpacked value you obtained from the record and use it as an index into 'pallet_data'. For compression type '3', you pull a 4-byte value from 'pallet_data' using the formula 'additional_data_offset + (index * 4)', where 'additional_data_offset' is the sum of 'additional_data_size' for every column before the current one. Compression type '4' is very similar, except that it is always an array. Array values are bundled together for compression type '4' - that single index will give you 4-byte values for every single piece of the array at once. You can retrieve these values by using the formula 'additional_data_offset + (index * 4 * array_count) + (iter * 4)', where 'iter' is the 0-based array piece you wish to retrieve (you should iterate over all of them to retrieve all of the values for the array). Lastly, compression type '2' is just WDB6's old 'common_data_block' system; it is the only compression type where no data at all is read from the record. Just apply the 'default_value' (also provided by 'field_storage_info') when the ID for the record is not present in 'common_data'.

The last major quirk of WDC1 is the 'relationship_map' system. In WDC1, any DB2 that had a primary key which was also a foreign key (in Blizzard's database system, I guess) was removed from the record and placed into this new structure. This is probably for increased lookup speed by the game executable. So, for example, almost all Spell*.db2 tables (besides Spell.db2 because it wouldn't be a foreign key there) had their SpellID columns removed and this new block added in their place. The easiest way to handle this block is just to add a new 'fake' column to the end of the DB2 in question and populate it with the values from the relationship map.

Structure

struct wdc1_db2_header
{
  uint32_t magic;                  // 'WDC1'
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;           // List of ids present in the DB file
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t relationship_data_size;
};

struct field_structure
{
   int16_t size;                   // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
   uint16_t position;              // position of the field within the record, relative to the start of the record
};

wdc1_db2_header header;
field_structure fields[header.total_field_count];
if ((header.flags & 1) == 0) {
  // Normal records
  struct record_data
  {
    char data[header.record_size];
  };
  record_data records[header.record_count];
  char string_data[header.string_table_size];
} else {
  // Offset map records -- these records have null-terminated strings inlined, and
  // since they are variable-length, they are pointed to by an array of 6-byte
  // offset+size pairs.
  char variable_record_data[header.offset_map_offset - sizeof(wdc1_db2_header) - (sizeof(field_structure) * header.total_field_count)];
  struct offset_map_entry
  {
    uint32_t offset;
    uint16_t size;
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];
}

enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};

uint32_t id_list[header.id_list_size / 4];
if (header.copy_table_size > 0) {
  struct copy_table_entry
  {
    uint32_t id_of_new_row;
    uint32_t id_of_copied_row;
  };
  copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
}
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];
if (header.relationship_data_size > 0) {
  // In some tables, this relationship mapping replaced columns that were used
  // only as a lookup, such as the SpellID in SpellX* tables.
  struct relationship_entry
  {
    // This is the id of the foreign key for the record, e.g. SpellID in
    // SpellX* tables.
    uint32_t foreign_id;
    // This is the index of the record in record_data.  Note that this is
    // *not* the record's own ID.
    uint32_t record_index;
  };
  struct relationship_mapping
  {
    uint32_t            num_entries;
    uint32_t            min_id;
    uint32_t            max_id;
    relationship_entry  entries[num_entries];
  };
  relationship_mapping relationship_map;
}

Further Quirks

The 'field structure' section introduced in WDB5 still exists here, but 'field storage info' has all of the same information and more. You may wonder - why should I even bother reading field structure now if I only care about WDC1 support? There is one good reason - array sizes. In WDC1, the field structure section reports the size of the individual field piece in the file (for compression type '0') in bytes (well, actually it reports a value you have to subtract from 32 and divide by 8 first, see the comments in the structure above on the 'size' field in the field structure block). However, the field storage info section reports the size of the entire field in bits. The difference in these values is important if an array is the field in question. For example, in field structure, an array of uint32[3] would be reported as size '4' (technically, reported as value '0', which you subtract from 32 and divide by 8 to get '4'), while in field storage info, it would be reported as '96'. This technicality greatly benefits us as a trivial bit of math allows you to instantly deduce array sizes when reading the file: 'ArrayLength = SizeReportedByFieldStorageInfo / (32 - ValueReportedByFieldStructure)'.

There is a writing bug that has existed since WDB6's common_data padding was added. The bug has expanded slightly in WDC1 and newer formats to also be present in pallet_data blocks. Padding bytes are not properly written when the DB2 uses multiple fields in one block type (either common_data or pallet_data) with different unpadded sizes. For example, if a byte column and a short column both use the common_data block, the byte column's values -can- (but does not always) encounter this bug. The padding difference between the smallest and largest field size using the block will be filled with random garbage bytes. For example, in the above example, a value from the byte column stored in common_data may be '08 02 00 00'. This is invalid. The last three bytes should all be '00', but due to this writing bug, this will not always be true.

The only perfect way to handle this is to properly mask the data being read with the expected size of the field in order to ward off the 'garbage' data. This would require you to know the proper value size ahead of time (probably from DBCMeta) since that information is not handed out for pallet_data or common_data fields in any existing DB2 format. Alternatively, you could attempt to detect unexpected patterns in the data in order to proactively detect this bug and mask it. This would likely be very complex, but since the bug seems to manifest in very consistent ways, it may be possible. Because this is a writing bug, it may be fixed by Blizzard at some point in the future without us noticing immediately. Please update the wiki article or contact one of its authors if you believe Blizzard has solved this issue permanently, as it would be of interest.

There are several fields in the structure with no apparent purpose, but I believe these are all related. 'bitpacked_data_offset' in the header and 'bitpacking_offset_bits' and 'bitpacking_size_bits' from some variants of 'field_storage_info' (the bitpacked ones) are the fields I am referring to. I believe these three values are used to read from the record starting at the position of the bitpacked values. Our best guess is that Blizzard needs this for some purpose, possibly to integrate with their existing tools.

On the note of relationship maps, the aspect that makes them tricky (the parsing is actually very simple) is that sometimes the column in question remains in DB2 data in addition to being in a relationship map. For example, Achievement.db2's CriteriaTreeID field was not removed, but it had a relationship map added anyway. Cases like this are difficult to handle properly. You could just allow the duplicate data to be added as a fake column to the end, as there may not be any harm, especially if you are expecting it. Alternatively, you could parse the game executable's DBCMeta - the fields removed by relationship maps are still present there. If there is a mismatch between the number of fields reported by DBCMeta and the number of fields reported by the DB2, then you know that the relationship map data is unique and needs to be restored. If the number of fields match, then the data in the relationship map already exists in the record and is superfluous.

WDC2

This section only applies to versions Battle (8.0.1.26231) … 8.1.0.28048.

This file format was introduced in Battle for Azeroth (Patch 8.0.1 Build 26231) and was replaced by WDC3 shortly into Battle for Azeroth (Patch 8.1.0 Build 28048).

WDC2 introduces a new system of 'sections' of data. Three chunks of the structure ('field_storage_info', 'pallet_data', and 'common_data') were moved up in front of the records / 'sections'. Everything after that was broken into a new form of 'sections' which can potentially occur many times in one file. Note that implementing this system right now appears to be entirely optional as no DB2 file as of the current build has a 'section_count' higher than 1. Also, it appears that the 'offset_map' format (in use for a small handful of DB2s) has a hardcoded limit of 1 section.

The other major change is to strings and how string offsets are calculated. For the history of WoW up until WDC2, strings in records were represented as 'the relative position of the referenced string to the start of the string block'. In WDC2, strings in records are represented as 'the relative position from the beginning of the field where this offset was stored to the position of the referenced string in the string block'. This complicates parsing the string block directly and storing relative locations, but it does slightly simplify reading records.

If before WDC2 you dealt with strings by reading the string block and assigning relative positions to every string, then you will need to either change your approach (to having a constantly-accessible 1:1 representation of the string table available) or mitigate the damage from the change. Starting with WDC2, DB2s can have multiple sections. The client loads data from all sections into one big binary array. The structure of this array looks like this:

 section[0].records, section[1].records, ..., section[section_count-1].records, section[0].string_data, section[1].string_data, ..., section[section_count-1].string_data

The client reads both data and strings from this blob. The string offsets are relative offsets inside this big blob from the field holding the offset. Some sort of mitigation is required if implementation omits creation of such blob.

One method of mitigation is: when reading the values of the string offsets in the records, replace the data at that point in time with 'old-style' string offsets.

For example, using code like this when reading the field in question works:

// Pre-WDC2 string value = value read from record + current position - size of field just read - location of string block - combined size of record data after this section - combined size of string blocks before this section

This sample code reads the field containing the string offset in the record, then recalculates the value it just read by adding in the current position and subtracting the size of the field it just read (to get the absolute position of the referenced string) and then by subtracting the absolute position of the start of the string block (which yields the relative position of the referenced string to the start of the string block). Additionally, it calculates the combined size of the string block data before this section and adds that value into the result. Lastly, it calculates the combined size of all record data blocks after this section and subtracts that value from the result. This manipulation ensures that a program converts every string offset back to the pre-WDC2 style.

Note that the parts where the 'combined size of string block data before this section' and 'combined size of all record data blocks after this section' need to be subtracted out of the string offsets was not actually discovered until WDC3. We suspect that this unknown quirk was part of the issue behind a broken build (Patch 8.1.0 Build 27826), where the strings in the db2s were unreadable by normal methods. This is potentially now explainable if the db2s in that build had unshipped additional sections that were being factored into the calculations.

Structure

struct wdc2_db2_header
{
  uint32_t magic;                  // 'WDC2' or '1SLC' ('CLS1' = 'CLaSsic1')
  uint32_t record_count;           // this is for all sections combined now
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;      // this is for all sections combined now
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t section_count;          // new to WDC2, this is number of sections of data
};
wdc2_db2_header header;

// a section = records + string block + offset map + id list + copy table + relationship map
struct wdc2_section_header
{
  uint64_t tact_key_hash;          // TactKeyLookup hash
  uint32_t file_offset;            // absolute position to the beginning of the section
  uint32_t record_count;           // 'record_count' for the section
  uint32_t string_table_size;      // 'string_table_size' for the section
  uint32_t copy_table_size;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;           // Size of the list of ids present in the section
  uint32_t relationship_data_size;
};
wdc2_section_header section_headers[header.section_count];

struct field_structure
{
   int16_t size;                   // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
   uint16_t position;              // position of the field within the record, relative to the start of the record
};
field_structure fields[header.total_field_count];

enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count].
  field_compression_bitpacked_indexed_array,
  // Same as field_compression_bitpacked
  field_compression_bitpacked_signed,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
    case field_compression.field_compression_bitpacked_signed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
                      // <[TOM_RUS]> last one makes no sense - 6/7/18 (for signed_immediate)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];

struct section
{
  if ((header.flags & 1) == 0) {
    // Normal records
    struct record_data
    {
      char data[header.record_size];
    };
    record_data records[section_headers.record_count];
    char string_data[section_headers.string_table_size];
  } else {
    // Offset map records -- these records have null-terminated strings inlined, and
    // since they are variable-length, they are pointed to by an array of 6-byte offset+size pairs.
    char variable_record_data[section_headers.offset_map_offset - section_headers.file_offset];
    struct offset_map_entry
    {
      uint32_t offset;
      uint16_t size;
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  
  uint32_t id_list[section_headers.id_list_size / 4];
  if (section_headers.copy_table_size > 0) {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[section_headers.copy_table_size / sizeof(copy_table_entry)];
  }
  if (section_headers.relationship_data_size > 0) {
    // In some tables, this relationship mapping replaced columns that were used only as a lookup, such as the SpellID in SpellX* tables.
    struct relationship_entry
    {
      // This is the id of the foreign key for the record, e.g. SpellID in SpellX* tables.
      uint32_t foreign_id;
      // This is the index of the record in record_data.  Note that this is *not* the record's own ID.
      uint32_t record_index;
    };
    struct relationship_mapping
    {
      uint32_t            num_entries;
      uint32_t            min_id;
      uint32_t            max_id;
      relationship_entry  entries[num_entries];
    };
    relationship_mapping relationship_map;
  }
};
section data_sections[header.section_count];

WDC3

This section only applies to versions Battle (8.1.0.28048) … DF (10.1.0.48480).

This file format was introduced in Battle for Azeroth (Patch 8.1.0 Build 28048) and got unused in 10.1.0.48480.

WDC3 consists of some changes to the section header, officially introduces DB2s with multiple sections (as described in WDC2) which seem to be used for storing encrypted data (tact_key_hash in section header is set), finally cleaned up all the empty and unused space in offset map structures, and shed a bit of additional light on string calculation changes made back in WDC2.

For the section header changes, they removed 'copy_table_size', instead replacing it with a 'copy_table_count' field added at the end of the section header. Additionally, a new field has been added right before that which is 'offset_map_id_count', a field required for parsing the newly slimmed offset map structure. Instead of the size of the offset map structure being an implicit calculation based on MaxID and MinID, it is now an explicit size ('offset_map_id_count' * sizeof(offset_map_entry)). Lastly, the field 'offset_map_offset' is now better called 'offset_records_end'. Instead of pointing at the spot where the offset map structure begins, it points at the position where the offset records end. In WDC2, these would have been the same value, so it's impossible to know if this changed now or back in WDC2, where we wouldn't have been able to tell the difference.

The reason we can now tell the difference is that the offset map structure has moved back two blocks in the structure. It's now after id_list and copy_table. To get the old 'offset_map_offset' value, you need to do something like:

uint offset_map_offset = offset_records_end + id_list_size + (copy_table_count * 8);

The offset map structure is also now much more compact. Before, it implicitly ran from min_id to max_id, and if the ID didn't exist in the db2, there were 6 bytes of 0s. This was a massive waste of space in large files like ItemSparse.db2. Now, as you read the offset_map entry-by-entry, you also need to read offset_map_id_list entry-by-entry. The first entry in the offset_map has an ID equal to the value of the first entry in offset_map_id_list. Reading the two structures in parallel gives you the same kind of information as before (ID, offset, length) with the only difference being that the ID is explicit, so there does not need to be large gaps everywhere that an ID doesn't exist (in fact, it's impossible for an ID in offset_map_id_list to not exist, as far as I know).

Note that the WDC2 string changes are still in effect and that when introducing multiple sections, we discovered an additional quirk to these changes that was never noticed before. When calculating WDC2+ string offsets, you need to subtract out the record data size of any sections that occur after the current section and subtract out the size of any string data sections that occur before the current section. This issue only manifests itself on DB2s with multiple sections, no offset map, and non-zero string tables. Re-read the WDC2 section for additional information on the string changes.

Structure

struct wdc3_db2_header
{
  uint32_t magic;                  // 'WDC3'
  uint32_t record_count;           // this is for all sections combined now
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;      // this is for all sections combined now
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t section_count;          // new to WDC2, this is number of sections of data
};
wdc3_db2_header header;

// a section = records + string block + id list + copy table + offset map + offset map id list + relationship map
struct wdc3_section_header
{
  uint64_t tact_key_hash;          // TactKeyLookup hash
  uint32_t file_offset;            // absolute position to the beginning of the section
  uint32_t record_count;           // 'record_count' for the section
  uint32_t string_table_size;      // 'string_table_size' for the section
  uint32_t offset_records_end;     // Offset to the spot where the records end in a file with an offset map structure;
  uint32_t id_list_size;           // Size of the list of ids present in the section
  uint32_t relationship_data_size; // Size of the relationship data in the section
  uint32_t offset_map_id_count;    // Count of ids present in the offset map in the section
  uint32_t copy_table_count;       // Count of the number of deduplication entries (you can multiply by 8 to mimic the old 'copy_table_size' field)
};
wdc3_section_header section_headers[header.section_count];

struct field_structure
{
   int16_t size;                   // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
   uint16_t position;              // position of the field within the record, relative to the start of the record
};
field_structure fields[header.total_field_count];

enum field_compression
{
  // None -- usually the field is a 8-, 16-, 32-, or 64-bit integer in the record data. But can contain 96-bit value representing 3 floats as well
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
  // Same as field_compression_bitpacked
  field_compression_bitpacked_signed,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
    case field_compression.field_compression_bitpacked_signed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];

struct section
{
  if ((header.flags & 1) == 0) {
    // Normal records
    struct record_data
    {
      char data[header.record_size];
    };
    record_data records[section_headers.record_count];
    char string_data[section_headers.string_table_size];
  } else {
    // Offset map records -- these records have null-terminated strings inlined, and
    // since they are variable-length, they are pointed to by an array of 6-byte offset+size pairs.
    char variable_record_data[section_headers.offset_records_end - section_headers.file_offset];
  }
  
  uint32_t id_list[section_headers.id_list_size / 4];
  if (section_headers.copy_table_count > 0) {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[section_headers.copy_table_count];
  }
  struct offset_map_entry
  {
    uint32_t offset;
    uint16_t size;
  };
  offset_map_entry offset_map[section_headers.offset_map_id_count];
  if (section_headers.relationship_data_size > 0) {
    // In some tables, this relationship mapping replaced columns that were used only as a lookup, such as the SpellID in SpellX* tables.
    struct relationship_entry
    {
      // This is the id of the foreign key for the record, e.g. SpellID in SpellX* tables.
      uint32_t foreign_id;
      // This is the index of the record in record_data.  Note that this is *not* the record's own ID.
      uint32_t record_index;
    };
    struct relationship_mapping
    {
      uint32_t            num_entries;
      uint32_t            min_id;
      uint32_t            max_id;
      relationship_entry  entries[num_entries];
    };
    relationship_mapping relationship_map;
  }
  uint32_t offset_map_id_list[section_headers.offset_map_id_count];
};
section data_sections[header.section_count];

WDC4

This section only applies to versions DF (10.1.0.48480) … 10.2.5.52393.

In WDC4 'encrypted_status' struct got added after 'common_data', this structure holds the IDs of encrypted records, likely to give a better idea of what IDs are available/unavailable while loading the DB2 (provided you have keys available).

WDC4 also has the reappearance of flag 0x02 in Collectable* sparse tables, which if set, moves offset_map_id_list to before relationship_map and in relationship_entry uses record IDs instead of record index.

Structure

struct wdc4_db2_header
{
  uint32_t magic;                  // 'WDC4'
  uint32_t record_count;           // this is for all sections combined now
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;      // this is for all sections combined now
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t section_count;          // new to WDC2, this is number of sections of data
};
wdc4_db2_header header;

// a section = records + string block + id list + copy table + offset map + offset map id list + relationship map
struct wdc4_section_header
{
  uint64_t tact_key_hash;          // TactKeyLookup hash
  uint32_t file_offset;            // absolute position to the beginning of the section
  uint32_t record_count;           // 'record_count' for the section
  uint32_t string_table_size;      // 'string_table_size' for the section
  uint32_t offset_records_end;     // Offset to the spot where the records end in a file with an offset map structure;
  uint32_t id_list_size;           // Size of the list of ids present in the section
  uint32_t relationship_data_size; // Size of the relationship data in the section
  uint32_t offset_map_id_count;    // Count of ids present in the offset map in the section
  uint32_t copy_table_count;       // Count of the number of deduplication entries (you can multiply by 8 to mimic the old 'copy_table_size' field)
};
wdc4_section_header section_headers[header.section_count];

struct field_structure
{
   int16_t size;                   // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
   uint16_t position;              // position of the field within the record, relative to the start of the record
};
field_structure fields[header.total_field_count];

enum field_compression
{
  // None -- usually the field is a 8-, 16-, 32-, or 64-bit integer in the record data. But can contain 96-bit value representing 3 floats as well
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
  // Same as field_compression_bitpacked
  field_compression_bitpacked_signed,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
    case field_compression.field_compression_bitpacked_signed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};

// Holds encryption status of specific ID in a section where tact_key_hash is not 0.
struct encrypted_status {
  int encrypted_id_count;
  int encrypted_id [encrypted_id_count];
};

field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];

// encrypted_status structure is only used on sections where tact_key_hash is not 0
local int i;
for (i = 0; i < header.section_count; ++i)
{
  if (section_headers[i].tact_key_hash == 0)
    continue;

  encrypted_status encrypted_records;
}

struct section
{
  if ((header.flags & 1) == 0) {
    // Normal records
    struct record_data
    {
      char data[header.record_size];
    };
    record_data records[section_headers.record_count];
    char string_data[section_headers.string_table_size];
  } else {
    // Offset map records -- these records have null-terminated strings inlined, and
    // since they are variable-length, they are pointed to by an array of 6-byte offset+size pairs.
    char variable_record_data[section_headers.offset_records_end - section_headers.file_offset];
  }
  
  uint32_t id_list[section_headers.id_list_size / 4];
  if (section_headers.copy_table_count > 0) {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[section_headers.copy_table_count];
  }
  struct offset_map_entry
  {
    uint32_t offset;
    uint16_t size;
  };
  offset_map_entry offset_map[section_headers.offset_map_id_count];
  if (section_headers.relationship_data_size > 0) {
    // In some tables, this relationship mapping replaced columns that were used only as a lookup, such as the SpellID in SpellX* tables.
    struct relationship_entry
    {
      // This is the id of the foreign key for the record, e.g. SpellID in SpellX* tables.
      uint32_t foreign_id;
      // This is the index of the record in record_data.  Note that this is *not* the record's own ID *unless* flag 0x02 is set.
      uint32_t record_index;
    };
    struct relationship_mapping
    {
      uint32_t            num_entries;
      uint32_t            min_id;
      uint32_t            max_id;
      relationship_entry  entries[num_entries];
    };
    relationship_mapping relationship_map;
  }
  // Note, if flag 0x02 is set offset_map_id_list will appear before relationship_map instead.
  uint32_t offset_map_id_list[section_headers.offset_map_id_count];
};
section data_sections[header.section_count];

WDC5

This section only applies to versions ≥ DF (10.2.5.52432).

WDC5 adds what appears to be a numeric version number right after the WDC5 magic and a build string padded up to 128 bytes after that. No apparent further changes.

Structure

struct wdc5_db2_header
{
  uint32_t magic;                  // 'WDC5'
  uint32_t versionNum;             // 5, probably numeric version?
  char schemaString[128];          // "WowStatic_Patch_10_2_5" + padding in 10.2.5.52432
  uint32_t record_count;           // this is for all sections combined now
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;      // this is for all sections combined now
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t section_count;          // new to WDC2, this is number of sections of data
};
wdc5_db2_header header;
// a section = records + string block + id list + copy table + offset map + offset map id list + relationship map
struct wdc5_section_header
{
  uint64_t tact_key_hash;          // TactKeyLookup hash
  uint32_t file_offset;            // absolute position to the beginning of the section
  uint32_t record_count;           // 'record_count' for the section
  uint32_t string_table_size;      // 'string_table_size' for the section
  uint32_t offset_records_end;     // Offset to the spot where the records end in a file with an offset map structure;
  uint32_t id_list_size;           // Size of the list of ids present in the section
  uint32_t relationship_data_size; // Size of the relationship data in the section
  uint32_t offset_map_id_count;    // Count of ids present in the offset map in the section
  uint32_t copy_table_count;       // Count of the number of deduplication entries (you can multiply by 8 to mimic the old 'copy_table_size' field)
};
wdc5_section_header section_headers[header.section_count];

struct field_structure
{
   int16_t size;                   // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
   uint16_t position;              // position of the field within the record, relative to the start of the record
};
field_structure fields[header.total_field_count];

enum field_compression
{
  // None -- usually the field is a 8-, 16-, 32-, or 64-bit integer in the record data. But can contain 96-bit value representing 3 floats as well
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
  // Same as field_compression_bitpacked
  field_compression_bitpacked_signed,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
    case field_compression.field_compression_bitpacked_signed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};

// Holds encryption status of specific ID in a section where tact_key_hash is not 0.
struct encrypted_status {
  int encrypted_id_count;
  int encrypted_id [encrypted_id_count];
};

field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];

// encrypted_status structure is only used on sections where tact_key_hash is not 0
local int i;
for (i = 0; i < header.section_count; ++i)
{
  if (section_headers[i].tact_key_hash == 0)
    continue;

  encrypted_status encrypted_records;
}

struct section
{
  if ((header.flags & 1) == 0) {
    // Normal records
    struct record_data
    {
      char data[header.record_size];
    };
    record_data records[section_headers.record_count];
    char string_data[section_headers.string_table_size];
  } else {
    // Offset map records -- these records have null-terminated strings inlined, and
    // since they are variable-length, they are pointed to by an array of 6-byte offset+size pairs.
    char variable_record_data[section_headers.offset_records_end - section_headers.file_offset];
  }
  
  uint32_t id_list[section_headers.id_list_size / 4];
  if (section_headers.copy_table_count > 0) {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[section_headers.copy_table_count];
  }
  struct offset_map_entry
  {
    uint32_t offset;
    uint16_t size;
  };
  offset_map_entry offset_map[section_headers.offset_map_id_count];
  if (section_headers.relationship_data_size > 0) {
    // In some tables, this relationship mapping replaced columns that were used only as a lookup, such as the SpellID in SpellX* tables.
    struct relationship_entry
    {
      // This is the id of the foreign key for the record, e.g. SpellID in SpellX* tables.
      uint32_t foreign_id;
      // This is the index of the record in record_data.  Note that this is *not* the record's own ID *unless* flag 0x02 is set.
      uint32_t record_index;
    };
    struct relationship_mapping
    {
      uint32_t            num_entries;
      uint32_t            min_id;
      uint32_t            max_id;
      relationship_entry  entries[num_entries];
    };
    relationship_mapping relationship_map;
  }
  // Note, if flag 0x02 is set offset_map_id_list will appear before relationship_map instead.
  uint32_t offset_map_id_list[section_headers.offset_map_id_count];
};
section data_sections[header.section_count];

Known Flag Meanings for WDB4+

flags & 0x01 = 'Has offset map'
flags & 0x02 = 'Has relationship data' // This may be 'secondary keys' and is unrelated to WDC1+ relationships (not present in any WDC3 files, but is present in WDC4+, see note there)
flags & 0x04 = 'Has non-inline IDs'
flags & 0x10 = 'Is bitpacked' // WDC1+ (not present in any WDC3 files, possibly outdated?)

Table Hashes

Table Hashes are the name without .db2 or path, e.g. "ZoneStory", slapped into SStrHash (table_name, false, 0) (i.e. implicitly uppercased).

Layout Hashes

It is not currently known how layout hashes are generated.

Layout hashes change when:

  • Columns get renamed
  • Columns get moved
  • Columns change types (i.e. int -> uint)
  • Column going from noninline -> inline vice versa
  • Column status changes (inline/noninline, relation/no relation)
  • Indexes are added/removed (i.e. a primary or unique index)

Layout hashes DO NOT change when:

  • Number of elements in array columns change

Based on the above information one can speculate that layouthashes are a hash of an SQL query