DB2

From wowdev
Jump to navigation Jump to search
This section only applies to versions ≥ Cata.

DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. They are similar in many ways to DBC files, so you may want to also look at those. They both have headers with similar data, records containing fields of various types with various data, and a string block where 4-byte string references are made. The difference is that DB2 files have become wildly more complex than their simple DBC brethren, especially in recent times. The structure described here was also used in ADB files for many years, which are a cache of dynamically streamed database entries, typically used hotfixes and content that Blizzard wants to hide from dataminers until the last minute. Recently the file structures of DB2 files and ADB files have diverged greatly.

Table content structures

This page describes the structure of DB2 files. For a list of existing DB2 files and their contents see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria, Warlords of Draenor and Legion. If you add documentation for a file, please add the correct categories (also the build number) as well.

Field Types

WDB2 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)

Additionally, WDB3 added the following possible field type:

16-bit Integers*

Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.

As an addendum to the above paragraph, in the binary's DBCMeta for WDB5-ready Legion clients, there is a flag for each field that can designate 'do not sign-extend when decompressing'. However, there are fields which obviously contain unsigned data that are not marked with this flag, so it is not exhaustive.

Traditionally, 64-bit integers have only made an appearance in a very tiny number of DB2s (usually 1 DB2 at any given time) and only for short periods of time. This meant that 64-bit integer support was not necessary. However, the addition of Allied Races in Legion's Patch 7.3.5 Build 25600 caused this to change. There were no longer enough bits for 'RaceMask', a field used in half a dozen different DB2s, and the field was expanded to be a 64-bit integer.

Determining Field Types

In WDB2, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2 format with minimal effort.

In WDB3 and WDB4, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBMeta structure. In that structure you will find the field types for all fields.

In the newer formats such as WDB5 and WDB6, determining field types on the fly is much more doable. Any field being compressed cannot be a string or a float as they are always 4 bytes, meaning you can just assume they are integers. You can use the 'field structure block' present in WDB5+ in order to determine the size of the fields and the distance between the offsets in the field structure block in order to determine array length for all fields except the last one (the last field will be padded out to 'record_size').

String Block

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

Since Cataclysm, all DB files contain only localized string values, including DBCs.

The rest of the string block is equivalent to DBC version. See documentation there.

WDB2

This section only applies to versions Cata … < Legion (7.0.1.20740).

This file format was introduced in Cataclysm. It was phased out in favor of WDB3 in Legion (Patch 7.0.1 build 20740).

Structure

See ADB#WCH3 for how to adapt this structure for its .ADB counterpart.

struct db2_header
{
  uint32_t magic;                                               // 'WDB2'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB2
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;                                     // always zero in WDB2 (?) - see WDB3 for information on how to parse this
};

template<typename record_type>
struct db2_file
{
  db2_header header;

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];*
  char string_table[header.string_table_size];
};
  • Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one. 'record_size' should account for this (it would be '8' in the example given in the previous sentence).

WDB3

This section only applies to versions Legion (7.0.1.20740) … < Legion (7.0.1.20810).

This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.

The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.

It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.

Structure

See ADB#WCH4 for how to adapt this structure for its .ADB counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offset_map_entry
  {
    uint32_t offset;                                             // This offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
    uint16_t length;                                             // This is the length of the record located at the specified offset
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];*
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  uint32_t IDs[header.record_count];*
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string block usually still exists, just as size 2 with two blank entries.

  • This part of the structure is optional.

WDB4

This section only applies to versions Legion (7.0.1.20810) … < Legion (7.0.3.21414).

This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today. It was partially phased out in favor of WDB5 just a few months later, also in Legion (Patch 7.0.3 build 21414). As such, support for WDB4 is likely important for almost nobody.

The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table. Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.

The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.

Structure

See ADB#WCH5 for how to adapt this structure for its .ADB counterpart.

struct wdb4_db2_header
{
  uint32_t magic;                                               // 'WDB4'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB4
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint32_t flags;                                               // in WDB3, this field was in the WoW executable's DBCMeta instead; possible values are listed in Known Flag Meanings
};

template<typename record_type>
struct wdb4_file
{
  wdb4_db2_header header;
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                           // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                           // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB5

This section only applies to versions Legion (7.0.3.21479) … < Legion (7.2.0.23436).

This file format was introduced in Legion (Patch 7.0.3 build 21479) and was replaced by WDB6 later in Legion (Patch 7.2.0 build 23436). There have been a variety of ADB formats used simultaneously with WDB5, including WCH5, WCH6, WCH7, and WCH8.

There are very significant changes to the format in WDB5 which will require substantial effort for existing tools to support. The main changes are that the field_count now counts arrays as '1' (mirroring very old ADB behavior), a field_structure block has been added directly after the header, and the addition of compressed fields (and 24-bit integers). The introduction of the field_structure block, in particular, is hugely positive for us and should improve the accuracy of DB2 parsers or at least reduce their dependence on the WoW binary's DBCMeta. Note that the WoW binary's DBCMeta will often disagree with the 'field_structure' (DBCMeta might say 'int32' but the field_structure block says the size is '3-bytes'). In those cases, the field_structure block takes priority. Additionally, the DBCMeta is still relevant for parsing ADBs, as they do not support compression.

The header has lost a field compared to WDB4, timestamp_last_written. This field was useless in DB2s and always 0, so its removal is understandable.

In build 21737, a few builds after the introduction of WDB5, more changes to the header were made. The 'flags' header field was split into two shorts - 'flags' and 'id_index'. 'id_index' is very valuable since the inline ID fields no longer have to be the first field in WDB5; they can appear anywhere in the record. This index lets you know which field is ID, so you can, for example, move it to the beginning of the record for the ease of the viewer. Additionally, the 'build' field was changed to be 'layout_hash'. This hash is unique to the specific column layout, including (at least) position, size, and column name (we don't have most of these names, but we can tell this is true based on layout_hash changes that shouldn't have happened otherwise). It is worth noting that array size is actually not unique per hash. In some cases, the size of an array can change without the hash changing. This replacement for BuildID is often beneficial to us because when layout_hash changes, you know the structure changed (usually - there have been some instances of recalculations).

Structure

See ADB#WCH5 for how to adapt this structure for its .ADB counterpart.
struct wdb5_db2_header
{
  uint32_t magic;                                               // 'WDB5' for .db2 (database)
  uint32_t record_count;
  uint32_t field_count;                                         // for the first time, this counts arrays as '1'; in the past, only the ADB variants have counted arrays as 1 field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but after build 21737, this is a new hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
};

template<typename record_type>
struct wdb5_file
{
  wdb5_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x02 != 0)
  {
      uint32_t relationship_IDs[header.max_id - header.min_id + 1]; // only ever used by wmominimaptexture.db2 to record WMOIDs
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB6

This section only applies to versions Legion (7.2.0.23436) … < Legion (7.3.5.25600).

This file format was introduced in Legion (Patch 7.2.0 build 23436) and was replaced by WDC1 later in Legion (Patch 7.3.5 build 25600).

Two new header fields and a new data block were added in WDB6. Both of the header fields relate to the new block, which we have named 'common_data_table'. Its purpose is to drastically reduce db2 filesize and memory footprint (affected db2s have had their filesizes halved) by letting the client assume that many columns are always 0, unless there is an entry in the proper 'common_data_table' that is mapped to the ID of the row in question. For example, in build 23436, only 10 columns are in the 'normal' row data section for SpellEffect.db2. However, the 'common_data_table' supports up to 26 columns (designated by the new header field named 'total_field_count'). To find the value for one of the latter 16 columns in SpellEffect.db2, you look up the column in the 'common_data_table' and then the ID in the 'common_data_map' for that column. If there is an entry, use the value paired with the ID. If there is not an entry, the value 'defaults'.

Default values are stored in the WoW binary, in DBMeta. It's worth noting that almost every field's default value is '0', with a few exceptions. For example, the 'Alpha' byte field in CreatureDisplayInfo defaults to 255, and a couple of floats in SpellEffect.db2 default to 1.

Neither strings or arrays are supported in the 'common_data_table'.

Starting from Patch 7.3.0 Build 24473, values in the 'common_data_table' are always padded out to 4 bytes. Detecting this change is very annoying as there were no other accompanying changes. If you wish to support WDB6 both before and after this build, you will need to attempt to navigate the common_data_table once without padding. Compare the distance you just navigated against the common_data_table_size field from the header. If they match, it is either not padded (pre-7.3) or does not matter (because all of the common data fields are 4 bytes). If it does not match, then it is padded (post-7.3). After determining this, you can then parse the table again with the knowledge that you are doing it correctly.

It is worth a minor mention here that from WDB6 onwards, standalone ADB files were discarded in favor of 'ADB#DBCache.bin', a new format that does not mirror DB2 structure at all.

Structure

struct wdb6_db2_header
{
  uint32_t magic;                                               // 'WDB6'
  uint32_t record_count;
  uint32_t field_count;                                         // this counts arrays as '1' field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but now this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;                                   // new in WDB6, includes columns only expressed in the 'common_data_table', unlike field_count
  uint32_t common_data_table_size;                              // new in WDB6, size of new block called 'common_data_table'
};

template<typename record_type>
struct wdb6_file
{
  wdb6_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x02 != 0)
  {
      uint32_t relationship_IDs[header.max_id - header.min_id + 1]; // only ever used by wmominimaptexture.db2 to record WMOIDs
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
  if (header.common_data_table_size > 0)
  {
    uint32_t num_columns_in_table;
    struct common_data_map_entry
    {
      uint32_t id;
      uint32_t value;                                            // Calling this 'uint32_t' is an oversimplification - the size of this field depends on the 'type' from the enum 
                                                                 // (From Patch 7.3.0 Build 24473 onwards, this is no longer true. Values are always padded out to 4 bytes.)
    };
    struct common_data_table_entry
    {
      uint32_t count;
      uint8_t type;                                              // New enum: string = 0, short = 1, byte = 2, float = 3, int = 4 (int64 = 5??)
      common_data_map_entry common_data_map[count];
    };
    common_data_table_entry common_data_table[num_columns_in_table];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDC1

This section only applies to versions Legion (7.3.5.25600) … Battle (8.0.1.26231).

This file format was introduced in Legion (Patch 7.3.5 build 25600) and was replaced by WDC2 in Battle for Azeroth (Patch 8.0.1 Build 26231).

WDC1 is an expanded version of WDB6. If you've been following along with the format changes in Legion thus far, you will notice that the theme has been saving space. Most recently, in WDB6, a 'common_data_block' was added in order to handle columns that are usually always one value but rarely are some other value.

WDC1 takes this idea and adds in even more types of space-saving techniques. It adds in bitpacking (for fields who would be best suited to a field size that is not a multiple of 8), indexed values (for fields with values that are commonly always one of a small number of values but who do not have a common 'default' amongst those values), and indexed array values (for array fields where all fields in an array are commonly linked together, for example, with common spell flag values). It also keeps the 'common_data_block' technique from WDB6, but it streamlines the implementation of it, removing the need for a separate header block for 'common_data'. This is done by adding in a new block we call 'field_storage_info'.

'field_storage_info' contains the size in bits and the offset in bits of every column, along with the size of the column's data in the different data blocks (currently the data blocks are 'pallet_data' and 'common_data'). The structure below explains most of this. Compression type '0' is normal in-field data. Compression types '1', '3', and '4' begin as bitpacked data in the record. For compression type '1', you can stop there, as that is the proper value. For compression types '3' and '4', you need to take the bitpacked value you obtained from the record and use it as an index into 'pallet_data'. For compression type '3', you pull a 4-byte value from 'pallet_data' using the formula 'additional_data_offset + (index * 4)', where 'additional_data_offset' is the sum of 'additional_data_size' for every column before the current one. Compression type '4' is very similar, except that it is always an array. Array values are bundled together for compression type '4' - that single index will give you 4-byte values for every single piece of the array at once. You can retrieve these values by using the formula 'additional_data_offset + (index * 4 * array_count) + (iter * 4)', where 'iter' is the 0-based array piece you wish to retrieve (you should iterate over all of them to retrieve all of the values for the array). Lastly, compression type '2' is just WDB6's old 'common_data_block' system; it is the only compression type where no data at all is read from the record. Just apply the 'default_value' (also provided by 'field_storage_info') when the ID for the record is not present in 'common_data'.

The last major quirk of WDC1 is the 'relationship_map' system. In WDC1, any DB2 that had a primary key which was also a foreign key (in Blizzard's database system, I guess) was removed from the record and placed into this new structure. This is probably for increased lookup speed by the game executable. So, for example, almost all Spell*.db2 tables (besides Spell.db2 because it wouldn't be a foreign key there) had their SpellID columns removed and this new block added in their place. The easiest way to handle this block is just to add a new 'fake' column to the end of the DB2 in question and populate it with the values from the relationship map.

Structure

struct wdc1_db2_header
{
  uint32_t magic;                  // 'WDC1'
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;           // List of ids present in the DB file
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t relationship_data_size;
};

struct field_structure
{
  uint16_t size;
  uint16_t offset;
};

wdc1_db2_header header;
field_structure fields[header.total_field_count];
if ((header.flags & 1) == 0) {
  // Normal records
  struct record_data
  {
    char data[header.record_size];
  };
  record_data records[header.record_count];
  char string_data[header.string_table_size];
} else {
  // Offset map records -- these records have null-terminated strings inlined, and
  // since they are variable-length, they are pointed to by an array of 6-byte
  // offset+size pairs.
  char variable_record_data[header.offset_map_offset - sizeof(wdc1_db2_header) - (sizeof(field_structure) * header.total_field_count)];
  struct offset_map_entry
  {
    uint32_t offset;
    uint16_t size;
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];
}

enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};

uint32_t id_list[header.id_list_size / 4];
if (header.copy_table_size > 0) {
  struct copy_table_entry
  {
    uint32_t id_of_new_row;
    uint32_t id_of_copied_row;
  };
  copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
}
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];
if (header.relationship_data_size > 0) {
  // In some tables, this relationship mapping replaced columns that were used
  // only as a lookup, such as the SpellID in SpellX* tables.
  struct relationship_entry
  {
    // This is the id of the foreign key for the record, e.g. SpellID in
    // SpellX* tables.
    uint32_t foreign_id;
    // This is the index of the record in record_data.  Note that this is
    // *not* the record's own ID.
    uint32_t record_index;
  };
  struct relationship_mapping
  {
    uint32_t            num_entries;
    uint32_t            min_id;
    uint32_t            max_id;
    relationship_entry  entries[num_entries];
  };
  relationship_mapping relationship_map;
}

Further Quirks

The 'field structure' section introduced in WDB5 still exists here, but 'field storage info' has all of the same information and more. You may wonder - why should I even bother reading field structure now if I only care about WDC1 support? There is one good reason - array sizes. In WDC1, the field structure section reports the size of the individual field piece in the file (for compression type '0') in bytes. However, the field storage info section reports the size of the entire field in bits. The difference in these values is important if an array is the field in question. For example, in field structure, an array of uint32[3] would be reported as size '4', while in field storage info, it would be reported as '96'. This technicality greatly benefits us as a trivial bit of math allows you to instantly deduce array sizes when reading the file: 'ArrayLength = (SizeReportedByFieldStorageInfo / 8) / SizeReportedByFieldStructure'.

There are several fields in the structure with no apparent purpose, but I believe these are all related. 'bitpacked_data_offset' in the header and 'bitpacking_offset_bits' and 'bitpacking_size_bits' from some variants of 'field_storage_info' (the bitpacked ones) are the fields I am referring to. I believe these three values are used to read from the record starting at the position of the bitpacked values. Our best guess is that Blizzard needs this for some purpose, possibly to integrate with their existing tools.

On the note of relationship maps, the aspect that makes them tricky (the parsing is actually very simple) is that sometimes the column in question remains in DB2 data in addition to being in a relationship map. For example, Achievement.db2's CriteriaTreeID field was not removed, but it had a relationship map added anyway. Cases like this are difficult to handle properly. You could just allow the duplicate data to be added as a fake column to the end, as there may not be any harm, especially if you are expecting it. Alternatively, you could parse the game executable's DBCMeta - the fields removed by relationship maps are still present there. If there is a mismatch between the number of fields reported by DBCMeta and the number of fields reported by the DB2, then you know that the relationship map data is unique and needs to be restored. If the number of fields match, then the data in the relationship map already exists in the record and is superfluous.

WDC2

This section only applies to versions ≥ Battle (8.0.1.26231).

This file format was introduced in Battle for Azeroth (Patch 8.0.1 Build 26231) and is currently in use today.

WDC2 introduces a new system of 'sections' of data. Three chunks of the structure ('field_storage_info', 'pallet_data', and 'common_data') were moved up in front of the records / 'sections'. Everything after that was broken into a new form of 'sections' which can potentially occur many times in one file. Note that implementing this system right now appears to be entirely optional as no DB2 file as of the current build has a 'section_count' higher than 1. Also, it appears that the 'offset_map' format (in use for a small handful of DB2s) has a hardcoded limit of 1 section.

The other major change is to strings and how string offsets are calculated. For the history of WoW up until WDC2, strings in records were represented as 'the relative position of the referenced string to the start of the string block'. In WDC2, strings in records are represented as 'the relative position from the beginning of the field where this offset was stored to the position of the referenced string in the string block'. This complicates parsing the string block directly and storing relative locations, but it does slightly simplify reading records.

If before WDC2 you dealt with strings by reading the string block and assigning relative positions to every string, then you will need to either change your approach (to having a constantly-accessible 1:1 representation of the string table available) or mitigate the damage from the change. One method of mitigation is: when reading the values of the string offsets in the records, replace the data at that point in time with 'old-style' string offsets.

For example, using code like this when reading the field in question works:

row[field].LongVal = reader.ReadVarInt(Fields[field].Size);
if (IsWDC2() && row[field].Column.Type == ColumnType.String)
{
    row[field].LongVal = row[field].LongVal + reader.BaseStream.Position - Fields[field].Size - reader.StringBlockLocation;
}

This sample code reads the field containing the string offset in the record, then recalculates the value it just read by adding in the current position and subtracting the size of the field it just read (to get the absolute position of the referenced string) and then by subtracting the absolute position of the start of the string block (which yields the relative position of the referenced string to the start of the string block). This manipulation ensures that a program converts every string offset back to the pre-WDC2 style.

Structure

struct wdc2_db2_header
{
  uint32_t magic;                  // 'WDC2'
  uint32_t record_count;           // this is for all sections combined now
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;      // this is for all sections combined now
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t section_count;          // new to WDC2, this is number of sections of data (records + copy table + id list + relationship map = a section)
};
wdc2_db2_header header;

struct wdc2_section_header
{
  uint32_t wdc2_unk_header1;       // always 0 in Battle (8.0.1.26231) and unnamed in client binary
  uint32_t wdc2_unk_header2;       // always 0 in Battle (8.0.1.26231) and unnamed in client binary
  uint32_t file_offset;            // absolute position to the beginning of the section
  uint32_t record_count;           // 'record_count' for the section
  uint32_t string_table_size;      // 'string_table_size' for the section
  uint32_t copy_table_size;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;           // List of ids present in the DB file
  uint32_t relationship_data_size;
};
wdc2_section_header section_headers[section_count];

struct field_structure
{
  uint16_t size;
  uint16_t offset;
};
field_structure fields[header.total_field_count];

enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
  // Same as field_compression_bitpacked
  field_compression_bitpacked_signed,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_signed_immediate:
    case field_compression.field_compression_bitpacked:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
                      // <[TOM_RUS]> last one makes no sense - 6/7/18 (for signed_immediate)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];

struct section
{
  if ((header.flags & 1) == 0) {
    // Normal records
    struct record_data
    {
      char data[header.record_size];
    };
    record_data records[section_headers.record_count];
    char string_data[section_headers.string_table_size];
  } else {
    // Offset map records -- these records have null-terminated strings inlined, and
    // since they are variable-length, they are pointed to by an array of 6-byte
    // offset+size pairs.
    char variable_record_data[section_headers.offset_map_offset - section_headers.file_offset];
    struct offset_map_entry
    {
      uint32_t offset;
      uint16_t size;
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  
  uint32_t id_list[section_headers.id_list_size / 4];
  if (section_headers.copy_table_size > 0) {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[section_headers.copy_table_size / sizeof(copy_table_entry)];
  }
  if (section_headers.relationship_data_size > 0) {
    // In some tables, this relationship mapping replaced columns that were used
    // only as a lookup, such as the SpellID in SpellX* tables.
    struct relationship_entry
    {
      // This is the id of the foreign key for the record, e.g. SpellID in
      // SpellX* tables.
      uint32_t foreign_id;
      // This is the index of the record in record_data.  Note that this is
      // *not* the record's own ID.
      uint32_t record_index;
    };
    struct relationship_mapping
    {
      uint32_t            num_entries;
      uint32_t            min_id;
      uint32_t            max_id;
      relationship_entry  entries[num_entries];
    };
    relationship_mapping relationship_map;
  }
};
section data_sections[section_count];

Known Flag Meanings for WDB4+

flags & 0x01 = 'Has offset map'
flags & 0x02 = 'Has relationship data'
flags & 0x04 = 'Has non-inline IDs'
flags & 0x10 = 'Is bitpacked' // WDC1+