DB2: Difference between revisions

From wowdev
Jump to navigation Jump to search
(→‎WDC2: Updated with newly discovered information about data sections and the new header.)
(19 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Template:SectionBox/VersionRange|min_expansionlevel=4}}
{{Template:SectionBox/VersionRange|min_expansionlevel=4}}


DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. They are similar in many ways to [[DBC|DBC files]], so you may want to also look at those. They both have headers with similar data, records containing fields of various types with various data, and a string block where 4-byte string references are made. The difference is that DB2 files have become wildly more complex than their simple DBC brethren, especially in recent times. The structure described here is also used in [[ADB|ADB files]], which are a cache of dynamically streamed database entries, typically used hotfixes and content that Blizzard wants to hide from dataminers until the last minute.
DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. They are similar in many ways to [[DBC|DBC files]], so you may want to also look at those. They both have headers with similar data, records containing fields of various types with various data, and a string block where 4-byte string references are made. The difference is that DB2 files have become wildly more complex than their simple DBC brethren, especially in recent times. The structure described here was also used in [[ADB|ADB files]] for many years, which are a cache of dynamically streamed database entries, typically used hotfixes and content that Blizzard wants to hide from dataminers until the last minute. Recently the file structures of DB2 files and ADB files have diverged greatly.


=Table content structures=
=Table content structures=
This page describes the structure of [[DB2]] and ADB files. For a list of existing DB2 files and their contents see the categories [[:Category:DBC|DBC]], [[:Category:DBC_Vanilla|Vanilla]], [[:Category:DBC_BC|Burning Crusade]], [[:Category:DBC_WotLK|Wrath of the Lich King]], [[:Category:DBC_Cataclysm|Cataclysm]], [[:Category:DBC_MoP|Mists of Pandaria]] and [[:Category:DBC_WoD|Warlords of Draenor]]. If you add documentation for a file, please add the correct categories (also the build number) as well.
This page describes the structure of [[DB2]] files. For a list of existing DB2 files and their contents see the categories [[:Category:DBC|DBC]], [[:Category:DBC_Vanilla|Vanilla]], [[:Category:DBC_BC|Burning Crusade]], [[:Category:DBC_WotLK|Wrath of the Lich King]], [[:Category:DBC_Cataclysm|Cataclysm]], [[:Category:DBC_MoP|Mists of Pandaria]] and [[:Category:DBC_WoD|Warlords of Draenor]]. If you add documentation for a file, please add the correct categories (also the build number) as well.


=Field Types=
=Field Types=
WDB2 / WCH3 began with the following possible field types:
WDB2 began with the following possible field types:
  64-bit Integers*
  64-bit Integers*
  32-bit Integers*
  32-bit Integers*
Line 14: Line 14:
  Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)
  Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)


Additionally, WDB3 / WCH4 added the following possible field type:
Additionally, WDB3 added the following possible field type:
  16-bit Integers*
  16-bit Integers*
Most recently, WDB5 added the following possible field type:
24-bit Integers


Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.
Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.
Line 24: Line 21:
As an addendum to the above paragraph, in the binary's DBCMeta for WDB5-ready Legion clients, there is a flag for each field that can designate 'do not sign-extend when decompressing'. However, there are fields which obviously contain unsigned data that are not marked with this flag, so it is not exhaustive.
As an addendum to the above paragraph, in the binary's DBCMeta for WDB5-ready Legion clients, there is a flag for each field that can designate 'do not sign-extend when decompressing'. However, there are fields which obviously contain unsigned data that are not marked with this flag, so it is not exhaustive.


64-bit integers have only made an appearance in a very tiny number of DB2s (usually 1 DB2 at any given time) and only for short periods of time, so any tool supporting DB2 files can likely put 64-bit integer support on the backburner.
Traditionally, 64-bit integers have only made an appearance in a very tiny number of DB2s (usually 1 DB2 at any given time) and only for short periods of time. This meant that 64-bit integer support was not necessary. However, the addition of Allied Races in Legion's Patch 7.3.5 Build 25600 caused this to change. There were no longer enough bits for 'RaceMask', a field used in half a dozen different DB2s, and the field was expanded to be a 64-bit integer.


==Determining Field Types==
==Determining Field Types==
In WDB2/WCH3, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2/WCH3 format with minimal effort.
In WDB2, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2 format with minimal effort.


In WDB3, WDB4, and WCH4+, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBCs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBCMeta structure. In that structure you will find the field types for all fields.
In WDB3 and WDB4, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBMeta structure. In that structure you will find the field types for all fields.
 
In the newer formats such as WDB5 and WDB6, determining field types on the fly is much more doable. Any field being compressed cannot be a string or a float as they are always 4 bytes, meaning you can just assume they are integers. You can use the 'field structure block' present in WDB5+ in order to determine the size of the fields and the distance between the offsets in the field structure block in order to determine array length for all fields except the last one (the last field will be padded out to 'record_size').


=String Block=
=String Block=
Line 38: Line 37:
The rest of the string block is equivalent to [[DBC]] version. See [[DBC#String_Block|documentation there]].
The rest of the string block is equivalent to [[DBC]] version. See [[DBC#String_Block|documentation there]].


=WDB2 (.db2) / WCH3 (.adb)=
=WDB2=
{{Template:SectionBox/VersionRange|max_expansionlevel=6}}
{{Template:SectionBox/VersionRange|min_expansionlevel=4|max_expansionlevel=7|max_build=7.0.1.20740|max_exclusive=1}}
This file format was introduced in Cataclysm. It was phased out in favor of WDB3/WCH4 in Legion (Patch 7.0.1 build 20740).
This file format was introduced in Cataclysm. It was phased out in favor of WDB3 in Legion (Patch 7.0.1 build 20740).


==Structure==
==Structure==
See [[DB2#WCH3-specific Concerns|WCH3-specific Concerns]] for an issue with the 'field_count' header variable in .ADB files.
See [[ADB#WCH3]] for how to adapt this structure for its .ADB counterpart.
  struct db2_header
  struct db2_header
  {
  {
   uint32_t magic;                                              // 'WDB2' for .db2 (database), 'WCH3' for .adb (cache)
   uint32_t magic;                                              // 'WDB2'
   uint32_t record_count;
   uint32_t record_count;
   uint32_t field_count;                                        // array fields count as the size of array for WDB2 but the entire array only counts as '1' field for WCH3
   uint32_t field_count;                                        // array fields count as the size of array for WDB2
   uint32_t record_size;
   uint32_t record_size;
   uint32_t string_table_size;                                  // string block almost always contains at least one zero-byte
   uint32_t string_table_size;                                  // string block almost always contains at least one zero-byte
Line 64: Line 63:
  {
  {
   db2_header header;
   db2_header header;
  // static_assert (header.record_size == sizeof (record_type));
   
   
   if (header.max_id != 0)
   if (header.max_id != 0)
Line 76: Line 74:
  };
  };


*Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one.
*Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one. 'record_size' should account for this (it would be '8' in the example given in the previous sentence).


==WCH3-specific Concerns==
=WDB3=
Take for example, the following structure (which I made up):
uint32_t ID;
uint32_t ItemID[8];
uint32_t NPCID[2];
 
The field count in the header of the DB2 file will be 11, which you would expect since a record would contain 11 uint32_t values. However, the field count in the header of the counterpart ADB file will be 3. It counts arrays as one field, regardless of how long that 'field' would have to be. This is very likely a serialization bug in WoW's DB2 or ADB writing, but it has gone unchanged for over half a decade, so you need to plan for it if you rely on field count. If you're reading and parsing DBCMeta directly, you don't really have to ever check or care about field count, though. Blizzard certainly doesn't, which is why this bug has gone unnoticed.*
 
* WDB5 actually addressed this problem many years later. This was because Blizzard actually started using field count in parsing DB2s, so they finally noticed and fixed the bug.
 
=WDB3 (.db2) / WCH4 (.adb)=
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.1.20740|max_expansionlevel=7|max_build=7.0.1.20810|max_exclusive=1}}
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.1.20740|max_expansionlevel=7|max_build=7.0.1.20810|max_exclusive=1}}
The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.
This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.


The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.
The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.


It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always (?) non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.
It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.


==Structure for WDB3==
==Structure==
See [[DB2#WCH4-specific Concerns|WCH4-specific Concerns]] for how to adapt this structure for the .ADB counterpart.
See [[ADB#WCH4]] for how to adapt this structure for its .ADB counterpart.
  template<typename record_type>
  template<typename record_type>
  struct wdb3_file
  struct wdb3_file
Line 104: Line 92:
   struct offset_map_entry
   struct offset_map_entry
   {
   {
     uint32_t offset;                                            // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
     uint32_t offset;                                            // This offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
     uint16_t length;                                            // this is the length of the record located at the specified offset
     uint16_t length;                                            // This is the length of the record located at the specified offset
   };
   };
   offset_map_entry offset_map[header.max_id - header.min_id + 1];*
   offset_map_entry offset_map[header.max_id - header.min_id + 1];*
Line 118: Line 106:
       uint32_t id_of_copied_row;
       uint32_t id_of_copied_row;
     };
     };
     copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
     copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
   }
   }
  };
  };
Line 126: Line 114:
*This part of the structure is optional.
*This part of the structure is optional.


==WCH4-specific Concerns==
=WDB4=
In .adb files with the WCH4 signature, the issue with field_count described above in [[DB2#WCH3-specific Concerns|WCH3-specific Concerns]] still exists. Additionally, the optional offset_map structure changes. Specifically, this part defined in the structure above is changed:
struct offset_map_entry
{
  uint32_t ID;
  uint32_t offset;
  uint16_t length;
};
 
Note that, in most cases, the non-inline IDs structure still exists, even if it is made completely redundant by the existence of the offset map. After all, why do work once when you could do it twice?
 
=WDB4 (.db2) / WCH5/WCH6 (.adb)=
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.1.20810|max_expansionlevel=7|max_build=7.0.3.21414|max_exclusive=1}}
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.1.20810|max_expansionlevel=7|max_build=7.0.3.21414|max_exclusive=1}}
The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today. It was partially phased out in favor of WDB5 just a few months later, also in Legion (Patch 7.0.3 build 21414). As such, support for WDB4 and WCH5/WCH6 is likely important for almost nobody.
This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today. It was partially phased out in favor of WDB5 just a few months later, also in Legion (Patch 7.0.3 build 21414). As such, support for WDB4 is likely important for almost nobody.


The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table (this change does not apply to WCH5 - see [[DB2#WCH5-specific Concerns|WCH5-specific Concerns]]). Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.
The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table. Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.


The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.
The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.


==Structure for WDB4==
==Structure==
See [[DB2#WCH5-specific Concerns|WCH5-specific Concerns]] for how to adapt this structure for the .ADB counterpart.
See [[ADB#WCH5]] for how to adapt this structure for its .ADB counterpart.
  struct wdb4_db2_header
  struct wdb4_db2_header
  {
  {
   uint32_t magic;                                              // 'WDB4' for .db2 (database), 'WCH5' for .adb (cache)
   uint32_t magic;                                              // 'WDB4'
   uint32_t record_count;
   uint32_t record_count;
   uint32_t field_count;                                        // array fields count as the size of array for WDB4 but the entire array only counts as '1' field for WCH5
   uint32_t field_count;                                        // array fields count as the size of array for WDB4
   uint32_t record_size;
   uint32_t record_size;
   uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
   uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
Line 161: Line 138:
   uint32_t locale;                                              // as seen in [[Loc|TextWowEnum]]
   uint32_t locale;                                              // as seen in [[Loc|TextWowEnum]]
   uint32_t copy_table_size;
   uint32_t copy_table_size;
   uint32_t flags;                                              // in WDB3/WCH4, this field was in the WoW executable's DBCMeta; possible values are listed in [[DB2#Known Flag Meanings|Known Flag Meanings]]
   uint32_t flags;                                              // in WDB3, this field was in the WoW executable's DBCMeta instead; possible values are listed in [[DB2#Known Flag Meanings|Known Flag Meanings]]
  };
  };
   
   
Line 190: Line 167:
       uint32_t id_of_copied_row;
       uint32_t id_of_copied_row;
     };
     };
     copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
     copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
   }
   }
  };
  };
Line 196: Line 173:
Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.
Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.


==WCH5/WCH6-specific Concerns==
=WDB5=
struct wch5_adb_pre_21737_header
{
  uint32_t magic;                                              // 'WCH5' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;                                        // array fields count as the size of array for WDB4 but the entire array only counts as '1' field for WCH5
  uint32_t record_size;
  uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
};
 
The issue with field counts from WCH3 remains in effect ([[DB2#WCH3-specific Concerns|WCH3-specific Concerns]]), and the offset map changes from WCH4 also remain in effect ([[DB2#WCH4-specific Concerns|WCH4-specific Concerns]]). Note that the offset map changes from WCH4 (to include an 'ID' field for each entry) override the offset map changes for WDB4 (to move the offset map after the string table), so the offset map's position remains unchanged compared to WCH4. Additionally, WCH5 files are missing two header fields compared to its WDB4 counterpart (the last two fields - copy_table_size and flags).
 
Since the deduplication technology we call 'copy_table' is not employed in any of the .ADB formats, copy_table_size variable was always zero anyway. As for flags, Blizzard always loads the .DB2 file before loading the .ADB file, so the flags field being in both files is just redundant (for them); any .ADB file will be read using the flags field from the .DB2 file of the same name. This is inconvenient for us, as most (if not all) of us load .ADB files standalone at the moment, but there are workarounds (such as having a configuration file to remember the last used flags value for each named .DB2 file).
 
struct wch5_adb_post_21737_header
{
  uint32_t magic;                                              // 'WCH5' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;                                        // array fields count as the size of array for WDB4 but the entire array only counts as '1' field for WCH5
  uint32_t record_size;
  uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                        // from build 21737 onward, this field is present
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
};
 
After Legion build 21737, an additional header field was added (right before 'build') called 'layout_hash'. This field's appearance is similar to the change to the WDB5 header made in the same build, except in that case, the 'build' field was removed when the 'layout_hash' field was added. For the post-21737 WCH5 header, both fields are present.
 
WCH5 was phased out in favor of in Legion (Patch 7.0.3 build 22345). However, literally nothing changed in WCH6. The introduction of this incremented signature was probably a temporary countermeasure to combat a problem Blizzard was having with 'build' not being updated successfully. This would leave ADBs around basically forever (since they are only wiped when current build is greater than the one in the ADBs being loaded), which could cause conflicts if you downloaded an updated client with new DB2s.
 
=WDB5 (.db2)=
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.3.21479|max_expansionlevel=7|max_build=7.2.0.23436|max_exclusive=1}}
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.3.21479|max_expansionlevel=7|max_build=7.2.0.23436|max_exclusive=1}}
The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.3 build 21479) and was replaced by WDB6 later in Legion (Patch 7.2.0 build 23436). There have been a variety of ADB formats used simultaneously with WDB5, including WCH5, WCH6, WCH7, and WCH8.
This file format was introduced in Legion (Patch 7.0.3 build 21479) and was replaced by WDB6 later in Legion (Patch 7.2.0 build 23436). There have been a variety of ADB formats used simultaneously with WDB5, including WCH5, WCH6, WCH7, and WCH8.


There are very significant changes to the format in WDB5 which will require substantial effort for existing tools to support. The main changes are that the field_count now counts arrays as '1' (mirroring very old ADB behavior), a field_structure block has been added directly after the header, and the addition of compressed fields (and 24-bit integers). The introduction of the field_structure block, in particular, is hugely positive for us and should improve the accuracy of DB2 parsers or at least reduce their dependence on the WoW binary's DBCMeta. Note that the WoW binary's DBCMeta will often disagree with the 'field_structure' (DBCMeta might say 'int32' but the field_structure block says the size is '3-bytes'). In those cases, the field_structure block takes priority. Additionally, the DBCMeta is still relevant for parsing ADBs, as they do not support compression.
There are very significant changes to the format in WDB5 which will require substantial effort for existing tools to support. The main changes are that the field_count now counts arrays as '1' (mirroring very old ADB behavior), a field_structure block has been added directly after the header, and the addition of compressed fields (and 24-bit integers). The introduction of the field_structure block, in particular, is hugely positive for us and should improve the accuracy of DB2 parsers or at least reduce their dependence on the WoW binary's DBCMeta. Note that the WoW binary's DBCMeta will often disagree with the 'field_structure' (DBCMeta might say 'int32' but the field_structure block says the size is '3-bytes'). In those cases, the field_structure block takes priority. Additionally, the DBCMeta is still relevant for parsing ADBs, as they do not support compression.
Line 244: Line 181:
The header has lost a field compared to WDB4, timestamp_last_written. This field was useless in DB2s and always 0, so its removal is understandable.
The header has lost a field compared to WDB4, timestamp_last_written. This field was useless in DB2s and always 0, so its removal is understandable.


In build 21737, a few builds after the introduction of WDB5, more changes to the header were made. The 'flags' header field was split into two shorts - 'flags' and 'id_index'. 'id_index' is very valuable since the inline ID fields no longer have to be the first field in WDB5; they can appear anywhere in the record. This index lets you know which field is ID, so you can, for example, move it to the beginning of the record for the ease of the viewer. Additionally, the 'build' field was changed to be 'layout_hash'. This hash is unique to the specific column layout, including (at least) position and size (possibly names and array size as well). This replacement is actually very beneficial to us because when layout_hash changes, you know the structure changed.
In build 21737, a few builds after the introduction of WDB5, more changes to the header were made. The 'flags' header field was split into two shorts - 'flags' and 'id_index'. 'id_index' is very valuable since the inline ID fields no longer have to be the first field in WDB5; they can appear anywhere in the record. This index lets you know which field is ID, so you can, for example, move it to the beginning of the record for the ease of the viewer. Additionally, the 'build' field was changed to be 'layout_hash'. This hash is unique to the specific column layout, including (at least) position, size, and column name (we don't have most of these names, but we can tell this is true based on layout_hash changes that shouldn't have happened otherwise). It is worth noting that array size is actually not unique per hash. In some cases, the size of an array can change without the hash changing. This replacement for BuildID is often beneficial to us because when layout_hash changes, you know the structure changed (usually - there have been some instances of recalculations).


==Structure for WDB5==
==Structure==
See [[ADB#WCH5]] for how to adapt this structure for its .ADB counterpart.
  struct wdb5_db2_header
  struct wdb5_db2_header
  {
  {
   uint32_t magic;                                              // 'WDB5' for .db2 (database)
   uint32_t magic;                                              // 'WDB5' for .db2 (database)
   uint32_t record_count;
   uint32_t record_count;
   uint32_t field_count;                                        // for the first time, this counts arrays as '1'; in the past, only the WCH* variants have counted arrays as 1 field
   uint32_t field_count;                                        // for the first time, this counts arrays as '1'; in the past, only the ADB variants have counted arrays as 1 field
   uint32_t record_size;
   uint32_t record_size;
   uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
   uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
Line 260: Line 198:
   uint32_t locale;                                              // as seen in [[Loc|TextWowEnum]]
   uint32_t locale;                                              // as seen in [[Loc|TextWowEnum]]
   uint32_t copy_table_size;
   uint32_t copy_table_size;
   uint16_t flags;                                              // in WDB3/WCH4, this field was in the WoW executable's DBCMeta; possible values are listed in [[DB2#Known Flag Meanings|Known Flag Meanings]]
   uint16_t flags;                                              // possible values are listed in [[DB2#Known Flag Meanings|Known Flag Meanings]]
   uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
   uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  };
  };
Line 296: Line 234:
       uint32_t id_of_copied_row;
       uint32_t id_of_copied_row;
     };
     };
     copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
     copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
   }
   }
  };
  };
Line 302: Line 240:
Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.
Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.


=WCH7 (.adb)=
=WDB6=
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.3.22451}}
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.2.0.23436|max_expansionlevel=7|max_build=7.3.5.25600|max_exclusive=1}}
The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.3 build 22451). It was used alongside WDB5.
This file format was introduced in Legion (Patch 7.2.0 build 23436) and was replaced by WDC1 later in Legion (Patch 7.3.5 build 25600).


A new header field has been added between 'record_count' and 'field_count'. This field designates the size of a new table which is located between the string table and the non-inline IDs table. All records are padded to 'record_size'.
Two new header fields and a new data block were added in WDB6. Both of the header fields relate to the new block, which we have named 'common_data_table'. Its purpose is to drastically reduce db2 filesize and memory footprint (affected db2s have had their filesizes halved) by letting the client assume that many columns are always 0, unless there is an entry in the proper 'common_data_table' that is mapped to the ID of the row in question. For example, in build 23436, only 10 columns are in the 'normal' row data section for SpellEffect.db2. However, the 'common_data_table' supports up to 26 columns (designated by the new header field named 'total_field_count'). To find the value for one of the latter 16 columns in SpellEffect.db2, you look up the column in the 'common_data_table' and then the ID in the 'common_data_map' for that column. If there is an entry, use the value paired with the ID. If there is not an entry, the value 'defaults'.


=WCH8 (.adb)=
Default values are stored in the WoW binary, in DBMeta. It's worth noting that almost every field's default value is '0', with a few exceptions. For example, the 'Alpha' byte field in CreatureDisplayInfo defaults to 255, and a couple of floats in SpellEffect.db2 default to 1.
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.0.3.22484}}
The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.3 build 22484) and is still in use today. It is used alongside WDB5.


The only difference to WCH7 is the padding of files with an offset map. These are padded as per the WCH5 specification again; presumably as this was unnecessarily changed in WCH7.
Neither strings or arrays are supported in the 'common_data_table'.


=WDB6 (.db2)=
Starting from Patch 7.3.0 Build 24473, values in the 'common_data_table' are always padded out to 4 bytes. Detecting this change is very annoying as there were no other accompanying changes. If you wish to support WDB6 both before and after this build, you will need to attempt to navigate the common_data_table once without padding. Compare the distance you just navigated against the common_data_table_size field from the header. If they match, it is either not padded (pre-7.3) or does not matter (because all of the common data fields are 4 bytes). If it does not match, then it is padded (post-7.3). After determining this, you can then parse the table again with the knowledge that you are doing it correctly.
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.2.0.23436}}
The file extension remains unchanged from the previous format. This file format was introduced in Legion (Patch 7.2.0 build 23436) and is still in use today.


Two new header fields and a new data block were added in WDB6. Both of the header fields relate to the new block, which we have named 'common_data_table'. Its purpose is to drastically reduce db2 filesize and memory footprint (affected db2s have had their filesizes halved) by letting the client assume that many columns are always 0, unless there is an entry in the proper 'common_data_table' that is mapped to the ID of the row in question. For example, in build 23436, only 10 columns are in the 'normal' row data section for SpellEffect.db2. However, the 'common_data_table' supports up to 26 columns (designated by the new header field named 'total_field_count'). To find the value for one of the latter 16 columns in SpellEffect.db2, you look up the column in the 'common_data_table' and then the ID in the 'common_data_map' for that column. If there is an entry, use the value paired with the ID. If there is not an entry, the value 'defaults'.
It is worth a minor mention here that from WDB6 onwards, standalone [[ADB#WCH8|ADB files]] were discarded in favor of '[[ADB#DBCache.bin]]', a new format that does not mirror DB2 structure at all.


Default values are stored in the WoW binary, in DBCMeta. It's worth noting that almost every field's default value is '0', with a few exceptions. For example, the 'Alpha' byte field in CreatureDisplayInfo defaults to 255, and a couple of floats in SpellEffect.db2 default to 1.
==Structure==
 
At this time, we assume that neither strings or arrays are supported in the 'common_data_table', as there were opportunities to optimize some rarely used strings and arrays that were purposefully skipped by Blizzard.
 
Note : starting from 7.3, it seems that common data values are always stored in 4 bytes, whatever their real type. This is at least tru for CreatureDisplayInfo table
 
==Structure for WDB6==
  struct wdb6_db2_header
  struct wdb6_db2_header
  {
  {
   uint32_t magic;                                              // 'WDB6' for .db2 (database)
   uint32_t magic;                                              // 'WDB6'
   uint32_t record_count;
   uint32_t record_count;
   uint32_t field_count;                                        // for the first time, this counts arrays as '1'; in the past, only the WCH* variants have counted arrays as 1 field
   uint32_t field_count;                                        // this counts arrays as '1' field
   uint32_t record_size;
   uint32_t record_size;
   uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
   uint32_t string_table_size;                                  // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
   uint32_t table_hash;
   uint32_t table_hash;
   uint32_t layout_hash;                                        // used to be 'build', but after build 21737, this is a new hash field that changes only when the structure of the data changes
   uint32_t layout_hash;                                        // used to be 'build', but now this is a hash field that changes only when the structure of the data changes
   uint32_t min_id;
   uint32_t min_id;
   uint32_t max_id;
   uint32_t max_id;
   uint32_t locale;                                              // as seen in [[Loc|TextWowEnum]]
   uint32_t locale;                                              // as seen in [[Loc|TextWowEnum]]
   uint32_t copy_table_size;
   uint32_t copy_table_size;
   uint16_t flags;                                              // in WDB3/WCH4, this field was in the WoW executable's DBCMeta; possible values are listed in [[DB2#Known Flag Meanings|Known Flag Meanings]]
   uint16_t flags;                                              // possible values are listed in [[DB2#Known Flag Meanings|Known Flag Meanings]]
   uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
   uint16_t id_index;                                            // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
   uint32_t total_field_count;                                  // new in WDB6, includes columns only expressed in the 'common_data_table', unlike field_count
   uint32_t total_field_count;                                  // new in WDB6, includes columns only expressed in the 'common_data_table', unlike field_count
   uint32_t common_data_table_size;                              // new in WDB6, size of new block called 'common_data_table'
   uint32_t common_data_table_size;                              // new in WDB6, size of new block called 'common_data_table'
Line 378: Line 306:
       uint32_t id_of_copied_row;
       uint32_t id_of_copied_row;
     };
     };
     copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
     copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
   }
   }
   if (header.common_data_table_size > 0)
   if (header.common_data_table_size > 0)
Line 387: Line 315:
       uint32_t id;
       uint32_t id;
       uint32_t value;                                            // Calling this 'uint32_t' is an oversimplification - the size of this field depends on the 'type' from the enum  
       uint32_t value;                                            // Calling this 'uint32_t' is an oversimplification - the size of this field depends on the 'type' from the enum  
                                                                   // (seems not true anymore for 7.3+, values are always stored in 4 bytes - at least in CreatureDisplayInfo table)
                                                                   // (From Patch 7.3.0 Build 24473 onwards, this is no longer true. Values are always padded out to 4 bytes.)
     };
     };
     struct common_data_table_entry
     struct common_data_table_entry
Line 401: Line 329:
Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.
Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.


=Known Flag Meanings for WDB4 / WDB5 / WDB6=
=WDC1=
{{Template:SectionBox/VersionRange|min_expansionlevel=7|min_build=7.3.5.25600|max_expansionlevel=8|max_build=8.0.1.26231}}
This file format was introduced in Legion (Patch 7.3.5 build 25600) and was replaced by WDC2 in Battle for Azeroth (Patch 8.0.1 Build 26231).
 
WDC1 is an expanded version of WDB6. If you've been following along with the format changes in Legion thus far, you will notice that the theme has been saving space. Most recently, in WDB6, a 'common_data_block' was added in order to handle columns that are usually always one value but rarely are some other value.
 
WDC1 takes this idea and adds in even more types of space-saving techniques. It adds in bitpacking (for fields who would be best suited to a field size that is not a multiple of 8), indexed values (for fields with values that are commonly always one of a small number of values but who do not have a common 'default' amongst those values), and indexed array values (for array fields where all fields in an array are commonly linked together, for example, with common spell flag values). It also keeps the 'common_data_block' technique from WDB6, but it streamlines the implementation of it, removing the need for a separate header block for 'common_data'. This is done by adding in a new block we call 'field_storage_info'.
 
'field_storage_info' contains the size in bits and the offset in bits of every column, along with the size of the column's data in the different data blocks (currently the data blocks are 'pallet_data' and 'common_data'). The structure below explains most of this. Compression type '0' is normal in-field data. Compression types '1', '3', and '4' begin as bitpacked data in the record. For compression type '1', you can stop there, as that is the proper value. For compression types '3' and '4', you need to take the bitpacked value you obtained from the record and use it as an index into 'pallet_data'. For compression type '3', you pull a 4-byte value from 'pallet_data' using the formula 'additional_data_offset + (index * 4)', where 'additional_data_offset' is the sum of 'additional_data_size' for every column before the current one. Compression type '4' is very similar, except that it is always an array. Array values are bundled together for compression type '4' - that single index will give you 4-byte values for every single piece of the array at once. You can retrieve these values by using the formula 'additional_data_offset + (index * 4 * array_count) + (iter * 4)', where 'iter' is the 0-based array piece you wish to retrieve (you should iterate over all of them to retrieve all of the values for the array). Lastly, compression type '2' is just WDB6's old 'common_data_block' system; it is the only compression type where no data at all is read from the record. Just apply the 'default_value' (also provided by 'field_storage_info') when the ID for the record is not present in 'common_data'.
 
The last major quirk of WDC1 is the 'relationship_map' system. In WDC1, any DB2 that had a primary key which was also a foreign key (in Blizzard's database system, I guess) was removed from the record and placed into this new structure. This is probably for increased lookup speed by the game executable. So, for example, almost all Spell*.db2 tables (besides Spell.db2 because it wouldn't be a foreign key there) had their SpellID columns removed and this new block added in their place. The easiest way to handle this block is just to add a new 'fake' column to the end of the DB2 in question and populate it with the values from the relationship map.
 
==Structure==
struct wdc1_db2_header
{
  uint32_t magic;                  // 'WDC1'
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;
  uint32_t table_hash;            // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;              // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;          // List of ids present in the DB file
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t relationship_data_size;
};
struct field_structure
{
  uint16_t size;
  uint16_t offset;
};
wdc1_db2_header header;
field_structure fields[header.total_field_count];
if ((header.flags & 1) == 0) {
  // Normal records
  struct record_data
  {
    char data[header.record_size];
  };
  record_data records[header.record_count];
  char string_data[header.string_table_size];
} else {
  // Offset map records -- these records have null-terminated strings inlined, and
  // since they are variable-length, they are pointed to by an array of 6-byte
  // offset+size pairs.
  char variable_record_data[header.offset_map_offset - sizeof(wdc1_db2_header) - (sizeof(field_structure) * header.total_field_count)];
  struct offset_map_entry
  {
    uint32_t offset;
    uint16_t size;
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];
}
enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //  (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //  field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
};
struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};
uint32_t id_list[header.id_list_size / 4];
if (header.copy_table_size > 0) {
  struct copy_table_entry
  {
    uint32_t id_of_new_row;
    uint32_t id_of_copied_row;
  };
  copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
}
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];
if (header.relationship_data_size > 0) {
  // In some tables, this relationship mapping replaced columns that were used
  // only as a lookup, such as the SpellID in SpellX* tables.
  struct relationship_entry
  {
    // This is the id of the foreign key for the record, e.g. SpellID in
    // SpellX* tables.
    uint32_t foreign_id;
    // This is the index of the record in record_data.  Note that this is
    // *not* the record's own ID.
    uint32_t record_index;
  };
  struct relationship_mapping
  {
    uint32_t            num_entries;
    uint32_t            min_id;
    uint32_t            max_id;
    relationship_entry  entries[num_entries];
  };
  relationship_mapping relationship_map;
}
 
==Further Quirks==
The 'field structure' section introduced in WDB5 still exists here, but 'field storage info' has all of the same information and more. You may wonder - why should I even bother reading field structure now if I only care about WDC1 support? There is one good reason - array sizes. In WDC1, the field structure section reports the size of the individual field piece in the file (for compression type '0') in bytes. However, the field storage info section reports the size of the entire field in bits. The difference in these values is important if an array is the field in question. For example, in field structure, an array of uint32[3] would be reported as size '4', while in field storage info, it would be reported as '96'. This technicality greatly benefits us as a trivial bit of math allows you to instantly deduce array sizes when reading the file: 'ArrayLength = (SizeReportedByFieldStorageInfo / 8) / SizeReportedByFieldStructure'.
 
There are several fields in the structure with no apparent purpose, but I believe these are all related. 'bitpacked_data_offset' in the header and 'bitpacking_offset_bits' and 'bitpacking_size_bits' from some variants of 'field_storage_info' (the bitpacked ones) are the fields I am referring to. I believe these three values are used to read from the record starting at the position of the bitpacked values. Our best guess is that Blizzard needs this for some purpose, possibly to integrate with their existing tools.
 
On the note of relationship maps, the aspect that makes them tricky (the parsing is actually very simple) is that sometimes the column in question remains in DB2 data in addition to being in a relationship map. For example, Achievement.db2's CriteriaTreeID field was not removed, but it had a relationship map added anyway. Cases like this are difficult to handle properly. You could just allow the duplicate data to be added as a fake column to the end, as there may not be any harm, especially if you are expecting it. Alternatively, you could parse the game executable's DBCMeta - the fields removed by relationship maps are still present there. If there is a mismatch between the number of fields reported by DBCMeta and the number of fields reported by the DB2, then you know that the relationship map data is unique and needs to be restored. If the number of fields match, then the data in the relationship map already exists in the record and is superfluous.
 
=WDC2=
{{Template:SectionBox/VersionRange|min_expansionlevel=8|min_build=8.0.1.26231}}
This file format was introduced in Battle for Azeroth (Patch 8.0.1 Build 26231) and is currently in use today.
 
WDC2 introduces a new system of 'sections' of data. Three chunks of the structure ('field_storage_info', 'pallet_data', and 'common_data') were moved up in front of the records / 'sections'. Everything after that was broken into a new form of 'sections' which can potentially occur many times in one file. Note that implementing this system right now appears to be entirely optional as no DB2 file as of the current build has a 'section_count' higher than 1. Also, it appears that the 'offset_map' format (in use for a small handful of DB2s) has a hardcoded limit of 1 section.
 
The other major change is to strings and how string offsets are calculated. For the history of WoW up until WDC2, strings in records were represented as 'the relative position of the referenced string to the start of the string block'. In WDC2, strings in records are represented as 'the relative position from the beginning of the field where this offset was stored to the position of the referenced string in the string block'. This complicates parsing the string block directly and storing relative locations, but it does slightly simplify reading records.
 
If before WDC2 you dealt with strings by reading the string block and assigning relative positions to every string, then you will need to either change your approach (to having a constantly-accessible 1:1 representation of the string table available) or mitigate the damage from the change. One method of mitigation is: when reading the values of the string offsets in the records, replace the data at that point in time with 'old-style' string offsets.
 
For example, using code like this when reading the field in question works:
row[field].LongVal = reader.ReadVarInt(Fields[field].Size);
if (IsWDC2() && row[field].Column.Type == ColumnType.String)
{
    row[field].LongVal = row[field].LongVal + reader.BaseStream.Position - Fields[field].Size - reader.StringBlockLocation;
}
This sample code reads the field containing the string offset in the record, then recalculates the value it just read by adding in the current position and subtracting the size of the field it just read (to get the absolute position of the referenced string) and then by subtracting the absolute position of the start of the string block (which yields the relative position of the referenced string to the start of the string block). This manipulation ensures that a program converts every string offset back to the pre-WDC2 style.
 
==Structure==
struct wdc2_db2_header
{
  uint32_t magic;                  // 'WDC2'
  uint32_t record_count;          // this is for all sections combined now
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;      // this is for all sections combined now
  uint32_t table_hash;            // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                // as seen in TextWowEnum
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;              // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t section_count;          // new to WDC2, this is number of sections of data (records + copy table + id list + relationship map = a section)
};
wdc2_db2_header header;
struct wdc2_section_header
{
  uint32_t wdc2_unk_header1;      // always 0 in {{Template:Sandbox/PrettyVersion|expansionlevel=8|build=8.0.1.26231}} and unnamed in client binary
  uint32_t wdc2_unk_header2;      // always 0 in {{Template:Sandbox/PrettyVersion|expansionlevel=8|build=8.0.1.26231}} and unnamed in client binary
  uint32_t file_offset;            // absolute position to the beginning of the section
  uint32_t record_count;          // 'record_count' for the section
  uint32_t string_table_size;      // 'string_table_size' for the section
  uint32_t copy_table_size;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;          // List of ids present in the DB file
  uint32_t relationship_data_size;
};
wdc2_section_header section_headers[section_count];
struct field_structure
{
  uint16_t size;
  uint16_t offset;
};
field_structure fields[header.total_field_count];
enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //  (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //  field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
};
struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];
struct section
{
  if ((header.flags & 1) == 0) {
    // Normal records
    struct record_data
    {
      char data[section_headers.record_size];
    };
    record_data records[section_headers.record_count];
    char string_data[section_headers.string_table_size];
  } else {
    // Offset map records -- these records have null-terminated strings inlined, and
    // since they are variable-length, they are pointed to by an array of 6-byte
    // offset+size pairs.
    char variable_record_data[section_headers.offset_map_offset - section_headers.file_offset];
    struct offset_map_entry
    {
      uint32_t offset;
      uint16_t size;
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
 
  uint32_t id_list[section_headers.id_list_size / 4];
  if (section_headers.copy_table_size > 0) {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[section_headers.copy_table_size / sizeof(copy_table_entry)];
  }
  if (section_headers.relationship_data_size > 0) {
    // In some tables, this relationship mapping replaced columns that were used
    // only as a lookup, such as the SpellID in SpellX* tables.
    struct relationship_entry
    {
      // This is the id of the foreign key for the record, e.g. SpellID in
      // SpellX* tables.
      uint32_t foreign_id;
      // This is the index of the record in record_data.  Note that this is
      // *not* the record's own ID.
      uint32_t record_index;
    };
    struct relationship_mapping
    {
      uint32_t            num_entries;
      uint32_t            min_id;
      uint32_t            max_id;
      relationship_entry  entries[num_entries];
    };
    relationship_mapping relationship_map;
  }
};
section data_sections[section_count];
 
=Known Flag Meanings for WDB4+=
  flags & 0x01 = 'Has offset map'
  flags & 0x01 = 'Has offset map'
  flags & 0x02 = 'Has secondary key'
  flags & 0x02 = 'Has secondary key'
  flags & 0x04 = 'Has non-inline IDs'
  flags & 0x04 = 'Has non-inline IDs'
flags & 0x10 = ? // WDC1 specific. used on all WDC1 files


[[Category:Format]]
[[Category:Format]]

Revision as of 20:54, 16 March 2018

This section only applies to versions ≥ Cata.

DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. They are similar in many ways to DBC files, so you may want to also look at those. They both have headers with similar data, records containing fields of various types with various data, and a string block where 4-byte string references are made. The difference is that DB2 files have become wildly more complex than their simple DBC brethren, especially in recent times. The structure described here was also used in ADB files for many years, which are a cache of dynamically streamed database entries, typically used hotfixes and content that Blizzard wants to hide from dataminers until the last minute. Recently the file structures of DB2 files and ADB files have diverged greatly.

Table content structures

This page describes the structure of DB2 files. For a list of existing DB2 files and their contents see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria and Warlords of Draenor. If you add documentation for a file, please add the correct categories (also the build number) as well.

Field Types

WDB2 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)

Additionally, WDB3 added the following possible field type:

16-bit Integers*

Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.

As an addendum to the above paragraph, in the binary's DBCMeta for WDB5-ready Legion clients, there is a flag for each field that can designate 'do not sign-extend when decompressing'. However, there are fields which obviously contain unsigned data that are not marked with this flag, so it is not exhaustive.

Traditionally, 64-bit integers have only made an appearance in a very tiny number of DB2s (usually 1 DB2 at any given time) and only for short periods of time. This meant that 64-bit integer support was not necessary. However, the addition of Allied Races in Legion's Patch 7.3.5 Build 25600 caused this to change. There were no longer enough bits for 'RaceMask', a field used in half a dozen different DB2s, and the field was expanded to be a 64-bit integer.

Determining Field Types

In WDB2, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2 format with minimal effort.

In WDB3 and WDB4, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBMeta structure. In that structure you will find the field types for all fields.

In the newer formats such as WDB5 and WDB6, determining field types on the fly is much more doable. Any field being compressed cannot be a string or a float as they are always 4 bytes, meaning you can just assume they are integers. You can use the 'field structure block' present in WDB5+ in order to determine the size of the fields and the distance between the offsets in the field structure block in order to determine array length for all fields except the last one (the last field will be padded out to 'record_size').

String Block

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

Since Cataclysm, all DB files contain only localized string values, including DBCs.

The rest of the string block is equivalent to DBC version. See documentation there.

WDB2

This section only applies to versions Cata … < Legion (7.0.1.20740).

This file format was introduced in Cataclysm. It was phased out in favor of WDB3 in Legion (Patch 7.0.1 build 20740).

Structure

See ADB#WCH3 for how to adapt this structure for its .ADB counterpart.

struct db2_header
{
  uint32_t magic;                                               // 'WDB2'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB2
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;                                     // always zero in WDB2 (?) - see WDB3 for information on how to parse this
};

template<typename record_type>
struct db2_file
{
  db2_header header;

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];*
  char string_table[header.string_table_size];
};
  • Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one. 'record_size' should account for this (it would be '8' in the example given in the previous sentence).

WDB3

This section only applies to versions Legion (7.0.1.20740) … < Legion (7.0.1.20810).

This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.

The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.

It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.

Structure

See ADB#WCH4 for how to adapt this structure for its .ADB counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offset_map_entry
  {
    uint32_t offset;                                             // This offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
    uint16_t length;                                             // This is the length of the record located at the specified offset
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];*
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  uint32_t IDs[header.record_count];*
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string block usually still exists, just as size 2 with two blank entries.

  • This part of the structure is optional.

WDB4

This section only applies to versions Legion (7.0.1.20810) … < Legion (7.0.3.21414).

This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today. It was partially phased out in favor of WDB5 just a few months later, also in Legion (Patch 7.0.3 build 21414). As such, support for WDB4 is likely important for almost nobody.

The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table. Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.

The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.

Structure

See ADB#WCH5 for how to adapt this structure for its .ADB counterpart.

struct wdb4_db2_header
{
  uint32_t magic;                                               // 'WDB4'
  uint32_t record_count;
  uint32_t field_count;                                         // array fields count as the size of array for WDB4
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint32_t flags;                                               // in WDB3, this field was in the WoW executable's DBCMeta instead; possible values are listed in Known Flag Meanings
};

template<typename record_type>
struct wdb4_file
{
  wdb4_db2_header header;
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                           // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                           // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB5

This section only applies to versions Legion (7.0.3.21479) … < Legion (7.2.0.23436).

This file format was introduced in Legion (Patch 7.0.3 build 21479) and was replaced by WDB6 later in Legion (Patch 7.2.0 build 23436). There have been a variety of ADB formats used simultaneously with WDB5, including WCH5, WCH6, WCH7, and WCH8.

There are very significant changes to the format in WDB5 which will require substantial effort for existing tools to support. The main changes are that the field_count now counts arrays as '1' (mirroring very old ADB behavior), a field_structure block has been added directly after the header, and the addition of compressed fields (and 24-bit integers). The introduction of the field_structure block, in particular, is hugely positive for us and should improve the accuracy of DB2 parsers or at least reduce their dependence on the WoW binary's DBCMeta. Note that the WoW binary's DBCMeta will often disagree with the 'field_structure' (DBCMeta might say 'int32' but the field_structure block says the size is '3-bytes'). In those cases, the field_structure block takes priority. Additionally, the DBCMeta is still relevant for parsing ADBs, as they do not support compression.

The header has lost a field compared to WDB4, timestamp_last_written. This field was useless in DB2s and always 0, so its removal is understandable.

In build 21737, a few builds after the introduction of WDB5, more changes to the header were made. The 'flags' header field was split into two shorts - 'flags' and 'id_index'. 'id_index' is very valuable since the inline ID fields no longer have to be the first field in WDB5; they can appear anywhere in the record. This index lets you know which field is ID, so you can, for example, move it to the beginning of the record for the ease of the viewer. Additionally, the 'build' field was changed to be 'layout_hash'. This hash is unique to the specific column layout, including (at least) position, size, and column name (we don't have most of these names, but we can tell this is true based on layout_hash changes that shouldn't have happened otherwise). It is worth noting that array size is actually not unique per hash. In some cases, the size of an array can change without the hash changing. This replacement for BuildID is often beneficial to us because when layout_hash changes, you know the structure changed (usually - there have been some instances of recalculations).

Structure

See ADB#WCH5 for how to adapt this structure for its .ADB counterpart.
struct wdb5_db2_header
{
  uint32_t magic;                                               // 'WDB5' for .db2 (database)
  uint32_t record_count;
  uint32_t field_count;                                         // for the first time, this counts arrays as '1'; in the past, only the ADB variants have counted arrays as 1 field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but after build 21737, this is a new hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // new in WDB5 (and only after build 21737), this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
};

template<typename record_type>
struct wdb5_file
{
  wdb5_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDB6

This section only applies to versions Legion (7.2.0.23436) … < Legion (7.3.5.25600).

This file format was introduced in Legion (Patch 7.2.0 build 23436) and was replaced by WDC1 later in Legion (Patch 7.3.5 build 25600).

Two new header fields and a new data block were added in WDB6. Both of the header fields relate to the new block, which we have named 'common_data_table'. Its purpose is to drastically reduce db2 filesize and memory footprint (affected db2s have had their filesizes halved) by letting the client assume that many columns are always 0, unless there is an entry in the proper 'common_data_table' that is mapped to the ID of the row in question. For example, in build 23436, only 10 columns are in the 'normal' row data section for SpellEffect.db2. However, the 'common_data_table' supports up to 26 columns (designated by the new header field named 'total_field_count'). To find the value for one of the latter 16 columns in SpellEffect.db2, you look up the column in the 'common_data_table' and then the ID in the 'common_data_map' for that column. If there is an entry, use the value paired with the ID. If there is not an entry, the value 'defaults'.

Default values are stored in the WoW binary, in DBMeta. It's worth noting that almost every field's default value is '0', with a few exceptions. For example, the 'Alpha' byte field in CreatureDisplayInfo defaults to 255, and a couple of floats in SpellEffect.db2 default to 1.

Neither strings or arrays are supported in the 'common_data_table'.

Starting from Patch 7.3.0 Build 24473, values in the 'common_data_table' are always padded out to 4 bytes. Detecting this change is very annoying as there were no other accompanying changes. If you wish to support WDB6 both before and after this build, you will need to attempt to navigate the common_data_table once without padding. Compare the distance you just navigated against the common_data_table_size field from the header. If they match, it is either not padded (pre-7.3) or does not matter (because all of the common data fields are 4 bytes). If it does not match, then it is padded (post-7.3). After determining this, you can then parse the table again with the knowledge that you are doing it correctly.

It is worth a minor mention here that from WDB6 onwards, standalone ADB files were discarded in favor of 'ADB#DBCache.bin', a new format that does not mirror DB2 structure at all.

Structure

struct wdb6_db2_header
{
  uint32_t magic;                                               // 'WDB6'
  uint32_t record_count;
  uint32_t field_count;                                         // this counts arrays as '1' field
  uint32_t record_size;
  uint32_t string_table_size;                                   // if flags & 0x01 != 0, this field takes on a new meaning - it becomes an absolute offset to the beginning of the offset_map
  uint32_t table_hash;
  uint32_t layout_hash;                                         // used to be 'build', but now this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                                               // possible values are listed in Known Flag Meanings
  uint16_t id_index;                                            // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;                                   // new in WDB6, includes columns only expressed in the 'common_data_table', unlike field_count
  uint32_t common_data_table_size;                              // new in WDB6, size of new block called 'common_data_table'
};

template<typename record_type>
struct wdb6_file
{
  wdb6_db2_header header;
  struct field_structure
  {
    int16_t size;                                               // size in bits as calculated by: byteSize = (32 - size) / 8; this value can be negative to indicate field sizes larger than 32-bits
    uint16_t position;                                          // position of the field within the record, relative to the start of the record
  };
  field_structure fields[header.field_count];
  record_type records[header.record_count];
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                          // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                          // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
  }
  if (header.common_data_table_size > 0)
  {
    uint32_t num_columns_in_table;
    struct common_data_map_entry
    {
      uint32_t id;
      uint32_t value;                                            // Calling this 'uint32_t' is an oversimplification - the size of this field depends on the 'type' from the enum 
                                                                 // (From Patch 7.3.0 Build 24473 onwards, this is no longer true. Values are always padded out to 4 bytes.)
    };
    struct common_data_table_entry
    {
      uint32_t count;
      uint8_t type;                                              // New enum: string = 0, short = 1, byte = 2, float = 3, int = 4 (int64 = 5??)
      common_data_map_entry common_data_map[count];
    };
    common_data_table_entry common_data_table[num_columns_in_table];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

WDC1

This section only applies to versions Legion (7.3.5.25600) … Battle (8.0.1.26231).

This file format was introduced in Legion (Patch 7.3.5 build 25600) and was replaced by WDC2 in Battle for Azeroth (Patch 8.0.1 Build 26231).

WDC1 is an expanded version of WDB6. If you've been following along with the format changes in Legion thus far, you will notice that the theme has been saving space. Most recently, in WDB6, a 'common_data_block' was added in order to handle columns that are usually always one value but rarely are some other value.

WDC1 takes this idea and adds in even more types of space-saving techniques. It adds in bitpacking (for fields who would be best suited to a field size that is not a multiple of 8), indexed values (for fields with values that are commonly always one of a small number of values but who do not have a common 'default' amongst those values), and indexed array values (for array fields where all fields in an array are commonly linked together, for example, with common spell flag values). It also keeps the 'common_data_block' technique from WDB6, but it streamlines the implementation of it, removing the need for a separate header block for 'common_data'. This is done by adding in a new block we call 'field_storage_info'.

'field_storage_info' contains the size in bits and the offset in bits of every column, along with the size of the column's data in the different data blocks (currently the data blocks are 'pallet_data' and 'common_data'). The structure below explains most of this. Compression type '0' is normal in-field data. Compression types '1', '3', and '4' begin as bitpacked data in the record. For compression type '1', you can stop there, as that is the proper value. For compression types '3' and '4', you need to take the bitpacked value you obtained from the record and use it as an index into 'pallet_data'. For compression type '3', you pull a 4-byte value from 'pallet_data' using the formula 'additional_data_offset + (index * 4)', where 'additional_data_offset' is the sum of 'additional_data_size' for every column before the current one. Compression type '4' is very similar, except that it is always an array. Array values are bundled together for compression type '4' - that single index will give you 4-byte values for every single piece of the array at once. You can retrieve these values by using the formula 'additional_data_offset + (index * 4 * array_count) + (iter * 4)', where 'iter' is the 0-based array piece you wish to retrieve (you should iterate over all of them to retrieve all of the values for the array). Lastly, compression type '2' is just WDB6's old 'common_data_block' system; it is the only compression type where no data at all is read from the record. Just apply the 'default_value' (also provided by 'field_storage_info') when the ID for the record is not present in 'common_data'.

The last major quirk of WDC1 is the 'relationship_map' system. In WDC1, any DB2 that had a primary key which was also a foreign key (in Blizzard's database system, I guess) was removed from the record and placed into this new structure. This is probably for increased lookup speed by the game executable. So, for example, almost all Spell*.db2 tables (besides Spell.db2 because it wouldn't be a foreign key there) had their SpellID columns removed and this new block added in their place. The easiest way to handle this block is just to add a new 'fake' column to the end of the DB2 in question and populate it with the values from the relationship map.

Structure

struct wdc1_db2_header
{
  uint32_t magic;                  // 'WDC1'
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;           // List of ids present in the DB file
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t relationship_data_size;
};

struct field_structure
{
  uint16_t size;
  uint16_t offset;
};

wdc1_db2_header header;
field_structure fields[header.total_field_count];
if ((header.flags & 1) == 0) {
  // Normal records
  struct record_data
  {
    char data[header.record_size];
  };
  record_data records[header.record_count];
  char string_data[header.string_table_size];
} else {
  // Offset map records -- these records have null-terminated strings inlined, and
  // since they are variable-length, they are pointed to by an array of 6-byte
  // offset+size pairs.
  char variable_record_data[header.offset_map_offset - sizeof(wdc1_db2_header) - (sizeof(field_structure) * header.total_field_count)];
  struct offset_map_entry
  {
    uint32_t offset;
    uint16_t size;
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];
}

enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};

uint32_t id_list[header.id_list_size / 4];
if (header.copy_table_size > 0) {
  struct copy_table_entry
  {
    uint32_t id_of_new_row;
    uint32_t id_of_copied_row;
  };
  copy_table_entry copy_table[header.copy_table_size / sizeof(copy_table_entry)];
}
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];
if (header.relationship_data_size > 0) {
  // In some tables, this relationship mapping replaced columns that were used
  // only as a lookup, such as the SpellID in SpellX* tables.
  struct relationship_entry
  {
    // This is the id of the foreign key for the record, e.g. SpellID in
    // SpellX* tables.
    uint32_t foreign_id;
    // This is the index of the record in record_data.  Note that this is
    // *not* the record's own ID.
    uint32_t record_index;
  };
  struct relationship_mapping
  {
    uint32_t            num_entries;
    uint32_t            min_id;
    uint32_t            max_id;
    relationship_entry  entries[num_entries];
  };
  relationship_mapping relationship_map;
}

Further Quirks

The 'field structure' section introduced in WDB5 still exists here, but 'field storage info' has all of the same information and more. You may wonder - why should I even bother reading field structure now if I only care about WDC1 support? There is one good reason - array sizes. In WDC1, the field structure section reports the size of the individual field piece in the file (for compression type '0') in bytes. However, the field storage info section reports the size of the entire field in bits. The difference in these values is important if an array is the field in question. For example, in field structure, an array of uint32[3] would be reported as size '4', while in field storage info, it would be reported as '96'. This technicality greatly benefits us as a trivial bit of math allows you to instantly deduce array sizes when reading the file: 'ArrayLength = (SizeReportedByFieldStorageInfo / 8) / SizeReportedByFieldStructure'.

There are several fields in the structure with no apparent purpose, but I believe these are all related. 'bitpacked_data_offset' in the header and 'bitpacking_offset_bits' and 'bitpacking_size_bits' from some variants of 'field_storage_info' (the bitpacked ones) are the fields I am referring to. I believe these three values are used to read from the record starting at the position of the bitpacked values. Our best guess is that Blizzard needs this for some purpose, possibly to integrate with their existing tools.

On the note of relationship maps, the aspect that makes them tricky (the parsing is actually very simple) is that sometimes the column in question remains in DB2 data in addition to being in a relationship map. For example, Achievement.db2's CriteriaTreeID field was not removed, but it had a relationship map added anyway. Cases like this are difficult to handle properly. You could just allow the duplicate data to be added as a fake column to the end, as there may not be any harm, especially if you are expecting it. Alternatively, you could parse the game executable's DBCMeta - the fields removed by relationship maps are still present there. If there is a mismatch between the number of fields reported by DBCMeta and the number of fields reported by the DB2, then you know that the relationship map data is unique and needs to be restored. If the number of fields match, then the data in the relationship map already exists in the record and is superfluous.

WDC2

This section only applies to versions ≥ Battle (8.0.1.26231).

This file format was introduced in Battle for Azeroth (Patch 8.0.1 Build 26231) and is currently in use today.

WDC2 introduces a new system of 'sections' of data. Three chunks of the structure ('field_storage_info', 'pallet_data', and 'common_data') were moved up in front of the records / 'sections'. Everything after that was broken into a new form of 'sections' which can potentially occur many times in one file. Note that implementing this system right now appears to be entirely optional as no DB2 file as of the current build has a 'section_count' higher than 1. Also, it appears that the 'offset_map' format (in use for a small handful of DB2s) has a hardcoded limit of 1 section.

The other major change is to strings and how string offsets are calculated. For the history of WoW up until WDC2, strings in records were represented as 'the relative position of the referenced string to the start of the string block'. In WDC2, strings in records are represented as 'the relative position from the beginning of the field where this offset was stored to the position of the referenced string in the string block'. This complicates parsing the string block directly and storing relative locations, but it does slightly simplify reading records.

If before WDC2 you dealt with strings by reading the string block and assigning relative positions to every string, then you will need to either change your approach (to having a constantly-accessible 1:1 representation of the string table available) or mitigate the damage from the change. One method of mitigation is: when reading the values of the string offsets in the records, replace the data at that point in time with 'old-style' string offsets.

For example, using code like this when reading the field in question works:

row[field].LongVal = reader.ReadVarInt(Fields[field].Size);
if (IsWDC2() && row[field].Column.Type == ColumnType.String)
{
    row[field].LongVal = row[field].LongVal + reader.BaseStream.Position - Fields[field].Size - reader.StringBlockLocation;
}

This sample code reads the field containing the string offset in the record, then recalculates the value it just read by adding in the current position and subtracting the size of the field it just read (to get the absolute position of the referenced string) and then by subtracting the absolute position of the start of the string block (which yields the relative position of the referenced string to the start of the string block). This manipulation ensures that a program converts every string offset back to the pre-WDC2 style.

Structure

struct wdc2_db2_header
{
  uint32_t magic;                  // 'WDC2'
  uint32_t record_count;           // this is for all sections combined now
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;      // this is for all sections combined now
  uint32_t table_hash;             // hash of the table name
  uint32_t layout_hash;            // this is a hash field that changes only when the structure of the data changes
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                 // as seen in TextWowEnum
  uint16_t flags;                  // possible values are listed in Known Flag Meanings
  uint16_t id_index;               // this is the index of the field containing ID values; this is ignored if flags & 0x04 != 0
  uint32_t total_field_count;      // from WDC1 onwards, this value seems to always be the same as the 'field_count' value
  uint32_t bitpacked_data_offset;  // relative position in record where bitpacked data begins; not important for parsing the file
  uint32_t lookup_column_count;
  uint32_t field_storage_info_size;
  uint32_t common_data_size;
  uint32_t pallet_data_size;
  uint32_t section_count;          // new to WDC2, this is number of sections of data (records + copy table + id list + relationship map = a section)
};
wdc2_db2_header header;

struct wdc2_section_header
{
  uint32_t wdc2_unk_header1;       // always 0 in Battle (8.0.1.26231) and unnamed in client binary
  uint32_t wdc2_unk_header2;       // always 0 in Battle (8.0.1.26231) and unnamed in client binary
  uint32_t file_offset;            // absolute position to the beginning of the section
  uint32_t record_count;           // 'record_count' for the section
  uint32_t string_table_size;      // 'string_table_size' for the section
  uint32_t copy_table_size;
  uint32_t offset_map_offset;      // Offset to array of struct {uint32_t offset; uint16_t size;}[max_id - min_id + 1];
  uint32_t id_list_size;           // List of ids present in the DB file
  uint32_t relationship_data_size;
};
wdc2_section_header section_headers[section_count];

struct field_structure
{
  uint16_t size;
  uint16_t offset;
};
field_structure fields[header.total_field_count];

enum field_compression
{
  // None -- the field is a 8-, 16-, 32-, or 64-bit integer in the record data
  field_compression_none,
  // Bitpacked -- the field is a bitpacked integer in the record data.  It
  // is field_size_bits long and starts at field_offset_bits.
  // A bitpacked value occupies
  //   (field_size_bits + (field_offset_bits & 7) + 7) / 8
  // bytes starting at byte
  //   field_offset_bits / 8
  // in the record data.  These bytes should be read as a little-endian value,
  // then the value is shifted to the right by (field_offset_bits & 7) and
  // masked with ((1ull << field_size_bits) - 1).
  field_compression_bitpacked,
  // Common data -- the field is assumed to be a default value, and exceptions
  // from that default value are stored in the corresponding section in
  // common_data as pairs of { uint32_t record_id; uint32_t value; }.
  field_compression_common_data,
  // Bitpacked indexed -- the field has a bitpacked index in the record data.
  // This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t, so the index
  // should be multiplied by 4 to obtain a byte offset.
  field_compression_bitpacked_indexed,
  // Bitpacked indexed array -- the field has a bitpacked index in the record
  // data.  This index is used as an index into the corresponding section in
  // pallet_data.  The pallet_data section is an array of uint32_t[array_count],
  //
  field_compression_bitpacked_indexed_array,
};

struct field_storage_info
{
  uint16_t          field_offset_bits;
  uint16_t          field_size_bits; // very important for reading bitpacked fields; size is the sum of all array pieces in bits - for example, uint32[3] will appear here as '96'
  // additional_data_size is the size in bytes of the corresponding section in
  // common_data or pallet_data.  These sections are in the same order as the
  // field_info, so to find the offset, add up the additional_data_size of any
  // previous fields which are stored in the same block (common_data or
  // pallet_data).
  uint32_t          additional_data_size;
  field_compression storage_type;
  switch (storage_type)
  {
    case field_compression.field_compression_bitpacked:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t flags; // known values - 0x01: sign-extend (signed)
      break;
    case field_compression.field_compression_common_data:
      uint32_t default_value;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t unk_or_unused3;
      break;
    case field_compression.field_compression_bitpacked_indexed_array:
      uint32_t bitpacking_offset_bits; // not useful for most purposes; formula they use to calculate is bitpacking_offset_bits = field_offset_bits - (header.bitpacked_data_offset * 8)
      uint32_t bitpacking_size_bits; // not useful for most purposes
      uint32_t array_count;
      break;
    default:
      uint32_t unk_or_unused1;
      uint32_t unk_or_unused2;
      uint32_t unk_or_unused3;
      break;
  }
};
field_storage_info field_info[header.field_storage_info_size / sizeof(field_storage_info)];
char pallet_data[header.pallet_data_size];
char common_data[header.common_data_size];

struct section
{
  if ((header.flags & 1) == 0) {
    // Normal records
    struct record_data
    {
      char data[section_headers.record_size];
    };
    record_data records[section_headers.record_count];
    char string_data[section_headers.string_table_size];
  } else {
    // Offset map records -- these records have null-terminated strings inlined, and
    // since they are variable-length, they are pointed to by an array of 6-byte
    // offset+size pairs.
    char variable_record_data[section_headers.offset_map_offset - section_headers.file_offset];
    struct offset_map_entry
    {
      uint32_t offset;
      uint16_t size;
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  
  uint32_t id_list[section_headers.id_list_size / 4];
  if (section_headers.copy_table_size > 0) {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[section_headers.copy_table_size / sizeof(copy_table_entry)];
  }
  if (section_headers.relationship_data_size > 0) {
    // In some tables, this relationship mapping replaced columns that were used
    // only as a lookup, such as the SpellID in SpellX* tables.
    struct relationship_entry
    {
      // This is the id of the foreign key for the record, e.g. SpellID in
      // SpellX* tables.
      uint32_t foreign_id;
      // This is the index of the record in record_data.  Note that this is
      // *not* the record's own ID.
      uint32_t record_index;
    };
    struct relationship_mapping
    {
      uint32_t            num_entries;
      uint32_t            min_id;
      uint32_t            max_id;
      relationship_entry  entries[num_entries];
    };
    relationship_mapping relationship_map;
  }
};
section data_sections[section_count];

Known Flag Meanings for WDB4+

flags & 0x01 = 'Has offset map'
flags & 0x02 = 'Has secondary key'
flags & 0x04 = 'Has non-inline IDs'
flags & 0x10 = ? // WDC1 specific. used on all WDC1 files