DB2: Difference between revisions

(Huge update - includes many more details about all aspects of the format and has updated information about WDB3/WDB4.)
=WDB3 / WCH4=
Paraphrased from Simca on #modcraft, Wed Nov 25 2015 12:54:32.

Three major differences:
* Optionally, non-inline indices, located in a new block after the string block.
* Optionally, row redundancy reduction tech, where the WDB3 file has removed all rows that match other rows exactly. A new block has been added to the end (after the previously mentioned new block) that is an array of 8-byte structs in the form (uint32 IDOfNewRow, uint32 IDOfRowToCopy). The size of this block is stated by the last field of the header. To simplify WDB3 writing, you can read this block, expand the copies out into real, physical rows, and then not write this block out. Remember to update the header field for the number of rows to account for the new rows, and to set 'copy_table_size' to 0. The client never requires this structure to be present; it looks purely at the last header field to determine whether it should be considered.
* Optionally, row data will be displaced (moved farther into the file) in favor of a wonky array of 6-byte structs in the form (uint32 fileOffset, uint16 recordLength), where the length of the array is 1 + maxID - minID (maxID and minID being header fields). That wonky format has you seek to the fileOffset listed and read the row in place. This format can still have the non-inline indices block or the row redundancy reduction block. Strings are also inline in this format. Since it has to iterate 1 by 1 from minID to maxID, there are often tens of thousands of the 6-byte structs that are pure 0 because the row doesn't exist. Make sure you detect when fileOffset is 0 and ignore those entries. Detecting this format is tricky: the best method is probably to read the first integer after the header and check whether it is equal to the fileOffset that would move you to the end of the 6-byte struct array.
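
The detection trick and the zero-entry check from the last bullet can be sketched roughly as follows. This is only an illustration under some assumptions: the file is little-endian, the whole file is already in memory, and the header size is passed in by the caller. The function names (`has_offset_map`, `ids_present`) are made up for this example. Entries are read byte by byte because a C++ struct of (uint32, uint16) would normally be padded to 8 bytes, while the entries in the file are 6 bytes each.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian uint32 (little-endian is an assumption of this sketch).
std::uint32_t read_u32_le(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Heuristic from the text: the first integer after the header should equal
// the file offset of the first byte past the 6-byte struct array.
bool has_offset_map(const std::vector<std::uint8_t>& file, std::size_t header_size,
                    std::uint32_t min_id, std::uint32_t max_id) {
    if (max_id < min_id) return false;
    const std::uint64_t count = std::uint64_t(max_id) - min_id + 1;
    const std::uint64_t map_end = header_size + 6 * count;
    if (map_end > file.size()) return false;  // array would not even fit
    return read_u32_le(file.data() + header_size) == map_end;
}

// Walk the array and collect the IDs whose fileOffset is non-zero.
// Only call this after has_offset_map() returned true.
std::vector<std::uint32_t> ids_present(const std::vector<std::uint8_t>& file,
                                       std::size_t header_size,
                                       std::uint32_t min_id, std::uint32_t max_id) {
    std::vector<std::uint32_t> ids;
    const std::uint64_t count = std::uint64_t(max_id) - min_id + 1;
    for (std::uint64_t i = 0; i < count; ++i) {
        const std::uint8_t* entry = file.data() + header_size + 6 * i;
        if (read_u32_le(entry) != 0) {  // pure-zero entry: row does not exist
            ids.push_back(min_id + std::uint32_t(i));
        }
    }
    return ids;
}
```

Note that the check relies on the row for minID being stored first in the row data, which is what the heuristic described above implies; a file that orders rows differently would not be detected this way.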


[[Category:Format]]

Revision as of 04:19, 27 December 2015

DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. Except for their header they are pretty much equivalent to DBC files. You may want to also look at those. The structure described here is also used in ADB files, which are a cache of dynamically streamed database entries / hotfixes.

Table content structures

This page describes the structure of DB2 files. For a list of existing DB2 files and their contents see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria and Warlords of Draenor. If you add documentation for a file, please add the correct categories (also the build number) as well.

Field Types

WDB2 / WCH3 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)

Additionally, WDB3 / WCH4 added the following possible field type:

16-bit Integers*
*Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.

Determining Field Types

In WDB2/WCH3, you can mostly just assume that every field is four bytes. Three of the five possible types (32-bit integers, floats, and strings) are four bytes, and 8-bit integers are used very rarely (only 3-4 files out of hundreds). 64-bit integers are used only ONCE so far (CriteriaTree needed one to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether a four-byte value is a float, an integer, or a string is not terribly difficult: floats will virtually always have certain bits set, and every value in a string field will be an offset into the string block, which you can check. This approach gives you ~98% compatibility with the WDB2/WCH3 format with minimal effort.
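
As a rough illustration of that guessing approach, here is a minimal sketch that classifies one four-byte column from all of its values. Everything in it is an assumption made for the example: the names (`FieldGuess`, `guess_column`), the idea of checking that a string offset lands right after a NUL byte, and especially the exponent bounds used for the float test, which are arbitrary. It will misjudge some columns (for instance, a column containing only 0 and 1 can pass the string check), so treat it as a starting point only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class FieldGuess { String, Float, Integer };

// A value is a plausible string reference if it is inside the string block
// and points at the start of a string (offset 0, or right after a NUL byte).
bool looks_like_string_offset(std::uint32_t value, const std::string& string_block) {
    if (value >= string_block.size()) return false;
    return value == 0 || string_block[value - 1] == '\0';
}

// A value is a plausible float if its exponent bits fall in a "sane" range.
// The bounds (2^-20 .. 2^30) are an arbitrary assumption, not a known rule.
bool looks_like_float(std::uint32_t bits) {
    if (bits == 0) return true;  // 0.0f and integer 0 look identical
    const std::uint32_t exponent = (bits >> 23) & 0xFF;
    return exponent >= 107 && exponent <= 157;
}

// Guess the type of one four-byte column from all of its values.
FieldGuess guess_column(const std::vector<std::uint32_t>& column,
                        const std::string& string_block) {
    bool all_strings = true, all_floats = true, any_nonzero = false;
    for (std::uint32_t v : column) {
        any_nonzero = any_nonzero || v != 0;
        all_strings = all_strings && looks_like_string_offset(v, string_block);
        all_floats = all_floats && looks_like_float(v);
    }
    if (!any_nonzero) return FieldGuess::Integer;  // nothing to go on
    if (all_strings) return FieldGuess::String;
    if (all_floats) return FieldGuess::Float;
    return FieldGuess::Integer;
}
```

The function looks at a whole column rather than a single value because one value in isolation is almost never conclusive.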

In WDB3+/WCH4+, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBCs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBCMeta structure. In that structure you will find the field types for all fields.

A Note on Field Counts

Take, for example, the following structure (which I made up):

uint32_t ID;
uint32_t ItemID[8];
uint32_t NPCID[2];

The field count in the header of the DB2 file will be 11, which you would expect since a record would contain 11 uint32_t values. However, the field count in the header of the counterpart ADB file will be 3. It counts arrays as one field, regardless of how long that 'field' would have to be. This is very likely a serialization bug in WoW's ADB writing, but it has gone unchanged for over half a decade, so you need to plan for it if you rely on field count. If you're reading and parsing DBCMeta directly, you don't really have to ever check or care about field count, though. Blizzard certainly doesn't, which is why this bug has gone unnoticed.
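
To make the two counting rules concrete, here is a small sketch that computes both numbers for the made-up structure above. The `FieldLayout` type and the function names are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <vector>

// One declared field; array_length is 1 for a plain (non-array) field.
struct FieldLayout {
    const char* name;
    std::size_t array_length;
};

// DB2 header convention: every array element counts as its own field.
std::size_t db2_field_count(const std::vector<FieldLayout>& fields) {
    std::size_t count = 0;
    for (const FieldLayout& f : fields) count += f.array_length;
    return count;
}

// ADB header convention: an array counts as one field, whatever its length.
std::size_t adb_field_count(const std::vector<FieldLayout>& fields) {
    return fields.size();
}
```

For the layout `{ID, ItemID[8], NPCID[2]}` the first function gives 11 and the second gives 3, matching the numbers described above.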

WDB2 (.db2) / WCH3 (.adb)

This file format was introduced in Cataclysm. It was phased out in favor of WDB3/WCH4 in Legion (Patch 7.0.1 build 20740).

Structure

struct db2_header
{
  uint32_t magic;                                               // 'WDB2' for .db2 (database), 'WCH3' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;                                     // called duplicateIDBlockSize in the WDB3/WCH4 section
};
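
A minimal sketch of reading this header from a byte buffer might look like the following. It assumes the file is little-endian and that the magic is stored as the four ASCII characters in file order; the names `Db2Header`, `read_u32_le` and `parse_db2_header` are made up for this example, and only the two magic values for this format version are accepted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors db2_header above: twelve 32-bit fields, 48 bytes in total.
struct Db2Header {
    std::uint32_t magic, record_count, field_count, record_size;
    std::uint32_t string_table_size, table_hash, build, timestamp_last_written;
    std::uint32_t min_id, max_id, locale, copy_table_size;
};

// Read a little-endian uint32 (little-endian is an assumption of this sketch).
std::uint32_t read_u32_le(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Returns false if the buffer is too short or the magic is not WDB2/WCH3.
bool parse_db2_header(const std::vector<std::uint8_t>& bytes, Db2Header& out) {
    const std::size_t kFieldCount = 12;
    if (bytes.size() < kFieldCount * 4) return false;

    std::uint32_t f[kFieldCount];
    for (std::size_t i = 0; i < kFieldCount; ++i) {
        f[i] = read_u32_le(bytes.data() + 4 * i);
    }

    const std::uint32_t kWDB2 = 0x32424457;  // bytes 'W','D','B','2' in file order
    const std::uint32_t kWCH3 = 0x33484357;  // bytes 'W','C','H','3' in file order
    if (f[0] != kWDB2 && f[0] != kWCH3) return false;

    out = Db2Header{f[0], f[1], f[2], f[3], f[4], f[5],
                    f[6], f[7], f[8], f[9], f[10], f[11]};
    return true;
}
```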

template<typename record_type>
struct db2_file
{
  db2_header header;
  // static_assert (header.record_size == sizeof (record_type));

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];*
  char string_block[header.string_table_size];
};
*Note: Each record is padded to a multiple of the size of the largest field type in the record. For example, a record with three fields (Int32, Int8, and Int8) has 2 bytes of padding at the end of every record, including the last one.
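
The padding rule can be written as a short calculation. This is a sketch only; `padded_record_size` is a hypothetical helper, and it takes the field sizes in bytes (so strings and floats count as 4).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Round the raw record length up to a multiple of the largest field size.
std::size_t padded_record_size(const std::vector<std::size_t>& field_sizes) {
    std::size_t raw = 0;
    std::size_t largest = 1;
    for (std::size_t size : field_sizes) {
        raw += size;
        largest = std::max(largest, size);
    }
    return (raw + largest - 1) / largest * largest;
}
```

For the Int32, Int8, Int8 example, the raw length is 6 bytes and the largest field is 4 bytes, so the record occupies 8 bytes, which is where the 2 bytes of padding come from.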

String Block

Equivalent to DBC version. See documentation there.

Localization

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

After the transition to CASC, all DB files contain only localized string values, including DBCs.

WDB3 (.db2) / WCH4 (.adb)

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810).

It is worth noting that min_id, max_id, and duplicateIDBlockSize fields now see different use, despite the header remaining the same. min_id and max_id are always (?) non-zero, even when the 'offset map' structure is present. Because of this, detection methods for that structure must change. Additionally, the duplicate ID block field is now non-zero sometimes, requiring action.

Structure for WDB3

See WCH4-specific concerns for how to adapt this structure for the .adb counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offsetMapEntry
  {
    uint32_t offset;
    uint16_t length;
  };
  offsetMapEntry offsetMap[header.max_id - header.min_id + 1];*
  record_type records[header.record_count]; 
  char string_block[header.string_table_size];
  uint32_t IDs[header.record_count];*
  if (header.duplicateIDBlockSize > 0)
  {
    struct duplicateIDEntry
    {
      uint32_t IDOfNewRow;
      uint32_t IDOfRowBeingCopied;
    };
    duplicateIDEntry duplicateIDBlock[header.duplicateIDBlockSize / 8];
  }
};

Note: If offsetMap exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string block usually still exists, just as size 2 with two blank entries.

*This part of the structure is optional.
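
One way to handle the duplicate ID block when loading is to expand each entry into a real row, so the rest of a tool never has to know the block existed. The sketch below assumes rows have already been read into a map keyed by ID; the `Row` alias and the `expand_duplicates` name are invented for this example. It copies the row bytes verbatim, so a record that stores its own ID inline would still need that ID patched afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Mirrors duplicateIDEntry from the structure above.
struct DuplicateIdEntry {
    std::uint32_t id_of_new_row;
    std::uint32_t id_of_row_being_copied;
};

using Row = std::vector<std::uint8_t>;  // raw bytes of one record

// Materialise every duplicate entry as a physical row.
// Returns the number of rows that were actually added.
std::size_t expand_duplicates(std::map<std::uint32_t, Row>& rows_by_id,
                              const std::vector<DuplicateIdEntry>& duplicates) {
    std::size_t added = 0;
    for (const DuplicateIdEntry& entry : duplicates) {
        auto source = rows_by_id.find(entry.id_of_row_being_copied);
        if (source == rows_by_id.end()) continue;  // source row missing: skip
        // emplace leaves an existing row with the same ID untouched.
        if (rows_by_id.emplace(entry.id_of_new_row, source->second).second) {
            ++added;
        }
    }
    return added;
}
```

If the expanded rows are written back out without the block, the record count in the header has to grow by the returned number and the duplicate block size has to be set to 0.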

WCH4-specific Concerns

To be written later

Example Detection Code for Optional Structures

To be written later

Non-inline indices

To be written later

Offset Map

To be written later

WDB4 (.db2) / WCH5 (.adb)

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today.

WCH5-specific Concerns

The .adb files diverge even further from .db2 in WCH5. The offset map changes from WCH4 remain in effect (see WCH4-specific Concerns). Additionally, WCH5 files are missing two header fields compared to their WDB4 counterparts (the last two fields: duplicateIDBlockSize and flags).