DB2

From wowdev
Revision as of 04:19, 27 December 2015 by Simca (Huge update - includes many more details about all aspects of the format and has updated information about WDB3/WDB4.)

DB2 files are the newer version of the client-side databases, introduced in Cataclysm. They contain data about items, NPCs, the environment, the world, and much more. Apart from their header, they are largely equivalent to DBC files, so the DBC documentation is worth reading as well. The structure described here is also used in ADB files, which are a cache of dynamically streamed database entries (hotfixes).

Table content structures

This page describes the structure of DB2 files. For a list of existing DB2 files and their contents, see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria, and Warlords of Draenor. If you add documentation for a file, please also add the correct categories (including the build number).

Field Types

WDB2 / WCH3 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (represented in the record data as a 32-bit unsigned integer; see the String Block section for more information)

Additionally, WDB3 / WCH4 added the following possible field type:

16-bit Integers*
  • Note that Blizzard does not differentiate between signed and unsigned field types; the client code simply casts the data as needed. As a result, some fields make more sense as signed integers (for example, a casting time reduction) and others as unsigned integers (for example, bitfields). You will have to decide which fields should be which on your own. Personally, I default to signed 32-bit integers, unsigned 16-bit integers, and unsigned 8-bit integers.
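Because the file carries no signedness, the reader chooses an interpretation per field. The minimal sketch below reads the same raw bytes both ways; it assumes the field bytes are little-endian (as on the x86 client), and the function names are hypothetical helpers, not part of any WoW or library API.

```cpp
#include <cstdint>
#include <cstring>

// Assemble four little-endian bytes into an unsigned 32-bit value,
// independent of the host byte order.
inline uint32_t field_as_uint32(const unsigned char* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// The same bytes read as a signed value: copy the bit pattern unchanged.
inline int32_t field_as_int32(const unsigned char* p) {
  const uint32_t u = field_as_uint32(p);
  int32_t s;
  std::memcpy(&s, &u, sizeof s);
  return s;
}

// 16-bit variants, for the 16-bit fields added in WDB3/WCH4.
inline uint16_t field_as_uint16(const unsigned char* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

inline int16_t field_as_int16(const unsigned char* p) {
  const uint16_t u = field_as_uint16(p);
  int16_t s;
  std::memcpy(&s, &u, sizeof s);
  return s;
}
```

For example, the bytes FF FF FF FF are 4294967295 as an unsigned field but -1 as a signed one; which reading is "right" depends on what the field means.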

Determining Field Types

In WDB2/WCH3, you can mostly assume that every field is four bytes. Three of the five possible types are four bytes, and 8-bit integers are used very rarely (in only 3-4 files out of hundreds). 64-bit integers have been used only once so far (in CriteriaTree, to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether a four-byte value is a float, an integer, or a string is not terribly difficult: floats will virtually always have some of their exponent bits set, and every value in a string field is an offset into the string block, which you can check. This approach gives you ~98% compatibility with the WDB2/WCH3 format with minimal effort.
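The heuristic described above can be sketched as follows. This is a hypothetical helper, not Blizzard's method, and it rests on two labelled assumptions: real float data has a biased exponent in a moderate range (here roughly 2^-31 to 2^32), and non-zero string offsets point at the start of a non-empty, NUL-terminated string (empty strings use offset 0). Small integers have exponent bits of zero and so fail the float test, while negative integers such as -1 have all exponent bits set.

```cpp
#include <cstdint>
#include <vector>

enum class ColumnGuess { Int, Float, String };

// Assumption: non-zero offsets start a non-empty string, i.e. the previous
// byte is a NUL terminator and the byte at the offset is not.
static bool looks_like_string_offset(uint32_t v, const std::vector<char>& block) {
  if (v == 0) return true;
  if (v >= block.size()) return false;
  return block[v - 1] == '\0' && block[v] != '\0';
}

// Assumption: plausible floats have a biased exponent between 96 and 159.
static bool looks_like_float(uint32_t v) {
  if (v == 0) return true;
  const uint32_t exponent = (v >> 23) & 0xFF;
  return exponent >= 96 && exponent <= 159;
}

// Classify one four-byte column from all of its raw values.
ColumnGuess guess_column(const std::vector<uint32_t>& values,
                         const std::vector<char>& string_block) {
  bool all_float = true, all_string = true, any_nonzero = false;
  for (uint32_t v : values) {
    any_nonzero = any_nonzero || v != 0;
    all_float = all_float && looks_like_float(v);
    all_string = all_string && looks_like_string_offset(v, string_block);
  }
  if (!any_nonzero) return ColumnGuess::Int;  // all zeros: no evidence, default to int
  if (all_float) return ColumnGuess::Float;
  if (all_string) return ColumnGuess::String;
  return ColumnGuess::Int;
}
```

Float is tested before String because small integer columns can accidentally look like valid offsets; the heuristic will still misjudge some columns, which is consistent with the ~98% figure above.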

In WDB3+/WCH4+, things become much harder. Determining field types on the fly is virtually impossible, and the majority of tables (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW executable and parse it for the DBCMeta structure, which contains the field types for all fields.

A Note on Field Counts

Take, for example, the following (made-up) structure:

uint32_t ID;
uint32_t ItemID[8];
uint32_t NPCID[2];

The field count in the header of the DB2 file will be 11, as you would expect, since a record contains 11 uint32_t values. However, the field count in the header of the counterpart ADB file will be 3: it counts each array as one field, regardless of its length. This is very likely a serialization bug in WoW's ADB writing, but it has gone unchanged for over half a decade, so plan for it if you rely on the field count. If you read and parse DBCMeta directly, you never really have to check or care about the field count. Blizzard certainly doesn't, which is why this bug has gone unnoticed.
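The two counting rules can be expressed side by side. In this sketch, FieldDesc is a hypothetical description of one field (its array length, 1 for scalars), such as you might build from DBCMeta; it is not a WoW type.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical field description: array length, 1 for a scalar field.
struct FieldDesc {
  uint32_t array_length;
};

// DB2 header rule: every array element counts as its own field.
uint32_t db2_field_count(const std::vector<FieldDesc>& fields) {
  uint32_t n = 0;
  for (const FieldDesc& f : fields) n += f.array_length;
  return n;
}

// ADB header rule: each array counts as a single field.
uint32_t adb_field_count(const std::vector<FieldDesc>& fields) {
  return static_cast<uint32_t>(fields.size());
}
```

For the made-up structure above (ID, ItemID[8], NPCID[2]), the fields are {1, 8, 2}, giving 11 under the DB2 rule and 3 under the ADB rule.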

WDB2 (.db2) / WCH3 (.adb)

This file format was introduced in Cataclysm. It was phased out in favor of WDB3/WCH4 in Legion (Patch 7.0.1 build 20740).

Structure

struct db2_header
{
  uint32_t magic;                                               // 'WDB2' for .db2 (database), 'WCH3' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0) when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
};
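A minimal sketch of reading this header from a byte buffer, assuming all twelve fields are little-endian uint32 values laid out in the order shown (48 bytes total) and that the magic appears in the file as the ASCII characters 'WDB2' or 'WCH3'. The helper names (fourcc, read_u32le, parse_db2_header) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct db2_header {
  uint32_t magic, record_count, field_count, record_size, string_table_size,
      table_hash, build, timestamp_last_written, min_id, max_id, locale,
      copy_table_size;
};

// Pack four characters (in file order) into the value a little-endian read yields.
constexpr uint32_t fourcc(char a, char b, char c, char d) {
  return uint32_t(uint8_t(a)) | (uint32_t(uint8_t(b)) << 8) |
         (uint32_t(uint8_t(c)) << 16) | (uint32_t(uint8_t(d)) << 24);
}

// Read a little-endian uint32 at byte offset `off` (caller checks bounds).
static uint32_t read_u32le(const std::vector<uint8_t>& buf, std::size_t off) {
  return uint32_t(buf[off]) | (uint32_t(buf[off + 1]) << 8) |
         (uint32_t(buf[off + 2]) << 16) | (uint32_t(buf[off + 3]) << 24);
}

// Returns the header, or nullopt if the buffer is too short or the magic is
// neither 'WDB2' (.db2) nor 'WCH3' (.adb).
std::optional<db2_header> parse_db2_header(const std::vector<uint8_t>& buf) {
  constexpr std::size_t kFieldCount = 12;
  if (buf.size() < kFieldCount * 4) return std::nullopt;
  uint32_t f[kFieldCount];
  for (std::size_t i = 0; i < kFieldCount; ++i) f[i] = read_u32le(buf, i * 4);
  if (f[0] != fourcc('W', 'D', 'B', '2') && f[0] != fourcc('W', 'C', 'H', '3'))
    return std::nullopt;
  return db2_header{f[0], f[1], f[2], f[3], f[4],  f[5],
                    f[6], f[7], f[8], f[9], f[10], f[11]};
}
```

Decoding byte by byte keeps the sketch independent of host endianness and struct packing, at the cost of a little verbosity.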

template<typename record_type>
struct db2_file
{
  db2_header header;
  // static_assert (header.record_size == sizeof (record_type));

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];                     // * see the note on padding below
  char string_block[header.string_table_size];
};
  • Note: Each record is aligned to the size of the largest field type in the record. For example, a record with three fields (Int32, Int8, and Int8) holds 6 bytes of data followed by 2 bytes of padding, and this applies to every record, including the last one.
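The layout above and the padding rule can be turned into offset arithmetic. This sketch assumes the header is exactly 48 bytes, that the optional index arrays are present only when max_id is non-zero, that sections follow each other with no extra alignment, and that fields inside a record are packed with padding only at the end. Wdb2Layout, wdb2_layout, and padded_record_size are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Round the summed field sizes up to a multiple of the largest field size.
uint32_t padded_record_size(const std::vector<uint32_t>& field_sizes) {
  uint32_t total = 0, largest = 1;
  for (uint32_t s : field_sizes) {
    total += s;
    largest = std::max(largest, s);
  }
  return (total + largest - 1) / largest * largest;
}

struct Wdb2Layout {
  uint64_t indices_offset, string_lengths_offset, records_offset,
      string_block_offset, end;
};

// Byte offset where each section of a WDB2/WCH3 file begins.
Wdb2Layout wdb2_layout(uint32_t record_count, uint32_t record_size,
                       uint32_t string_table_size, uint32_t min_id, uint32_t max_id) {
  const uint64_t header_size = 48;  // twelve uint32 header fields
  const uint64_t id_range = max_id != 0 ? uint64_t(max_id) - min_id + 1 : 0;
  Wdb2Layout l{};
  l.indices_offset = header_size;
  l.string_lengths_offset = l.indices_offset + id_range * 4;  // one int per ID
  l.records_offset = l.string_lengths_offset + id_range * 2;  // one short per ID
  l.string_block_offset = l.records_offset + uint64_t(record_count) * record_size;
  l.end = l.string_block_offset + string_table_size;
  return l;
}
```

For the Int32/Int8/Int8 example in the note, padded_record_size({4, 1, 1}) yields 8, which should match header.record_size.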

String Block

Equivalent to the DBC version; see the documentation there.
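As noted under Field Types, a string field stores a 32-bit offset into the string block. A minimal lookup sketch, assuming each string is NUL-terminated within the block (read_block_string is a hypothetical helper):

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Return the string starting at `offset`, or nullopt if the offset is out of
// range or the string runs off the end of the block without a terminator.
std::optional<std::string> read_block_string(const std::vector<char>& block,
                                             uint32_t offset) {
  if (offset >= block.size()) return std::nullopt;
  std::size_t end = offset;
  while (end < block.size() && block[end] != '\0') ++end;
  if (end == block.size()) return std::nullopt;  // no terminator: malformed
  return std::string(block.data() + offset, end - offset);
}
```

Offset 0 usually resolves to an empty string, since the block almost always begins with a zero byte.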

Localization

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

After the transition to CASC, all DB files, including DBCs, contain string values for only a single locale.

WDB3 (.db2) / WCH4 (.adb)

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810).

It is worth noting that the min_id, max_id, and copy_table_size fields (the last now acting as duplicateIDBlockSize, the size of the duplicate ID block) see different use, even though the header layout is unchanged. min_id and max_id are always (?) non-zero, even when the offset map structure is present, so detection methods for that structure must change. Additionally, the duplicate ID block size is now sometimes non-zero, which requires handling.
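Handling the duplicate ID block amounts to registering each new ID as a copy of an existing row. The sketch below assumes the 8-byte entry layout given in the WDB3 structure (IDOfNewRow, IDOfRowBeingCopied) and rows already decoded into a map keyed by ID; DuplicateIdEntry and apply_duplicate_ids are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Mirrors duplicateIDEntry: 8 bytes per entry.
struct DuplicateIdEntry {
  uint32_t id_of_new_row;
  uint32_t id_of_row_being_copied;
};

// Number of entries implied by the header's block size in bytes.
inline uint32_t duplicate_entry_count(uint32_t block_size) { return block_size / 8; }

// Copy each referenced row under its new ID; entries pointing at a missing
// source row are skipped. R is whatever record type the caller decoded.
template <typename R>
void apply_duplicate_ids(std::map<uint32_t, R>& rows,
                         const std::vector<DuplicateIdEntry>& dups) {
  for (const DuplicateIdEntry& d : dups) {
    auto src = rows.find(d.id_of_row_being_copied);
    if (src != rows.end()) rows[d.id_of_new_row] = src->second;
  }
}
```

If your decoded record type also stores its own ID, you may need to overwrite that field in the copy; the source does not say how the client treats it.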

Structure for WDB3

See WCH4-specific concerns for how to adapt this structure for the .adb counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offsetMapEntry
  {
    uint32_t offset;
    uint16_t length;
  };
  offsetMapEntry offsetMap[header.max_id - header.min_id + 1];  // * optional
  record_type records[header.record_count];
  char string_block[header.string_table_size];
  uint32_t IDs[header.record_count];                             // * optional
  if (header.copy_table_size > 0)                                // duplicateIDBlockSize in WDB3
  {
    struct duplicateIDEntry
    {
      uint32_t IDOfNewRow;
      uint32_t IDOfRowBeingCopied;
    };
    duplicateIDEntry duplicateIDBlock[header.copy_table_size / 8];
  }
};

Note: If offsetMap exists, all strings are embedded inline in the records as null-terminated C strings. The string block usually still exists, but with a size of 2, containing two blank entries.

  • This part of the structure is optional.
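When the offset map is present, a record is located through its entry rather than by row index, and strings are read inline. A minimal sketch, assuming the map holds one entry per ID from min_id to max_id, that IDs without a record have offset 0 and length 0, and that the offset is measured from the start of the file (all three are assumptions, not confirmed above). find_record and read_inline_string are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct OffsetMapEntry {
  uint32_t offset;  // assumed: file offset of the record
  uint16_t length;  // record length in bytes
};

// Look up the entry for `id`; nullopt if out of range or unused.
std::optional<OffsetMapEntry> find_record(const std::vector<OffsetMapEntry>& map,
                                          uint32_t min_id, uint32_t id) {
  if (id < min_id || id - min_id >= map.size()) return std::nullopt;
  const OffsetMapEntry& e = map[id - min_id];
  if (e.offset == 0 || e.length == 0) return std::nullopt;
  return e;
}

// Read a NUL-terminated string starting at `pos` within a record's bytes and
// advance `pos` past the terminator; nullopt if no terminator is found.
std::optional<std::string> read_inline_string(const std::vector<char>& record,
                                              std::size_t& pos) {
  std::size_t end = pos;
  while (end < record.size() && record[end] != '\0') ++end;
  if (end >= record.size()) return std::nullopt;
  std::string s(record.data() + pos, end - pos);
  pos = end + 1;
  return s;
}
```

Because inline strings vary in length, every field after the first string has a record-specific position, which is why the per-record length is stored in the map.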

WCH4-specific Concerns

To be written later

Example Detection Code for Optional Structures

To be written later

Non-inline indices

To be written later

Offset Map

To be written later

WDB4 (.db2) / WCH5 (.adb)

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today.

WCH5-specific Concerns

The .adb files diverge even further from .db2 in WCH5. The offset map changes from WCH4 remain in effect (see WCH4-specific Concerns). Additionally, WCH5 files lack two header fields compared to their WDB4 counterparts (the last two fields, duplicateIDBlockSize and flags).