DB2: Difference between revisions

From wowdev
Jump to navigation Jump to search
(→‎Structure for WDB4: Missing space.)
(→‎WDB3 (.db2) / WCH4 (.adb): Note about the useless-ness of WDB3/WCH4)
Line 73: Line 73:


=WDB3 (.db2) / WCH4 (.adb)=
=WDB3 (.db2) / WCH4 (.adb)=
The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810).
The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.


The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.
The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.

Revision as of 06:44, 28 December 2015

DB2 files are the new version of client side databases, introduced in Cataclysm, containing data about items, NPCs, environment, world and a lot more. Except for their header they are pretty much equivalent to DBC files. You may want to also look at those. The structure described here is also used in ADB files, which are a cache of dynamically streamed database entries / hotfixes.

Table content structures

This page describes the structure of DB2 files. For a list of existing DB2 files and their contents see the categories DBC, Vanilla, Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria and Warlords of Draenor. If you add documentation for a file, please add the correct categories (also the build number) as well.

Field Types

WDB2 / WCH3 began with the following possible field types:

64-bit Integers*
32-bit Integers*
8-bit Integers*
Floats
Strings (strings are represented in the record data as a 32-bit unsigned integer, see the String Block section for more information)

Additionally, WDB3 / WCH4 added the following possible field type:

16-bit Integers*
  • Note that Blizzard does not differentiate between signed and unsigned field types; WoW code simply casts the data around as it needs. Because of this, some fields will make more sense as signed integers (example: a casting time reduction) and some fields will make more sense as unsigned integers (example: bitfields). You will have to make the determination as to which fields should be which on your own. Personally, I default to Signed 32-bit Integers, Unsigned 16-bit Integers, and Unsigned 8-bit Integers.

Determining Field Types

In WDB2/WCH3, you can mostly just assume that every field will be four bytes. Three out of the five possibilities are four bytes, and the 8-bit integers are only used very, very rarely (literally like 3-4 files out of hundreds). The 64-bit Integers are actually only used ONCE so far (in CriteriaTree it was needed to store the number 2,500,000,000, which is 250k gold in copper). Deciding whether or not a four-byte value is a float, an integer, or a string is not terribly difficult (floats will basically always have certain bits set and every value in a string field will be an offset into the string block, which you can check), and this approach gives you ~98% compatibility with the WDB2/WCH3 format with minimal effort.

In WDB3+/WCH4+, things become much, much harder. Determining field types on the fly is virtually impossible, and the majority of DBCs (80%+) have at least one field that is not four bytes. The only proper solution is to read the WoW binary executable and parse it for the DBCMeta structure. In that structure you will find the field types for all fields.

WDB2 (.db2) / WCH3 (.adb)

This file format was introduced in Cataclysm. It was phased out in favor of WDB3/WCH4 in Legion (Patch 7.0.1 build 20740).

Structure

See WCH3-specific Concerns for an issue with the 'field_count' header variable in .ADB files.

struct db2_header
{
  uint32_t magic;                                               // 'WDB2' for .db2 (database), 'WCH3' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
};

template<typename record_type>
struct db2_file
{
  db2_header header;
  // static_assert (header.record_size == sizeof (record_type));

  if (header.max_id != 0)
  {
    int indices[header.max_id - header.min_id + 1];             // maps from id to row index in records[] below
    short string_lengths[header.max_id - header.min_id + 1];    // sum of lengths of all strings in row
  }
 
  record_type records[header.record_count];*
  char string_table[header.string_table_size];
};
  • Note: Each record is aligned to the length of the longest field type in the record. If you have a record with three fields: Int32, Int8, and Int8 - then there will be 2 bytes of padding at the end of every record, including the last one.

String Block

Equivalent to DBC version. See documentation there.

WCH3-specific Concerns

Take for example, the following structure (which I made up):

uint32_t ID;
uint32_t ItemID[8];
uint32_t NPCID[2];

The field count in the header of the DB2 file will be 11, which you would expect since a record would contain 11 uint32_t values. However, the field count in the header of the counterpart ADB file will be 3. It counts arrays as one field, regardless of how long that 'field' would have to be. This is very likely a serialization bug in WoW's ADB writing, but it has gone unchanged for over half a decade, so you need to plan for it if you rely on field count. If you're reading and parsing DBCMeta directly, you don't really have to ever check or care about field count, though. Blizzard certainly doesn't, which is why this bug has gone unnoticed.

WDB3 (.db2) / WCH4 (.adb)

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20740). It was phased out in favor of WDB4/WCH5 just a few builds later, also in Legion (Patch 7.0.1 build 20810). As such, support for WDB3 and WCH4 is likely important for almost nobody.

The major changes are the addition of an optional offset map which forces records to have inline strings, an optional block after the string block that contains non-inline IDs, and an optional block after that block which contains the ID numbers of rows that have been deduplicated to save space.

It is worth noting that min_id, max_id, and copy_table_size fields now see different use, despite the header remaining the same. min_id and max_id are always (?) non-zero, even when the offset_map structure is present. Additionally, the copy_table_size field is now non-zero sometimes, requiring action.

Structure for WDB3

See WCH4-specific Concerns for how to adapt this structure for the .ADB counterpart.

template<typename record_type>
struct wdb3_file
{
  db2_header header;
  struct offset_map_entry
  {
    uint32_t offset;                                             // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
    uint16_t length;                                             // this is the length of the record located at the specified offset
  };
  offset_map_entry offset_map[header.max_id - header.min_id + 1];*
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  uint32_t IDs[header.record_count];*
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string block usually still exists, just as size 2 with two blank entries.

  • This part of the structure is optional.

WCH4-specific Concerns

In .adb files with the WCH4 signature, the issue with field_count described above in WCH3-specific Concerns still exists. Additionally, the optional offset_map structure changes. Specifically, this part defined in the structure above is changed:

struct offset_map_entry
{
  uint32_t ID;
  uint32_t offset;
  uint16_t length;
};

Note that, in most cases, the non-inline IDs structure still exists, even if it is made completely redundant by the existence of the offset map. After all, why do work once when you could do it twice?

WDB4 (.db2) / WCH5 (.adb)

The file extensions remain unchanged from the previous format. This file format was introduced in Legion (Patch 7.0.1 build 20810) and is still in use today.

The offset_map structure has moved back farther in the file and is now located in between the string table and the non-inline IDs table (this change does not apply to WCH5 - see WCH5-specific Concerns). Additionally, while not a format change, the string table is now always 0 bytes if the offset map is present. In those cases, the 'string_table_size' header value changes its meaning to "offset to the start of the offset map". This is a minor optimization since parsing the file must begin with the offset map.

The header has changed from the original format; it gained one new field ('flags'). This flags field allows for easy detection of the optional structures.

Structure for WDB4

See WCH5-specific Concerns for how to adapt this structure for the .ADB counterpart.

struct wdb4_db2_header
{
  uint32_t magic;                                               // 'WDB4' for .db2 (database), 'WCH5' for .adb (cache)
  uint32_t record_count;
  uint32_t field_count;
  uint32_t record_size;
  uint32_t string_table_size;                                   // string block almost always contains at least one zero-byte
  uint32_t table_hash;
  uint32_t build;
  uint32_t timestamp_last_written;                              // set to time(0); when writing in WowClientDB2_Base::Save()
  uint32_t min_id;
  uint32_t max_id;
  uint32_t locale;                                              // as seen in TextWowEnum
  uint32_t copy_table_size;
  uint32_t flags;                                               // in WDB3/WCH4, this field was in the WoW executable's DBCMeta; possible values are listed in Known Flag Meanings
};

template<typename record_type>
struct wdb4_file
{
  wdb4_db2_header header;
  record_type records[header.record_count]; 
  char string_table[header.string_table_size];
  if (flags & 0x01 != 0)
  {
    struct offset_map_entry
    {
      uint32_t offset;                                           // this offset is absolute, not relative to another structure; this can (and often will) be zero, in which case you should ignore that entry and move on
      uint16_t length;                                           // this is the length of the record located at the specified offset
    };
    offset_map_entry offset_map[header.max_id - header.min_id + 1];
  }
  if (flags & 0x04 != 0)
  {
    uint32_t IDs[header.record_count];
  }
  if (header.copy_table_size > 0)
  {
    struct copy_table_entry
    {
      uint32_t id_of_new_row;
      uint32_t id_of_copied_row;
    };
    copy_table_entry copy_table[header.copy_table_size / copy_table_entry.size()];
  }
};

Note: If offset_map exists, all the strings will be embedded inline in the records (as null-terminated c-strings). The string_table will not exist.

Known Flag Meanings

flags & 0x01 = 'Has offset map'
flags & 0x04 = 'Has non-inline IDs'

WCH5-specific Concerns

The issue with field counts from WCH3 remains in effect (WCH3-specific Concerns), and the offset map changes from WCH4 also remain in effect (WCH4-specific Concerns). Note that the offset map changes from WCH4 (to include an 'ID' field for each entry) override the offset map changes for WDB4 (to move the offset map after the string table), so the offset map's position remains unchanged compared to WCH4. Additionally, WCH5 files are missing two header fields compared to its WDB4 counterpart (the last two fields - copy_table_size and flags).

Since the deduplication technology we call 'copy_table' is not employed in any of the .ADB formats, copy_table_size variable was always zero anyway. As for flags, Blizzard always loads the .DB2 file before loading the .ADB file, so the flags field being in both files is just redundant (for them); any .ADB file will be read using the flags field from the .DB2 file of the same name. This is inconvenient for us, as most (if not all) of us load .ADB files standalone at the moment, but there are workarounds (such as having a configuration file to remember the last used flags value for each named .DB2 file).

Localization

DB2 records can contain localized strings. In contrast to DBCs, a DB2 file only contains localized values for a given locale (header.locale).

Since Cataclysm, all DB files contain only localized string values, including DBCs.