CASC

From wowdev
Revision as of 13:05, 14 October 2018 by Marlamin (talk | contribs) (Move more information to the TACT page)
Missing something? This page is being split up into separate pages. For the content transfer part of NGDP, see TACT. This page should now only contain information on the local filesystem format called CASC.

CASC is the name of the file system that Blizzard created to replace the outdated MPQ format.

CASC v1

The CASC file system made its debut in the Heroes of the Storm Technical Alpha, which was hosted on Blizzard's servers in late January. The form of CASC that Heroes of the Storm uses is designated by Blizzard as "CASC". In contrast, World of Warcraft's "build-playbuild-installer" config line clearly states it is generated by "ngdptool_casc2" (NGDP stands for Next Generation Download Protocol). These are the two most substantial changes between CASC v1 and CASC v2:

  • Sections of CASC v1 data files are grouped together in collections of files we call "packages". All files of a package share the same root folder, and if the files are not all added under the package's base directory, extraction produces an incredibly mangled directory tree. This system is completely removed in CASC v2.
  • CASC v1's Root file relates content hashes to file names. CASC v2's Root file relates content hashes to name hashes. Translating name hashes to file names requires use of the Jenkins Hash function [1], which in turn requires a listfile to generate the hashes. Essentially CASC v1 has its own listfile (in root). CASC v2 does not, and requires the user to provide names.

The remainder of this article will refer exclusively to the system called CASC v2 as 'CASC'. While many parts of the file system are identical between v1 and v2, there are enough changes to make explaining both formats at once inadvisable.

File References

Files are referred to by many different pieces of data in CASC. A quick summary of them:

  • Filename: The file's real name. Note that one file can have many names - essentially, one encoding key can map to many different name hashes.
  • Locale Flag:
  • Content Flag:
  • Name Hash: The file's name, after being hashed with the Jenkins Hash.
  • Content Hash: The MD5 of the entire file in its uncompressed state; the purest representation of the data.
  • Encoding Hash/Key: MD5 hash of the potentially encoded file. For unencoded files, the content hash. For chunkless BLTE files lacking a chunk table, this hash covers the entire encoded file. For chunked BLTE files, this hash covers only the BLTE headers including chunk table, as the chunk table contains hashes of the content of each chunk. A given file can be encoded in many ways, and a single content hash may potentially have multiple encoding keys.
  • CDN Key: The key used to lookup a file on the CDN. Synonym of encoding key.
  • Header Hash: Inaccurate synonym of encoding key.

States of CASC Data

CASC data comes in all forms and sizes.

Key CASC Files

Encoding

The encoding file maps content hashes (C-Keys) to encoded-file hashes (E-Keys). In addition, it records how the files are BLTE-encoded, in the form of encoding specifications (E-Specs).

The blocks in this file are, in this order:

  • header
  • encoding specification data (ESpec)
  • content key → encoding key table (CEKeyPageTable)
  • encoding key → encoding spec table (EKeySpecPageTable)
  • encoding specification data for the encoding file itself

An incomplete/outdated 010 Editor template can be found at this gist which can be used to understand page handling.

Header

The header is a constant 0x16 bytes and mostly gives size information for the other blocks.

struct {
/*0x00*/  char signature[2];                            // "EN"
          enum {
            encoding_version_1 = 1,                     // ≥ WoD (6.0.1.18125)
          };
/*0x02*/  uint8_BE_t version;
/*0x03*/  uint8_BE_t hash_size_ckey;
/*0x04*/  uint8_BE_t hash_size_ekey;
/*0x05*/  uint16_BE_t CEKeyPageTable_page_size_kb;      // in kilobytes, e.g. 4 in here → 4096-byte pages (default)
/*0x07*/  uint16_BE_t EKeySpecPageTable_page_size_kb;   // ^
/*0x09*/  uint32_BE_t CEKeyPageTable_page_count;
/*0x0D*/  uint32_BE_t EKeySpecPageTable_page_count;
/*0x11*/  uint8_BE_t _unknown_x11;                      // 0 -- sometimes assumed to be part of ESpec_block_size, but actually asserted to be zero by agent
/*0x12*/  uint32_BE_t ESpec_block_size;
/*0x16*/
} header;
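The header layout above can be decoded with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name and dictionary keys are made up for illustration, and the derived block offsets assume that the blocks follow each other without gaps in the order listed earlier, with each page table consisting of an index (first key plus 16-byte MD5 per page) followed by its pages.

```python
import struct

# 2s B B B H H I I B I = 0x16 bytes, all big-endian
ENCODING_HEADER = struct.Struct(">2sBBBHHIIBI")


def parse_encoding_header(data: bytes) -> dict:
    """Decode the fixed 0x16-byte header and derive where each block starts."""
    if len(data) < ENCODING_HEADER.size:
        raise ValueError("encoding header needs 0x16 bytes")
    (signature, version, ckey_size, ekey_size, ce_page_kb, espec_page_kb,
     ce_page_count, espec_page_count, unknown_x11,
     espec_block_size) = ENCODING_HEADER.unpack_from(data, 0)
    if signature != b"EN":
        raise ValueError("missing 'EN' signature")

    # Assumed layout: header, ESpec blob, CEKey index, CEKey pages,
    # EKeySpec index, EKeySpec pages.
    espec_offset = ENCODING_HEADER.size
    ce_index_offset = espec_offset + espec_block_size
    ce_pages_offset = ce_index_offset + ce_page_count * (ckey_size + 16)
    ekey_index_offset = ce_pages_offset + ce_page_count * ce_page_kb * 1024
    ekey_pages_offset = ekey_index_offset + espec_page_count * (ekey_size + 16)
    return {
        "version": version,
        "ckey_size": ckey_size,
        "ekey_size": ekey_size,
        "ce_page_size": ce_page_kb * 1024,
        "espec_page_size": espec_page_kb * 1024,
        "ce_page_count": ce_page_count,
        "espec_page_count": espec_page_count,
        "espec_block_size": espec_block_size,
        "espec_offset": espec_offset,
        "ce_index_offset": ce_index_offset,
        "ce_pages_offset": ce_pages_offset,
        "ekey_index_offset": ekey_index_offset,
        "ekey_pages_offset": ekey_pages_offset,
    }
```

Using `struct.Struct` with the `>` prefix keeps the fields unaligned and big-endian, which matches the byte offsets given in the struct comments.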

ESpec

Encoding specification strings are stored as a blob of zero-terminated strings with a total size of header.ESpec_block_size (including the zero terminators). They are referenced by EKeySpecPageTable.

The definition of the format for these strings is described on the BLTE page.
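Splitting the ESpec blob into its strings is straightforward. The sketch below uses a hypothetical helper name and assumes the strings are plain ASCII; the sample strings in the usage note are only illustrative values.

```python
def split_espec_block(block: bytes) -> list:
    """Split the zero-terminated ESpec blob into a list of strings.

    The position of a string in the returned list is the index that
    EKeySpecPageTable entries refer to.
    """
    parts = block.split(b"\x00")
    # Every string is zero-terminated, so the last split piece is empty.
    if parts and parts[-1] == b"":
        parts.pop()
    return [part.decode("ascii") for part in parts]
```

For example, `split_espec_block(b"z\x00n\x00")` yields two strings, and an entry with espec_index 1 would refer to the second one.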

Page Tables

The format of the two page tables is the same:

  • an index for fast key → page access, followed by
  • the actual pages with specific content

In both cases, the entries in the lists have the same count, and semi-dynamic size, depending on header.hash_size_ckey and header.hash_size_ekey. Note that the page checksum size is fixed to MD5's 16 bytes. The position comments below assume the standard key size of 16 bytes.

struct page_index_t {
/*0x00*/  char first_Xkey[header.hash_size_Xkey];       // where X is c for CEKeyPageTable and e for EKeySpecPageTable
/*0x10*/  char page_md5[0x10];
/*0x20 usually*/
};

The pages themselves are filled as much as possible with actual entry structs. They are padded to the end for alignment of pages. Pages don't actually have to be full.

CEKeyPageTable

This table maps one ckey to one or more ekeys, so a file can have multiple representations, e.g. an encrypted and an unencrypted version. This isn't usually the case, and on reading any of them can be picked since they all represent the same file content. The choice can be used to pick an already downloaded archive or an unencrypted version, or to handle deleted archives.

struct ckey_ekey_entry_t {
/*0x00*/  uint8_BE_t keyCount;
/*0x01*/  uint40_BE_t file_size;                        // of the non-encoded version of the file
/*0x06*/  char ckey[header.hash_size_ckey];             // this ckey is represented by…
/*0x16*/  char ekey[keyCount][header.hash_size_ekey];   // …these ekeys (keyCount keys of hash_size_ekey bytes each)
/*0x26 usually*/
} page_entries[];

EKeySpecPageTable

This table maps one ekey to the corresponding espec describing how the encoding happened.

struct ekey_espec_entry_t {
/*0x00*/  char ekey[header.hash_size_ekey];
/*0x10*/  uint32_BE_t espec_index;                      // not an offset but an index, assuming zero-terminated espec strings
/*0x14*/  uint40_BE_t file_size;                        // of the encoded version of the file
/*0x19 usually*/
} page_entries[];
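Both page types can be walked with simple loops. The following is a sketch under assumptions that the text does not confirm: padding is taken to be zero bytes, so a keyCount of 0 (CEKeyPageTable) or an all-zero ekey (EKeySpecPageTable) is treated as the end of a page. Function names are hypothetical.

```python
def parse_ckey_page(page: bytes, ckey_size: int = 16, ekey_size: int = 16) -> list:
    """Return (ckey, decoded_size, [ekeys]) tuples from one CEKeyPageTable page."""
    entries = []
    pos = 0
    while pos + 6 + ckey_size <= len(page):
        key_count = page[pos]
        if key_count == 0:  # assumed: zero padding marks the end of the page
            break
        file_size = int.from_bytes(page[pos + 1:pos + 6], "big")  # uint40_BE
        pos += 6
        ckey = page[pos:pos + ckey_size]
        pos += ckey_size
        ekeys = [page[pos + i * ekey_size:pos + (i + 1) * ekey_size]
                 for i in range(key_count)]
        pos += key_count * ekey_size
        entries.append((ckey, file_size, ekeys))
    return entries


def parse_ekey_spec_page(page: bytes, ekey_size: int = 16) -> list:
    """Return (ekey, espec_index, encoded_size) tuples from one EKeySpecPageTable page."""
    entry_size = ekey_size + 4 + 5
    entries = []
    for pos in range(0, len(page) - entry_size + 1, entry_size):
        ekey = page[pos:pos + ekey_size]
        if ekey == bytes(ekey_size):  # assumed: all-zero key means padding
            break
        espec_index = int.from_bytes(page[pos + ekey_size:pos + ekey_size + 4], "big")
        encoded_size = int.from_bytes(page[pos + ekey_size + 4:pos + entry_size], "big")
        entries.append((ekey, espec_index, encoded_size))
    return entries
```

The CEKeyPageTable entries are variable-length because of keyCount, which is why that loop advances by the size of each decoded entry rather than by a fixed stride.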

Install

File signature: "IN"

The install file lists files installed on disk. Since the install file is shared by architectures and OSs, there are also tags to select a subset of files. When using multiple tags, a binary combination of the bitfields of files to be installed can be created.

Header Structure

The file begins with a 10 byte header describing the number of tags and files listed. Structure names were invented by the author of this page.

Offset (Hex) Type Name Description
0x00 char[2] FileSignature "IN"
0x02 uint8_t Version? 1
0x03 uint8_t hash_size size of hashes used for files (usually md5 -> 16)
0x04 uint16_BE_t num_tags number of tags in header of file
0x06 uint32_BE_t num_entries The number of entries in the body of the file

Tags Structure

After the header, an array with information about available tags follows. Each tag has a bitfield listing the files installed when the given tag is chosen.

Type Name Description
char[] name
uint16_BE_t type A number shared amongst specific flags. Actual meaning is specific to products.
char[divru (header.num_entries, CHAR_BIT)] files A bitfield that lists which files are installed when the specified tag is installed. divru is division rounding up.

Files Structure

The remainder of the file is populated by a list of files with their content hash, each a variable size (due to the strings). Structure names were invented by the author of this page.

Type Name Description
char[] FileName The name of the file.
char[header.hash_size] hash The hash of the uncompressed file. Usually MD5.
uint32_BE_t Size The size of the file.

C-like structure

char I; char N;
uint8_BE_t _unk3;
uint8_BE_t hash_size;
uint16_BE_t num_tags;
uint32_BE_t num_files;

struct {
  string name;
  uint16_BE_t type;
  char flags[divru (num_files, CHAR_BIT)];
} tags[num_tags];

struct {
  string name;
  char hash[hash_size];
  uint32_BE_t size;
} files[num_files];
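The C-like structure above translates fairly directly into a small parser. This is an illustrative sketch: the function names are invented, file names are assumed to be UTF-8, and the result is returned as plain tuples.

```python
import struct


def read_cstring(data: bytes, pos: int):
    """Read a zero-terminated string; return it and the position after it."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1


def parse_install(data: bytes):
    """Return (tags, files) from an install file."""
    signature, _version, hash_size, num_tags, num_files = struct.unpack_from(
        ">2sBBHI", data, 0)
    if signature != b"IN":
        raise ValueError("missing 'IN' signature")
    pos = 10  # size of the header
    mask_len = (num_files + 7) // 8  # divru (num_files, CHAR_BIT)

    tags = []
    for _ in range(num_tags):
        name, pos = read_cstring(data, pos)
        (tag_type,) = struct.unpack_from(">H", data, pos)
        pos += 2
        tags.append((name, tag_type, data[pos:pos + mask_len]))
        pos += mask_len

    files = []
    for _ in range(num_files):
        name, pos = read_cstring(data, pos)
        content_hash = data[pos:pos + hash_size]
        pos += hash_size
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        files.append((name, content_hash, size))
    return tags, files
```

Because both arrays contain zero-terminated strings, neither can be indexed directly; the file has to be read sequentially from the header onwards.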

Download

The download file lists all files stored in the data archives. The client uses it to download files ahead of time; without it, the client downloads on demand, which can lead to issues. The download priority is set inside the entries, with 0 being the highest and 2 the lowest. However, if the game is running, missing assets in the player's vicinity take precedence.

Just like the install file, the download file is shared across all architectures and locales, so it uses the same bitfield-tag system to determine which subset of files is needed.

NOTE: partial-priority download files do not contain the actual file sizes but redefine the FileSize field to ChunkSize.

This file has this structure:

  • Header
  • Entries[Header.EntryCount]
  • Tags[Header.TagCount]

Download Header

Type Name Description
char[2] Signature The signature for this file (always "DL")
char Version 1 before 7.3.0, 2 since 7.3.0
char ChecksumSize Always 0x10
char unk ??? Always 1
int [BE] EntryCount The amount of file entries in this file
short [BE] TagCount The amount of tag entries in this file

Download Entry

Type Name Version Added Description
char Unk 2 ??? Appears to be a boolean. Currently only set to 1 on 4 specific records
char[16] Hash 1 This hash is found in every node of the encoding file. (Reverse lookup) MD5
uint40_t [BE] FileSize 1 The compressed size of the file
char DownloadPriority 1 0 = highest, 2 = lowest
char[4] Unk 1 ???

Download Tag

Type Name Description
string Name A C-String indicating this tag's Name.
short [BE] Type Hash type
char[N] Bits A bitfield of N = Header.EntryCount / 8 + (Header.EntryCount % 8 > 0 ? 1 : 0) bytes with one bit per entry. The bit order within each byte is awkward to work with; apply Schroeppel's 8-bit reverse function to each byte to get the bits in the expected order.
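A minimal sketch of working with the tag bitfield follows. It assumes that bit 0 of the bitfield is the most significant bit of the first byte, which is what the bit-reversal hint suggests but is not stated outright. `reverse_byte` is the well-known multiply-and-modulus byte reversal attributed to Rich Schroeppel; the other names are hypothetical.

```python
def mask_length(entry_count: int) -> int:
    """Number of mask bytes per tag: entry_count / 8, rounded up."""
    return (entry_count + 7) // 8


def reverse_byte(value: int) -> int:
    """Reverse the bit order of one byte (Schroeppel's 64-bit multiply trick)."""
    return ((value * 0x0202020202) & 0x010884422010) % 1023


def entry_in_tag(mask: bytes, index: int) -> bool:
    """Test entry `index`, assuming most-significant-bit-first order per byte."""
    return bool(mask[index // 8] & (0x80 >> (index % 8)))


def entry_in_tag_reversed(mask: bytes, index: int) -> bool:
    """Same test, done by reversing the byte first and then reading LSB-first."""
    return bool((reverse_byte(mask[index // 8]) >> (index % 8)) & 1)
```

Both lookup functions give the same answer; reversing every byte up front is mainly useful when handing the mask to a bit-array type that expects least-significant-bit-first order.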

code-ish

struct {
/*0x00*/  char signature[2];                                // "DL"
          enum {
            download_version_1 = 1,
            download_version_2 = 2,                         // ≥ Legion (7.3.0.???)
            download_version_3 = 3,
          };
/*0x02*/  uint8_BE_t version;
/*0x03*/  uint8_BE_t hash_size_ekey;
/*0x04*/  uint8_BE_t has_checksum_in_entry;
/*0x05*/  uint32_BE_t entry_count;
/*0x09*/  uint16_BE_t tag_count;
/*0x0b*/

#if version ≥ download_version_2
    /*0x0b*/  uint8_BE_t number_of_flag_bytes;             // defaults to 0, up to 4
    /*0x0c*/

   #if version >= download_version_3
       /*0x0c*/  uint8_BE_t base_priority;                 // defaults to 0
       /*0x0d*/  char _unknown_0d[3];                       // As of 1.15.6.2-test4, this is explicitly 0. It is ignored on reading.
       /*0x10*/
   #endif
#endif
} header;

struct {
/*0x00*/  char ekey[header.hash_size_ekey];
/*0x10*/  uint40_BE_t file_size;
/*0x15*/  uint8_BE_t priority;                              // header.base_priority is subtracted on parse
/*0x16*/

#if header.has_checksum_in_entry
/*0x16*/  uint32_BE_t checksum;
/*0x1a*/
#endif

#if header.version ≥ download_version_2
          enum {
            download_flag_plugin = 1,                       // "plugin"
            download_flag_plugin_data = 2,                  // "plugin-data"
          }; 
          uint8_BE_t flags[header.number_of_flag_bytes];    // defaults to 0 if no flag bytes present
#endif

} entries[header.entry_count];

struct {
         char const name[];                                 // this string is zero terminated, no fixed size
                                                            // thus for readability we start offset at 0 here.
/*0+00*/ uint16_BE_t type;                                  // game specific. usually architecture, category, locale, os, region or alike.
/*0+02*/ char mask[divru (header.entry_count, CHAR_BIT)];   // if bit is set, entries[bit] is part of this tag
/*0x??*/
} tag[header.tag_count];
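The version-dependent header above can be read as follows. This is a sketch with invented function names and dictionary keys; it treats base_priority as an unsigned byte and returns only the key, size and adjusted priority of an entry.

```python
import struct


def parse_download_header(data: bytes) -> dict:
    """Decode the download header, honouring the version-dependent tail."""
    (signature, version, ekey_size, has_checksum,
     entry_count, tag_count) = struct.unpack_from(">2sBBBIH", data, 0)
    if signature != b"DL":
        raise ValueError("missing 'DL' signature")
    header = {
        "version": version, "ekey_size": ekey_size,
        "has_checksum": bool(has_checksum),
        "entry_count": entry_count, "tag_count": tag_count,
        "flag_bytes": 0, "base_priority": 0, "size": 0x0B,
    }
    if version >= 2:
        header["flag_bytes"] = data[0x0B]
        header["size"] = 0x0C
    if version >= 3:
        header["base_priority"] = data[0x0C]
        header["size"] = 0x10  # includes the three ignored bytes at 0x0d
    # ekey + uint40 size + priority + optional checksum + optional flags
    header["entry_size"] = (ekey_size + 5 + 1
                            + (4 if has_checksum else 0) + header["flag_bytes"])
    return header


def parse_download_entry(data: bytes, pos: int, header: dict):
    """Return (ekey, file_size, priority) for the entry starting at pos."""
    k = header["ekey_size"]
    ekey = data[pos:pos + k]
    file_size = int.from_bytes(data[pos + k:pos + k + 5], "big")
    priority = data[pos + k + 5] - header["base_priority"]
    return ekey, file_size, priority
```

Entry `i` starts at `header["size"] + i * header["entry_size"]`, and the tags follow after the last entry.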

Download Size

Build 27547 introduced the Download Size file. It is a stripped-down Download file with partial EKeys and with files sorted by encoded file size. The purpose of this file is not clear.

struct SizeHeader
{
	char signature[2]; // "DS"
	uint8_t version;
	uint8_t ekeySize; // 9
	uint32_BE_t numFiles;
	uint16_BE_t numTags;
	uint40_BE_t totalSize; // Size of all files combined
};

struct TagEntry
{
	char name[]; // Null-terminated
	uint16_BE_t type;
	char fileMask[(hdr.numFiles + 7) / 8];
};

struct FileEntry
{
	char ekey[hdr.ekeySize];
	uint32_BE_t esize;
};

SizeHeader hdr;
TagEntry tags[hdr.numTags];
FileEntry files[hdr.numFiles]; // Sorted descending by esize

Patch

Type Name Description
char[2] Signature The signature for this file (always "PA")
char version 1 or 2
char file_key_size <= 0x10
char size_b <= 0x10
char patch_key_size <= 0x10
char block_size_bits 12 <= block_size_bits <= 24. block size == 2^block_size_bits.
short block_count
char flags
char[16] EncodingCkey ckey for encoding file
char[16] EncodingEkey ekey for encoding file
int DecodedSize Decoded encoding file size in bytes
int EncodedSize Encoded encoding file size in bytes
char EspecLength Length of the following string
char[EspecLength] EncodingEspec espec of encoding file
char[] ??? byte array containing blocks entries, blocks, and optional patch tail


The header plus entries must be smaller than 0x10000 bytes (at least in wow-18179). The md5sum is only checked for header plus entries, so the file itself may be larger.

struct PatchManifest_Header
{
  uint16_BE_t magic; // 'PA'
  uint8_t version; // 1 or 2
  uint8_t file_key_size; // <= 0x10
  uint8_t size_b; // <= 0x10
  uint8_t patch_key_size; // <= 0x10
  uint8_t block_size_bits; // 12 <= block_size_bits <= 24. max block size == 2^block_size_bits
  uint16_BE_t block_count; // (file_key_size + 20) * block_count + sizeof (PatchManifest_Header) < 0x10000
  uint8_t unk2; // flags

#if encoding_information_apparently_added_after_18179
  uint8_t encoding_ckey[16];
  uint8_t encoding_ekey[16]; // probably since PA2
  uint32_BE_t decoded_size;
  uint32_BE_t encoded_size;
  uint8_t encoding_espec_length;
  char encoding_format[encoding_espec_length];
#endif
 } header;

struct PatchManifest_Block
{
  uint8_t last_file_ckey[header.file_key_size];
  uint8_t md5_of_block[16];
  uint32_BE_t block_offset; // in this file
} blocks[header.block_count]; // sorted ascending by key

// at positions given in PatchManifest_Block
struct block
{
  struct
  {
    uint8_t num_patches; // <= 0x10.
    uint8_t target_file_ckey[header.file_key_size];
    uint40_BE_t decoded_size;
    struct
    {
      uint8_t source_file_ekey[header.file_key_size];
      uint40_BE_t decoded_size;
      uint8_t patch_ekey[header.patch_key_size];
      uint32_BE_t patch_size;
      uint8_t unk; // some sort of patch index number. first entry seems to always be 1
    } patches[num_patches];
  } files[]; // count unspecified: read until the next file num_patches would be 0 
               // OR block would exceed max block size
};

// some files have a block of data after the last block of the patch manifest (which may be shorter than max block size). this block appears to be a patch of encoding, but the format is not understood.

CDN File Organization

Data for every CASC-based game exists on the CDN in one of two places at any given time. To reduce file-system and download overhead many files are packed into archives and indexed by archive indices, both of a different format than the CASC installations found on client systems; other "unarchived", "standalone", or "loose" files are stored as separate files on the CDN that must be downloaded independently. There are at least three different reasons why a particular file is found in one or the other:

  1. Small files typically incur larger filesystem overhead and benefit most from being packed into archives. A rough rule of thumb appears to be that files smaller than 2 MB or so are put in archives. Presumably larger files are not archived because they make it more difficult to minimize the number of archive files, which are limited in size (a 2 MB file limit would limit unused space in an archive to under 0.8%, given enough data to form a full archive to begin with).
  2. Key files such as encoding, partial-priority, TVFS (Warcraft 3 root), and the download, install, and patch manifests, as well as their respective patches, are typically stored loose for quick access and are rarely ever found in archives.
  3. As games evolve old files become obsolete and are removed from the CDN. However, the archive system means that archives can only be removed when every file in the archive is no longer needed, potentially wasting large amounts of space on the CDN - the exact opposite of the purpose of bundling files into archives to begin with. Thus as the amount of unused data in an archive grows over time, files still in use may be converted to loose files to allow the archive to be purged from the CDN, even when the files are unusually small to be found independently.
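The 0.8% figure in the first point is simple arithmetic, sketched here with the rough numbers from the text (256 MB archives and a 2 MB threshold, neither of which is an exact documented constant):

```python
ARCHIVE_SIZE = 256 * 1024 * 1024   # archives are 256 MB
LOOSE_THRESHOLD = 2 * 1024 * 1024  # rough rule-of-thumb size limit for archived files


def worst_case_unused_fraction(archive_size: int = ARCHIVE_SIZE,
                               max_file: int = LOOSE_THRESHOLD) -> float:
    """Upper bound on wasted space in an otherwise full archive.

    If every archived file is smaller than max_file, packing can always
    continue until fewer than max_file bytes remain free.
    """
    return max_file / archive_size


print(f"{worst_case_unused_fraction():.2%}")  # prints 0.78%
```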

Previously, no official indices/manifests of loose files existed, and they could only be found by subtracting archived files from file lists in encoding or manifest files. Beginning with Warcraft 3 and subsequently being deployed more widely on 7/24/2018, new fields in the CDN config file link to index files containing "all" loose data or patch files on the current CDNs.

Archives

Archives are extensionless 256 MB files that are usually only stored on the Blizzard CDNs. Their naming follows the standard URL hash format using the '/data/' path type.

The structure of the archives is presumably just file fragment after file fragment. You will never need to parse it because you can just look up offset + size of your file fragment in the index files and then take the piece directly out of the archive.

The fragments are all BLTE encoded.

The filename is NOT the hash of the archive content but the hash of the index's footer.

Archive Indexes (.index)

These '.index' files reveal to the user where the compressed game files are located within the archives. All indexes (except the Archive-Group index, see below) are named after their archive (only difference is these have an extension). '.index' files are stored on the CDN using the standard hash naming scheme (remember they have an extension though). They are also located in the directory 'INSTALL_DIR/Data/indices/' for a WoW install.

Normal Index Entry Structure

  • The file is divided into 4 KB chunks populated by these standard index entries of 0x18 (hex) bytes. Each chunk is zero-padded to a full 4 KB, though there may be more than 0x18 bytes of padding at the end of a chunk, so be sure to check for all-null blte_header_hash fields. The last chunk is a table of contents, listing the LAST blte_header_hash in each chunk and the checksum of each block. All checksums are the first checksumSize bytes of the MD5 of the respective data. Structure names were invented by the author of this page.
struct index_entry {
  char blte_header_hash[footer.keySizeInBytes];
  uint_BE_t(footer.sizeBytes) blte_encoded_size;
  uint_BE_t(footer.offsetBytes) offset_to_blte_encoded_data_in_archive;
};

struct index_block {
  static constexpr size_t block_size = footer.blockSizeKb << 10;
  index_entry entries[block_size / sizeof (index_entry)];
  char padding[block_size - sizeof (entries)];
} blocks[];

struct {
  struct {
    char last_hash[footer.keySizeInBytes]; // last hash of a block
  } entries[num_blocks];

  struct {
    char lower_part_of_md5_of_block[footer.checksumSize]; 
  } blocks_hash[num_blocks];
} toc;

struct {
  char toc_hash[checksumSize]; // client tries to read with 0x10, then backs off when smaller
  char version?;        // always 1
  char _11;             // 0
  char _12;             // 0
  char blockSizeKb?;    // Normally 4. Left-shifted by 10. Believed to be block size in KB.
  char offsetBytes;     // Normally 4 for archive indices, 6 for group indices, and 0 for loose file indices
  char sizeBytes;       // Normally 4
  char keySizeInBytes;  // Normally 16
  char checksumSize;    // Normally 8, <= 0x10
  uint32_t numElements; // BigEndian in _old_ versions (e.g. 18179)
  char footerChecksum[checksumSize];
} footer;
  • footerChecksum is calculated over the footer beginning with version when footerChecksum is zeroed
  • The archive/index name is the MD5 of the footer beginning with toc_hash
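Since the footer sits at the end of the file and its length depends on checksumSize, one way to read it is to try each possible checksum size and keep the one whose footerChecksum verifies. The sketch below does that; it assumes the newer little-endian numElements and uses invented names throughout.

```python
import hashlib
import struct


def parse_index_footer(data: bytes) -> dict:
    """Locate and decode the footer at the end of an .index file."""
    for cs in range(16, 0, -1):          # candidate checksumSize values
        size = 2 * cs + 12               # toc_hash + 8 bytes + numElements + checksum
        if len(data) < size:
            continue
        footer = data[-size:]
        fields = footer[cs:cs + 8]       # version .. checksumSize
        if fields[7] != cs:
            continue
        # footerChecksum covers the footer from `version` on, with the
        # checksum itself zeroed, truncated to checksumSize bytes.
        body = footer[cs:cs + 12]
        expected = hashlib.md5(body + bytes(cs)).digest()[:cs]
        if footer[cs + 12:] != expected:
            continue
        version, _, _, block_kb, offset_bytes, size_bytes, key_bytes, _ = fields
        (num_elements,) = struct.unpack("<I", footer[cs + 8:cs + 12])
        return {
            "version": version,
            "block_size": block_kb << 10,
            "offset_bytes": offset_bytes,
            "size_bytes": size_bytes,
            "key_bytes": key_bytes,
            "checksum_size": cs,
            "num_elements": num_elements,
            "toc_hash": footer[:cs],
            # archive/index name: MD5 of the footer beginning with toc_hash
            "name": hashlib.md5(footer).hexdigest(),
        }
    raise ValueError("no valid index footer found")
```

Comparing the returned name with the file name on disk or on the CDN is a cheap sanity check before trusting the rest of the index.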

archive-group

archive-group is actually a very special .index file that is typically much larger than other indices. There is no archive attached; it is only an index. The index is also not found on the CDN: the client assembles it by combining all index files given in the CDN config. The hash is just for verification purposes. The client writes the assembled file in the normal index file format, for caching.

It has a single difference in format from normal indices: while the offsetBytes-long offset field of other indices points into the archive, for archive groups the field also has an archiveIndex:

struct {
  uint16_BE_t archiveIndex; // Index of the archive in the CDN config's archive list
  uint32_BE_t offsetBytes;  // The offset within the specified archive
};

The field thus holds not only the offset but also an index selecting the archive. Semantically that's still an offset, so no further field for size is used.

It is suggested you do not just parse indices by .index filename locally but take the config files into account. An easy heuristic is that if offsetBytes is not 4, it is a special index, either loose files or a group.
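As a small illustration of the group offset field and of the offsetBytes heuristic, assuming the usual widths mentioned in the footer comments (4 for archive indices, 6 for group indices, 0 for loose file indices) and using hypothetical function names:

```python
def split_group_offset(field: bytes):
    """Split a 6-byte archive-group offset into (archive_index, offset)."""
    if len(field) != 6:
        raise ValueError("archive-group offsets are expected to be 6 bytes")
    archive_index = int.from_bytes(field[:2], "big")  # into the CDN config's archive list
    offset = int.from_bytes(field[2:], "big")         # within that archive
    return archive_index, offset


def index_kind(offset_bytes: int) -> str:
    """Classify an index by footer.offsetBytes (a heuristic, not a rule)."""
    return {4: "archive", 6: "group", 0: "loose"}.get(offset_bytes, "unknown")
```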

patch-archives

Patch archives are the /patch/ equivalent to the regular (data) archives on the CDN. Like archives, these are binary blobs of fragments indexed by an accompanying .index file with the same name. Again, the index is a hash, size, offset tuple, but the hash is the content hash rather than an encoding hash.

Most files in patch archives are ZBSDIFF1 blobs, though in principle any file that might be in the /patch/ namespace may be found in patch archives and must be handled accordingly.

patch-archive-group

See archive-group. There is no known difference other than the combined data being patch-archives.

Journal-based Data Files

During the installation process for a Blizzard game, the program will download the required files as requested by root, encoding, download, and install. It stores the downloaded data fragments in data files in "INSTALL_DIR\Data\data\". The program will record the content hash (BLTE-compressed hash), size, and position of the file as well as the number of the data file that it is in. It places those four parameters into journal files with the extension '.idx'.

Shared Memory

The shared memory file is called 'shmem' and is usually located in the same folder as the data and .IDX journals. This file contains the path where the data files are stored, the current version of each of the .IDX files, and which areas of the data files have unused space. The file is recreated every time a client is started.

Shared Memory Header Structure

  • The first part of the header.
Offset (Hex) Type Name Description
0x00 uint32_t BlockType A value indicating what type of block this is. For this block, the value is 4.
0x04 uint32_t NextBlock The offset of the next block.
0x08 char[0x100] DataPath The path of the data files. This is prefixed with "Global\" if the path is an absolute path.


  • Followed by a number of these entries. The count can be calculated like this: (NextBlock - 264 - idxFileCount * 4) / 8
Offset (Hex) Type Name Description
0x00 uint32_t Size The size of the block.
0x04 uint32_t Offset The offset of the block.


  • Followed by a number of these entries. The count is equal to number of .IDX files (usually 16).
Offset (Hex) Type Name Description
0x00 uint32_t Version The version number. Used to identify the .IDX filename.
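The entry-count formula above follows from the sizes of the surrounding fields: 264 is the 8 bytes of BlockType and NextBlock plus the 0x100-byte DataPath. A tiny sketch, with an invented function name:

```python
def shmem_block_entry_count(next_block: int, idx_file_count: int = 16) -> int:
    """Number of (Size, Offset) entries in the shared memory header block."""
    fixed = 8 + 0x100                       # BlockType + NextBlock + DataPath = 264
    remaining = next_block - fixed - idx_file_count * 4
    if remaining < 0 or remaining % 8:
        raise ValueError("NextBlock does not match the expected layout")
    return remaining // 8                   # each entry is two uint32_t values
```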


Shared Memory Free Space Structure

After a small header, this structure is split up into two equal parts. The first part contains entries with the number of unused bytes. The second part contains entries with the position of the unused bytes.

There can be up to 1090 entries. Each of the two parts will always be 5450 bytes, so if there are fewer than 1090 entries, the rest of the bytes will be padded with '\0'.

  • The header part of the structure.
Offset (Hex) Type Name Description
0x00 uint32_t BlockType A value indicating what type of block this is. For this block, the value is 1.
0x04 uint32_t NextBlock The offset of the next block.
0x08 char[0x18] Padding Padding at the end of the header.


  • This is the number of unused bytes. There can be up to 1090 entries of these. If there are fewer, the rest of the area is padded.
Offset (Hex) Type Name Description
0x00 uint10* DataNumber This is always set to 0 in this part of the block.
0x01 uint30* Count The number of unused bytes.


  • This is the position of the unused bytes. There can be up to 1090 entries of these. If there are fewer, the rest of the area is padded.
Offset (Hex) Type Name Description
0x00 uint10* DataNumber The number of the data file where the unused bytes are located.
0x01 uint30* Offset The position within the data file where the unused bytes are located.

.IDX Journals

Example file path: INSTALL_DIR\Data\data\0e00000054.idx

.IDX journals contain a mapping from keys to the location of their data in the local CASC archives. There used to be one .IDX file per journal, and the naming scheme used to have two separate meanings. The '0e' part of the file name used to designate which archive the .IDX file was associated with. This changed halfway through the Warlords Beta. Now there are 16 indices total, and the first byte of the hex filename says which of the 16 indices it is, while the remainder of the hex filename is just a version number that increments when a new set of files is added to the local archives.

To determine which of the 16 indices a key is bucketed in, the key is hashed by xoring together each 4-bit nibble in the first 9 bytes of the key:

 uint8_t cascGetBucketIndex(const uint8_t k[16]) {
   uint8_t i = k[0] ^ k[1] ^ k[2] ^ k[3] ^ k[4] ^ k[5] ^ k[6] ^ k[7] ^ k[8];
   return (i & 0xf) ^ (i >> 4);
 }


.IDX Header Structure

The header is little-endian:

Offset (Hex) Type Name Description
0x00 uint32 HeaderHashSize The number of bytes to use for the hash at +04; usually 0x10.
0x04 uint32 HeaderHash This should equal the value of pc after calling hashlittle2 on the following HeaderHashSize bytes of the file with an initial value of 0 for pb and pc.
0x08 uint16 Unk0 Must be 7
0x0a uint8 BucketIndex The bucket index of this file; should be the same as the first byte of the hex filename.
0x0b uint8 Unk1 Must be 0
0x0c uint8 EntrySizeBytes Must be 4
0x0d uint8 EntryOffsetBytes Must be 5
0x0e uint8 EntryKeyBytes Must be 9
0x0f uint8 ArchiveFileHeaderBytes Must be 30
0x10 uint64 ArchiveTotalSizeMaximum The maximum size of a casc installation; 0x4000000000, or 256GiB.
0x18 char[8] padding The header is padded with zeroes to the next 0x10-byte boundary.
0x20 uint32 EntriesSize This is the length in bytes of the entries in the index file.
0x24 uint32 EntriesHash This should equal the value of pc after calling hashlittle2 on the following EntriesSize bytes of the file with an initial value of 0 for pb and pc.

.IDX Entry Structure

  • The rest of the file is populated by these normal entries, each 0x12 bytes in size. Structure names were invented by the author of this section because official names were not available.
Offset (Dec) Type Name Description
00 char[9] Key The first 9 bytes of the key for this entry.
09 uint40* Offset Unlike the other little-endian integers in this file, this is a big-endian 5-byte integer. The top 10 bits are the number of the archive (data.%03d), and the bottom 30 bits are the offset in that archive to the file data.
14 uint32 Size The length of the file in bytes.
  • * designates unusual data types. In C#, you can read the Offset by reading a Byte, reading a big-endian UInt32, shifting the byte left 32 bits, and ORing them together. Use a 30-bit mask (0x3fffffff) to get the file offset, and right shift the value 30 bits to get the archive number.
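The same decoding can be sketched in Python. The helper names are hypothetical, and the file name helpers simply follow the example paths shown on this page (two hex digits of bucket plus eight hex digits of version, and data.%03d).

```python
def parse_idx_entry(entry: bytes):
    """Decode one 0x12-byte .idx entry into (key, archive, offset, size)."""
    if len(entry) != 0x12:
        raise ValueError(".idx entries are 0x12 bytes long")
    key = entry[:9]
    packed = int.from_bytes(entry[9:14], "big")    # 40-bit big-endian value
    archive = packed >> 30                         # top 10 bits
    offset = packed & 0x3FFFFFFF                   # bottom 30 bits
    size = int.from_bytes(entry[14:18], "little")  # little-endian like the header
    return key, archive, offset, size


def data_filename(archive: int) -> str:
    return f"data.{archive:03d}"


def idx_filename(bucket: int, version: int) -> str:
    return f"{bucket:02x}{version:08x}.idx"
```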

.XXX Data Files

Example file path: INSTALL_DIR\Data\data\data.015

These files consist of a sequence of headers with corresponding BLTE data.

Most .xxx archives begin with 16 special index cross-linking files. These files have no data and have encoding keys of XXYYbba1af16c50e1900000000000000, where XX is the index number and YY is the .xxx number. The purpose of these files is unclear.

  • The data header.
Offset (Hex) Type Name Description
0x00 char[0x10] BlteHash Encoding key of the file, in reversed byte order. Note that only as many bytes (final bytes in this reversed order) of this key as are contained in the .idx files (9) must be accurate, and the remaining 7 bytes may be 0s or otherwise altered.
0x10 uint32_t Size The size of this header + the following data.
0x14 char[0x02] Flags?? Unknown. Mostly 0. Set to 1,0 by Agent.exe on index cross-linking files, possibly indicating data-less metadata files.
0x16 uint32_t ChecksumA hashlittle(first 0x16 bytes of the header, 0x3D6BE971)
0x1A uint32_t ChecksumB Checksum of the first 0x1A bytes of the header. The exact algorithm seems to vary over time.


  • The BLTE data.
Offset (Hex) Type Name Description
0x00 char[Header.Size - 30] Data The BLTE file data. See the BLTE page.
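A sketch of reading the 30-byte data header follows. It assumes the integer fields are little-endian, which the tables above do not state explicitly, and it does not verify either checksum; all names are illustrative.

```python
import struct

DATA_HEADER = struct.Struct("<16sI2sII")  # 30 bytes, no alignment padding


def parse_data_header(header: bytes) -> dict:
    """Decode the header that precedes each BLTE blob in a data.xxx file."""
    if len(header) != DATA_HEADER.size:
        raise ValueError("data headers are 30 bytes long")
    reversed_key, size, flags, checksum_a, checksum_b = DATA_HEADER.unpack(header)
    ekey = reversed_key[::-1]              # stored in reversed byte order
    return {
        "ekey": ekey,
        "idx_key": ekey[:9],               # the part that .idx entries store
        "blte_size": size - DATA_HEADER.size,
        "flags": flags,
        "checksum_a": checksum_a,
        "checksum_b": checksum_b,
    }
```

Because Size covers the header as well, subtracting 30 gives the length of the BLTE data that follows.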

Product Specific

This section describes the game- and usage-specific parts of CASC. While CASC is a generic format, a lot of it is hardcoded, such as hash sizes. Other parts are left up to the implementation, such as root files or download tags.

World of Warcraft

Root

struct CASRecord {
    char content_key[16];  // MD5 hash of the file's raw data
    uint64 name_hash;      // Jenkins96 (lookup3) hash of the file's path 
};

enum locale_flags : uint32 {
  enUS = 0x2,
  koKR = 0x4,
  frFR = 0x10,
  deDE = 0x20,
  zhCN = 0x40,
  esES = 0x80,
  zhTW = 0x100,
  enGB = 0x200,
  enCN = 0x400,
  enTW = 0x800,
  esMX = 0x1000,
  ruRU = 0x2000,
  ptBR = 0x4000,
  itIT = 0x8000,
  ptPT = 0x10000,
};
enum content_flags : uint32 {
  LowViolence = 0x80,
  Bundle = 0x40000000,
  NoCompression = 0x80000000,
};

struct CASBlock {
    int32 num_records;
    content_flags flags;
    locale_flags locale;
    int32 fileDataIDDeltas[num_records];   // each block starts with 0, +1 is implicit per entry, so consecutive ids will have delta=0
    CASRecord records[num_records];

    int32 file_data_id (size_t index) const
    {
      return index == 0
        ? fileDataIDDeltas[index]
        : file_data_id (index - 1) + 1 + fileDataIDDeltas[index];
    }
};

while (FTell() < FileSize())
    CASBlock blocks;
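The block layout and the delta-encoded file data IDs can be decoded as in the sketch below. It assumes little-endian integers, which the template above does not state, and it returns flat tuples under invented names.

```python
import struct


def parse_root(data: bytes) -> list:
    """Return (file_data_id, content_key, name_hash, content_flags, locale_flags)."""
    records = []
    pos = 0
    while pos < len(data):
        num_records, content_flags, locale_flags = struct.unpack_from("<iII", data, pos)
        pos += 12
        deltas = struct.unpack_from(f"<{num_records}i", data, pos)
        pos += 4 * num_records
        file_data_id = 0
        for index, delta in enumerate(deltas):
            # The first delta is the starting id; afterwards +1 is implicit.
            file_data_id = delta if index == 0 else file_data_id + 1 + delta
            content_key, name_hash = struct.unpack_from("<16sQ", data, pos)
            pos += 24
            records.append((file_data_id, content_key, name_hash,
                            content_flags, locale_flags))
    return records
```

Tracking the running id in a loop avoids the recursion of the file_data_id helper in the template while producing the same values.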

Tags

The values depend on the version; the semantic categories are the same across versions.

  • Platform: The deployment target, i.e. Windows or OSX
  • Architecture: Sub-division of the deployment target, i.e. x86_32 or x86_64
  • Locale: The same as in Localization: Files specific to a single localisation of the game.
  • Region: Equivalent to the patch server regions, i.e. us, eu, kr, tw, cn.
  • Category: A replacement for the MPQ system to tag low priority downloads: speech, text
  • Alternate: A special category for censored content.

Version specific values

This section only applies to versions WoD (6.0.1.18125) … WoD (6.0.1.18761).

Architecture = 1, Locale = 2, Platform = 3

This section only applies to versions WoD (6.0.1.18764) … WoD (6.2.2.20426).

Architecture = 1, Category = 2, Locale = 3, Platform = 4, Region = 5

This section only applies to versions WoD (6.2.2.20438) … Legion.

Platform = 1, Architecture = 2, Locale = 3, Region = 4, Category = 5

This section only applies to versions ≥ Battle (8.0.1.26604). The alternate value was probably just missing from the ranges above, but since when it exists has not been verified.
enum {
  platform = 1,
  architecture = 2,
  locale = 3,
  region = 4,
  category = 5,
  alternate = 0x4000,
};

hashpath

hashpath (string path) → uint32_t
{
  string normalized = toupper (path).replace (from: '/', to: '\\')
  uint32_t pc = 0, pb = 0;
  hashlittle2 (normalized, strlen (normalized), &pc, &pb);
  return pc;
}
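For experimentation, the pseudocode can be made runnable with a pure-Python port of hashlittle2 from Bob Jenkins' lookup3. This is a sketch: it assumes paths are ASCII, and since the pseudocode only returns pc while the root records carry a 64-bit name_hash, the helper `hashpath_words` (a hypothetical name) returns both words without claiming how they are combined.

```python
MASK = 0xFFFFFFFF


def _rot(x: int, k: int) -> int:
    return ((x << k) | (x >> (32 - k))) & MASK


def hashlittle2(data: bytes, pc: int = 0, pb: int = 0):
    """Port of lookup3's hashlittle2; returns the updated (pc, pb)."""
    length = len(data)
    a = b = c = (0xDEADBEEF + length + pc) & MASK
    c = (c + pb) & MASK
    pos = 0
    while length > 12:
        a = (a + int.from_bytes(data[pos:pos + 4], "little")) & MASK
        b = (b + int.from_bytes(data[pos + 4:pos + 8], "little")) & MASK
        c = (c + int.from_bytes(data[pos + 8:pos + 12], "little")) & MASK
        # mix(a, b, c)
        a = (a - c) & MASK; a ^= _rot(c, 4); c = (c + b) & MASK
        b = (b - a) & MASK; b ^= _rot(a, 6); a = (a + c) & MASK
        c = (c - b) & MASK; c ^= _rot(b, 8); b = (b + a) & MASK
        a = (a - c) & MASK; a ^= _rot(c, 16); c = (c + b) & MASK
        b = (b - a) & MASK; b ^= _rot(a, 19); a = (a + c) & MASK
        c = (c - b) & MASK; c ^= _rot(b, 4); b = (b + a) & MASK
        pos += 12
        length -= 12
    if length == 0:
        return c, b
    tail = data[pos:] + bytes(12 - length)  # zero-pad the last block
    a = (a + int.from_bytes(tail[0:4], "little")) & MASK
    b = (b + int.from_bytes(tail[4:8], "little")) & MASK
    c = (c + int.from_bytes(tail[8:12], "little")) & MASK
    # final(a, b, c)
    c ^= b; c = (c - _rot(b, 14)) & MASK
    a ^= c; a = (a - _rot(c, 11)) & MASK
    b ^= a; b = (b - _rot(a, 25)) & MASK
    c ^= b; c = (c - _rot(b, 16)) & MASK
    a ^= c; a = (a - _rot(c, 4)) & MASK
    b ^= a; b = (b - _rot(a, 14)) & MASK
    c ^= b; c = (c - _rot(b, 24)) & MASK
    return c, b


def hashpath_words(path: str):
    """Normalize like hashpath above and return both hash words (pc, pb)."""
    normalized = path.upper().replace("/", "\\")
    return hashlittle2(normalized.encode("ascii"), 0, 0)


def hashpath(path: str) -> int:
    """The 32-bit value returned by the pseudocode above."""
    return hashpath_words(path)[0]
```

Normalization makes the hash case-insensitive and independent of the slash style, so `hashpath("a/b.blp")` and `hashpath("A\\B.BLP")` give the same value.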