CASC: Difference between revisions

From wowdev
=File References=
Files are referred to by many different pieces of data in CASC. A quick summary of them:
* Filename: The file's real name. Note that one file can have many names - essentially, one encoding key can map to many different name hashes.
* Locale Flag:
* Content Flag:
* Name Hash: The file's name, after being hashed with the Jenkins Hash.
* Content Hash: The MD5 of the entire file in its uncompressed state; the purest representation of the data.
* Encoding Hash/Key: MD5 hash of the potentially encoded file. For unencoded files, the content hash. For chunkless [[BLTE]] files lacking a chunk table, this hash covers the entire encoded file. For chunked [[BLTE]] files, this hash covers only the [[BLTE]] headers including chunk table, as the chunk table contains hashes of the content of each chunk. A given file can be encoded in many ways, and a single content hash may potentially have multiple encoding keys.
* CDN Key: The key used to lookup a file on the CDN. Synonym of encoding key.
* Header Hash: Inaccurate synonym of encoding key.



Revision as of 21:53, 14 October 2018

Missing something? This page is being split up into separate pages. For the content transfer part of NGDP, see TACT. This page should now only contain information on the local filesystem format called CASC.

CASC is the file system that Blizzard created to replace the older MPQ format.

CASC v1

The CASC file system debuted in the Heroes of the Storm Technical Alpha, which was hosted on Blizzard's servers in late January. The form of CASC that Heroes of the Storm uses is designated by Blizzard as "CASC". In contrast, World of Warcraft's "build-playbuild-installer" config line clearly states it is generated by "ngdptool_casc2" (NGDP stands for Next Generation Download Protocol). These are the two most substantial changes between CASC v1 and CASC v2:

  • Sections of CASC v1 data files are grouped together in collections of files we call "packages". All files in a package share the same root folder, and if any file is not extracted under its package's base directory, the extracted directory tree comes out badly mangled. This system is completely removed in CASC v2.
  • CASC v1's Root file relates content hashes to file names. CASC v2's Root file relates content hashes to name hashes. Translating name hashes to file names requires use of the Jenkins Hash function (http://en.wikipedia.org/wiki/Jenkins_hash_function), which in turn requires a listfile to generate the hashes. Essentially, CASC v1 has its own listfile (in root). CASC v2 does not, and requires the user to provide names.

The remainder of this article will refer exclusively to the system called CASC v2 as 'CASC'. While many parts of the file system are identical between v1 and v2, there are enough changes to make explaining both formats at once inadvisable.

Journal-based Data Files

During the installation process for a Blizzard game, the program downloads the required files as requested by root, encoding, download, and install. It stores the downloaded data fragments in data files in "INSTALL_DIR\Data\data\". For each file, the program records its key (the hash of the BLTE-encoded data, not of the uncompressed content), its size, its position, and the number of the data file that holds it. It places those four parameters into journal files with the extension '.idx'.

Shared Memory

The shared memory file is called 'shmem' and is usually located in the same folder as the data files and .IDX journals. This file contains the path where the data files are stored, the current version of each of the .IDX files, and a list of the areas of the data files that have unused space. The file is recreated every time a client is started.

Shared Memory Header Structure

  • The first part of the header.
Offset (Hex) | Type | Name | Description
0x00 | uint32_t | BlockType | A value indicating what type of block this is. For this block, the value is 4.
0x04 | uint32_t | NextBlock | The offset of the next block.
0x08 | char[0x100] | DataPath | The path of the data files. This is prefixed with "Global\" if the path is an absolute path.


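  • Followed by a number of these entries. The count can be calculated like this: (NextBlock - 264 - idxFileCount * 4) / 8. Here 264 (0x108) is the size of the first part of the header, 4 is the size of each .IDX version entry, and 8 is the size of each entry in this table.
Offset (Hex) | Type | Name | Description
0x00 | uint32_t | Size | The size of the block.
0x04 | uint32_t | Offset | The offset of the block.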


  • Followed by a number of these entries. The count is equal to number of .IDX files (usually 16).
Offset (Hex) | Type | Name | Description
0x00 | uint32_t | Version | The version number. Used to identify the .IDX filename.
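
The header layout and the entry-count formula above can be sketched in C as follows. This is a minimal sketch, not a reference implementation: the struct and function names are hypothetical, and it assumes the integer fields are little-endian and that NextBlock is measured from the start of the header, neither of which is stated above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the field layout follows the tables above. */
enum {
    SHMEM_DATA_PATH_LEN    = 0x100,
    SHMEM_FIXED_HEADER_LEN = 8 + SHMEM_DATA_PATH_LEN /* 264 bytes */
};

typedef struct {
    uint32_t block_type;                      /* expected to be 4 */
    uint32_t next_block;                      /* offset of the next block */
    char     data_path[SHMEM_DATA_PATH_LEN + 1];
} ShmemHeader;

/* Assumption: integers in shmem are little-endian. */
uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or BlockType is not 4. */
int shmem_parse_header(const uint8_t *buf, size_t len, ShmemHeader *out) {
    if (len < SHMEM_FIXED_HEADER_LEN) return -1;
    out->block_type = read_u32_le(buf);
    out->next_block = read_u32_le(buf + 4);
    memcpy(out->data_path, buf + 8, SHMEM_DATA_PATH_LEN);
    out->data_path[SHMEM_DATA_PATH_LEN] = '\0'; /* force termination */
    return out->block_type == 4 ? 0 : -1;
}

/* (NextBlock - 264 - idxFileCount * 4) / 8, or -1 if NextBlock is too small. */
long shmem_block_entry_count(uint32_t next_block, uint32_t idx_file_count) {
    uint64_t used = (uint64_t)SHMEM_FIXED_HEADER_LEN + (uint64_t)idx_file_count * 4;
    if (next_block < used) return -1;
    return (long)((next_block - used) / 8);
}
```

With the usual 16 .IDX files, a NextBlock of 264 + 64 + 24 would correspond to three (Size, Offset) entries.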


Shared Memory Free Space Structure

After a small header, this structure is split up into two equal parts. The first part contains entries with the number of unused bytes. The second part contains entries with the position of the unused bytes.

There can be up to 1090 entries. Each entry is 5 bytes (a 10-bit field followed by a 30-bit field), so each of the two parts will always be 1090 * 5 = 5450 bytes; if there are fewer than 1090 entries, the rest of the bytes will be padded with '\0'.

  • The header part of the structure.
Offset (Hex) | Type | Name | Description
0x00 | uint32_t | BlockType | A value indicating what type of block this is. For this block, the value is 1.
0x04 | uint32_t | NextBlock | The offset of the next block.
0x08 | char[0x18] | Padding | Padding at the end of the header.


  • This is the number of unused bytes. There can be up to 1090 entries of these. If there are fewer, the rest of the area is padded.
Offset (Hex) | Type | Name | Description
0x00 | uint10* | DataNumber | This is always set to 0 in this part of the block.
0x01 | uint30* | Count | The number of unused bytes.


  • This is the position of the unused bytes. There can be up to 1090 entries of these. If there are fewer, the rest of the area is padded.
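Offset (Hex) | Type | Name | Description
0x00 | uint10* | DataNumber | The number of the data file where the unused bytes are located.
0x01 | uint30* | Offset | The position within the data file where the unused bytes are located.
  • * designates unusual data types. The 10-bit and 30-bit fields are packed together into 5 bytes, so the second field does not start exactly on the byte boundary at 0x01.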
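
The sizes quoted in this section determine where each entry sits inside the free space block. The following is a rough sketch of that arithmetic only, with hypothetical names; it assumes the two parts follow the 0x20-byte header directly and back to back, and it does not attempt to unpack the bit fields, since their byte order is not described here.

```c
#include <stddef.h>

enum {
    FREE_SPACE_HEADER_LEN  = 0x08 + 0x18, /* BlockType + NextBlock + padding */
    FREE_SPACE_MAX_ENTRIES = 1090,
    FREE_SPACE_ENTRY_LEN   = 5,           /* 10 bits + 30 bits */
    FREE_SPACE_PART_LEN    = FREE_SPACE_MAX_ENTRIES * FREE_SPACE_ENTRY_LEN /* 5450 */
};

/* Byte offset, from the start of the block, of the i-th "unused byte count" entry. */
size_t free_space_count_offset(size_t i) {
    return FREE_SPACE_HEADER_LEN + i * FREE_SPACE_ENTRY_LEN;
}

/* Byte offset, from the start of the block, of the i-th "position" entry. */
size_t free_space_position_offset(size_t i) {
    return FREE_SPACE_HEADER_LEN + FREE_SPACE_PART_LEN + i * FREE_SPACE_ENTRY_LEN;
}

/* Total length of the block: header plus both fixed-size parts. */
size_t free_space_block_len(void) {
    return FREE_SPACE_HEADER_LEN + 2 * (size_t)FREE_SPACE_PART_LEN;
}
```

Entry i of the first part and entry i of the second part describe the same free region, so the two offsets are always 5450 bytes apart.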

.IDX Journals

Example file path: INSTALL_DIR\Data\data\0e00000054.idx

.IDX journals contain a mapping from keys to the location of their data in the local CASC archives. There used to be one .IDX file per journal, and the naming scheme used to have two separate meanings. The '0e' part of the file name used to designate which archive the .IDX file was associated with. This changed halfway through the Warlords Beta. Now there are 16 indices total, and the first byte of the hex filename says which of the 16 indices it is, while the remainder of the hex filename is just a version number that increments when a new set of files is added to the local archives.
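
As an illustration of the current naming scheme, the sketch below splits a journal name into its bucket index and version number. The function names are hypothetical, and the code assumes the name is always exactly ten hexadecimal digits followed by ".idx", as in the example path; that fixed length is inferred from the example rather than documented.

```c
#include <stdint.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Splits e.g. "0e00000054.idx" into bucket 0x0e and version 0x54.
 * Returns 0 on success, -1 if the name does not match the assumed shape. */
int idx_parse_filename(const char *name, uint8_t *bucket, uint32_t *version) {
    uint64_t value = 0;
    if (strlen(name) != 14 || strcmp(name + 10, ".idx") != 0) return -1;
    for (int i = 0; i < 10; i++) {
        int d = hex_value(name[i]);
        if (d < 0) return -1;
        value = (value << 4) | (uint64_t)d;
    }
    if ((value >> 32) > 15) return -1; /* only 16 buckets exist */
    *bucket  = (uint8_t)(value >> 32);         /* first byte of the hex name */
    *version = (uint32_t)(value & 0xffffffffu); /* remaining four bytes */
    return 0;
}
```

When several versions of the same bucket are present, a reader would presumably pick the one matching the version recorded in shmem.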

To determine which of the 16 indices a key is bucketed in, the key is hashed by xoring together each 4-bit nibble in the first 9 bytes of the key:

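 #include <stdint.h>

 /* Returns the bucket (0-15) of the .IDX file that indexes key k. */
 uint8_t cascGetBucketIndex(const uint8_t k[16]) {
   /* XOR the first 9 bytes together, then fold the high nibble into the low one. */
   uint8_t i = k[0] ^ k[1] ^ k[2] ^ k[3] ^ k[4] ^ k[5] ^ k[6] ^ k[7] ^ k[8];
   return (i & 0xf) ^ (i >> 4);
 }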


.IDX Header Structure

The header is little-endian:

Offset (Hex) | Type | Name | Description
0x00 | uint32 | HeaderHashSize | The number of bytes to use for the hash at +04; usually 0x10.
0x04 | uint32 | HeaderHash | This should equal the value of pc after calling hashlittle2 on the following HeaderHashSize bytes of the file with an initial value of 0 for pb and pc.
0x08 | uint16 | Unk0 | Must be 7
0x0a | uint8 | BucketIndex | The bucket index of this file; should be the same as the first byte of the hex filename.
0x0b | uint8 | Unk1 | Must be 0
0x0c | uint8 | EntrySizeBytes | Must be 4
0x0d | uint8 | EntryOffsetBytes | Must be 5
0x0e | uint8 | EntryKeyBytes | Must be 9
0x0f | uint8 | ArchiveFileHeaderBytes | Must be 30
0x10 | uint64 | ArchiveTotalSizeMaximum | The maximum size of a CASC installation; 0x4000000000, or 256 GiB.
0x18 | char[8] | padding | The header is padded with zeroes to the next 0x10-byte boundary.
0x20 | uint32 | EntriesSize | This is the length in bytes of the entries in the index file.
0x24 | uint32 | EntriesHash | This should equal the value of pc after calling hashlittle2 on the following EntriesSize bytes of the file with an initial value of 0 for pb and pc.
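
A reader for this header might look like the sketch below. The struct and function names are hypothetical. It checks only the "Must be" constants from the table; verifying HeaderHash and EntriesHash would additionally need an implementation of hashlittle2, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

enum { IDX_HEADER_LEN = 0x28 }; /* 0x24 + 4 bytes of EntriesHash */

typedef struct {
    uint32_t header_hash_size;
    uint32_t header_hash;
    uint16_t unk0;
    uint8_t  bucket_index;
    uint8_t  unk1;
    uint8_t  entry_size_bytes;
    uint8_t  entry_offset_bytes;
    uint8_t  entry_key_bytes;
    uint8_t  archive_file_header_bytes;
    uint64_t archive_total_size_maximum;
    uint32_t entries_size;
    uint32_t entries_hash;
} IdxHeader;

uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint64_t read_u64_le(const uint8_t *p) {
    return (uint64_t)read_u32_le(p) | ((uint64_t)read_u32_le(p + 4) << 32);
}

/* Returns 0 on success, -1 if the buffer is too short,
 * -2 if one of the "Must be" fields has an unexpected value. */
int idx_parse_header(const uint8_t *buf, size_t len, IdxHeader *h) {
    if (len < IDX_HEADER_LEN) return -1;
    h->header_hash_size           = read_u32_le(buf + 0x00);
    h->header_hash                = read_u32_le(buf + 0x04);
    h->unk0                       = (uint16_t)(buf[0x08] | (buf[0x09] << 8));
    h->bucket_index               = buf[0x0a];
    h->unk1                       = buf[0x0b];
    h->entry_size_bytes           = buf[0x0c];
    h->entry_offset_bytes         = buf[0x0d];
    h->entry_key_bytes            = buf[0x0e];
    h->archive_file_header_bytes  = buf[0x0f];
    h->archive_total_size_maximum = read_u64_le(buf + 0x10);
    h->entries_size               = read_u32_le(buf + 0x20);
    h->entries_hash               = read_u32_le(buf + 0x24);
    if (h->unk0 != 7 || h->unk1 != 0 || h->entry_size_bytes != 4 ||
        h->entry_offset_bytes != 5 || h->entry_key_bytes != 9 ||
        h->archive_file_header_bytes != 30) {
        return -2;
    }
    return 0;
}
```

After a successful parse, entries_size divided by 0x12 gives the number of entries that follow the header.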

.IDX Entry Structure

  • The rest of the file is populated by these normal entries, each 0x12 bytes in size. Structure names were invented by the author of this section because official names were not available.
Offset (Dec) | Type | Name | Description
00 | char[9] | Key | The first 9 bytes of the key for this entry.
09 | uint40* | Offset | Unlike the other little-endian integers in this file, this is a big-endian 5-byte integer. The top 10 bits are the number of the archive (data.%03d), and the bottom 30 bits are the offset in that archive to the file data.
14 | uint32 | Size | The length of the file in bytes.
  • * designates unusual data types. In C#, you can read the Offset by reading a Byte, reading a big-endian UInt32, shifting the byte left 32 bits, and ORing them together. Use a 30-bit mask (0x3fffffff) to get the file offset, and right shift the value 30 bits to get the archive number.
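
The same decoding can be sketched in C. The names below are hypothetical; the logic follows the entry table: a 9-byte key, a big-endian 5-byte value split into a 10-bit archive number and a 30-bit offset, and a little-endian 32-bit size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { IDX_ENTRY_LEN = 0x12, IDX_KEY_LEN = 9 };

typedef struct {
    uint8_t  key[IDX_KEY_LEN]; /* first 9 bytes of the key */
    uint16_t archive;          /* top 10 bits of the 5-byte value */
    uint32_t offset;           /* bottom 30 bits of the 5-byte value */
    uint32_t size;             /* length of the file in bytes */
} IdxEntry;

void idx_decode_entry(const uint8_t raw[IDX_ENTRY_LEN], IdxEntry *out) {
    uint64_t packed = 0;
    memcpy(out->key, raw, IDX_KEY_LEN);
    /* Bytes 9..13 form one big-endian 40-bit integer. */
    for (int i = 0; i < 5; i++) {
        packed = (packed << 8) | raw[9 + i];
    }
    out->archive = (uint16_t)(packed >> 30);
    out->offset  = (uint32_t)(packed & 0x3fffffffu);
    /* The size is little-endian like the rest of the file. */
    out->size = (uint32_t)raw[14] | ((uint32_t)raw[15] << 8) |
                ((uint32_t)raw[16] << 16) | ((uint32_t)raw[17] << 24);
}

/* Writes the archive file name ("data.%03d") for an entry into buf. */
int idx_archive_name(const IdxEntry *e, char *buf, size_t buflen) {
    return snprintf(buf, buflen, "data.%03d", (int)e->archive);
}
```

For example, the five offset bytes 03 C0 00 12 34 decode to archive 15 (data.015) and offset 0x1234.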

.XXX Data Files

Example file path: INSTALL_DIR\Data\data\data.015

These files consist of a sequence of headers with corresponding BLTE data.

Most .xxx archives begin with 16 special index cross-linking files. These files have no data and have encoding keys of XXYYbba1af16c50e1900000000000000, where XX is the index number and YY is the .xxx number. The purpose of these files is unclear.
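
To make the key pattern concrete, here is a small sketch that builds and recognizes such keys. The function names are hypothetical. It assumes XX and YY are each a single byte holding the index and archive number directly, and that the key is given in its normal (not reversed) byte order.

```c
#include <stdint.h>
#include <string.h>

/* Fixed part of the pattern: bb a1 af 16 c5 0e 19 followed by seven zero bytes. */
static const uint8_t CROSSLINK_TAIL[14] = {
    0xbb, 0xa1, 0xaf, 0x16, 0xc5, 0x0e, 0x19, 0, 0, 0, 0, 0, 0, 0
};

/* Fills key with XXYYbba1af16c50e1900000000000000 for the given numbers. */
void casc_make_crosslink_key(uint8_t index, uint8_t archive, uint8_t key[16]) {
    key[0] = index;   /* XX: index number */
    key[1] = archive; /* YY: .xxx number */
    memcpy(key + 2, CROSSLINK_TAIL, sizeof CROSSLINK_TAIL);
}

/* Returns 1 if key matches the pattern for any XX and YY, 0 otherwise. */
int casc_is_crosslink_key(const uint8_t key[16]) {
    return memcmp(key + 2, CROSSLINK_TAIL, sizeof CROSSLINK_TAIL) == 0;
}
```

A tool that walks an archive could use the second function to skip these data-less files.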

  • The data header.
Offset (Hex) | Type | Name | Description
0x00 | char[0x10] | BlteHash | Encoding key of the file, in reversed byte order. Note that only as many bytes (final bytes in this reversed order) of this key as are contained in the .idx files (9) must be accurate, and the remaining 7 bytes may be 0s or otherwise altered.
0x10 | uint32_t | Size | The size of this header + the following data.
0x14 | char[0x02] | Flags?? | Unknown. Mostly 0. Set to 1,0 by Agent.exe on index cross-linking files, possibly indicating data-less metadata files.
0x16 | uint32_t | ChecksumA | hashlittle(first 0x16 bytes of the header, 0x3D6BE971)
0x1A | uint32_t | ChecksumB | Checksum of the first 0x1A bytes of the header. The exact algorithm seems to vary over time.


  • The BLTE data.
Offset (Hex) | Type | Name | Description
0x00 | char[Header.Size - 30] | Data | The BLTE file data. See the BLTE page.
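
Putting the two tables together, a data header could be read roughly as follows. This is a sketch with hypothetical names. It assumes the integer fields are little-endian, which is not stated above, and it does not verify either checksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DATA_HEADER_LEN = 30 };

typedef struct {
    uint8_t  encoding_key[16]; /* stored reversed on disk, un-reversed here */
    uint32_t size;             /* header plus following BLTE data */
    uint8_t  flags[2];
    uint32_t checksum_a;
    uint32_t checksum_b;
} DataHeader;

/* Assumption: integers in the data header are little-endian. */
uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short,
 * -2 if Size is smaller than the header itself. */
int data_parse_header(const uint8_t *buf, size_t len, DataHeader *h) {
    if (len < DATA_HEADER_LEN) return -1;
    for (int i = 0; i < 16; i++) {
        h->encoding_key[i] = buf[15 - i]; /* undo the reversed byte order */
    }
    h->size       = read_u32_le(buf + 0x10);
    h->flags[0]   = buf[0x14];
    h->flags[1]   = buf[0x15];
    h->checksum_a = read_u32_le(buf + 0x16);
    h->checksum_b = read_u32_le(buf + 0x1a);
    return h->size >= DATA_HEADER_LEN ? 0 : -2;
}

/* Length of the BLTE data that follows the header (Header.Size - 30). */
uint32_t data_blte_length(const DataHeader *h) {
    return h->size - DATA_HEADER_LEN;
}

/* Only the 9 key bytes kept in the .idx files are guaranteed to be accurate. */
int data_key_matches_idx(const DataHeader *h, const uint8_t idx_key[9]) {
    return memcmp(h->encoding_key, idx_key, 9) == 0;
}
```

Comparing only the first 9 un-reversed key bytes mirrors the note in the table that the remaining 7 bytes may be zeroed or altered.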