CASC

From wowdev

Revision as of 10:40, 1 September 2015

CASC is the name of the new file system that Blizzard has created to replace the outdated format of MPQ.

CASC v1

The CASC file system made its debut in the Heroes of the Storm Technical Alpha, which was hosted on Blizzard's servers in late January. Blizzard designates the form of CASC that Heroes of the Storm uses simply as "CASC". In contrast, World of Warcraft's "build-playbuild-installer" config line clearly states that it is generated by "ngdptool_casc2" (NGDP stands for Next Generation Download Protocol). These are the two most substantial changes between CASC v1 and CASC v2:

  • Sections of CASC v1 data files are grouped into collections of files we call "packages". All files in a package share the same root folder. If the files are not extracted relative to the package's base directory, the extracted directory tree will be badly mangled. This system is completely removed in CASC v2.
  • CASC v1's Root file relates content hashes to file names. CASC v2's Root file relates content hashes to name hashes. Translating name hashes back to file names requires the Jenkins Hash function [1] and a listfile of candidate names to hash. In effect, CASC v1 carries its own listfile (in Root). CASC v2 does not, so the user must supply the names.

The remainder of this article will refer exclusively to the system called CASC v2 as 'CASC'. While many parts of the file system are identical between v1 and v2, there are enough changes to make explaining both formats at once inadvisable.

NGDP

CASC was introduced alongside NGDP (Next Generation Download Protocol), a new system for managing configuration, blob, and installation files. When 'NGDP' is used in conjunction with CASC, it typically refers to the hosted components of the CASC file system and its ability to stream data on the fly.

NGDP URLs

As of October 14th, 2014, the following generic NGDP URLs are known:

NGDP Program Codes

As of September 1st, 2015, the following program codes are known to support NGDP:

Program   Description
d3        Diablo 3 Retail
d3t       Diablo 3 Test
hero      Heroes of the Storm Retail
herot     Heroes of the Storm Test
pro       Prometheus (now Overwatch) Retail
prodev    Prometheus (now Overwatch) Dev
s2b       StarCraft II Beta
storm     Heroes of the Storm (Deprecated)
wow       World of Warcraft Retail
wowt      World of Warcraft Test
wow_beta  World of Warcraft Beta

CASC Online

Standard URL Hash Format

URL Format: http://(cdnsHost)/(cdnsPath)/(pathType)/(FirstTwoHexOfHash)/(SecondTwoHexOfHash)/(FullHash)

For WoW, a cdnsHost of dist.blizzard.com.edgesuite.net should always be acceptable, and the cdnsPath "tpr/wow" has not changed so far. If in doubt, check the NGDP URL for 'cdns', which provides both values.

Known path types are:

  • config - contains the three types of config files: Build configs, CDN configs, and Patch configs
  • data - contains archives, indexes, and unarchived standalone files (typically binaries, mp3s, and movies)
  • patch - contains patch files

Blizzard regularly cleans old builds from the CDN so any example files mentioned in this article might be unavailable at the time of reading.

Example URL: http://dist.blizzard.com.edgesuite.net/tpr/wow/config/0a/6f/0a6f07f48525c4203cb2fdbf6a7d7e9a
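The URL scheme above can be sketched in C as follows. `casc_cdn_url` is a hypothetical helper name. It only performs string formatting and assumes the hash is given as a lowercase hex string, as in the example URL. For '.index' files, the caller would append ".index" to the result.

```c
#include <stdio.h>
#include <string.h>

/* Builds http://(cdnsHost)/(cdnsPath)/(pathType)/(hash[0..1])/(hash[2..3])/(hash).
   Returns the snprintf result, or -1 if the hash is shorter than 4 characters. */
int casc_cdn_url(char *out, size_t cap, const char *host, const char *path,
                 const char *type, const char *hash)
{
    if (strlen(hash) < 4)
        return -1;
    /* %.2s prints only the first two characters of its argument */
    return snprintf(out, cap, "http://%s/%s/%s/%.2s/%.2s/%s",
                    host, path, type, hash, hash + 2, hash);
}
```

Called with "dist.blizzard.com.edgesuite.net", "tpr/wow", "config" and the hash from the example above, it reproduces the example URL.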

Config Files

Build Config

Example file: http://dist.blizzard.com.edgesuite.net/tpr/wow/config/0a/6f/0a6f07f48525c4203cb2fdbf6a7d7e9a

CDN Config

Example file: http://dist.blizzard.com.edgesuite.net/tpr/wow/config/8b/52/8b52f64f8f031ebf0cb7dec0048f018e

Patch Config

This configuration file was added after all of the others. It first appeared in CASC v1 for Heroes of the Storm in August 2014. It then appeared in WoW for CASC v2 around build 19000 (approximately October 1st, 2014). The purpose of this file is to reduce redundant downloads. It achieves this by directing the system to download patch files to apply and update previously downloaded material. The structure and purpose of all of the fields of this file are unknown at this time.

Data Files

Example index: http://dist.blizzard.com.edgesuite.net/tpr/wow/data/00/72/0072651343c29797b9da4aad2d0c93fa.index
Example archive: http://dist.blizzard.com.edgesuite.net/tpr/wow/data/00/72/0072651343c29797b9da4aad2d0c93fa

Patch Files

File References

Files are referred to by many different pieces of data in CASC. A quick summary of them:

  • Filename: The file's real name. Note that one file can have many names: one header hash can map to many different name hashes.
  • Locale Flag:
  • Content Flag:
  • Name Hash: The file's name, after being hashed with the Jenkins Hash.
  • Header Hash: The MD5 of the BLTE header of the compressed file.
  • Content Hash: The MD5 of the entire file in its uncompressed state; the purest representation of the data.

BLTE encoded files

Files such as Encoding are BLTE-encoded, which means they must be decoded before anything in them can be read. The structure documentation below describes the decoded files.

struct blte_header
{
  BE_uint32_t magic;             // 'BLTE';
  BE_uint32_t data_start;        // offset of the first chunk, relative to the beginning of the file
                                 // (size of this header plus the chunk infos)
  BE_uint8_t _unk08;             // always 0xf (?)
  BE_uint24_t chunk_count;
} header;
struct blte_chunk_info
{
  BE_uint32_t in_file_size;
  BE_uint32_t logical_size;      // as in, after decoding
  MD5_DIGEST checksum;           // checksum of chunk including the encoding type
} info[header.chunk_count];

struct blte_chunk
{
  enum encoding_type : char
  {
    None = 'N',                  // copy the data as-is
    Compressed = 'Z',            // zlib-deflated: use zlib's inflate
    Frame = 'F',                 // data is itself BLTE-encoded: decode recursively (?)
    Crypt = 'E',                 // read data and decrypt with Salsa20
  };
  encoding_type encoding;
  uint8_t data[];                // size as given in corresponding blte_chunk_info::in_file_size - 1
                                 // data has to be decoded according to encoding type given
} chunks[header.chunk_count];

Concatenating the decoded data of all chunks, in order, yields the actual file content.
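A minimal C sketch of the decoding loop described above follows. It assumes the chunk info table is present, as in the struct above, and it only handles 'N' chunks. 'Z' would need zlib's inflate, 'F' a recursive call, and 'E' the decryption keys, so this sketch rejects those with an error. The function name `blte_decode_plain` is hypothetical, and the MD5 checksums are not verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

/* Decode a BLTE blob whose chunks are all 'N'. Returns the number of bytes
   written to out, or -1 on malformed input, insufficient space, or any
   other encoding type. Chunk MD5 checksums are not verified. */
long blte_decode_plain(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
    if (in_len < 12 || memcmp(in, "BLTE", 4) != 0)
        return -1;
    uint32_t data_start = be32(in + 4);
    /* byte 8 is _unk08 (0x0F); bytes 9..11 are the big-endian 24-bit chunk count */
    uint32_t chunk_count = ((uint32_t)in[9] << 16) | ((uint32_t)in[10] << 8) | in[11];
    if (data_start > in_len || 12 + (size_t)chunk_count * 24 > data_start)
        return -1;

    size_t pos = data_start, written = 0;
    for (uint32_t i = 0; i < chunk_count; i++) {
        const uint8_t *info = in + 12 + (size_t)i * 24;   /* blte_chunk_info */
        uint32_t in_file_size = be32(info);
        uint32_t logical_size = be32(info + 4);
        if (in_file_size < 1 || in_file_size > in_len - pos)
            return -1;
        if (in[pos] != 'N')                 /* 'Z', 'F', 'E' are not handled here */
            return -1;
        if (in_file_size - 1 != logical_size || logical_size > out_cap - written)
            return -1;
        memcpy(out + written, in + pos + 1, logical_size);   /* skip the encoding byte */
        written += logical_size;
        pos += in_file_size;
    }
    return (long)written;
}
```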

States of CASC Data

CASC data comes in all forms and sizes.

Key CASC Files

Root

File signature: None

The purpose of Root is to relate content hashes to file names (in CASC v2, to name hashes).


Encoding

File signature: "EN"

The Encoding file maps content hashes to file keys.

The file contains the following in order:

  • File header
  • String block #1
  • Table A header
  • Table A entries
  • Table B header
  • Table B entries
  • String block #2


Encoding Header Structure

  • The beginning of the file consists of this 0x16-byte structure. Structure names were invented by the author of this page.
Offset (Hex) Type Name Description
0x00 char[2] FileSignature "EN"
0x02 uint8_t UNK ???
0x03 uint8_t checksumSizeA The length of the checksums in table A.
0x04 uint8_t checksumSizeB The length of the checksums in table B.
0x05 uint16_t flagsA Flags for table A.
0x07 uint16_t flagsB Flags for table B.
0x09 uint32_t [BE] numEntriesA The number of entries in table A.
0x0D uint32_t [BE] numEntriesB The number of entries in table B.
0x11 uint8_t UNK ???
0x12 uint32_t [BE] stringBlockSize The size of string block #1.
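The header fields above can be read with a small C sketch. The struct and function names are hypothetical. The uint16 flag fields are read as big-endian, which is an assumption because the table only marks the uint32 fields [BE]. `table_b_header_offset` follows the section order listed earlier. It assumes each table A header entry holds two checksumSizeA-byte hashes and that each entry block is 4096 bytes.

```c
#include <stddef.h>
#include <stdint.h>

struct encoding_header {
    uint8_t  checksum_size_a;    /* 0x03 */
    uint8_t  checksum_size_b;    /* 0x04 */
    uint16_t flags_a;            /* 0x05 */
    uint16_t flags_b;            /* 0x07 */
    uint32_t num_entries_a;      /* 0x09 */
    uint32_t num_entries_b;      /* 0x0D */
    uint32_t string_block_size;  /* 0x12 */
};

/* Big-endian read of n (<= 4) bytes. */
static uint32_t rd_be(const uint8_t *p, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 8) | *p++;
    return v;
}

/* Returns 0 on success, -1 if buf is shorter than 0x16 bytes or not "EN". */
int parse_encoding_header(const uint8_t *buf, size_t len, struct encoding_header *h)
{
    if (len < 0x16 || buf[0] != 'E' || buf[1] != 'N')
        return -1;
    h->checksum_size_a = buf[0x03];
    h->checksum_size_b = buf[0x04];
    h->flags_a = (uint16_t)rd_be(buf + 0x05, 2);   /* byte order assumed big-endian */
    h->flags_b = (uint16_t)rd_be(buf + 0x07, 2);
    h->num_entries_a = rd_be(buf + 0x09, 4);
    h->num_entries_b = rd_be(buf + 0x0D, 4);
    h->string_block_size = rd_be(buf + 0x12, 4);
    return 0;
}

/* Offset of the table B header: file header, string block #1, table A header
   (numEntriesA entries of two checksumSizeA-byte hashes), then table A entries
   (numEntriesA blocks of 4096 bytes). */
size_t table_b_header_offset(const struct encoding_header *h)
{
    size_t off = 0x16 + (size_t)h->string_block_size;
    off += (size_t)h->num_entries_a * 2 * h->checksum_size_a;
    off += (size_t)h->num_entries_a * 4096;
    return off;
}
```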


Encoding Table Header Block Structure

  • Table A's header has numEntriesA entries of this 0x20-byte structure. They are used to locate which block in the entries part of the table contains a given hash, and to verify the integrity of that block once it is read.
Offset (Hex) Type Name Description
0x00 char[checksumSizeA] firstHash The hash of the first file in the entry.
0x10 char[checksumSizeA] blockHash The checksum of the entry.


Encoding Table Entry Block Structure

  • Table A's entries part consists of numEntriesA blocks of 4096 bytes. Each block contains a sequence of these structures, followed by padding.
Offset (Hex) Type Name Description
0x00 uint16_t keyCount The number of keys.
0x02 uint32_t [BE] fileSize The decompressed size of the file.
0x06 char[checksumSizeA] hash The hash of the file.
0x16 char[checksumSizeA*keyCount] keys The file keys belonging to the file. This can be used to look up the location of the file in the .IDX files.
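Walking the entries of one 4096-byte block could look like the following sketch. It makes three assumptions that this page does not confirm: keyCount is big-endian like fileSize, the key size equals checksumSizeA (as the table states), and the padding is zero-filled, so a keyCount of 0 marks its start. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian read of n (<= 4) bytes. */
static uint32_t rd_be(const uint8_t *p, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 8) | *p++;
    return v;
}

/* Receives one entry: content hash, decompressed size, and key_count keys
   of hash_size bytes each, stored back to back. */
typedef void (*entry_fn)(const uint8_t *hash, uint32_t file_size,
                         const uint8_t *keys, uint16_t key_count, void *ctx);

/* Walks one 4096-byte table A block; fn may be NULL. Returns the entry count. */
int walk_table_a_page(const uint8_t *page, size_t hash_size, entry_fn fn, void *ctx)
{
    size_t pos = 0;
    int count = 0;
    while (pos + 6 + hash_size <= 4096) {
        uint16_t key_count = (uint16_t)rd_be(page + pos, 2);   /* assumed big-endian */
        if (key_count == 0)
            break;                                              /* assumed start of zero padding */
        size_t entry_len = 6 + hash_size + (size_t)key_count * hash_size;
        if (pos + entry_len > 4096)
            break;                                              /* entry would overrun the block */
        if (fn != NULL)
            fn(page + pos + 6, rd_be(page + pos + 2, 4),
               page + pos + 6 + hash_size, key_count, ctx);
        pos += entry_len;
        count++;
    }
    return count;
}
```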


Encoding Layout Table Header Block Structure

  • Table B's header has numEntriesB entries of this 0x20-byte structure. They are used to locate which block in the entries part of the table contains a given key, and to verify the integrity of that block once it is read.
Offset (Hex) Type Name Description
0x00 char[checksumSizeB] firstKey The key of the first file in the entry.
0x10 char[checksumSizeB] blockHash The checksum of the entry.


Encoding Layout Table Entry Block Structure

  • Table B's entries part consists of numEntriesB blocks of 4096 bytes. Each block contains a sequence of these structures, followed by padding.
Offset (Hex) Type Name Description
0x00 char[checksumSizeB] key The key of the file.
0x10 uint32_t [BE] stringIndex The index into string block #1.
0x14 char UNK ???
0x15 uint32_t [BE] fileSize The compressed size of the file.


String blocks

The two string blocks contain descriptions of file layouts, providing information about the sections and compression mode of the files.

  • Block #1 is referenced by the layout table (see above).
  • Block #2 is the description of the encoding file itself.


The string uses the following format:

<encoding_mode>:{<comma-separated subchunks>}
Note: Usually <encoding_mode> is b for BLTE in the top chunk.


It specifies each subchunk in this form:

<size>=<encoding_mode>


<size>:

Value refers to the number of bytes the chunk contains (at a minimum; see '*' below).
The value might contain K, M or *:
  • If K is present, multiply the number by 1024.
  • If M is present, multiply the number by 1048576.
  • If * is present, the chunk is "greedy": it contains the rest of the bytes in the file in addition to any number specified.


<encoding_mode>:

Values will be either n, z, f, or c.
Note: The mode can also carry a subchunk specifier (e.g. =z:{6,mpq}) that gives encoder parameters. For example, z:{6,mpq} means level == 6 and windowBits == 0.


Example:

b:{64=n,256K*=z}
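The following C sketch parses this string format. It splits only the outermost {...} list and skips nested parameter blocks such as z:{6,mpq}. For each subchunk, it records the size (after applying K and M), the greedy flag, and the first letter of the encoding mode. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct espec_chunk {
    uint64_t size;   /* bytes, after applying K / M multipliers */
    int greedy;      /* 1 if the size carried a '*' */
    char mode;       /* first character of the encoding mode: 'n', 'z', ... */
};

/* Parse a <size> token such as "64", "256K*" or "*"; returns a pointer past it. */
static const char *parse_size(const char *s, uint64_t *size, int *greedy)
{
    char *end;
    *size = strtoull(s, &end, 10);   /* 0 and end == s if there are no digits */
    *greedy = 0;
    for (; *end == 'K' || *end == 'M' || *end == '*'; end++) {
        if (*end == 'K') *size *= 1024;
        else if (*end == 'M') *size *= 1048576;
        else *greedy = 1;
    }
    return end;
}

/* Parse "<mode>:{<size>=<mode>,...}" into at most max chunks.
   Returns the number of chunks, or -1 on malformed input or overflow of out. */
int parse_espec(const char *spec, struct espec_chunk *out, int max)
{
    const char *p = strchr(spec, '{');
    int n = 0;
    if (p == NULL)
        return -1;
    p++;
    while (*p && *p != '}') {
        if (n == max)
            return -1;
        p = parse_size(p, &out[n].size, &out[n].greedy);
        if (*p != '=')
            return -1;
        out[n].mode = p[1];
        /* skip the mode and any nested {...} up to the next top-level ',' or '}' */
        int depth = 0;
        for (p++; *p; p++) {
            if (*p == '{') depth++;
            else if (*p == '}') { if (depth == 0) break; depth--; }
            else if (*p == ',' && depth == 0) break;
        }
        n++;
        if (*p == ',')
            p++;
    }
    return *p == '}' ? n : -1;
}
```

For the example above, it yields two subchunks: 64 bytes with mode 'n', then a greedy chunk of 262144 bytes with mode 'z'.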


Install

File signature: "IN"

Install Header Structure

  • The beginning of the file consists of this 0x0A-byte structure. Structure names were invented by the author of this page.
Offset (Hex) Type Name Description
0x00 char[2] FileSignature "IN"
0x02 uint32 UNK ???
0x06 uint32 numEntries The number of entries in the body of the file

Install Header Entry Structure

  • The remainder of the header is populated by these header entries, each a variable size (due to the strings). Structure names were invented by the author of this page.
Type Name Description
char[] FlagName The name of the optional flag for the entry
uint16 FlagType A number shared amongst specific flags. For example, languages are '3'. Regions are '5'. Architecture type is '0'.
byte[28] FileFlags This appears to be a bit array represented in hex form. Each bit appears to represent an entry of this file; if the bit is enabled, then the flag named by FlagName is active for that file.

Install Entry Structure

  • The rest of the file is populated by these normal entries, each a variable size (due to the strings). Structure names were invented by the author of this page.
Type Name Description
char[] FileName The name of the file.
char[16] MD5 The MD5 of the uncompressed (?) file.
uint32 Size The size of the file.

Download

File signature: "DL"

Patch

File signature: "PA"

The structure and purpose of all of the fields of this file are unknown at this time.

Blizzard-Created Archives

In its natural state, the vast majority of the data for any CASC-based game exists in the archives.

Archives

Archives are extensionless 256 MB files that are usually only stored on the Blizzard CDNs. Their naming follows the standard URL hash format using the '/data/' path type.

The archives presumably contain nothing but file fragments stored back to back. You never need to parse an archive. Instead, look up the offset and size of your file fragment in the index files, then read that piece directly out of the archive.

Archive Indexes (.index)

These '.index' files reveal to the user where the compressed game files are located within the archives. All indexes (except the Archive-Group index, see below) are named after their archive (only difference is these have an extension). '.index' files are stored on the CDN using the standard hash naming scheme (remember they have an extension though). They are also located in the directory 'INSTALL_DIR/Data/indices/' for a WoW install.

Normal Index Entry Structure

  • The entire file is populated by these standard index entries of 0x18 (hex) bytes. Structure names were invented by the author of this page.
  • NOTE: This structure uses big endian numbers.
Offset (Hex) Type Name Description
0x00 char[16] HeaderHash The MD5 of the BLTE header for the compressed fragment that this index entry represents
0x10 uint32 Offset Position of the fragment in the archive
0x14 uint32 Size Size of the fragment

Archive-Group Index (.index)

Archive-group is actually a very special '.index' file. While virtually all '.index' files are under 2 MB, the archive-group '.index' file is always over 15 MB. It is essentially a merger of all .index files, with a structure change. There is a new uint16 field that serves as an index for the array of archives from this build's CDN config.

Therefore, it is critical to identify this outlier. If you parse it as a regular '.index' file purely because of its extension, your program will fail. You can identify it because it is named after the 'archive-group' hash listed in the CDN config, and it is not listed as an archive hash in the CDN config. As discussed above, its different entry structure and unusually large size are also viable ways to tell it apart from the other '.index' files.

Merged Index Entry Structure

  • The entire file is populated by these 'merged' index entries of 0x1A (hex) bytes. Structure names were invented by the author of this page.
  • NOTE: This structure uses big endian numbers.
Offset (Hex) Type Name Description
0x00 char[16] HeaderHash The MD5 of the BLTE header for the compressed fragment that this index entry represents
0x10 uint16 ArchiveIndex If you placed the hashes of the 'archives = ' line of the CDN config in an array, this number would be the index for that array
0x12 uint32 Offset Position of the fragment in the archive
0x16 uint32 Size Size of the fragment
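The two index layouts differ only by the uint16 ArchiveIndex, so one C sketch can read both. Both layouts are big-endian, as noted above. The caller has to decide whether a file is the archive-group index, for example from the CDN config as described above. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct index_entry {
    uint8_t  header_hash[16];
    uint16_t archive_index;   /* 0 for the normal layout, which has no such field */
    uint32_t offset;          /* position of the fragment in the archive */
    uint32_t size;            /* size of the fragment */
};

/* Big-endian read of n (<= 4) bytes. */
static uint32_t be_n(const uint8_t *p, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 8) | *p++;
    return v;
}

/* Parse one entry at rec. merged != 0 selects the 0x1A-byte archive-group layout,
   otherwise the 0x18-byte normal layout. Returns the number of bytes consumed. */
size_t parse_index_entry(const uint8_t *rec, int merged, struct index_entry *e)
{
    size_t p = 16;
    memcpy(e->header_hash, rec, 16);
    e->archive_index = 0;
    if (merged) {
        e->archive_index = (uint16_t)be_n(rec + p, 2);
        p += 2;
    }
    e->offset = be_n(rec + p, 4);
    e->size = be_n(rec + p + 4, 4);
    return p + 8;
}
```

For the archive-group index, archive_index selects a hash from the 'archives = ' line of the CDN config. The offset and size fields then give the byte range to read from that archive.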

Journal-based Data Files

During installation, a Blizzard game downloads the files requested by root, encoding, download, and install, and stores the downloaded data fragments in data files in "INSTALL_DIR\Data\data\". For each fragment, it records four values: the header hash (the hash of the BLTE-compressed file, listed as HeaderHash in the .IDX entry structure below), the fragment's size, its position, and the number of the data file it is in. It writes those four values into journal files with the extension '.idx'.

Shared Memory

The shared memory file is called 'shmem' and is usually located in the same folder as the data files and .IDX journals. It records the path where the data files are stored, the current version of each .IDX file, and which areas of the data files are unused. The file is recreated every time a client is started.

Shared Memory Header Structure

  • The first part of the header.
Offset (Hex) Type Name Description
0x00 uint32_t BlockType A value indicating what type of block this is. For this block, the value is 4.
0x04 uint32_t NextBlock The offset of the next block.
0x08 char[0x100] DataPath The path of the data files. This is prefixed with "Global\" if the path is an absolute path.


  • Followed by a number of these entries. The count can be calculated like this: (NextBlock - 264 - idxFileCount * 4) / 8
Offset (Hex) Type Name Description
0x00 uint32_t Size The size of the block.
0x04 uint32_t Offset The offset of the block.


  • Followed by a number of these entries. The count is equal to number of .IDX files (usually 16).
Offset (Hex) Type Name Description
0x00 uint32_t Version The version number. Used to identify the .IDX filename.
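The header block above, including the entry-count formula, can be read as in this sketch. The page does not state the byte order of shmem, so little-endian is an assumption here. The caller must supply the .IDX file count (usually 16). The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian read (assumed byte order; not stated on this page). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct shmem_header {
    uint32_t next_block;       /* offset of the next block */
    char data_path[0x101];     /* DataPath, NUL-terminated */
    uint32_t pair_count;       /* number of (Size, Offset) entries */
    uint32_t idx_file_count;   /* number of .IDX files, usually 16 */
    const uint8_t *versions;   /* idx_file_count uint32 version numbers */
};

/* Parse the header block (BlockType 4). Returns 0 on success, -1 on error. */
int parse_shmem_header(const uint8_t *buf, size_t len, uint32_t idx_file_count,
                       struct shmem_header *h)
{
    if (len < 0x108 || le32(buf) != 4)
        return -1;
    h->next_block = le32(buf + 4);
    if (h->next_block > len || h->next_block < 0x108 + (uint64_t)idx_file_count * 4)
        return -1;
    memcpy(h->data_path, buf + 8, 0x100);
    h->data_path[0x100] = '\0';
    /* (NextBlock - 264 - idxFileCount * 4) / 8, where 264 == 0x108 */
    h->pair_count = (h->next_block - 264 - idx_file_count * 4) / 8;
    h->idx_file_count = idx_file_count;
    h->versions = buf + 0x108 + (size_t)h->pair_count * 8;
    return 0;
}

/* Version of the i-th .IDX file, or 0 if i is out of range (hypothetical convention). */
uint32_t shmem_idx_version(const struct shmem_header *h, uint32_t i)
{
    return i < h->idx_file_count ? le32(h->versions + (size_t)i * 4) : 0;
}
```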


Shared Memory Free Space Structure

After a small header, this structure is split into two equal parts. The first part contains entries giving the number of unused bytes. The second part contains entries giving the position of those unused bytes.

There can be up to 1090 entries. Each entry is 5 bytes (10 + 30 bits), so each of the two parts is always 1090 × 5 = 5450 bytes. If there are fewer than 1090 entries, the remaining bytes are padded with '\0'.

  • The header part of the structure.
Offset (Hex) Type Name Description
0x00 uint32_t BlockType A value indicating what type of block this is. For this block, the value is 1.
0x04 uint32_t NextBlock The offset of the next block.
0x08 char[0x18] Padding Padding at the end of the header.


  • This is the number of unused bytes. There can be up to 1090 entries of these. If there are fewer, the rest of the area is padded.
Offset (Hex) Type Name Description
0x00 uint10* DataNumber This is always set to 0 in this part of the block.
0x01.25 uint30* Count The number of unused bytes.


  • This is the position of the unused bytes. There can be up to 1090 entries of these. If there are fewer, the rest of the area is padded.
Offset (Hex) Type Name Description
0x00 uint10* DataNumber The number of the data file where the unused bytes are located.
0x01.25 uint30* Offset The position within the data file where the unused bytes are located.

.IDX Journals

Example file path: INSTALL_DIR\Data\data\0e00000054.idx

.IDX journals contain references. There used to be one .IDX file per journal, and the naming scheme used to have two separate meanings. The '0e' part of the file name used to designate which archive the .IDX file was associated with. This changed halfway through the Warlords Beta, and the current .IDX names are just iteration numbers.

.IDX Header Structure

???

.IDX Entry Structure

  • The rest of the file is populated by these normal entries, each 0x12 (18) bytes in size. Structure names were invented by the author of this section because official names were not available.
  • Note: .IDX files are chunked into groups of 0x1000 bytes. If a chunk is not filled to exactly 0x1000 bytes, the gap will be filled with '00's.
Offset (Hex) Type Name Description
0x00 char[9] HeaderHash The first 9 bytes of the MD5 of the BLTE header of the compressed file
0x09 uint10* DataNumber The number of the data file to read from
0x0A.25 uint30* Offset The position to begin reading from in the data file
0x0E uint32 Size The amount to read from the data file
  • * designates unusual data types. It is probably easiest to read the DataNumber as a Byte (and put it into a UInt16) and the Offset as a UInt32. Then use bit-shifting and a mask on Offset to update DataNumber and apply a mask to update Offset.
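The bit-shifting approach from the note can be sketched as follows. The sketch assumes that the 30-bit Offset comes from a big-endian uint32 read at byte 0x0A, with the upper two bits of that uint32 forming the low bits of DataNumber. The page does not state the byte order. The shared-memory free-space entries above use the same 10/30-bit split, but it is not confirmed that their bit order matches. The data-file name format is based only on the example path data.015 and is likewise an assumption. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Split the packed field starting at entry byte 0x09 into DataNumber (10 bits)
   and Offset (30 bits). */
void idx_unpack_location(const uint8_t *entry, uint16_t *data_number, uint32_t *offset)
{
    uint16_t hi = entry[9];                                   /* read as a byte, widened to 16 bits */
    uint32_t lo = ((uint32_t)entry[10] << 24) | ((uint32_t)entry[11] << 16)
                | ((uint32_t)entry[12] << 8) | entry[13];     /* assumed big-endian */
    *data_number = (uint16_t)((hi << 2) | (lo >> 30));        /* top 2 bits of lo extend DataNumber */
    *offset = lo & 0x3FFFFFFF;                                /* mask off those 2 bits */
}

/* Format the data file name for a DataNumber, following the example "data.015". */
int idx_data_file_name(char *out, size_t cap, uint16_t data_number)
{
    return snprintf(out, cap, "data.%03u", (unsigned)data_number);
}
```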

.XXX Data Files

Example file path: INSTALL_DIR\Data\data\data.015

hashpath

hashpath (string path) → uint32_t
{
  string normalized = tolower (path).replace (from: '\\', to: '/')
  uint32_t pc = 0, pb = 0;
  hashlittle2 (normalized, strlen (normalized), &pc, &pb);
  return pc;
}
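The pseudocode relies on hashlittle2 from Bob Jenkins' public-domain lookup3.c. The C sketch below reproduces that function using only its portable byte-at-a-time path. It also implements hashpath exactly as written above: lowercase the path, turn backslashes into forward slashes, and return pc. The code is reconstructed, not copied from the original, so compare it against lookup3.c before relying on it.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define rot(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

/* lookup3 hashlittle2 (Bob Jenkins, public domain), byte-at-a-time path.
   *pc and *pb are seeds on input and receive the two 32-bit results. */
void hashlittle2(const void *key, size_t length, uint32_t *pc, uint32_t *pb)
{
    const uint8_t *k = (const uint8_t *)key;
    uint32_t a, b, c;
    a = b = c = 0xdeadbeef + (uint32_t)length + *pc;
    c += *pb;

    while (length > 12) {               /* consume all but the last 1..12 bytes */
        a += k[0] | ((uint32_t)k[1] << 8) | ((uint32_t)k[2] << 16) | ((uint32_t)k[3] << 24);
        b += k[4] | ((uint32_t)k[5] << 8) | ((uint32_t)k[6] << 16) | ((uint32_t)k[7] << 24);
        c += k[8] | ((uint32_t)k[9] << 8) | ((uint32_t)k[10] << 16) | ((uint32_t)k[11] << 24);
        a -= c; a ^= rot(c, 4);  c += b;    /* mix(a, b, c) */
        b -= a; b ^= rot(a, 6);  a += c;
        c -= b; c ^= rot(b, 8);  b += a;
        a -= c; a ^= rot(c, 16); c += b;
        b -= a; b ^= rot(a, 19); a += c;
        c -= b; c ^= rot(b, 4);  b += a;
        length -= 12;
        k += 12;
    }

    switch (length) {                   /* cases intentionally fall through */
    case 12: c += (uint32_t)k[11] << 24; /* fall through */
    case 11: c += (uint32_t)k[10] << 16; /* fall through */
    case 10: c += (uint32_t)k[9] << 8;   /* fall through */
    case 9:  c += k[8];                  /* fall through */
    case 8:  b += (uint32_t)k[7] << 24;  /* fall through */
    case 7:  b += (uint32_t)k[6] << 16;  /* fall through */
    case 6:  b += (uint32_t)k[5] << 8;   /* fall through */
    case 5:  b += k[4];                  /* fall through */
    case 4:  a += (uint32_t)k[3] << 24;  /* fall through */
    case 3:  a += (uint32_t)k[2] << 16;  /* fall through */
    case 2:  a += (uint32_t)k[1] << 8;   /* fall through */
    case 1:  a += k[0]; break;
    case 0:  *pc = c; *pb = b; return;   /* empty input: no final mixing */
    }

    c ^= b; c -= rot(b, 14);            /* final(a, b, c) */
    a ^= c; a -= rot(c, 11);
    b ^= a; b -= rot(a, 25);
    c ^= b; c -= rot(b, 16);
    a ^= c; a -= rot(c, 4);
    b ^= a; b -= rot(a, 14);
    c ^= b; c -= rot(b, 24);
    *pc = c;
    *pb = b;
}

/* hashpath as given above: lowercase, '\\' -> '/', hash with zero seeds and
   return pc. Returns 0 if allocation fails (a hypothetical error convention). */
uint32_t hashpath(const char *path)
{
    size_t len = strlen(path);
    char *normalized = malloc(len + 1);
    uint32_t pc = 0, pb = 0;
    if (normalized == NULL)
        return 0;
    for (size_t i = 0; i <= len; i++) {   /* also copies the terminating NUL */
        char ch = (path[i] == '\\') ? '/' : path[i];
        normalized[i] = (char)tolower((unsigned char)ch);
    }
    hashlittle2(normalized, len, &pc, &pb);
    free(normalized);
    return pc;
}
```

Because of the normalization, "Interface\Icons\A.BLP" and "interface/icons/a.blp" hash to the same value.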