ITSF internal file formats

Preface

In this section, where the description of a file says that an item is an offset into another file, that file may be located in the same CHM, or it may be located in an accompanying CHI file.

The different types of ITSF files contain different internal files. The list below indicates which file types contain which internal files:

CHI
/#ITBITS, /#SYSTEM, /#IDXHDR, /#STRINGS, /#TOCIDX, /#TOPICS, /#URLSTR, /#URLTBL, /#WINDOWS, /$OBJINST, /$WWKeywordLinks/Property, /$WWKeywordLinks/BTree, /$WWKeywordLinks/Data, /$WWKeywordLinks/Map, /$WWAssociativeLinks/BTree, /$WWAssociativeLinks/Data, /$WWAssociativeLinks/Map, /$WWAssociativeLinks/Property
CHM
/#ITBITS, /#SYSTEM, /#IDXHDR, /#STRINGS, /#TOCIDX, /#TOPICS, /#URLSTR, /#URLTBL, /#IVB, /#SUBSETS, /#WINDOWS, /$FIftiMain, /$OBJINST, /$WWAssociativeLinks/BTree, /$WWAssociativeLinks/Data, /$WWAssociativeLinks/Map, /$WWAssociativeLinks/Property, /$WWKeywordLinks/BTree, /$WWKeywordLinks/Data, /$WWKeywordLinks/Map, /$WWKeywordLinks/Property
CHQ
/$FIftiMain, /$OBJINST, /$TitleMap
CHW
/$OBJINST, /$HHTitleMap, /$WWKeywordLinks/BTREE, /$WWKeywordLinks/DATA, /$WWKeywordLinks/MAP, /$WWKeywordLinks/PROPERTY, /$WWAssociativeLinks/BTREE, /$WWAssociativeLinks/DATA, /$WWAssociativeLinks/MAP, /$WWAssociativeLinks/PROPERTY
ITS
These only contain format & author files.
hh.dat
/Path/file.chm/windowtype, /Path/file.chm/AdvSearchUI/Keywords, /Path/file.chm/AdvSearchUI/Properties, /Path/file.chm/Bookmarks/v1/Count, /Path/file.chm/Bookmarks/v1/n/Topic, /Path/file.chm/Bookmarks/v1/n/Url
Seen in HHA.dll or on the internet, but not seen in any ITSF files
/#GRPINF, /#INFOTYPES, /#URLS, /#BSSC

Internal file formats

/#ITBITS

The files I have seen so far have been empty or filled with zero BYTEs so who knows. My guess is that it has something to do with information types. The file where it had a non-zero size (12 zero BYTEs in VOICESDK.CHI from the MSDN) also had a non-zero /#SYSTEM code 15 (Information type checksum) entry of 0xffffffff.

/#SYSTEM

Offset  Type   Comment/Value
0       DWORD  3 (Version number)
4              /#SYSTEM entries to the EOF

/#SYSTEM entries have the following format:

Offset  Type   Comment/Value
0       WORD   code - see below for values & meanings
2       WORD   length of data
4       BYTEs  data
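
The layout above can be walked with a few lines of Python. This is only a minimal sketch: the function name parse_system is made up for illustration, and it assumes all integers are little-endian (the usual case for Windows formats, but not stated above).

```python
import struct

def parse_system(data: bytes):
    """Return (version, [(code, payload), ...]) from raw /#SYSTEM bytes."""
    # Assumption: little-endian integers throughout.
    (version,) = struct.unpack_from("<I", data, 0)
    entries = []
    pos = 4
    while pos + 4 <= len(data):
        code, length = struct.unpack_from("<HH", data, pos)
        pos += 4
        entries.append((code, data[pos:pos + length]))
        pos += length
    return version, entries

# Synthetic input: version 3, then one code 3 (Title) entry holding "Demo\0".
sample = struct.pack("<I", 3) + struct.pack("<HH", 3, 5) + b"Demo\x00"
version, entries = parse_system(sample)
```

The payloads are returned as raw bytes because their meaning depends on the code.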

The codes are explained below in numeric order. Within the /#SYSTEM file itself the entries appear in the order 10, 9, 4, 2, 3, 16, 6, (5, 0, 1 or 0, 1, 5; I haven't been able to make files with all three), 7, 11, 12, 13, 14, 8 and lastly 15.

An explanation for each of the /#SYSTEM codes
Code      Explanation
0         Value of Contents file in [OPTIONS] section of the hhp file. NT
1         Value of Index file in [OPTIONS] section of the hhp file. NT
2         Value of Default topic in [OPTIONS] section of the hhp file. NT
3         Value of Title in [OPTIONS] section of the hhp file. NT
4         28 or 36 byte structure:
          Offset  Type     Comment/Value
          0       DWORD    LCID from the HHP file.
          4       DWORD    One if DBCS is in use.
          8       DWORD    One if full-text search is on.
          0xC     DWORD    Non-zero if the file has KLinks.
          0x10    DWORD    Non-zero if the file has ALinks.
          0x14    QWORD    timestamp - Win32 FILETIME structure (100-nanosecond intervals since 1601-01-01 UTC).
          0x1C    BYTE[8]  0 (unknown)
5         Value of Default Window in [OPTIONS] section of the hhp file. NT
6         Value of Compiled file in [OPTIONS] section of the hhp file. This is the lowercase of the stem of the CHM file name. If the name of the CHM is "..\bar\foo\ FOO-Bar . chm jimmy is a poo-bum" then this will be " foo-bar ". NT
7         DWORD present in files with "Binary Index=Yes" and "Compatibility=1.1 or later".
8         Rare. VOICESDK.CHM & CHI from the MSDN has one. Each entry is 16 BYTEs:
          Offset  Type     Comment/Value
          0       DWORD    0, 4 in some (unknown)
          4       DWORD    Offset in /#STRINGS file. An abbreviation.
          8       DWORD    3 where 1st DWORD is 0, 5 where it is 4 (unknown)
          0xC     DWORD    Offset in /#STRINGS file. An explanation of the abbreviation.
9         The version/program that the CHM was compiled by - shown in the version dialog as "Compiled with %s" where %s is what is in the /#SYSTEM file. If compiled with the MS HTML Help Author dll then it will be something like "HHA Version 4.74.8702". I'm speculating that it comes directly from the resource strings of HHA.dll (I saw it there in Unicode, but haven't yet tried altering it >;-). NT
10        time_t timestamp (DWORD): seconds since 1970-01-01 UTC.
11        DWORD present in files with "Binary TOC=Yes" and "Compatibility=1.1 or later".
12        Number of information types (DWORD).
13        The /#IDXHDR file contains exactly the same bytes. See below for more info
14        Rare. The ones I saw were from MS Word 2000. My guess is that it is an MSOffice extension (or maybe not) that overrides the names & window types of the navigation tabs. DWORD number of windows to override, 2 ANSI NT strings for each window. The first is the text for the tab & the second is probably the name of the window type to use. (eg 2, "&Answer Wizard\0MsoHelpAWDlg\0&Index\0MsoHelpKeyDlg\0")
          These are from the Custom tab variables of the [OPTIONS] section of the hhp file. The resources from MSOHELP.EXE have a weird .reg file that gives the CLSIDs involved in the provision of these dialogs.
15        Information type checksum (DWORD). Unknown algorithm & data source.
16        Value of Default Font in [OPTIONS] section of the hhp file. NT
17-65535  Not yet seen. Please let us know if you see these.
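
As an illustration of the two timestamp codes, here is a rough Python sketch that decodes the code 4 structure and the code 10 DWORD. The function names are invented for this example, little-endian byte order is an assumption, and the epochs used are the standard ones for FILETIME (1601-01-01 UTC) and time_t (1970-01-01 UTC).

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert 100-nanosecond ticks since 1601-01-01 UTC to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def parse_code4(payload: bytes) -> dict:
    """Decode the first 28 bytes of a code 4 entry.

    The 36 byte form carries 8 more bytes of unknown purpose, ignored here.
    """
    lcid, dbcs, fts, klinks, alinks, filetime = struct.unpack_from("<5IQ", payload, 0)
    return {
        "lcid": lcid,
        "dbcs": dbcs == 1,
        "full_text_search": fts == 1,
        "has_klinks": klinks != 0,
        "has_alinks": alinks != 0,
        "timestamp": filetime_to_datetime(filetime),
    }

def parse_code10(payload: bytes) -> datetime:
    """Decode the code 10 DWORD as seconds since 1970-01-01 UTC."""
    (seconds,) = struct.unpack_from("<I", payload, 0)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Synthetic payload: LCID 0x409, full-text search on, KLinks present.
sample4 = struct.pack("<5IQ", 0x409, 0, 1, 1, 0, 116444736000000000)
info = parse_code4(sample4)
```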

/#IDXHDR

This has exactly the same bytes as the code 13 entry in the /#SYSTEM file and is 4096 bytes long.

Offset  Type      Comment/Value
0       char[4]   T#SM
4       DWORD     Unknown timestamp/checksum
8       DWORD     1 (unknown)
0xC     DWORD     Number of topic nodes including the contents & index files
0x10    DWORD     0 (unknown)
0x14    DWORD     Unknown. Often -1.
0x18    DWORD     0 (unknown)
0x1C    DWORD     0 (unknown)
0x20    DWORD[8]  Unknown. Often -1.
0x40    DWORD     0 (unknown)
0x44    DWORD     0/1 (unknown)
0x48    DWORD     Number of files in the [MERGE FILES] list
0x4C    DWORD     Unknown. Often 0.
0x50    DWORDs    List of offsets in the /#STRINGS file that are the [MERGE FILES] list
+0                0 BYTEs to the EOF (unknown)
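
A small Python sketch of pulling the known fields out of this header follows. The name parse_idxhdr is hypothetical and little-endian byte order is assumed; everything marked unknown above is simply skipped.

```python
import struct

def parse_idxhdr(data: bytes):
    """Return (topic_node_count, merge_file_string_offsets) from /#IDXHDR."""
    if data[:4] != b"T#SM":
        raise ValueError("missing T#SM signature")
    (topic_nodes,) = struct.unpack_from("<I", data, 0x0C)
    (merge_count,) = struct.unpack_from("<I", data, 0x48)
    # The [MERGE FILES] list is an array of /#STRINGS offsets at 0x50.
    offsets = struct.unpack_from(f"<{merge_count}I", data, 0x50)
    return topic_nodes, list(offsets)

# Synthetic 4096 byte header with 7 topic nodes and 2 merge files.
buf = bytearray(4096)
buf[0:4] = b"T#SM"
struct.pack_into("<I", buf, 0x0C, 7)
struct.pack_into("<I", buf, 0x48, 2)
struct.pack_into("<2I", buf, 0x50, 10, 25)
topic_nodes, merge_offsets = parse_idxhdr(bytes(buf))
```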

/#WINDOWS

This file contains information on the window types in the CHM. It has the following format:

Offset  Type   Comment/Value
0       DWORD  Number of entries in the file
4       DWORD  Size of each of the entries in the file (188 or 196)
8              /#WINDOWS entries to the EOF
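
Splitting the file into its entries is then straightforward. The following is a sketch only, with an invented function name and an assumed little-endian byte order; it relies on the entry size from the header rather than hard-coding 188 or 196.

```python
import struct

def split_windows_entries(data: bytes):
    """Return the raw bytes of each /#WINDOWS entry."""
    count, size = struct.unpack_from("<II", data, 0)
    return [data[8 + i * size: 8 + (i + 1) * size] for i in range(count)]

# Synthetic file with two 188 byte entries filled with marker bytes.
sample = struct.pack("<II", 2, 188) + b"\x01" * 188 + b"\x02" * 188
window_entries = split_windows_entries(sample)
```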

/#WINDOWS entries are basically HH_WINTYPE structures as specified in htmlhelp.h. Note the first DWORD can be used to specify different versions of this structure. Also note that the HHW docs show a different structure to htmlhelp.h. Therefore many CHM files need to be surveyed to find structures with sizes other than 188 or 196. In the description of /#WINDOWS entries below, Arg n means that that item is argument n of the window definition in the hhp file, either converted to a DWORD or to an offset in the indicated file:

The format of each /#WINDOWS entry.
Offset  Type      Comment/Value
0       DWORD     Size of the entry (188 in CHMs compiled with "Compatibility=1.0", 196 in CHMs compiled with "Compatibility=1.1 or later")
4       DWORD     0 (unknown) - but htmlhelp.h indicates that this is "BOOL fUniCodeStrings; // IN/OUT: TRUE if all strings are in UNICODE"
8       DWORD     Arg 0. Offset in /#STRINGS file.
0xC     DWORD     Which window properties are valid & are to be used for this window. See the table below.
0x10    DWORD     Arg 10.
0x14    DWORD     Arg 1. Offset in /#STRINGS file.
0x18    DWORD     Arg 14.
0x1C    DWORD     Arg 15.
0x20    RECT      Arg 13. Order left, top, right & bottom.
0x30    DWORD     Arg 16.
0x34    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "HWND hwndHelp; // OUT: window handle"
0x38    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "HWND hwndCaller; // OUT: who called this window"
0x3C    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "HH_INFOTYPE* paInfoTypes; // IN: Pointer to an array of Information Types"
0x40    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "HWND hwndToolBar; // OUT: toolbar window in tri-pane window"
0x44    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "HWND hwndNavigation; // OUT: navigation window in tri-pane window"
0x48    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "HWND hwndHTML; // OUT: window displaying HTML in tri-pane window"
0x4C    DWORD     Arg 11.
0x50    BYTE[16]  0 (unknown) - but htmlhelp.h indicates that this is a RECT that is "RECT rcHTML; // OUT: HTML window coordinates" & the HHW docs say "Specifies the coordinates of the Topic pane."
0x60    DWORD     Arg 2. Offset in /#STRINGS file.
0x64    DWORD     Arg 3. Offset in /#STRINGS file.
0x68    DWORD     Arg 4. Offset in /#STRINGS file.
0x6C    DWORD     Arg 5. Offset in /#STRINGS file.
0x70    DWORD     Arg 12.
0x74    DWORD     Arg 17.
0x78    DWORD     Arg 18.
0x7C    DWORD     Arg 19 (unknown) - but htmlhelp.h indicates that this is "int tabpos; // IN/OUT: HHWIN_NAVTAB_TOP, HHWIN_NAVTAB_LEFT, or HHWIN_NAVTAB_BOTTOM".
0x80    DWORD     Arg 20 (unknown) - but htmlhelp.h indicates that this is "int idNotify; // IN: ID to use for WM_NOTIFY messages"
0x84    BYTE[20]  0 (unknown) - but htmlhelp.h indicates that this is "BYTE tabOrder[HH_MAX_TABS + 1]; // IN/OUT: tab order: Contents, Index, Search, History, Favorites, Reserved 1-5, Custom tabs"
0x98    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "int cHistory; // IN/OUT: number of history items to keep (default is 30)"
0x9C    DWORD     Arg 7. Offset in /#STRINGS file.
0xA0    DWORD     Arg 9. Offset in /#STRINGS file.
0xA4    DWORD     Arg 6. Offset in /#STRINGS file.
0xA8    DWORD     Arg 8. Offset in /#STRINGS file.
0xAC    BYTE[16]  0 (unknown) - but htmlhelp.h indicates that this is a RECT that is "RECT rcMinSize; // Minimum size for window (ignored in version 1)"
Everything after here is only present in CHMs compiled with "Compatibility=1.1 or later".
0xBC    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "int cbInfoTypes; // size of paInfoTypes;"
0xC0    DWORD     0 (unknown) - but htmlhelp.h indicates that this is "LPCTSTR pszCustomTabs; // multiple zero-terminated strings"
Flags used to specify which values are valid.
Value       Valid property
0x00000002  Navigation Pane Style.
0x00000004  Style Flags.
0x00000008  Extended Style Flags.
0x00000010  Initial Position.
0x00000020  Navigation Pane Width.
0x00000040  Show state.
0x00000080  Info types.
0x00000100  Buttons.
0x00000200  Navigation Pane initially closed state.
0x00000400  Tab pos.
0x00000800  Tab order.
0x00001000  History count.
0x00002000  Default Pane.
0x?????000  The rest of the values either do nothing or are unknown. Please let us know if you find out what the rest are.
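
To show how the flags DWORD at offset 0xC might be interpreted, here is a short Python sketch. The dictionary and function names are made up for this example; the bit values are the ones listed in the table above, and any other bits are reported separately since their meaning is unknown.

```python
VALID_FLAGS = {
    0x00000002: "Navigation Pane Style",
    0x00000004: "Style Flags",
    0x00000008: "Extended Style Flags",
    0x00000010: "Initial Position",
    0x00000020: "Navigation Pane Width",
    0x00000040: "Show state",
    0x00000080: "Info types",
    0x00000100: "Buttons",
    0x00000200: "Navigation Pane initially closed state",
    0x00000400: "Tab pos",
    0x00000800: "Tab order",
    0x00001000: "History count",
    0x00002000: "Default Pane",
}
KNOWN_MASK = sum(VALID_FLAGS)  # OR of all documented bits (they do not overlap)

def decode_valid_flags(value: int):
    """Return (names of documented properties that are set, leftover bits)."""
    names = [name for bit, name in VALID_FLAGS.items() if value & bit]
    return names, value & ~KNOWN_MASK

names, leftover = decode_valid_flags(0x24)
```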

/#STRINGS

This file is a list of ANSI NT strings. The first is just a NIL character so that offsets to that file can specify zero & get a valid string. The strings are in this order; "\0", [WINDOWS] (Arg 0, Arg 1, Arg 7, Arg 9, Arg 2, Arg 3, Arg 4, Arg 5, Arg 6, Arg 8) #n..., Contents_0_Entry_title, Index_0_Keyword, Contents_Image_file, Contents_Font, Contents_Default_frame, Contents_Default_window, [MERGE FILES] #n...
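
Resolving an "Offset in /#STRINGS file" value then amounts to reading up to the next zero byte. A minimal Python sketch, with an invented function name and the assumption that the text is in code page 1252 (real files may use whatever ANSI code page matches their LCID):

```python
def read_string(strings: bytes, offset: int, encoding: str = "cp1252") -> str:
    """Read the null-terminated string that starts at the given offset."""
    end = strings.index(b"\x00", offset)
    return strings[offset:end].decode(encoding)

# Synthetic /#STRINGS contents: the leading NIL, then two strings.
sample_strings = b"\x00Main\x00index.htm\x00"
```

Offset 0 yields the empty string, which is why other files can use zero to mean "no string".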

/#TOCIDX

Present in files with a non-empty contents file, "Binary TOC=Yes" and "Compatibility=1.1 or later".

/#TOPICS

This file contains information on the topics present. It is sorted by URL.

Each entry has the following format.

Offset  Type   Comment/Value
0       DWORD  Unknown
4       DWORD  Offset in /#STRINGS file of title. -1 = no title.
8       DWORD  Offset in /#URLTBL of entry containing offset to /#URLSTR entry containing the URL.
0xC     DWORD  2 indicates not in contents, 6 indicates that it is in the contents (unknown)

/#URLSTR

Before all the entries is an unknown byte.

Each entry has the following format.

Offset  Type   Comment/Value
0       QWORD  0 (unknown)
8              NT string

/#URLTBL

Each entry has the following format.

Offset  Type   Comment/Value
0       DWORD  Unknown
4       DWORD  Unknown
8       DWORD  Offset in /#URLSTR file of URL entry containing URL.
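
Putting /#TOPICS, /#URLTBL and /#URLSTR together, a topic's URL is reached through two hops. The Python sketch below follows that chain as described above; the function name is invented, little-endian byte order and ASCII URLs are assumptions, and the unknown fields are ignored.

```python
import struct

def topic_url(topics: bytes, urltbl: bytes, urlstr: bytes, index: int):
    """Return (title_offset, url, contents_flag) for topic number index."""
    # /#TOPICS entries are 16 bytes; the title offset is read as signed so
    # that "no title" shows up as -1.
    _, title_off, urltbl_off, flag = struct.unpack_from("<IiII", topics, index * 16)
    # The /#URLTBL entry's third DWORD points at a /#URLSTR entry.
    _, _, urlstr_off = struct.unpack_from("<3I", urltbl, urltbl_off)
    # A /#URLSTR entry is a QWORD followed by the NT string.
    start = urlstr_off + 8
    end = urlstr.index(b"\x00", start)
    return title_off, urlstr[start:end].decode("ascii", "replace"), flag

# Synthetic files: one untitled topic that is in the contents.
urlstr = b"\x00" + bytes(8) + b"intro.htm\x00"
urltbl = struct.pack("<3I", 0, 0, 1)
topics = struct.pack("<IiII", 0, -1, 0, 6)
result = topic_url(topics, urltbl, urlstr, 0)
```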

/#IVB

This is basically the [ALIAS] section of the HHP file.

Offset  Type   Comment/Value
0       DWORD  Size of the file minus 4 (num entries = (filelen-4)/8)
4              /#IVB entries to the EOF

/#IVB entries have the following format.

Offset  Type   Comment/Value
0       DWORD  The value of the alias
4       DWORD  Offset in /#STRINGS file of the file to show
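
The entry count formula can be seen at work in a short Python sketch. The function name parse_ivb is hypothetical and little-endian byte order is assumed.

```python
import struct

def parse_ivb(data: bytes):
    """Return [(alias_value, strings_offset), ...] from raw /#IVB bytes."""
    (size,) = struct.unpack_from("<I", data, 0)  # file length minus 4
    count = size // 8
    return [struct.unpack_from("<II", data, 4 + 8 * i) for i in range(count)]

# Synthetic file with two aliases.
sample = struct.pack("<I", 16) + struct.pack("<II", 1000, 1) + struct.pack("<II", 1001, 11)
aliases = parse_ivb(sample)
```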

/#SUBSETS

This file is present when the [SUBSETS] section is present in the HHP file.

Offset  Type   Comment/Value
0       WORD   0 (unknown)
2       WORD   Number of bytes taken up by the subset entries.
4              Subset entries.

The subset entries currently seem to be garbage left over from previous usage of the same memory locations. Based on the number of bytes per non-whitespace line in the [SUBSETS] section each subset entry is 12 BYTEs in length.

/$FIftiMain

Empty when "Full-text search=No" or when no HTML files have been indexed. Holds the full-text search information. Absolutely no line numbers/offsets to files are stored! If you have a word longer than 99 characters in an HTML file then it seems the indexing routines will die during indexing of that file and then skip on to the next one. All word sorting, processing and storage is done case-insensitively and is not case-preserving. Note that files with an extension other than .htm/.html will not contribute keywords to this fast-search index. The function of this file seems to be to store the words found in any of the HTML files, so the search code can quickly eliminate those words that are not present.

All the below stuff has only been tested with one input HTML file.

The file begins with a header that is 0x400 bytes in length.

Offset  Type       Comment/Value
0       BYTE[4]    0x00 0x00 0x28 0x00 (unknown)
4       DWORD      Number of HTML files indexed after any automatic splitting.
8       DWORD      Offset to the last word tree block (4096 less than the file length)
0xC     DWORD      0 (unknown)
0x10    DWORD      Unknown.
0x14    DWORD      Offset to the last word tree block (4096 less than the file length)
0x18    WORD       1/2 (unknown)
0x1A    DWORD      7 (unknown)
0x1E    BYTE       2 (unknown)
0x1F    BYTE       Unknown.
0x20    BYTE       2 (unknown)
0x21    BYTE       Unknown.
0x22    BYTE       2 (unknown)
0x23    BYTE       Unknown.
0x24    BYTE[10]   0 (unknown)
0x2E    DWORD      Length of the word tree blocks (4096).
0x32    DWORD      0/1 (unknown)
0x36    DWORD      Word index of the last duplicate.
0x3A    DWORD      Character index of the last duplicate, counted from the first character of the first word. The whitespace after tags is not included. Entities such as &amp; are counted as one character. Line endings are not counted.
0x3E    DWORD      Length of the longest word in the list not including NT (maximum of 99).
0x42    DWORD      Number of words including duplicates.
0x46    DWORD      Number of words not including duplicates.
0x4A    DWORD      The total length of all the words including duplicates is this DWORD plus the next one. It is unknown how the split is performed.
0x4E    DWORD      This one is usually smaller than the previous one.
0x52    DWORD      Total length of all the words not including duplicates.
0x56    DWORD      Length of unused/null bytes at the end of the word block (if only 1 block, more than total if > 1 block - possible some free space in tables).
0x5A    DWORD      0 (unknown)
0x5E    DWORD      One less than the number of HTML files indexed (not entirely sure)
0x62    BYTE[24]   0 (unknown)
0x7A    DWORD      0x4E4 (unknown)
0x7E    DWORD      LCID from the HHP file.
0x82    BYTE[894]  0 (unknown)

The header is followed by pairs of unknown variable size blocks (presumably a table of urls) and word tree blocks.

The blocks containing the word trees are 4096 bytes in length.

If there are two or more word tree blocks then the second-last one will have a zero next offset, and the last one will only have a WORD header that indicates the length of free space at the end of that block. The last one will also have different word entries, and there is no table before the last word block.

Normal header
Offset  Type   Comment/Value
0       DWORD  Offset to the next word tree block. 0 if this is the second last word tree block or there is only one word tree block.
4       WORD   0 (unknown)
6       WORD   Length of free space at the end of the current word tree block.

This is followed by WORD entries:

Normal entries
Offset  Type   Comment/Value
0       BYTE   Length of the word/partial word in this entry including the NT (Don't count on the NT though). Maximum of 100.
1       BYTE   Position in the word where characters are placed.
2       BYTEs  Length bytes make up the word or part of the word. NT (Don't count on the NT though)
+0      BYTE   Unknown. Some kind of block number?
+1      BYTE   Index number
+2      BYTE   Unknown. Some kind of block number?
+3      DWORD  Unknown. Some kind of block number?
+7      BYTE   How much to increase the index number by for the next entry.
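
One plausible reading of the position byte is that each entry reuses the first "position" characters of the previous word and appends its own bytes. That interpretation is my assumption, not something confirmed above, and the sketch below deliberately works on already-extracted (position, fragment) pairs rather than raw bytes, since the exact entry length rules are uncertain.

```python
def rebuild_words(pairs):
    """Rebuild full words from (position, fragment) pairs.

    Assumption: position is the number of leading characters kept from the
    previously rebuilt word; fragment is then placed at that position.
    """
    words = []
    previous = ""
    for position, fragment in pairs:
        word = previous[:position] + fragment
        words.append(word)
        previous = word
    return words

rebuilt = rebuild_words([(0, "help"), (4, "er"), (2, "llo")])
```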

I found a bug in the normal entries of several CHMs, where the length is 1, the position points to the NT and the data is an 0x02 BYTE. This means that the NT will be overwritten and an invalid address might be accessed, unless the reader is robust. Besides this, no such 0x02 byte occurred in the HTML, and if it did it would not be considered part of a word, so perhaps this has another meaning, such as the BYTEs being ENCINTs.

Last block entries
Offset  Type   Comment/Value
0       BYTE   One more than the length of the word/partial word in this entry.
1       BYTE   Position in the word where characters are placed (0).
2       BYTEs  Length bytes make up the word or part of the word. Not NT
+0      BYTE   Unknown.
+1      BYTE   Unknown.
+2      DWORD  0/1 (unknown)

WORDs are made up of the following characters stored as is: 0x01 (buggy), 0-9, a-z, _, 0xDE, 0xFE. The following are converted and stored: A-Z are converted to lower case; 0x8A, 0x9A are converted to s; 0x8C, 0x9C are converted to oe; 0x9F, 0xDD, 0xFD, 0xFF are converted to y; 0xC0-0xC5, 0xE0-0xE5 are converted to a; 0xC6, 0xE6 are converted to ae; 0xC7, 0xE7 are converted to c; 0xC8-0xCB, 0xE8-0xEB are converted to e; 0xCC-0xCF, 0xEC-0xEF are converted to i; 0xD0 is converted to d; 0xD1, 0xF1 are converted to n; 0xD2-0xD8, 0xF0, 0xF2-0xF8 are converted to o; 0xD9-0xDC, 0xF9-0xFC are converted to u; 0xDF is converted to ss. These conversions may depend on the codepage, character set, font and language set in the HHP file (I'm just guessing here).

There are a few bugs. An 0x01 in a word causes a space to be placed at the end of the word and then the word is joined to the next word. This bug affects the fields in the header. There is also a weird bug where, if the word is 16 characters in length, the word is doubled plus the first 7 chars in length. And there are probably many more hidden ones.
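
The conversion list above can be written down as a lookup table. This Python sketch is only an approximation of the indexer: the names are invented, it treats every byte not listed above as a word separator (my assumption), and it does not try to reproduce the 0x01 or 16-character bugs.

```python
FOLDS = [
    ([0x8A, 0x9A], b"s"),
    ([0x8C, 0x9C], b"oe"),
    ([0x9F, 0xDD, 0xFD, 0xFF], b"y"),
    ([*range(0xC0, 0xC6), *range(0xE0, 0xE6)], b"a"),
    ([0xC6, 0xE6], b"ae"),
    ([0xC7, 0xE7], b"c"),
    ([*range(0xC8, 0xCC), *range(0xE8, 0xEC)], b"e"),
    ([*range(0xCC, 0xD0), *range(0xEC, 0xF0)], b"i"),
    ([0xD0], b"d"),
    ([0xD1, 0xF1], b"n"),
    ([*range(0xD2, 0xD9), 0xF0, *range(0xF2, 0xF9)], b"o"),
    ([*range(0xD9, 0xDD), *range(0xF9, 0xFD)], b"u"),
    ([0xDF], b"ss"),
]

TABLE = {}
# Characters stored as is.
for b in b"0123456789abcdefghijklmnopqrstuvwxyz_\x01\xde\xfe":
    TABLE[b] = bytes([b])
# A-Z are lowercased.
for b in b"ABCDEFGHIJKLMNOPQRSTUVWXYZ":
    TABLE[b] = bytes([b]).lower()
for codes, replacement in FOLDS:
    for b in codes:
        TABLE[b] = replacement

def split_words(raw: bytes):
    """Split raw ANSI text into folded words; unlisted bytes end a word."""
    words, current = [], b""
    for b in raw:
        folded = TABLE.get(b)
        if folded is None:
            if current:
                words.append(current)
                current = b""
        else:
            current += folded
    if current:
        words.append(current)
    return words

folded_words = split_words(b"Stra\xdfe \x8cuvre \xc0B_1")
```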

The remaining notes are speculation. There are different blocks of bytes, of several types. My first guess was that it works like a tree: at each letter of a word you can either terminate (if there are no duplicates at that letter) and specify the rest of the word, or you can branch out to each variant of that letter. The advantage of this is that you don't need to search a huge list every time you do a search.
That first guess seems to be wrong. First you have the whole word at level zero, then you take the rest of the words that begin with the same character and sort them. I think there are different blocks for terminal nodes (leaves) and branching nodes (branches).
The words are set to lowercase.
There would also be some sort of table to point to the urls that contain the words.
I think that the word blocks form a tree-like structure, built as follows: 1. slurp all the words out of the HTML; 2. sort them; 3. weed out duplicates.

/$OBJINST

More information is needed on this file.

From the name and the number of GUIDs present I guess it has something to do with ActiveX objects.

Offset  Type   Comment/Value
0       DWORD  0x04000000 (unknown)
4       DWORD  Number of entries

This is followed by a listing. Each listing entry has the following format:

Offset  Type   Comment/Value
0       DWORD  Offset of the entry in this file
4       DWORD  Length of the entry

The listing is followed by the entries one after another at offsets specified in the listing.
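
The header and listing can be used to cut the file into its entries. A minimal Python sketch, with a made-up function name and assuming little-endian DWORDs and that the listing starts directly after the 8 byte header:

```python
import struct

def parse_objinst(data: bytes):
    """Return (first_dword, [entry_bytes, ...]) from raw /$OBJINST bytes."""
    first_dword, count = struct.unpack_from("<II", data, 0)
    listing = [struct.unpack_from("<II", data, 8 + 8 * i) for i in range(count)]
    return first_dword, [data[offset:offset + length] for offset, length in listing]

# Synthetic file: one 4 byte entry located right after the listing.
sample = struct.pack("<II", 0x04000000, 1) + struct.pack("<II", 16, 4) + b"ABCD"
first_dword, objinst_entries = parse_objinst(sample)
```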

There are 2 known types of entries. The first seems to be made up of up to 3 different sub entries. The second is a 36 BYTE structure.

The first entry
Offset  Type   Comment/Value
0       GUID   {4662DAAF-D393-11D0-9A56-00C04FB68BF7}
0x10    DWORD  0x04000000 (unknown) Possibly a big-endian version number of the class that the GUID refers to.
0x14    DWORD  Unknown. Methinks bitflags that somehow affect the size of entries that have the 0x04000000 DWORD, like each bit specifies the presence/absence of a specific subentry.
0x18    DWORD  0x4e4 (unknown)
0x1C    DWORD  LCID from the HHP file.
0x20    BYTEs  unknown
+0             Entries

I haven't been able to find any files without the data for bits 0 & 1 so I can't really say exactly how big the header is and which bytes are part of the bit 0 block and which are part of the bit 1 block. Together, though, bits 0 & 1 account for a large bulk of repeatedly increasing byte blocks of 10 bytes each, plus something else at the end. I suspect that the repeats are for bit 0 and the stuff at the end is bit 1. As to the function of these two bits blocks, well there are no GUIDs and no other clues, so who knows.

bit 2. Only present when "Full text search stop list file" has been specified in the HHP.
Offset  Type      Comment/Value
0       char[4]   ""(\0
4       DWORD     Length in bytes of the entries not including the last zero word.
8       BYTE[32]  0 (unknown)
0x28              Entries. The last entry has a zero length word.

bit 2 entries
Offset  Type          Comment/Value
0       WORD          Length of the word
2       char[length]  ANSI string from the stop list file. Not NT.

bit 3
Offset  Type   Comment/Value
0       GUID   {8FA0D5A8-DEDF-11D0-9A61-00C04FB68BF7}
0x10    DWORD  0x04000000 (unknown) Possibly a big-endian version number of the class that the GUID refers to.
0x14    DWORD  1 (unknown)
0x18    DWORD  0x4e4 (unknown)
0x1C    DWORD  LCID from the HHP file.
0x20    DWORD  0 (unknown)

The second entry
Offset  Type   Comment/Value
0       GUID   {4662DAB0-D393-11D0-9A56-00C04FB68B66}
0x10    DWORD  666 (May represent the version of the class that the GUID refers to)
0x14    DWORD  0x4e4 (unknown)
0x18    DWORD  LCID from the HHP file.
0x1C    DWORD  Unknown. Almost always 10031. Also 66631 (accessib.chm from the MSDN).
0x20    DWORD  0 (unknown)

/$HHTitleMap

The file begins with a WORD indicating the number of entries.

Each entry has the following format:

Offset  Type   Comment/Value
0       WORD   Length of the file stem.
2       BYTEs  File stem. ANSI string. Not NT.
+0      DWORD  Unknown.
+4      DWORD  Unknown. Same value as previous DWORD.
+8      DWORD  LCID of the specified file.
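
Because the stems are length-prefixed, the entries have to be read sequentially. A Python sketch of that loop follows; the function name is invented, and little-endian integers and code page 1252 stems are assumptions.

```python
import struct

def parse_hhtitlemap(data: bytes, encoding: str = "cp1252"):
    """Return [(stem, unknown1, unknown2, lcid), ...] from /$HHTitleMap."""
    (count,) = struct.unpack_from("<H", data, 0)
    pos = 2
    result = []
    for _ in range(count):
        (stem_len,) = struct.unpack_from("<H", data, pos)
        pos += 2
        stem = data[pos:pos + stem_len].decode(encoding)  # not NT
        pos += stem_len
        unknown1, unknown2, lcid = struct.unpack_from("<3I", data, pos)
        pos += 12
        result.append((stem, unknown1, unknown2, lcid))
    return result

# Synthetic file with a single entry for "foo".
sample = struct.pack("<H", 1) + struct.pack("<H", 3) + b"foo" + struct.pack("<3I", 5, 5, 0x409)
title_map = parse_hhtitlemap(sample)
```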

/$TitleMap

The file begins with a WORD indicating the number of entries.

Each entry is 68 BYTEs in length and has the following format:

Offset  Type      Comment/Value
0       BYTE[25]  File stem. ANSI NT fixed length string.
0x19    BYTE[25]  Unknown. Seems to be RAM litter, but contains paths, file names, zero bytes, DWORDs and mixtures.
0x32    WORD      An index number that begins at 1 and is incremented by 1 for each entry.
0x34    DWORD     Unknown.
0x38    DWORD     Unknown. Same value as previous DWORD.
0x3C    DWORD     LCID of the specified file.
0x40    DWORD     Number of topic nodes including the contents & index files in the specified file.
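
Since the entries are fixed-size, the whole 68 byte layout fits in one struct format. This is a sketch under the usual assumptions (little-endian, code page 1252 stems, hypothetical names); the second 25 byte field and the two unknown DWORDs are read but discarded.

```python
import struct

# 25 + 25 + 2 + 4*4 = 68 bytes, no padding in "<" mode.
TITLEMAP_ENTRY = struct.Struct("<25s25sH4I")

def parse_titlemap(data: bytes, encoding: str = "cp1252"):
    """Return [(stem, index, lcid, topic_nodes), ...] from /$TitleMap."""
    (count,) = struct.unpack_from("<H", data, 0)
    result = []
    for i in range(count):
        stem, _, index, _, _, lcid, topic_nodes = TITLEMAP_ENTRY.unpack_from(
            data, 2 + i * TITLEMAP_ENTRY.size)
        # The stem is NT inside its fixed 25 byte field.
        stem = stem.split(b"\x00", 1)[0].decode(encoding)
        result.append((stem, index, lcid, topic_nodes))
    return result

# Synthetic file with one entry; "25s" pads short values with zero bytes.
sample = struct.pack("<H", 1) + TITLEMAP_ENTRY.pack(b"foo", b"", 1, 9, 9, 0x409, 12)
titles = parse_titlemap(sample)
```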

/$WWAssociativeLinks/BTree

This file has a 76 byte header, then 2048 byte entries. A WORD at offset 4 indicates the entry size. The number of entries is written in the header as a DWORD at offset 0x26. Each entry has a 46 or 42 byte header, then Unicode strings (NT), followed by 27 or 25 bytes of unknown data, and then NULL bytes up to the end of the entry.

/$WWAssociativeLinks/Data

This file contains entries that are 13 bytes in length. All known entries have thus far contained the following bytes: 00000000 05000000 80000000 00.

/$WWAssociativeLinks/Map

Begins with a WORD indicating the number of entries. Each entry is 2 DWORDs. The first is an offset of some kind & the second is a consecutively increasing index number.

/$WWAssociativeLinks/Property

If there are no ALinks in the CHM then this will be a zero DWORD.

/$WWKeywordLinks/BTree

This file has a 76 byte header, then 2048 byte entries. The number of entries is written in the header as a DWORD at offset 0x26.

/$WWKeywordLinks/Data

This file contains entries that are 13 bytes in length. All known entries have thus far contained the following bytes: 00000000 05000000 80000000 00.

/$WWKeywordLinks/Map

Begins with a WORD indicating the number of entries. Each entry is 2 DWORDs. The first is an offset of some kind & the second is a consecutively increasing index number.

/$WWKeywordLinks/Property

If there are no KLinks in the CHM then this will be a zero DWORD.

/Path/file.chm/windowtype

Begins with a DWORD indicating the size of the file in bytes (I have only seen 44). I think it is a cache of bits of the windowtype entry from the /#WINDOWS file of the \Path\file.chm CHM file. Some of the data, such as window size & position, will be different if the user has customized it.

/Path/file.chm/AdvSearchUI/Keywords

UTF-16 NT string. Each search item is separated by a UTF-16 Line Feed character. The string is followed by an unknown WORD.

/Path/file.chm/AdvSearchUI/Properties

DWORD. Only the lowest 3 bits are used. "Match similar words" is controlled by bit 0. "Search titles only" is controlled by bit 1. "Search previous results" is controlled by bit 2. Note that since previous search results are not stored anywhere as yet HH will uncheck the "Search previous results" checkbox even if its bit is on. IMHO this is a bug: HH should automatically search the whole file if there are no previous results and the checkbox is checked.
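
The three bits can be decoded as follows. This is just an illustrative Python sketch; the function and key names are invented.

```python
def decode_search_properties(value: int) -> dict:
    """Decode the lowest 3 bits of the AdvSearchUI/Properties DWORD."""
    return {
        "match_similar_words": bool(value & 0x1),      # bit 0
        "search_titles_only": bool(value & 0x2),       # bit 1
        "search_previous_results": bool(value & 0x4),  # bit 2
    }

props = decode_search_properties(5)
```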

/Path/file.chm/Bookmarks/v1/Count

A DWORD indicating the number of favourites stored for the \Path\file.chm CHM file.

/Path/file.chm/Bookmarks/v1/n/Topic

An NT UTF-16 string showing the topic name of bookmark number n (n is zero based).

/Path/file.chm/Bookmarks/v1/n/Url

An NT UTF-16 string showing the URL of bookmark number n (n is zero based). It is a fully qualified path into the \Path\file.chm CHM file.


Please let us know if you find any other internal files or figure out formats of any internal files.