Not too long after the last post, it became apparent that the disk image I was analyzing didn't contain a valid filesystem. Possibly due to a transfer error, several bytes were missing, leaving disk structures misaligned with the addresses where they should have resided.
After generating a new image I was able to make a lot of headway on the analysis. To start off, a valid metadata page address became immediately apparent on page 0x1e. Recall that page 0x1e is the first metadata page, residing at a fixed / known location after the start of the partition:
bytes 0xA0-A7: 90 19 00 00 00 00 00 00
bytes 0xA8-AF: 67 31 01 00 00 00 00 00
Pages 0x1990 and 0x13167 are valid metadata pages containing similar contents. Most likely one is a backup of the other. For now I'm assuming the first record (0x1990) is the primary copy.
Note this address appears at byte 0xA0 on page 0x1E. Byte 0xA0 is referenced earlier on in the page:
byte 0x50: A0 00 00 00 02 00 00 00 B0 00 00 00 18 00 00 00
So it is possible that this page address is not stored at a static location but at an offset referenced earlier in the page.
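A rough Ruby sketch of that indirection. It assumes the dword at 0x50 is the offset to the address list and that the following dword (0x02) is the number of 8-byte addresses. Both are guesses on my part, and `root_page_addresses` is a made-up helper name:

```ruby
# Hypothetical helper: follow the offset stored at byte 0x50 of page 0x1e
# to the metadata page addresses.
# Assumption: dword 0x50 is the list offset, dword 0x54 the entry count.
def root_page_addresses(page)
  offset, count = page.byteslice(0x50, 8).unpack('VV')
  page.byteslice(offset, count * 8).unpack('Q<*') # 64-bit little-endian ids
end

# Synthetic page 0x1e holding only the bytes quoted above.
page = "\x00".b * 0x4000
page[0x50, 8]  = ['A000000002000000'].pack('H*')
page[0xA0, 16] = ['90190000000000006731010000000000'].pack('H*')

puts root_page_addresses(page).map { |id| format('0x%x', id) }.join(', ')
```

If the guess about the count field is wrong, the helper will simply read the wrong number of addresses, so treat its result as a hint rather than a fact.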
The System Table
The word 6 value (previously referred to as the virtual page number) of page 0x1990 is '0', indicating this is a critical table. Let's call this the System Table for the reasons found below.
This page contains 6 rows of 24-byte entries, each containing a valid metadata partition, some flags, and a 16-byte unique id/checksum of some sort.
Early on in the page the table header resides:
byte 0x58: 06 00 00 00 98 00 00 00 B0 00 00 00 C8 00 00 00 E0 00 00 00 F8 00 00 00 10 01 00 00
06 is the number of records and each dword after this contains the offset from the very start of the page to each table record:
table offsets: 98, B0, C8, E0, F8, 110
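Reading that header can be sketched in Ruby as follows. This assumes the header always sits at byte 0x58 (only seen on this one page so far) and that records are 24 bytes each; the method names are made up for illustration:

```ruby
# Hypothetical helper: read the record count and the per-record offsets
# from the table header. Assumption: the header starts at byte 0x58.
def table_record_offsets(page, header_offset = 0x58)
  count = page.byteslice(header_offset, 4).unpack1('V')      # number of records
  page.byteslice(header_offset + 4, count * 4).unpack('V*')  # offsets from page start
end

# Slice out each record using those offsets (24-byte records assumed).
def table_records(page, record_size = 24)
  table_record_offsets(page).map { |off| page.byteslice(off, record_size) }
end

# Synthetic page holding only the header bytes quoted above.
page = "\x00".b * 0x4000
page[0x58, 28] = [6, 0x98, 0xB0, 0xC8, 0xE0, 0xF8, 0x110].pack('V7')

puts table_record_offsets(page).map { |off| format('%X', off) }.join(', ')
```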
Each table record has a page id, flags, and some other unique qword of some sort (perhaps an object id or checksum).
page id    corresponding virtual page id
2c2        2
22         E
28         D
29         C
2c8        1
2c5        3
These correspond to the latest revisions of the critical system pages highlighted in previous analysis.
We've previously established that Virtual Page 0x2 contains the object table, and upon further examination of the keys (object ids) and values (page ids) we see object 0x0000000000006000000000000000000 is the root directory (this is consistent across images).
The format of a directory page varies depending on its type. Like all metadata pages, the first 0x30 bytes contain the page metadata. This is followed by an attribute of unknown purpose (it seems to be related to the page's contents, perhaps a generic bucket / container descriptor).
This is followed by the table header attribute, 0x20 bytes in length.
This attribute contains:
- bytes 0x4-0x7: the total length of the table. Note this length includes this attribute, so 0x20 should be subtracted before parsing
- bytes 0xC-0xD: flags that seem to indicate the intent of the table
Table Type Flags:
- 00 02 - directory list
- 01 03 - b+ tree
Table records here work like any other table consisting of
- the length of the record, (4 bytes)
- offset to the key, (2 bytes)
- length of the key, (2 bytes)
- flags, (2 bytes)
- offset to the value, (2 bytes)
- length of the value (2 bytes)
- padding (2 bytes)
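The 16-byte record header listed above maps onto a single `unpack` call in Ruby. A small sketch; `RecordHeader` and `parse_record_header` are names I made up for illustration:

```ruby
# Field order follows the list above: one 4-byte length, then six 2-byte fields.
RecordHeader = Struct.new(:length, :key_offset, :key_length, :flags,
                          :value_offset, :value_length, :padding)

# Hypothetical helper: decode the first 16 bytes of a table record.
# 'V' is a 32-bit and 'v' a 16-bit little-endian unsigned integer.
def parse_record_header(bytes)
  RecordHeader.new(*bytes.byteslice(0, 16).unpack('Vv6'))
end

# Header bytes of a file-table record seen on my image.
header = parse_record_header(['80010000' '10000E00' '08002000' '60010000'].pack('H*'))
puts format('length=0x%x key=0x%x+0x%x flags=0x%x value=0x%x+0x%x',
            header.length, header.key_offset, header.key_length,
            header.flags, header.value_offset, header.value_length)
```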
The semantics of the record values differ depending on the table type.
Directory lists contain:
- keys: file names
- values: file tables containing file timestamps and data pages
B+ trees contain:
- keys: b+ node id (file names)
- values: directory pages
When iterating over directory list records, the record flags seem to indicate record context. A value of '4' stored in the record flags seems to indicate a historical / old entry, for example an old directory name before it was renamed (e.g. 'New Folder'). The files / directories we are interested in contain '0' or '8' in the record flags.
The intent of each matching directory list record can be further deduced from the first 4 bytes of its key, which may be:
0x00000010 - directory information
0x00020030 - subdirectory - name will be the rest of the key
0x00010030 - file - name will be the rest of the key
0x80000020 - ???
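Putting the flag check and the key prefix together, a directory list record could be classified roughly like this in Ruby. The sketch assumes the 4-byte prefix is stored as a little-endian dword (I haven't pinned down the byte order here), and the constant and method names are made up. The name is returned as raw bytes since its encoding isn't covered above:

```ruby
# Key prefixes from the list above (assumed little-endian dwords).
KEY_TYPES = {
  0x00000010 => :directory_info,
  0x00020030 => :subdirectory,
  0x00010030 => :file,
  0x80000020 => :unknown_0x80000020
}.freeze

# Record flags that mark entries we care about; 4 looks like a stale entry.
LIVE_FLAGS = [0x0, 0x8].freeze

# Hypothetical helper: returns [kind, raw_name_bytes].
def classify_directory_record(flags, key)
  return [:not_current, nil] unless LIVE_FLAGS.include?(flags)

  type = key.byteslice(0, 4).unpack1('V')
  [KEY_TYPES.fetch(type, :unrecognized), key.byteslice(4, key.bytesize)]
end

kind, name = classify_directory_record(0x8, [0x00010030].pack('V') + 'a.txt'.b)
puts "#{kind}: #{name.inspect}"
```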
In the case of subdirectories, the first 16 bytes of the record value will contain the directory object id. The object table can be used to look this up to access its page.
For B+ trees the record values will contain the ids of pages containing directory records (and possibly more B+ levels though I didn't verify this). Full filesystem traversal can be implemented by iterating over the root tree, subdirs, and file records.
File metadata is stored as a table embedded directly into the directory table which the file is under.
Each file table always starts with an attribute of length 0xA8 containing the file timestamps (4 qwords starting at byte 0x28 of this attribute) and the file length (starting at byte 0x68 of this attribute).
Note the actual units of time which the timestamps represent are still unknown.
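In Ruby, pulling those fields out of the 0xA8-length attribute might look like the sketch below. It assumes the file length is a 64-bit value (I only know where it starts), and it leaves the timestamps as raw integers since their units are unknown. `parse_file_attribute` is a made-up name:

```ruby
# Hypothetical helper: extract the timestamps and length from the first
# attribute of a file table.
def parse_file_attribute(attr)
  {
    timestamps: attr.byteslice(0x28, 32).unpack('Q<4'), # 4 raw qwords, units unknown
    length:     attr.byteslice(0x68, 8).unpack1('Q<')   # assumed to be 64 bits wide
  }
end

# Synthetic attribute with placeholder timestamps and a length of 0x2052C.
attr = "\x00".b * 0xA8
attr[0x28, 32] = [1, 2, 3, 4].pack('Q<4')
attr[0x68, 8]  = [0x2052C].pack('Q<')

info = parse_file_attribute(attr)
puts "length: #{info[:length]} bytes"
```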
After this there are several related metadata attributes.
The second attribute (starting at byte 0xA8 of the file table):
20 00 00 00 # length of this record
A0 01 00 00 # length of this record + next record
D4 00 00 00 # amount of padding after next record
00 02 00 00 # table type / flags ?
74 02 00 00 # next 'insert' address ?
01 00 00 00 # number of records ?
78 02 00 00 # offset to padding
00 00 00 00
The next record looks like a standard table record as we've seen before:
80 01 00 00 # length of this record, note this equals 2nd dword value of last record minus 0x20
10 00 0E 00 # offset to key / key length
08 00 20 00 # flags / offset to value
60 01 00 00 # value length / padding
The key of this record starts at 0x10 of this attribute and is 0x0E length:
60 01 00 00 00 00 00 00 80 00 00 00 00 00 00
The value starts at attribute offset 0x20 and is of length 0x160. This value contains yet another embedded attribute:
88 00 00 00 # length of attribute
28 00 01 00 01 00 00 00 20 01 00 00 20 01 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 00 2C 05 02 00 # length of the file
00 00 00 00 2C 05 02 00 # length of the file
00 00 00 00 # 0's for the rest of this attribute
The file length is represented twice in this attribute (perhaps allocated & actual lengths)
The next attribute is as follows:
20 00 00 00 # length of attribute
50 00 00 00 # length of this attribute + length of next attribute
84 00 00 00 # amount of padding after this attribute
00 02 00 00 # ?
D4 00 00 00 # next insert address
01 00 00 00 # ?
D8 00 00 00 # offset to padding
00 00 00 00
The format of this attribute looks similar to the second in the file (see above) and seems to contain information about the next record(s). It is perhaps related to the 'bucket' concept discussed here.
At first glance the next attribute looks like another standard record, but the key and value offsets are the same. This attribute contains the starting page # of the file content:
30 00 00 00 # length of this record
10 00 10 00 # key offset / length ?
00 00 10 00 # flags / value offset ?
20 00 00 00 # value length / padding ?
00 00 00 00 00 00 00 00 0C 00 00 00 00 00 00 00 D8 01 00 00 # starting page of the file
00 00 00 00 00 00 00 08 00 00 00 00
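Counting bytes in the record above, the starting page sits at record offset 0x20. A small Ruby sketch to pull it out, assuming it is a 32-bit value and that only 0x30-length records use this layout (`file_start_page` is a made-up name):

```ruby
# Hypothetical helper: read the starting page from a 0x30-length record.
# Returns nil for records whose length field doesn't match this layout.
def file_start_page(record)
  length = record.byteslice(0, 4).unpack1('V')
  return nil unless length == 0x30 && record.bytesize >= 0x30

  record.byteslice(0x20, 4).unpack1('V') # assumed 32-bit little-endian
end

# The record shown above, byte for byte.
record = ['30000000' '10001000' '00001000' '20000000' \
          '0000000000000000' '0C00000000000000' \
          'D8010000' '00000000' '00000008' '00000000'].pack('H*')

puts format('starting page: 0x%x', file_start_page(record))
```

Records with other formats (compressed / sparse ones, possibly) would need their own handling, so returning nil for anything unexpected seems like the safer default.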
For larger files there are more records following this attribute, each of 0x30 length, w/ the same record header. Many of the values contain the pages containing the file contents, though only some have the same format as the one above.
Other records may correspond to compressed / sparse attributes and have a different format.
The remainder of this attribute is zero and closes out the third attribute in the file record.
After this there is the amount of padding described by the second attribute in the file (see above) after which there are two more attributes of unknown purpose.
After investigation it seems the ReFS file system driver doesn't clear a page when copying / overwriting shadow pages. Old data was apparent after valid data on newer pages. Thus a parser cannot rely on zeroed-out regions to act as delimiters or end markers.
Using the above analysis I threw together a ReFS file lister that iterates over all directories and files from the root. It can be found on github here.
Use it like so:
ruby rels.rb --image foo.image --offset 123456789
Besides verifying all of the above, the next major action items are to extract the pages / clusters containing file data as well as all file metadata.
Before we begin, I've found these to be the best existing public resources so far concerning the FS, they've helped streamline the investigation greatly.
blogs.msdn.com - Straight from the source, an MSDN blog post on various concepts around the FS internals.
williballenthin.com - An extended analysis of the high level layout and data structures in ReFS. I've verified a lot of these findings using my image locally and expanded upon various points below. Aspects of the described memory structures can be seen in the images locally.
Note in general it's good to be familiar w/ generic FS concepts and ones such as B+ trees and journaling.
Also familiarity w/ the NTFS filesystem helps.
Also note I'm not guaranteeing the accuracy of any of this, there could be mistakes in the data and/or algorithm analysis.
Volume / Partition Layout
The size of the image I analyzed was 92733440 bytes with the ReFS formatted partition starting at 0x2010000.
The first sector of this partition looks like:
byte 0x00: 00 00 00 52 65 46 53 00 00 00 00 00 00 00 00 00
byte 0x10: 46 53 52 53 00 02 12 E8 00 00 3E 01 00 00 00 00
byte 0x20: 00 02 00 00 80 00 00 00 01 02 00 00 0A 00 00 00
byte 0x30: 00 00 00 00 00 00 00 00 17 85 0A 9A C4 0A 9A 32
Since some size info presumably needs to be here, it is possible that:
vbr bytes 0x20-0x23 : bytes per sector (0x0200)
vbr bytes 0x24-0x27 : sectors per cluster (0x0080)
1 sector = 0x200 bytes = 512 bytes
0x80 sectors/cluster * 0x200 bytes/sector = 0x10000 bytes/cluster = 65536 = 64KB/cluster
Clusters are broken down into pages which are 0x4000 bytes in size (see  for page id analysis).
In this case:
0x10000 (bytes / cluster) / 0x4000 (bytes/page) = 4 pages / cluster
0x4000 (bytes/page) / 0x200 (bytes/sector) = 0x20 = 32 sectors per page
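The same arithmetic as a small Ruby sketch. It assumes the two VBR fields really are bytes per sector and sectors per cluster (still a guess) and that pages are always 0x4000 bytes; `volume_geometry` is a made-up name:

```ruby
PAGE_SIZE = 0x4000 # bytes per page, as observed on my image

# Hypothetical helper: derive the volume geometry from the first sector.
# Assumption: dword 0x20 = bytes per sector, dword 0x24 = sectors per cluster.
def volume_geometry(vbr)
  bytes_per_sector, sectors_per_cluster = vbr.byteslice(0x20, 8).unpack('VV')
  cluster_size = bytes_per_sector * sectors_per_cluster
  {
    bytes_per_sector:    bytes_per_sector,
    sectors_per_cluster: sectors_per_cluster,
    cluster_size:        cluster_size,
    pages_per_cluster:   cluster_size / PAGE_SIZE,
    sectors_per_page:    PAGE_SIZE / bytes_per_sector
  }
end

# Synthetic first sector holding only the two fields of interest.
vbr = "\x00".b * 0x200
vbr[0x20, 8] = ['0002000080000000'].pack('H*')

volume_geometry(vbr).each { |name, value| puts format('%s: 0x%x', name, value) }
```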
VBR bytes 0-0x16 are the same for all the ReFS volumes I've seen.
This block is followed by 0's until the first page.
According to :
"The roots of these allocators as well as that of the object table are reachable from a well-known location on the disk"
On the images I've seen the first page id always is 0x1e, starting 0x78000 bytes after the start of the partition.
Metadata pages all have a standard header which is 0x30 (48) bytes in length:
byte 0x00: XX XX 00 00 00 00 00 00 YY 00 00 00 00 00 00 00
byte 0x10: 00 00 00 00 00 00 00 00 ZZ ZZ 00 00 00 00 00 00
byte 0x20: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bytes 0/1 (XX XX) are the page id, which is sequential and corresponds to the page's offset in units of 0x4000 bytes
byte 8 (YY) is the sequence number
bytes 0x18/0x19 (ZZ ZZ) are the virtual page number
The page id is unique for every page in the FS. The virtual page number will be the same between journals / shadow pages though the sequence is incremented between those.
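A Ruby sketch of reading this header, following the byte positions in the dump above. It treats each field as a 64-bit little-endian value, which is an assumption since only the low bytes were ever non-zero on my image; `PageHeader`, `parse_page_header` and `page_offset` are made-up names:

```ruby
PageHeader = Struct.new(:page_id, :sequence, :virtual_page)

# Hypothetical helper: decode the 0x30-byte metadata page header.
# Qwords at 0x00, 0x08 and 0x18 are used; the one at 0x10 is skipped.
def parse_page_header(page)
  page_id, sequence, _unknown, virtual_page = page.byteslice(0, 0x20).unpack('Q<4')
  PageHeader.new(page_id, sequence, virtual_page)
end

# Byte offset of a page relative to the start of the partition.
def page_offset(page_id, page_size = 0x4000)
  page_id * page_size
end

# Synthetic header: page 0x2c2, sequence 5, virtual page 2.
header = [0x2c2, 5, 0, 2, 1, 0].pack('Q<6')
parsed = parse_page_header(header)
puts format('page 0x%x (virtual 0x%x) at partition offset 0x%x',
            parsed.page_id, parsed.virtual_page, page_offset(parsed.page_id))
```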
From there the root page has a structure which is still unknown (likely a tree root as described  and indicated by the memory structures page on ).
The 0x1f page is skipped before pages resume at 0x20 and follow a consistent format.
Page Layout / Tables
After the page header, metadata pages consist of entries prefixed with their length. The meaning of these entities varies and is largely unknown, but various fixed and relational byte values do show consistency and/or exhibit certain patterns.
To parse the entries (which might be referred to as records or attributes), one could:
- parse the first 4 bytes following the page header to extract the first entry length
- parse the remaining bytes from the entry (note the total length includes the first four bytes containing the length specification).
- parse the next 4 bytes for the next entry length
- repeat until the length is zero
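The loop above can be sketched in Ruby as below. `each_entry` is a made-up name, and the bounds checks are my own addition so a corrupt length can't run off the end of the page:

```ruby
# Hypothetical helper: walk the length-prefixed entries of a metadata page,
# yielding each entry's offset and raw bytes (length field included).
def each_entry(page, start = 0x30)
  return enum_for(:each_entry, page, start) unless block_given?

  pos = start
  while pos + 4 <= page.bytesize
    length = page.byteslice(pos, 4).unpack1('V')
    break if length.zero?                               # documented terminator
    break if length < 4 || pos + length > page.bytesize # guard against bad lengths

    yield pos, page.byteslice(pos, length)
    pos += length
  end
end

# Synthetic page: empty header, an 8-byte entry, a 16-byte entry, then zeros.
page = ("\x00".b * 0x30) + [8].pack('V') + 'AAAA'.b +
       [0x10].pack('V') + ('B'.b * 12) + ("\x00".b * 8)

each_entry(page) { |offset, entry| puts format('0x%x: %d bytes', offset, entry.bytesize) }
```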
The four bytes following the length often take on one of two formats depending on the type of entity:
- the first two bytes contain the entity type with the other two containing flags (this hasn't been fully confirmed)
- if the entity is a record in a table, the first two bytes will be the offset to the record key and the other two will be the key length.
If the entry is a table record,
- the next two bytes are the record flags,
- the next two bytes are the value offset,
- the next two bytes are the value length,
- the next two bytes are padding (0's)
These values can be seen in the memory structures described in . An example record looks like:
bytes 0-3:   50 00 00 00 # attribute length
bytes 4-7:   10 00 10 00 # key offset / key length
bytes 8-B:   00 00 20 00 # flags / value offset
bytes C-F:   30 00 00 00 # value length / padding
bytes 10-1F: 00 00 00 00 00 00 00 00 20 05 00 00 00 00 00 00 # key (@ offset 0x10 and of length 0x10)
bytes 20-2F: E0 02 00 00 00 00 00 00 00 00 02 08 08 00 00 00 # -|
bytes 30-3F: 1F 42 82 34 7C 9B 41 52 00 00 00 00 00 00 00 00 # |-value (@ offset 0x20 and length 0x30)
bytes 40-4F: 08 00 00 00 08 00 00 00 00 05 00 00 00 00 00 00 # -|
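Slicing the key and value out of that example record in Ruby could look like this. `record_key_value` is a made-up name, and the offsets are taken to be relative to the start of the record, which matches the example:

```ruby
# The example record above, byte for byte.
EXAMPLE_RECORD = [[
  '50 00 00 00 10 00 10 00 00 00 20 00 30 00 00 00',
  '00 00 00 00 00 00 00 00 20 05 00 00 00 00 00 00',
  'E0 02 00 00 00 00 00 00 00 00 02 08 08 00 00 00',
  '1F 42 82 34 7C 9B 41 52 00 00 00 00 00 00 00 00',
  '08 00 00 00 08 00 00 00 00 05 00 00 00 00 00 00'
].join.delete(' ')].pack('H*')

# Hypothetical helper: return [key_bytes, value_bytes] for a table record.
def record_key_value(record)
  _length, key_off, key_len, _flags, val_off, val_len = record.unpack('Vv5')
  [record.byteslice(key_off, key_len), record.byteslice(val_off, val_len)]
end

key, value = record_key_value(EXAMPLE_RECORD)
puts "key:   #{key.unpack1('H*')}"
puts "value: #{value.unpack1('H*')}"
```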
Various attributes and values in them take on particular meaning.
- the first attribute (type 0x28) has information about the page contents,
- Bytes 1C-1F of the first attribute seem to be a unique object-id / type which can identify the intent of the page (it is consistent between similar pages on different images). It is also repeated in bytes 0x20-0x23
- Byte 0x20 of the first attribute contains the number of records in the table. This value is repeated in the record collection attribute. (see next bullet)
- Before the table collection begins there is an 0x20 length attribute, containing the number of entries at byte 0x14. If the table gets too long this value will be 0x01 instead and there will be an additional entry before the collection of records (this entry doesn't seem to follow the conventional rules as there are an extra 40 bytes after the entry end indicated by its length)
- The collection of table records is simply a series of attributes, all beginning w/ the same header containing key and value offset and length (see previous section)
Particular pages seem to take on specified connotations:
- 0x1e is always the first / root page and contains a special format. 0x1f is skipped before pages start at 0x20
- On the image I analyzed 0x20, 0x21, and 0x22 were individual pages containing various attributes and tables w/ records.
- 0x28-0x38 were shadow pages of 0x20, 0x21, 0x22
- 0x2c0-0x2c3 seemed to represent a single table with various pages being the table, continuation, and shadow pages. The records in this table have keys w/ a unique id of some sort as well as cluster id's and checksum so this could be the object table described in 
- 0x2c4-0x2c7 represented another table w/ shadow pages. The records in this table consisted of two 16 byte values, both which refer to the keys in the 0x2c0 tables. If those are the object id's this could potentially be the object tree.
- 0x2c8 represents yet another table, possibly a system table due to its low virtual page number (01)
- 0x2cc-0x2cf - consisted of a metadata table and its shadow pages, the 'ReFs Volume' volume name could be seen in the UTF there.
The rest of the pages were either filled with 0's or non-metadata pages containing content. Of particular note are pages 0x2d0 - 0x2d7 containing the upcase table (as seen in NTFS).
I've thrown together a simple ReFS parser using the above assumptions and put it up on github via a gist.
To use it, download it and run it with ruby:
ruby resilience.rb -i foo.image --offset 123456789 --table --tree
You should get output similar to the following:
Of course if it doesn't work it could be because there are differences between our images that are unaccounted for, in which case if you drop me a line we can tackle the issue together!
The next steps on the analysis roadmap are to continue diving into the page allocation and addressing mechanisms; there are most likely additional mechanisms to navigate to the critical data structures directly from the first sector or page 0x1e (since the address of that is known / fixed). Continuing to investigate each page and analyze its contents, especially in the scope of various file and system changes, should also go a long way toward revealing semantics.