This is the schema for Digital Forensics XML, version 1.1.0.
To report issues, questions, or feature requests, please either:
* File a GitHub issue at this repository, after checking whether it has already been filed: https://github.com/dfxml-working-group/dfxml_schema
* Email the dfxml@nist.gov mailing list. If you wish to join the mailing list, send an email to dfxml-subscribe@nist.gov (no subject or message body is necessary), and a moderator will grant access.
(A technical aside.) The Dublin Core and XML metadata schema needs to be imported to validate with the 'xmllint' utility. To save on validation-step network transmissions, a copy is included alongside this schema, modified to also fetch the XML Schema .xsd file locally.
Ref: https://mail.gnome.org/archives/xml/2009-November/msg00022.html
The schema of XML is itself imported into this document because XML schema imports are not transitive. This allows usage of special XML attributes, such as xml:lang.
The version of the DFXML schema to which the DFXML file adheres.
"1" implies the file was found in an allocated state. Unallocated discovery can be shown with a "0" here, or using the unalloc element.
An indicator that this volume only contains fileobjects for allocated files, not recovered or deleted files.
The time the file was last accessed.
The time the file was last backed up. (Recorded in HFS+.)
The size of the partition's block unit, in bytes. Note that a block is not necessarily a disk sector; FAT and NTFS both use clusters as their blocks.
A manifest of how the environment was set when the XML generator was compiled. Note that due to restrictions of element repetitions in XML Schema 1.0's "all" and "sequence" specifiers, the build_environment definition requires children appear in the order as generated by an exemplar utility (Fiwalk for now).
A specific location of bytes on a mass storage device. These are grouped in a byte_runs array. Child elements are one or more cryptographic hashes of the run's content. One might use this for sector-level hashes of a file's contents.
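As a rough illustration of the byte_runs description above, the following Python sketch reads an array of byte runs and collects the hashes recorded for each run. The attribute names `img_offset` and `len`, the `hashdigest` child element with its `type` attribute, the helper name `read_byte_runs`, and all sample values are assumptions made for this example; the sample also omits the XML namespace a real DFXML file would declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are assumptions,
# and the digests are made-up placeholder values.
SAMPLE = """
<byte_runs>
  <byte_run img_offset="1048576" len="512">
    <hashdigest type="md5">bf619eac0cdf3f68d496ea9344137e8b</hashdigest>
  </byte_run>
  <byte_run img_offset="1049088" len="512">
    <hashdigest type="md5">0123456789abcdef0123456789abcdef</hashdigest>
  </byte_run>
</byte_runs>
"""


def read_byte_runs(xml_text):
    """Return one dict per byte_run: its location, length, and hashes."""
    root = ET.fromstring(xml_text.strip())
    runs = []
    for run in root.findall("byte_run"):
        # Map each hash algorithm name to the digest text for this run.
        hashes = {
            h.get("type"): (h.text or "").strip()
            for h in run.findall("hashdigest")
        }
        runs.append(
            {
                "img_offset": int(run.get("img_offset")),
                "len": int(run.get("len")),
                "hashes": hashes,
            }
        )
    return runs


runs = read_byte_runs(SAMPLE)
# Total number of bytes covered by the runs in this array.
total_bytes = sum(r["len"] for r in runs)
```

With 512-byte runs, this layout corresponds to the sector-level hashing use mentioned above.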
This attribute is used to denote whether the file's contents are resident in the file metadata structure. The SleuthKit uses this to denote residency in the NTFS MFT entry, using the corresponding flags "TSK_FS_ATTR_RES" to denote a resident file, and "TSK_FS_BLOCK_FLAG_RES" for the data block.
The command line used to invoke the program.
The date the program was compiled.
The compiler (if any) used to compile the program.
A block of build environment and execution provenance for the XML file. Note that due to restrictions of element repetitions in XML Schema 1.0's "all" and "sequence" specifiers, the creator definition requires children appear in the order as generated by an exemplar utility (Fiwalk for now).
(It is unclear why there is another version attribute here.)
The time the file was created. Sometimes called "Birth time."
The time the file metadata were last modified.
The time the file was recorded as deleted. (Recorded in Ext2 file systems.)
A string describing an error encountered processing a file.
A description of the execution environment when the XML file was generated.
The file name, or full known path of the file relative to the volume root.
A file and its metadata. Byte-location information should be recorded when possible. Note that due to restrictions of element repetitions in XML Schema 1.0's "all" and "sequence" specifiers, the fileobject definition requires children appear in the order as generated by an exemplar utility (Fiwalk for now).
The size of the file in bytes, as reported by the file system.
Address of first block of the file system, in bytes. This appears to be relative to the beginning of the partition; in The SleuthKit's code base, the code "->first_block" only ever appears on the left-hand side of an assignment statement when "0" is on the right-hand-side. (That is, this is always 0 in TSK-based results.)
A numerical encoding of the file system type. The SleuthKit uses a custom enumeration of types known to the code base; the Linux kernel source code uses a different enumeration for recognized file systems.
A human-readable string label of the file system type. Can include annotations, such as using automatic detection to determine the precise type (e.g. leaving it up to the program to distinguish FAT12 from FAT16).
User-group identifier of the file.
A cryptographic hash.
The name of the host machine on which the program was executed.
A unique identifier for the file. It is distinct to both the input data and the process parameters of the generating tool. (That is, it is commonly defined by incrementing a global counter in walk-encounter order of each file and directory.)
The path (absolute or relative) to the input file. Note some utilities operate on device files, some on image files, some on other DFXML files.
The inode number (st_ino from the stat(2) system call). File systems that do not have an "Inode" may use an alternative, distinct identifier. In The SleuthKit, FAT "Inode" numbers are calculated from the directory entry's block address; NTFS's "Inode" numbers are the MFT entry address.
The address of the last block of the file system, relative to the beginning of the partition, in bytes. This is the value reported by the file system, after conversion to bytes. It is not guaranteed to be in the image (for instance, in an incomplete disk image).
The result of running libmagic to identify the file type.
The file to which a soft link refers.
A numeric encoding of the general file type - regular, directory, soft link, etc. Numeric values are particular to The SleuthKit; the name_type element renders the values to short string representations.
File opening mode. This is the inode mode in POSIX file systems, and an encoding of various NTFS file attributes when created by The SleuthKit libraries.
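For the POSIX case only, a mode value can be unpacked with Python's standard `stat` module, as in the sketch below. It assumes the recorded value is a full inode mode that includes the file type bits; the helper name `describe_mode` and the sample value are hypothetical, and the NTFS attribute encoding mentioned above is not covered.

```python
import stat


def describe_mode(mode):
    """Summarize a POSIX inode mode (assumed to include file type bits)."""
    return {
        # Symbolic form, as printed by "ls -l".
        "permissions": stat.filemode(mode),
        "is_dir": stat.S_ISDIR(mode),
        "is_regular": stat.S_ISREG(mode),
        # Permission bits only, with the file type bits masked off.
        "octal_permissions": oct(stat.S_IMODE(mode)),
    }


# Hypothetical sample: a regular file with permissions rw-r--r--.
info = describe_mode(0o100644)
```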
The time the file data were last modified.
A string representation of the general file type - regular, directory, soft link, etc.
Unknown type
Named pipe
Character device
Directory
Block device
Regular file
Symbolic (soft) link
Socket
Shadow inode (Solaris)
Whiteout (OpenBSD)
Special (Used in The SleuthKit for added "Virtual" files, e.g. $FAT1)
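The file type list above can be held as a simple lookup table. The single-character codes in this sketch are an assumption based on The SleuthKit's short string representations and do not appear in the descriptions above; the helper name `describe_name_type` is hypothetical.

```python
# Assumed short codes, paired with the descriptions listed above.
NAME_TYPES = {
    "-": "Unknown type",
    "p": "Named pipe",
    "c": "Character device",
    "d": "Directory",
    "b": "Block device",
    "r": "Regular file",
    "l": "Symbolic (soft) link",
    "s": "Socket",
    "h": "Shadow inode (Solaris)",
    "w": "Whiteout (OpenBSD)",
    "v": 'Special (Used in The SleuthKit for added "Virtual" files, e.g. $FAT1)',
}


def describe_name_type(code):
    """Return the description for a code, falling back to 'Unknown type'."""
    return NAME_TYPES.get(code, NAME_TYPES["-"])


directory_label = describe_name_type("d")
fallback_label = describe_name_type("?")
```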
The number of hard links to this file's inode.
A file lacking a referencing metadata structure.
The operating system release (reported by uname -r).
The operating system name (reported by uname -s).
The operating system version (reported by uname -v).
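A generating tool written in Python could gather the three uname values with the standard `platform` module, roughly as sketched here. The dictionary keys `os_sysname`, `os_release`, `os_version`, and `host` are assumed element names chosen for illustration, and `execution_environment` is a hypothetical helper.

```python
import platform


def execution_environment():
    """Collect uname-style fields describing the current host."""
    u = platform.uname()
    return {
        "os_sysname": u.system,   # comparable to "uname -s"
        "os_release": u.release,  # comparable to "uname -r"
        "os_version": u.version,  # comparable to "uname -v"
        "host": u.node,           # host name
    }


env = execution_environment()
```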
The partition in which the file resides, given as a 1-based counter in partition order.
The offset of the partition from the beginning of the image file, in bytes.
The name of the XML-generating program.
This element encodes a "rusage" C structure, as provided by the "getrusage" function after the file walk is complete. In addition to the "rusage" fields, the element may include an element for elapsed wall clock time in seconds.
The size of a disk sector in this volume. Note that this is not necessarily the same unit as the volume will use for its blocks (see block_size element).
The NTFS sequence number.
The date and time that the program was executed.
The numerical user id.
"1" implies the file was found marked unallocated.
This file's metadata structure has never been used (had an attribute populated), or possibly never been allocated.
This file's metadata structure has at least one attribute populated.
The username under which the program was executed.
The version of the XML-generating program.
A mass storage system volume, which is defined as a collection of byte blocks that are all the same size.
A 0-or-1 Boolean value.
A general structure to represent an xs:dateTime with the precision attribute.
The precision of this timestamp, in seconds.
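One way a consumer might read such a timestamp is sketched below in Python. The attribute name `prec`, the element name `mtime`, the helper name `parse_timestamp`, and the sample value are assumptions for illustration, and the sketch assumes the precision is written as a plain number of seconds, as described above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone


def parse_timestamp(elem):
    """Return (timestamp, precision); precision is None when not recorded."""
    text = elem.text.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    when = datetime.fromisoformat(text)
    prec = elem.get("prec")
    precision = timedelta(seconds=float(prec)) if prec is not None else None
    return when, precision


# Hypothetical sample: a timestamp recorded with 2-second precision.
elem = ET.fromstring('<mtime prec="2">2011-03-04T05:06:08Z</mtime>')
when, precision = parse_timestamp(elem)
```

Keeping the precision next to the parsed value lets a comparison treat two timestamps as equal when they differ by less than the recorded precision.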
The hash algorithm that applies to this object.