Most people are unaware that the documents they create and
edit with Microsoft’s Office suite contain a large amount of data
about each document’s life cycle. While usually benign and not very
interesting, this data can become quite valuable in a forensic investigation.
It can help establish a timeline of when a file was last accessed or modified.
An examiner can even extract the names of the last few users who edited the
file and the locations where the document was previously stored.
Examples of metadata fields available in Office documents:
- Your name
- Your initials
- Your company or organization name
- The name of your computer
- The name of the network server or hard disk where you saved the document
- Other file properties and summary information
- Non-visible portions of embedded OLE objects
- The names of previous document authors
- Document revisions
- Document versions
- Template information
- Hidden text or cells
- Personalized views
- Comments
Extracting metadata
There are several ways to extract the metadata from a
document. The simplest method is to view or modify many of the fields by using
the Office applications themselves. In Word or Excel, under the File menu, the
Properties option will display a dialog window containing many of the editable
metadata fields.
However, not all metadata fields are this easy to access.
Third-party tools must be used to extract certain fields. Many such tools are
available, including MetaDiscover and MetaViewer by PinPoint Labs.
MetaDiscover is able to extract the last 10 authors of a document and the
locations where it was stored.
An additional benefit of using a third-party extraction
utility is that most of them open the document in read-only mode when
retrieving the metadata. When the native Office applications are used to view
the data, the document is typically opened in read-write mode, which may cause
certain fields, such as last accessed, to be updated, thereby altering the
file. In a forensic investigation, it is imperative that the data remain
unaltered.

Figure 1: MetaViewer showing Word 2003 metadata
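The concern about tools altering a file can be checked with a simple before-and-after hash comparison. The following is a minimal Python sketch, not a procedure taken from any of the tools named above: the helper names `sha256_of` and `unchanged_after`, and the two stand-in "tools", are hypothetical and exist only for illustration. Note that a content hash covers only the bytes inside the file. File system timestamps such as last accessed live outside the file, so they would need to be recorded separately, and examiners would normally work on a copy of the evidence rather than the original.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, opened in binary read-only mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_after(path, examine):
    """Run examine(path) and report whether the file contents are still identical."""
    before = sha256_of(path)
    examine(path)
    return sha256_of(path) == before


def read_only_tool(path):
    # Hypothetical stand-in for a viewer that only reads the document.
    with open(path, "rb") as handle:
        handle.read()


def careless_tool(path):
    # Hypothetical stand-in for an application that writes to the document.
    with open(path, "ab") as handle:
        handle.write(b"!")


# Demonstration with a throwaway file standing in for a piece of evidence.
with tempfile.NamedTemporaryFile(delete=False, suffix=".doc") as tmp:
    tmp.write(b"placeholder document bytes")
    sample_path = tmp.name

read_only_result = unchanged_after(sample_path, read_only_tool)
careless_result = unchanged_after(sample_path, careless_tool)
os.remove(sample_path)

print(read_only_result, careless_result)
```

Run as written, this prints `True False`: the read-only stand-in leaves the contents untouched, while the one that writes to the file changes the hash.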
Removing metadata
There are two primary techniques used to ensure metadata is
not included in documents when they are shared or published: limiting the
creation of the metadata and scrubbing or redacting the data before
publication.
The first technique is to configure the Office applications
so that they do not create and store the metadata fields in the document in
the first place. If the data never existed, there is no need to remove it at a
later point. While this technique would appear to solve all the issues
surrounding the inclusion of metadata, in practice there are still some fields
the Office applications will create and populate even with all user-available
options configured. A good guide on configuring Word 2003 to limit the amount
of metadata it stores in documents can be found here.
The other way to ensure that metadata cannot be extracted
from a document is to run a utility that edits the document and scrubs or
redacts the information. Microsoft has created an add-in for the Office 2003
suite, available here, that removes most metadata fields from a document.
There are also many third-party programs available, such as iScrub by Esquire
Innovations.
Comment about Office 2007
The latest version of Microsoft Office, version 2007,
utilizes new file formats. These new file formats still contain metadata but
store it in a different structure than all previous versions of Office.
Therefore, most extraction or scrubber utilities will not operate correctly on
these new formats. However, Microsoft has included a new feature in Office 2007
called the Document Inspector. Microsoft claims this new feature will allow
companies to control the metadata within the documents they publish. For more
details on its operation, see the detailed whitepaper by Randall Farrar of
Esquire Innovations.
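As background on the new structure: the Office 2007 formats (.docx, .xlsx, .pptx) are ZIP packages of XML parts, and summary properties such as the author are normally kept in a part named docProps/core.xml. The following Python sketch reads a few of those properties using only the standard library. It is an illustration under stated assumptions, not how any of the tools mentioned here work: the function name `read_core_properties` and the `FIELDS` selection are hypothetical, and the code assumes the conventional part location rather than resolving it through the package relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative selection of fields; the labels on the left are hypothetical.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return selected core properties from an Office 2007 package.

    source may be a file path or a binary file-like object. The package is
    opened read-only, so the document itself is not modified.
    Assumes the properties live at the conventional path docProps/core.xml.
    """
    with zipfile.ZipFile(source, "r") as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found


# Demonstration: a minimal in-memory package holding only the properties part.
# This is not a complete Word document, just enough to exercise the function.
SAMPLE_CORE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<cp:coreProperties"
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:dcterms="http://purl.org/dc/terms/">'
    "<dc:creator>Example Author</dc:creator>"
    "<cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>"
    "<cp:revision>3</cp:revision>"
    "</cp:coreProperties>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", SAMPLE_CORE)
buffer.seek(0)

properties = read_core_properties(buffer)
print(properties)
```

For a real file, the same function can be called with a path, for example `read_core_properties("report.docx")`, where the file name is a placeholder. Fields that are absent from the document are simply left out of the result.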
Final Thoughts
The metadata associated with Office documents can
provide many clues into the history of a document. Presented with this
information, many organizations may choose to either remove this data or
convert the files to another format (such as PDF) before they publish them.
From a forensic investigation standpoint, the metadata can be very important
and should always be reviewed for all documents related to an incident.