Sunday, November 28, 2010

C# Implementation of Microsoft Compound Document File Format

I recently had to read the contents of several Microsoft Compound Documents from disk. After some searching I could not find a .NET implementation that handled this, so I decided to build a library in C# and publish it for everyone else to use. During my search I did come across a great document from OpenOffice.org that details the structure of the Microsoft Compound Document File Format with extreme clarity and accuracy. Using this document I created a C# library that reads a Compound Document from an input stream and returns a set of directory objects. Each directory object contains the entry's metadata and the underlying stream object holding the data that was stored inside the compound document.
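To give a feel for the first step any such reader has to take, here is a minimal sketch of parsing the fixed 512-byte header at the start of a compound document. It is not the API of the library described in this post: the class name `CompoundHeaderSketch` and its members are hypothetical and exist only for illustration. The field offsets (signature at 0, byte-order mark at 28, sector size exponent at 30, short-sector size exponent at 32, first directory sector at 48) follow the published format documentation.

```csharp
using System;
using System.IO;

// Hypothetical illustration only; not the API of the library in this post.
public sealed class CompoundHeaderSketch
{
    // 8-byte signature that opens every compound document file.
    private static readonly byte[] Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public int SectorSize { get; private set; }
    public int ShortSectorSize { get; private set; }
    public int FirstDirectorySector { get; private set; }

    public static CompoundHeaderSketch Read(Stream input)
    {
        // The header always occupies the first 512 bytes of the file.
        var header = new byte[512];
        int read = 0;
        while (read < header.Length)
        {
            int n = input.Read(header, read, header.Length - read);
            if (n == 0)
                throw new InvalidDataException("File is shorter than the 512-byte header.");
            read += n;
        }

        for (int i = 0; i < Signature.Length; i++)
        {
            if (header[i] != Signature[i])
                throw new InvalidDataException("Not a compound document (bad signature).");
        }

        // Byte-order mark at offset 28: the bytes FE FF mean little-endian,
        // which is the only order this sketch handles.
        if (header[28] != 0xFE || header[29] != 0xFF)
            throw new InvalidDataException("Unsupported byte order.");

        return new CompoundHeaderSketch
        {
            // Sizes are stored as powers of two (for example 9 means 512 bytes).
            SectorSize = 1 << ReadUInt16(header, 30),
            ShortSectorSize = 1 << ReadUInt16(header, 32),
            FirstDirectorySector = ReadInt32(header, 48)
        };
    }

    // Little-endian readers that do not depend on the host's byte order.
    private static int ReadUInt16(byte[] b, int o)
    {
        return b[o] | (b[o + 1] << 8);
    }

    private static int ReadInt32(byte[] b, int o)
    {
        return b[o] | (b[o + 1] << 8) | (b[o + 2] << 16) | (b[o + 3] << 24);
    }
}
```

With a header in hand, a reader would typically go on to follow the sector allocation table to the directory stream starting at `FirstDirectorySector`; that part is left out here to keep the sketch short.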

The code is self-explanatory, and I also added documentation in the code. Between the two, I believe any .NET developer should be able to reuse this library easily. Please double-check your results; I did not have as much time as I would have liked to thoroughly test the output. Also, I did not implement reading of the time stamps in the directory entries.
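For anyone who needs the time stamps, the following is a rough sketch of how they could be decoded from a raw directory entry. It is not part of the library: the class `DirectoryEntryTimestamps` and its members are hypothetical names. It assumes the layout given in the format documentation, where each directory entry is 128 bytes, the creation time sits at offset 100 and the last-modification time at offset 108, each stored as a 64-bit little-endian Windows FILETIME value (100-nanosecond intervals since January 1, 1601 UTC). Many writers leave these fields zeroed, so the sketch treats zero as "not set".

```csharp
using System;

// Hypothetical helper; not part of the library described in this post.
public static class DirectoryEntryTimestamps
{
    public const int EntrySize = 128;      // size of one raw directory entry
    public const int CreatedOffset = 100;  // creation time stamp
    public const int ModifiedOffset = 108; // last-modification time stamp

    // Returns the time stamp at the given offset, or null when the field
    // is zero (or not a positive value), which is treated here as unset.
    public static DateTime? ReadTimestamp(byte[] entry, int offset)
    {
        if (entry == null)
            throw new ArgumentNullException(nameof(entry));
        if (entry.Length < EntrySize)
            throw new ArgumentException("A directory entry is 128 bytes long.", nameof(entry));
        if (offset < 0 || offset + 8 > EntrySize)
            throw new ArgumentOutOfRangeException(nameof(offset));

        // Assemble the 64-bit little-endian value, most significant byte first.
        long fileTime = 0;
        for (int i = 7; i >= 0; i--)
        {
            fileTime = (fileTime << 8) | entry[offset + i];
        }

        if (fileTime <= 0)
            return null;

        // FromFileTimeUtc throws ArgumentOutOfRangeException for values
        // beyond the DateTime range; a stricter reader might catch that.
        return DateTime.FromFileTimeUtc(fileTime);
    }
}
```

Calling `ReadTimestamp(entry, DirectoryEntryTimestamps.CreatedOffset)` on a 128-byte entry buffer would then yield the creation time in UTC, or `null` if the writer did not store one.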

Full code in GitHub repository

For more information about Microsoft's Compound Document format, visit these sites:
Wiki site
OpenOffice.org's Documentation of the Microsoft Compound Document File Format
Windows Compound Binary File Format Specification