Monday, November 3, 2008

MS Word's File Formats

File extension

Microsoft Word's native file formats are denoted either by a .doc or .docx file extension.

Although the ".doc" extension has been used in many different versions of Word, it actually encompasses four distinct file formats:

  1. Word for DOS
  2. Word for Windows 1 and 2; Word 4 and 5 for Mac
  3. Word 6 and Word 95; Word 6 for Mac
  4. Word 97, 2000, 2002, 2003 and 2007; Word 98, 2001, X, and 2004 for Mac

The newer ".docx" extension signifies Office Open XML and is used by Word 2007 for Windows and Word 2008 for the Macintosh.

Microsoft does not guarantee that a document will display correctly on different workstations, even if both use the same version of Microsoft Word.[14] The document the recipient sees may therefore not be exactly the same as the one the sender sees.

Binary formats (Word 97-2003)

As Word became the dominant word processor in the late 1990s and early 2000s[citation needed], its document formats (.doc) became a de facto standard for document files. Though usually just called "Word Document Format", the term refers primarily to the range of formats used by default in Word versions 97-2003.

Files in the Word 97-2003 Binary File Format use OLE (Object Linking and Embedding) structured storage to organize their contents. OLE behaves rather like a conventional hard drive file system and is made up of several key components. Each Word document is composed of so-called "big blocks", which are almost always (but do not have to be) 512-byte chunks; hence a Word document's file size is normally a multiple of 512. "Storages" are analogous to directories on a disk drive and point to other storages or to "streams", which are similar to files on a disk. The text of a Word document is always contained in the "WordDocument" stream.

The first big block, known as the "header" block, records the location of the major data structures in the document. "Property storages" provide metadata about the storages and streams in a .doc file, such as each one's name and where it begins. The "File information block" records where the text in a Word document starts and ends, which version of Word created the document, and other attributes.
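The header block and big-block size described above can be inspected with a few lines of Python. This is a minimal sketch, not a full parser: it assumes the commonly documented OLE header layout (an 8-byte signature at offset 0 and a 16-bit "sector shift" at offset 30, where the block size is 2 raised to that value), and the names `OLE_SIGNATURE` and `read_ole_header` are made up for illustration. The header bytes are synthetic rather than taken from a real document.

```python
import struct

# Signature found at the start of an OLE structured storage file (assumed layout).
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_ole_header(header: bytes) -> dict:
    """Parse the few header fields needed to find the big-block size."""
    if len(header) < 32 or header[:8] != OLE_SIGNATURE:
        raise ValueError("not an OLE structured storage file")
    # Offsets 24-31 hold four little-endian 16-bit values:
    # minor version, major version, byte-order mark, sector shift.
    _minor, major, byte_order, sector_shift = struct.unpack_from("<4H", header, 24)
    return {
        "major_version": major,
        "byte_order_mark": byte_order,
        "big_block_size": 1 << sector_shift,  # 2 ** 9 == 512 in the usual case
    }


# Synthetic 32-byte header: signature, 16 unused bytes, then the four fields.
fake_header = OLE_SIGNATURE + bytes(16) + struct.pack("<4H", 0x003E, 3, 0xFFFE, 9)
info = read_ole_header(fake_header)
print(info["big_block_size"])  # 512
```

For a real file, the first 512 bytes read with `open(path, "rb")` could be passed in instead of `fake_header`. Locating the "WordDocument" stream requires walking the directory structures as well, which this sketch does not attempt.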

Microsoft Office Open XML (Word 2007 and above)

With Office 2007, Microsoft moved its office applications to an XML-based file format, Office Open XML. Word 2007 uses it as the default format but retains the older binary format for compatibility reasons. It also supports PDF and XPS formats, for output only. Microsoft has published specifications for the Word 97-2007 Binary File Format[15] and the Office Open XML format.[16]

Office Open XML does not conform fully to standard XML.[citation needed] It is, however, publicly documented as Ecma International standard 376. Public documentation of the default file format is a first for Word and makes it considerably easier, though not trivial, for competitors to interoperate. The format has also been approved as an international standard by ISO (ISO/IEC 29500), but the approval is under review following objections by ISO members South Africa, Brazil, India and Venezuela.[17] Another public, XML-based file format, supported by Word 2003 and later, is the Microsoft Office Word 2003 XML Format.
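As a rough illustration of what "XML-based" means in practice: a .docx file is a ZIP package holding several XML parts, with the main text conventionally stored in a part named word/document.xml. That packaging detail comes from the Office Open XML specification rather than from the text above. The sketch below lists the parts of such a package; `list_docx_parts` is a hypothetical helper, and the sample package is a tiny stand-in built in memory, not a valid Word document.

```python
import io
import zipfile


def list_docx_parts(data: bytes) -> list:
    """Return the part names inside a .docx package, or raise ValueError."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("not a ZIP package, so not a .docx file")
    with zipfile.ZipFile(buffer) as package:
        return package.namelist()


# Build a stand-in package in memory. The XML bodies are placeholders only;
# a real document contains much more markup and several additional parts.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<w:document/>")

parts = list_docx_parts(sample.getvalue())
print("word/document.xml" in parts)  # True
```

Passing the bytes of an older binary .doc file to the same function would raise `ValueError`, since that format is not a ZIP package.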

Attempts at cross-version compatibility

Opening a Word document in a version of Word other than the one in which it was created can cause the document to display incorrectly. The document formats change between versions in subtle and not-so-subtle ways; formatting created in a newer version does not always survive when viewed in an older one, nearly always because the older version lacks that capability. Rich Text Format (RTF), an early effort to create a format for interchanging formatted text between applications, is an optional format for Word that retains most formatting and all content of the original document. Later, after HTML appeared, Word supported an HTML derivative as an additional full-fidelity roundtrip format similar to RTF, with the added benefit that the file could be viewed in a web browser.

Third party formats

It is possible to write plugins permitting Word to read and write formats it does not natively support, such as OpenDocument. Word is incapable of reading or writing OpenDocument documents without such a plugin.
