I am about to write a module for JabRef, an open-source bibliography management application, to export bibliographic information to Microsoft Office 2007.
Some references that might be helpful:
- How to use the Office 2007 bibliography tool
- OpenXML Developer
- Blog of Brian Jones, the person behind the Office 2007 Open XML formats
- ECMA Open XML Standard Elaborated Schemas (all documents)
- MSDN article showing how to work with bibliographies (updated March 23, 2007)
But after searching for a day, I could not find a single web page describing the exact, or even a near-exact, format of bibliographic information in Microsoft Office 2007. So I started digging in myself.
I started by adding some sources in the Microsoft Office Bibliography Editor. The very first thing I noticed is that if you add some references and don’t use them in the document, they are not saved. If you use one or more of them in your document, all of them are saved in “C:\Documents and Settings\<USER>\Application Data\Microsoft\Bibliography\Sources.xml”. I opened the XML file, and here’s what I got (figure 1).
Figure 1: Microsoft Office 2007 Bibliographic Database Format
Obviously, I had only one bibliographic source in “Sources.xml”. I was almost certain that Office would import a copy of this file without any problem, and indeed a copy with the information altered and the GUID and LCID elements deleted imported just fine as a bibliography. But wait, where were my previous bibliographic sources?
So I tried to work out what had happened, and found that Office does NOT really import a bibliography into “Sources.xml”; it only lets you work with the currently opened XML file. All the bibliographic sources in the currently opened bibliography XML file are displayed in the ‘master list’. You have to “copy” them into your ‘current list’ to work with them. If you want to merge information from an external XML file into your “C:\Documents and Settings\<USER>\Application Data\Microsoft\Bibliography\Sources.xml”, you have to open the external XML file, copy its sources into your ‘current list’, open “Sources.xml” again, and then copy them back into the ‘master list’, which now points to “Sources.xml”.
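If you would rather not shuttle sources between the two lists by hand, the same merge can be approximated outside Word. The Python sketch below is my own alternative, not an Office feature: it assumes Word is closed, that you have backed up “Sources.xml”, and that every source carries a Tag element (as in the file Word wrote for me) that can serve as a duplicate key. The helper names q and merge_sources are hypothetical. It copies every Source from the external file whose Tag is not already in the master file.

```python
import xml.etree.ElementTree as ET

# Namespace of the Office 2007 bibliography format.
NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
# Serialize this namespace with the "b:" prefix, as Word does in Sources.xml.
ET.register_namespace("b", NS)


def q(name: str) -> str:
    """Return the namespace-qualified name ({uri}name) ElementTree expects."""
    return f"{{{NS}}}{name}"


def merge_sources(master_xml: str, external_xml: str) -> str:
    """Append each Source in external_xml to master_xml unless master_xml
    already holds a Source with the same Tag; return the merged XML text."""
    master = ET.fromstring(master_xml)
    external = ET.fromstring(external_xml)
    # Tags already present in the master file (assumed duplicate key).
    known_tags = {s.findtext(q("Tag")) for s in master.findall(q("Source"))}
    for source in external.findall(q("Source")):
        tag = source.findtext(q("Tag"))
        if tag not in known_tags:
            master.append(source)
            known_tags.add(tag)
    return ET.tostring(master, encoding="unicode")
```

For real files, read both with open(path, encoding="utf-8-sig").read(), which also tolerates a byte-order mark, and write the result to a new file rather than over the original. I have not verified that Word accepts a file rewritten this way, so keep the backup.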
I wanted to find out the minimum information an XML file needs for Office 2007 to recognize it as a valid bibliographic source. The bare minimum is:

<sources xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"/>

If you want to add information to this bare-minimum XML, don’t use the “b:” prefix: the file declares the bibliography namespace as the default namespace and never binds “b:”, so the child elements must be written without a prefix.
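To follow that advice when generating the file, it is easiest to let the XML library declare the bibliography namespace as the default namespace, so no element ever needs a prefix. The sketch below assumes Python 3.9+ and uses the element names Sources, Source, Tag, SourceType, Title and Year as I understand Word writes them (Word’s own file spells the root Sources); treat those names, and the hypothetical build_sources helper, as assumptions to check against your own “Sources.xml”.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"


def build_sources(entries: list[dict[str, str]]) -> str:
    """Serialize entries (field name -> text) as a bibliography document.

    Every element is qualified with NS, and default_namespace makes
    ElementTree declare NS once as xmlns="...", so no "b:" prefix appears."""
    root = ET.Element(f"{{{NS}}}Sources")
    for entry in entries:
        source = ET.SubElement(root, f"{{{NS}}}Source")
        for field, value in entry.items():  # dict order is kept as-is
            ET.SubElement(source, f"{{{NS}}}{field}").text = value
    return ET.tostring(root, encoding="unicode", default_namespace=NS)


print(build_sources([{"Tag": "Knu84", "SourceType": "Book",
                      "Title": "The TeXbook", "Year": "1984"}]))
```

Calling build_sources([]) produces the same shape as the bare-minimum file, except that the root is spelled Sources. I have not checked whether Word cares about the order of the child elements, so the sketch simply keeps dictionary order.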
From MSDN (update):
The Guid and LCID elements are optional, but you can provide values for them if you want. The Guid element value should be a valid GUID, which you can generate programmatically outside the Word object model. (See the Microsoft Visual Studio documentation or the Microsoft Windows documentation on MSDN for information about programmatically generating GUIDs.) Word generates GUIDs when users add or edit a source. If you do not add a GUID to the XML and a user then edits a source, Word generates a GUID. This enables Word to determine which source is most recent, based on the value of the GUID, and to prompt whether the user wants Word to update the outdated source to maintain continuity between the master list and the current list.
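Following that note, a GUID can be generated outside Word with Python’s standard uuid module. The braced, upper-case form below follows the registry-style GUID notation common on Windows; whether Word requires the braces is an assumption I have not verified, and since the element is optional you can also leave it out and let Word generate one when the source is edited. new_source_guid is a hypothetical helper name.

```python
import uuid


def new_source_guid() -> str:
    """Return a random (version 4) GUID in braced, upper-case form,
    e.g. {1B4E28BA-2FA1-41D2-883F-0016D3CCA427}."""
    return "{" + str(uuid.uuid4()).upper() + "}"


print(new_source_guid())
```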
The LCID specifies the language for the source. (See MSDN for valid language identification values.) Word uses the LCID to know how to display a cited source in a document’s bibliography. For example, one source may be written in French, one in English, and one in Japanese. From the LCID, Word determines how to display names (for example, Last, First for English), what punctuation to use (for example, using comma in one language and a semicolon in another), and what strings to use (for example, whether to use “et al” or another localized form).
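For an exporter such as the JabRef module, the LCID has to come from somewhere, for example a language field in the BibTeX entry. The lookup below is a hypothetical sketch: the table holds only a few standard Windows LCIDs (1033 for English (United States), 1036 for French (France), 1031 for German (Germany), 1041 for Japanese), and falling back to 1033 for unknown languages is my own choice, not documented Word behavior.

```python
from typing import Optional

# Hypothetical table: BibTeX "language" values -> Windows LCIDs.
# Only a few standard identifiers are listed; extend as needed.
LCID_BY_LANGUAGE = {
    "english": 1033,   # English (United States), 0x0409
    "french": 1036,    # French (France), 0x040C
    "german": 1031,    # German (Germany), 0x0407
    "japanese": 1041,  # Japanese (Japan), 0x0411
}
DEFAULT_LCID = 1033  # assumption: fall back to English (United States)


def lcid_for(language: Optional[str]) -> int:
    """Map a language name (case and surrounding spaces ignored) to an LCID,
    returning DEFAULT_LCID when the field is missing or not in the table."""
    if not language:
        return DEFAULT_LCID
    return LCID_BY_LANGUAGE.get(language.strip().lower(), DEFAULT_LCID)
```

Since a file with the LCID deleted still imported for me, another option is simply to omit the element when the entry has no language field.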
Now that I have worked out how bibliographic information must be laid out in XML for Office 2007 to recognize it as a bibliographic source, I can list all the bits and pieces that can go inside it. Please see my next post for that.
For those of you who don’t know, JabRef is an open-source, Java-based reference manager that lets you import citations from many sources in many formats, and it works natively in BibTeX, the bibliography format of LaTeX. I’ve been using it extensively in my PhD, but only recently did I notice that you can now export the list of references in JabRef directly into a Word 2007 XML format for import into Word’s own reference manager. It’s pretty easy to do, but there are a few steps involved.
1. Click File->Export
2. Select MS Office 2007 XML and save the file
Once the XML file has been created, you can import any of the references into Word 2007 as follows:
1. Go to the References tab and click Manage Sources
2. In the Source Manager dialog box that pops up, click Browse on the left hand side
3. Select the XML file you saved via JabRef, and the references should appear in the ‘Master List’ on the left-hand side of the dialog.
4. You can then select any one (or all) of the references in the Master List and press the ‘Copy->’ button to copy them into the ‘Current List’, from which you can use Word 2007’s cite-as-you-write functionality. Any references in the Current List are then written into the bibliography, which you can create by choosing one of the styles in the Bibliography drop-down on the References tab.
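Before browsing to the exported file in Word, it can help to check which entries should show up in the Master List. This sketch assumes the JabRef export uses the same Source, Tag and Title elements in the bibliography namespace as Word’s own “Sources.xml”; list_sources is a hypothetical helper. Matching is done by namespace URI, so it works whether the elements are written as <b:Source> or in the default namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Prefix map for ElementTree lookups; "b" here is only a local alias.
NS = {"b": "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"}


def list_sources(xml_text: str) -> list[tuple[Optional[str], Optional[str]]]:
    """Return (Tag, Title) for every Source directly under the root element.

    Elements are matched by namespace URI, so <b:Source> and a default-
    namespace <Source> are treated the same; missing fields come back as None."""
    root = ET.fromstring(xml_text)
    return [
        (s.findtext("b:Tag", namespaces=NS), s.findtext("b:Title", namespaces=NS))
        for s in root.findall("b:Source", NS)
    ]
```

To check a file on disk, pass it the contents read with open(path, encoding="utf-8-sig").read(). An empty list means Word would most likely show an empty Master List as well.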
This supersedes an earlier program I posted which did a similar thing.