none
Bug in Excel's handling of metadata? Or at least a deviation from the Open Office XML specification?

    Question

  • I've discovered some very strange behaviour in Excel which seems to be contrary to that documented in the Open Office XML specification (OOXML). I couldn't find a way to report bugs against Excel, so I thought I'd try to explain the issue here.

    In short it seems that Excel is removing, and incorrectly re-indexing the metadata I programatically associate with cells.

    First, a summary of the relevant parts of the specification:

    From OOXML 18.9: There are two types of metadata: "cell metadata" and "value metadata".

    Cell metadata follows the cell as it moves. Value metadata follows the value through formulae etc.

    From OOXML 18.3.1.4: The c (cell) element has cm and vm attributes which are both documented as "The zero-based index of the [cell|value] metadata...in the Metadata Part"

    From OOXML 18.9.17: The valueMetadata is "a collection of block element that each define the value metadata for a particular cell". "Cells in the workbook index into this collection".

    The valueMetadata contains bk elements which in turn contain rc (metadata record) elements

    From OOXML 18.9.15: rc elements have t (type index) and v (value index) attributes. t is a 1-based index into metadataTypes and v is a 0-based index into the futureMetadata element which matches the name of the metadata type.

    Here's an example of what this might look like:

    ...
    
    <c vm="0">  <!-- vm points to the first bk inside valueMetadata below -->
    
    ...
    
    <x:valueMetadata>
      <x:bk>
        <x:rc t="1" v="0" />  <!-- t points to the first metadataType below. v points to the first bk in the futureMetadata below (whose name matches the metadataType to which t points) -->
      </x:bk>
    </x:valueMetadata>
    
    ...
    
    <x:metadataTypes>
      <x:metadataType name="MyMetaType" ... />  <!-- name dictates which futureMetadata valueMetadata's v attribute indexes into -->
    </x:metadataTypes>
    
    ...
    
    <x:futureMetadata name="MyMetaType" ...>
      <x:bk>
        <x:extLst>
          <x:ext xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uri="http://example.com/extension" p5:value="test value" xmlns:p5="http://example.com/extension" />
        </x:extLst>
      </x:bk>
    </x:futureMetadata>

    The Problem

    From what I can tell, for values of n > 2, if you associate n cells with metadata, Excel will drop the last piece of metadata, and the one at index 1, and it will do so silently. The indices are then 0..n-3, and the association for all but the first (0 index) will be wrong. This renders the future metadata totally useless.

    For n == 1, Excel just removes the last piece of metadata (index 1). If we try 1-based indexes for the vm attribute on the c element, we get different behaviour. This may not be relevant as it is contrary to the specification, but the slightly better behaviour might indicate an off-by-one error:

    n Deleted Indices (0-based) when using 0-based indices Deleted Indices (0-based) when using 1-based indices
    1 0 None
    2 1 1
    3 1,2 1
    4 1,3 1
    5 1,4 1
    6 1,5 1

    Demonstrating the Problem

    I have some example code[1] that demonstrates the problem. You will need a file called test.xlsx with cells A1..C2 populated.

    Compile the source as AddMetadata.exe then run with the test file as the only parameter:

    > AddMetadata.exe test.xlsx
    

    You can look at test.xlsx in Excel, Visual Studio (with the Open XML Package Editor Power Tool for Visual Studio 2010) or the Open XML SDK 2.0 Productivity Tool for Microsoft Office. Looking at the file before and after running AddMetadata.exe you should be able to reproduce the behaviour documented above.

    Summary

    It would be good to know if this is really an Excel bug or whether we're doing something wrong / unsupported. Any insight would be very much appreciated.

    [1] The Example code:

    namespace AddMetadata
    {
        using System;
        using System.Linq;
    
        using DocumentFormat.OpenXml;
        using DocumentFormat.OpenXml.Packaging;
        using DocumentFormat.OpenXml.Spreadsheet;
    
        public class Program
        {
            // The cells to associate with metadata
            private readonly static CellSpec[] CellSpecs = new[]
                    {
                        new CellSpec{ Sheet = "Sheet1", Column = "A", Row = 1 },
                        new CellSpec{ Sheet = "Sheet1", Column = "B", Row = 1 },
                        new CellSpec{ Sheet = "Sheet1", Column = "C", Row = 1 },
                        new CellSpec{ Sheet = "Sheet1", Column = "A", Row = 2 },
                        new CellSpec{ Sheet = "Sheet1", Column = "B", Row = 2 },
                        new CellSpec{ Sheet = "Sheet1", Column = "C", Row = 2 },
                    };
    
            private static readonly uint NumCells = (uint)CellSpecs.Length;
    
            private const string SPREADSHEET_ML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    
            private const string METADATA_TYPE_NAME = "MyMetaType";
    
            private const string EXTENSION_URI = "http://example.com/extension";
    
            public static void Main(string[] args)
            {
                if (args.Length != 1)
                {
                    Console.Out.WriteLine("AddMetadata <doc.xslx>");
                    Console.Out.WriteLine("  Adds metadata to the specified document to demonstate some strange Excel behaviour");
                    Environment.Exit(1);
                }
                try
                {
                    var doc = SpreadsheetDocument.Open(args[0], true);
                    StripMetadata(doc);
                    AddMetadata(doc);
                    AddMetadataType(doc);
                    
                    AddFutureMetadata(doc);
                    AddMetadataRecords(doc);
    
                    AssociateCellsWithMetadata(doc);
    
                    doc.WorkbookPart.Workbook.Save();
                    doc.Close();
                }
                catch(Exception e)
                {
                    Console.Out.WriteLine(e);
                }
            }
    
            /// <summary>
            /// Strip any existing metadata.
            /// </summary>
            /// <param name="doc">The document</param>
            private static void StripMetadata(SpreadsheetDocument doc)
            {
                var wbPart = doc.WorkbookPart;
                var cellMetadataPart = wbPart.GetPartsOfType<CellMetadataPart>().FirstOrDefault();
                wbPart.DeletePart(cellMetadataPart);
            }
    
            /// <summary>
            /// Add basic metadata part structure.
            /// </summary>
            /// <param name="doc">The document</param>
            private static void AddMetadata(SpreadsheetDocument doc)
            {
                doc.WorkbookPart.AddNewPart<CellMetadataPart>();
                doc.WorkbookPart.CellMetadataPart.Metadata = new Metadata { MetadataTypes = new MetadataTypes() };
            }
    
            /// <summary>
            /// Add the metadata type used by all the metadata we're adding
            /// </summary>
            /// <param name="doc"></param>
            private static void AddMetadataType(SpreadsheetDocument doc)
            {
                var metadata = doc.WorkbookPart.CellMetadataPart.Metadata;
                var metadataType = new MetadataType
                {
                    Name = METADATA_TYPE_NAME,
                    Assign = false,
                    CellMeta = false,
                    ClearContents = false,
                    ClearAll = false,
                    ClearComments = true,
                    ClearFormats = true,
                    Coerce = false,
                    Copy = true,
                    Delete = false,
                    Edit = true,
                    Merge = true,
                    MinSupportedVersion = 0U,
                    PasteAll = true,
                    PasteBorders = false,
                    PasteColWidths = false,
                    PasteComments = false,
                    PasteDataValidation = false,
                    PasteFormats = false,
                    PasteFormulas = false,
                    PasteNumberFormats = false,
                    PasteValues = true,
                    RowColumnShift = true,
                    SplitAll = false,
                    SplitFirst = false
                };
                metadata.MetadataTypes.AppendChild(metadataType);
            }
    
            /// <summary>
            /// Add future metadata blocks which contain the actual metadata for each cell.
            /// They are referenced by the metadata records.
            /// </summary>
            /// <param name="doc">The document</param>
            private static void AddFutureMetadata(SpreadsheetDocument doc)
            {
                var metadata = doc.WorkbookPart.CellMetadataPart.Metadata;
    
                var futureMetadata = metadata.AppendChild(new FutureMetadata());
                futureMetadata.Name = METADATA_TYPE_NAME;
                futureMetadata.Count = NumCells;
    
                // Future metadata area
                for (var i = 0; i < NumCells; i++)
                {
                    // The metadata for each cell will be single FutureMetadataBlock containing an extension list with a single extension.
                    FutureMetadataBlock futureMetadataBlock = futureMetadata.AppendChild(new FutureMetadataBlock());
                    ExtensionList extLst = futureMetadataBlock.AppendChild(new ExtensionList());
                    Extension ext = extLst.AppendChild(new Extension());
                    ext.Uri = EXTENSION_URI;
                    ext.AddNamespaceDeclaration("x", SPREADSHEET_ML_NS);
                    ext.SetAttribute(new OpenXmlAttribute("value", ext.Uri, string.Format("test value {0}", i)));
                }
            }
    
            /// <summary>
            /// Add metadata records which point to each future metadata block.
            /// They are in turn referenced by the cells.
            /// </summary>
            /// <param name="doc">The document</param>
            private static void AddMetadataRecords(SpreadsheetDocument doc)
            {
                var metadata = doc.WorkbookPart.CellMetadataPart.Metadata;
    
                // Value metadata area
                ValueMetadata valueMetadata = metadata.AppendChild(new ValueMetadata());
                for (uint i = 0; i < NumCells; i++)
                {
                    // Type is 1-indexed, index into future metadata is 0-indexed
                    var metadataBlock = valueMetadata.AppendChild(new MetadataBlock());
                    var metadataRecord = metadataBlock.AppendChild(new MetadataRecord());
                    metadataRecord.Val = i;
                    metadataRecord.TypeIndex = (uint)1;
                }
            }
    
            /// <summary>
            /// Associate existing cells with existing metadata.
            /// </summary>
            /// <param name="doc">The document</param>
            private static void AssociateCellsWithMetadata(SpreadsheetDocument doc)
            {
                for (uint i = 0; i < CellSpecs.Length; i++)
                {
                    var cellSpec = CellSpecs[i];
                    var cell = GetCell(doc, cellSpec.Sheet, cellSpec.Column, cellSpec.Row);
                    if (cell == null)
                    {
                        throw new ArgumentException(string.Format("Cell {0} not found in row {1} of sheet {2}", cellSpec.Column, cellSpec.Row, cellSpec.Sheet));
                    }
                    cell.ValueMetaIndex = i;
                }
            }
    
            /// <summary>
            /// Get a cell given the document, sheet name, column name and row index.
            /// </summary>
            /// <param name="doc">The document</param>
            /// <param name="sheetName">The sheet name</param>
            /// <param name="columnName">The column name</param>
            /// <param name="rowIndex">The row index</param>
            /// <returns>The cell</returns>
            private static Cell GetCell(SpreadsheetDocument doc, String sheetName, String columnName, uint rowIndex)
            {
                var row = GetRow(doc, sheetName, rowIndex);
                if (row == null)
                {
                    throw new ArgumentException(string.Format("Row '{0}' not found", rowIndex));
                }
    
                return row.Elements<Cell>().Where(c => c.CellReference.Value.StartsWith(columnName)).FirstOrDefault();
            }
    
            /// <summary>
            /// Get a worksheet part by sheet name.
            /// </summary>
            /// <param name="document">The document</param>
            /// <param name="name">The sheet name</param>
            /// <returns>The worksheet part</returns>
            private static WorksheetPart GetWorksheetPartByName(SpreadsheetDocument document, string name)
            {
                // Get Sheet by name from Sheets in Workbook
                var sheet = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(x => x.Name == name).FirstOrDefault();
                // Lookup WorksheetPart by Id
                return sheet == null ? null : (WorksheetPart)document.WorkbookPart.GetPartById(sheet.Id.Value);
            }
    
            /// <summary>
            /// Get a row given the document, sheet name and row index.
            /// </summary>
            /// <param name="doc">The document</param>
            /// <param name="sheetName">The sheet name</param>
            /// <param name="rowIndex">The row index</param>
            /// <returns>The row</returns>
            private static Row GetRow(SpreadsheetDocument doc, String sheetName, uint rowIndex)
            {
                var worksheetPart = GetWorksheetPartByName(doc, sheetName);
                if (worksheetPart == null)
                {
                    throw new ArgumentException(string.Format("Sheet '{0}' not found", sheetName));
                }
                return worksheetPart.Worksheet.GetFirstChild<SheetData>().Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
            }
        }
    
        struct CellSpec
        {
            public string Sheet;
            public string Column;
            public uint Row;
        }
    }
    

    Thursday, April 26, 2012 5:00 PM

All replies

  • Hello Jon,

    Thank you for your meticulous explanation and the demonstration code and instructions. Running your code with the test.xlsx file worked, but I could not find the package after executing
     doc.WorkbookPart.Workbook.Save();
     doc
    .Close();

    When I opened the Test.xlsx in Open XML SDK 2.0 Productivity Tool... it was the original workbook with the values I entered in the range A!:C3.  The attributes for futureMetaData1 (Count = uInt32Value)6U and for futureMetaDataBlock, extensionList1 and all of the rest adhered to the W3C specifications.  This leads me to conclude that I am looking at the test workbook before using it as the target for the demonstration project.

    If that is wrong please re-assure me. If it isn't wrong please tell me where to find the processed packaged workbook that results from running the code.

    Thanks for your guidance.


    Please remember to mark the replies as answer if they help and unmark them if the provide no help. and click "Vote as Helpful" this and other helpful posts, so other users will see your thread as useful. Chris Jensen

    Tuesday, May 01, 2012 8:34 PM
  • Hello cjatms, and thanks for your response.

    The original test.xlsx won't have any metadata since Excel doesn't add any by default.

    Running AddMetadata test.xlsx then comparing the before & after files using the SDK tool shows the following differences:

    A new metadata.xml part containing

    • 6 future metadata blocks with values "test value 0" to "test value 5"
    • 6 value metadata blocks - all with the same type (t="1") and with references to the corresponding future metadata blocks (v="0" to v="5")

    The sheet1.xml part now has attributes on the six cells (vm="0" to vm="5")

    I believe this is in accordance with the spec.

    The problem comes when you open the file with metadata and save it using Excel (no need to make any changes). Using the SDK tool to compare the previous file with the one saved by Excel shows the following differences:

    The metadata.xml part has changed to contain

    • 4 future metadata blocks with values "test value 0", "test value 2", "test value 3", "test value 4"
    • 4 value metadata blocks with references v="0" to v="3"

    The sheet1.xml part has changed to remove the vm attributes on cells A1 and C1 (previously vm="0" and vm="2" respectively).

    To me it seems like Excel is removing two of the pieces of metadata and messing up the references.

    I hope that's clarified the problem a bit.

    Friday, May 11, 2012 2:02 PM
  • Did you get any solution for this?
    Monday, May 14, 2012 10:26 AM
  • No I'm afraid I'm still hoping for a solution; have you come across something similar?
    Tuesday, May 29, 2012 1:49 PM
  • I can confirm this bug, both running Jon's code and against my own code/large dataset. SO strange and annoying as this functionality is what I needed. Has anyone got a workaround yet? And additionally I've noted it will remove additional entries every save. Seems like a serious bug. Where do we register it?


    • Edited by ZRGN Saturday, June 23, 2012 8:38 PM
    Saturday, June 23, 2012 11:46 AM
  • Has anyone got a workaround yet?


    We found that it worked ok so long as there was only one block entry in the future metadata. Since the blocks are effectively indexed relative to the type, we use a unique type for each block (i.e. lots of duplicate types). The value index is always 0 and it's just the type index which varies. I hope this helps. I would also like to know where to file bugs. It seems strange that there's no mechanism for doing this in such a widespread product.
    Friday, July 13, 2012 5:08 PM
  • Unfortunately, I can always reproduce this bug. The workaround mentioned by Jon doesn't run on my machine.

    I tested with cell metadata rather than value metadata, but the behavior is the same. Excel removes meta data.

    Nobody has opening a support ticket ?


    blog.mexedge.com

    Monday, January 21, 2013 5:29 PM
  • If you create a metadatatype with a single metdata block, and you reference that in your vm/cm cell attribute using a *one* based index, Excel seems to see the link and it honors it when saving the spreadsheet.

    So, I ended up with something like:

    <c ... cm="1"/> (I'm dealing with cell metadata, but the concept is equivalente to value metadata)


    <metadataTypes count="1">
      <metadataType name="MyMetaType" .../>
    </metadataTypes>
    <futureMetadata count="1" name="MyMetaType">
      <bk>
        <extLst><ext uri="http://example" xmlns:x="http://example"><x:val>87</x:val></ext></extLst>
      </bk>
    </futureMetadata>
    <cellMetadata count="1">
      <bk><rc t="1" v="0"/></bk> <!-- this is what gets referenced as cm=1 on the cell -->
    </cellMetadata>

    Hope this helps. 

    Tuesday, February 05, 2013 5:54 PM
  • Bit late to the party but only recently started using OpenXML. Easiest fix is to avoid Metadata and just use extension lists. They can be customised and applied at the cell level.
    Thursday, July 24, 2014 4:31 PM