Stripping Styles Programatically

greenMamba

New Member
Joined
Sep 24, 2015
Messages
1
I'm trying to find out how to properly strip Excel styles out of a workbook. I am using this open source project as a starting point but I'm not sure what' s going wrong here. https://excelfilecleaner.codeplex.com/

Whenever I process certain workbooks through it I open them and get an error that says there is "Unreadable content" contained in the workbook and it asks if I want to attempt to recover it. I click "yes" and the document appears to recover without an issue. When it recovers it gives me a log saying it Removed Records: Style from xl/styles.xml part (Styles)I have tried running through the code to look for any discrepancies in how it is handling the workbook, but can't find any. I have extracted and compared "styles.xml" from the same workbook in its original format, cleaned format, and repaired format, and the only discrepancies I can find are in the number of fonts that are still left in the styles.xml file. The "Cleaned" workbook contains the same as the "Original", but the "Repaired" workbook appears to have much less. Also, in the "Cleaned" workbook there appears to be a "x:" pre-fix before everything. Does anyone have any ideas why this could be happening? Am I missing something on the anatomy of these workbooks?

This is the class for cleaning the workbooks - it's written in c#

Code:
 private void CleanStyles(Stylesheet stylesheet)
        {
            List<CellFormat> cfList = new List<CellFormat>();
            // build a list of cell formats
            // NOTE: In the CellStyle namespace there is a FormatId
            //       the CellStyleFormats are actually an array
            //       of styles and the FormatId in the CellStyle
            //       references the index position of the 
            //       CellStyleFormat. So what we have to do is
            //       load all these into an List and then using
            //       the formatID of the cell style get reference
            //       to the related CellStyleFormat

            Log("Reviewing styles.");
            foreach (CellFormat cf in stylesheet.CellStyleFormats)
                cfList.Add(cf);

            // now get a list of the named Styles
            Dictionary<int,styleCount> styleList = new Dictionary<int,styleCount>();
            foreach (CellStyle cs in stylesheet.CellStyles)
            {
                try
                {
                    // add all styles and the cellformats
                    int formatId = int.Parse(cs.FormatId);
                    styleList.Add(formatId, new styleCount(cs, cfList[formatId], !(cs.BuiltinId == null)));
                }
                catch { } // ignore
            }
            Log("There are " + styleList.Count.ToString() + " styles in the workbook.");

            // now that we have aligned our cellstyleformats with cellformats
            // we need to delete everyone of them that are not related
            // to a specific CellFormat.
            int counter = 0;
            foreach (CellFormat xf in stylesheet.CellFormats)
            {
                try
                {
                    int formatId = int.Parse(xf.FormatId);
                    if (styleList.Keys.Contains(formatId))
                    {
                        styleList[formatId].inUse = true;
                        styleList[formatId].Found = true;
                        styleList[formatId].cellFormat = xf;
                        counter++;
                    }
                }
                catch { } // ignore
            }
            Log("There are " + counter.ToString() + " styles in use.");
            Log("Cleaning unused styles:");

            // now loop though the stylelist and delete anything that
            // is not in use and then log it
            foreach (KeyValuePair<int,styleCount> c in styleList)
            {
                // only those not in use and those that are not built in
                if (c.Value.Found == false && c.Value.builtIn == false)
                {
                    Log("Cleaning style: " + c.Value.cellStyle.Name);
                    c.Value.cellStyleFormat.Remove();
                    c.Value.cellStyle.Remove();
                }
            }

            // clean the list in reverse - deleting styles that were
            // removed in the operation above
            for (int i = (styleList.Count - 1); i >= 0; i--)
            {
                int keyVal = styleList.Keys.ToList<int>()[i];
                if (styleList[keyVal].inUse == false && styleList[keyVal].builtIn == false)
                    styleList.Remove(keyVal);
            }

            Log("Updating style indexes.");
            // now correct all the indexes
            foreach (KeyValuePair<int, styleCount> c in styleList)
            {
                int idx = 0; // keep track of the index
                // now we loop though the collection of the 
                // cellstyleformats (xsf) remaining in the
                // stylesheet. If the cellstyleformat is the
                // same cellstyleformat we are referencing in
                // our collection via styleList, then we will
                // update both the cellStyle xfId and the
                // cellFormat xfId in the stylesheet to 
                // reference the new index location...
                foreach (CellFormat xsf in stylesheet.CellStyleFormats)
                {
                    // CellStyleFormat (refersTo>) CellFormat
                    // CellStyle (refersTo>) CellFormat
                    if (c.Value.cellStyleFormat == xsf)
                    {
                        if (c.Value.Found) // only if matched to a cellformat
                        {
                            // we have a match... index them
                            c.Value.cellStyle.FormatId.Value = (uint)idx;
                            c.Value.cellFormat.FormatId.Value = (uint)idx;
                        }
                        break;
                    }
                    idx++;
                }
            }

            // update counts
            stylesheet.CellStyles.Count.Value = (uint)styleList.Count;
            stylesheet.CellStyleFormats.Count.Value = (uint)styleList.Count;    
    
            // clean
            stylesheet.Save();
            stylesheet = null;
            Log(DIVIDER);
        }
 

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)

Forum statistics

Threads
1,216,075
Messages
6,128,657
Members
449,462
Latest member
Chislobog

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top