Help connecting to an XML Spreadsheet

XL Pro

Board Regular
Joined
Apr 17, 2002
Messages
249
Office Version
  1. 365
Platform
  1. Windows
Hi - just started learning Power Query / Get & Transform this month - what an incredible tool! My background is VBA, so getting data cleaned/shaped without writing any code is incredible!

Has anyone tried reading an XML Spreadsheet? Here's the header of my data:

HTML:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="date"><NumberFormat ss:Format="Short Date"/></Style>
<Style ss:ID="bold"><Font ss:Bold="1"/></Style>
<Style ss:ID="default"></Style>
</Styles>
<Worksheet ss:Name="Data">
<Table>
<Row ss:StyleID="bold"><Cell><Data ss:Type="String">Company</Data></Cell><Cell><Data ss:Type="String">Job</Data></Cell><Cell><Data ss:Type="String">Emp Num</Data></Cell><Cell><Data ss:Type="String">Emp Name</Data></Cell><Cell><Data ss:Type="String">SSN</Data></Cell><Cell><Data ss:Type="String">Period Date</Data></Cell><Cell><Data ss:Type="String">Skill</Data></Cell><Cell><Data ss:Type="String">Hour Type</Data></Cell><Cell><Data ss:Type="String">ST</Data></Cell><Cell><Data ss:Type="String">OT</Data></Cell><Cell><Data ss:Type="String">DT</Data></Cell><Cell><Data ss:Type="String">HW</Data></Cell><Cell><Data ss:Type="String">HP</Data></Cell><Cell><Data ss:Type="String">Skill Category</Data></Cell><Cell><Data ss:Type="String">Skill Category Name</Data></Cell><Cell><Data ss:Type="String">FD</Data></Cell><Cell><Data ss:Type="String">Calc No.</Data></Cell><Cell><Data ss:Type="String">Description</Data></Cell><Cell><Data ss:Type="String">Contribution</Data></Cell><Cell><Data ss:Type="String">VH</Data></Cell><Cell><Data ss:Type="String">Contribution Amount</Data></Cell><Cell><Data ss:Type="String">Has Dues</Data></Cell><Cell><Data ss:Type="String">Dues Amnt</Data></Cell><Cell><Data ss:Type="String">Has PAC</Data></Cell><Cell><Data ss:Type="String">PAC Amount</Data></Cell><Cell><Data ss:Type="String">Has 401k</Data></Cell><Cell><Data ss:Type="String">401K Amount</Data></Cell><Cell><Data ss:Type="String">401K Rate</Data></Cell><Cell><Data ss:Type="String">Premium</Data></Cell><Cell><Data ss:Type="String">Has ROTH</Data></Cell><Cell><Data ss:Type="String">ROTH Amount</Data></Cell><Cell><Data ss:Type="String">ROTH Rate</Data></Cell><Cell><Data ss:Type="String">Roth Premium</Data></Cell><Cell><Data ss:Type="String">Address</Data></Cell><Cell><Data ss:Type="String">Local</Data></Cell><Cell><Data ss:Type="String">Union Skill</Data></Cell><Cell><Data ss:Type="String">Skill Name</Data></Cell><Cell><Data ss:Type="String">Skill Class Prior Rate</Data></Cell><Cell><Data ss:Type="String">Skill Class Current Rate</Data></Cell><Cell><Data ss:Type="String">Skill Class Due</Data></Cell><Cell><Data ss:Type="String">Skill Class HRA</Data></Cell><Cell><Data ss:Type="String">Skill Class 401A</Data></Cell><Cell><Data ss:Type="String">Skill Credit Union Code</Data></Cell><Cell><Data ss:Type="String">Skill Credit Union Rate</Data></Cell><Cell><Data ss:Type="String">Skill 401k Code</Data></Cell><Cell><Data ss:Type="String">Skill 401k Rate</Data></Cell><Cell><Data ss:Type="String">Unassigned</Data></Cell><Cell><Data ss:Type="String">Company Name</Data></Cell><Cell><Data ss:Type="String">CO Addr1</Data></Cell><Cell><Data ss:Type="String">CO Addr2</Data></Cell><Cell><Data ss:Type="String">CO City</Data></Cell><Cell><Data ss:Type="String">CO State</Data></Cell><Cell><Data ss:Type="String">CO Zip</Data></Cell><Cell><Data ss:Type="String">Master Agreement</Data></Cell><Cell><Data ss:Type="String">Year</Data></Cell><Cell><Data ss:Type="String">Period</Data></Cell><Cell><Data ss:Type="String">Check</Data></Cell><Cell><Data ss:Type="String">Tran</Data></Cell><Cell><Data ss:Type="String">Key</Data></Cell><Cell><Data ss:Type="String">Union</Data></Cell><Cell><Data ss:Type="String">Gross</Data></Cell><Cell><Data ss:Type="String">Manual Check</Data></Cell></Row>
</Table>
</Worksheet>
</Workbook>
Data is received monthly with 150K+ rows give or take. I used a small subset of about 2K rows to create my M Code, but when applied to the full dataset, I've waited over 30min without any results and just canceled the process.

Here's the M Code I have (It uses Ken Puls' Parameter Table to get a file- local, not on a shared drive)
Code:
let
    SolutionFile = fnGetParameter("File"),
    Source = Xml.Tables(File.Contents(SolutionFile)),
    #"Filtered Rows" = Table.SelectRows(Source, each not Text.Contains([Name], "Styles")),
    Table = #"Filtered Rows"{0}[Table],
    #"Filtered Rows1" = Table.SelectRows(Table, each ([Name] = "Table")),
    Table1 = #"Filtered Rows1"{0}[Table],
    Table2 = Table1{0}[Table],
    #"Removed Columns" = Table.RemoveColumns(Table2,{"urn:schemas-microsoft-com:office:spreadsheet"}),
    AddedIndex = Table.AddIndexColumn(#"Removed Columns", "Index", 0, 1),
    #"Expanded Cell" = Table.ExpandTableColumn(AddedIndex, "Cell", {"Data"}, {"Cell.Data"}),
    #"Expanded Cell.Data" = Table.ExpandTableColumn(#"Expanded Cell", "Cell.Data", {"Element:Text"}, {"Cell.Data.Element:Text"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Cell.Data",{{"Cell.Data.Element:Text", "Data"}, {"Index", "Row"}}),
    #"Added Index" = Table.AddIndexColumn(#"Renamed Columns", "Index", 0, 1),
    #"Inserted Modulo" = Table.AddColumn(#"Added Index", "Inserted Modulo", each Number.Mod([Index], 63), type number),
    #"Removed Columns1" = Table.RemoveColumns(#"Inserted Modulo",{"Index"}),
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns1", {{"Inserted Modulo", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns1", {{"Inserted Modulo", type text}}, "en-US")[#"Inserted Modulo"]), "Inserted Modulo", "Data"),
    #"Removed Columns2" = Table.RemoveColumns(#"Pivoted Column",{"Row"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Columns2"),
    Custom1 = Table.RowCount(#"Promoted Headers")
in
    Custom1
Currently I get great performance if I open the file first and save as an Excel file. A bit better performance if I save it as a CSV. Obviously, with those two formats, it's already in table format, I don't have to traverse the XML.

I don't know if I'm parsing the XML correctly or efficiently. I'd really like to be able to use the XML file directly rather than taking the extra "open/save as" route.

Thanks for any tips/tricks.
 

Excel Facts

Links? Where??
If Excel says you have links but you can't find them, go to Formulas, Name Manager. Look for old links to dead workbooks & delete.
Using Table notation, I can get directly to the data I need:
Data = Source{1}
{0}
{0}


But as I afeared, applying that to the original data set, I get this error:
Expression.Error: Evaluation ran out of memory and can't continue.
 
Upvote 0

Forum statistics

Threads
1,215,025
Messages
6,122,732
Members
449,093
Latest member
Mnur

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top