Summing several columns that vary from file to file - Power Query?

Tuta

Board Regular
Joined
Nov 6, 2008
Messages
85
Office Version
  1. 365
Platform
  1. Windows
I receive files that have a similar structure to the example table below. What I need is a resulting 'total' column that tells me the sum of everything from start to the End including the values in those columns as well. So, I would want to see 2,750 as the result

StartABCEnd
10002502502501000

<tbody>
</tbody>






I may receive the next file with a structure similar to the below table:

StartDuckComputerStickPaperFHEnd
10002502502001007501000500

<tbody>
</tbody>





For this file, I'd want to end up with a 'total' and see the result of 4,050

The data will always be numbers, but the column headers will vary widely, both in number of columns between start/end and their descriptions/labels.



This is all easy enough to accomplish in the individual files themselves, but it seems like this might save some time. (and seems to be an interesting challenge)

I'd love to be able to set up a query, save a new file to a folder location, refresh query, and end up seeing something like:

File DateAnswer
02/03/20172,750
02/17/20174,050

<tbody>
</tbody>






Does this seem possible?
 

Excel Facts

Round to nearest half hour?
Use =MROUND(A2,"0:30") to round to nearest half hour. Use =CEILING(A2,"0:30") to round to next half hour.
Hi Tuta,

It seems entirely possible using Power Query. Just a few questions to better understand the situation:

Do the sheets in each file have the same name?
Are there other columns aside from the ones you want to sum?
If so, are the first and last columns you want to sum always named "Start" and "End", respectively?
Do you want to sum only the first row?
 
Upvote 0
Hi Franz!


answers:


Do the sheets in each file have the same name?

  • no. but there is only one sheet


Are there other columns aside from the ones you want to sum?

  • Yes - both before 'start' and after 'end' -- same variability -- sometimes more/fewer columns




If so, are the first and last columns you want to sum always named "Start" and "End", respectively?

  • Yes (but not actually start and end...)






Do you want to sum only the first row?

  • all data rows, actually. sometimes there will be 1000 rows, some files only 600, etc. there is typically a couple of 'grand total' rows at bottom of data
 
Upvote 0
Try adding the following code to Power Query. Let me know if you have any doubts and I'll try to answer tomorrow cause I don't have much time today.

I hope it helps.

Code:
// DummyFolder
let
    Source = 
        Folder.Files( "YourFoldePath" )
in
    Source


// guiSumStartToEnd
let
    Source = 
        DummyFolder as table,
    BinaryFile = 
        Source{0}[Content] as binary,  
    ExploreFile = 
        Excel.Workbook( BinaryFile ) as table,
    OpenSheet = 
        ExploreFile{0}[Data] as table,
    PromoteHeaders = 
        Table.PromoteHeaders( OpenSheet ) as table,
    AllColumnNames = 
        Table.ColumnNames( PromoteHeaders ) as list,
    StartEndPositions = 
        List.Transform( 
            {"Start", "End"}, 
            each List.PositionOf( AllColumnNames, _ ) 
        ) as list,
    ColumnCount = 
        StartEndPositions{1} - StartEndPositions{0} + 1 as number,
    ColumnsToKeep = 
        List.Range( AllColumnNames, StartEndPositions{0}, ColumnCount ) as list,
    KeepColumns = 
        Table.SelectColumns( PromoteHeaders, ColumnsToKeep ) as table,
    ChangeDataType = 
        Table.TransformColumnTypes( 
            KeepColumns, 
            List.Transform( ColumnsToKeep, each {_, type number} ) as list
        ) as table,
    UnPivot = 
        Table.UnpivotOtherColumns( ChangeDataType, {}, "Header", "Value" ) as table,
    ReplaceErrors = Table.ReplaceErrorValues(UnPivot, {{"Value", 0}}),
    SumValues = 
        List.Sum( ReplaceErrors[Value] ) as number
in
    SumValues


// fnSumStartToEnd
( BinaryFile as binary, Start as text, End as text ) as number =>


let
    ExploreFile = 
        Excel.Workbook( BinaryFile ) as table,
    OpenSheet = 
        ExploreFile{ 0 }[Data] as table,
    PromoteHeaders = 
        Table.PromoteHeaders( OpenSheet ) as table,
    AllColumnNames = 
        Table.ColumnNames( PromoteHeaders ) as list,
    StartEndPositions = 
        List.Transform( 
            { Start, End }, 
            each List.PositionOf( AllColumnNames, _ ) 
        ) as list,
    ColumnCount = 
        StartEndPositions{ 1 } - StartEndPositions{ 0 } + 1 as number,
    ColumnsToKeep = 
        List.Range( AllColumnNames, StartEndPositions{ 0 }, ColumnCount ) as list,
    KeepColumns = 
        Table.SelectColumns( PromoteHeaders, ColumnsToKeep ) as table,
    ChangeDataType = 
        Table.TransformColumnTypes( 
            KeepColumns, 
            List.Transform( ColumnsToKeep, each {_, type number } ) as list
        ) as table,
    UnPivot = 
        Table.UnpivotOtherColumns( ChangeDataType, {}, "Header", "Value" ) as table,
    ReplaceErrors = 
        Table.ReplaceErrorValues( UnPivot, { {"Value", 0 } } ),
    SumFromStartToEnd = 
        List.Sum( ReplaceErrors[Value] ) as number
in
    SumFromStartToEnd


// TestFunction
let
    Source = 
        DummyFolder,
    AddSumValues = 
        Table.AddColumn( 
            Source, 
            "SumValues", 
            each fnSumStartToEnd( [Content], "Start", "End" ), 
            type number 
        )
in
    AddSumValues
 
Upvote 0
Hi FranzV,

Actually I was also taking a look at this topic and my intention was to create a video how to do it with the "Combine Binaries" functionality that was recently updated.
But then I saw your code and got really fascinated, as it is unlike any other code I've seen so far.
Especially the additions "as < type ><type>" are new to me; I perfectly understand how it works, but at the same time I wonder what the added value would be: maybe a "best practice" and what would be the origin of such a best practice?

I played around with your code and noticed that the ReplacedErrors is late: after the unpivot, the errors won't be caught anymore by Table.ReplaceErrorValues.
Some time ago I posted a topic on the TechNet forum with a similar phenomenon.

In this case, errors can be caught before the unpivot, so my proposal for the latter part of guiSumStartToEnd would be:

Code:
...
    ChangeDataType = 
        Table.TransformColumnTypes( 
            KeepColumns, 
            List.Transform( ColumnsToKeep, each {_, type number} ) as list
        ) as table,
    ReplaceErrors = 
        Table.ReplaceErrorValues(ChangeDataType,
            List.Transform( ColumnsToKeep, each {_, 0} ) as list
        ) as table,
    UnPivot = 
        Table.UnpivotOtherColumns( ReplaceErrors, {}, "Header", "Value" ) as table,
    SumValues = 
        List.Sum( UnPivot[Value] ) as number
in
    SumValues

Likewise adjustment for fnSumStartToEnd:

Code:
...
    ChangeDataType = 
        Table.TransformColumnTypes( 
            KeepColumns, 
            List.Transform( ColumnsToKeep, each {_, type number } ) as list
        ) as table,
    ReplaceErrors = 
        Table.ReplaceErrorValues(ChangeDataType,
            List.Transform( ColumnsToKeep, each {_, 0} ) as list
        ) as table,
    UnPivot = 
        Table.UnpivotOtherColumns( ReplaceErrors, {}, "Header", "Value" ) as table,
    SumFromStartToEnd = 
        List.Sum( UnPivot[Value] ) as number
in
    SumFromStartToEnd

I don't think it has much added value anymore to create the video as I intended, unless someone (especially Tuta) prefers a solution that is based on code that is mainly generated by the Power Query UI interface.

Regards,
Marcel</type>
 
Last edited:
Upvote 0
All -- I am only now just getting a chance to test these solutions with my actual data. I'll report back later.

Marcel -- I do think it is worth creating your video content. I feel there are many people that prefer/need to see a solution visually step by step. I think there are a lot of users like me that recognize the potential of new/easier solutions with PowerBI products, but still struggle getting 'over the hump' in their learning.

thanks! I'll post later with how things progress.
 
Upvote 0
First of all, I apologize for just throwing the whole code without any explanation. I'll try to explain it first to help Tuta try it and then answer Marcel's post.

As you can see in the code, it has 4 queries:

  • DummyFolder is a simple 'From Folder' query. If you don't want to use all the files in the folder for your calulation, just filter out the extra files. This is 100% User Interface.
  • guiSumStartToEnd uses DummyFolder as its source and then applies all the steps needed to find the sum of all values in the columns between "Start" and "End". Whenever I write a relatively complex custom function
    • I first do it as a regular query for debugging purposes and
    • then tweak the code to turn it into a function (I believe it is common practice to do it that way),
    • BUT I keep a copy of the original query for reference and debugging purposes (that should be very handy if someone tries to understand my function, since steps can easily be previewed).
  • fnSumStartToEnd is the actual custom function that will be invoked to obtain the desired results. It is a twin of guiSumStartToEnd. The only changes are to turn it into a function that accepts 3 arguments:
    • BinaryFile is the file with the values to be added.
    • Start and End are the names of the first and last columns (Tuta told us that they are not actually "Start" and "End", so I turned them into variables).
  • TestFunction references DummyFolder and adds a column that uses fnSumStartToEnd with [Content] column as its BinaryFile argument.

I'd like to think all the steps are very easy to follow. I prefer many simple digestible steps than a few nested functions that I find harder to read. I haven't heard of any performance differences between both coding styles.

Now answering Marcel's post.

Especially the additions "as < type ><type>" are new to me; I perfectly understand how it works, but at the same time I wonder what the added value would be: maybe a "best practice" and what would be the origin of such a best practice?

This helps me remember what type of value is returned by each step. It might seem obvious, but I find it easier when I am writing code or debugging in Notepad++ (using Matt Masson's custom PQ language). I get less 'Cannot convert value X to type Y' since I started using asertions.

I played around with your code and noticed that the ReplacedErrors is late: after the unpivot, the errors won't be caught anymore by Table.ReplaceErrorValues.
Some time ago I posted a topic on the TechNet forum with a similar phenomenon.

In this case, errors can be caught before the unpivot, so my proposal for the latter part of guiSumStartToEnd would be:
</type>

I get your point and would encourage Tuta to use your version.

PS. The function adds all rows, if you want to exclude a totals row you can use Table.RemoveLastN() or divide the final result by 2.
 
Last edited:
Upvote 0
The video is now available As you can see, a lot of additional steps and coding is required in addition to the standard "Combine Binaries" functionality, just to make the solution dynamic.
So, I would say this approach is less suitable in this case and I would recommend the code provided by FranzV (with the adjustments from post #5).
 
Upvote 0

Forum statistics

Threads
1,214,911
Messages
6,122,198
Members
449,072
Latest member
DW Draft

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top