How to remove duplicates based on several columns, summate values and calculate weightings

chrisab

New Member
Joined
Nov 16, 2016
Messages
3
Hi all,

I've been trying to work out how to do this and have done a lot of google research but can't find the answer to my particular task. So I was hoping for some help please :)

I've got a lot of data (200k lines) that I need to perform some analysis on. This includes removing duplicates based on several columns, summating the values of certain columns in each individual group of rows that were duplicates and then performing some further calculations and adding of additional columns.

I can use the remove duplicates command in Excel (2010) but can't do the additional processes. I think I could use an array sumif on the raw data based on the processed (removed duplicates) data to find the summated values but thought there must be a better way of doing it!

I created a dummy excel file showing before and after but can I not upload it?

I've copied the data from excel below but sorry for the formatting. The remove duplicates needs to be based on columns (Eng,Code,NORM and Job Number), columns NORM thru to units actual (conv) are to be summated for each consolidated row.

I then need to calculate, for each processed row (which for a given Eng, Code and Norm will be differentiated by Job number) the weighted % of units and Desc (G or R) - this is to be weighted by the SUM of units (converted) or SUM of units actual (converted) for the Eng and Code that the job number belongs to.

Hope this makes sense but can put down the calculations (shame I can't upload the file) - I've added in a brief desc of the calcs below.

Many thanks for any help!
BEFORE
EngCodeNORMConversionUnitsUnits (converted)Norm ActualUnits ActualUnits Actual (conv)DescJob Number
Loc 1 TME75210101003880 G1992
Loc 1 TME75210101004880 R1992
Loc 1 TME75210770210100 G1992
Loc 1 TME751584041260 R1991
Loc 1 TME751515751.51680 G1991
Loc 1 TME7531181831919 G1400

<tbody>
</tbody><colgroup><col><col><col span="3"><col><col span="2"><col><col><col></colgroup>

AFTERWeighted by units (conv)Weighted by actual units (conv)
EngCodeNORMConversionUnitsUnits (converted)Norm ActualUnits ActualUnits Actual (conv)Job NumberUnits (conv) %% G% RUnits actual (conv) %% G% R
Loc 1 TME7521027270926260 199267% (this is 270/(270+115+18)63% (this is (100+70)/100+70+100)37%62%69%31%
Loc 1 TME7515231155.528140 199129%65%35%33%58%42%
Loc 1 TME7531181831919 14004%100%0%5%75%25%

<tbody>
</tbody><colgroup><col><col><col span="3"><col><col span="2"><col><col><col><col><col><col><col><col><col></colgroup>



Thanks,
Chris
 

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
I have not thought about the exact calculation you need since i think after this they will be "normal" formulas.

On your raw data add two columns, with one being: =Eng & Code & Norm & Job Number
And the other a countif for that whole column, using as criteria the cell on the left (the lookup on that row), which will give you an idea of how many times you have the values. This formula is not decisive for the rest of the job, but since you're doing an analyses, it's probably a good info to hold.

With this you can do simples sumifs (I've tried it the the units column and works fine). Personally, I would copy that lookup column to a new sheet, remove duplicates (you would need to do this step from time to time but as long as the formulas are functional should work correctly), and then decompose the lookup with lefts and middles and rights to also get the information in that table (on the first formula use "-" as a separator, it would help if you try to follow this advice).
 
Upvote 0
I have not thought about the exact calculation you need since i think after this they will be "normal" formulas.

On your raw data add two columns, with one being: =Eng & Code & Norm & Job Number
And the other a countif for that whole column, using as criteria the cell on the left (the lookup on that row), which will give you an idea of how many times you have the values. This formula is not decisive for the rest of the job, but since you're doing an analyses, it's probably a good info to hold.

With this you can do simples sumifs (I've tried it the the units column and works fine). Personally, I would copy that lookup column to a new sheet, remove duplicates (you would need to do this step from time to time but as long as the formulas are functional should work correctly), and then decompose the lookup with lefts and middles and rights to also get the information in that table (on the first formula use "-" as a separator, it would help if you try to follow this advice).

Hi Ruca13,

Thanks for the reply. I have followed your advice but as expected evern doing the first column of sumif's on 200k of lines is taking ages to calculate (only got to 5% after 10mins). This just isn't going to work unfortunately, even as a one off, as we need the spreadsheet to be edited in future.

There must be a more efficient way of doing it - possibly VBA? The remove duplicates function based on the 4 columns works in seconds so presumably adding in the sums wouldn't add too much more time? I just don't know how to code in VBA!

Thanks,
 
Upvote 0
Maybe the other option is to create the unique identifier columns as you say and then use a pivot table to calculate the sums, then copy that into another sheet and use a lookup to pull the information.

Not very elegant though!
 
Upvote 0

Forum statistics

Threads
1,215,465
Messages
6,124,982
Members
449,201
Latest member
Lunzwe73

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top