A more efficient approach to my data cleansing task

Yamezz

Active Member
Joined
Nov 22, 2006
Messages
336
Office Version
  1. 2019
I have some exported files to work with, collating and summing data for individual SKUs.
A PaymentType file has details of the type of payment(s) made for each order number. Most order numbers have just one payment. The odd one has multiple payments applied to the one order.
Another file has the OrderDetails (SKUs and prices) that constitute those orders. For those orders with just one payment, it's simple to sum the SKUs, however, for the orders that have multiple payments applied, the OrderDetails file records multiple instances of the SKUs (on a new line). If I sold one Apple in an order with one payment, then only one Apple shows in the OrderDetails file, but if three part-payments were made for an order for one Orange, then the OrderDetails file shows three records for one Orange, with the same order number.

i.e.
OrderSKUQuantity
#1135Apple1
#1136Orange1
#1136Orange1
#1136Orange1


The only way I know how to accomplish correcting the summed data is as follows:
Use Application.WorksheetFunction.CountIF to examine the PaymentType file for duplicates and create a temporary table with the order number and number of payments.
Step through each line of the OrderDetails file and use Application.WorksheetFunction.VLookup to check each order number against the temporary table and then divide the values in the SKU column by the number of instances in the temporary table. This would mean, in the example above, I would get 1 Apple/1 occurance = 1 Apple; and each of the three lines for Orange would be changed to 1/3 Orange, so the final sum (with later code) would be 3 * 1/3 = 1 Orange.

Is there a more efficient way for me to achieve the same result?
 

Excel Facts

Fastest way to copy a worksheet?
Hold down the Ctrl key while dragging tab for Sheet1 to the right. Excel will make a copy of the worksheet.
Perhaps something like this?

ABCDEF
1OrderSKUQuantityNo orders2
2#1135Apple4
3#1135Apple4Totals
4#1135Orange2Apple4
5#1135Orange2Banana7
6#1135Banana7Orange10
7#1135Banana7
8#1136Orange8
9#1136Orange8
10#1136Orange8
Sheet1
Cell Formulas
RangeFormula
F1F1=SUMPRODUCT(1/COUNTIF(A2:A10,A2:A10))
F4:F6F4=SUMPRODUCT(C$2:C$10,(B$2:B$10=E4)/COUNTIFS(A$2:A$10,A$2:A$10,B$2:B$10,B$2:B$10))

(But this doesn't sound like the ideal way to calculate quantities. If the payments data varies in any way from what you've described, you may get incorrect results)
 
Upvote 0
Thanks Stephen.

I've tried your formulas on some actual data and have identified an instance of an incorrect result.
Shopify to Xero.xlsm
ABCDEFGH
1OrderSKUQuantityPriceNo orders6
2#1133Apple15.5
3#1134Banana130SKUTotalsShould Be
4#1134Cherry142Apple11
5#1134Durian215Banana11
6#1134Elderberry130Cherry33
7#1135Fig135Durian22
8#1136Cherry242Elderberry11
9#1136Cherry242Fig22
10#1136Grape212Grape3.57
11#1136Grape212Honeydew11
12#1136Honeydew110Kiwi22
13#1136Honeydew110Lemon22
14#1136Kiwi230Mango11
15#1136Kiwi230
16#1136Lemon210
17#1136Lemon210
18#1136Grape512
19#1136Grape512
20#1137Mango122
21#1138Fig135
Sheet2
Cell Formulas
RangeFormula
G1G1=SUMPRODUCT(1/COUNTIF(A2:A21,A2:A21))
G4:G14G4=SUMPRODUCT(C$2:C$21,(B$2:B$21=F4)/COUNTIFS(A$2:A$21,A$2:A$21,B$2:B$21,B$2:B$21))


In this data, order #1136 was paid in two payments, hence the troublesome doubling up of each line. I have no idea why the output I'm dealing with lists two more instances of Grape (rather than two identical lines with Grape - 7). When looking at Grape, the logic should effectively be row 11 is a duplicate of row 10, and Row 19 is a duplicate of row 18, hence the total is 2 + 5 = 7.

Do you think there's a way to code for this ugly data I have to deal with?
 
Upvote 0
Do you think there's a way to code for this ugly data I have to deal with?
Ha ha, hence my comment:
(But this doesn't sound like the ideal way to calculate quantities. If the payments data varies in any way from what you've described, you may get incorrect results)

With this small example, it's obvious by inspection that the answer should be 2 + 5 = 7 grapes, because there are other SKUs in #1136, all duplicated.

But it's not quite as easy to code in a formula. What if the second occurrence was also 2 grapes, rather than 5? How does the formula determine that this is two occurrences, paid twice, vs one occurrence paid four times?

And I'm sure there are many other ways your data will not behave. if I was checking this, I'd be sorting and analysing in some detail to identify any data anomalies. VBA would be useful if this was something that needed to be done regularly. But you'll never have complete certainty, e.g. is the answer 2 or 4 for ... ?

OrderSKUQuantityPrice
#1133Apple211
#1133Apple211

I'm curious, why are you calculating SKU numbers using payment data, rather than order data?

Payment data is all about $ - how much have we been paid, how much is due / overdue etc.

The question you're answering is - out of this list of payments, how many apples/oranges were in the orders, and hence how many have been paid for or part paid for? Why would anyone need to know this?
 
Upvote 0
The intent of all this is to form part of a rather large macro taking Shopify data and cleansing/collating it into a format that the Xero accounting software can read.
I had it all working fine until I realised that some customers had split the bill, so we were ending up with doubled up values where an order# was appearing twice.

My original idea (in post 1) would hopefully take care of the anomalies like #1136 - by counting how many times the order was paid and dividing all rows of data by that count. So for #1136, the Application.WorksheetFunction.SumIFs code I've already written would sum 1,1,2.5,2.5 to get the correct total of 7 Grapes.

Unfortunately no one report out of Shopify contains all the data I need, so I'm working with 3 different reports to do this. Hence why I was hoping there would be a more efficient approach than mine suggested in post #1, which involves temporary lists and calculations for every row of data.
 
Upvote 0

Forum statistics

Threads
1,215,069
Messages
6,122,956
Members
449,096
Latest member
Anshu121

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top