Using FILTER/UNIQUE on text, numbers

danhendo888

Board Regular
Joined
Jul 15, 2019
Messages
142
Office Version
  1. 365
Platform
  1. Windows
I download the below spreadsheet daily from our software. It shows details of invoices paid/booked.
*It doesn't show it below but the first three columns have the little green ribbon on the top left corner of the cell and is aligned to the left of the cell and has General formatting.

Company NumberInvoice NumberOrder NumberOrder DateOrder TaxOrder SubtotalCurrencyPayment Date
1111111111ABC123 y81900901203/12/202410.00100.00AUD
1111111111ABC123 y81900901203/12/202450.00500.00AUD
1111111111ABC123 y81900901203/12/20240.000.00AUD3/14/2024
1111111111ABC123 y81900901203/12/20240.000.00AUD3/14/2024
222222222211122200081900900001/1/20240.0010,000.00AUD


Below is my own spreadsheet. I type in the invoice number and retrieve the Order Number, Order Total, and Payment Date from the downloaded file.
The first column has the little green ribbon at the top left corner also.
The last three columns is the output I wish to achieve
Company NumberInvoice NumberOrder NumberOrder TotalPayment Date
1111111111ABC12381900901206603/14/2024
2222222222111222000819009000010,000.00


Order Number:
To retrieve the Order Number, I use FILTER and UNIQUE. It retrieves it successfully when the invoice number in my spreadsheet has text formatting e.g. ABC123
But it doesn't work when the invoice number in my spreadsheet has number formatting because the invoice numbers in the downloaded file does not have number formatting. How can I make the formula work for both text and numbers.

Also, how can I retrieve the correct order number when the correct invoice number is "ABC123" but it was incorrectly booked as "ABC123 y"

Order Total:
I use two SUMIFS to add the Order Tax and Order Subtotal columns together.
Is there a more efficient way to add these? Or is this okay to use.

Payment Date:
How can I retrieve the payment date for invoice ABC123? The date does not necessarily show in all rows in the extracted data
 
Is your query data in A:B always complete?...that is, you always have a company number and an invoice number on each row? Or do you sometimes have blanks? I think that's why the (INDEX(d, , 1) = cn) * inside FILTER isn't working quite right.
Yes, correct. There will be a company number and an invoice number on each row. And there will be no blanks.

So are you saying that you would like to look for an exact match with inputted company number and a somewhat fuzzy match with inputted invoice number and do the same thing?
Yes, correct. So that if there are two companies (e.g. Tesla, Ford) that have both issued an identical invoice number (ABC123), and my input columns equals cn=Tesla, in=ABC123, the output should display only the invoice details for Tesla and not both Tesla and Ford.
 
Upvote 0

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
Okay, thanks. So the customer "number" is not always a number? It should be considered as a text data type then?
 
Upvote 0
I've revisited the TEXTJOIN formula and have a version that requires exact matching for Company and fuzzy matching for Invoice Number. I also stepped through each part to address some nagging error conditions, and in doing so, discovered a problem with the original "res" function that uses HSTACK. One of the arguments was "r" so that the "cn" and "in" values would be captured, but that's a problem when more than one order number satisfies the filter criteria. Recall that in "res", we are stacking vertical arrays of uon, son, and pdon (unique order number, sum of common unique order numbers, and latest payment date among the common unique order numbers)...so these arrays have the same length as the number of unique order numbers. Therefore, HSTACK(r... or HSTACK(cn,in... doesn't work. We need a vertical array of each of those values, so some new variables are found in the formula: We determine the count of unique order numbers for each by-row query of customer number/invoice number...that count is variable "con". Then four vertical arrays are formed, each having a length of "con":
  1. an array of delimiters to be used at the end of each row in each order block, called "era", because an array of "//" delimiters is needed for the same reason just mentioned,
  2. an array of blanks used for error trapping, just in case either "cn" or "in" is blank, called "blnks",
  3. an array of "cn" values, called "cna", and
  4. an array of "in" values, called "ina".
Then by changing the error trapping to deal with certain situations (nothing found by FILTER in "f" returns a blank array, and known values 0 or "none" are used by "son" and "pdon", respectively), the formula should be more robust. I also revised the text string manipulation in "sr" somewhat to sequentially tidy up the end of row delimiters that are concatenated with column delimiters during the TEXTJOIN operation.

Here is the data set I've been using for testing:
MrExcel_20240315.xlsx
ABCDEFGH
1Company NumberInvoice NumberOrder NumberOrder DateOrder TaxOrder SubtotalCurrencyPayment Date
2TeslaABC123 y81900901203/12/202410100AUD
3FordABC123 y81900901443/12/202450500AUD3/13/2024
4FordABC123 81900901203/12/2024510AUD3/14/2024
51133OpelABC123 q81900902223/12/2024620AUD3/20/2024
61133OpelABC123 y81900909993/12/20242099AUD3/18/2024
7FiatABC123 81900902223/12/20247100AUD3/15/2024
8777111222z81900900001/1/2024333AUD
9777111222 x81900922221/1/2024444AUD3/16/2024
10VW4411144z81900901111/1/2024555AUD3/17/2024
11VW441114481900908881/1/2024666AUD
12333111552 g81900900001/1/2024777AUD3/8/2024
13888111222
14888111222ord11.512.5AUD3/19/2024
15TeslaABC123 zzz81900901203/12/202410100AUD
16FordABC123 y81900901443/13/2024555AUD3/17/2024
Daily

...and the revised formula:
MrExcel_20240315.xlsx
ABXYZAAABAC
1Company NumberInvoice NumberCompany NumberInvoice NumberOrder NumberOrder TotalPayment Date
2FordABC123FordABC123819009012015.003/14/2024
3777111222FordABC1238190090144610.003/17/2024
4333111552777111222819009000036.00none
5VW4411144777111222819009222248.003/16/2024
6TeslaABC123333111552819009000084.003/8/2024
7xyzVW4411144819009011160.003/17/2024
81133OpelABc123VW4411144819009088872.00none
9FiatABc123TeslaABC1238190090120220.00none
10888111222xyz-none
111133OpelABc123819009022226.003/20/2024
121133OpelABc1238190090999119.003/18/2024
13FiatABc1238190090222107.003/15/2024
14888111222ord114.003/19/2024
15888111222-none
MySummary2
Cell Formulas
RangeFormula
Y2:AC15Y2=LET(daily, Daily!A:H, lrowd, LOOKUP(2, 1 / (INDEX(daily, , 2) <> ""), ROW(INDEX(daily, , 2))), d, Daily!A2:INDEX(Daily!A:H, lrowd, ), mine, A:B, lrowm, LOOKUP(2, 1 / (INDEX(mine, , 2) <> ""), ROW(INDEX(mine, , 2))), m, A2:INDEX(A:B, lrowm, ), intres, BYROW(m, LAMBDA(r, LET( cn, INDEX(r, , 1), in, INDEX(r, , 2), f, FILTER(d, (INDEX(d, , 1) = cn) * ISNUMBER(SEARCH(in, INDEX(d, , 2))), EXPAND("",1,COLUMNS(d),"")), uon, SORT(UNIQUE(INDEX(f, , 3))), con, COUNTA(uon), era, EXPAND("//",con,,"//"), blnks, EXPAND("",con,,""), cna, IFERROR(EXPAND(cn,con,,cn),blnks), ina, IFERROR(EXPAND(in,con,,in),blnks), son, BYROW(uon, LAMBDA(r, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)), 0))), pdon, BYROW(uon, LAMBDA(r, IFERROR(1 / (1 / MAX(CHOOSECOLS(f, 8) * (CHOOSECOLS(f, 3) = r))), "none"))), res, HSTACK(cna,ina,uon, son, pdon,era), TEXTJOIN("|",0, res) ))), sr, DROP(TEXTSPLIT(CONCAT(SUBSTITUTE(SUBSTITUTE(intres, "|//|", "//"),"|//","//")), "|", "//",0),-1), finres, HSTACK(CHOOSECOLS(sr, 1,2,3), --CHOOSECOLS(sr,4), IFERROR(--CHOOSECOLS(sr,5),"none")), finres )
Dynamic array formulas.
 
Upvote 1
1)
How is cna, IFERROR(EXPAND(cn,con,,cn),blnks) able to determine the correct number of times each cn is displayed in Column Y?
cn is the first column of m. each cn appears only once in m. Also, con > cn.
I'm unsure how these produce Ford to appear twice, 777 to appear twice etc in Column Y

2)
intres, BYROW(m, LAMBDA(r,
. . .
son, BYROW(uon, LAMBDA(r, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)), 0))),
. . .

Regarding the BYROW within a BYROW
son is a one column array produced by feeding uon into (f, 5, 6)
Is m then fed into son? The part unclear to me is that m is just the cn and in, how does it know what to do when it is fed into son?
 
Upvote 0
Good questions. There is a lot happening in this formula. The outer BYROW is feeding row-by-row from m...so a single row of your inputs in columns A:B (e.g., {"Ford","ABC123"}). Ford is assigned to cn, and ABC123 is assigned to in. Then f is the FILTER function is extract from d (the main data on the Daily worksheet) all entries that satisfy the filter criteria (exact match for cn and fuzzy match for in). Now within this subset of data (f), we may have multiple order numbers associated with the combination of cn and in, but for the final output we need to aggregate by order number, so we begin a series of calculations that draw information from f.

Since the intermediate results need to list by order numbers that are unique to the cn/in combination, we first determine which unique order numbers are present in f...that's uon. So for {"Ford","ABC123"}, there are 3 rows of data in Daily, and among these three lines, there are two unique order numbers (two entries for 8190090144, and one entry for 8190090120). So the number of rows of uon determines the number of output rows for this cn/in combination. Therefore con = COUNTA(uon) is the count needed, and we use con to construct four arrays of the correct size (for era, blnks, cna, ina). In this example, then, con=2...there are two unique order numbers, and ultimately we want vertical column arrays that are two rows high so that we can assemble these vertical arrays for the intermediate results (intres). Recall that Excel interprets array structure based on commas as column separators and semicolons as row separators. So in the event that an array of blanks might be needed, blnks, EXPAND("",con,,"") produces an array {"";""} in this case, and this is treated as a one column array having two rows and all elements are blanks. If you're not familiar with EXPAND, the 1st argument represents the starting array (in this case a single value), and we expand the size of the starting array to a total size of so many rows (2nd argument) and so many columns (3rd argument), and these newly inserted array elements would show up as errors unless we pad them out with some value (4th argument)...so to repeat the same value, we use our starting value as the pad value.
Then cna, is formed in a similar manner... IFERROR(EXPAND(cn,con,,cn),blnks) produces {"Ford";"Ford"}, unless cn is blank, which will cause EXPAND to generate an error, so we trap the error by using the blnks array just mentioned. The same idea is applied to generate the ina and era column arrays. This is a rather long-winded answer to question 1, but it explains how we can take a single cn/in input and expand the "cn" and "in" values the correct number of times "con" so that they can be merged with the unique order numbers and other summary results.

Now within "intres", yes, we feed "m" into the outermost BYROW, and this row {"Ford","ABC123"} (note the comma indicating a two-column array having only one row) is assigned to "r"...and we use that "r" in the assignment of the "cn" and "in" variables. But then we come to "son" and "pdon" where we enter new BYROW functions. An important note here: the "r" that you see within a LAMBDA function is entirely self-contained within that LAMBDA. In other words, the "r" seen inside the "son" expression is a row-by-row assignment of "uon" and that "r" is recognized only out to its LAMBDA where the input row of uon is mapped to the variable r. This "r" is not the same r found in the outer intres (BYROW(m, LAMBDA(r,..." expression, and its not the same r found in the pdon expression. This is a convention I tend to use to distinguish rows (r) from columns (c), but the internal assignment for the mapping inside a LAMBDA is entirely arbitrary. For example, you could change "son" and "pdon" to
son, BYROW(uon, LAMBDA(x, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = x)), 0))),
pdon, BYROW(uon, LAMBDA(y, IFERROR(1 / (1 / MAX(CHOOSECOLS(f, 8) * (CHOOSECOLS(f, 3) = y))), "none"))),
without any loss of functionality, as they mean exactly the same thing as in the current formula.
Again, this is probably a long-winded answer to question 2, but it is uon that is fed row-by-row into the son and pdon calculators in order to perform the aggregation sum and identify the max date for each unique order number. Each of these (son and pdon) produce a single result for each row (indeed, BYROW will only deliver a single result), so son is a vertical array of sums that correspond to each unique order number fed to BYROW; and pdon is also a vertical array of dates that correspond to each unique order number fed to it.

Let me know if this clears up the confusion.
 
Upvote 0

Forum statistics

Threads
1,215,077
Messages
6,122,995
Members
449,094
Latest member
masterms

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top