Using FILTER/UNIQUE on text, numbers

danhendo888 · Mar 16, 2024

I download the below spreadsheet daily from our software. It shows details of invoices paid/booked.
*It isn't visible below, but the cells in the first three columns show the little green triangle in the top-left corner, are left-aligned, and have General formatting.



Company Number	Invoice Number	Order Number	Order Date	Order Tax	Order Subtotal	Currency	Payment Date
1111111111	ABC123 y	8190090120	3/12/2024	10.00	100.00	AUD
1111111111	ABC123 y	8190090120	3/12/2024	50.00	500.00	AUD
1111111111	ABC123 y	8190090120	3/12/2024	0.00	0.00	AUD	3/14/2024
1111111111	ABC123 y	8190090120	3/12/2024	0.00	0.00	AUD	3/14/2024

2222222222	111222000	8190090000	1/1/2024	0.00	10,000.00	AUD

Below is my own spreadsheet. I type in the invoice number and retrieve the Order Number, Order Total, and Payment Date from the downloaded file.
The first column also has the little green triangle in the top-left corner.
The last three columns are the output I wish to achieve.



Company Number	Invoice Number	Order Number	Order Total	Payment Date
1111111111	ABC123	8190090120	660	3/14/2024
2222222222	111222000	8190090000	10,000.00

Order Number:
To retrieve the Order Number, I use FILTER and UNIQUE. It retrieves it successfully when the invoice number in my spreadsheet has text formatting, e.g. ABC123.
But it doesn't work when the invoice number in my spreadsheet has number formatting, because the invoice numbers in the downloaded file do not have number formatting. How can I make the formula work for both text and numbers?

Also, how can I retrieve the correct order number when the correct invoice number is "ABC123" but it was incorrectly booked as "ABC123 y"?

Order Total:
I use two SUMIFS to add the Order Tax and Order Subtotal columns together.
Is there a more efficient way to add these, or is this okay to use?

Payment Date:
How can I retrieve the payment date for invoice ABC123? The date does not necessarily appear in every row of the extracted data.
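
The text-versus-number mismatch described under "Order Number" can be pictured outside Excel. Below is a minimal Python sketch, assuming the downloaded invoice numbers arrive as text while typed ones may be whole numbers. `matches` is a hypothetical helper for illustration, not an Excel function. Coercing both sides to text before comparing removes the type difference, and a case-insensitive substring test also tolerates a mis-booked suffix such as "ABC123 y".

```python
def matches(typed, booked):
    """Return True when the typed invoice number occurs inside the booked one.

    Both values are coerced to text first, so 111222000 (number) and
    "111222000" (text) compare equal; the check ignores case.
    """
    return str(typed).strip().lower() in str(booked).strip().lower()


# Booked values are text, as in the downloaded file.
booked_invoices = ["ABC123 y", "111222000"]

print(111222000 == "111222000")                # number vs text: not equal
print(matches(111222000, booked_invoices[1]))  # equal once both are text
print(matches("ABC123", booked_invoices[0]))   # the " y" suffix is tolerated
```

A substring test is loose by design: 111222 would also match 111222000.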

danhendo888 · Mar 20, 2024

KRice said:
Is your query data in A:B always complete?...that is, you always have a company number and an invoice number on each row? Or do you sometimes have blanks? I think that's why the (INDEX(d, , 1) = cn) * inside FILTER isn't working quite right.

Yes, correct. There will be a company number and an invoice number on each row. And there will be no blanks.

KRice said:
So are you saying that you would like to look for an exact match with inputted company number and a somewhat fuzzy match with inputted invoice number and do the same thing?

Yes, correct. So if two companies (e.g. Tesla, Ford) have both issued an identical invoice number (ABC123), and my inputs are cn=Tesla, in=ABC123, the output should display only the invoice details for Tesla, not for both Tesla and Ford.

KRice · Mar 20, 2024

Okay, thanks. So the company "number" is not always a number? Should it be treated as a text data type, then?

KRice · Mar 21, 2024

I've revisited the TEXTJOIN formula and have a version that requires exact matching for Company and fuzzy matching for Invoice Number. I also stepped through each part to address some nagging error conditions, and in doing so discovered a problem with the original "res" function that uses HSTACK.

One of the arguments was "r", so that the "cn" and "in" values would be captured, but that's a problem when more than one order number satisfies the filter criteria. Recall that in "res" we are stacking the vertical arrays uon, son, and pdon (the unique order numbers, the sum of Order Tax and Order Subtotal for each unique order number, and the latest payment date for each unique order number), so these arrays have the same length as the number of unique order numbers. Therefore HSTACK(r... or HSTACK(cn,in... doesn't work, because "r", "cn" and "in" are only one row high.

We need a vertical array of each of those values, so some new variables are found in the formula. We determine the count of unique order numbers for each by-row query of company number/invoice number; that count is the variable "con". Then four vertical arrays are formed, each having a length of "con":

an array of "//" delimiters to be used at the end of each row in each order block, called "era" (an array is needed for the same reason just mentioned),
an array of blanks used for error trapping, just in case either "cn" or "in" is blank, called "blnks",
an array of "cn" values, called "cna", and
an array of "in" values, called "ina".

Then, by changing the error trapping to deal with certain situations (when FILTER finds nothing, "f" returns a blank array, and the known values 0 or "none" are used by "son" and "pdon", respectively), the formula should be more robust. I also revised the text string manipulation in "sr" somewhat to sequentially tidy up the end-of-row delimiters that are concatenated with column delimiters during the TEXTJOIN operation.
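
The delimiter handling in "intres" and "sr" can be modelled in a few lines. This is a rough Python sketch of the idea only, not the Excel formula itself: `join_block` and `split_blocks` are made-up names, "|" is assumed to be the column delimiter, and dates are shown as display text for readability.

```python
COL, ROW = "|", "//"


def join_block(rows):
    """Model TEXTJOIN("|", 0, res): flatten a block row by row, with the
    end-of-row marker "//" stored as the last cell of every row."""
    cells = [str(cell) for row in rows for cell in list(row) + [ROW]]
    return COL.join(cells)


def split_blocks(blocks):
    """Model the "sr" step: tidy the delimiters, concatenate, split, drop the tail."""
    tidy = [b.replace("|//|", "//").replace("|//", "//") for b in blocks]
    text = "".join(tidy)                             # CONCAT
    rows = [r.split(COL) for r in text.split(ROW)]   # TEXTSPLIT on "|" and "//"
    return rows[:-1]                                 # DROP(..., -1): trailing empty row


ford = [["Ford", "ABC123", "8190090120", 15, "3/14/2024"],
        ["Ford", "ABC123", "8190090144", 610, "3/17/2024"]]
blank_order = [["888", "111222", "", 0, "none"]]

blocks = [join_block(ford), join_block(blank_order)]
for row in split_blocks(blocks):
    print(row)
```

The trailing "//" on the last block is what leaves one empty row after the split, which is why the final row is dropped.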

Here is the data set I've been using for testing:

MrExcel_20240315.xlsx

	A	B	C	D	E	F	G	H
1	Company Number	Invoice Number	Order Number	Order Date	Order Tax	Order Subtotal	Currency	Payment Date
2	Tesla	ABC123 y	8190090120	3/12/2024	10	100	AUD
3	Ford	ABC123 y	8190090144	3/12/2024	50	500	AUD	3/13/2024
4	Ford	ABC123	8190090120	3/12/2024	5	10	AUD	3/14/2024
5	1133Opel	ABC123 q	8190090222	3/12/2024	6	20	AUD	3/20/2024
6	1133Opel	ABC123 y	8190090999	3/12/2024	20	99	AUD	3/18/2024
7	Fiat	ABC123	8190090222	3/12/2024	7	100	AUD	3/15/2024
8	777	111222z	8190090000	1/1/2024	3	33	AUD
9	777	111222 x	8190092222	1/1/2024	4	44	AUD	3/16/2024
10	VW44	11144z	8190090111	1/1/2024	5	55	AUD	3/17/2024
11	VW44	11144	8190090888	1/1/2024	6	66	AUD
12	333	111552 g	8190090000	1/1/2024	7	77	AUD	3/8/2024
13	888	111222
14	888	111222	ord1		1.5	12.5	AUD	3/19/2024
15	Tesla	ABC123 zzz	8190090120	3/12/2024	10	100	AUD
16	Ford	ABC123 y	8190090144	3/13/2024	5	55	AUD	3/17/2024

Sheet: Daily

...and the revised formula:

MrExcel_20240315.xlsx

	A	B	X	Y	Z	AA	AB	AC
1	Company Number	Invoice Number		Company Number	Invoice Number	Order Number	Order Total	Payment Date
2	Ford	ABC123		Ford	ABC123	8190090120	15.00	3/14/2024
3	777	111222		Ford	ABC123	8190090144	610.00	3/17/2024
4	333	111552		777	111222	8190090000	36.00	none
5	VW44	11144		777	111222	8190092222	48.00	3/16/2024
6	Tesla	ABC123		333	111552	8190090000	84.00	3/8/2024
7	xyz			VW44	11144	8190090111	60.00	3/17/2024
8	1133Opel	ABc123		VW44	11144	8190090888	72.00	none
9	Fiat	ABc123		Tesla	ABC123	8190090120	220.00	none
10	888	111222		xyz			-	none
11				1133Opel	ABc123	8190090222	26.00	3/20/2024
12				1133Opel	ABc123	8190090999	119.00	3/18/2024
13				Fiat	ABc123	8190090222	107.00	3/15/2024
14				888	111222	ord1	14.00	3/19/2024
15				888	111222		-	none

Sheet: MySummary2

Cell Formulas
Range	Formula
Y2:AC15	Y2
=LET(
    daily, Daily!A:H,
    lrowd, LOOKUP(2, 1 / (INDEX(daily, , 2) <> ""), ROW(INDEX(daily, , 2))),
    d, Daily!A2:INDEX(Daily!A:H, lrowd, ),
    mine, A:B,
    lrowm, LOOKUP(2, 1 / (INDEX(mine, , 2) <> ""), ROW(INDEX(mine, , 2))),
    m, A2:INDEX(A:B, lrowm, ),
    intres, BYROW(m, LAMBDA(r, LET(
        cn, INDEX(r, , 1),
        in, INDEX(r, , 2),
        f, FILTER(d, (INDEX(d, , 1) = cn) * ISNUMBER(SEARCH(in, INDEX(d, , 2))), EXPAND("", 1, COLUMNS(d), "")),
        uon, SORT(UNIQUE(INDEX(f, , 3))),
        con, COUNTA(uon),
        era, EXPAND("//", con, , "//"),
        blnks, EXPAND("", con, , ""),
        cna, IFERROR(EXPAND(cn, con, , cn), blnks),
        ina, IFERROR(EXPAND(in, con, , in), blnks),
        son, BYROW(uon, LAMBDA(r, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)), 0))),
        pdon, BYROW(uon, LAMBDA(r, IFERROR(1 / (1 / MAX(CHOOSECOLS(f, 8) * (CHOOSECOLS(f, 3) = r))), "none"))),
        res, HSTACK(cna, ina, uon, son, pdon, era),
        TEXTJOIN("|", 0, res)
    ))),
    sr, DROP(TEXTSPLIT(CONCAT(SUBSTITUTE(SUBSTITUTE(intres, "|//|", "//"), "|//", "//")), "|", "//", 0), -1),
    finres, HSTACK(CHOOSECOLS(sr, 1, 2, 3), --CHOOSECOLS(sr, 4), IFERROR(--CHOOSECOLS(sr, 5), "none")),
    finres
)
Dynamic array formulas.
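
For anyone who finds the logic easier to follow outside Excel, here is a rough Python model of what the formula computes. It is a sketch under assumptions, not a translation: `DAILY` holds the test rows from the Daily sheet (minus the two 888 rows that contain blank cells), company numbers are treated as text, and `summarize` is a made-up name.

```python
from datetime import date

DAILY = [
    # company, invoice, order, tax, subtotal, payment date (None = blank)
    ("Tesla", "ABC123 y", "8190090120", 10, 100, None),
    ("Ford", "ABC123 y", "8190090144", 50, 500, date(2024, 3, 13)),
    ("Ford", "ABC123", "8190090120", 5, 10, date(2024, 3, 14)),
    ("1133Opel", "ABC123 q", "8190090222", 6, 20, date(2024, 3, 20)),
    ("1133Opel", "ABC123 y", "8190090999", 20, 99, date(2024, 3, 18)),
    ("Fiat", "ABC123", "8190090222", 7, 100, date(2024, 3, 15)),
    ("777", "111222z", "8190090000", 3, 33, None),
    ("777", "111222 x", "8190092222", 4, 44, date(2024, 3, 16)),
    ("VW44", "11144z", "8190090111", 5, 55, date(2024, 3, 17)),
    ("VW44", "11144", "8190090888", 6, 66, None),
    ("333", "111552 g", "8190090000", 7, 77, date(2024, 3, 8)),
    ("Tesla", "ABC123 zzz", "8190090120", 10, 100, None),
    ("Ford", "ABC123 y", "8190090144", 5, 55, date(2024, 3, 17)),
]


def summarize(company, invoice, daily=DAILY):
    """One output row per unique order number, for an exact company match
    (ignoring case, as Excel's = does) and a case-insensitive substring
    match on the invoice number (as SEARCH does)."""
    hits = [r for r in daily
            if r[0].casefold() == str(company).casefold()
            and str(invoice).casefold() in r[1].casefold()]
    if not hits:                                  # nothing found: placeholder row
        return [(company, invoice, "", 0, "none")]
    out = []
    for order in sorted({r[2] for r in hits}):    # uon
        rows = [r for r in hits if r[2] == order]
        total = sum(r[3] + r[4] for r in rows)    # son: tax + subtotal
        paid = [r[5] for r in rows if r[5] is not None]
        out.append((company, invoice, order, total,
                    max(paid) if paid else "none"))   # pdon
    return out


for query in [("Ford", "ABC123"), ("Tesla", "ABC123"), ("1133Opel", "ABc123")]:
    for row in summarize(*query):
        print(row)
```

The totals and dates printed for these three queries can be checked against the Order Total and Payment Date columns in the output table above.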

danhendo888 · Mar 22, 2024

1)
How is cna, IFERROR(EXPAND(cn,con,,cn),blnks) able to determine the correct number of times each cn is displayed in Column Y?
cn is the first column of m. each cn appears only once in m. Also, con > cn.
I'm unsure how these produce Ford to appear twice, 777 to appear twice etc in Column Y

2)
intres, BYROW(m, LAMBDA(r,
. . .
son, BYROW(uon, LAMBDA(r, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)), 0))),
. . .

Regarding the BYROW within a BYROW
son is a one column array produced by feeding uon into (f, 5, 6)
Is m then fed into son? The part unclear to me is that m is just the cn and in, how does it know what to do when it is fed into son?

KRice · Mar 22, 2024

Good questions. There is a lot happening in this formula. The outer BYROW feeds m in row by row, so each pass sees a single row of your inputs in columns A:B (e.g., {"Ford","ABC123"}). Ford is assigned to cn, and ABC123 is assigned to in. Then f uses the FILTER function to extract from d (the main data on the Daily worksheet) all entries that satisfy the filter criteria (exact match for cn and fuzzy match for in). Within this subset of data (f), we may have multiple order numbers associated with the combination of cn and in, but for the final output we need to aggregate by order number, so we begin a series of calculations that draw information from f.
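
The two filter criteria are multiplied together, so a row is kept only when both are true. Here is a small Python illustration of that mask arithmetic, using the company and invoice values from Daily rows 2 to 7 and row 16. The list names are just for illustration.

```python
companies = ["Tesla", "Ford", "Ford", "1133Opel", "1133Opel", "Fiat", "Ford"]
invoices = ["ABC123 y", "ABC123 y", "ABC123", "ABC123 q", "ABC123 y", "ABC123", "ABC123 y"]
cn, inv = "Ford", "ABC123"

# (INDEX(d, , 1) = cn): exact match on company, 1 for TRUE and 0 for FALSE
exact = [int(c.lower() == cn.lower()) for c in companies]
# ISNUMBER(SEARCH(in, INDEX(d, , 2))): invoice found anywhere in the text, ignoring case
fuzzy = [int(inv.lower() in i.lower()) for i in invoices]
# Multiplying the two masks acts as a row-by-row AND
keep = [e * f for e, f in zip(exact, fuzzy)]

print(exact)
print(fuzzy)
print(keep)
```

Every invoice in this sample contains "ABC123", so the company test alone decides which rows survive: the three Ford rows.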

Since the intermediate results need to be listed by the order numbers that are unique to the cn/in combination, we first determine which unique order numbers are present in f...that's uon. So for {"Ford","ABC123"}, there are three rows of data in Daily, and among these three rows there are two unique order numbers (two entries for 8190090144, and one entry for 8190090120). The number of rows in uon determines the number of output rows for this cn/in combination. Therefore con = COUNTA(uon) is the count needed, and we use con to construct four arrays of the correct size (era, blnks, cna, ina). In this example con=2: there are two unique order numbers, and ultimately we want vertical column arrays that are two rows high so that we can assemble them for the intermediate results (intres).

Recall that Excel array constants use commas as column separators and semicolons as row separators (with US regional settings). So in the event that an array of blanks might be needed, blnks, EXPAND("",con,,"") produces the array {"";""} in this case, which is treated as a one-column array having two rows whose elements are all blank. If you're not familiar with EXPAND, the 1st argument is the starting array (in this case a single value), and we expand it to a total size of so many rows (2nd argument) and so many columns (3rd argument). The newly inserted array elements would show up as errors unless we pad them with some value (4th argument), so to repeat the same value, we use our starting value as the pad value.

Then cna is formed in a similar manner: IFERROR(EXPAND(cn,con,,cn),blnks) produces {"Ford";"Ford"}, unless cn is blank, which will cause EXPAND to generate an error, so we trap the error by using the blnks array just mentioned. The same idea is applied to generate the ina and era column arrays. This is a rather long-winded answer to question 1, but it explains how we can take a single cn/in input and repeat the "cn" and "in" values the correct number of times ("con") so that they can be merged with the unique order numbers and other summary results.
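
To make the EXPAND step concrete, here is a small Python model of expanding a single value into a column that is "con" rows high. `expand_column` is a hypothetical stand-in for EXPAND(value, rows, , pad) with a one-cell starting array. The son and pdon columns are left out for brevity.

```python
def expand_column(value, rows, pad):
    """Model EXPAND(value, rows, , pad) for a single starting value:
    a one-column array, `rows` high, with the new cells set to `pad`."""
    if rows < 1:
        raise ValueError("rows must be at least 1")
    return [value] + [pad] * (rows - 1)


uon = ["8190090120", "8190090144"]   # unique order numbers for Ford / ABC123
con = len(uon)                       # COUNTA(uon)

cna = expand_column("Ford", con, "Ford")       # pad with the starting value
ina = expand_column("ABC123", con, "ABC123")
era = expand_column("//", con, "//")

# HSTACK: columns of equal height placed side by side
res = [list(row) for row in zip(cna, ina, uon, era)]
for row in res:
    print(row)

# With a different pad value only the first cell carries the original value.
print(expand_column("Ford", 3, ""))
```

Because every column is "con" rows high, stacking them side by side gives one complete row per unique order number.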

Now within "intres", yes, we feed "m" into the outermost BYROW, and this row {"Ford","ABC123"} (note the comma indicating a two-column array having only one row) is assigned to "r"...and we use that "r" in the assignment of the "cn" and "in" variables. But then we come to "son" and "pdon", where we enter new BYROW functions. An important note here: the "r" that you see within a LAMBDA function is entirely self-contained within that LAMBDA. In other words, the "r" seen inside the "son" expression is a row-by-row assignment of "uon", and that "r" is recognized only within its own LAMBDA, where the input row of uon is mapped to the variable r. This "r" is not the same r found in the outer intres expression (BYROW(m, LAMBDA(r, ...), and it's not the same r found in the pdon expression. Using r is a convention I tend to follow to distinguish rows (r) from columns (c), but the name chosen for the parameter inside a LAMBDA is entirely arbitrary. For example, you could change "son" and "pdon" to
son, BYROW(uon, LAMBDA(x, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = x)), 0))),
pdon, BYROW(uon, LAMBDA(y, IFERROR(1 / (1 / MAX(CHOOSECOLS(f, 8) * (CHOOSECOLS(f, 3) = y))), "none"))),
without any loss of functionality, as they mean exactly the same thing as in the current formula.
Again, this is probably a long-winded answer to question 2, but it is uon that is fed row by row into the son and pdon calculations in order to perform the aggregation sum and identify the max date for each unique order number. Each of these (son and pdon) produces a single result for each row (indeed, BYROW will only deliver a single result per row), so son is a vertical array of sums that correspond to each unique order number fed to BYROW, and pdon is likewise a vertical array of dates that correspond to each unique order number fed to it.
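
The scoping point carries over to most languages with anonymous functions. Here is a hedged Python sketch: `by_row` is a made-up stand-in for BYROW over a one-column array, and `FILTERED` plays the role of f for the Ford / ABC123 query.

```python
from datetime import date

FILTERED = [  # f for Ford / ABC123: (order, tax, subtotal, payment date or None)
    ("8190090144", 50, 500, date(2024, 3, 13)),
    ("8190090120", 5, 10, date(2024, 3, 14)),
    ("8190090144", 5, 55, date(2024, 3, 17)),
]


def by_row(values, fn):
    """Model BYROW for a one-column array: apply fn to each row, collect the results."""
    return [fn(v) for v in values]


def block(r):                       # outer r: one query row, e.g. ("Ford", "ABC123")
    cn, inv = r
    uon = sorted({o for o, *_ in FILTERED})
    # Each inner lambda binds its own parameter; calling it r, x or y changes nothing.
    son = by_row(uon, lambda r: sum(t + s for o, t, s, _ in FILTERED if o == r))
    pdon = by_row(uon, lambda y: max((p for o, *_, p in FILTERED if o == y and p),
                                     default="none"))
    return r, uon, son, pdon        # the outer r is still the query row here


outer_r, uon, son, pdon = block(("Ford", "ABC123"))
print(outer_r)
print(uon, son, pdon)
```

In the formula, the double reciprocal 1 / (1 / MAX(...)) turns a MAX of 0 (no payment date found) into a #DIV/0! error so that IFERROR can substitute "none". The `default="none"` argument plays that role in this sketch.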

Let me know if this clears up the confusion.
