Using FILTER/UNIQUE on text, numbers

danhendo888

Board Regular
Joined
Jul 15, 2019
Messages
153
Office Version
  1. 365
Platform
  1. Windows
I download the below spreadsheet daily from our software. It shows details of invoices paid/booked.
*It doesn't show it below but the first three columns have the little green ribbon on the top left corner of the cell and is aligned to the left of the cell and has General formatting.

Company NumberInvoice NumberOrder NumberOrder DateOrder TaxOrder SubtotalCurrencyPayment Date
1111111111ABC123 y81900901203/12/202410.00100.00AUD
1111111111ABC123 y81900901203/12/202450.00500.00AUD
1111111111ABC123 y81900901203/12/20240.000.00AUD3/14/2024
1111111111ABC123 y81900901203/12/20240.000.00AUD3/14/2024
222222222211122200081900900001/1/20240.0010,000.00AUD


Below is my own spreadsheet. I type in the invoice number and retrieve the Order Number, Order Total, and Payment Date from the downloaded file.
The first column has the little green ribbon at the top left corner also.
The last three columns is the output I wish to achieve
Company NumberInvoice NumberOrder NumberOrder TotalPayment Date
1111111111ABC12381900901206603/14/2024
2222222222111222000819009000010,000.00


Order Number:
To retrieve the Order Number, I use FILTER and UNIQUE. It retrieves it successfully when the invoice number in my spreadsheet has text formatting e.g. ABC123
But it doesn't work when the invoice number in my spreadsheet has number formatting because the invoice numbers in the downloaded file does not have number formatting. How can I make the formula work for both text and numbers.

Also, how can I retrieve the correct order number when the correct invoice number is "ABC123" but it was incorrectly booked as "ABC123 y"

Order Total:
I use two SUMIFS to add the Order Tax and Order Subtotal columns together.
Is there a more efficient way to add these? Or is this okay to use.

Payment Date:
How can I retrieve the payment date for invoice ABC123? The date does not necessarily show in all rows in the extracted data
 
is it possible to search not just for the invoice number and but company number AND the invoice number (for instances where the same invoice number is issued by two different companies?
Yes. I was curious about this...whether an invoice was always associated with just one company. I guess not. I'm concerned about some error trapping. Is your query data in A:B always complete?...that is, you always have a company number and an invoice number on each row? Or do you sometimes have blanks? I think that's why the (INDEX(d, , 1) = cn) * inside FILTER isn't working quite right.

And on a related note...this appears in the Daily worksheet:
MrExcel_20240315.xlsx
AB
1Company NumberInvoice Number
21111111111ABC123 y
31111111111ABC123 y
41111111111ABC123
51111111111ABC123 y
6ABC123
72222222222111222000z
Daily

Are there sometimes blanks under the Company Number column? If so, those wouldn't be picked up unless we ignore the "cn" part of the filtering array for blanks and just go by invoice number.
 
Upvote 0
is it possible to search not just for the invoice number but by company number AND the invoice number (for instances where the same invoice number is issued by two different companies?
Some clarification needed here. The current formula takes an inputted invoice number and includes variations of that invoice number, provided the inputted invoice number text is found within the invoice number text on the Daily data table. And this subset of Daily data is then grouped by order number and reported back using whatever company number and invoice number you've specified in "m"...in columns A:B in your input section. So are you saying that you would like to look for an exact match with inputted company number and a somewhat fuzzy match with inputted invoice number and do the same thing?...group that subset of Daily data by order number for final outputs? This was my interpretation, but wanted to clarify.
 
Upvote 0
Is your query data in A:B always complete?...that is, you always have a company number and an invoice number on each row? Or do you sometimes have blanks? I think that's why the (INDEX(d, , 1) = cn) * inside FILTER isn't working quite right.
Yes, correct. There will be a company number and an invoice number on each row. And there will be no blanks.

So are you saying that you would like to look for an exact match with inputted company number and a somewhat fuzzy match with inputted invoice number and do the same thing?
Yes, correct. So that if there are two companies (e.g. Tesla, Ford) that have both issued an identical invoice number (ABC123), and my input columns equals cn=Tesla, in=ABC123, the output should display only the invoice details for Tesla and not both Tesla and Ford.
 
Upvote 0

Excel Facts

Add Bullets to Range
Select range. Press Ctrl+1. On Number tab, choose Custom. Type Alt+7 then space then @ sign (using 7 on numeric keypad)
Okay, thanks. So the customer "number" is not always a number? It should be considered as a text data type then?
 
Upvote 0
1)
How is cna, IFERROR(EXPAND(cn,con,,cn),blnks) able to determine the correct number of times each cn is displayed in Column Y?
cn is the first column of m. each cn appears only once in m. Also, con > cn.
I'm unsure how these produce Ford to appear twice, 777 to appear twice etc in Column Y

2)
intres, BYROW(m, LAMBDA(r,
. . .
son, BYROW(uon, LAMBDA(r, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)), 0))),
. . .

Regarding the BYROW within a BYROW
son is a one column array produced by feeding uon into (f, 5, 6)
Is m then fed into son? The part unclear to me is that m is just the cn and in, how does it know what to do when it is fed into son?
 
Upvote 0
Hi Kirk,
Thank you for the explanation and apologies for not asking these questions sooner when these formulas would have been fresh in our minds.

1) Is there a way/trick to calculate parts of the formula e.g. f, cna or ina? I'm fine with deleting parts of the formula just to calculate for these items and then CTRL+Z to restore the formula. I tried this but still cannot calculate for f, cna, ina.

2) Regarding the outer BYROW, I can see that it is being applied to cn and in. And that it is not applied in uon. But I believe it is being applied to f, cna, ina since these are using cn and in. Is this correct?
I'm trying to understand what the formula would look like for
f or son or pdon, had you decided not to use the outer BYROW and instead put BYROW in each instance where it is required (like in son and pdon).

3) Suppose I wanted to recreate f in cell J14 below. How would you update the existing cell?

4) lrowd, LOOKUP(2, 1 / (INDEX(daily, , 2) <> "")
Can you please explain your formula for the lookup_vector and why we look for 2?

MySummary2.xlsx
ABCDEFGHIJKLMNOPQ
1DailyMySummary
2Company NumberInvoice NumberOrder NumberOrder DateOrder TaxOrder SubtotalCurrencyPayment DateCompany NumberInvoice Number
3TeslaABC123 y81900901203/12/202410100AUDFordABC123
4FordABC123 y81900901443/12/202450500AUD3/13/2024777111222
5FordABC123 81900901203/12/2024510AUD3/14/2024
61133OpelABC123 q81900902223/12/2024620AUD3/20/2024
71133OpelABC123 y81900909993/12/20242099AUD3/18/2024
8FiatABC123 81900902223/12/20247100AUD3/15/2024
9777111222z81900900001/1/2024333AUDFordABC123 y81900901443/12/202450500AUD3/13/2024
10777111222 x81900922221/1/2024444AUD3/16/2024FordABC123 81900901203/12/2024510AUD3/14/2024
11VW4411144z81900901111/1/2024555AUD3/17/2024FordABC123 y81900901443/13/2024555AUD3/17/2024
12VW441114481900908881/1/2024666AUD
13333111552 g81900900001/1/2024777AUD3/8/2024
14888111222#N/A
15888111222ord11.512.5AUD3/19/2024
16TeslaABC123 zzz81900901203/12/202410100AUD
17FordABC123 y81900901443/13/2024555AUD3/17/2024
Daily
Cell Formulas
RangeFormula
J9:Q11J9=FILTER(A3:H17, (INDEX(A3:H17, , 1) = J3) * ISNUMBER(SEARCH(K3, INDEX(A3:H17, , 2))), EXPAND("",1,COLUMNS(A3:H17),""))
J14J14=FILTER(A3:H17, (INDEX(A3:H17, , 1) = J3:J4) * ISNUMBER(SEARCH(K3:K4, INDEX(A3:H17, , 2))), EXPAND("",1,COLUMNS(A3:H17),""))
Dynamic array formulas.
 
Upvote 0
Let's start with the last question...the easiest. The approach shown for finding the last row number containing data looks for 2 because 2 cannot exist in the expressions 1/(INDEX(daily, , 2)<>"" or 1/(INDEX(mine, , 2)<>"". The logical part of those expressions <>"" (is not blank) results in a TRUE or FALSE, and TRUEs/FALSEs are coerced to 1/0's when used with a mathematical operation. So 1/TRUE is 1/1 or 1, and 1/FALSE is 1/0, which produces a #DIV/0! error. LOOKUP is one of those functions that handles errors, and if it cannot find the searched for value of 2 (or 3 or 4, etc.), it will by default return the last non-error index. There are several other ways the last data row could be determined, so there is nothing crucial about using this particular approach.

Regarding the other questions, let's examine #2:
2) Regarding the outer BYROW, I can see that it is being applied to cn and in. And that it is not applied in uon. But I believe it is being applied to f, cna, ina since these are using cn and in. Is this correct?
I'm trying to understand what the formula would look like for
f or son or pdon, had you decided not to use the outer BYROW and instead put BYROW in each instance where it is required (like in son and pdon).
The outermost BYROW belongs to the intres variable, and it takes the "m" array as its argument, so it is m that is being operated on row-by-row. Each row of m, when it is to be operated on, is assigned to the r variable (a convention I tend to favor to remind myself that r is a (r)ow from some array). If we examine the main formula for "m", we see that m is simply your query inputs for Company Number and Invoice Number. Therefore, for convenience, every time a new row is operated on in m, we assign its first element (the Company Number) to variable cn, and its second element (the Invoice Number) to variable in. Then we execute the FILTER function to extract all entries in the main data where we match "cn" and also find the "in" text string somewhere in the recorded invoice numbers. So FILTER will potentially return a multi-row array, should there be more than one data table entry that satisfies these criteria. This potentially multirow f array is operated on by determining the unique combinations of customer number/invoice number present in the main data table that meet the search criteria, and then sorting the results based on the 3rd column (the Order Number) for convenience...that's what uon does. So it would not be correct to say that BYROW was being applied to f, cna, and ina. Rather the outer BYROW sequentially takes a row of m, and uses that particular combination of customer number and invoice number to extract from the main data table all desired matching/fuzzy matching entries to create the f array. Then to compile the order sums and payment dates, we need to operate on uon row by row.

Let me know if this helps to clear up two of the questions. I'm out of time at the moment, but will follow up about the other questions later.
 
Upvote 0
Feels much clearer now, thank you, except for one thing about f.

The formula is pointing to cn and in which are arrays. But when I copy/pasted just the FILTER formula (to cell J14) and hardcoded the cell ranges using the array range for cn/in, it doesn't seem to work. It only works when I pointed to one Company Number and one Invoice Number (cell J9). What am I overlooking here?
(Please ignore question 3 as it is related to the below)

Regarding question 1, I'm curious as to how you would check that your formula is working while working on it.
For instance, to calculate lrowd, I can type lrowd and delete the rest of the formula as per below. But suppose I wanted to calculate for f or ina, how would you do that?
=LET(daily, Sheet1!A:H, lrowd, LOOKUP(2, 1 / (INDEX(daily, , 2) <> ""), ROW(INDEX(daily, , 2))), lrowd)
 
Upvote 0
The formula is pointing to cn and in which are arrays.
They are not arrays. We have this construction in the formula:
Excel Formula:
 m, A2:INDEX(A:B, lrowm, ),
 intres, BYROW(m, LAMBDA(r,
                       LET( cn,   INDEX(r, , 1),
                            in,   INDEX(r, , 2),
This means that m is a two-column array consisting of your input query pairs (customer number in column A and invoice number in column B). And we pass m to the BYROW LAMBDA helper function where it is used to evaluate the expression(s) found inside the LAMBDA, but importantly, where "it" refers to each row of m taken one row at a time. And when a row is taken on each iteration, the row is treated as a horizontal array and assigned to the dummy variable r. On the first iteration, r would be {"Ford", "ABC123"}, the second iteration {777, 111222}, and so on. The INDEX functions then assign single values to cn and in variables: On the first iteration, cn is Ford, then on the second iteration cn is 777, etc., so cn and in each have only one value at any specific time during this row-by-row analysis.

This means that the f expression is being executed once for every row of m (your A:B query inputs). The f expression operates on the entire source data table d and extracts from it all entries where the customer number in d matches the current single value of cn, AND also where the invoice number in d contains the same text string as the current single value of in. This is important because each of the resulting filter criterion arrays --- (INDEX(d, , 1) = cn) is one, and ISNUMBER(SEARCH(in, INDEX(d, , 2))), EXPAND("",1,COLUMNS(d),"") is another --- must have the same size and dimension. Both are single column vertical arrays whose length matches the length of d, the main data table. If cn were a multi-element array, then FILTER would generate an error because the criterion array's dimension would expand to more than a single column.

I wanted to delay responding to questions 1 and 3 until I had better context, which I think you've provided. If the primary reason is a spot check to confirm the values generated for some of these harder-to-interrogate values, then I would make a few changes to the current formula, but realize that some concessions are being made. On this last point, the intres variable uses the BYROW function, which allows only a single cell output for each row that it operates on, as opposed to a results array that might spill over many cells, perhaps horizontally and vertically, for each row operated on. But to see values for the variables generated inside the intres LET function, you'll need to interrogate at that level and accept the single cell output per row, or change the formula in more significant ways. The implication is that the output may not be as well structured as desired, but the content should still be present for inspection.

The inner LET is shielded from direct interrogation, because an instruction to display f from the outer LET will fail to "see" f in the inner LET. As a workaround, I would instruct the outer LET to display results for intres, and then change the results delivered by intres (that's what the TEXTJOIN function does...it combines multiple, otherwise spilling results into a single output cell) to reflect the desired variable (so instead of applying TEXTJOIN to res, it would be applied to f, uon, con, son, pdon). For example, to see the FILTER results:
  1. within intres... TEXTJOIN("|",0, f ) ))),
  2. then at the end, after finres... finres, HSTACK(CHOOSECOLS(sr, 1,2,3), --CHOOSECOLS(sr,4), IFERROR(--CHOOSECOLS(sr,5),"none")), intres )
Change the f shown above to any of the other inner LET variables to see their value(s) for each row.
MrExcel_20240315.xlsx
ABXY
1Company NumberInvoice NumberCompany Number
2FordABC123Ford|ABC123 y|8190090144|45363|50|500|AUD|45364|Ford|ABC123 |8190090120|45363|5|10|AUD|45365|Ford|ABC123 y|8190090144|45364|5|55|AUD|45368
3777111222777|111222z|8190090000|45292|3|33|AUD||777|111222 x|8190092222|45292|4|44|AUD|45367
4333111552333|111552 g|8190090000|45292|7|77|AUD|45359
5VW4411144VW44|11144z|8190090111|45292|5|55|AUD|45368|VW44|11144|8190090888|45292|6|66|AUD|
6TeslaABC123Tesla|ABC123 y|8190090120|45363|10|100|AUD||Tesla|ABC123 zzz|8190090120|45363|10|100|AUD|
7xyz|||||||
81133OpelABc1231133Opel|ABC123 q|8190090222|45363|6|20|AUD|45371|1133Opel|ABC123 y|8190090999|45363|20|99|AUD|45369
9FiatABc123Fiat|ABC123 |8190090222|45363|7|100|AUD|45366
10888111222888|111222|||||||888|111222|ord1||1.5|12.5|AUD|45370
MySummary3
Cell Formulas
RangeFormula
Y2:Y10Y2=LET(daily, Daily!A:H, lrowd, LOOKUP(2, 1 / (INDEX(daily, , 2) <> ""), ROW(INDEX(daily, , 2))), d, Daily!A2:INDEX(Daily!A:H, lrowd, ), mine, A:B, lrowm, LOOKUP(2, 1 / (INDEX(mine, , 2) <> ""), ROW(INDEX(mine, , 2))), m, A2:INDEX(A:B, lrowm, ), intres, BYROW(m, LAMBDA(r, LET( cn, INDEX(r, , 1), in, INDEX(r, , 2), f, FILTER(d, (INDEX(d, , 1) = cn) * ISNUMBER(SEARCH(in, INDEX(d, , 2))), EXPAND("",1,COLUMNS(d),"")), uon, SORT(UNIQUE(INDEX(f, , 3))), con, COUNTA(uon), era, EXPAND("//",con,,"//"), blnks, EXPAND("",con,,""), cna, IFERROR(EXPAND(cn,con,,cn),blnks), ina, IFERROR(EXPAND(in,con,,in),blnks), son, BYROW(uon, LAMBDA(r, IFERROR(SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)), 0))), pdon, BYROW(uon, LAMBDA(r, IFERROR(1 / (1 / MAX(CHOOSECOLS(f, 8) * (CHOOSECOLS(f, 3) = r))), "none"))), res, HSTACK(cna,ina,uon, son, pdon,era), TEXTJOIN("|",0, f ) ))), sr, DROP(TEXTSPLIT(CONCAT(SUBSTITUTE(SUBSTITUTE(intres, "|//|", "//"),"|//","//")), "|", "//",0),-1), finres, HSTACK(CHOOSECOLS(sr, 1,2,3), --CHOOSECOLS(sr,4), IFERROR(--CHOOSECOLS(sr,5),"none")), intres )
Dynamic array formulas.

For the first query inputs (Ford and ABC123), the FILTER function produces this result for f...
Ford|ABC123 y|8190090144|45363|50|500|AUD|45364|Ford|ABC123 |8190090120|45363|5|10|AUD|45365|Ford|ABC123 y|8190090144|45364|5|55|AUD|45368
...so three entries in the main data table d, are found, combined into a single output cell. Later in the formula, this single cell result is parsed using TEXTSPLIT to clean it up for final output, but for interrogation purposes, this might suffice? You'll see the following three entries in the above string:
Ford|ABC123 y|8190090144|45363|50|500|AUD|45364|
Ford|ABC123 |8190090120|45363|5|10|AUD|45365|
Ford|ABC123 y|8190090144|45364|5|55|AUD|45368
 
Upvote 0

Forum statistics

Threads
1,216,128
Messages
6,129,033
Members
449,482
Latest member
al mugheen

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top