Counting substring matches based on criteria. Is there a better/faster way?

jahsquare

Board Regular
Joined
Jan 22, 2014
Messages
51
Hi, I have the following formula running in Table1 to search for a names from Table2. The formula is running in 20 columns across thousands of records, and it takes about 30 minutes to complete..

=COUNTIFS(INDIRECT("Table2["&$J$1&"]"),("* "&$D4&" *"),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),($D4&" *"),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),("* "&$D4),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),$D4,INDIRECT("Table2["&$L$1&"]"),J$3)

Table2 Col J is a list of strings comprising concatenated names.
Table2 Col L is a "Year" criteria (this formula runs across 20 columns / 20 years)

Each value in Table1 Col D has a single name that may be at the beginning, middle, or end of any of the strings in Table2 Col. J.

I have tried two other solutions but I can't get them to work, and i'm not sure they'd be faster anyway:

1) =COUNTIFS(INDIRECT("Table2["&$J$1&"]"),{("* "&$D4&" *"),($D4&" *"),("* "&$D4),$D4},INDIRECT("Table2["&$L$1&"]"),J$3)

with this I get an error and can't run the formula

2) =SUMPRODUCT(
(INDIRECT("Table2["&$J$1&"]")=("* "&$D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=($D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=("* "&$D4)) +
INDIRECT("Table2["&$J$1&"]")=($D4))) * INDIRECT("Table2["&$L$1&"]")=J$3)

This runs but the values are wrong (much lower than expected)


Any help is much appreciated. Thanks
 
This works, and is very fast. Thank you!

On its own this is very powerful and will help immensely, but I am curious if it would be possible to modify this in a few particular ways:

1) Because my input data is often Table data, could the input range(s) be defined by naming the Table column headers 'Name' and 'Year'? This would allow me to skip the step of copying the columns to a new sheet, and to use use data validations to define the inputs.

2) Instead of numeric Year, could a modified script work on a second list of concatenated input strings? I.e. I could compare Names to Places, where both columns are concatenated values?


As a further expansion (and this one is probably not easy, and I wouldn't ask anyone to do it for free.. )

Much of the data I'm working with is hierarchical, and I need to visualize cumulative values in the full hierarchy from the top down. The issue is that parent Names may not themselves be listed in the input range (i.e. the level 3 value is present but not levels 2 or 1). The full (flattened, sorted) hierarchy including all possible Parent/Child Names and their (numeric) hierarchy level values is provided in a separate range.

Currently I am using formulas to pull the missing hierarchy levels into a new table and sum their cumulative values, but this is... non-trivial.

So I would like to create the output as we have done, but for each Name, look up the hierarchy level from a separate range, and if the parents are missing from the input list, fetch the missing Names, then sum the values in each row to reflect the cumulative values of any Children.

I could provide sample input/output if it would help, but I struggle with the table options in this forum... It might be easier to upload a workbook but not sure best way to safely do that?


Thanks again.
Edit: seriously, let me know if $ would made a difference, even if you have a paypal address to receive a friendly tip..
 
Last edited:
Upvote 0

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
Glad it works & is fast. Yes, modification is possible.

1. Sounds like a simple change. Just need more understanding of specifics to be able to write the, by nature, specific code. (In fact, depending on the data set up, it is possible the current code could be fine with no modification. It would still require a specific set up similar to the original sample data.)
2. That would require work, but is similar to how the names are currently handled. So I expect not such a big deal.
3. That might take a bit of work but if you're doing it somehow now, then for sure code can do it.

Sample data & further explanation is the next step. $ are not part of the equation, thanks.
 
Upvote 0
Sample data & further explanation is the next step. $ are not part of the equation, thanks.


Ok, here is a workbook with some sample data and output. There are 6 sheets:

1) The input data. Same as before with added column for Places, as well as three data validation fields for horizontal axis input (column header), vertical axis (column header), and fill-hierarchy choice (yes/no).

2) The flattened hierarchy of names, with a sort key.

3) Output for input: Date/Names/No

4) Output for input: Date/Names/Yes

5) Output for input: Places/Names/No

6) Output for intput: Places/Names/Yes


Let me know if you have any questions, I think this covers everything from my last post.. Thanks again.

editing to add: It wouldn't make sense to fill the hierarchy across if someone selected names for the horizontal field.. So you could ignore/discount that option. In practice I would not allow a user to select names for the horizontal data validation.
 
Last edited:
Upvote 0
FWIW this is the formula I use for summing up the hierarchy (this would be used in a new column, after inputting my values to a template containing the full Names hierarchy). What I'm hoping to do is just have the hierarchy data on a separate sheet/book and generate the hierarchy including only the summed hierarchy levels above data from my input.

=SUM(OFFSET(E2,,,IFERROR(MATCH(0,N(B2<(B3:B$260742)),),)))

Where E2 is the current row value, and B2 is the rank.
 
Last edited:
Upvote 0
Thanks for that. (I haven't tried to download file, btw.)

Work, which was quiet, has got very busy in the last few days. I may not get to look at this for some time. Like after June.

If you can't wait, then please reply to say that - and maybe start a new thread for others to address. Thanks.
 
Upvote 0
No problem. I'm pretty eager to solve 1) and 2) in the short term, so I'll make a new post. Appreciate your time though.
 
Upvote 0

Forum statistics

Threads
1,215,438
Messages
6,124,873
Members
449,192
Latest member
MoonDancer

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top