Counting substring matches based on criteria. Is there a better/faster way?

jahsquare

Board Regular
Joined
Jan 22, 2014
Messages
51
Hi, I have the following formula running in Table1 to search for a names from Table2. The formula is running in 20 columns across thousands of records, and it takes about 30 minutes to complete..

=COUNTIFS(INDIRECT("Table2["&$J$1&"]"),("* "&$D4&" *"),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),($D4&" *"),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),("* "&$D4),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),$D4,INDIRECT("Table2["&$L$1&"]"),J$3)

Table2 Col J is a list of strings comprising concatenated names.
Table2 Col L is a "Year" criteria (this formula runs across 20 columns / 20 years)

Each value in Table1 Col D has a single name that may be at the beginning, middle, or end of any of the strings in Table2 Col. J.

I have tried two other solutions but I can't get them to work, and i'm not sure they'd be faster anyway:

1) =COUNTIFS(INDIRECT("Table2["&$J$1&"]"),{("* "&$D4&" *"),($D4&" *"),("* "&$D4),$D4},INDIRECT("Table2["&$L$1&"]"),J$3)

with this I get an error and can't run the formula

2) =SUMPRODUCT(
(INDIRECT("Table2["&$J$1&"]")=("* "&$D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=($D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=("* "&$D4)) +
INDIRECT("Table2["&$J$1&"]")=($D4))) * INDIRECT("Table2["&$L$1&"]")=J$3)

This runs but the values are wrong (much lower than expected)


Any help is much appreciated. Thanks
 
* Should also note that the simplified example doesn't provide that some of the names in I2:I13 might have suffices that I do NOT want to match. I.e. searching 'baz' in A2:A5 should return false for 'bazzy' in I2:I13.

Try

A
B
C
D
E
F
G
H
I
J
1
Name​
2012​
2013​
2014​
2015​
2016​
2017​
Names​
Year​
2
Foo​
1​
1​
1​
2​
2​
0​
Foo Bar​
2012​
3
Bar​
1​
1​
1​
1​
1​
1​
Bif Baz​
2013​
4
Bif​
1​
1​
1​
1​
2​
0​
Foo Baz​
2014​
5
Baz​
0​
1​
1​
1​
2​
1​
Foo​
2015​
6
Bif Foo Baz​
2016​
7
Bar​
2017​
8
Bif​
2012​
9
Bar Foo​
2013​
10
Bif Bar​
2014​
11
Baz Bif Foo Bar​
2015​
12
Foo Bar Bif Baz​
2016​
13
Baz​
2017​
14

Formula in B2 copied across and down
=SUMPRODUCT(ISNUMBER(SEARCH(" "&$A2&" "," "&$I$2:$I$13&" "))*($J$2:$J$13=B$1))

M.
 
Upvote 0

Excel Facts

Pivot Table Drill Down
Double-click any number in a pivot table to create a new report showing all detail rows that make up that number
Formula in B2 copied across and down
=SUMPRODUCT(ISNUMBER(SEARCH(" "&$A2&" "," "&$I$2:$I$13&" "))*($J$2:$J$13=B$1))

Not sure if i'm doing something wrong but I'm getting all "0"s in the output range.
 
Upvote 0
There must be something funny with my formatting because when I copied your input values into excel it worked. I'll work on that. Thanks for the tip!
 
Upvote 0
Not sure if i'm doing something wrong but I'm getting all "0"s in the output range.

That's why in a table the headers are strings (text)
Try this
=SUMPRODUCT(ISNUMBER(SEARCH(" "&$A2&" "," "&$I$2:$I$13&" "))*($J$2:$J$13=--B$1))

-- converts to a number

M.
 
Last edited:
Upvote 0
Along the lines of Marcelo's earlier question, if the input is only the source data & nothing else, then a solution can generate the list of all words in the list for the LHS and the values of all years in the list for the headers - as well as the counts.

With some VBA this could be pretty fast.

How long are the new formulas taking?
 
Upvote 0
Along the lines of Marcelo's earlier question, if the input is only the source data & nothing else, then a solution can generate the list of all words in the list for the LHS and the values of all years in the list for the headers - as well as the counts.

With some VBA this could be pretty fast.

How long are the new formulas taking?


It seems to be about the same, really.. Takes quite awhile.

A VBA solution to generate the output table based on the input alone would be extremely helpful. Although will still need to plot years even if there is no data, so would need to set a min/max based on the input data and create columns for every year in between.
 
Upvote 0
So, the formulas are taking ~30 minutes? I'm expecting code should take ~ 1 second or less.

Code:
Sub test()


    Dim i As Long, j As Long, k As Long
    Dim lngMinYear As Long
    Dim lngMaxYear As Long
    
    Dim a As Variant
    Dim aIn As Variant
    Dim aOut As Variant
    Dim dic As Variant
    Dim wksNew As Excel.Worksheet
    
    '-----------------------------------
    
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    
    aIn = Cells.Find(What:="Names").CurrentRegion.Value2 'input data


    lngMinYear = Application.WorksheetFunction.Min(Application.Index(aIn, 0, 2))
    lngMaxYear = Application.WorksheetFunction.Max(Application.Index(aIn, 0, 2))
    
    ReDim aOut(1 To 60000, lngMinYear - 1 To lngMaxYear)
    
    Set dic = CreateObject("Scripting.Dictionary") 'for destination array row numbers stored by name
    
    For i = LBound(aIn, 1) + 1 To UBound(aIn, 1) 'loop through input data rows
        a = Split(aIn(i, 1), " ") 'Array list of names for record i
        For j = LBound(a) To UBound(a) 'Loop though names on record i
            If Not dic.exists(a(j)) Then 'if haven't already got a record number for this name
                k = k + 1 'data for this name will be in row k of output array
                dic.Add a(j), k 'store row number for this name in dictionary object
                aOut(k, lngMinYear - 1) = a(j) 'store name in first column of output array
            End If
            aOut(dic(a(j)), aIn(i, 2)) = aOut(dic(a(j)), aIn(i, 2)) + 1 'accumulate count
        Next j
    Next i
    Erase a
    Set dic = Nothing
    Set aIn = Nothing
    
    'Create output
    Set wksNew = Application.Workbooks.Add(Template:=xlWBATWorksheet).Worksheets(1)
    With wksNew
        .Range("A2").Resize(k, lngMaxYear - lngMinYear + 2).Value2 = aOut
        .Range("A1").Value2 = "Name"
        For j = lngMinYear To lngMaxYear
            .Cells(1, j - lngMinYear + 2).Value2 = j
        Next j
    End With
    Erase aOut
    Set wksNew = Nothing
    
    Application.EnableEvents = True
    
End Sub
 
Upvote 0
Thank you, i can get it to work on the sample data but I'm having 'type mismatch' errors on my data..
on this line:

lngMinYear = Application.WorksheetFunction.Min(Application.Index(aIn, 0, 2))

I can't figure out why though - both ranges appear to be formatted the same..
 
Upvote 0
Ok so it's pointing to that line w/Application.Index for the type mismatch, and it seems to be related to the length of the names values. When there are fewer than ~10 names it works, but if there is even one value with many names it throws the error.

Here's one of my Names values that causes an error (note you'll need to change the delimiter to " | ") :

A61M002102 | A61B00050077 | A61B0005021 | A61B0005024 | A61B00054064 | A61B0005443 | A61B00054809 | A61B00054812 | A61B00057246 | A61M002100 | G06F00193406 | G09B001900 | G16H004063 | H04N000544543 | H04N002142201 | H04N002144218 | H04N0021458 | A61M20210027 | A61M20210044 | A61M2021005 | A61M20210083 | A61M22053375 | A61M22053553 | A61M22053561 | A61M22053584 | A61M22053592 | A61M2205502 | A61M2205505 | A61M223006 | A61M223010 | A61M223018 | A61M223030 | A61M223062


I found a few references to similar issues that may be helpful to someone more knowledgeable than I..

http://dailydoseofexcel.com/archives/2013/10/11/worksheetfunction-index-limitations/
https://stackoverflow.com/questions...ile-error-type-mismatch-array-or-user-defined
 
Last edited:
Upvote 0
Glad it worked on the sample data.

I am just guessing how the data differs from the sample data: maybe the below code works? Maybe not.
If not, please post sample data for the the current data.

Code:
Sub test()

    Dim i As Long, j As Long, k As Long
    Dim lngMinYear As Long
    Dim lngMaxYear As Long
    
    Dim a As Variant
    Dim aIn As Variant
    Dim aOut As Variant
    Dim dic As Variant
    Dim wksNew As Excel.Worksheet
    
    '-----------------------------------
    
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    
    With Cells.Find(What:="Names").CurrentRegion
        aIn = .Value2
        lngMinYear = Application.WorksheetFunction.Min(.Offset(, 1).Resize(, 1))
        lngMaxYear = Application.WorksheetFunction.Max(.Offset(, 1).Resize(, 1))
    End With
    
    ReDim aOut(1 To 60000, lngMinYear - 1 To lngMaxYear)
    
    Set dic = CreateObject("Scripting.Dictionary") 'for destination array row numbers stored by name
    
    For i = LBound(aIn, 1) + 1 To UBound(aIn, 1) 'loop through input data rows
        a = Split(aIn(i, 1), " ") 'Array list of names for record i
        For j = LBound(a) To UBound(a) 'Loop though names on record i
            If Not dic.exists(a(j)) Then 'if haven't already got a record number for this name
                k = k + 1 'data for this name will be in row k of output array
                dic.Add a(j), k 'store row number for this name in dictionary object
                aOut(k, lngMinYear - 1) = a(j) 'store name in first column of output array
            End If
            aOut(dic(a(j)), aIn(i, 2)) = aOut(dic(a(j)), aIn(i, 2)) + 1 'accumulate count
        Next j
    Next i
    Erase a
    Set dic = Nothing
    Set aIn = Nothing
    
    'Create output
    Set wksNew = Application.Workbooks.Add(Template:=xlWBATWorksheet).Worksheets(1)
    With wksNew
        .Range("A2").Resize(k, lngMaxYear - lngMinYear + 2).Value2 = aOut
        .Range("A1").Value2 = "Name"
        For j = lngMinYear To lngMaxYear
            .Cells(1, j - lngMinYear + 2).Value2 = j
        Next j
    End With
    Erase aOut
    Set wksNew = Nothing
    
    Application.EnableEvents = True
    
End Sub
 
Upvote 0

Forum statistics

Threads
1,215,430
Messages
6,124,852
Members
449,194
Latest member
HellScout

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top