Lookup string (with wildcard) from list and return multiple values

MichaelS94

Board Regular
Joined
Aug 26, 2014
Messages
59
Hi,

I'm trying to use the index and small functions (from a formula i found online) to search a list of text and return any values that have a partial match.

E.g. I search for "book" and it returns a corresponding number for "antique books", "assembling of books", "book publishing" etc.

The formula i used is below and i found it online - it doen'st work with wiildcards and only works if there is an exact match.

=IF(ISERROR(INDEX($A$1:$B$15600,SMALL(IF($A$1:$A$15600=$E$1,ROW($A$1:$A$15600)),ROW(1:1)),2)),"",INDEX($A$1:$B$15600,SMALL(IF($A$1:$A156008=$E$1,ROW($A$1:$A$15600)),ROW(1:1)),2))

Here is what my spreadsheet looks like and ideally how I want it to work:

List
Number
Lookup
Book
bla bla
1
Matches:
4
bla bla
2
7
bla bla
3
11
book maker
4
bla bla
5
bla bla
6
antique books
7
bla bla
8
bla bla
9
bla bla
10
book publisher
11
bla bla
12
bla bla
13
bla bla
14
bla bla
15

<tbody>
</tbody>


Could any of you please have a look and offer any recommendations /advice? Essentially I just have a long list and want to be able to search for something and it would return any (often multiple) suitable matches.

Thanks so much - really stuck on this one!
 
I must admit, I am becoming quite lost with this but I would be trying to avoid anything "indirect" with such large data.

1. Could you post some current sample data (say about the amount you did in post 28) together with any formulas you are using (just one row again) and the expected results for that data?

2. Can you post the code of the CheckWords function as you currently have it?
 
Upvote 0

Excel Facts

How to change case of text in Excel?
Use =UPPER() for upper case, =LOWER() for lower case, and =PROPER() for proper case. PROPER won't capitalize second c in Mccartney
I apologize, hopefully this will clear it up a bit. To clarify, the "Lookup Address" is the address I'm trying to find a TaxID match for...

1A) Here is the "Source Data", aka the Range in which Checkwords will be searching for a match
--Note the actual Source Data has 300,000 rows, so to shorten it I used 9 random samples.
--"Row # in Source Data" is the actual row # within the full, 300,000+ Row Source Data

Book1
ABC
1SOURCE DATA - ADDRESSTAXIDROW # IN SOURCE DATA
22010 N NORTH ANVIL LANE 207481712133654442043
32034 NORTH N ANVIL LN LANE 207481712133642942868
42034 S SOUTH ANVIL S SOUTH LN LANE 207481712133767442869
52034 RICKSHAW HWY HIGHWAY 207481712133768842870
62417 UNIVERSITY BLVD 207831717189302349875
75300 W WEST MAPLE SHADE LN LANE 2077217151746445107209
86119 SW SOUTHWEST CRAIN HWY Highway 2077217030200493127120
96119 SW SOUTHWEST CRAIN HWY Highway 2077217030200892127121
1011258 W WEST WESTPORT Drive DR 2072017131480847217002
1116800 VILLAGE DR Drive 2077217030207647267186
Sheet10
Cell Formulas
RangeFormula
A1SOURCE DATA - ADDRESS


1B) COLUMNS A - K: Original Lookup Table, Using all 300,000+ entries as the Range
1B) COLUMNS L - T: Added to define a specific Range that corresponds to the Lookup Addresses' Street #


Book1
ABCDEFGHIJK
1STREET #STREET NAMEDIRCITYZIPADDRESS - W/ DIRADDRESS - NO DIRTAXID - W/ DIRTAXID - NO DIRTAXID #2TAXID #3
216800VILLAGE DRW2077216800 VILLAGE DR W 2077216800 VILLAGE DR 20772 17030207647  
32010NORTH ANVIL LANE207482010 NORTH ANVIL LANE 207482010 NORTH ANVIL LANE 2074817121336544
42417UNIVERSITY BLVDE207832417 UNIVERSITY BLVD E 207832417 UNIVERSITY BLVD 2078317171893023
52034ANVIL LANES207482034 ANVIL LANE S 207482034 ANVIL LANE 2074817121337674
62034ANVIL LANE207482034 ANVIL LANE 207482034 ANVIL LANE 207481712133642917121337674
711258WESTPORT DRW2072011258 WESTPORT DR W 2072011258 WESTPORT DR 2072017131480847
86119SW CRAIN HWY207726119 SW CRAIN HWY 207726119 SW CRAIN HWY 207721703020049317030200892
95300MAPLESHADE LNW207725300 MAPLESHADE LN W 207725300 MAPLESHADE LN 20772
Sheet11
Cell Formulas
RangeFormula
F2=TRIM(CONCATENATE(A2," ",B2," ",C2," ",E2))
G2=TRIM(CONCATENATE(A2," ",B2," ",E2))
H2=CheckWords(Sheet10!$A$2:$A$11,F2,ROWS(H2:H2),1)
I2=IF(H2="",CheckWords(Sheet10!$A$2:$A$11,G2,ROWS(I2:I2),1),"")
J2=IF(H2<>"",CheckWords(Sheet10!$A$2:$A$11,F2,ROWS($1:$2),1),CheckWords(Sheet10!$A$2:$A$11,G2,ROWS($1:$2),1))
K2=IF(H2<>"",CheckWords(Sheet10!$A$2:$A$11,F2,ROWS($1:$3),1),CheckWords(Sheet10!$A$2:$A$11,G2,ROWS($1:$3),1))




Book1
LMNOPQRST
1STREET # - 1ST ROW MATCHSTREET # - LAST ROW MATCHSOURCE DATA RANGEINDIRECT - RESULTINDIRECTVBA - RESULTINDIRECT - F9INDIRECTVBA - F9CHECKWORDS - INDIRECTCHECKWORDS - INDIRECTVBA
2267157267186Sheet10!P267157:P267186#VALUE!16800 BALD EAGLE SCHOOL RD ROAD ROAD RD 2061316800 BALD EAGLE SCHOOL RD ROAD ROAD RD 2061316800 BALD EAGLE SCHOOL RD ROAD ROAD RD 2061317030207647#VALUE!
34198542088Sheet10!P41985:P42088#VALUE!2010 ALBAN LN LANE LANE LN 207162010 ALBAN LN LANE LANE LN 207162010 ALBAN LN LANE LANE LN 2071617121336544#VALUE!
44984049877Sheet10!P49840:P49877#VALUE!2417 59TH AVE Avenue Avenue AVE 207852417 59TH AVE Avenue Avenue AVE 207852417 59TH AVE Avenue Avenue AVE 2078517171893023#VALUE!
54286842870Sheet10!P42868:P42870#VALUE!2034 N NORTH ANVIL N NORTH LN LANE LANE LN N NORTH 207482034 N NORTH ANVIL N NORTH LN LANE LANE LN N NORTH 207482034 N NORTH ANVIL N NORTH LN LANE LANE LN N NORTH 2074817121337674#VALUE!
64286842870Sheet10!P42868:P42870#VALUE!2034 N NORTH ANVIL N NORTH LN LANE LANE LN N NORTH 207482034 N NORTH ANVIL N NORTH LN LANE LANE LN N NORTH 207482034 N NORTH ANVIL N NORTH LN LANE LANE LN N NORTH 2074817121336429#VALUE!
7217001217002Sheet10!P217001:P217002#VALUE!11258 KETTERING PL Place Place PL 2077411258 KETTERING PL Place Place PL 2077411258 KETTERING PL Place Place PL 2077417131480847#VALUE!
8127073127123Sheet10!P127073:P127123#VALUE!6119 39TH PL Place Place PL 207826119 39TH PL Place Place PL 207826119 39TH PL Place Place PL 2078217030200493#VALUE!
9107137107214Sheet10!P107137:P107214#VALUE!5300 39TH AVE Avenue Avenue AVE 207815300 39TH AVE Avenue Avenue AVE 207815300 39TH AVE Avenue Avenue AVE 20781#VALUE!
Sheet11
Cell Formulas
RangeFormula
L2=INDEX(Sheet10!$L$2:$L$13863,MATCH(A2,Sheet10!$J$2:$J$13863,0))
M2=INDEX(Sheet10!$M$2:$M$13863,MATCH(A2,Sheet10!$J$2:$J$13863,0))
N2="Sheet10!"&"P"&L2&":P"&M2
O2=INDIRECT(N2)
P2=INDIRECTvba(N2)
Q2={"16800 BALD EAGLE SCHOOL RD ROAD ROAD RD 20613";"16800 BALD EAGLE SCHOOL RD ROAD ROAD RD 20613";"16800 BEALLE HILL FOREST LN LANE LANE LN 20607";"16800 BEALLE HILL RD ROAD ROAD RD 20601";"16800 BLUE INDIGO CT Court Court CT 20607";"16800 CANDY HILL RD ROAD ROAD RD 20772";"16800 CANDY HILL RD ROAD ROAD RD 20772";"16800 CARIBBEAN WAY WAY 20607";"16800 CLAGETT LANDING RD ROAD ROAD RD 20774";"16800 CLARK TER Terrace Terrace TER 20707";"16800 COVER CT Court Court CT 20607";"16800 CROOM RD ROAD ROAD RD 20613";"16800 CROOM RD ROAD ROAD RD 20613";"16800 DORCHESTER PL Place Place PL 20772";"16800 FEDERAL HILL CT Court Court CT 20716";"16800 GARDEN CT Court Court CT 20772";"16800 GOVERNORS BRIDGE RD ROAD ROAD RD 20716";"16800 HOLLY WAY WAY 20607";"16800 INDIAN HEAD HWY Highway Highway HWY 20607";"16800 MATTAWOMAN LN LANE LANE LN 20601";"16800 MATTAWOMAN LN LANE LANE LN 20601";"16800 OLD MARSHALL HALL RD ROAD ROAD RD 20607";"16800 QUEEN ANNE BRIDGE RD ROAD ROAD RD 20716";"16800 QUEEN ANNE RD ROAD ROAD RD 20772 20774";"16800 QUEEN ANNE RD ROAD ROAD RD 20772 20774";"16800 SCIENCE DR Drive Drive DR 20715";"16800 SCIENCE DR Drive Drive DR 20715";"16800 SLEEPY SPRING CT Court Court CT 20716";"16800 SWANSON ROAD SPUR SPUR 20774";"16800 W West VILLAGE W West DR Drive Drive DR W West 20772"}
R2={"16800 BALD EAGLE SCHOOL RD ROAD ROAD RD 20613";"16800 BALD EAGLE SCHOOL RD ROAD ROAD RD 20613";"16800 BEALLE HILL FOREST LN LANE LANE LN 20607";"16800 BEALLE HILL RD ROAD ROAD RD 20601";"16800 BLUE INDIGO CT Court Court CT 20607";"16800 CANDY HILL RD ROAD ROAD RD 20772";"16800 CANDY HILL RD ROAD ROAD RD 20772";"16800 CARIBBEAN WAY WAY 20607";"16800 CLAGETT LANDING RD ROAD ROAD RD 20774";"16800 CLARK TER Terrace Terrace TER 20707";"16800 COVER CT Court Court CT 20607";"16800 CROOM RD ROAD ROAD RD 20613";"16800 CROOM RD ROAD ROAD RD 20613";"16800 DORCHESTER PL Place Place PL 20772";"16800 FEDERAL HILL CT Court Court CT 20716";"16800 GARDEN CT Court Court CT 20772";"16800 GOVERNORS BRIDGE RD ROAD ROAD RD 20716";"16800 HOLLY WAY WAY 20607";"16800 INDIAN HEAD HWY Highway Highway HWY 20607";"16800 MATTAWOMAN LN LANE LANE LN 20601";"16800 MATTAWOMAN LN LANE LANE LN 20601";"16800 OLD MARSHALL HALL RD ROAD ROAD RD 20607";"16800 QUEEN ANNE BRIDGE RD ROAD ROAD RD 20716";"16800 QUEEN ANNE RD ROAD ROAD RD 20772 20774";"16800 QUEEN ANNE RD ROAD ROAD RD 20772 20774";"16800 SCIENCE DR Drive Drive DR 20715";"16800 SCIENCE DR Drive Drive DR 20715";"16800 SLEEPY SPRING CT Court Court CT 20716";"16800 SWANSON ROAD SPUR SPUR 20774";"16800 W West VILLAGE W West DR Drive Drive DR W West 20772"}
S2=IF(H2="",CheckWords((INDIRECT(N2)),G2,ROWS(1:1),-1),CheckWords((INDIRECT(N2)),F2,ROWS(1:1),-1))
T2=IF(H2="",CheckWords((INDIRECTvba(N2)),G2,ROWS(1:1),-1),CheckWords((INDIRECTvba(N2)),F2,ROWS(1:1),-1))


1C)
COLUMNS L - N: Added to reduce the # of Cells in the Lookup Range for each formula calculation that Checkwords
needs to search through (originally 300,000 each time), I added 2 columns (L & M) to my original sample table which find the 1st Row Position & Last Row Position in the Source Data that matches the Street # of our Lookup Address. From this I added a 3rd column (N), which creates a Text Range Reference with the matching Row #'s from COLUMNS L & M.
  • This limits the Range for the 1st Lookup Address - 16800 Village Dr S - to Rows 267157:267186 in the Source Data - or just 29 Rows.
  • **Columns L, M, & N reference the entire Source Data Table (300k+), not the shortened 9-row version I included above. I needed a larger sample to replicate the errors I'm getting. So the formulas in Columns L - N will reference different ranges, but the 1st/Last Row #s are correct.**

1D) ISSUE: CONVERT CELL'S TEXT REFERENCE TO TRUE CELL REFERENCES (COLUMNS O - T): To utilize the Row Ranges (COLUMN N), I need to be able to convert it to a true reference so that it will work within the CheckWords Formula...the Indirect(Row Range) would be the first arguement within the Checkwords function.
  • Columns O & P show the cell contents after entering the INDIRECT & INDIRECTVBA formulas by themselves. Columns Q & R show the results of the same formulas after Pressing F9
    • ​Checkwords function works with the INDIRECT function (Column S), but not with the INDIRECTVBA function (Column T), for some reason.

THE GOAL:
The goal is simply to minimize the Search Range required in the Checkwords formula. By limiting the Search Range to only those Rows that match the Street # of our Lookup Address, the formula should be much faster. And it is, except for the INDIRECT part.

My main issue is trying to find a way to utilize the smaller Row Reference Range calculated in Columns L & M, instead of the entire 300,000+ Source Data Range.

2) Here is the Checkwords Code I have.

Function CheckWords(rng As Range, lookupwords As String, Optional num As Long, Optional ColOffset As Long) As String

Application.Calculation = xlCalculationManual
Dim lookup() As String
Dim i As Long, Ctr As Long
Dim c As Range
Dim bPassed As Boolean

lookup = Split(lookupwords)
For Each c In rng
bPassed = True
For i = 0 To UBound(lookup)
If InStr(1, " " & c.Value & " ", " " & lookup(i) & " ", vbTextCompare) = 0 Then
bPassed = False
Exit For
End If
Next i
If bPassed Then Ctr = Ctr + 1
If Ctr = num Then
CheckWords = c.Offset(, ColOffset).Value
Exit For
End If
Next c

Application.Calculation = xlCalculationAutomatic
End Function

3) Here is the INDIRECTVBA code I found from another forum post (as mentioned in my previous post). It's in 2 separate Modules.

Public Function INDIRECTVBA(ref_text As String)
INDIRECTVBA = Range(ref_text)
End Function

Public Sub FullCalc()
Application.CalculateFull
End Sub
 
Upvote 0
See if this is any improvement. I have re-written the function to more closely match what you have and what you are trying to do. The function may have lost some of the flexibility of the previous one, but I have tried to improve the problem you were having - speed.

So the function now
- expects the source data to have address es in one column and Tax ID's in the next column to the right.
- expects that the source data is arranged so the all identical street numbers are grouped together. However, the checking start/end rows is done in the function, not in the worksheet as you were doing.
- reads the source data into an array in memory rather than continually referencing cells in the worksheet. (But I haven't tested with large source data like you have. Fingers crossed.)

Anyway, give this a try.

Code:
Function TaxID(rng As Range, lookupwords As String, num As Long) As String
  Dim lookup() As String
  Dim i As Long, j As Long, Ctr As Long, uba As Long
  Dim bPassed As Boolean, bStarted As Boolean
  Dim a As Variant
  Dim sNum As String, sAdr As String
  
  Application.Volatile
  a = rng.Resize(rng.Rows.Count + 1).Value
  uba = UBound(a)
  lookup = Split(lookupwords)
  sNum = lookup(0)
  Do
    i = i + 1
    sAdr = a(i, 1)
    If Left(sAdr, InStr(1, sAdr & " ", " ") - 1) = sNum Then
        bStarted = True
        bPassed = True
        For j = 0 To UBound(lookup)
          If InStr(1, " " & sAdr & " ", " " & lookup(j) & " ") = 0 Then
            bPassed = False
            Exit For
          End If
        Next j
        If bPassed Then
          Ctr = Ctr + 1
          If Ctr = num Then
            TaxID = a(i, 2)
            Exit Do
          End If
        End If
    Else
      If bStarted Then Exit Do
    End If
  Loop Until i = uba
End Function

Sheet10 the same as in your last post

Excel Workbook
FGHIJK
1ADDRESS - W/ DIRADDRESS - NO DIRTAXID - W/ DIRTAXID - NO DIRTAXID #2TAXID #3
216800 VILLAGE DR W 2077216800 VILLAGE DR 20772 17030207647  
32010 NORTH ANVIL LANE 207482010 NORTH ANVIL LANE 2074817121336544
42417 UNIVERSITY BLVD E 207832417 UNIVERSITY BLVD 2078317171893023
52034 ANVIL LANE S 207482034 ANVIL LANE 2074817121337674
62034 ANVIL LANE 207482034 ANVIL LANE 207481712133642917121337674
711258 WESTPORT DR W 2072011258 WESTPORT DR 2072017131480847
86119 SW CRAIN HWY 207726119 SW CRAIN HWY 207721703020049317030200892
95300 MAPLESHADE LN W 207725300 MAPLESHADE LN 20772
Sheet11
 
Upvote 0
Hi Peter, I was finally able to try your new solution to this. The new TaxID formula is in fact faster than the original Checkwords, however it appears that the sheer amount of source data stored in the array memory is causing the calculation time to still process relatively slow.

To give you an idea of the calculation time, I tested 3 formulas:
1) Original Checkwords function
2) New TaxID function
3) Checkwords function w/ Indirect and a helper column w/ Row start/end #'s (Column N from my previous post).

I applied these formulas to 100 Lookup Addresses. The results are as follows:
1) 29.5 seconds
2) 16.75 seconds
3) < 1 second (basically instantaneous)

To get around the slowness caused by Indirect, I added another Sub module that searches for all cells with the Indirect formula, and replaces it with the actual cell reference range. But it requires me to click the Form Control Button each time, and causes the recalculation of the entire list....adding another process that's really not needed. It seems that using the Helper Range Reference column (Column N above) as the "lookup range" in the Checkwords function is most efficient, since the number of rows is greatly reduced.

However, is there a way to have the Checkwords function use the Range Reference created in Column N, without adding Indirect to the mix? Or somehow combine the two modules (Checkwords function & Indirect Remover Sub ) into one, so at the end of the Checkwords calculation, the formula will have actual cell references (and not Indirect)?
 
Upvote 0
is there a way to have the Checkwords function use the Range Reference created in Column N,
Untested, but try this.

1. Change the first part of the CheckWords function as highlighted
Rich (BB code):
Function CheckWords(srng As String, lookupwords As String, Optional num As Long, Optional ColOffset As Long) As String
  Dim lookup() As String
  Dim i As Long, Ctr As Long
  Dim c As Range, rng As Range
  Dim bPassed As Boolean

  Set rng = Sheets(Split(srng, "!")(0)).Range(Split(srng, "!")(1))
  Application.Calculation = xlCalculationManual

2. In the worksheet, use the function like this

=CheckWords(N2,F2,...
 
Upvote 0
I know this is an old string, but I could really use some help. I would like to do this without UDF, and thought the first sollution posted in this string would work, but, I am seeing random results with my data.....

Here is what I changed the table too...

Table from page 1 of this thread with my data....

Formula =IFERROR(AGGREGATE(15,6,B2:B16/(SEARCH("*"&D$1&"*",A2:A16)>0),ROWS(D$2:D2)),"")

Is in cell D3, then dragged down column D
List
Number
Lookup
WASHER
MOTOR:AC; 400HP, 1193 RPM, 5012 FRM, 3/6
100199480

100199480
CHAIN:ROLLER; #40 , 1-STRAND, SS
100199481

100199481
O-RING:SILICONE; Ø95X5.00, CAT04
100199482

100199483
THERMOCOUPLE:J TYPE
100199483

100199485
GUARD:PLASTIC; INDICATOR
100199484

100199487
WASHER:LOCK; 112463
100199485

100199489
VALVE:SOLENOID; 1/4" NPT, VORTEX COOLER
100199486

100199491
REGULATOR:PRESS; 1/2 NPT,SNSUS RED SPRG
100199487
100199493
SER KIT:VALVE; 4", FKM, X 100
100199488
100199495
MOTOR:AC; 400HP, 1193 RPM, 5012 FRM, 3/6
100199489
100199497

<tbody>
</tbody>


It does not, however return the results i would like.

I need to pull a wild card search. In this example, I put the word "motor" in the search criteria (cell D2). What I would like is for my results column to only display the 2 numbers that contain the word "MOTOR." I tried putting the dollar signs in all of the ranges, ie $A$3:$A$17, and that just gives me the first found match, like v-lookup would.

My Data is actually 80,000+ rows. The Columns are formatted such that the text I am trying to wildcard search is in column C, and the number I am looking for can be pulled from either column A or Column O. It is set up that way for VLOOKUP purposes.

Here is how My Data is actually structured....

#NAME?PLANTMATERIAL DESCRIPTIONMANUFACTURER NAMEMPNVENDOR NAMEVENDOR MATERIAL NUMBERSLOCWHNSTOR. BINREORDER POINTMAX. LEVELBUNOLD MATERIAL NO.MATERIAL
10000003P110MEM; 38", 3.8", 31 ML, 1CDS01S21MEM270295EA38-31-1C10000003
10000004P110MEM; 38", 7.9", 31 ML, 1C, FOOD, SANIDS01S21MEM235260EA80-31-1C-F10000004
10000005P110MEM;40",7.9",31 ML,1C,WATER, POLISHERDS01S21MEM108120EA80-31-1C-W10000005
10000006P110MEM; 38", 6.3", 31 ML, 10K, SANITORAYDS01S21MEM288308EA63-31-10K10000006

<colgroup><col><col><col><col><col><col><col><col><col><col><col><col><col><col><col></colgroup><tbody>
</tbody>



I am trying to keep grubby paws out of my data, and not have to teach mechanics how to do anything in excel, besides open it, type in a word, and up pops every option in our system that contains that word. Just like I would do in the raw data with the search function under the filter buttons. It would be great if it would pop up the entire row, and not just the number, but If I could get the number, I could get everything else with VLOOKUPS.


Thanks in advance for any assistance.
 
Upvote 0

Forum statistics

Threads
1,216,084
Messages
6,128,724
Members
449,465
Latest member
TAKLAM

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top