Regular Expression for comparing words

svelma

New Member
Joined
Aug 2, 2011
Messages
14
Can somebody give me a regular expression to do an approximate, atleast 3 letter match between two words.

For example I have "Alex Mike" in one column and "mAlex" in another. There is a space in between Alex and Mike and we are comparing to mAlex. I think its tricky.

I tried using the following expressions:
[\w]*
[\w]{3,}?
[a-zA-Z]*
etc,
But I still don't find a match between some of the records. I have over 4000 records and I am trying to use Regex in demand tools with salesforce.

There is a fuzzy option in demand tools that avoids spaces. So, I even tried using that, but my regular expression needs some correction.
Please help me out.
Thanks,
Velma
 

Excel Facts

Did you know Excel offers Filter by Selection?
Add the AutoFilter icon to the Quick Access Toolbar. Select a cell containing Apple, click AutoFilter, and you will get all rows with Apple
I doubt if I have an option to use code in demand tools. But I can certainly try in excel.

If I am not wrong, do you mean a macro when you say code?
 
Upvote 0
Sorry, yes I did mean an Excel macro. I'm not familiar with Regex. Would an Excel macro work for your situation?
 
Upvote 0
Why not take the lowercase of the first letter of the second word and append it to proper case of the first word and compare to that?

Alex Mike = mAlex
 
Upvote 0
Code:
Sub test()
Dim c As Range, strTest, t
For Each c In Range("A1:A" & Range("A" & Rows.Count).End(xlUp).Row)
    t = Split(c)
    strTest = LCase(Left(t(0), 1)) & StrConv(t(1), vbProperCase)
    If c.Offset(, 1) = strTest Then c.Offset(, 2) = "MATCH"
Next
End Sub
 
Upvote 0
@HOTPEPPER: There are 4000 records and not every record is in the same format. I mean, both my columns may have either one or two words, small or caps. But these are the same names in different format(full name or alias or user name etc). So, I wanted to sort the records that atleast have 3 successive letters matching.
Sorry if this is confusing.

In short, I want to match column A with column B and sort those that have same 3 or more successive letters.

Examples of my data:
ColumnA----------Column B
BasakSomaya-------LEVIN SOMAYA
AndrewCochrane----ACOCHRANE
Alexandra Mininger--AlexMininger
AnnaKrasniewska---akrasniewska
AlexRoss-----------ZROSS
CarstenSchmidt-----PJ Jauhari
.
.
.
etc

@craig.penny: Sure a macro can be helpful, even though I never used it I will definitely try, if I have no other option. But an excel formula would be of great help.

Thank you guys.
-Velma.
 
Upvote 0
This is how I would do it, though surely there are better approaches.

Open the VBA editor by pressing Alt-F11 then paste in this code.

Code:
Sub CompareText()
Dim i As Integer
Dim iRow As Integer
For iRow = 1 To 4000
      For i = 1 To Len(Cells(iRow, 1)) - 2
            If InStr(1, Cells(iRow, 2).Value, Mid(Cells(iRow, 1), i, 3), vbTextCompare) > 0 Then
                  Cells(iRow, 3) = "Match Found"
            End If
      Next i
Next iRow
End Sub

With the cursor positioned somewhere within the block of code listed above, press F5 to run it.

Let me know how/if it works for you. Don't hesitate to ask for more help. If you're unfamiliar with the VBA editor it can be confusing!
 
Upvote 0
Here's a UDF that generates the following results:

Code:
      --------A--------- -----B------ -----C------ -----------D------------
  1   BasakSomaya        LEVIN SOMAYA Somaya       C1 and down: =LCS(A1,B1)
  2   AndrewCochrane     ACOCHRANE    COCHRANE                             
  3   Alexandra Mininger AlexMininger Mininger                             
  4   AnnaKrasniewska    akrasniewska akrasniewska                         
  5   AlexRoss           ZROSS        ROSS                                 
  6   CarstenSchmidt     PJ Jauhari   ar

Code:
Function LCS(ByVal s1 As String, ByVal s2 As String, _
                      Optional bCaseSensitive = False) As String
    ' Returns the Longest Common Substring in s1 and s2
    ' shg 2009
 
    ' Max steps = Len(shorter) * (Len(shorter) + 1) / 2
 
    Dim iComp       As VbCompareMethod
    Dim iLen        As Long     ' match length
    Dim iBeg        As Long     ' where match begins in s1
    Dim sT          As String   ' temp string for swapping
 
    iComp = IIf(bCaseSensitive, VBA.vbBinaryCompare, VBA.vbTextCompare)
 
    ' swap if necessary so s1 is the longer string
    If Len(s1) < Len(s2) Then: sT = s1: s1 = s2: s2 = sT
 
    For iLen = Len(s2) To 1 Step -1
        For iBeg = 1 To Len(s2) - iLen + 1
            LCS = Mid(s2, iBeg, iLen)
            If InStr(1, s1, LCS, iComp) Then Exit Function
        Next iBeg
    Next iLen
    LCS = vbNullString
End Function
 
Upvote 0
@Craig.Penny: That was awesome dude. It worked like a charm. Thank you.

@Shg: I tried to do the same thing what Craig suggested with you function, but it didn't work. Do I need to add something to it?
 
Upvote 0

Forum statistics

Threads
1,216,082
Messages
6,128,709
Members
449,464
Latest member
againofsoul

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top