string comparison

22strider

Active Member
Joined
Jun 11, 2007
Messages
299
Hello Frieds

I need some help in comparing two columns having text strings.

I have two columns containing text strings (item descriptions). I am trying to compare these two description columns so that I can identify match and mismatch.
In addition to that I also need to do compare the two columns and quantify the difference. So that we have a information like it is 50% match or 60% match etc.

Thanks for your time
 

JoeMo

MrExcel MVP
Joined
May 26, 2009
Messages
16,691
Hello Frieds

I need some help in comparing two columns having text strings.

I have two columns containing text strings (item descriptions). I am trying to compare these two description columns so that I can identify match and mismatch.
In addition to that I also need to do compare the two columns and quantify the difference. So that we have a information like it is 50% match or 60% match etc.

Thanks for your time
Maybe something like this (pull formula in C1 down to fill your range):
Excel Workbook
ABCDEF
1AA1%Match71.4%
2BD0
3CC1
4DB0
5EE1
6FF1
7GG1
Sheet12
 

22strider

Active Member
Joined
Jun 11, 2007
Messages
299
Hello JoeMo

Thanks for your reply.
Actually the % Match that I am looking for is percentage of comparison.
For example if string in A1 is exactly same as string in B1 then it is 100% match. If half of the string matches in A1 and B1 then it is 50% match.

Thanks
Rajesh
 

mikerickson

MrExcel MVP
Joined
Jan 15, 2007
Messages
22,895
What is the definition of "percentage of comparison"?

If you are comparing "fish" with "fihs" what is the pct. of comparison?
Is "fish" vs. "fihs" a different percentage than "fish" vs "fins"
What is the value for "fishs" vs. "fishes"? Is it the same as "fishes" vs. "fishs" or "fishes" vs. "fishs "?
 

22strider

Active Member
Joined
Jun 11, 2007
Messages
299
Hello Mikerickson

Comparing "fish" with "Fihs" should be 50% match as first two characters match only.
"fish" vs "fihs" is same percentage as "fish" vs "fins"
"fishs" vs "fishes" is 80% match
"fishes" vs "fishs" is 66.6% match
"fishes" vs "fishs" is 80% match.

Thanks for your time
Rajesh
 

mikerickson

MrExcel MVP
Joined
Jan 15, 2007
Messages
22,895
"fishs" vs "fishes" is 80% match
4 characters match, "fishs" has 5 characters 4/5 = 80%

Does "fishes" vs "fishs" return 67% (=4/6) sinces the first string ("fishes") has 6 characters
or does it = 80% because the shortest test string has 5 characters.

One follow up, what is "fish" vs. "fishs"?

"fishes" vs "fishs" is 66.6% match
"fishes" vs "fishs" is 80% match.
I think there's a typo in here.
 

bosco_yip

Well-known Member
Joined
Dec 2, 2002
Messages
1,938
Office Version
2019
Platform
Windows
A…….……….…..….....…..B…….…...….….…….…...…..…..….…..C……….….….…...….…...D……..
Description A…….…...Description B……….….….…...No of character match……..Match % ……..
Ask MrExcel.com……..Mr. Excel.com Consulting……….…...12…….…….….….…..…..80%……..
I B M……….….……..…...IBM Corporation….…….….….….….…..3…….……..…….……....60%……..
A. Schulman……..…....A Shulman…..….………..…...…..……...8…….….….….….….…..73%……..
Elvis…….…….….…...…..Lives…….……..….….….…...….…..…...3……….…..….….….…...60%……..
fishs…….…….….…...…..fishes…….…...…..……...…..…..….…..5……….…..….….….…...100%……..
fishes…….…….….…..…..fishs…….…...….…….…..…..…..….…..5……….…..….….….…...83%……..
fishes…….……….…...…..fins…….…...….….…….….….…..….…..4……….…….…...….…...67%……..

Try this fuzzy match formula system,

1] C2, formula copy down :

=MAX(MMULT(--ISNUMBER(FIND(MID(A2,COLUMN(INDIRECT("C1:C"&LEN(A2),)),1),B2)),ROW(INDIRECT("1:"&LEN(A2)))^0))-LEN(A2)+LEN(SUBSTITUTE(A2," ",""))

2] D2, formula copy down :

=C2/LEN(A2)

Cells format, select : Percentage

Regards
Bosco
 
Last edited:

22strider

Active Member
Joined
Jun 11, 2007
Messages
299
You are right, it should be 4 out of 6

For "fish" vs. "fishs"; I am not sure. It is not a match for sure. It can be 80% match at the best.
 

mikerickson

MrExcel MVP
Joined
Jan 15, 2007
Messages
22,895
What are you using this percentage for?

If one counts only matching characters then

("abc" vs. "abf") = ("abc" vs. "abzzzzzzzzzzzzzzzzzzz"). Is that what you want?

An alternate approach would be to count the number of character/position mis-matches.

"abc" vs "abd" is 2 mismatched charactrers {c, d} /6 characters total, so 1/3 mis-match >> 2/3 match value (67%)

"abc" vs "abde" is 3 mismatch/7 character; 3/7 mis-match >> 4/7 match (58%)

"fishs" vs "fishes" is 3 mismatch/ 11 character; 3/11 mismatch >> 8/11 mismatch (73%)

"fish" vs "fishs" is 1 / 9 mismatch >> 8/9 match (89%)

Note also that in this system
(A vs B) = (B vs A)

There are a host of ways to measure the "distance" between two strings.

What do you want to do with this number after it is calculated?
 
Last edited:

mikerickson

MrExcel MVP
Joined
Jan 15, 2007
Messages
22,895
You could use something like this.
First let me instroduce some named "constants"

fillerChr =CHAR(5)
fillerStr =REPT(fillerChr,255)
oneTo255 =ROW(INDIRECT("1:255"))

The formula in D4 returns the number of differences
=SUMPRODUCT(--(MID(A4&fillerStr, oneTo255, 1)<>MID(B4&fillerStr, oneTo255, 1)), --((MID(A4&fillerStr, oneTo255, 1)<>fillerChr)+(MID(B4&fillerStr, oneTo255, 1)<>fillerChr)))

E4 returns the difference percentage = D4/(LEN(A4)+LEN(B4))

F4 has the Alike percentage = 1 = E4

In H4, this is all combined into
=1-SUMPRODUCT(--(MID(A4&fillerStr,oneTo255,1)<>MID(B4&fillerStr,oneTo255,1))*((MID(A4&fillerStr,oneTo255,1)<>fillerChr)+(MID(B4&fillerStr,oneTo255,1)<>fillerChr)))/(LEN(A4)+LEN(B4))
<table border=1 cellspacing=0>
<tr align="center" bgcolor=#A0A0A0><td width=25> <td width=125><b>A</b><td width=115><b>B</b><td width=25><b>C</b><td width=25><b>D</b><td width=25><b>E</b><td width=25><b>F</b><td width=25><b>G</b><td width=25><b>H</b></tr>
<tr><td align="center" bgcolor=#A0A0A0><b>1</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Difference<td align="left" bgcolor=#FFFFFF>Difference<td align="left" bgcolor=#FFFFFF>Alike<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Combined</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>2</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Count<td align="left" bgcolor=#FFFFFF>Pct<td align="left" bgcolor=#FFFFFF>Pct<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Formula</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>3</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF></tr>
<tr><td align="center" bgcolor=#A0A0A0><b>4</b><td align="left" bgcolor=#FFFFFF>IBM <td align="left" bgcolor=#FFFFFF>IBM corp<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>4<td align="right" bgcolor=#FFFFFF>33.3%<td align="right" bgcolor=#FFFFFF>66.7%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>66.7%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>5</b><td align="left" bgcolor=#FFFFFF>Sugar Magnolia<td align="left" bgcolor=#FFFFFF>Suger Magnolia<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>2<td align="right" bgcolor=#FFFFFF>7.1%<td align="right" bgcolor=#FFFFFF>92.9%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>92.9%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>6</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fishes<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>2<td align="right" bgcolor=#FFFFFF>20.0%<td align="right" bgcolor=#FFFFFF>80.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>80.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>7</b><td align="left" bgcolor=#FFFFFF>fishes<td align="left" bgcolor=#FFFFFF>fishs<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>3<td align="right" bgcolor=#FFFFFF>27.3%<td align="right" bgcolor=#FFFFFF>72.7%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>72.7%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>8</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fishs<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>1<td align="right" bgcolor=#FFFFFF>11.1%<td align="right" bgcolor=#FFFFFF>88.9%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>88.9%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>9</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fins<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>4<td align="right" bgcolor=#FFFFFF>50.0%<td align="right" bgcolor=#FFFFFF>50.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>50.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>10</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fihs<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>4<td align="right" bgcolor=#FFFFFF>50.0%<td align="right" bgcolor=#FFFFFF>50.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>50.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>11</b><td align="left" bgcolor=#FFFFFF>ab<td align="left" bgcolor=#FFFFFF>c<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>3<td align="right" bgcolor=#FFFFFF>100.0%<td align="right" bgcolor=#FFFFFF>0.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>0.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>12</b><td align="left" bgcolor=#FFFFFF>ab<td align="left" bgcolor=#FFFFFF>abzzzzz<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>5<td align="right" bgcolor=#FFFFFF>55.6%<td align="right" bgcolor=#FFFFFF>44.4%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>44.4%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>13</b><td align="left" bgcolor=#FFFFFF>ab<td align="left" bgcolor=#FFFFFF>abzzzzzzz<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>7<td align="right" bgcolor=#FFFFFF>63.6%<td align="right" bgcolor=#FFFFFF>36.4%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>36.4%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>14</b><td align="left" bgcolor=#FFFFFF>abc<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>3<td align="right" bgcolor=#FFFFFF>100.0%<td align="right" bgcolor=#FFFFFF>0.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>0.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>15</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>0<td align="left" bgcolor=#FFFFFF>#DIV/0!<td align="left" bgcolor=#FFFFFF>#DIV/0!<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>#DIV/0!</tr>
</table>
 
Last edited:

Forum statistics

Threads
1,085,986
Messages
5,387,112
Members
402,042
Latest member
snowballresource02

Some videos you may like

This Week's Hot Topics

Top