string comparison

22strider

Active Member
Joined
Jun 11, 2007
Messages
311
Hello Frieds

I need some help in comparing two columns having text strings.

I have two columns containing text strings (item descriptions). I am trying to compare these two description columns so that I can identify match and mismatch.
In addition to that I also need to do compare the two columns and quantify the difference. So that we have a information like it is 50% match or 60% match etc.

Thanks for your time
 

Excel Facts

Format cells as currency
Select range and press Ctrl+Shift+4 to format cells as currency. (Shift 4 is the $ sign).
Hello Frieds

I need some help in comparing two columns having text strings.

I have two columns containing text strings (item descriptions). I am trying to compare these two description columns so that I can identify match and mismatch.
In addition to that I also need to do compare the two columns and quantify the difference. So that we have a information like it is 50% match or 60% match etc.

Thanks for your time
Maybe something like this (pull formula in C1 down to fill your range):
Excel Workbook
ABCDEF
1AA1%Match71.4%
2BD0
3CC1
4DB0
5EE1
6FF1
7GG1
Sheet12
 
Upvote 0
Hello JoeMo

Thanks for your reply.
Actually the % Match that I am looking for is percentage of comparison.
For example if string in A1 is exactly same as string in B1 then it is 100% match. If half of the string matches in A1 and B1 then it is 50% match.

Thanks
Rajesh
 
Upvote 0
What is the definition of "percentage of comparison"?

If you are comparing "fish" with "fihs" what is the pct. of comparison?
Is "fish" vs. "fihs" a different percentage than "fish" vs "fins"
What is the value for "fishs" vs. "fishes"? Is it the same as "fishes" vs. "fishs" or "fishes" vs. "fishs "?
 
Upvote 0
Hello Mikerickson

Comparing "fish" with "Fihs" should be 50% match as first two characters match only.
"fish" vs "fihs" is same percentage as "fish" vs "fins"
"fishs" vs "fishes" is 80% match
"fishes" vs "fishs" is 66.6% match
"fishes" vs "fishs" is 80% match.

Thanks for your time
Rajesh
 
Upvote 0
"fishs" vs "fishes" is 80% match
4 characters match, "fishs" has 5 characters 4/5 = 80%

Does "fishes" vs "fishs" return 67% (=4/6) sinces the first string ("fishes") has 6 characters
or does it = 80% because the shortest test string has 5 characters.

One follow up, what is "fish" vs. "fishs"?

"fishes" vs "fishs" is 66.6% match
"fishes" vs "fishs" is 80% match.
I think there's a typo in here.
 
Upvote 0
A…….……….…..….....…..B…….…...….….…….…...…..…..….…..C……….….….…...….…...D……..
Description A…….…...Description B……….….….…...No of character match……..Match % ……..
Ask MrExcel.com……..Mr. Excel.com Consulting……….…...12…….…….….….…..…..80%……..
I B M……….….……..…...IBM Corporation….…….….….….….…..3…….……..…….……....60%……..
A. Schulman……..…....A Shulman…..….………..…...…..……...8…….….….….….….…..73%……..
Elvis…….…….….…...…..Lives…….……..….….….…...….…..…...3……….…..….….….…...60%……..
fishs…….…….….…...…..fishes…….…...…..……...…..…..….…..5……….…..….….….…...100%……..
fishes…….…….….…..…..fishs…….…...….…….…..…..…..….…..5……….…..….….….…...83%……..
fishes…….……….…...…..fins…….…...….….…….….….…..….…..4……….…….…...….…...67%……..

Try this fuzzy match formula system,

1] C2, formula copy down :

=MAX(MMULT(--ISNUMBER(FIND(MID(A2,COLUMN(INDIRECT("C1:C"&LEN(A2),)),1),B2)),ROW(INDIRECT("1:"&LEN(A2)))^0))-LEN(A2)+LEN(SUBSTITUTE(A2," ",""))

2] D2, formula copy down :

=C2/LEN(A2)

Cells format, select : Percentage

Regards
Bosco
 
Last edited:
Upvote 0
You are right, it should be 4 out of 6

For "fish" vs. "fishs"; I am not sure. It is not a match for sure. It can be 80% match at the best.
 
Upvote 0
What are you using this percentage for?

If one counts only matching characters then

("abc" vs. "abf") = ("abc" vs. "abzzzzzzzzzzzzzzzzzzz"). Is that what you want?

An alternate approach would be to count the number of character/position mis-matches.

"abc" vs "abd" is 2 mismatched charactrers {c, d} /6 characters total, so 1/3 mis-match >> 2/3 match value (67%)

"abc" vs "abde" is 3 mismatch/7 character; 3/7 mis-match >> 4/7 match (58%)

"fishs" vs "fishes" is 3 mismatch/ 11 character; 3/11 mismatch >> 8/11 mismatch (73%)

"fish" vs "fishs" is 1 / 9 mismatch >> 8/9 match (89%)

Note also that in this system
(A vs B) = (B vs A)

There are a host of ways to measure the "distance" between two strings.

What do you want to do with this number after it is calculated?
 
Last edited:
Upvote 0
You could use something like this.
First let me instroduce some named "constants"

fillerChr =CHAR(5)
fillerStr =REPT(fillerChr,255)
oneTo255 =ROW(INDIRECT("1:255"))

The formula in D4 returns the number of differences
=SUMPRODUCT(--(MID(A4&fillerStr, oneTo255, 1)<>MID(B4&fillerStr, oneTo255, 1)), --((MID(A4&fillerStr, oneTo255, 1)<>fillerChr)+(MID(B4&fillerStr, oneTo255, 1)<>fillerChr)))

E4 returns the difference percentage = D4/(LEN(A4)+LEN(B4))

F4 has the Alike percentage = 1 = E4

In H4, this is all combined into
=1-SUMPRODUCT(--(MID(A4&fillerStr,oneTo255,1)<>MID(B4&fillerStr,oneTo255,1))*((MID(A4&fillerStr,oneTo255,1)<>fillerChr)+(MID(B4&fillerStr,oneTo255,1)<>fillerChr)))/(LEN(A4)+LEN(B4))
<table border=1 cellspacing=0>
<tr align="center" bgcolor=#A0A0A0><td width=25> <td width=125><b>A</b><td width=115><b>B</b><td width=25><b>C</b><td width=25><b>D</b><td width=25><b>E</b><td width=25><b>F</b><td width=25><b>G</b><td width=25><b>H</b></tr>
<tr><td align="center" bgcolor=#A0A0A0><b>1</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Difference<td align="left" bgcolor=#FFFFFF>Difference<td align="left" bgcolor=#FFFFFF>Alike<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Combined</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>2</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Count<td align="left" bgcolor=#FFFFFF>Pct<td align="left" bgcolor=#FFFFFF>Pct<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>Formula</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>3</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF></tr>
<tr><td align="center" bgcolor=#A0A0A0><b>4</b><td align="left" bgcolor=#FFFFFF>IBM <td align="left" bgcolor=#FFFFFF>IBM corp<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>4<td align="right" bgcolor=#FFFFFF>33.3%<td align="right" bgcolor=#FFFFFF>66.7%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>66.7%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>5</b><td align="left" bgcolor=#FFFFFF>Sugar Magnolia<td align="left" bgcolor=#FFFFFF>Suger Magnolia<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>2<td align="right" bgcolor=#FFFFFF>7.1%<td align="right" bgcolor=#FFFFFF>92.9%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>92.9%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>6</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fishes<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>2<td align="right" bgcolor=#FFFFFF>20.0%<td align="right" bgcolor=#FFFFFF>80.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>80.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>7</b><td align="left" bgcolor=#FFFFFF>fishes<td align="left" bgcolor=#FFFFFF>fishs<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>3<td align="right" bgcolor=#FFFFFF>27.3%<td align="right" bgcolor=#FFFFFF>72.7%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>72.7%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>8</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fishs<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>1<td align="right" bgcolor=#FFFFFF>11.1%<td align="right" bgcolor=#FFFFFF>88.9%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>88.9%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>9</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fins<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>4<td align="right" bgcolor=#FFFFFF>50.0%<td align="right" bgcolor=#FFFFFF>50.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>50.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>10</b><td align="left" bgcolor=#FFFFFF>fish<td align="left" bgcolor=#FFFFFF>fihs<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>4<td align="right" bgcolor=#FFFFFF>50.0%<td align="right" bgcolor=#FFFFFF>50.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>50.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>11</b><td align="left" bgcolor=#FFFFFF>ab<td align="left" bgcolor=#FFFFFF>c<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>3<td align="right" bgcolor=#FFFFFF>100.0%<td align="right" bgcolor=#FFFFFF>0.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>0.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>12</b><td align="left" bgcolor=#FFFFFF>ab<td align="left" bgcolor=#FFFFFF>abzzzzz<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>5<td align="right" bgcolor=#FFFFFF>55.6%<td align="right" bgcolor=#FFFFFF>44.4%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>44.4%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>13</b><td align="left" bgcolor=#FFFFFF>ab<td align="left" bgcolor=#FFFFFF>abzzzzzzz<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>7<td align="right" bgcolor=#FFFFFF>63.6%<td align="right" bgcolor=#FFFFFF>36.4%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>36.4%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>14</b><td align="left" bgcolor=#FFFFFF>abc<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>3<td align="right" bgcolor=#FFFFFF>100.0%<td align="right" bgcolor=#FFFFFF>0.0%<td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>0.0%</tr>
<tr><td align="center" bgcolor=#A0A0A0><b>15</b><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF><td align="right" bgcolor=#FFFFFF>0<td align="left" bgcolor=#FFFFFF>#DIV/0!<td align="left" bgcolor=#FFFFFF>#DIV/0!<td align="left" bgcolor=#FFFFFF><td align="left" bgcolor=#FFFFFF>#DIV/0!</tr>
</table>
 
Last edited:
Upvote 0

Forum statistics

Threads
1,218,899
Messages
6,145,082
Members
450,590
Latest member
Naneng

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top