Listing text differences between 2 cells

Mr Wilson

New Member
Joined
Jun 25, 2021
Messages
3
Office Version
  1. 365
Platform
  1. Windows
Hello

I am matching product descriptions data between 2 sets of data that I know will not match. What I am after is some way of showing what the character differences between the 2 cells are, e.g. comparing the below 2 products.
FELI-WAY REFILL PK 48ML
OPTIMUM DIFFUSER PACK FELI WAY 48ML

What I think I need is a formula or VBA to have 2 inputs and to list what characters input 2 has that are not contained in input 1. I would then run it both ways, e.g.
(compare 1 to 2) would show "-REFILL"
(compare 2 to 1) would show "OPTIMUM DIFFUSER AC "

This is based on it being able to compare words and match or part match (PK vs PACK having AC as a difference) which I'd prefer but I doubt is possible. It may need to be done by individual character so showing:
(compare 1 to 2) would show "-LL"
(compare 2 to 1) would show "OPTMUM DIFUSR AC "

I am comparing 2 companies product descriptions (across thousands of products) and trying to match them. I've searched for key terms and used those to find a match but as there is potentially many to many matches for these I want a way to highlight how good a match the data is (it would also possible allow picking out key words to add the initial matching exercise). From what I've seen the above example covers the various issues I've seen:
-Details matching but in a different order
-Abbreviations
-Spaces not being used (not in the above but FELIWAY48ML being used for example)

If there is a better ways to do this I'm open to suggestions but please bear in mind I don't control and cannot request either of the base data be changed.
 

Excel Facts

Test for Multiple Conditions in IF?
Use AND(test, test, test, test) or OR(test, test, test, ...) as the logical_test argument of IF.

mrshl9898

Well-known Member
Joined
Feb 6, 2012
Messages
1,951
Might want to have a look into fuzzy lookups. It's an add in that sees how similar words are.

 

Mr Wilson

New Member
Joined
Jun 25, 2021
Messages
3
Office Version
  1. 365
Platform
  1. Windows
That's an interesting tool.

I've tried it with some of my data however and on its own it won't be much help as it isn't showing what matched and what didn't so if it matches information except 5ML or 50ML it marks as >95% but for my purposes that makes it a very different product and shouldn't be matched. There are also a lot of products at around the 80-90% mark that are completely different in practice.
I may be able to combine it with some other checks to get some use out of it (it will involve a lot of playing with the levels and manual checking) and if not this task it will be useful for some others so thank you for the response MrSHL9898, its always good to learn new things.

As it doesn't quite do what I'm after so if there are other ways I'd appreciate more suggestions.
 

mrshl9898

Well-known Member
Joined
Feb 6, 2012
Messages
1,951
Trying to match strings is always a painful process, addresses are most commonly a struggle.

Being products, the most obvious solution would be UPC, being companies with these products i'd ask the obvious and see if you can get these, you'd hope they would have them.

Outside of that, ML as mentioned could be a good filter. You could remove the non numeric values perhaps.

1624998121297.png


B1 =TEXTJOIN("",TRUE,IFERROR(MID(A1,SEQUENCE(LEN(A1)),1)+0,""))


Also if there are a bunch of common words to look for you could search for that may help. I sometimes run them across the top row and use FIND to search for them, eg:

1624997928772.png


B2 =IFERROR(FIND(B$1,$A2),"")
 

Attachments

  • 1624997869068.png
    1624997869068.png
    6.2 KB · Views: 1
Last edited:

Mr Wilson

New Member
Joined
Jun 25, 2021
Messages
3
Office Version
  1. 365
Platform
  1. Windows
As it happens that's the other method I referred to that I've been using on both sets of data to build an artificial comparison string (substituting to remove space before ML then taking the units and ML into the cell instead of just a number for yes). It's worked well for where there are a lot of the key data in them (I have about 10 I'm picking out) but there are a lot of products in the list who's key words would only appear in a couple of products and so many words that could be abbreviated (I've used substitute for the main ones).
I've got a few issues with that process to but relatively minor and I think that would need to be a different thread.
 

Forum statistics

Threads
1,141,616
Messages
5,707,419
Members
421,509
Latest member
someinternetuser

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top