Back to Dates in Excel archive index

Back to archive home

Thank you for your help earlier in the year with the formula below. The formula works well however it has problems with words that have a double letter in common. For 'doll' and 'roll' it should return a value of 3 letters in common (position and identity independent), however the formula returns a value of 2. Sorry for not making the identity independence parameter clear in my initial request for help. Any ideas? There were a couple of other similar formulas suggested previously however all have the same problem. Thanks, Carl.

{=SUM(IF(FREQUENCY(CODE(MID(UPPER(A3),ROW(INDIRECT("1:"&LEN(A3))),1)),ROW($64:$90))*FREQUENCY(CODE(MID(UPPER(B3),ROW(INDIRECT("1:"&LEN(B3))),1)),ROW($64:$90))*MID(0&REPT(1,26)&0,ROW($1:$28),1),1))}

It would be helpful if you could provide some cases

and their expected result. For example, "roof" and

"rotor"... by your new requirement are there 3 or 4

letters in common?

and their expected result. For example, "roof" and

"rotor"... by your new requirement are there 3 or 4

letters in common?

Sorry for not being clear about this. What I am interested in the amount of letter overlap. examples:

ROOF ROTOR 3

PIT TIT 2

ROLL DOLL 3

LOLL ROLL 3 (see below)

I have been working on a two step process that puts the letters in each string into separate columns using MID, and then uses a formula to count the number of letters in the first string that are in the second. The problem is that it does not work for pairs where the first letter string has more occurrences of a particular letter than the second, eg for LOLL and ROLL it gives 4 not 3 as I would like (I am dealing with both real words and non-words).

Thanks.

So, what logic did you use to determine that the

overlap for "roof" and "rotor" was 3? Did you

only use "roto" for the comparison? Will longer

words always be truncated to length of the

a shorter one?

overlap for "roof" and "rotor" was 3? Did you

only use "roto" for the comparison? Will longer

words always be truncated to length of the

a shorter one?

In the case of "loll" and "roll", which of the

letters in "loll" are included in the count of 3?

And, why? So, what logic did you use to determine that the

letters in "loll" are included in the count of 3?

And, why? So, what logic did you use to determine that the

...of your posting I now understand that the 1st

word is used as the standard for the comparison.

I think the morning coffee is starting to "kick in". So, what logic did you use to determine that the

word is used as the standard for the comparison.

I think the morning coffee is starting to "kick in". So, what logic did you use to determine that the

The current set up, which has turned into a collaborative effort, is as follows:

S1 appears in column A and S2 appears in column C. The MID function (thanks Steve) then separates S1 into individual letters appearing in columns G:K depending on its length (at present we are only looking at letter strings with a max of 5 letters but this may change); and S2 is represented by its individual letters in columns M:Q. The following formula is then applied:

IF(AND(OR(G2=M2,G2=N2,G2=O2,G2=P2,G2=Q2),

G2<>""),1,0).

This formula is used in the cells S:W (in row 2 of course in this case) with G2 being replaced by H2 in the formula as you move from column S to column T and so on until you reach column W. So what the formula is doing as you can probably tell is assigning the value 1 to a cell for each appearance of a letter in S1 that also appears in S2. Empty cells in S1 (G:K) or letters in S1 that are not in S2 get coded 0. We then sum from S:W to get the amount of orthographic overlap in column Y.

S1 appears in column A and S2 appears in column C. The MID function (thanks Steve) then separates S1 into individual letters appearing in columns G:K depending on its length (at present we are only looking at letter strings with a max of 5 letters but this may change); and S2 is represented by its individual letters in columns M:Q. The following formula is then applied:

IF(AND(OR(G2=M2,G2=N2,G2=O2,G2=P2,G2=Q2),

G2<>""),1,0).

This formula is used in the cells S:W (in row 2 of course in this case) with G2 being replaced by H2 in the formula as you move from column S to column T and so on until you reach column W. So what the formula is doing as you can probably tell is assigning the value 1 to a cell for each appearance of a letter in S1 that also appears in S2. Empty cells in S1 (G:K) or letters in S1 that are not in S2 get coded 0. We then sum from S:W to get the amount of orthographic overlap in column Y.

So ROOF:ROTOR will give an output in cells S:W of 1 1 1 0 0 summing to 3 which is correct. But LOLL:DOLL will give 1 1 1 1 summing to a value of 4 which is incorrect. So as mentioned the formula at present works for all strings except in the case where there are more exemplars of a particular letter in S1 than in S2. As far as the actual column assignment is concerned from A:Y we are using the columns not mentioned for other menial things like letter counts and spacing etc. Also as I requested in my first message it would be best if we could come up with a formula that required only columns A:C, with S1 in ColA and S2 in ColB and the formula in ColC. Hope I'm making things clearer. I wouldn't be surprised if my colleage Steve chips in to help my clarity problem.

Carl.

Hi, Carl has explained the approach we have taken so far, but to make it absolutely clear what we mean by "orthographic overlap", here's what you would do if you did it by hand: Take two letter strings e.g. ROLL and LOLL. Choose one word and cross out letters, a pair at a time, that appear in both words, the number of pairs you cross out is the amount of orthographic overlap of the two strings. For example,first you cross out one L from LOLL, then one L from ROLL, that's a pair.Then you cross out O from LOLL and O from ROLL, that's another pair. Then you cross out the next L from LOLL and L from ROLL. Then there are no more letters in common between the two words, as only L and R are left, and they don't match, so you stop. You then count the number of pairs you have crossed out, and that is 3, and that is the amount of orthographic overlap. This may give you other ideas about how to approach it. Ideally it should work for any pairs of strings e.g. WWWFF and KWFWQ.Here the overlap is 3. Or WWWFF and KFW. Here the overlap is 2.Don't want much, huh? steve

steve

: In the case of "loll" and "roll", which of the : letters in "loll" are included in the count of 3? : And, why?

This archive is from the original message board at www.MrExcel.com.

All contents © 1998-2004 MrExcel.com.

Visit our online store to buy searchable CD's with thousands of VBA and Excel answers.

Microsoft Excel is a registered trademark of the Microsoft Corporation.

MrExcel is a registered trademark of Tickling Keys, Inc.

All contents © 1998-2004 MrExcel.com.

Visit our online store to buy searchable CD's with thousands of VBA and Excel answers.

Microsoft Excel is a registered trademark of the Microsoft Corporation.

MrExcel is a registered trademark of Tickling Keys, Inc.