Cosine similarity

That line should be

Code:
Const D2R = 180# / 3.14159265358979
 
Upvote 0

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
I haven't looked at this for a while but I am asking if someone knows how to calculate the baseline for cosine similarity?
 
Upvote 0
I am asking if someone knows how to calculate the baseline for cosine similarity?

What do you mean by "baseline"? And how do you intend to use the "baseline"?

In https://stackoverflow.com/questions/53796642/how-to-calculate-the-baseline-for-cosine-similarity, someone (you?) wrote:

I am calculating Cosine Similarity and as a first step I am calculating the SUMPRODUCT and I am using the range of (B2:H2) as my baseline.
=SUMPRODUCT($B$2:$H$2;B2:H2)
As a second step I am calculating the Square Root of the Sum of the Squares.
=SQRT(SUMSQ($B$2:$H$2))*SQRT(SUMSQ(B2:H2))
In this calculation, I am also using the range of (B2:H2) as my baseline. I have two questions:
How should I interpret the values of the baseline?
How should I determine what the values for the baseline should be?

That makes no sense to me. That calculation always results in 1, regardless of the values in B2:H2. Proof:
Code:
SQRT(SUMSQ(B2:H2)*SQRT(SUMSQ(B2:H2)) = SQRT(SUMSQ(B2:H2))^2 = SUMSQ(B2:H2)

SUMSQ(B2:H2) = B2^2 + C2^2 +...+ H2^2 = SUMPRODUCT(B2:H2,B2:H2)

So SUMPRODUCT(B2:H2,B2:H2) / SQRT(SUMSQ(B2:H2))^2 = SUMPRODUCT(B2:H2,B2:H2) / SUMPRODUCT(B2:H2,B2:H2) = 1
Is there some significance to your using $B$2:$H$2 v. B2:H2?

For example, do you intend to copy the formula down a column so that the second row becomes:

SUMPRODUCT($B$2:$H$2,B3:H3) / ( SQRT(SUMSQ($B$2:$H$2)) * SQRT(SUMSQ(B3:H3)) )

which can also be written:

SUMPRODUCT($B$2:$H$2,B3:H3) / SQRT(SUMSQ($B$2:$H$2)) / SQRT(SUMSQ(B3:H3))

If so, I can understand what you might mean by calling B2:H2 a "baseline".

But even so, it does not make sense to me to ask "what the values for the baseline should be?". It is what it is!

Although the cosine similarity can be defined for any pair of vectors B2:H2 and B3:H3, the articles that I pointed to in posting #2 use cosine similarity to compare text.

Suppose you have 2 sentences that each uses some combination (zero or more) of 7 words. B2:H2 is the count of each word in sentence #1, and B3:H3 is the count of each word in sentence #2. The cosine similarity calculated above is a measure of similarity, where 1 is identical and 0 is completely different (no shared words).

In that context, to call sentence #1 (represented by B2:H2) the "baseline" simply means that all other sentences are compared to it. It has no numeric value.
 
Last edited:
Upvote 0
Hi joeu2004, thank you for your reply.

You are correct, I am copying the formula down as in your example so that it becomes
Code:
S[COLOR=#333333]UMPRODUCT($B$2:$H$2,[/COLOR][COLOR=#ff0000]B3:H3[/COLOR][COLOR=#333333]) / ( SQRT(SUMSQ($B$2:$H$2)) * SQRT(SUMSQ([/COLOR][COLOR=#ff0000]B3:H3[/COLOR][COLOR=#333333])) )[/COLOR]
.

I am not using it to compare sentences but to compare cities and what I am asking is that if I put City1 as the baseline, the interpretation would be of how similar the other cities are to the baseline City1?
 
Upvote 0
Correct: how each pair of "cities" (represented by their vector of characteristics) compares. If one of the pair is always city1, it is how each other city compares to city1.
 
Last edited:
Upvote 0
I am using 5 metrics, would the results be different if I change Metric1 from 0-100 to Great, Above average, Average, Below average, Poor?
 
Upvote 0
You can use SUMPRODUCT and SUMSQ only with numbers. But you can assign the values 1 through 5 to the metrics Great through Poor (or vice versa).
 
Upvote 0

Forum statistics

Threads
1,214,979
Messages
6,122,551
Members
449,088
Latest member
davidcom

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top