R-Squared/Correlation Question (w/ scatter plot)

Jennlax27iga0

New Member
Joined
May 4, 2011
Messages
21
Not sure if this is the correct forum, if not any suggestions on where to post would be welcome. I have graphed a scatter plot and trendline in Excel and have been trying to analyze the significance of the r-squared value. The graph covers 4 years so there are 4 points for each team graphed.

In looking at the graph I thought the group circled in red was impacting the r-squared value the most due to it's variance/distance from the trendline, however removing that group lowered the r-squared value. The two groups that actually had the greatest impact/ lowered the r-squared value the most are circled in green (the group with the third greatest impact is circled in purple). If someone can help give me some insight into why this is the case, I would greatly appreciate it (not sure if I'm not understanding the Excel calculation/formula, or the general concept). Thanks.
Picture2.jpg
 

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
Not saying this is the best place to go for help, but it is a good start.

http://en.wikipedia.org/wiki/R-squared_value

A low R2 is saying there is too much variability in your model. The future outcomes are not predictable.

Let's say you have a restaurant. In order to figure out how much staff you need, you would use patrons or maybe even meals served as your workload factor because they are most likely the best predictor of how much staff is required.

The more patrons/meals, the higher the staff required >> predictable

The fewer patrons/meals, the fewer the staff required >> predictable

But instead of patrons or meals you build your model on number of windows in the restaurant. What does that have to do with how much staff is required?

It is not a good predictor.

Maybe the areas you have circled are what can be called outlliers. They fall outside of a certain number of standard deviations. Maybe for good reason, but they should not be part of the model. Could be they need their own model.

Another example, we had a graph like yours where there were two different sets of data points. Turned out, it was because they were two different aircraft types; therefore, they needed their own linear regression model.

Hope some of this helps...
 
Upvote 0
Upvote 0

Forum statistics

Threads
1,212,059
Messages
6,105,659
Members
447,974
Latest member
misspancake

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top