Breaking down the polynomial

duration17 · Sep 26, 2006

Hello!

I am using INDEX() &LINEST() to get the coefficients for a quadratic polynomial regression (using =INDEX(LINEST(y:y,x:x^{1,2}),1,3) kind of formulas). I would like to understand every step of the calculation. Does anyone know how to break down the calculation, using SLOPE() and INTERCEPT() ?

Many thanks,

pgc01 · Sep 26, 2006

Hi duration17
Welcome to the board

According to the smiley, you don't seem to like the x^{1,2}.

Instead of a quadratic polynomial regression think of it as an OLS with 2 explaining variables. In that case you would get 2 slopes and the independent term, right?
Well, x^2 is your second explaining variable.

Now for the x^{1,2}

The function LINEST has two main parameters, the first is the Y values, and the second is a range with the values of all the explaining variables.

Assuming Y in A2:A6, with 2 explaining variables the Xs values could be B2:C6. That's what is in the example I post. I can use

In E4:
=LINEST(A2:A6,B2:C6)

However, the third column has, in fact, the square values of the second column because we are trying to explain Y with X and X^2.

It seems a bit dumb to write a third column with values that can be calculated from the second column. That's why we can use

In E3
=LINEST(A2:A6,B2:B6^{1,2})

The B2:B6^{1,2} means raise each value in B5:B6 first to the power of 1 and then to the power of 2. This calculates the values of B and C and so we don't need the column C.
This way we can calculate the OLS without writing the third column.

I'm sure this is not a good explanation but I hope it contributes to your better understanding of this OLS calculation.

Kind regards
PGC

Book1

A

B

C

D

E

F

G

H

1

Y

X

X^2

2

6

1

m2

m1

b

3

12

2

4

20

3

5

30

4

16

6

42

5

25

7

8

Sheet1

duration17 · Sep 26, 2006

Hi pgc01,
Thank yo very much for reply, it is very insightful.
Do you have any idea how LINEST() calculates the slope and the intercept with two columns (x & x^2)?
Many thanks,

duration17 · Sep 27, 2006

pgc01 · Sep 27, 2006

Hi again

First one remark: ask one friend that did economics for his/her book of econometrics. This is usually a course in the first year at the university and explains in detail all about the OLS and its associated statistics.

The calculation of the parameters of the OLS is a simple calculus problem of finding a minimum. You take the partial derivatives of the coefficients and set them to zero. In the case of 2 explanatory variables you get a system with 3 linear equations.

For the case of 2 explanatory variables, you are trying to find the coefficients for an equation

Z = aX + bY +c

that minimise the sum of the squares of the differences between the estimated values using the equation and the real value in the sample. For one point the difference is

a*x_i + b*y_i + c - z_i

The sum of the squares of the differences:

SUM ( a*x_i + b*y_i + c - z_i )^2

now to find the minimum you take the partial derivatives to a, b and c and set them to zero. For example the case of "a"

SUM 2 * x_i * ( a*x_i + b*y_i + c - z_i ) = 0

or

SUM(x_i^2) * a + SUM( x_i * y_i) * b + SUM(x_i) * c = SUM(x_i * z_i)

Remember that we have all the x_i, y_i and z_i in our sample and so this is a simple equation like

10 * a + 23 * y + 14 * c = 34

We do the same for the other 2 coefficients and we have a system of 3 linear equations. From there we take the values of a, b and c.

In the next web page you have this in the section "Multiple linear regression". I think it's worth that you read the whole page

http://en.wikipedia.org/wiki/Multiple_regression

Hope this helps you understanding the mechanics of the calculation od the OLS coefficients
PGC

duration17 · Sep 27, 2006

PGC,

Thank you very much for your kind reply, you are very knowledgeable!
I have a few friends who did some econometrics but they have forgotten everything

I am going through the article from wikipedia, and could already calculate a simple linear regression manuall in excel.
Just a small question: what do "z" and "i" stand for?

pgc01 · Sep 27, 2006

Hi again

I did a (dumb) example where you would try to explain the size of the jeans given the height (cm) and the weight (Kg) of a man.

In my last post X and Y are the explanatory variables, in this case height and weight. Z was the variable you were trying to explain, in this case the size of the jeans. "i" was the order of the triplet in the data sample. In the case of this example, for instance, using x, y and z of my last post for i=4

x_4 = 180
y_4 = 85
z_4 = 35

and using LINEST(A2:A11,B2:C11) we find

Z = -0.16 * X + 0.22 * Y + 45.2

In column D I calculated the values estimated and if this was a real example it would not be that bad (of course we haven't looked at the stats).

Hope this carifies some of the last post.
PGC

Book1

A

B

C

D

E

F

G

H

I

1

Size

Height

Weight

Estsize

2

32

180

70

31.8

Linest

3

33

180

75

4

34

180

80

34.0

5

35

180

85

35.1

6

36

180

90

36.2

7

31

185

70

31.0

8

32

185

75

32.1

9

33

185

80

33.2

10

34

185

85

34.3

11

36

185

90

35.4

12

13

Sheet5

tusharm · Sep 27, 2006

In addition to PGC's explanations, you may also want to see
Trendline Coefficients and Regression Analysis
http://www.tushar-mehta.com/excel/tips/trendline_coefficients.htm

duration17 · Sep 27, 2006

PGC,

I understand much better thanks to your explanations, thank you very much. Just one question: how do you get the -0.16, 0.22 and 45.2 values?

Tusharm,
Thank you for the website, I am reading and it is very well done!

pgc01 · Sep 27, 2006

Hi again

If you mean how would you do it by hand, without LINEST, it would require some work. In this case, however, since there are only 2 explanatory variables I can show you how, with the values in the example in my last post.

I had derived the first of the 3 equations, by calculating the partial derivative with respect to "a".

SUM(x_i^2) * a + SUM( x_i * y_i) * b + SUM(x_i) * c = SUM(x_i * z_i)

You have to do the same thing for the other 2 coefficients and you get the other 2 equations.

You have already seen them in the "Multiple linear regression" section in http://en.wikipedia.org/wiki/Multiple_regression

They are

SUM( x_i * y_i) * a + SUM(y_i^2) * b + SUM(y_i) * c = SUM(y_i * z_i)

SUM(x_i) * a + SUM( y_i) * b + n = SUM(z_i)

Now we have to calculate all this sums. I post the calculations. You can see that they are very simple (but boring).

Substituting the values:

333125 * a + 146000 * b + 1825 * c = 61310
146000 * a + 64500 * b + 800 * c = 26990
1825 * a + 800 * b + 10 * c = 336

Solving the system:
a = -0.16
b = 0.22
c = 45.2

As we already knew from LINEST.

I hope this clarifies the calculations.
PGC

Book1