# Creating dummy variables in excel for regression

#### mstunji

##### New Member
I am desperate. Im trying to meet a deadline today and need your help. I am trying to do a multiple regression in Microsoft excel is : Median home value for owner occupied units = median age structure was built per tenure (x1) + State (GA) (x2) + Median household income (x3) + median number of rooms per tenure (x4). i am examining variables that seem to be related to median housing values in 2 jurisdictions and whether those relationships differ between the 2 jurisdictions.

I will be using the same x and y variables for each jurisdiction, but of course each jurisdictions will have their individual observations for each variable. However, the challenge for me is this: I need to differentiate the 2 jurisdictions in each observation by creating a dummy variable to represent each jurisdiction (Georgia and Virginia). GA have 158 counties as observations and VA has 135 counties as observations. I know i have to code the dummy variables, but Im not sure how to do it. I tried using the IF function but I dont know if i wrote the right formula. I collected separate data for each jurisdiction, and each jurisdiction has their own counties (as observations). I also don't know to input the dummy variables into the Regression Data analysis tool.

I attempted to assign a code by using the IF function. This is what I did, =IF(A4=1,1,0) meaning A4 is the cell that has “Georgia” as the state name, where if its true its 1 and if its false its 0. Was i supposed to create a function for each county (as they are the observations)? Someone please help me asap!

### Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"

#### mstunji

##### New Member
y x1 x2 x3 x4
Median Year Built by tenure Median
STATE COUNTY/CITY Median Housing Value Georgia (Owner occupd) number of room by tenure MED_HSHLD INC
Georgia Appling County, Georgia 51,300 0 1979 5.7 30266
Georgia Atkinson County, Georgia 32,500 0 1981 5.5 26470
Georgia Bacon County, Georgia 46,400 0 1976 5.7 26910
Georgia Baker County, Georgia 50,200 0 1978 5.5 30338
Georgia Baldwin County, Georgia 70,400 0 1980 5.8 35159
Georgia Banks County, Georgia 81,800 0 1985 5.7 38523
Georgia Barrow County, Georgia 98,700 0 1989 5.9 45019
Georgia Bartow County, Georgia 92,700 0 1986 5.9 43660
Georgia Ben Hill County, Georgia 51,700 0 1975 5.8 27100
Georgia Berrien County, Georgia 53,900 0 1980 5.7 30044
Georgia Bibb County, Georgia 82,700 0 1970 6.3 34532
Georgia Bleckley County, Georgia 60,600 0 1977 5.9 33448
Georgia Brantley County, Georgia 39,400 0 1984 5.4 30361
Georgia Brooks County, Georgia 55,400 0 1980 5.8 26911
Georgia Bryan County, Georgia 94,900 0 1989 6 48345
Georgia Bulloch County, Georgia 80,300 0 1983 6 29499
Georgia Burke County, Georgia 51,800 0 1982 5.6 27,877
Georgia Butts County, Georgia 86,100 0 1985 5.7 39879

##### Well-known Member
Welcome to the board!

Try the formula =IF(A4="Georgia",1,0)

P.S. Excel would not be my first choice for multiple regression, especially if you have any missing values. You may want to take a look here before doing any analysis:

http://www.practicalstats.com/xlsstats/excelstats.html

#### mstunji

##### New Member
I appreciate your suggestions. I will definitely consider using another software in the future. however right now this is the software i was instructed to use. i tried your IF function and below are the results: is the variable one results supposed to look like that? and why is the intercept so large, given my median home values were only six digits?

my y intercept is very large, seven digits, even though i was using six digit home values, and my x variable, which is the dummy variable, the value are all 0's and #NUM!. is that to be expected? I was trying to paste the picture but i cant.

##### Well-known Member
As an example using the data you posted, I changed the state from Georgia to Virginia for the last 9 rows of data so I could test it out. Then using the regression option in the Analysis Toolpak Add-in (if you haven't loaded this add-in, go to Tools -> Add-ins (or if using Excel 2007, click on the Office button and click the Excel Options button at the bottom and then click on the Add-ins button, then the Go button and check the boxes for Analysis Toolpak Add-in)

Then on the Tools menu you should now have a Data Analysis option (in 2007 it should appear on the Data ribbon). Select Regression from the Data Analysis window, select the Median House Price as the Y and State, YearBuilt, Tenure, Income as the X's.

I got a Adjusted R Square value of 0.889132 and the following coefficients:

Coefficients Standard Error
Intercept -2599583.242 1219752.51
State 1822.329388 3493.596202
Year 1165.826381 597.6539729
Tenure 53477.01658 12101.37395
Income 1.370802988 0.473004399

which I double checked using JMP software (a SAS product). The "large" value for the intercept is due (I think) to differing scales of the X's.

Last edited:

#### mstunji

##### New Member
i got similar results, however my confusion is now answering the following question: How is the effect of the x variables on home values differ in each state when variables for both states are included in the regression equation? or will it make more sense to include only 1 dummy variable per regression equation so that I can compare both states. when I did that the results for state were the following

coefficient standard error tstat p-value
0 0 65535 #NUM!

Is that to be expected? Or do I "have to" include 2 dummy variables (1 (VA) and 0 (GA)) in one regression equation for the results to make sense? Or does it make sense?

I hope you can help becuase this is definitely my last confusion. Thank you in advance!

Replies
0
Views
201
Replies
0
Views
2K
Replies
1
Views
570
Replies
1
Views
718
Replies
4
Views
353

Excel contains over 450 functions, with more added every year. That’s a huge number, so where should you start? Right here with this bundle.

### Forum statistics

1,151,978
Messages
5,767,422
Members
425,411
Latest member
sarmientojayexce03

### We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.

### Which adblocker are you using?

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

### Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

### Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back