20 million cell arrays - regression

rwheeler

Board Regular
Joined
Apr 22, 2010
Messages
105
Hello,
I have a sheet with 3 sets of 20 million cells. Each set has 1 million cells in each column(Total 60 columns). I want to do a multi-variate linear regression. I would usually use data analysis for this but I don't know if I can use multi-dimensional arays or if it can even handle that amount of data.

Any ideas?
 

Excel Facts

What did Pito Salas invent?
Pito Salas, working for Lotus, popularized what would become to be pivot tables. It was released as Lotus Improv in 1989.
Hello,
I have a sheet with 3 sets of 20 million cells. Each set has 1 million cells in each column(Total 60 columns). I want to do a multi-variate linear regression. I would usually use data analysis for this but I don't know if I can use multi-dimensional arays or if it can even handle that amount of data.

Any ideas?
It shouldn't be any problem doing regression with data this size.

Assuming your computer has adequate specs, which probably needn't be that high, particularly if you've got reasonably well-conditioned data.

If you provide a bit more specifics, and indicate what information you want outputted, and indicate if you need any specific help, then this could be forthcoming.
 
Upvote 0
tweedle - I have 3 sets of 20 column. Each column has 1 million rows. Each set is a variable.

Ideally I would have 3 columns with 20 million cells so that I could just run a linest() function on the y vs x1 and x2.

The issue is that the dependant variable has 20 columns that, when combined, make up the entire array. Can you think of a way to combine these columns in an array so that I can run the regression?
 
Upvote 0
tweedle - I have 3 sets of 20 column. Each column has 1 million rows. Each set is a variable.

Ideally I would have 3 columns with 20 million cells so that I could just run a linest() function on the y vs x1 and x2.

The issue is that the dependant variable has 20 columns that, when combined, make up the entire array. Can you think of a way to combine these columns in an array so that I can run the regression?
Could you be a bit more specific on your data layout?

Do you mean 20 columns each of 1million elements stacked on top of each other?

An Excel worksheet obviously can't do this, but it's readily enough done in the computer memory.

Maybe if you post a simple example of say 10 rows and a few columns to show just what your data setup looks like ...
 
Upvote 0
Each variable has 20 columns of 1million elements in each row (20 million elements for each variable in total). There are 3 variables (one dependent and 2 independant). If excel allowed, I would just make one column for each varialbe that had 20 million rows, but as you stated, that is not possible. Becuause the LINEST function uses arrrays, i need to find a way to combine the 20 columns of each variable programically. Does this make sense?
 
Upvote 0
Each variable has 20 columns of 1million elements in each row (20 million elements for each variable in total). There are 3 variables (one dependent and 2 independant). If excel allowed, I would just make one column for each varialbe that had 20 million rows, but as you stated, that is not possible. Becuause the LINEST function uses arrrays, i need to find a way to combine the 20 columns of each variable programically. Does this make sense?
I think that the LINEST function can't be directly used this way.

If you look at the properties/uses of this function from Microsoft help
The range of known_x's can include one or more sets of variables. If only one variable is used, known_y's and known_x's can be ranges of any shape, as long as they have equal dimensions. If more than one variable is used, known_y's must be a vector (that is, a range with a height of one row or a width of one column).

It could probably be done by using a stepwise regression, getting LINEST to do the donkey work, but this is getting a bit far away from the otherwise convenience of the prebuilt Excel functions. In such case it would be better to go for one of the many prebuilt regression programs that are available elsewhere.

I also doubt that the regression in the data analysis addin of Excel would do this, since like the direct LINEST it seems to require a single column vector of data for each of the variables used.

It's not that hard to write some VBA code to do a regression on the data setup you describe. But while getting the coefficients is easy enough, the other regression statistics take successively longer to write code for, and there seems little point anyone doing this where there are so many other dedicated regression programs readily available, of which some at least should be capable of handling the data setup you describe.

Using Excel outside of its comfortable limits is, ... well, just not that comfortable.

I hope these comments are of some help to you.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,224,599
Messages
6,179,831
Members
452,947
Latest member
Gerry_F

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top