Need advice on externally referencing a massive dataset (100gb)

Tamjamin

New Member
Joined
Jul 21, 2013
Messages
2
I'm in the finance industry, and I need some advice.

I have 100gigs of raw historical option price data in .csv format.

It's essentially 10 years worth of daily closing prices for every single option traded in North America.

When I got the data, each .csv file contained a single day's worth of historical prices (Roughly 12 columns by half a million rows in one worksheet, approx. 15mb).

I compiled these into ten seperate workbooks. One for each of the ten years of data. Each workbook(10gb) contained 250 worksheets, and each worksheet stored 12 columns & half a million rows.

I found that this became too computationally intensive, and my 32gb DDR3 i9 3770 computer takes much time even opening one of those files (Roughly 30 mins).

I am conducting statistical analysis on these prices, and wish to build a model that can simulate my models with virtual money. I plan to have one master analytic file which externally references the database I described above.
I want to build this master file such that I can change a variable or two in my model, click the "run" button/macro, and see the overall outcome of my strategy in virtual dollar terms.

This kind of backtesting/historical simulation is vital to my work, but I have hit a couple technical difficulties.

What is the best way to reference such a database? Should I use Microsoft access? Should I break down the files into (12 x 1 month) instead of (1 x 1 year)?

I just started learning how to use excel last year, and would appreciate your advice in this matter.


Ben
 

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
As Smitty wrote even Access might be overwhelmed. PowerPivot might do the job but it is available only with Excel 2010 (in Excel 2013 it is available only with certain SKUs.)

If I were in your shoes I'd check what others in your line of work use. I suspect they use software that is designed for statistical analysis, e.g., R.

I'm in the finance industry, and I need some advice.

I have 100gigs of raw historical option price data in .csv format.

It's essentially 10 years worth of daily closing prices for every single option traded in North America.

When I got the data, each .csv file contained a single day's worth of historical prices (Roughly 12 columns by half a million rows in one worksheet, approx. 15mb).

I compiled these into ten seperate workbooks. One for each of the ten years of data. Each workbook(10gb) contained 250 worksheets, and each worksheet stored 12 columns & half a million rows.

I found that this became too computationally intensive, and my 32gb DDR3 i9 3770 computer takes much time even opening one of those files (Roughly 30 mins).

I am conducting statistical analysis on these prices, and wish to build a model that can simulate my models with virtual money. I plan to have one master analytic file which externally references the database I described above.
I want to build this master file such that I can change a variable or two in my model, click the "run" button/macro, and see the overall outcome of my strategy in virtual dollar terms.

This kind of backtesting/historical simulation is vital to my work, but I have hit a couple technical difficulties.

What is the best way to reference such a database? Should I use Microsoft access? Should I break down the files into (12 x 1 month) instead of (1 x 1 year)?

I just started learning how to use excel last year, and would appreciate your advice in this matter.


Ben
 
Upvote 0
Thanks, I am in the process of getting powerpivot and will give it a shot. Appreciate the advice, I'll let ya know how it works.

Cheers!
 
Upvote 0

Forum statistics

Threads
1,214,812
Messages
6,121,702
Members
449,048
Latest member
81jamesacct

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top