Random Normal Distribution [=NORM.INV(RAND(),mean,standard deviation)]

Houman

New Member
Joined
Jun 16, 2011
Messages
13
Hi all,

I am trying to generate random daily (stock market) returns with a normal distribution from annual return and standard deviation figures. Let's say my annual return figure is 15% and my annual standard deviation figure is 30%. Assuming there are 250 trading days in the year, I am calculating my daily mean return as 15%/250 = 0.06% and my daily standard deviation as 30%/SQRT(250)=1.89737%. I then use this formula: =NORM.INV(RAND(),0.0006,0.0189737) to generate 10,000 random returns. If I then take the standard deviation of the generated returns, it is generally within 5% of the target standard deviation, however the mean of the generated numbers varies widely from under -100% to over +100% of the target mean return. Could you help me understand why the average of the generated numbers is not close to the inputted target mean?

For reference, below are the results of 10 different iterations of the 10,000 numbers generated using the above formula (target μ=0.06%, target SD=1.89737%):

Iteration 1Iteration 2Iteration 3Iteration 4Iteration 5Iteration 6Iteration 7Iteration 8Iteration 9Iteration 10
Iteration μ0.12%0.08%-0.01%-0.07%0.14%-0.02%0.04%0.06%0.07%0.11%
Iteration SD1.84%1.88%1.81%1.84%1.81%1.94%1.89%1.94%1.93%1.86%
μ Delta % vs. Target105.75%34.50%-112.22%-213.25%135.24%-137.83%-38.38%3.73%16.73%79.64%
SD Delta % vs. Target-3.14%-0.98%-4.75%-3.06%-4.80%1.98%-0.60%1.99%1.80%-1.95%

<tbody>
</tbody>


Many thanks,
Houman
 
Last edited:

Excel Facts

Save Often
If you start asking yourself if now is a good time to save your Excel workbook, the answer is Yes
I am trying to generate random daily (stock market) returns with a normal distribution from annual return and standard deviation figures. Let's say my annual return figure is 15% and my annual standard deviation figure is 30%. Assuming there are 250 trading days in the year, I am calculating my daily mean return as 15%/250 = 0.06% and my daily standard deviation as 30%/SQRT(250)=1.89737%. I then use this formula: =NORM.INV(RAND(),0.0006,0.0189737) to generate 10,000 random returns. If I then take the standard deviation of the generated returns, it is generally within 5% of the target standard deviation, however the mean of the generated numbers varies widely from under -100% to over +100% of the target mean return. Could you help me understand why the average of the generated numbers is not close to the inputted target mean?

I see nothing wrong with your results, generally. For reference, download "houman norm dist.xls" (click here) [1].

Nitpick: I would use 252 trade days instead of 250. Not only is 252 the correct average, but also it is divisible by 12.

As shown below, given the confidence interval (F10), we expect the sample mean to be within about +/-62% of the target mean 95% of the time. And the sample mean will exceed those limits 5% of the time.

In my own test of 100 trial samples (B2:B10001), the ("orig") sample mean was about -80% to 75% of the target mean. The snapshot below is one such sample. Nevertheless, the ("orig") sample distribution is "almost" normal, as shown by the chart below.

The 100 trials are generated by executing the genTrials macro.

In the uploaded file, the formulas in C2:C10001 demonstrate how to convert the original sample into a sample with exactly the target mean and sd, if you wish. But that might not fit your purpose.


houman_norm_dist1.jpg

Code:
Formulas:
B2:  =NORMINV(A2,$F$1,$F$2)
F1:  =15%/250
F2:  =30%/SQRT(250)
F3:  =AVERAGE(B2:B10001)
F4:  =STDEVP(B2:B10001)
F8:  =F2/SQRT(F7-1)
F10: =TINV(1-F9,F7-1)*F8
G3:  =F3/F1-1
G4:  =F4/F2-1
G10: =F10/F1
H3:  =F3-F1
A2:A10001 are generated by executing the genRand macro (press alt+F8, click on Run).
Copy B2 down through B10001.

houman_norm_dist2.jpg




-----
[1] https://app.box.com/s/pgkut63i4894ck92nwwq0gnvwv0ww39m
 
Last edited:
Upvote 0
Thanks very much for your helpful reply joeu2004 (including the suggestion to use 252 trading days). I had completely overlooked the expected variability around the mean. With the benefit of your explanation I managed to use the below solution (which I am including in case anyone else finds themself in my situation), confident that there is nothing wrong with the random numbers I generated.

My aim was to generate daily returns for various asset classes given annual return and standard deviation figures. Because this is in the context of investments, where risk and return are (or should be) closely related, it was important that my final return was closely aligned to the target parameters, that is I couldn’t have an asset with higher risk have a lower return than an asset with lower risk, however my initial approach often led to this very situation.

As mentioned above, the SD figure was always pretty close to the target, but the mean return varied widely. However my focus should not have been on the mean return, but rather on the final return that, when annualized, would be close to my target input.

I started over this time generating 2,520 random returns (see note below) to represent ten years worth of data. I then calculated the compound return that a $1 investment would have earned had it been invested in this asset class for the whole 10 years [(1*(1+ Day 1 Return)*(1+ Day 2 Return) and son on]. Let’s say the results of this calculation was 2.158925, which is the dollar return over the 10 years, including the original investment. To get the annual return I did 2.158925^(1/10)-1 which gave me 8% annual return, which is far away from the 15% I was hoping for. However with all these calculations already in Excel cells and the RAND() function in place, every time I pressed F9 (recalculate) all the random numbers were recalculated, and within a few tries, I had a data series with a an annual return of 15.1% and an SD of 29.8% which worked perfectly fine for me. I copy paste valued the data series as asset class A, and moved on to my next asset class, this time with an expected annual return of 10% and SD of 15%. Once I had gone down to asset class E, I started testing various optimization strategies on my freshly generated random numbers.

I hope this is helpful to others and thanks again to joeu2004 for the kind guidance.

All the best,
Houman

Note: For the SD I continued to use 0.0189737, however for the mean I decided to use 1.15^(1/252)-1 = 0.000555, which is the daily return that, if compounded for 252 days, would give me the 15% annual return I was looking for.
 
Upvote 0
I started over this time generating 2,520 random returns [...] to represent ten years worth of data. I then calculated the compound return that a $1 investment would have earned had it been invested in this asset class for the whole 10 years [(1*(1+ Day 1 Return)*(1+ Day 2 Return) and son on]. Let’s say the results of this calculation was 2.158925, which is the dollar return over the 10 years, including the original investment. To get the annual return I did 2.158925^(1/10)-1 which gave me 8% annual return, which is far away from the 15% I was hoping for. However with all these calculations already in Excel cells and the RAND() function in place, every time I pressed F9 (recalculate) all the random numbers were recalculated, and within a few tries, I had a data series with a an annual return of 15.1% and an SD of 29.8%

I think there is a more reliable and direct way to ensure that random daily returns have an annual geometric mean of 15% and a geometric sd of 30%.

There are many details to explain, if you are interested. But since I do not know that you are still monitoring this discussion, initially I will just provide an outline of the approach. Feel free to ask for details.

Note: I return to your original objective of generating 40 years of normally-distributed of daily returns. Thus, using 252 trade days per year, there are 10,080 daily returns.

First, we generate random annual returns that are normally-distributed with a geometric mean of exactly 15% and a geometric sd of exactly 30%.

Second, for each year, we generate daily returns that are normally-distributed with a geometric mean that compounds to the annual return for that year.

Download the file "houman norm dist2.xls" (click here) [1].

Since the two-step process might be tedious to repeat for several assets or asset classes, we might implement the details in a macro. But for now, I use Excel formulas to demonstrate the concept.

For the first step, we might have the following design in the 'annual' worksheet:

houman_annual_distrib.jpg

Rich (BB code):
Formulas:
A2:  =RAND()
B2:  =NORMINV(A2,$G$2,$G$3)
C2:  =$G$5*STANDARDIZE(B2,$G$6,$G$7) + $G$4
E2:  =EXP(C2)-1
G4:  =LN(1+G2)
G5:  =LN(1+G3)
G6:  =AVERAGE(B2:B41)
G7:  =STDEVP(B2:B41)
G8:  { =PRODUCT(1+E2:E41) - 1 }
G9:  =(1+G8)^(1/40)-1
G10: { =EXP(STDEVP(LN(1+E2:E41))) - 1 }
H9:  =G9-G2-0
H10: =G10-G3-0
Copy A2:C2 down through A41:C41
Copy E2 down through E41

"-0" in the formulas in H9:H10 thwart a dubious heuristic implemented in Excel whereby Excel
sometimes replaces actual arithmetic results with exactly zero if the result is otherwise
"close enough" to zero.

Although I show =RAND() in column A, I actually prefer to generate the constant RAND results in a macro so they do not change unintentionally.

Note that the formulas in G8 and G10 are array-entered. Do not type the curly braces. Excel will display them when we press ctrl+shift+Enter instead of just Enter.

The normal distribution is a distribution of data around a specified arithmetic mean with a specified arithmetic sd. The (natural) log of the geometric mean and sd of a compounded series of returns over time is the arithmetic mean and sd of the (natural) log of the returns.

So in G4 and G5, we calculate the "log mean" and "log sd" of the target geometric mean and sd (G2 and G3). And in column B, we generate a random sample from the normal distribution of log returns based on G4 and G5.

Since that is a random sample, its mean and sd usually does not exactly match the target log mean and log sd.

So we calculate the original log mean and log sd in G6 and G7. And in column C, we map the original random sample into an exact random sample by the fact that the original z-score = (B2-$G$6)/$G$7 = (C2-$G$4)/$G$5 = the exact z-score.

Finally, in column E, we convert the exact random sample of log returns into the exact random sample of annual returns.

The formulas in G8:G10 demonstrate that the random sample in column E has a compounded total return (G8) with a geometric mean of 15% (G9) and geometric sd of 30% (G10).

The infinitesimal differences in H9:H10 demonstrate that G9:G10 truly are about the same as target parameters in G2:G3.

I suspect you actually want an arithmetic sd of 30%. But that would mean the sd of the log returns is 30%, not the sd of the annual returns. That is a detail that we can fix easily, if you wish.

For the second step, we might have the following design in the 'daily' worksheet:

houman_daily_distrib.jpg

Rich (BB code):
Formulas:
A2:  =RAND()
B2:  =NORMSINV(A2)
C2:  =1 + INT((ROWS($B$2:B2)-1)/252)
D2:  =INDEX($G$4:$AT$4,C2)
      * STANDARDIZE(B2,INDEX($G$7:$AT$7,C2),INDEX($G$8:$AT$8,C2))
      + INDEX($G$5:$AT$5,C2)
E2:  =EXP(D2)-1
G1:AT1: 1 through 40
G2:  =INDEX(annual!$E$2:$E$41,G1)
G3:  =(1+G2)^(1/252) - 1
G4:  =annual!$G$3 / SQRT(252)
G5:  =LN(1+G3)
G6:  =LN(1+G4)
G7:  =AVERAGE(OFFSET($B$2,252*(G1-1),0,252))
G8:  =STDEVP(OFFSET($B$2,252*(G1-1),0,252))
G9:  { =PRODUCT(1+OFFSET($E$2,252*(G1-1),0,252)) - 1 }
G10: { =PRODUCT(1+E2:E10081) - 1 }
G11: =(1+G10)^(1/40) - 1
G12: { =(EXP(STDEVP(LN(1+E2:E10081))) - 1) * SQRT(252) }
Copy A2:E2 down through A10081:E10081
Copy G2:G9 across through AT2:AT9

Again, although I show =RAND() in column A, I actually prefer to generate the constant RAND results in a macro so they do not change unintentionally.

And the formulas in G9, G10 and G12 are array-entered. Do not type the curly braces. Excel will display them when we press ctrl+shift+Enter instead of just Enter.

In G3, the target mean is the daily geometic mean of the target annual return in G2.

In G4, the target sd is intended to be the estimated daily geometric sd, based on the target annual geometric sd. I use your estimate, namely =annual!$G$3/SQRT(252). But I think the square-root-of-time rule applies to the arithmetic sd of the log returns. So the formula should be =EXP(LN(1+annual!$G$3)/SQRT(252)-1.

However, mathematically, we can actually use any daily geometric sd, notwithstanding the often-cited square-root-of-time rule.

In column B, we generate a trial random sample from the normal distribution. We do it differently than the method in the 'annual' worksheet. But the effect is equivalent.

G5:G8 are calculated and used in a manner similar to G4:G7 in the 'annual' worksheet, which is explained above.

And the exact distribution of log returns in column D and the actual daily returns in column E are calculcated in a manner similar to columns C and E in the 'annual' worksheet, which is also explained above.

The formulas in G9:AT9 demonstrate that each year's random sample of daily returns does indeed have a compounded total return that is the same as the random annual return.

And the formulas in G10:G11 demonstrate that the 40-year random sample of daily returns has a compounded total return with an annual geometric mean of 15%.

Note that the annual geometric sd of the compounded total return (G12) is not exactly 30%. I believe that is to be expected. In fact, if we choose a random target geometric sd for each year, G12 will be very different.

The annual geometric sd should be calculated from the random annual returns, not from the random daily returns.


-----
[1] https://app.box.com/s/pgkut63i4894ck92nwwq0gnvwv0ww39m
 
Upvote 0
Hi joeu2004,

Wow, thank you for the additional guidance. I'm definitely still interested. I'll take a few days to go over your post in detail and get back you.

All the best,
Houman
 
Upvote 0
Errata and more food for thought....
[Referring to the 'daily' worksheet....]
In G4, the target sd is intended to be the estimated daily geometric sd, based on the target annual geometric sd. I use your estimate, namely =annual!$G$3/SQRT(252). But I think the square-root-of-time rule applies to the arithmetic sd of the log returns. So the formula should be =EXP(LN(1+annual!$G$3)/SQRT(252)-1.

The latter formula should be:

=EXP(LN(1+annual!$G$3)/SQRT(252)) - 1

And with that, the array-entered formula in G12 would be:

=EXP(STDEVP(LN(1+E2:E10081)) * SQRT(252)) - 1

All sources that I find agree that the square-root-of-time rule applies (if at all) to the sd of log returns, not the returns themselves.

I suspect you actually want an arithmetic sd of 30%. But that would mean the sd of the log returns is 30%, not the sd of the annual returns

We can calculate an arithmetic sd of a distribution with a geometric mean. After all, the sd is just a mathematical formula.

But we cannot use the STDEVP function to do so, since that calculates an arithmetic mean internally. Instead, we must calculate:

=SQRT(SUMPRODUCT((data - mean)^2) / COUNT(data))

However, the mathematical sd might not have the properties we rely on (e.g. the square-root-of-time factor) if the data is not normally distributed around an arithmetic mean.

My aim was to generate daily returns for various asset classes given annual return and standard deviation figures. Because this is in the context of investments, where risk and return are (or should be) closely related, it was important that my final return was closely aligned to the target parameters, that is I couldn’t have an asset with higher risk have a lower return than an asset with lower risk

To that end, it is equally important to retain the correlation factor between assets or asset classes. For example, historically (albeit less so recently), stock prices and bond prices have been negatively correlated.

I cannot take the time now to go into any further detail and to suggest how you might (if we even can) incorporate correlation into your model. I just want to point out a very important factor that seems to be missing.
 
Upvote 0
Hi joeu2004,


I'm trying to download the updated spreadsheet you created (houman norm dist2.xls) but somehow the link takes me to the original spreadsheet (houman norm dist.xls). Could you please post a link to the new one again?


Thanks,
Houman
 
Upvote 0
I'm trying to download the updated spreadsheet you created (houman norm dist2.xls) but somehow the link takes me to the original spreadsheet (houman norm dist.xls). Could you please post a link to the new one again?

Sorry, my bad! For "houman norm dist2.xls", click here [1].

-----
[1] https://app.box.com/s/91ok8xmceo7p7jni890xdc76ss4uuhry
 
Upvote 0
Hi joeu2004,

Thanks very much for the additional post, it is great. I've learnt a lot both conceptually (e.g. geometric SD) and technically (e.g. the { =PRODUCT(1+range)-1 } formula).

Regarding the correlation, I was closing a blind eye to it because I couldn't wrap my head around how to model it in Excel. For an asset class to be validly defined, it has to have a low correlation with other asset classes. Given that the expected correlation of randomly generated sets of numbers is zero, I was willing to go with it. However I would very much like to be able to incorporate correlation into the model too, for example, say, a correlation of -0.3 between stocks and bonds. So I looked around the web and came across the following method: first generate two sequences of uncorrelated normal distributed random numbers (A1…An and B1…Bn) then define a new sequence such that C1 = CORR*A1 + SQRT(1-CORR^2)*B1, through Cn. The new C sequence will have a correlation of CORR with sequence A (I used equal probability random numbers and not normal distributed random numbers to be consistent with your numbers, not sure if this causes a problem). I tried it out, and it works, kind of (please download the updated spreadsheet here and see the newly created ‘correlated daily‘ tab. Column C is your random numbers, and column D is just RAND()). When I use it for the regular version of data (i.e. non-LN), the resulting correlation using the CORREL() function is close, but not equal to the target correlation (CORR) (unless it is set to 1 or -1). However by using Excel solver and setting the actual final correlation as the target cell (e.g. = -0.3) and the initial target correlation (CORR) as the variable cell (GRG), I can get the final correlation I was looking for. With the LN version, I can’t go all the way to -1 or 1 (min =-0.96 and max = 0.98) and at either extreme the scatter plot becomes an S shape (as opposed to the straight line at CORR = 1 or -1 for the non-LN). This tells me something is not right but this might be workable approach for the two sequence (asset class) scenario, the multiple asset class scenario could perhaps be subsequently tackled with the use of a Cholesky or Eigenvector decomposition.

May I ask you please for feedback on the following?

1. In the ‘annual’ tab, cell G12, are we doing { =EXP(STDEVP(LN(1+E2:E41))) - 1 } because by only doing { =STDEVP(LN(1+E2:E41)) } we would get the arithmetic SD of the geometric series and we are interested in the geometric SD?

2. In the ‘daily’ tab we use the =NORMSINV() function (another new thing I learned). You note that it produces equivalent results to the ‘annual’ tab method. Would it in fact be more efficient to use =NORMSINV(RAND()) to get the z-score directly and then mould it into the desired mean and SD through: Periodic Return = z*σ + μ?

3. Related to the previous point, if I calculate the mean and SD of the standardized numbers in column K of the ‘correlated daily’ tab, I get a mean of exactly zero and an SD of exactly one (using the =STANDARDIZE(x, mean , sd) . However in column G using the NORMSINV() function we get -0.0127327 and 1.0041497 for the mean and SD. What drives this difference?

4. A very minor point of clarification: should we be using the sample SD formula (STDEV) or the population SD formula (STDEVP)?

5. In the ‘correlated daily‘ tab, other than generating new random numbers and trying to correlate them, I’m using all the inputs from the ‘annual’ tab but my “annl g mean B” = 22.2% instead of 15.0%. Could you please point out where I have gone wrong?

6. Could you please expand more on square-root-of-time rule and any alternatives to it, why it is expected that the geometric SD of the compounded total return is not exactly 30%, and where we would use the =SQRT(SUMPRODUCT((data - mean)^2) / COUNT(data))?

Also thanks for providing the basic formulas for everything (I saw that you have also provided the random number generator macros). It is easier for me to follow a new concept through traditional formulas.

All the best,
Houman
 
Upvote 0
Hi,

The text label of some of the cells that referred to sequence C had mistakenly been marked as referring to sequence B. The corrected spreadsheet can be downloaded here.

Thanks,
Houman
 
Upvote 0

Forum statistics

Threads
1,214,520
Messages
6,120,003
Members
448,935
Latest member
ijat

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top