Random Normal Distribution [=NORM.INV(RAND(),mean,standard deviation)]

Houman

New Member
Joined
Jun 16, 2011
Messages
13
Hi all,

I am trying to generate random daily (stock market) returns with a normal distribution from annual return and standard deviation figures. Let's say my annual return figure is 15% and my annual standard deviation figure is 30%. Assuming there are 250 trading days in the year, I am calculating my daily mean return as 15%/250 = 0.06% and my daily standard deviation as 30%/SQRT(250)=1.89737%. I then use this formula: =NORM.INV(RAND(),0.0006,0.0189737) to generate 10,000 random returns. If I then take the standard deviation of the generated returns, it is generally within 5% of the target standard deviation, however the mean of the generated numbers varies widely from under -100% to over +100% of the target mean return. Could you help me understand why the average of the generated numbers is not close to the inputted target mean?

For reference, below are the results of 10 different iterations of the 10,000 numbers generated using the above formula (target μ=0.06%, target SD=1.89737%):

Iteration 1Iteration 2Iteration 3Iteration 4Iteration 5Iteration 6Iteration 7Iteration 8Iteration 9Iteration 10
Iteration μ0.12%0.08%-0.01%-0.07%0.14%-0.02%0.04%0.06%0.07%0.11%
Iteration SD1.84%1.88%1.81%1.84%1.81%1.94%1.89%1.94%1.93%1.86%
μ Delta % vs. Target105.75%34.50%-112.22%-213.25%135.24%-137.83%-38.38%3.73%16.73%79.64%
SD Delta % vs. Target-3.14%-0.98%-4.75%-3.06%-4.80%1.98%-0.60%1.99%1.80%-1.95%

<tbody>
</tbody>


Many thanks,
Houman
 
Last edited:
For reference, download "houman norm dist3.xls" (click here) from https://app.box.com/s/131sj90xd1ms66afsx4dffz9ihrz7pw5.

First, I must apologize for the following typos in the original "houman norm dist2.xls" file. They probably caused some confusion. My bad!

1. The formula in annual!B2 should be:
=NORMINV(A2,$G$4,$G$5)
Copy down the column.

Not very important, since any normal distribution will suffice for our purposes. But column B is misleading as-is.

2. The formula in daily!D2 should be:
=INDEX($G$6:$AT$6,C2)*STANDARDIZE(B2,INDEX($G$7:$AT$7,C2),
INDEX($G$8:$AT$8,C2))+INDEX($G$5:$AT$5,C2)
Copy down the column.

This is a critical mistake.

Regarding the correlation, [....] first generate two sequences of uncorrelated normal distributed random numbers (A1…An and B1…Bn) then define a new sequence such that C1 = CORR*A1 + SQRT(1-CORR^2)*B1, through Cn.

But columns A and B are not "normally distributed". Instead, they are uniformly distributed, effectively =RAND().

See the "annual 2" worksheet for an example of how to use the method you describe.

I have never had much success with that method. My notes indicate that it works best when the sd is the same for both uncorrelated normal distributions. Of course, that is not practical. However, I don't know an alternative.

the multiple asset class scenario could perhaps be subsequently tackled with the use of a Cholesky or Eigenvector decomposition.

Yes, I have read about that, too. I'd be interesting in seeing your work with that. I just do sequences of pairwise correlations. Probably not as good.

(And again, without much success in practice.)

1. In the ‘annual’ tab, cell G12, are we doing { =EXP(STDEVP(LN(1+E2:E41))) - 1 } because by only doing { =STDEVP(LN(1+E2:E41)) } we would get the arithmetic SD of the geometric series and we are interested in the geometric SD?

{ =STDEVP(LN(1+E2:E41)) } is the arithmetic SD of the __log_returns__. If that's all you want, that's all you need.

I assumed you wanted an SD to go with the geom mean. In that case, I would use the geom SD of the returns (not log returns), which is { =EXP(STDEVP(LN(1+E2:E41))) - 1 }.

2. In the ‘daily’ tab we use the =NORMSINV() function (another new thing I learned). You note that it produces equivalent results to the ‘annual’ tab method. Would it in fact be more efficient to use =NORMSINV(RAND()) to get the z-score directly and then mould it into the desired mean and SD through:
Periodic Return = z*s + µ?

I used NORMINV(RAND(),mean,sd) in the 'annual' worksheet because I think that is what most people expect, based on help by others in these forums and other web sites.

There is no reason not to use NORMSINV(RAND()) there as well. Theoretically, the internal implementation could be more efficient. But I suspect that any benefit is infinitesimal at best, and probably imperceptible. In fact, for NOMRSINV(RAND()), Excel might simply call the internal code that would evaluate NORMINV(RAND(),0,1).

I think the more significant benefit from reducing the number of formula dependencies.

3. Related to the previous point, if I calculate the mean and SD of the standardized numbers in column K of the ‘correlated daily’ tab, I get a mean of exactly zero and an SD of exactly one (using the =STANDARDIZE(x, mean , sd). However in column G using the NORMSINV() function we get -0.0127327 and 1.0041497 for the mean and SD. What drives this difference?

Remember that NORMSINV(RAND()) is a random sample from the standard normal distribution. It is unlikely that the mean and sd of a random sample is exactly the mean and sd of the population, 0 and 1 in this case.

With =STANDARD(x,mean,sd), which is =1*STANDARD(x,mean,sd)+0, we are mapping the imperfect random sample to a distribution with exactly mean 0 and sd 1. As I wrote in response #4, we rely on the equation:

origZ = (origX-origMean)/origSD = (exactX-targMean)/targSD = exactZ

Solving for exactX = targSD*origZ + targMean

4. A very minor point of clarification: should we be using the sample SD formula (STDEV) or the population SD formula (STDEVP)?

Knowledgeable people might differ on this point. It should not matter much as long as we are consistent.

I use STDEV when we want to estimate the population SD from a sample SD. That is not the case here. Instead, we want to calculate the z-score of a data point; and that relies on the actual SD of the sample distribution, which is STDEVP.

I think it is unfortunate that we call these "sample SD" and "population SD". The latter is just a mathematical measure of dispersion, namely: Sqrt(Sigma((x-mean)^2, i=1,...,n)/n).

5. In the ‘correlated daily‘ tab, other than generating new random numbers and trying to correlate them, I’m using all the inputs from the ‘annual’ tab but my “annl g mean B” = 22.2% instead of 15.0%. Could you please point out where I have gone wrong?

Perhaps because column C isn't a normal distribution in the first place. See the response to Question #1 above.

6. Could you please expand more on square-root-of-time rule and any alternatives to it

The square-root-of-time principle of "volatility" (sd) is a staple in financial math. You might start your reading with http://www.macroption.com/why-is-volatility-proportional-to-square-root-of-time.

Notwithstanding attempts to rationalize it in practical terms, it stems from probability theory that I learned 20 years and long-since forgot. But rather than being an algebraic relationship as it is often described, I suspect it is a limit that "very large" normal distributions "tend toward".

It is easy to discredit. Consider the daily average S&P500 index for 40 years from 1975 through 2014. The sd of the annual returns (or log returns) is about 16.6%, and the actual daily sd is about 0.87%. But based on the square-root-of-time rule, the estimated daily sd is about 16.6%/SQRT(252) = 1.04%, a 20% error.

Moreover, a "mental experiment" can discredit it. The annual sd can be zero (no variance), but the daily sd might be huge. It's the "strobe effect": the daily prices can bounce around wildly, while the year-over-year prices remain consistent because the next year's daily prices bound around with the same wild pattern.

Of course, that is unlikely stochastically. But that's the point: I believe there is a probability associated with the square-root-of-time rule.

We can demonstrate that further by changing daily!G4:AT4 in the new "dist3" Excel file. Instead of ='annual 1'!$G$3/SQRT(252), use =RANDBETWEEN(5,20)/1000 to generate random daily "target g sd" beween 0.5% and 2%. But the annual log returns remain the same by construction; so the sd of the annual log returns are unchanged. QED: there is no direct relationship between the annual sd and daily sd.

why it is expected that the geometric SD of the compounded total return is not exactly 30%

It goes back to the fact that square-root-of-time rule is only an estimate. Again, there is no direct relationship between the daily sd and the annual sd.

where we would use the =SQRT(SUMPRODUCT((data - mean)^2) / COUNT(data))?

... instead of STDEVP. Answer: when "mean" is not the arith mean of the "data".

Perhaps the question is: __why__ would we use the SUMPRODUCT formula instead of STDEVP?

The mathematical formula is simply a measure of dispersion. If "mean" is the geom mean, we __can__ measure the arith sd of the data around the geom mean. I don't recommend it; it is not useful, IMHO. But someone might want to.

Again, the point was simply: STDEVP calculates an arith mean internally; it does not use the geom mean.
 
Last edited:
Upvote 0

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
Clarification....
3. [...] if I calculate the mean and SD of the standardized numbers in column K of the ‘correlated daily’ tab, I get a mean of exactly zero and an SD of exactly one (using the =STANDARDIZE(x, mean , sd) . However in column G using the NORMSINV() function we get -0.0127327 and 1.0041497 for the mean and SD. What drives this difference?

Remember that NORMSINV(RAND()) is a random sample from the standard normal distribution. It is unlikely that the mean and sd of a random sample is exactly [...] 0 and 1 [...].

With =STANDARD(x,mean,sd), which is =1*STANDARD(x,mean,sd)+0, we are mapping the imperfect random sample to a distribution with exactly mean 0 and sd 1. As I wrote in response #4, we rely on the equation:

origZ = (origX - origMean) / origSD = (exactX - targMean) / targSD = exactZ

Solving for exactX = targSD*origZ + targMean

It might have been clearer if I had not algebraically simplified the last equation, and wrote instead:

exactX = targSD*(origX - origMean) / origSD + targMean

The point is, origZ is not the same as NORMSINV(RAND()); origX is. Both are z-scores. They differ because origMean and origSD are not exactly 0 and 1, again since origX is a random sample.
 
Upvote 0

Forum statistics

Threads
1,215,184
Messages
6,123,533
Members
449,106
Latest member
techog

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top