Generating a skewed normal distribution of random numbers

Graham.Johnson

New Member
Joined
Jun 29, 2010
Messages
3
Hi,

I need to randomly generate a set of 5000 numbers using just the following three inputs:
>Min
>Mean
>Max
They have to be normally distributed, but as the mean is never exactly half way between the min and max, the distribution will be skewed.

I am using =(NORMSINV(RAND())*0.13)+0.5 to give me 5000 random numbers that are normally distributed with a mean of 0.5 and (almost always) have a min of 0 and a max of 1.

I am then mapping these to the distribution that I require using
If random number < 0.5, f(x)=min+2*rand*(mean-min)
If random number = 0.5, f(x)=mean
If random number > 0.5, f(x)=mean+2*(rand-0.5)*(max-mean)

As expected this is giving me numbers within the predefined bounds, however the mean is way off what I was expecting.

For example, a min of 50,000, mean of 225,000 and a max of 1,000,000 give the following output:
>Min ~ 50,000-60,000 (this is close enough)
>Mean ~ 288,000-290,000 (this is no where near)
>Max ~ 950,000-1,000,000 (this is close enough)

I am not sure if there is a problem with my mapping formula, if it is to do with how I am generating the random numbers in the first place, or something else, but I am now stuck.

If anyone has any experience of this, or can offer any advice, then I would be most grateful!

Many thanks in advance,

Graham
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
Update:
So, I've worked out that because of the skew of the distribution, the "mean" that I have been specifying is actually the mode. When I grouped the random numbers generated in the example and grouped them, the mode is 225,000 or there abouts.

So, all I now need to do, I think(!), is generate a formula that will take my desired mean, and convert it into a mode that when plugged in to the mapping that I stated, give me a set of data with my "desired mean" as the mean...
 
Upvote 0
Generating numbers randomly based on the normal distribution is a common question.<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:eek:ffice:eek:ffice" /><o:p></o:p>
<o:p></o:p>
Random Number based on Normal Distribution
http://www.mrexcel.com/forum/showthread.php?t=465705
Also, this has info on:
Named Formulas for chart ranges
Conditional Formatting Formulas

This has also come up with the lognormal distribution.
Re: Lognormal Distribution being applied to IQR<o:p></o:p>
http://www.mrexcel.com/forum/showthread.php?t=473530<o:p></o:p>
<o:p></o:p>
I am not familiar with your method of making a skewed normal distribution by 'mapping' the numbers from the normal to the skewed normal.<o:p></o:p>
<o:p></o:p>
Here is how I would generate numbers randomly based on the skewed normal distribution.<o:p></o:p>
<o:p></o:p>
Use the Probability Distribution Function (PDF) for the skewed normal distribution, with the layout as shown below and in the above reference.<o:p></o:p>
<o:p></o:p>
=NORMDIST(D5,B$2,B$1,0)*NORMDIST(D5*skew,B$2,B$1,1)*2<o:p></o:p>
<o:p></o:p>
The skew can range from plus or minus infinity and if skew = 0 then it is the plain normal distribution, as shown below for STDin = 4 and AVGin = 0, and 'Select' 100 at 20 points, or bins.<o:p></o:p>
<o:p></o:p>
Input normal PDF<o:p></o:p>
skew = 0<o:p></o:p>


Now change skew from 0 to 3:

Excel Workbook
ABCDEF
1STDin4Start-5STDout2.3928564
2AVGin0End15AVGout3.0526316
3Select100# Bins20NormSum0.9499057
4NormDistNormalizeCumulativeBinCountDraw
58.075E-068.5E-060-501.3157895
60.00018820.00019828.5E-06-3.94736805.5263158
70.00229720.00241830.0002067-2.89473702.3684211
80.01498910.01577960.002625-1.84210501.3157895
90.05416620.05702270.0184045-0.78947458.6842105
100.11509090.12116040.07542730.2631579113.4210526
110.15837980.16673220.19658771.3157895170.2631579
120.16106380.16955760.36331982.3684211237.6315789
130.13765790.14491750.53287753.4210526160.2631579
140.10668070.11230660.67779494.473684283.4210526
150.07680520.08085560.79010155.526315856.5789474
160.05157770.05429770.87095726.578947482.3684211
170.03231850.03402280.92525497.631578934.4736842
180.01889570.01989220.95927778.684210531.3157895
190.01030860.01085220.97916999.736842111.3157895
200.00524760.00552430.990022110.78947404.4736842
210.00249260.0026240.995546411.84210501.3157895
220.00110470.0011630.998170512.89473701.3157895
230.00045690.0004810.999333413.94736802.3684211
240.00017630.00018560.99981441504.4736842
250.2631579
262.3684211
271.3157895
284.4736842
292.3684211
30-0.789474
314.4736842
322.3684211
332.3684211
341.3157895
353.4210526
362.3684211
373.4210526
380.2631579
392.3684211
402.3684211
413.4210526
422.3684211
435.5263158
444.4736842
...

The Cell Formula in Cell F5 is copied down through F104.
Cell Formulas
RangeFormula
A5=NORMDIST(D5,B$2,B$1,0)*NORMDIST(D5*3,B$2,B$1,1)*2
B5=A5/F$3
C5=SUM(B4:B$5)*(ROW()>5)
D5=D$1+(ROW()-5)*(D$2-D$1)/(D$3-1)
E5=COUNTIF(F$5:INDEX(F:F,B$3+4),D5)
F5=VLOOKUP(RAND(),C$5:INDEX(D:D,D$3+4),2)

Cell Formulas
RangeFormula
F1=STDEVP(F5:INDEX(F:F,B3+4))
F2=AVERAGE(F5:INDEX(F:F,B3+4))
F3=SUM(A5:INDEX(A:A,D3+4))

Input skew normal PDF<o:p></o:p>
skew = 3


Here is a set of output numbers with 'Select' = 100


Here is a set of output numbers with 'Select' = 5000


Keep in mind that if you generate a set of numbers at random, the mean of the output will vary with each new set of numbers.
But with a large set of numbers the mean may not change much.


I found that I got strange results, like a cut-off skewed distribution, if AVGin<> 0.

It seems to me that you should be able to create a distribution that you want, and map it by shifting it horizontally to give the desired average.

References for skew normal distribution:
Wikipedia
And also:
http://azzalini.stat.unipd.it/SN/

 
Upvote 0
A standard normal distribution goes, *by definition,* from -infinity to +infinity. Multiplying by 0.13 changes the variability from 1 to 0.13 and adding 0.5 shifts the mean from 0 to 1 but you still get values between +/- infinity!

Using your mapping function, 1/2 of the times (when your normal random number < 0.5) your formula min+2*rand*(mean-min) will generate a *uniformly* distributed random number between min and 2*mean-min. Similarly, you will generate a different random number that too will be uniformly distributed when your first normal random variable is > 0.5

When you generate random numbers from a specified distribution, the distribution represents the population and the resulting numbers represent a sample. The sample statistics (like mean, variance, etc.) are *representative* of the population statistics and the two do not necessarily match. In fact, if they were to match, I would be suspicious of how the sample was created.

There are ways to generate a normally distributed sample with a specified min. and max. It's called a truncated normal. You can search google or bing for more on how to generate random variables from a truncated normal distribution.
Hi,

I need to randomly generate a set of 5000 numbers using just the following three inputs:
>Min
>Mean
>Max
They have to be normally distributed, but as the mean is never exactly half way between the min and max, the distribution will be skewed.

I am using =(NORMSINV(RAND())*0.13)+0.5 to give me 5000 random numbers that are normally distributed with a mean of 0.5 and (almost always) have a min of 0 and a max of 1.

I am then mapping these to the distribution that I require using
If random number < 0.5, f(x)=min+2*rand*(mean-min)
If random number = 0.5, f(x)=mean
If random number > 0.5, f(x)=mean+2*(rand-0.5)*(max-mean)

As expected this is giving me numbers within the predefined bounds, however the mean is way off what I was expecting.

For example, a min of 50,000, mean of 225,000 and a max of 1,000,000 give the following output:
>Min ~ 50,000-60,000 (this is close enough)
>Mean ~ 288,000-290,000 (this is no where near)
>Max ~ 950,000-1,000,000 (this is close enough)

I am not sure if there is a problem with my mapping formula, if it is to do with how I am generating the random numbers in the first place, or something else, but I am now stuck.

If anyone has any experience of this, or can offer any advice, then I would be most grateful!

Many thanks in advance,

Graham
 
Upvote 0
I tried out the following, and the results are good. If you have larger numbers the mean is usually just a few percentage points off.

$B$1 = max value
$B$2 = min value
$B$3 = desired mean

=$B$2+($B$1-$B$2)*POWER(MIN(MAX(-3,STANDNORMINV(RAND())),3)/6+0.5,POWER(($B$1-B$3)/(B$3-$B$2),0.7))

How this works:
- In the middle is the Random Number from excel, evenly distributed
- From this I get the inverse of the standard normal distribution (result from minus to plus infinity)
- this I then reduce to an interval from -3 to +3 by using Excel MIN/MAX
- then I divide by six to get to an interval from -0.5 to +0.5
- (Result #1) then I add 0.5 to get to an interval from 0 to 1
- for the skew I use the second POWER term in where I divide the upper interval from my mean to my max by the lower interval from my min to my mean; the result is the relation between these two intervals
- I tried this out for the mean being somewhere between 25% and 75% of the interval between my max and my min
- (Result #2) the relation is then my base number; from trial-and-error I use 0.7 as the exponent; that brought me a small difference between the desired mean and the mean from the distribution
- (Result #3) finally I use Result #1 as base and Result #2 as exponent in another POWER statement
- this Result #3 I then use to calculate a part of the difference between my max and my min and add this part to my min

Here is how the distribution looks like:
https://docs.google.com/open?id=0Bwe0SO2ta65DOXlHMkFuakM5Rjg

And here are the results if I draw 10000 random numbers.
https://docs.google.com/open?id=0Bwe0SO2ta65DTE5rdlhIZ0R4MWc


Hi,

I need to randomly generate a set of 5000 numbers using just the following three inputs:
>Min
>Mean
>Max
They have to be normally distributed, but as the mean is never exactly half way between the min and max, the distribution will be skewed.

I am using =(NORMSINV(RAND())*0.13)+0.5 to give me 5000 random numbers that are normally distributed with a mean of 0.5 and (almost always) have a min of 0 and a max of 1.

I am then mapping these to the distribution that I require using
If random number < 0.5, f(x)=min+2*rand*(mean-min)
If random number = 0.5, f(x)=mean
If random number > 0.5, f(x)=mean+2*(rand-0.5)*(max-mean)

As expected this is giving me numbers within the predefined bounds, however the mean is way off what I was expecting.

For example, a min of 50,000, mean of 225,000 and a max of 1,000,000 give the following output:
>Min ~ 50,000-60,000 (this is close enough)
>Mean ~ 288,000-290,000 (this is no where near)
>Max ~ 950,000-1,000,000 (this is close enough)

I am not sure if there is a problem with my mapping formula, if it is to do with how I am generating the random numbers in the first place, or something else, but I am now stuck.

If anyone has any experience of this, or can offer any advice, then I would be most grateful!

Many thanks in advance,

Graham
 
Upvote 0

Forum statistics

Threads
1,213,507
Messages
6,114,029
Members
448,543
Latest member
MartinLarkin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top