The normal (or Gaussian) distribution is ubiquitous throughout probability theory for various reasons, including the central limit theorem, its realism in many practical applications, and the nice properties that make it amenable to mathematical manipulation. It is, therefore, one of the first continuous distributions that students encounter at school. As such, it is not something that I have spent much time discussing on this blog, which is usually concerned with more advanced topics. However, there are many nice properties and methods for normal distributions that greatly simplify the manipulation of expressions in which they are involved. While it is usually possible to ignore these, and instead just substitute in the density function and manipulate the resulting integrals, that approach can get very messy. So, I will describe some of the basic results and ideas that I use frequently.
Throughout, I assume the existence of an underlying probability space $(\Omega,\mathcal{F},\mathbb{P})$. Recall that a real-valued random variable X has the standard normal distribution if it has a probability density function given by,
$$\varphi(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}.$$
For it to function as a probability density, it is necessary that it integrates to one. While it is not obvious that the normalization factor $1/\sqrt{2\pi}$ is the correct value for this to be true, it is the one fact that I state here without proof. Wikipedia does list a couple of proofs, which can be referred to. By symmetry, $X$ and $-X$ have the same distribution, so that they have the same mean and, therefore, $\mathbb{E}[X]=0$.
The derivative of the density function satisfies the useful identity
$$\varphi'(x)=-x\varphi(x).\qquad(1)$$
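This allows us to quickly verify that standard normal variables have unit variance, by an application of integration by parts (the boundary terms vanish since $x\varphi(x)\to0$ as $x\to\pm\infty$),
$$\mathbb{E}[X^2]=\int x\cdot x\varphi(x)\,dx=-\int x\varphi'(x)\,dx=\int\varphi(x)\,dx=1.$$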
Another identity satisfied by the normal density function is,
$$e^{\lambda x}\varphi(x)=e^{\frac12\lambda^2}\varphi(x-\lambda).\qquad(2)$$
This enables us to prove the following very useful result. In fact, it is difficult to overstate how helpful this result can be. I make use of it frequently when manipulating expressions involving normal variables, as it significantly simplifies the calculations. It is also easy to remember, and simple to derive if needed.
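Theorem 1 Let X be standard normal and $f\colon\mathbb{R}\to\mathbb{R}_+$ be measurable. Then, for all $\lambda\in\mathbb{R}$,
$$\begin{aligned}\mathbb{E}[e^{\lambda X}f(X)]&=\mathbb{E}[e^{\lambda X}]\,\mathbb{E}[f(X+\lambda)]\\&=e^{\frac12\lambda^2}\,\mathbb{E}[f(X+\lambda)].\end{aligned}\qquad(3)$$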
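Proof: Using identity (2), we can evaluate $\mathbb{E}[e^{\lambda X}f(X)]$ as,
$$\mathbb{E}[e^{\lambda X}f(X)]=\int e^{\lambda x}f(x)\varphi(x)\,dx=e^{\frac12\lambda^2}\int f(x)\varphi(x-\lambda)\,dx=e^{\frac12\lambda^2}\int f(y+\lambda)\varphi(y)\,dy=e^{\frac12\lambda^2}\,\mathbb{E}[f(X+\lambda)].$$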
Here, the substitution $y=x-\lambda$ was used. This proves the second line of (3). For the first line, we put $f=1$ in the above to obtain $\mathbb{E}[e^{\lambda X}]=e^{\frac12\lambda^2}$. ⬜
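Identity (3) is easy to sanity-check numerically. The Python sketch below is not part of the argument above. It approximates each expectation with composite Simpson's rule on the truncated interval $[-12,12]$, then compares the two sides of (3). The truncation width, grid size, the value of `lam` and the test function `f` are all arbitrary illustrative choices.

```python
import math


def std_normal_expect(g, width=12.0, n=4000):
    """Approximate E[g(X)] for standard normal X using composite Simpson's rule
    on the truncated interval [-width, width]; n must be even."""
    h = 2.0 * width / n
    total = 0.0
    for i in range(n + 1):
        x = -width + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * g(x) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total * h / 3.0


lam = 0.7


def f(x):
    # Illustrative test function. It is not nonnegative, but theorem 1 still
    # applies by linearity since f(X + lam) is integrable.
    return math.cos(x) + x ** 3


lhs = std_normal_expect(lambda x: math.exp(lam * x) * f(x))  # E[e^{lam X} f(X)]
mgf = std_normal_expect(lambda x: math.exp(lam * x))         # E[e^{lam X}]
shifted = std_normal_expect(lambda x: f(x + lam))            # E[f(X + lam)]

print(lhs, mgf * shifted)              # first line of (3)
print(mgf, math.exp(0.5 * lam ** 2))   # E[e^{lam X}] = e^{lam^2 / 2}
```

Because the integrands are smooth and decay rapidly, Simpson's rule is far more accurate here than its generic $O(h^4)$ error bound suggests.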
The above result clearly applies more generally to complex valued functions $f$, by linearity. It just needs to be checked that $e^{\lambda X}f(X)$ or, equivalently, $f(X+\lambda)$ is integrable in order for the expressions to make sense.
Although (3) is simple enough as it is, I find it easier to understand as two separate statements. The first of these is that the expected value of the product of $e^{\lambda X}$ and an arbitrary function of X is equal to the product of the expectations. We just need to remember to shift X by an amount $\lambda$ inside the second expectation,
$$\mathbb{E}[e^{\lambda X}f(X)]=\mathbb{E}[e^{\lambda X}]\,\mathbb{E}[f(X+\lambda)].$$
The second statement is the identity $\mathbb{E}[e^{\lambda X}]=e^{\frac12\lambda^2}$ which, being the moment generating function, completely determines the standard normal distribution just as well as its probability density. In fact, this expression generalizes to complex values of $\lambda$.
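Theorem 2 Let X be a standard normal. Then, $e^{\lambda X}$ is integrable for all $\lambda\in\mathbb{C}$ and,
$$\mathbb{E}[e^{\lambda X}]=e^{\frac12\lambda^2}.\qquad(4)$$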
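Proof: For real $\lambda$, the result is given by theorem 1 and, in particular, it shows that $e^{\lambda X}$ is integrable. For complex values, we have $\lvert e^{\lambda X}\rvert=e^{\Re(\lambda)X}$ which, again, is integrable. By dominated convergence, the left hand side of (4) is differentiable with,
$$\frac{d}{d\lambda}\mathbb{E}[e^{\lambda X}]=\mathbb{E}[Xe^{\lambda X}]$$
for all $\lambda\in\mathbb{C}$. So, both sides of (4) are analytic functions of $\lambda$ on the whole complex plane which agree for real $\lambda$ and, by analytic continuation, they agree everywhere. ⬜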
As characteristic functions are defined for all probability distributions, and they uniquely determine the distribution, a particularly common case of (4) is obtained by taking imaginary $\lambda=iu$. This gives the characteristic function of the standard normal,
$$\mathbb{E}[e^{iuX}]=e^{-\frac12u^2}.$$
Although this is true for all complex values of $u$, for the characteristic function it is usually taken to be real.
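Another application of the moment generating function (4) is, as the name suggests, to generate the moments. We expand out the exponentials as power series,
$$\sum_{n=0}^\infty\frac{\lambda^n}{n!}\mathbb{E}[X^n]=\mathbb{E}[e^{\lambda X}]=e^{\frac12\lambda^2}=\sum_{n=0}^\infty\frac{\lambda^{2n}}{2^nn!}.$$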
Comparing coefficients of powers of $\lambda$ gives the moments.
Corollary 3 A standard normal variable X has odd moments equal to zero and even moments
$$\mathbb{E}[X^{2n}]=\frac{(2n)!}{2^nn!}=1\cdot3\cdot5\cdots(2n-1).$$
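Corollary 3 is easy to tabulate. The following Python sketch (the function names are illustrative) computes the even moments both as $(2n)!/(2^nn!)$ and as the product of odd numbers $(2n-1)(2n-3)\cdots1$, using exact integer arithmetic.

```python
import math


def normal_moment(n):
    """E[X^n] for standard normal X, following corollary 3:
    odd moments vanish and E[X^(2k)] = (2k)! / (2^k k!)."""
    if n % 2 == 1:
        return 0
    k = n // 2
    return math.factorial(2 * k) // (2 ** k * math.factorial(k))


def double_factorial(m):
    """m!! = m (m-2) (m-4) ...; the empty product (m <= 1) is taken to be 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


moments = [normal_moment(n) for n in range(9)]
print(moments)  # [1, 0, 1, 0, 3, 0, 15, 0, 105]
```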
Now, let’s move on and consider normal distributions with arbitrary mean and variance. Let $\mu$ and $\sigma\ge0$ be real numbers. For a standard normal variable Y, $X=\mu+\sigma Y$ has the normal distribution with mean $\mu$ and variance $\sigma^2$. We denote this distribution as $N(\mu,\sigma^2)$, and will write $X\sim N(\mu,\sigma^2)$. For $\sigma=0$, X is just equal to the constant value $\mu$; otherwise, for $\sigma>0$, it has a continuous probability density.
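Lemma 4 For $\sigma>0$, the $N(\mu,\sigma^2)$ distribution has probability density
$$\frac1\sigma\varphi\left(\frac{x-\mu}{\sigma}\right)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$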
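Proof: Write $X=\mu+\sigma Y$ for standard normal Y. For nonnegative measurable $f\colon\mathbb{R}\to\mathbb{R}$, the expected value of $f(X)$ is given by
$$\mathbb{E}[f(X)]=\mathbb{E}[f(\mu+\sigma Y)]=\int f(\mu+\sigma y)\varphi(y)\,dy=\int f(x)\frac1\sigma\varphi\left(\frac{x-\mu}{\sigma}\right)dx$$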
as required. Here, I substituted in $x=\mu+\sigma y$. ⬜
The manipulations above for standard normal random variables carry across to general normal distributions without much trouble.
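Theorem 5 If X is normal then $e^{\lambda X}$ is integrable and,
$$\mathbb{E}[e^{\lambda X}]=\exp\left(\lambda\mathbb{E}[X]+\frac12\lambda^2{\rm Var}(X)\right)\qquad(5)$$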
for all $\lambda\in\mathbb{C}$.
Proof: When X is standard normal, this is just the same as (4). Furthermore, it remains true if we multiply X by a real value $\sigma$, since this is just the same as multiplying $\lambda$ by $\sigma$ on both sides of (5). Also, adding a real value $\mu$ to X scales both sides of (5) by $e^{\lambda\mu}$, so it remains true and, hence, holds for all normal X. ⬜
This result should be very easy to remember. As a very naive, or first order, approximation, we might expect that $\mathbb{E}[e^X]$ is well approximated by $e^{\mathbb{E}[X]}$. This cannot hold in general, because of the convexity of the exponential. Instead, we just need to remember the adjustment term of half the variance,
$$\mathbb{E}[e^X]=e^{\mathbb{E}[X]+\frac12{\rm Var}(X)}.$$
This is (5) for $\lambda=1$, and the more general expression is obtained by scaling X by $\lambda$.
Theorem 5 implies the following simple characterisation of normal distributions.
Corollary 6 A real-valued random variable X is normal if and only if its characteristic function is of the form $\mathbb{E}[e^{iuX}]=e^{p(u)}$ for a polynomial $p$ of degree at most two.
Proof: First, if X is normal, then its characteristic function is of the stated form by theorem 5. Conversely, suppose that the characteristic function is of the stated form. That is,
$$\mathbb{E}[e^{iuX}]=\exp\left(c+ibu-\tfrac12au^2\right)$$
for all $u\in\mathbb{R}$, where $a,b,c\in\mathbb{C}$.
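Taking $u=0$ gives $e^c=1$ and, hence, we can take $c=0$. Then, since flipping the sign of u replaces the left hand side by its complex conjugate, the same holds on the right hand side. From this, we see that a and b are both real. Also, $a\ge0$, since a characteristic function is bounded by 1 in absolute value. Then, we see from theorem 5 (with $\lambda=iu$) that this is the characteristic function of a normal with mean b and variance a. As characteristic functions uniquely determine the distribution, X is normal. ⬜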
Theorem 1 also carries across in the same way to arbitrary normal random variables.
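Theorem 7 Let X be normal and $f\colon\mathbb{R}\to\mathbb{R}_+$ be measurable. Then,
$$\mathbb{E}[e^{\lambda X}f(X)]=\mathbb{E}[e^{\lambda X}]\,\mathbb{E}[f(X+\lambda{\rm Var}(X))]\qquad(6)$$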
for all $\lambda\in\mathbb{R}$.
Proof: For standard normal X, this is just (3). Then, it remains true if we multiply X by a real value $\sigma$, since this is the same as multiplying $\lambda$ by $\sigma$ and replacing $f(x)$ by $f(\sigma x)$ on both sides of (6). Similarly, it remains true if we add a real value $\mu$ to X, as this is the same as multiplying both sides of (6) by $e^{\lambda\mu}$ and replacing $f(x)$ by $f(x+\mu)$. So, it holds for all normal X. ⬜
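Identities (5) and (6) can be checked numerically in the same way as (3). The sketch below is an illustration rather than part of the proof. It uses Simpson's rule over $\mu\pm12\sigma$ for $X\sim N(\mu,\sigma^2)$. The parameter values and the nonnegative test function `f` are arbitrary assumptions.

```python
import math


def normal_expect(g, mu, sigma, width=12.0, n=4000):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) using composite Simpson's rule
    on [mu - width*sigma, mu + width*sigma]; n must be even and sigma > 0."""
    h = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n + 1):
        x = mu - width * sigma + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        z = (x - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * g(x) * density
    return total * h / 3.0


mu, sigma, lam = 0.3, 1.5, 0.8   # illustrative parameters
var = sigma ** 2


def f(x):
    # Illustrative nonnegative test function.
    return 1.0 / (1.0 + x * x)


mgf = normal_expect(lambda x: math.exp(lam * x), mu, sigma)
lhs = normal_expect(lambda x: math.exp(lam * x) * f(x), mu, sigma)
rhs = mgf * normal_expect(lambda x: f(x + lam * var), mu, sigma)

print(mgf, math.exp(lam * mu + 0.5 * lam ** 2 * var))  # identity (5)
print(lhs, rhs)                                         # identity (6)
```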
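I actually find this result slightly easier to remember in a modified form. For two random variables X and Y with finite variances, their covariance is defined as
$${\rm Cov}(X,Y)=\mathbb{E}\left[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])\right]=\mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y].$$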
Note that this is bilinear and symmetric in X and Y and that ${\rm Cov}(X,X)={\rm Var}(X)$.
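Theorem 8 Let X be a normal random variable and $Y=\lambda X$ for some $\lambda\in\mathbb{R}$. Then,
$$\mathbb{E}[e^Yf(X)]=\mathbb{E}[e^Y]\,\mathbb{E}[f(X+{\rm Cov}(X,Y))]\qquad(7)$$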
for all measurable functions $f\colon\mathbb{R}\to\mathbb{R}_+$.
Proof: This is immediate from (6), since ${\rm Cov}(X,Y)={\rm Cov}(X,\lambda X)=\lambda{\rm Var}(X)$. ⬜
So, the expectation of the product of $e^Y$ and $f(X)$ is equal to the product of their expectations, but we need to shift X by its covariance with Y. While it may not seem obvious, identity (7) certainly makes sense. If X and Y have positive covariance, then the term $e^Y$ will tend to weight larger values of X more heavily and apply a lower weight when X is small. So, if we take it out of the expectation then, to compensate, we should shift X by an amount that depends on their covariance. In fact, (7) holds much more generally, as it applies to any jointly normal random variables X and Y. However, I am not covering joint normality in this post, so do not prove this.
I find theorem 8 more intuitive when understood in terms of changes of measure. For a nonnegative random variable Z with mean 1, we can use this as a weighting to define a new probability measure $\mathbb{Q}$ on the same underlying measurable space,
$$\mathbb{Q}(A)=\mathbb{E}[Z1_A]$$
for all measurable sets A. Note that the probability of the whole space under the new measure is $\mathbb{Q}(\Omega)=\mathbb{E}[Z]=1$, explaining why we require Z to have mean 1. I will write this as $\mathbb{Q}=Z\cdot\mathbb{P}$, where the weight Z is called the Radon-Nikodym derivative, and is alternatively written as $d\mathbb{Q}/d\mathbb{P}$. Expectation with respect to this new measure will be denoted by $\mathbb{E}_{\mathbb{Q}}$, and satisfies
$$\mathbb{E}_{\mathbb{Q}}[X]=\mathbb{E}[ZX]$$
for all nonnegative random variables X. If instead we are given a nonnegative random variable Z whose mean is not necessarily equal to 1, but is nonzero and finite, then we can normalize by dividing through by its mean. This defines the new probability measure $(Z/\mathbb{E}[Z])\cdot\mathbb{P}$, which I denote simply by $Z\cdot\mathbb{P}$ to avoid having to explicitly write out the normalization every time. Expectation with respect to $\mathbb{Q}=Z\cdot\mathbb{P}$ is then given by
$$\mathbb{E}_{\mathbb{Q}}[X]=\frac{\mathbb{E}[ZX]}{\mathbb{E}[Z]}$$
for nonnegative random variables X.
Normal random variables are not themselves nonnegative, so cannot be used directly for measure changes. However, their exponentials, known as lognormal random variables, are nonnegative. I now rewrite theorem 8 in terms of measure changes, which is my preferred form.
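Theorem 9 Let X be a normal random variable and $Y=\lambda X$ for some $\lambda\in\mathbb{R}$. Under the probability measure $\mathbb{Q}=e^Y\cdot\mathbb{P}$, X is normal with the same variance as under $\mathbb{P}$ but with mean $\mathbb{E}[X]+{\rm Cov}(X,Y)$.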
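Proof: Set $X'=X+{\rm Cov}(X,Y)$, which is normal with the same variance as X but with mean $\mathbb{E}[X]+{\rm Cov}(X,Y)$. For any measurable function $f\colon\mathbb{R}\to\mathbb{R}_+$, (7) gives
$$\mathbb{E}_{\mathbb{Q}}[f(X)]=\frac{\mathbb{E}[e^Yf(X)]}{\mathbb{E}[e^Y]}=\mathbb{E}[f(X+{\rm Cov}(X,Y))]=\mathbb{E}[f(X')]$$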
as required. ⬜
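To see theorem 9 numerically, the following Python sketch computes expectations under $\mathbb{Q}=e^{\lambda X}\cdot\mathbb{P}$ by reweighting Simpson's-rule integrals. It then compares the resulting mean and variance with $\mathbb{E}[X]+\lambda{\rm Var}(X)$ and ${\rm Var}(X)$. The fourth central moment is also compared with $3{\rm Var}(X)^2$ as an extra consistency check on normality. The parameter values are illustrative assumptions.

```python
import math


def normal_expect(g, mu, sigma, width=12.0, n=4000):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) using composite Simpson's rule
    on [mu - width*sigma, mu + width*sigma]; n must be even and sigma > 0."""
    h = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n + 1):
        x = mu - width * sigma + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        z = (x - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * g(x) * density
    return total * h / 3.0


mu, sigma, lam = -0.4, 0.9, 1.3   # illustrative parameters; Y = lam * X
var = sigma ** 2
norm = normal_expect(lambda x: math.exp(lam * x), mu, sigma)   # E[e^Y]


def q_expect(g):
    """Expectation under Q = e^Y . P, i.e. E[e^Y g(X)] / E[e^Y]."""
    return normal_expect(lambda x: math.exp(lam * x) * g(x), mu, sigma) / norm


q_mean = q_expect(lambda x: x)
q_var = q_expect(lambda x: (x - q_mean) ** 2)
q_m4 = q_expect(lambda x: (x - q_mean) ** 4)   # equals 3 q_var^2 for a normal

print(q_mean, mu + lam * var)   # mean shifted by Cov(X, Y) = lam Var(X)
print(q_var, var)               # variance unchanged
print(q_m4, 3 * var ** 2)
```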
So, when we apply lognormal measure changes, the original normal random variable remains normal with the same variance, but with a shifted mean.
I now apply the ideas described above to the Black-Scholes formula for financial option pricing. I use $\Phi$ for the normal distribution function, which is defined by
$$\Phi(x)=\mathbb{P}(X\le x)=\int_{-\infty}^x\varphi(y)\,dy$$
for a standard normal X.
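Example 1 (Black-Scholes formula) Suppose that S is lognormal with mean $F$ and nonzero log-variance $v={\rm Var}(\log S)$. Then, for all $K>0$,
$$\mathbb{E}[(S-K)_+]=F\Phi(d_1)-K\Phi(d_2)\qquad(8)$$
where $d_1=\frac{\log(F/K)+\frac12v}{\sqrt v}$ and $d_2=d_1-\sqrt v$.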
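While (8) is not difficult to prove by writing out the expectation as an integral, and directly applying changes of variables to this, it can get rather messy. Instead, start by expanding $(S-K)_+$ into the difference $S1_{\{S>K\}}-K1_{\{S>K\}}$ and using linearity of expectations,
$$\mathbb{E}[(S-K)_+]=\mathbb{E}[S1_{\{S>K\}}]-K\mathbb{P}(S>K)=F\mathbb{Q}(S>K)-K\mathbb{P}(S>K)\qquad(9)$$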
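where we substituted in the probability measure $\mathbb{Q}=S\cdot\mathbb{P}$. If we set $X=\log S$ then, by definition, this is normal with variance $v$. Also, applying (5),
$$F=\mathbb{E}[S]=\mathbb{E}[e^X]=e^{\mathbb{E}[X]+\frac12v}$$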
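giving the mean of X as $\log F-\frac12v$. Hence, $X=\log F-\frac12v+\sqrt vY$ for a standard normal Y, and we obtain,
$$\mathbb{P}(S>K)=\mathbb{P}(X>\log K)=\mathbb{P}\left(Y>\frac{\log(K/F)+\frac12v}{\sqrt v}\right)=\Phi\left(\frac{\log(F/K)-\frac12v}{\sqrt v}\right)=\Phi(d_2).$$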
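The calculation for $\mathbb{Q}(S>K)$ is the same except that, now, theorem 9 says that $X\sim N(\log F+\frac12v,v)$ under $\mathbb{Q}$. Replacing $-\frac12v$ by $+\frac12v$ in the equality above gives,
$$\mathbb{Q}(S>K)=\Phi\left(\frac{\log(F/K)+\frac12v}{\sqrt v}\right)=\Phi(d_1).$$
Substituting these into (9) gives (8).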
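Formula (8) translates directly into code. The Python sketch below implements it with the standard library, where `math.erf` gives $\Phi$. It compares the result against a direct numerical integral of $(e^x-K)_+$ against the density of $X=\log S\sim N(\log F-\frac12v,v)$. The inputs `F`, `K`, `v` and the Simpson's-rule settings are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(F, K, v):
    """E[(S - K)_+] from (8), for lognormal S with mean F and log-variance v > 0."""
    sd = math.sqrt(v)
    d1 = (math.log(F / K) + 0.5 * v) / sd
    d2 = d1 - sd
    return F * norm_cdf(d1) - K * norm_cdf(d2)


def call_by_quadrature(F, K, v, width=12.0, n=4000):
    """Simpson's-rule integral of (e^x - K) over x > log K against the
    N(log F - v/2, v) density, truncated width standard deviations above the mean."""
    m, sd = math.log(F) - 0.5 * v, math.sqrt(v)
    a, b = math.log(K), m + width * sd   # assumes log K < b
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        z = (x - m) / sd
        density = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * (math.exp(x) - K) * density
    return total * h / 3.0


F, v = 100.0, 0.04   # forward 100, log-variance 0.04 (20% volatility over the period)
for K in (90.0, 100.0, 110.0):
    print(K, black_scholes_call(F, K, v), call_by_quadrature(F, K, v))
```

The quadrature starts exactly at $x=\log K$, so the kink of the payoff sits at an endpoint and does not degrade the accuracy of Simpson's rule.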
The approach to the Black-Scholes formula here is entirely mathematical, involving the manipulations described above for normal variables. The method, including the use of a measure change, can also be described financially in terms of option cashflows. Suppose that we want to value a payout at some future time given in terms of a dollar amount, say, V dollars. Under the forward dollar pricing measure $\mathbb{P}$, this is given by an expectation $\mathbb{E}[V]$. Now suppose that S represents the FX rate with respect to a foreign currency, such as the euro. That is, S is the future dollar value of one euro. Now consider the value of a future payout of V euros. This will have a value of $SV$ dollars and, hence, we value it as $\mathbb{E}[SV]$. In particular, taking $V=1$ gives the forward price $F=\mathbb{E}[S]$, which is the number of dollars we would now agree to exchange, at the future date, for one euro.
Again consider a future payment of V euros. As mentioned above, the dollar value is $\mathbb{E}[SV]$. The euro value is given by dividing through by the forward F. This is just the same as the expectation $\mathbb{E}_{\mathbb{Q}}[V]=\mathbb{E}[SV]/F$, so that $\mathbb{Q}=S\cdot\mathbb{P}$ is the euro pricing measure. The inverted FX cross $1/S$ is just the value of one dollar in euros. The expected value of this in the euro measure is the number of euros we would agree to pay at the future date for one dollar which, to be consistent with the above, must be $1/F$. This can be confirmed mathematically,
$$\mathbb{E}_{\mathbb{Q}}[1/S]=\frac{\mathbb{E}[S\cdot S^{-1}]}{F}=\frac1F.$$
Now consider a call option on the FX cross with strike K. This will pay us one euro in exchange for K dollars if, on the future date, we decide to exercise. Converted to dollars, this is an amount of $S-K$ but, as we would only exercise the option if the payout is positive for us, we receive $(S-K)_+$. Hence, the Black-Scholes formula (8) gives the dollar value of this option.
Note that the following two payoffs are the same:
- Receive one euro and pay K dollars, if $S>K$.
- Receive one euro if $1/S<1/K$, and pay K dollars if $S>K$.
The first of these describes a call option of strike K. The second describes a binary put option on $1/S$, with strike $1/K$, denominated in euros, minus K binary call options on S denominated in dollars. So, these two setups have the same value, and identity (9) is just the mathematical expression of this. The dollar binary call has value $\mathbb{P}(S>K)$ which, by the manipulations above, equals $\Phi(d_2)$, whereas the euro binary put has value $\mathbb{Q}(1/S<1/K)=\Phi(d_1)$ in euros, or $F\Phi(d_1)$ in dollars.
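It is well understood that FX options have the same volatility from the viewpoint of both foreign and domestic observers, even though they may be using different measures to express it. This is the first part of the statement of theorem 9 above. It means that $F^2/S$ has the same mean and log-variance under the euro measure $\mathbb{Q}$ as $S$ has under $\mathbb{P}$, so we have
$$\mathbb{Q}(S>K)=\mathbb{Q}\left(F^2/S<F^2/K\right)=\mathbb{P}\left(S<F^2/K\right)=\Phi\left(-\frac{\log\left(F/(F^2/K)\right)-\frac12v}{\sqrt v}\right).$$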
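As $\log\left(F/(F^2/K)\right)=-\log(F/K)$, we obtain $d_1$ from $d_2$ by replacing K with $F^2/K$ and changing the sign, which recovers $\mathbb{Q}(S>K)=\Phi(d_1)$.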
Note that, in the process of deriving the Black-Scholes formula, we also obtained the following simple result.
Lemma 10 Let S be a lognormal random variable with mean $F$. Then, $F^2/S$ has the same distribution under the measure $S\cdot\mathbb{P}$ as S has under $\mathbb{P}$.
Writing this out explicitly gives
$$\mathbb{E}\left[Sf(F^2/S)\right]=F\,\mathbb{E}[f(S)]$$
for all lognormal random variables S, with $F=\mathbb{E}[S]$, and measurable $f\colon(0,\infty)\to\mathbb{R}_+$.
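Lemma 10 can also be checked numerically. In the Python sketch below, expectations over a lognormal S with mean `F` and log-variance `v` are computed by Simpson's rule in the variable $x=\log S$. The parameter values and the smooth test function `f` are illustrative assumptions.

```python
import math


def lognormal_expect(g, F, v, width=12.0, n=4000):
    """Approximate E[g(S)] for lognormal S with mean F and log-variance v,
    integrating over x = log S ~ N(log F - v/2, v) with composite Simpson's rule."""
    m, sd = math.log(F) - 0.5 * v, math.sqrt(v)
    h = 2.0 * width * sd / n
    total = 0.0
    for i in range(n + 1):
        x = m - width * sd + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        z = (x - m) / sd
        density = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * g(math.exp(x)) * density
    return total * h / 3.0


F, v = 2.0, 0.25   # illustrative mean and log-variance


def f(s):
    # Illustrative smooth, bounded, nonnegative test function.
    return math.exp(-s / F)


mean = lognormal_expect(lambda s: s, F, v)                # should equal F
lhs = lognormal_expect(lambda s: s * f(F * F / s), F, v)  # E[S f(F^2 / S)]
rhs = F * lognormal_expect(f, F, v)                       # F E[f(S)]
print(mean, lhs, rhs)
```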
The means of the absolute value and positive part of a standard normal random variable are straightforward to compute.
Lemma 11 Let X be standard normal. Then, $\mathbb{E}[\lvert X\rvert]=\sqrt{2/\pi}$ and $\mathbb{E}[X_+]=1/\sqrt{2\pi}$.
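Proof: We apply identity (1) to evaluate $\mathbb{E}[X_+]$,
$$\mathbb{E}[X_+]=\int_0^\infty x\varphi(x)\,dx=-\int_0^\infty\varphi'(x)\,dx=\varphi(0)=\frac{1}{\sqrt{2\pi}}.$$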
Then, by symmetry, $\mathbb{E}[\lvert X\rvert]=2\mathbb{E}[X_+]=\sqrt{2/\pi}$. ⬜
Considering the Black-Scholes example again, sometimes traders use an approximate formula which is simple enough to be able to roughly value options in their head, without having to resort to a calculator. This applies to at-the-money options for which the strike K is close to the forward F, and where the log-variance is low enough that the lognormal distribution can be approximated by a normal. This is usually the case for options which are relatively close to their expiration date, implying a small variance.
Example 2 (Simplified Black-Scholes) Let S be normal with mean F and standard deviation $\sigma$. Then,
$$\mathbb{E}[(S-F)_+]=\frac{\sigma}{\sqrt{2\pi}}\approx0.4\sigma.$$
Proof: We write $S=F+\sigma Y$ for a standard normal Y. So, by lemma 11,
$$\mathbb{E}[(S-F)_+]=\sigma\mathbb{E}[Y_+]=\frac{\sigma}{\sqrt{2\pi}}.$$
By direct calculation, $1/\sqrt{2\pi}\approx0.4$ holds to within a relative error of 0.3%. ⬜
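The constant in example 2 is quick to check. The Python sketch below computes the relative error of the rule of thumb $0.4\sigma$. As an additional illustration that goes beyond the example itself, it also compares $0.4F\sqrt v$ with the exact at-the-money lognormal price from (8) with $K=F$, namely $F(\Phi(\frac12\sqrt v)-\Phi(-\frac12\sqrt v))$. Using $\sigma\approx F\sqrt v$ for the standard deviation of S is an assumption that is only reasonable for small log-variance $v$. The forward and volatility values are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


inv_sqrt_2pi = 1.0 / math.sqrt(2.0 * math.pi)   # E[(S - F)_+] / sigma for normal S
rel_error = 0.4 / inv_sqrt_2pi - 1.0
print(inv_sqrt_2pi, rel_error)

F = 100.0   # illustrative forward
comparisons = []
for sd in (0.05, 0.1, 0.2):   # sd = sqrt(v), the log-volatility over the period
    exact = F * (norm_cdf(0.5 * sd) - norm_cdf(-0.5 * sd))   # (8) with K = F
    approx = 0.4 * F * sd                                     # rule of thumb, sigma ~ F * sd
    comparisons.append((sd, exact, approx))
    print(sd, exact, approx)
```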
Moving on, there are also various expressions which help when looking at quadratic functions of normals. Recall that the gamma distribution with shape parameter $k>0$ (and unit rate parameter) is the nonnegative distribution with probability density proportional to $x^{k-1}e^{-x}$ over $x>0$.
Lemma 12 Let X be standard normal. Then, $X^2/2$ has the gamma distribution with shape parameter $1/2$. This has probability density
$$\frac{1}{\sqrt\pi}x^{-\frac12}e^{-x}\qquad(x>0).$$
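Proof: For any measurable function $f\colon\mathbb{R}_+\to\mathbb{R}_+$, compute the expectation of $f(X^2/2)$ as,
$$\mathbb{E}[f(X^2/2)]=2\int_0^\infty f(x^2/2)\varphi(x)\,dx=\frac{2}{\sqrt{2\pi}}\int_0^\infty f(y)e^{-y}\frac{dy}{\sqrt{2y}}=\frac{1}{\sqrt\pi}\int_0^\infty f(y)y^{-\frac12}e^{-y}\,dy$$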
as required. Here, the substitution $y=x^2/2$ was applied. ⬜
A consequence is that we can easily compute all absolute moments of a standard normal, including the noninteger moments, in terms of the gamma function. Recall that this is defined by
$$\Gamma(s)=\int_0^\infty x^{s-1}e^{-x}\,dx$$
for complex $s$ with $\Re(s)>0$.
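Corollary 13 If X is standard normal then, for all $p\in\mathbb{C}$ with $\Re(p)>-1$,
$$\mathbb{E}\left[\lvert X\rvert^p\right]=\frac{2^{p/2}}{\sqrt\pi}\Gamma\left(\frac{p+1}{2}\right).$$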
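Proof: As $G=X^2/2$ has the gamma distribution with parameter 1/2, the expected value of $\lvert X\rvert^p=(2G)^{p/2}$ is
$$\mathbb{E}\left[\lvert X\rvert^p\right]=\frac{2^{p/2}}{\sqrt\pi}\int_0^\infty y^{\frac p2}y^{-\frac12}e^{-y}\,dy=\frac{2^{p/2}}{\sqrt\pi}\Gamma\left(\frac{p+1}{2}\right)$$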
as required. ⬜
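Corollary 13 can be compared against direct quadrature using `math.gamma` from Python's standard library. This sketch integrates $2\int_0^{12}x^p\varphi(x)\,dx$ by Simpson's rule. It only uses exponents $p\ge0$, because for $-1<p<0$ the integrand is infinite at the origin. For small noninteger $p$, the lack of smoothness at zero would also reduce the accuracy of this simple rule. The chosen exponents are illustrative.

```python
import math


def abs_moment_gamma(p):
    """E[|X|^p] for standard normal X from corollary 13 (real p > -1)."""
    return 2.0 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def abs_moment_quadrature(p, width=12.0, n=4000):
    """2 * integral of x^p phi(x) over [0, width] by composite Simpson's rule.
    Restricted to p >= 0 so that the integrand is finite at x = 0."""
    h = width / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * x ** p * math.exp(-0.5 * x * x)
    return 2.0 * total * h / (3.0 * math.sqrt(2.0 * math.pi))


for p in (0, 1, 2, 2.5, 3.7, 4):
    print(p, abs_moment_gamma(p), abs_moment_quadrature(p))
```

The cases $p=1$ and $p=4$ recover $\sqrt{2/\pi}$ from lemma 11 and the value $3$ from corollary 3.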
This result is not immediately obvious, even at $p=0$ since, there, the moment is equal to one and the result is equivalent to the identity $\Gamma(1/2)=\sqrt\pi$. This is indeed satisfied by the gamma function. However, we have not stumbled upon a new way of proving this since, by a simple substitution, it can be seen to be equivalent to the fact that the density function integrates to one, so that $1/\sqrt{2\pi}$ is the correct normalization factor, which was assumed above.
For nonnegative even integer values $p=2n$, it is interesting to compare this with the moments given in corollary 3,
$$\frac{2^n}{\sqrt\pi}\Gamma\left(n+\frac12\right)=\frac{(2n)!}{2^nn!}.$$
In particular, this can only be true if the gamma function at half-integer values satisfies
$$\Gamma\left(n+\frac12\right)=\frac{(2n)!}{4^nn!}\sqrt\pi.$$
For $n=0$, this is the identity discussed above and, for all positive integers n, it follows from the recurrence $\Gamma(s+1)=s\Gamma(s)$ and induction.
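The half-integer identity is easy to spot-check with `math.gamma` from Python's standard library. This short sketch compares both sides for small n.

```python
import math

rows = []
for n in range(10):
    # Right hand side: (2n)! sqrt(pi) / (4^n n!)
    closed_form = math.factorial(2 * n) * math.sqrt(math.pi) / (4 ** n * math.factorial(n))
    rows.append((n, math.gamma(n + 0.5), closed_form))
    print(n, math.gamma(n + 0.5), closed_form)
```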
Another consequence of lemma 12 is that expectations involving a normal variable can always be expressed using a gamma distribution.
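Corollary 14 Let X be standard normal. Then, for all measurable $f\colon\mathbb{R}\to\mathbb{R}_+$,
$$\mathbb{E}[f(X)]=\frac12\mathbb{E}\left[f\left(\sqrt{2G}\right)+f\left(-\sqrt{2G}\right)\right]$$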
where $G$ has the gamma distribution with shape parameter 1/2.
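Proof: By symmetry, and taking $G=X^2/2$ so that $\lvert X\rvert=\sqrt{2G}$,
$$\mathbb{E}\left[f(X)1_{\{X>0\}}\right]=\frac12\mathbb{E}\left[f\left(\sqrt{2G}\right)\right],\qquad\mathbb{E}\left[f(X)1_{\{X<0\}}\right]=\frac12\mathbb{E}\left[f\left(-\sqrt{2G}\right)\right]$$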
and, since $\mathbb{P}(X=0)=0$, adding these together gives the result. ⬜
The normal density function also satisfies the following simple identity
$$e^{-\frac12\lambda x^2}\varphi(x)=\varphi\left(\sqrt{1+\lambda}\,x\right)\qquad(10)$$
for all real $\lambda\ge-1$. We can use this to prove the following result, which is of a similar flavour to theorem 1, except that it involves the square of X.
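Theorem 15 If X is standard normal then,
$$\mathbb{E}\left[e^{-\frac12\lambda X^2}f(X)\right]=\frac{1}{\sqrt{1+\lambda}}\mathbb{E}\left[f\left(\frac{X}{\sqrt{1+\lambda}}\right)\right]\qquad(11)$$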
for all measurable $f\colon\mathbb{R}\to\mathbb{R}_+$ and $\lambda>-1$.
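Proof: The expectation of $e^{-\frac12\lambda X^2}f(X)$ can be computed using (10),
$$\mathbb{E}\left[e^{-\frac12\lambda X^2}f(X)\right]=\int f(x)\varphi\left(\sqrt{1+\lambda}\,x\right)dx=\frac{1}{\sqrt{1+\lambda}}\int f\left(\frac{y}{\sqrt{1+\lambda}}\right)\varphi(y)\,dy$$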
as required. Here, the substitution $y=\sqrt{1+\lambda}\,x$ was used. ⬜
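Theorem 16 If X is standard normal and $\lambda\in\mathbb{C}$, then $e^{-\frac12\lambda X^2}$ is integrable if and only if $\Re(\lambda)>-1$, in which case
$$\mathbb{E}\left[e^{-\frac12\lambda X^2}\right]=\frac{1}{\sqrt{1+\lambda}}.\qquad(12)$$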
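Proof: If $\Re(\lambda)\le-1$ then
$$\left\lvert e^{-\frac12\lambda x^2}\varphi(x)\right\rvert=\frac{1}{\sqrt{2\pi}}e^{-\frac12(1+\Re(\lambda))x^2}\ge\frac{1}{\sqrt{2\pi}}$$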
has infinite integral over the real numbers and, hence, $e^{-\frac12\lambda X^2}$ is not integrable. On the other hand, for real $\lambda>-1$, taking $f=1$ in (11) gives (12). For complex values of $\lambda$, we have to be careful which of the square roots to take in (12). We take the one with positive real part, which is standard, and $\sqrt{1+\lambda}$ is then complex differentiable on $\Re(\lambda)>-1$. Hence, as in the proof of theorem 2, analytic continuation implies that (12) holds for all complex values of $\lambda$ with real part greater than $-1$. ⬜
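Identity (12) can be checked for complex $\lambda$ using Python's `cmath` module, whose `sqrt` returns the principal root. That root has positive real part when $\Re(1+\lambda)>0$, matching the branch chosen above. In this sketch, the test values of `lam` and the Simpson's-rule settings are illustrative assumptions.

```python
import cmath
import math


def std_normal_expect(g, width=12.0, n=4000):
    """Approximate E[g(X)] for standard normal X and complex-valued g, using
    composite Simpson's rule on [-width, width]; n must be even."""
    h = 2.0 * width / n
    total = 0j
    for i in range(n + 1):
        x = -width + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * g(x) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total * h / 3.0


results = []
for lam in (0.5, -0.5, 2j, 1 + 3j, -0.5 + 1j):   # all have real part > -1
    numeric = std_normal_expect(lambda x: cmath.exp(-0.5 * lam * x * x))
    closed_form = 1 / cmath.sqrt(1 + lam)   # principal root: positive real part here
    results.append((lam, numeric, closed_form))
    print(lam, numeric, closed_form)
```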
Using real values $\lambda=-u$ (with $u<1$) gives the moment generating function $\mathbb{E}[e^{uX^2/2}]=(1-u)^{-1/2}$ of $X^2/2$, and imaginary values give its characteristic function. By lemma 12, these are the moment generating and characteristic functions of the gamma distribution with shape parameter 1/2.
The remaining results given above for standard normals also carry across to the case with arbitrary mean and variance in a straightforward way. For example, theorem 16 extends as follows. This result is harder to remember than the others so, if needed, I would just derive it in the same way as done here.
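Lemma 17 If X is normal with mean $\mu$ and variance $\sigma^2$ then, for $\lambda\in\mathbb{C}$, $e^{-\frac12\lambda X^2}$ is integrable if and only if $1+\sigma^2\Re(\lambda)>0$, in which case
$$\mathbb{E}\left[e^{-\frac12\lambda X^2}\right]=\frac{1}{\sqrt{1+\lambda\sigma^2}}\exp\left(-\frac{\lambda\mu^2}{2(1+\lambda\sigma^2)}\right).\qquad(13)$$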
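Proof: As $X=\mu+\sigma Y$ for a standard normal Y, we have $-\frac12\lambda X^2=-\frac12\lambda\sigma^2Y^2-\lambda\mu\sigma Y-\frac12\lambda\mu^2$. So, for real $\lambda$ with $1+\lambda\sigma^2>0$, the expected value of $e^{-\frac12\lambda X^2}$ is given by,
$$\mathbb{E}\left[e^{-\frac12\lambda X^2}\right]=\frac{e^{-\frac12\lambda\mu^2}}{\sqrt{1+\lambda\sigma^2}}\mathbb{E}\left[\exp\left(-\frac{\lambda\mu\sigma Y}{\sqrt{1+\lambda\sigma^2}}\right)\right]=\frac{1}{\sqrt{1+\lambda\sigma^2}}\exp\left(-\frac12\lambda\mu^2+\frac{\lambda^2\mu^2\sigma^2}{2(1+\lambda\sigma^2)}\right).$$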
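The first equality is applying (11), with $\lambda\sigma^2$ in place of $\lambda$, and the second is using (4). Rearranging gives (13). As previously, analytic continuation extends this to all $\lambda$ with $1+\sigma^2\Re(\lambda)>0$. For $\sigma>0$, letting $\lambda$ decrease to $-\sigma^{-2}$, the right hand side of (13) increases to infinity so, by monotone convergence, $e^{-\frac12\lambda X^2}$ is not integrable for $\lambda=-\sigma^{-2}$. Hence, it is not integrable for $\Re(\lambda)\le-\sigma^{-2}$. ⬜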
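Formula (13) is the kind of expression that is worth double-checking after deriving it. The Python sketch below compares it with Simpson's-rule quadrature for a few illustrative parameter triples $(\lambda,\mu,\sigma)$, all chosen so that $1+\lambda\sigma^2>0$. The second triple has negative $\lambda$, so it exercises the case where the integrand grows relative to the weight $e^{-\frac12\lambda X^2}\le1$ case.

```python
import math


def normal_expect(g, mu, sigma, width=12.0, n=4000):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) using composite Simpson's rule
    on [mu - width*sigma, mu + width*sigma]; n must be even and sigma > 0."""
    h = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n + 1):
        x = mu - width * sigma + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        z = (x - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * g(x) * density
    return total * h / 3.0


def lemma17(lam, mu, sigma):
    """Closed form (13) for E[exp(-lam X^2 / 2)], X ~ N(mu, sigma^2),
    for real lam with 1 + lam sigma^2 > 0."""
    s = 1.0 + lam * sigma ** 2
    return math.exp(-lam * mu ** 2 / (2.0 * s)) / math.sqrt(s)


results = []
for lam, mu, sigma in [(0.7, 1.2, 0.8), (-0.5, 0.5, 1.1), (3.0, -2.0, 0.6)]:
    numeric = normal_expect(lambda x: math.exp(-0.5 * lam * x * x), mu, sigma)
    results.append((numeric, lemma17(lam, mu, sigma)))
    print(lam, mu, sigma, numeric, results[-1][1])
```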
Theorem 9 showing that normal variables remain normal under a lognormal change of measure also extends to changes of measure involving the square of a normal.
Theorem 18 If X is normal then it remains normal under the measure $e^{-\frac12\lambda X^2}\cdot\mathbb{P}$ for any $\lambda\in\mathbb{R}$ with $1+\lambda{\rm Var}(X)>0$.
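As stated, this is a very easy result to remember. The exact distribution of X under the measure change requires also computing its mean and variance. Write $\mu$ and $\sigma^2$ for the mean and variance of X under $\mathbb{P}$, and set $\mathbb{Q}=e^{-\frac12\lambda X^2}\cdot\mathbb{P}$. Applying theorem 15 to the standard normal $(X-\mu)/\sigma$, with $\lambda\sigma^2$ in place of $\lambda$, shows that the variance of X under the measure $e^{-\frac12\lambda(X-\mu)^2}\cdot\mathbb{P}$ is $\sigma^2/(1+\lambda\sigma^2)$. Since $e^{-\frac12\lambda X^2}$ is proportional to $e^{-\frac12\lambda(X-\mu)^2}e^{-\lambda\mu X}$, and the measure change given by theorem 9 does not affect variances, it is the same under $\mathbb{Q}$,
$${\rm Var}_{\mathbb{Q}}(X)=\frac{\sigma^2}{1+\lambda\sigma^2}.$$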
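Alternatively, the moment generating function under $\mathbb{Q}=e^{-\frac12\lambda X^2}\cdot\mathbb{P}$ can be computed from (13). For $\lambda\ne0$, complete the square and apply (13) to $X-u/\lambda$, which is normal with mean $\mu-u/\lambda$ and variance $\sigma^2$. Using $\propto$ to denote equality up to a scaling factor independent of u,
$$\mathbb{E}_{\mathbb{Q}}[e^{uX}]\propto\mathbb{E}\left[e^{-\frac12\lambda X^2+uX}\right]=e^{\frac{u^2}{2\lambda}}\mathbb{E}\left[e^{-\frac12\lambda(X-u/\lambda)^2}\right]\propto\exp\left(\frac{\mu u}{1+\lambda\sigma^2}+\frac{\sigma^2u^2}{2(1+\lambda\sigma^2)}\right).$$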
Noting that this is the exponential of a quadratic in u, we can read off the mean and variance from the coefficients of $u$ and $u^2$. For example, the coefficient of $u$ inside the exponential on the right hand side is $\mu/(1+\lambda\sigma^2)$, giving the mean of X under $\mathbb{Q}$ as,
$$\mathbb{E}_{\mathbb{Q}}[X]=\frac{\mu}{1+\lambda\sigma^2}.$$
We see that, under the change of measure in theorem 18, both the mean and variance of X are divided by $1+\lambda\sigma^2$.
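The conclusion can be confirmed numerically by reweighting. The Python sketch below computes the mean, variance and fourth central moment of X under $\mathbb{Q}=e^{-\frac12\lambda X^2}\cdot\mathbb{P}$ with Simpson's rule. It compares them with $\mu/(1+\lambda\sigma^2)$, $\sigma^2/(1+\lambda\sigma^2)$ and the normal value $3{\rm Var}_{\mathbb{Q}}(X)^2$. The parameter values are illustrative assumptions satisfying $1+\lambda\sigma^2>0$.

```python
import math


def normal_expect(g, mu, sigma, width=12.0, n=4000):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) using composite Simpson's rule
    on [mu - width*sigma, mu + width*sigma]; n must be even and sigma > 0."""
    h = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n + 1):
        x = mu - width * sigma + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        z = (x - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * g(x) * density
    return total * h / 3.0


mu, sigma, lam = 1.0, 1.5, 0.4   # illustrative; requires 1 + lam * sigma^2 > 0
s = 1.0 + lam * sigma ** 2


def rn_weight(x):
    # Unnormalised Radon-Nikodym weight e^{-lam X^2 / 2}
    return math.exp(-0.5 * lam * x * x)


norm = normal_expect(rn_weight, mu, sigma)


def q_expect(g):
    """Expectation under Q = e^{-lam X^2/2} . P, normalised by E[e^{-lam X^2/2}]."""
    return normal_expect(lambda x: rn_weight(x) * g(x), mu, sigma) / norm


q_mean = q_expect(lambda x: x)
q_var = q_expect(lambda x: (x - q_mean) ** 2)
q_m4 = q_expect(lambda x: (x - q_mean) ** 4)   # equals 3 q_var^2 for a normal

print(q_mean, mu / s)          # mean divided by 1 + lam sigma^2
print(q_var, sigma ** 2 / s)   # variance divided by 1 + lam sigma^2
print(q_m4, 3 * q_var ** 2)
```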