I looked at normal random variables in an earlier post, but what does it mean for a sequence of real-valued random variables to be jointly normal? We could simply require each of them to be normal, but this says very little about their joint distribution and is not much help in handling expressions involving more than one of them at once. In the case where the random variables are independent, the following result is a very useful property of the normal distribution. All random variables in this post will be real-valued, except where stated otherwise, and we assume that they are defined with respect to some underlying probability space $(\Omega,\mathcal{F},\mathbb{P})$.
Lemma 1 Linear combinations of independent normal random variables are again normal.
Proof: More precisely, if $X_1,\ldots,X_n$ is a sequence of independent normal random variables and $\lambda_1,\ldots,\lambda_n$ are real numbers, then $Y=\sum_{k=1}^n\lambda_kX_k$ is normal. Let us suppose that $X_k$ has mean $\mu_k$ and variance $\sigma_k^2$. Then, the characteristic function of $Y$ can be computed using the independence property and the characteristic functions of the individual normals,

$\displaystyle \mathbb{E}\left[e^{iaY}\right]=\prod_{k=1}^n\mathbb{E}\left[e^{ia\lambda_kX_k}\right]=\prod_{k=1}^ne^{ia\lambda_k\mu_k-\frac12a^2\lambda_k^2\sigma_k^2}=e^{ia\mu-\frac12a^2\sigma^2},$

where we have set $\mu=\sum_k\lambda_k\mu_k$ and $\sigma^2=\sum_k\lambda_k^2\sigma_k^2$. This is the characteristic function of a normal random variable with mean $\mu$ and variance $\sigma^2$. ⬜
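As a quick numerical sanity check of lemma 1, the following sketch (the means, standard deviations and coefficients are arbitrary illustrative choices, not taken from the post) compares the sample mean and variance of a linear combination of independent normals against the values computed in the proof.

```python
# Monte Carlo sketch of Lemma 1: a linear combination of independent
# normals has mean sum(lam*mu) and variance sum(lam^2 * sigma^2).
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

mu = np.array([1.0, -2.0, 0.5])       # means of X_1, X_2, X_3 (arbitrary)
sigma = np.array([0.5, 1.0, 2.0])     # standard deviations (arbitrary)
lam = np.array([2.0, -1.0, 0.5])      # coefficients of the combination

X = rng.normal(mu, sigma, size=(n_samples, 3))
Y = X @ lam                           # Y = sum_k lam_k X_k

pred_mean = lam @ mu                  # predicted mean
pred_var = (lam**2) @ sigma**2        # predicted variance
print(Y.mean(), pred_mean)            # agree up to sampling error
print(Y.var(), pred_var)
```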
The definition of joint normal random variables will include the case of independent normals, so that any linear combination is also normal. We use this result as the defining property for the general multivariate normal case.
Definition 2 A collection $\{X_i\}_{i\in I}$ of real-valued random variables is multivariate normal (or joint normal) if and only if all of its finite linear combinations $\sum_{k=1}^n\lambda_kX_{i_k}$ are normal.
Lemma 1 immediately shows that any independent set of normals is joint normal.
Lemma 3 Any collection of independent normal random variables is joint normal.
It also follows quickly from the definition that any collection of linear combinations of joint normals is itself joint normal.
Lemma 4 Let $\{X_i\}_{i\in I}$ be a joint normal collection of random variables. Then, any collection of finite linear combinations of the $X_i$ is again joint normal.

Proof: Any finite linear combination of finite linear combinations of the $X_i$ is itself a finite linear combination of the $X_i$ and, so, is normal. ⬜
A natural question to ask is whether, given a collection of two or more normal random variables, we should expect them to be joint normal. It does not take much consideration to see that this need not be the case, and that joint normality is a very special situation. Consider a pair of random variables X,Y with probability density function $p(x,y)$, a function of two variables. Normality of the individual random variables only constrains the one-dimensional marginal densities $p_X(x)=\int p(x,y)\,dy$ and $p_Y(y)=\int p(x,y)\,dx$. However, as we will see, if X and Y were jointly normal then, in addition to the marginals, their distribution is completely specified by the additional knowledge of their covariance, which is just a single real number. So, given the marginal distributions and covariance, there is still a lot of freedom remaining in the possible joint distributions of X and Y, with only one single distribution corresponding to joint normality. In probability theory we frequently deal with joint normal collections of random variables, so it is easy to start to believe that this is the usual situation, but it is not. As just explained, there is a lot of freedom in how we can construct counterexamples to demonstrate this. I give an example where the covariance

$\displaystyle {\rm Cov}(X,Y)=\mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y]$

is zero. For joint normal random variables, this condition is sufficient to ensure that they are independent, so the example also demonstrates how the joint normal property is required here.
Example 1 A pair of standard normal random variables X,Y which have zero covariance, but for which $X+Y$ is not normal.
Although there is considerable flexibility in constructing random variables with properties as in the example, consider the following. Let X be standard normal and set

$\displaystyle Y=\begin{cases}-X,&\text{if }\vert X\vert\le K,\\ X,&\text{if }\vert X\vert>K,\end{cases}$

for some fixed positive constant K. The fact that the standard normal distribution is symmetric, so that $-X$ has the same distribution as $X$, is sufficient to show that Y has the same distribution as X. For any measurable $S\subseteq\mathbb{R}$,

$\displaystyle \mathbb{P}(Y\in S)=\mathbb{P}(-X\in S,\vert X\vert\le K)+\mathbb{P}(X\in S,\vert X\vert>K)=\mathbb{P}(X\in S,\vert X\vert\le K)+\mathbb{P}(X\in S,\vert X\vert>K)=\mathbb{P}(X\in S).$

On the other hand, $X+Y=0$ whenever $\vert X\vert\le K$, which has positive probability, yet $X+Y$ is not identically equal to 0 and, so, it is not normal. The covariance is computed as,

$\displaystyle {\rm Cov}(X,Y)=\mathbb{E}[XY]=\mathbb{E}\left[X^21_{\{\vert X\vert>K\}}\right]-\mathbb{E}\left[X^21_{\{\vert X\vert\le K\}}\right].$
Since this is continuous in K, tends to -1 as K goes to infinity, and equals 1 at $K=0$, the intermediate value theorem says that there exists some positive real K making the covariance zero.
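The construction can also be checked numerically. The sketch below follows example 1, taking Y to be the piecewise reflection of X described above (a reconstruction consistent with the covariance limits quoted in the post), finding the critical K by bisection from the closed-form covariance, and confirming by simulation that the covariance vanishes while X+Y has an atom at zero.

```python
# Sketch of Example 1: X standard normal, Y = -X when |X| <= K, Y = X
# otherwise.  Y is standard normal, K is tuned so Cov(X,Y) = 0, yet
# X + Y vanishes on {|X| <= K}, so X + Y is not normal.
import math
import numpy as np

def cov_xy(k):
    # Cov(X,Y) = E[X^2; |X|>K] - E[X^2; |X|<=K] = 4*(K*phi(K)+1-Phi(K)) - 1
    phi = math.exp(-k * k / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(k / math.sqrt(2)))
    return 4 * (k * phi + 1 - Phi) - 1

# Bisect for the root: cov is 1 at K = 0 and tends to -1 as K -> infinity.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) > 0 else (lo, mid)
K = (lo + hi) / 2

rng = np.random.default_rng(1)
X = rng.standard_normal(500_000)
Y = np.where(np.abs(X) <= K, -X, X)

print(K)                      # the critical constant, roughly 1.54
print(np.mean(X * Y))         # sample covariance, close to 0
print(np.mean(X + Y == 0))    # positive mass at 0, so X + Y is not normal
```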
Rather than arbitrary collections of real-valued variables, we can also consider random variables taking values in $\mathbb{R}^d$ for some integer $d\ge1$, in which case they are said to be multivariate normal if and only if their components are. For a random variable $X=(X^1,\ldots,X^d)$, linear combinations of its components can be expressed in the form $a^{\rm T}X=\sum_{k=1}^da_kX^k$ for $a\in\mathbb{R}^d$, giving the following definition.

Definition 5 An $\mathbb{R}^d$-valued random variable $X$ is multivariate normal (or, joint normal) if its components $X^1,\ldots,X^d$ are joint normal or, equivalently, if $a^{\rm T}X$ is normal for all $a\in\mathbb{R}^d$.
Multivariate normal distributions can be conveniently characterized by their mean $\mu$ and covariance matrix $C$. If X is an $\mathbb{R}^d$-valued random variable with integrable components, then the mean $\mu=\mathbb{E}[X]$ is a vector in $\mathbb{R}^d$ and can be written component-wise as $\mu_k=\mathbb{E}[X^k]$. If the components are square-integrable, then the covariance matrix can also be defined as a $d\times d$ matrix with components given by,

$\displaystyle C_{jk}={\rm Cov}(X^j,X^k)=\mathbb{E}[X^jX^k]-\mathbb{E}[X^j]\mathbb{E}[X^k].$

Similarly, expressing this with matrix algebra,

$\displaystyle C=\mathbb{E}\left[(X-\mu)(X-\mu)^{\rm T}\right]=\mathbb{E}\left[XX^{\rm T}\right]-\mu\mu^{\rm T}.$

For vectors $x,y\in\mathbb{R}^d$, $xy^{\rm T}$ denotes the $d\times d$ matrix with components $x_jy_k$, and expectation of random matrices is defined component-wise. The space of all real $d\times d$ matrices will be denoted by $\mathbb{R}^{d\times d}$.
Theorem 6 If X is d-dimensional multivariate normal, then there exists a unique $\mu\in\mathbb{R}^d$ and positive semidefinite $C\in\mathbb{R}^{d\times d}$ such that,

$\displaystyle \mathbb{E}[X]=\mu,\qquad\mathbb{E}\left[(X-\mu)(X-\mu)^{\rm T}\right]=C.$

These uniquely determine the distribution of X, which will be denoted by $N(\mu,C)$. Furthermore, for any $a\in\mathbb{R}^d$, then $a^{\rm T}X$ has the $N(a^{\rm T}\mu,a^{\rm T}Ca)$ distribution.
Proof: As the components of X are normal and, hence, square-integrable, the mean $\mu$ and covariance matrix $C$ are well-defined. By definition of joint normality, $a^{\rm T}X$ is normal for any $a\in\mathbb{R}^d$ and, by linearity of expectations,

$\displaystyle \mathbb{E}\left[a^{\rm T}X\right]=a^{\rm T}\mu,\qquad{\rm Var}\left(a^{\rm T}X\right)=\mathbb{E}\left[\left(a^{\rm T}(X-\mu)\right)^2\right]=a^{\rm T}Ca.$

So, $a^{\rm T}X\sim N(a^{\rm T}\mu,a^{\rm T}Ca)$ as required. Finally, as the pair $(\mu,C)$ uniquely determines the distribution of $a^{\rm T}X$ for all $a\in\mathbb{R}^d$, it uniquely determines the distribution of X. See lemma 7 below. ⬜
The proof that the mean and covariance matrix uniquely determine the distribution of a multivariate normal made use of the following simple result concerning the distribution of random d-dimensional vectors. This is not specific to normal distributions and, as it is an interesting result in its own right, I state it as a lemma here.
Lemma 7 Let X be an $\mathbb{R}^d$-valued random variable. Then, its distribution is uniquely determined by the distributions of $a^{\rm T}X$ for all $a\in\mathbb{R}^d$.
Proof: We use the fact that the distribution of X is uniquely determined by its characteristic function

$\displaystyle \varphi(a)=\mathbb{E}\left[e^{ia^{\rm T}X}\right]$

as $a$ varies over $\mathbb{R}^d$. This, however, only depends on the distribution of $a^{\rm T}X$. ⬜
Linear transformations of multivariate normals are themselves multivariate normals. This follows easily from the definition and is just an alternative statement to lemma 4 above.
Lemma 8 Let X be a d-dimensional multivariate normal and, for some positive integer r, let $A\in\mathbb{R}^{r\times d}$ and $b\in\mathbb{R}^r$. Then, $Y=AX+b$ is r-dimensional multivariate normal.

Specifically, if $X\sim N(\mu,C)$ then $Y\sim N(A\mu+b,ACA^{\rm T})$.
Proof: For any $a\in\mathbb{R}^r$, we have

$\displaystyle a^{\rm T}Y=(A^{\rm T}a)^{\rm T}X+a^{\rm T}b$

which, by definition of the multivariate normal, is normal. Hence, Y is multivariate normal.

The mean of Y is, using linearity of expectations,

$\displaystyle \mathbb{E}[Y]=A\mathbb{E}[X]+b=A\mu+b$

and, the covariance matrix is given by,

$\displaystyle \mathbb{E}\left[(Y-\mathbb{E}[Y])(Y-\mathbb{E}[Y])^{\rm T}\right]=A\,\mathbb{E}\left[(X-\mu)(X-\mu)^{\rm T}\right]A^{\rm T}=ACA^{\rm T}$

as required. ⬜
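Lemma 8 is easy to check by simulation. In the sketch below (all matrices are arbitrary illustrative choices, not from the post), X is built as an affine image of iid standard normals, so its covariance is known by construction, and the sample mean and covariance of Y = AX + b are compared with the predicted values.

```python
# Monte Carlo sketch of lemma 8: if X ~ N(mu, C) then Y = A X + b is
# multivariate normal with mean A mu + b and covariance A C A^T.
import numpy as np

rng = np.random.default_rng(2)
n = 300_000

U = np.array([[1.0, 0.0], [0.7, 0.5]])      # arbitrary factor
mu = np.array([1.0, -1.0])
C = U @ U.T                                  # covariance of X by construction

Z = rng.standard_normal((n, 2))
X = Z @ U.T + mu                             # X ~ N(mu, C)

A = np.array([[2.0, 1.0], [0.0, -1.0], [1.0, 1.0]])   # arbitrary 3x2
b = np.array([0.0, 1.0, -2.0])
Y = X @ A.T + b                              # Y ~ N(A mu + b, A C A^T)

pred_mean = A @ mu + b
pred_cov = A @ C @ A.T
print(np.allclose(Y.mean(axis=0), pred_mean, atol=0.05))
print(np.allclose(np.cov(Y.T), pred_cov, atol=0.1))
```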
The moment generating function of multivariate normals can be computed, generalizing the result for the normal distribution in the earlier post.
Lemma 9 If X has the d-dimensional $N(\mu,C)$ distribution then $e^{a^{\rm T}X}$ is integrable for all $a\in\mathbb{R}^d$ and,

$\displaystyle \mathbb{E}\left[e^{a^{\rm T}X}\right]=e^{a^{\rm T}\mu+\frac12a^{\rm T}Ca}. \qquad(1)$
Proof: For real $a\in\mathbb{R}^d$, the random variable $a^{\rm T}X$ is normal with mean $a^{\rm T}\mu$ and variance $a^{\rm T}Ca$, so (1) is just its moment generating function, computed in the earlier post. Next, for $a\in\mathbb{C}^d$, then $\vert e^{a^{\rm T}X}\vert=e^{\Re(a)^{\rm T}X}$ where $\Re(a)$ is the real part of $a$. Hence, $e^{a^{\rm T}X}$ is integrable. Furthermore, by dominated convergence, the left hand side of (1) is differentiable in each component of $a$, so is analytic. Hence, by analytic continuation, the fact that (1) holds for real $a$ means that it also holds for all $a\in\mathbb{C}^d$. ⬜
Taking $a$ to have imaginary components in (1) also gives the characteristic function for the multivariate normal as,

$\displaystyle \mathbb{E}\left[e^{ia^{\rm T}X}\right]=e^{ia^{\rm T}\mu-\frac12a^{\rm T}Ca}. \qquad(2)$
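The characteristic function identity can be verified by simulation. This sketch compares the empirical value of $\mathbb{E}[e^{ia^{\rm T}X}]$ with the closed form, for an arbitrary illustrative mean, covariance and test vector $a$.

```python
# Monte Carlo check of the multivariate normal characteristic function:
# E[exp(i a.X)] = exp(i a.mu - a^T C a / 2).
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

U = np.array([[1.0, 0.0], [-0.3, 0.8]])      # arbitrary factor
mu = np.array([0.5, 1.0])
C = U @ U.T                                   # covariance of X

X = rng.standard_normal((n, 2)) @ U.T + mu    # X ~ N(mu, C)

a = np.array([0.7, -0.4])                     # arbitrary test vector
empirical = np.mean(np.exp(1j * (X @ a)))
theoretical = np.exp(1j * (a @ mu) - 0.5 * a @ C @ a)
print(abs(empirical - theoretical))           # small, up to sampling error
```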
We obtain the following simple characterization of joint normals as those distributions whose log-characteristic function is quadratic, extending corollary 6 of the previous post to the multivariate case. Here, a function of $a\in\mathbb{R}^d$ will be said to be quadratic if it is a linear combination of the monomial terms $a_ja_k$, $a_j$, and a constant term.

Lemma 10 An $\mathbb{R}^d$-valued random variable X is joint normal if and only if its characteristic function is of the form $e^{q(a)}$, for a quadratic function $q$.
Proof: By (2), the characteristic function is of the required form when X is joint normal. Suppose, conversely, that the characteristic function is of the required form $e^{q(a)}$. Then, for any fixed $a\in\mathbb{R}^d$,

$\displaystyle \mathbb{E}\left[e^{i\lambda a^{\rm T}X}\right]=e^{q(\lambda a)}$

for $\lambda\in\mathbb{R}$. As $q(\lambda a)$ is quadratic in $\lambda$, corollary 6 of the previous post implies that $a^{\rm T}X$ is normal and, hence, X is multivariate normal. ⬜
As noted in lemma 3 above, collections of independent normal random variables are joint normal. In particular, we can consider a random vector $X=(X^1,\ldots,X^d)$ whose components are independent standard normals. The distribution of X is known as the standard normal on $\mathbb{R}^d$, which can be characterized in several ways.

Lemma 11 For an $\mathbb{R}^d$-valued random variable $X=(X^1,\ldots,X^d)$, the following are equivalent,
- The components are independent standard normal random variables.
- $X\sim N(0,I)$, where $I$ is the $d\times d$ identity matrix.
- $a^{\rm T}X$ is normal with mean zero and variance $\Vert a\Vert^2$ for all $a\in\mathbb{R}^d$.
- X has characteristic function $\mathbb{E}\left[e^{ia^{\rm T}X}\right]=e^{-\frac12\Vert a\Vert^2}$ for $a\in\mathbb{R}^d$.
- X has the probability density,

$\displaystyle p(x)=(2\pi)^{-d/2}e^{-\frac12\Vert x\Vert^2}.$
If any one (and then, all) of these conditions hold then we say that X has the standard d-dimensional normal distribution.
Proof: The equivalence of each of the listed statements is straightforward, and can be proved in many ways. For example:
1 ⇒ 2: Lemma 3 says that X is joint normal. Its components have zero mean and unit variance and, by independence, its covariances are ${\rm Cov}(X^j,X^k)=0$ for $j\ne k$. Hence, it has mean zero and covariance matrix $I$ as required.

2 ⇒ 3: By theorem 6, $a^{\rm T}X$ is normal with mean 0 and variance $a^{\rm T}Ia=\Vert a\Vert^2$.

3 ⇒ 4: The required identity follows immediately from the characteristic function of the normal random variable $a^{\rm T}X$.
4 ⇒ 1: This follows immediately from the converse statement which has already been proved, as probability distributions are uniquely determined by their characteristic functions.
1 ⇒ 5: As $X^1,\ldots,X^d$ are independent, each with the standard normal probability density $(2\pi)^{-1/2}e^{-x^2/2}$, the joint distribution is given by the product,

$\displaystyle p(x)=\prod_{k=1}^d(2\pi)^{-1/2}e^{-x_k^2/2}=(2\pi)^{-d/2}e^{-\frac12\Vert x\Vert^2}.$
5 ⇒ 1: This follows immediately from the converse statement which has already been proved, as probability distributions are uniquely determined by their density functions. ⬜
Lemmas 8 and 11 open up a straightforward method of generating $N(\mu,C)$ distributed random variables for any $\mu\in\mathbb{R}^d$ and symmetric positive semidefinite $C\in\mathbb{R}^{d\times d}$, which is useful both in the theory and for implementing practical simulation algorithms. We start by decomposing $C=UU^{\rm T}$ for some $U\in\mathbb{R}^{d\times d}$. For example, using functional calculus, we can take $U=\sqrt{C}$, which is the unique symmetric and positive semidefinite matrix satisfying $U^2=C$. Alternatively, and more practically, Cholesky decomposition can be used to determine U. Next, let $Z^1,\ldots,Z^d$ be independent standard normals, so that lemma 11 says that the vector $Z=(Z^1,\ldots,Z^d)$ has the $N(0,I)$ distribution. Lemma 8 tells us that $X=UZ+\mu$ has the $N(\mu,UU^{\rm T})=N(\mu,C)$ distribution. This is useful for the theory, showing that such distributions do indeed exist.
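This recipe translates directly into code. The following sketch (the target mean and covariance are arbitrary illustrative choices) uses a Cholesky factor for U, draws samples of $UZ+\mu$, and checks the sample mean and covariance.

```python
# Sampling N(mu, C) via the decomposition C = U U^T described above.
import numpy as np

def sample_mvn(mu, C, n, rng):
    """Draw n samples from N(mu, C) using a Cholesky factor of C."""
    U = np.linalg.cholesky(C)              # C = U U^T, U lower triangular
    Z = rng.standard_normal((n, len(mu)))  # rows of iid standard normals
    return Z @ U.T + mu                    # each row is U z + mu

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0, -1.0])            # arbitrary target mean
C = np.array([[2.0, 0.5, 0.0],             # arbitrary positive definite
              [0.5, 1.0, 0.3],             # target covariance
              [0.0, 0.3, 0.5]])

X = sample_mvn(mu, C, 400_000, rng)
print(np.allclose(X.mean(axis=0), mu, atol=0.05))
print(np.allclose(np.cov(X.T), C, atol=0.05))
```

For singular C the Cholesky factorization can fail, in which case the symmetric square root $\sqrt{C}$ from an eigendecomposition still works.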
Theorem 12 For any $\mu\in\mathbb{R}^d$ and positive semidefinite $C\in\mathbb{R}^{d\times d}$, there exists an $\mathbb{R}^d$-valued random variable (defined on some probability space) with the $N(\mu,C)$ distribution.
The multivariate normal distribution is sometimes defined by its probability density function, although this does require the covariance matrix to be nonsingular.
Lemma 13 For $\mu\in\mathbb{R}^d$ and positive semidefinite $C\in\mathbb{R}^{d\times d}$, the $N(\mu,C)$ distribution has a probability density if and only if C is nonsingular, in which case it is,

$\displaystyle p(x)=\frac{1}{\sqrt{(2\pi)^d\det(C)}}\,e^{-\frac12(x-\mu)^{\rm T}C^{-1}(x-\mu)}$

over $x\in\mathbb{R}^d$. Here, $\det(C)$ is the determinant of C.
Proof: Nonsingularity is required since, otherwise, we would have $Ca=0$ for some nonzero $a\in\mathbb{R}^d$. Then, $a^{\rm T}X$ has variance $a^{\rm T}Ca=0$, so is almost surely constant. Hence, X is supported by a hyperplane, which has zero Lebesgue measure, ruling out the existence of a probability density.
So, supposing that C is nonsingular, we write $X=UY+\mu$ where $U=\sqrt{C}$ and Y has the $N(0,I)$ distribution. Then X has the $N(\mu,C)$ distribution and Y has probability density $p_Y(y)=(2\pi)^{-d/2}e^{-\frac12\Vert y\Vert^2}$ given by lemma 11. Hence, for measurable $S\subseteq\mathbb{R}^d$,

$\displaystyle \mathbb{P}(X\in S)=\mathbb{P}(UY+\mu\in S)=\int 1_{\{Uy+\mu\in S\}}p_Y(y)\,dy.$

Making the substitution $x=Uy+\mu$ then $y=U^{-1}(x-\mu)$ and $dx=\det(U)\,dy$. As $\Vert U^{-1}(x-\mu)\Vert^2=(x-\mu)^{\rm T}C^{-1}(x-\mu)$ and $\det(U)=\sqrt{\det(C)}$ we obtain,

$\displaystyle \mathbb{P}(X\in S)=\int_S\frac{1}{\sqrt{(2\pi)^d\det(C)}}\,e^{-\frac12(x-\mu)^{\rm T}C^{-1}(x-\mu)}\,dx,$

as required. ⬜
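As a numerical check of the density formula, the sketch below evaluates it for an arbitrary illustrative two-dimensional mean and nonsingular covariance, and verifies that it integrates to approximately 1 over a large grid.

```python
# Checking the nonsingular-C density of lemma 13 in dimension d = 2:
# the formula, summed over a fine grid, should integrate to about 1.
import numpy as np

mu = np.array([0.5, -0.5])                 # arbitrary mean
C = np.array([[1.0, 0.4], [0.4, 0.5]])     # arbitrary nonsingular covariance
Cinv = np.linalg.inv(C)
norm = 1.0 / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(C))

def density(x):
    d = x - mu
    # quadratic form (x-mu)^T C^{-1} (x-mu) evaluated pointwise
    return norm * np.exp(-0.5 * np.einsum('...i,ij,...j->...', d, Cinv, d))

# Riemann sum over [-8, 8]^2, wide enough to capture almost all the mass.
g = np.linspace(-8.0, 8.0, 801)
h = g[1] - g[0]
xx, yy = np.meshgrid(g, g, indexing='ij')
vals = density(np.stack([xx, yy], axis=-1))
total = vals.sum() * h * h
print(total)   # approximately 1
```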
Let us now move back to the subject of arbitrary collections of joint normal random variables. Theorem 6 can be generalized, stating that the distribution of a joint normal collection of variables is uniquely determined by its means and covariances. Before stating this, let me clarify what is meant by the distribution of such a collection. We could simply consider the finite distributions, that is, the distribution of each finite subset of the collection. We can be a bit more sophisticated, but it works out the same.
Let us consider a collection $\{X_i\}_{i\in I}$ of real-valued variables defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ as a single variable taking values in the space of functions $\mathbb{R}^I$. Specifically, $\omega\in\Omega$ maps to the function $i\mapsto X_i(\omega)$. Next, define the coordinate maps $\pi_i\colon\mathbb{R}^I\rightarrow\mathbb{R}$ by $\pi_i(x)=x_i$. Hence, if we use $X$ to denote the collection considered as a map $\Omega\rightarrow\mathbb{R}^I$, then this is determined by $\pi_i(X)=X_i$. Next, let $\mathcal{B}$ be the sigma-algebra on $\mathbb{R}^I$ generated by the individual coordinates $\pi_i$. This is the smallest sigma-algebra making each $\pi_i$ Borel measurable or, equivalently, it is generated by sets of the form $\pi_i^{-1}(S)$ for Borel $S\subseteq\mathbb{R}$. As each $\pi_i(X)=X_i$ is Borel measurable, then $X\colon\Omega\rightarrow\mathbb{R}^I$ is measurable. The distribution of $X$ is then the induced (push-forward) measure $\mu_X$ on $(\mathbb{R}^I,\mathcal{B})$ given by

$\displaystyle \mu_X(A)=\mathbb{P}(X\in A)$

for all measurable $A\in\mathcal{B}$. I will simply refer to this as the distribution of the collection $\{X_i\}_{i\in I}$. Noting that the collection of finite intersections of sets of the form $\pi_i^{-1}(S)$ for $i\in I$ and Borel $S\subseteq\mathbb{R}$ is a pi-system generating $\mathcal{B}$, the pi-system lemma states that $\mu_X$ is uniquely determined by its restriction to such sets. Hence, the distribution of $\{X_i\}_{i\in I}$ is uniquely determined by its finite distributions. Explicitly, any two collections of random variables $\{X_i\}_{i\in I}$ and $\{Y_i\}_{i\in I}$ have the same distribution if and only if $(X_{i_1},\ldots,X_{i_n})$ and $(Y_{i_1},\ldots,Y_{i_n})$ have the same distribution for all finite sequences $i_1,\ldots,i_n\in I$.
Theorem 14 The distribution of a joint normal collection $\{X_i\}_{i\in I}$ of random variables is uniquely determined by the means $\mu_i=\mathbb{E}[X_i]$ and covariances $C_{ij}={\rm Cov}(X_i,X_j)$ over $i,j\in I$.

Proof: Given any finite sequence $i_1,\ldots,i_n\in I$, the random vector $(X_{i_1},\ldots,X_{i_n})$ is multivariate normal with means $\mu_{i_k}$ and covariances $C_{i_ji_k}$ which, by theorem 6, uniquely determines its distribution. ⬜
For example, Brownian motion is a joint normal process, so its distribution is determined by the covariances.
Example 2 Standard Brownian motion $\{X_t\}_{t\ge0}$ is a continuous stochastic process which is jointly normal with zero mean and covariances,

$\displaystyle {\rm Cov}(X_s,X_t)=s\wedge t.$

By definition, for times $0=t_0\le t_1\le\cdots\le t_n$, the increments

$\displaystyle X_{t_k}-X_{t_{k-1}},\qquad k=1,\ldots,n,$

are independent normals and, as the process values are obtained by summing these increments, X is joint normal. By definition, X has zero mean and, for times $s\le t$, then $X_s$ has variance s and is independent of $X_t-X_s$, giving the covariance,

$\displaystyle {\rm Cov}(X_s,X_t)=\mathbb{E}[X_sX_t]=\mathbb{E}[X_s(X_t-X_s)]+\mathbb{E}[X_s^2]=s=s\wedge t.$
These means and covariances are sufficient to uniquely determine all finite distributions of Brownian motion, by theorem 14. The additional constraint that its sample paths are continuous need not even be a measurable event, so is not guaranteed by the finite distributions. For this reason, continuity is stated as an axiom in the definition, separately from the distribution of X.
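The covariance structure of example 2 can be checked by simulating Brownian motion on a discrete grid as cumulative sums of independent normal increments; the grid resolution and the test times below are arbitrary choices.

```python
# Sketch of example 2: Brownian paths by cumulative sums of increments,
# with Monte Carlo verification that Cov(X_s, X_t) = min(s, t).
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_steps, T = 200_000, 100, 1.0
dt = T / n_steps

dX = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
X = np.cumsum(dX, axis=1)                 # X[:, k-1] approximates X_{k dt}

def cov_at(s, t):
    i, j = round(s / dt) - 1, round(t / dt) - 1
    return np.mean(X[:, i] * X[:, j])     # zero mean, so cov = E[X_s X_t]

print(cov_at(0.3, 0.7))   # approximately min(0.3, 0.7) = 0.3
print(cov_at(0.5, 0.5))   # approximately 0.5
```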
We note that the covariances $C_{ij}={\rm Cov}(X_i,X_j)$ in theorem 14 are symmetric and, for any finite sequence $i_1,\ldots,i_n\in I$ and $\lambda\in\mathbb{R}^n$, the linear combination $\sum_k\lambda_kX_{i_k}$ is normal with mean $\sum_k\lambda_k\mu_{i_k}$ and variance $\sum_{j,k}\lambda_j\lambda_kC_{i_ji_k}$. In particular, as variances are nonnegative,

$\displaystyle \sum_{j,k=1}^n\lambda_j\lambda_kC_{i_ji_k}\ge0.$
This inequality is expressed by saying that C is positive semidefinite, and is sufficient to guarantee the existence of the joint normal distribution.
Theorem 15 Let $\{\mu_i\}_{i\in I}$ and $\{C_{ij}\}_{i,j\in I}$ be real numbers such that C is symmetric and positive semidefinite. Then, there exists a joint normal collection $\{X_i\}_{i\in I}$ of random variables (defined on some probability space) with means $\mathbb{E}[X_i]=\mu_i$ and covariances ${\rm Cov}(X_i,X_j)=C_{ij}$.

Proof: We will take the underlying measurable space to be $(\mathbb{R}^I,\mathcal{B})$ as described above. Then, for each finite $J\subseteq I$, let $\mathcal{B}_J$ be the sigma-algebra generated by $\pi_i$ over $i\in J$. Theorem 12 guarantees the existence of a unique probability measure $\mathbb{P}_J$ on $(\mathbb{R}^I,\mathcal{B}_J)$ such that $\{\pi_i\}_{i\in J}$ are joint normal with means $\mu_i$ and covariances $C_{ij}$. For any pair of finite sets $J\subseteq K\subseteq I$, then $\mathbb{P}_J$ and $\mathbb{P}_K$ agree on $\mathcal{B}_J$, since they both make $\{\pi_i\}_{i\in J}$ joint normal with the same means and covariances. The Kolmogorov extension theorem then guarantees the existence of a unique probability measure $\mathbb{P}$ on $(\mathbb{R}^I,\mathcal{B})$ restricting to $\mathbb{P}_J$ on $\mathcal{B}_J$ for all finite $J\subseteq I$. Then, under this measure, $X_i=\pi_i$ satisfies the requirements of the theorem since all finite subsets are joint normal with the required means and covariances. ⬜
Theorem 15 can alternatively be expressed using inner product spaces, which has the advantage that positive semidefiniteness is automatically guaranteed. Recall that a semi-inner product on a real vector space V is a map $V\times V\rightarrow\mathbb{R}$, $(u,v)\mapsto\langle u,v\rangle$, satisfying linearity, symmetry and positivity,

$\displaystyle \langle\lambda u+\mu v,w\rangle=\lambda\langle u,w\rangle+\mu\langle v,w\rangle,\qquad\langle u,v\rangle=\langle v,u\rangle,\qquad\langle v,v\rangle\ge0,$

for all $u,v,w\in V$ and $\lambda,\mu\in\mathbb{R}$. To be a true inner product, the positive definite property that $\langle v,v\rangle$ is strictly positive for nonzero v should also hold, but this does not matter here, and we consider semi-inner product spaces, which consist of a real vector space V together with a semi-inner product.
Theorem 16 Let V be a semi-inner product space. Then there exists a joint normal collection $\{B(v)\}_{v\in V}$ of random variables (defined on some probability space) with zero mean and covariances

$\displaystyle {\rm Cov}(B(u),B(v))=\langle u,v\rangle.$

Furthermore, this uniquely determines the joint distribution of $\{B(v)\}_{v\in V}$.
Proof: For any finite sequence $v_1,\ldots,v_n\in V$ and $\lambda\in\mathbb{R}^n$ then, by linearity and positivity,

$\displaystyle \sum_{j,k=1}^n\lambda_j\lambda_k\langle v_j,v_k\rangle=\Big\langle\sum_j\lambda_jv_j,\sum_k\lambda_kv_k\Big\rangle\ge0,$

so the covariances are symmetric and positive semidefinite, and existence follows from theorem 15, with uniqueness of the distribution given by theorem 14. ⬜
In the statement of theorem 16, it would be natural to require the map $v\mapsto B(v)$ to be linear. This was not done, however, as it is automatic, in an almost sure sense. This gives a second method of characterizing the distribution of $\{B(v)\}_{v\in V}$, which uses linearity but does not explicitly require joint normality.
Lemma 17 Let V be a semi-inner product space. Then, a collection $\{B(v)\}_{v\in V}$ of random variables satisfies the conclusion of theorem 16 if and only if,
- (linearity) $B(\lambda u+\mu v)=\lambda B(u)+\mu B(v)$ almost surely, for all $u,v\in V$ and $\lambda,\mu\in\mathbb{R}$.
- $B(v)$ is normal with mean 0 and variance $\langle v,v\rangle$ for all $v\in V$.
Proof: First, suppose that the conclusion of theorem 16 holds. Then, for any $v\in V$, $B(v)$ is normal with mean zero and variance $\langle v,v\rangle$ as required. Also, for $u,v,w\in V$ and $\lambda,\mu\in\mathbb{R}$,

$\displaystyle {\rm Cov}\left(B(\lambda u+\mu v)-\lambda B(u)-\mu B(v),B(w)\right)=\langle\lambda u+\mu v,w\rangle-\lambda\langle u,w\rangle-\mu\langle v,w\rangle=0.$

In particular, as this holds for $w=\lambda u+\mu v$, $w=u$, and $w=v$, then it holds with $B(w)$ replaced by $B(\lambda u+\mu v)-\lambda B(u)-\mu B(v)$, showing that this has zero variance so is equal to zero almost surely.
Conversely, suppose that the properties of the lemma hold. Then, for finite sequences $v_1,\ldots,v_n\in V$ and $\lambda\in\mathbb{R}^n$,

$\displaystyle \sum_{k=1}^n\lambda_kB(v_k)=B\Big(\sum_{k=1}^n\lambda_kv_k\Big)$

(almost surely) is normal and, hence, $\{B(v)\}_{v\in V}$ is joint normal. Also, for $u,v\in V$, the covariances can be computed using linearity,

$\displaystyle {\rm Cov}(B(u),B(v))=\frac12\left({\rm Var}\left(B(u+v)\right)-{\rm Var}(B(u))-{\rm Var}(B(v))\right)=\frac12\left(\langle u+v,u+v\rangle-\langle u,u\rangle-\langle v,v\rangle\right)=\langle u,v\rangle,$

as required. ⬜
As an example of the use of theorem 16, we show how it can be used to construct standard Brownian motion. This approach was applied in the post on the Kolmogorov continuity theorem to construct Brownian motion on a multidimensional time index set, such as the Brownian sheet.
Example 3 Let V be the space $L^2(\lambda)$ with inner product $\langle f,g\rangle=\int fg\,d\lambda$, where $\lambda$ is the Lebesgue measure on $\mathbb{R}_+$. Let $\{B(f)\}_{f\in V}$ be a joint normal collection of random variables with zero mean and covariances ${\rm Cov}(B(f),B(g))=\langle f,g\rangle$.

Then, (a continuous version of) $B_t=B(1_{[0,t]})$ is standard Brownian motion. Furthermore,

$\displaystyle B(f)=\int_0^\infty f\,dB \qquad(3)$

almost surely, for all $f\in V$.
With $B_t=B(1_{[0,t]})$ as above, the covariances can be computed,

$\displaystyle {\rm Cov}(B_s,B_t)=\langle1_{[0,s]},1_{[0,t]}\rangle=\int1_{[0,s]}1_{[0,t]}\,d\lambda=s\wedge t.$

Hence, B has the same distribution as standard Brownian motion. Furthermore, (3) holds for functions of the form $f=1_{[0,t]}$ by construction so, by linearity, it holds for all linear combinations of such functions. As both sides of (3) are isometries from $V$ to $L^2(\mathbb{P})$ (the left hand side by definition and the right hand side by the Ito isometry), it holds on all of $V$.
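Restricted to step functions, the isometry in example 3 can be checked directly by simulation: for a step function f, B(f) is a linear combination of independent Brownian increments, and its variance should equal the squared norm $\langle f,f\rangle$. The particular partition and step values below are arbitrary illustrative choices.

```python
# Sketch of example 3 for step functions: if f takes the value c_k on
# (t_{k-1}, t_k], then B(f) = sum_k c_k (B_{t_k} - B_{t_{k-1}}) should
# be normal with mean 0 and variance ||f||^2 = sum_k c_k^2 (t_k - t_{k-1}),
# matching the Ito isometry.
import numpy as np

rng = np.random.default_rng(6)
n_paths = 300_000

t = np.array([0.0, 0.5, 1.2, 2.0])     # partition points (arbitrary)
c = np.array([1.0, -2.0, 0.5])         # value of f on each interval

# Independent Brownian increments over the partition intervals.
incr = rng.standard_normal((n_paths, 3)) * np.sqrt(np.diff(t))
Bf = incr @ c                           # B(f) for each simulated path

norm_sq = (c ** 2) @ np.diff(t)         # ||f||^2 in L^2
print(Bf.mean())                        # approximately 0
print(Bf.var(), norm_sq)                # approximately equal
```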