Multivariate Normal Distributions

I looked at normal random variables in an earlier post, but what does it mean for a sequence of real-valued random variables ${X_1,X_2,\ldots,X_n}$ to be jointly normal? We could simply require each of them to be normal, but this says very little about their joint distribution and is not much help in handling expressions involving more than one of the ${X_i}$ at once. In the case where the random variables are independent, the following result is a very useful property of the normal distribution. All random variables in this post will be real-valued, except where stated otherwise, and we assume that they are defined with respect to some underlying probability space ${(\Omega,\mathcal F,{\mathbb P})}$.

Lemma 1 Linear combinations of independent normal random variables are again normal.

Proof: More precisely, if ${X_1,\ldots,X_n}$ is a sequence of independent normal random variables and ${a_1,\ldots,a_n}$ are real numbers, then ${Y=a_1X_1+\cdots+a_nX_n}$ is normal. Let us suppose that ${X_k}$ has mean ${\mu_k}$ and variance ${\sigma_k^2}$. Then, the characteristic function of Y can be computed using the independence property and the characteristic functions of the individual normals,

 \displaystyle \begin{aligned} {\mathbb E}\left[e^{i\lambda Y}\right] &={\mathbb E}\left[\prod_ke^{i\lambda a_k X_k}\right] =\prod_k{\mathbb E}\left[e^{i\lambda a_k X_k}\right]\\ &=\prod_ke^{-\frac12\lambda^2a_k^2\sigma_k^2+i\lambda a_k\mu_k} =e^{-\frac12\lambda^2\sigma^2+i\lambda\mu} \end{aligned}

where we have set ${\mu=\sum_ka_k\mu_k}$ and ${\sigma^2=\sum_ka_k^2\sigma_k^2}$. This is the characteristic function of a normal random variable with mean ${\mu}$ and variance ${\sigma^2}$. ⬜
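As a quick numerical illustration of lemma 1 (a NumPy sketch of my own, not part of the proof; all names and parameters are arbitrary), we can simulate a linear combination of independent normals and check that its sample mean and variance agree with ${\mu=\sum_ka_k\mu_k}$ and ${\sigma^2=\sum_ka_k^2\sigma_k^2}$:

```python
import numpy as np

# Simulate Y = a_1 X_1 + ... + a_n X_n for independent X_k ~ N(mu_k, sigma_k^2)
# and compare with mean sum_k a_k mu_k and variance sum_k a_k^2 sigma_k^2.
rng = np.random.default_rng(0)
mus = np.array([1.0, -2.0, 0.5])
sigmas = np.array([0.5, 1.5, 2.0])
a = np.array([2.0, -1.0, 3.0])

n = 200_000
X = rng.normal(mus, sigmas, size=(n, 3))  # independent normal columns
Y = X @ a

mu = a @ mus                 # predicted mean: 5.5
sigma2 = a**2 @ sigmas**2    # predicted variance: 39.25
```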

The definition of joint normal random variables will include the case of independent normals, so that any linear combination is also normal. We use this result as the defining property for the general multivariate normal case.

Definition 2 A collection ${\{X_i\}_{i\in I}}$ of real-valued random variables is multivariate normal (or joint normal) if and only if all of its finite linear combinations are normal.

Lemma 1 immediately shows that any independent set of normals is joint normal.

Lemma 3 Any collection ${\{X_i\}_{i\in I}}$ of independent normal random variables is joint normal.

It also follows quickly from the definition that any collection of linear combinations of joint normals is itself joint normal.

Lemma 4 Let ${\{X_i\}_{i\in I}}$ be a joint normal collection of random variables. Then, any collection ${\{Y_j\}_{j\in J}}$ of finite linear combinations of the ${X_i}$ is again joint normal.

Proof: Any finite linear combination of the ${Y_j}$ is also a finite linear combination of the ${X_i}$ and, so, is normal. ⬜

A natural question to ask is whether, given a collection of two or more normal random variables, we should expect them to be joint normal. It does not take much consideration to see that this need not be the case, and that joint normality is a very special situation. Consider a pair of random variables X,Y with probability density function ${p(x,y)}$, which has two degrees of freedom. Normality of the individual random variables only constrains the one-dimensional marginal densities ${\int p(x,y)dy}$ and ${\int p(x,y)dx}$. However, as we will see, in addition to the marginals, if X and Y were jointly normal then their distribution would be completely specified by the additional knowledge of their covariance, which is just a single real number. So, given the marginal distributions and covariance, we still have two degrees of freedom remaining to describe the possible joint distributions of X and Y, with only one single distribution corresponding to joint normality. In probability theory, we frequently deal with joint normal collections of random variables, so it is easy to start to believe that this is the usual situation, but it is not. As just explained, there is a lot of freedom in how we can construct counterexamples to demonstrate this. I give an example where the covariance

 \displaystyle \begin{aligned} {\rm Cov}(X,Y) &={\mathbb E}[(X-{\mathbb E} X)(Y-{\mathbb E} Y)]\\ &={\mathbb E}[XY]-{\mathbb E}[X]{\mathbb E}[Y] \end{aligned}

is zero. For joint normal random variables, this condition is sufficient to ensure that they are independent, so the example also demonstrates that joint normality is required for this implication.

Example 1 A pair of standard normal random variables X,Y which have zero covariance, but ${X+Y}$ is not normal.

Although there is considerable flexibility in constructing random variables with properties as in the example, consider the following. Let X be standard normal and set ${Y={\rm sgn}(\lvert X\rvert-K)X}$ for some fixed positive constant K. The fact that the standard normal distribution is symmetric, ${-X\overset d= X}$, is sufficient to show that Y has the same distribution as X. For any measurable ${f\colon{\mathbb R}\rightarrow{\mathbb R}_+}$,

 \displaystyle \begin{aligned} {\mathbb E}[f(Y)] &={\mathbb E}[1_{\{\lvert X\rvert \ge K\}}f(X)]+{\mathbb E}[1_{\{\lvert X\rvert < K\}}f(-X)]\\ &={\mathbb E}[1_{\{\lvert X\rvert \ge K\}}f(X)]+{\mathbb E}[1_{\{\lvert X\rvert < K\}}f(X)]\\ &={\mathbb E}[f(X)]. \end{aligned}

On the other hand, ${X+Y=0}$ whenever ${\lvert X\rvert< K}$, which has positive probability, yet ${X+Y}$ is not identically equal to 0 and, so, it is not normal. The covariance is computed as,

 $\displaystyle {\mathbb E}[XY]={\mathbb E}[{\rm sgn}(\lvert X\rvert - K)X^2].$

Since this is continuous in K, tends to -1 as K goes to infinity, and equals 1 at ${K=0}$, the intermediate value theorem says that there exists some positive real K making the covariance zero.
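To make example 1 concrete (a numerical sketch of my own, under the setup above), integration by parts gives ${{\mathbb E}[{\rm sgn}(\lvert X\rvert-K)X^2]=3-4\Phi(K)+4K\varphi(K)}$ in terms of the standard normal density ${\varphi}$ and distribution function ${\Phi}$. This is decreasing in K, so the root can be located by bisection and the claimed properties of Y checked by simulation:

```python
import math
import numpy as np

# cov(K) = E[sgn(|X|-K) X^2] = 3 - 4*Phi(K) + 4*K*phi(K), decreasing
# from 1 at K = 0 to -1 as K -> infinity.  Find the root by bisection,
# then verify the properties of Y = sgn(|X|-K) X by simulation.
def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cov(K):
    return 3.0 - 4.0 * Phi(K) + 4.0 * K * phi(K)

lo, hi = 0.0, 5.0                  # cov(0) = 1 > 0 > cov(5)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov(mid) > 0 else (lo, mid)
K = 0.5 * (lo + hi)

rng = np.random.default_rng(1)
X = rng.standard_normal(500_000)
Y = np.sign(np.abs(X) - K) * X     # Y is standard normal, Cov(X,Y) ~ 0
```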

Rather than arbitrary collections of real-valued variables, we can also consider random variables taking values in ${{\mathbb R}^d}$ for some integer ${d\ge1}$, in which case they are said to be multivariate normal if and only if their components are. For a random variable ${X=(X_1,\ldots,X_d)}$, linear combinations of its components ${X_k}$ can be expressed in the form ${a\cdot X}$ for ${a\in{\mathbb R}^d}$, giving the following definition.

Definition 5 An ${{\mathbb R}^d}$-valued random variable ${X=(X_1,X_2,\ldots,X_d)}$ is multivariate normal (or, joint normal) if its components ${X_k}$ are joint normal or, equivalently, if ${a\cdot X}$ is normal for all ${a\in{\mathbb R}^d}$.

Multivariate normal distributions can be conveniently characterized by their mean ${\mu={\mathbb E}[X]}$ and covariance matrix ${C={\rm Cov}(X,X)}$. If X is an ${{\mathbb R}^d}$-valued random variable with integrable components, then the mean is a vector in ${{\mathbb R}^d}$ and can be written component-wise as ${\mu_k={\mathbb E}[X_k]}$. If the components are square-integrable, then the covariance matrix can also be defined as a ${d\times d}$ matrix with components given by,

 \displaystyle \begin{aligned} {\rm Cov}(X,X)_{jk} &={\rm Cov}(X_j,X_k)\\ &={\mathbb E}[X_jX_k]-\mu_j\mu_k. \end{aligned}

Similarly, expressing this with matrix algebra,

 \displaystyle \begin{aligned} {\rm Cov}(X,X) &={\mathbb E}[(X-\mu)(X^T-\mu^T)]\\ &={\mathbb E}[XX^T]-\mu\mu^T. \end{aligned}

For vectors ${x\in{\mathbb R}^d}$ and ${y\in{\mathbb R}^r}$ then ${xy^T}$ denotes the ${d\times r}$ matrix with components ${x_jy_k}$, and expectation of random matrices is defined component-wise. The space of all real ${d\times r}$ matrices will be denoted by ${{\mathbb R}^{d\times r}}$.
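The two expressions for the covariance matrix agree exactly for empirical (biased) moments as well, which gives a simple NumPy check of the matrix identity (the sample and its size are arbitrary choices of mine):

```python
import numpy as np

# Cov(X, X) = E[X X^T] - mu mu^T: for empirical (biased) moments of any
# sample this identity holds exactly, matching numpy's own estimator.
rng = np.random.default_rng(2)
x = rng.normal(size=(1000, 3))             # 1000 draws of a 3-vector
mu = x.mean(axis=0)
C = (x.T @ x) / len(x) - np.outer(mu, mu)  # E[XX^T] - mu mu^T, empirically

C_np = np.cov(x, rowvar=False, bias=True)  # numpy's biased sample covariance
```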

Theorem 6 If X is d-dimensional multivariate normal, then there exists a unique ${\mu\in{\mathbb R}^d}$ and positive semidefinite ${C\in{\mathbb R}^{d\times d}}$ such that,

 \displaystyle \begin{aligned} &\mu={\mathbb E}[X],\\ &C={\rm Cov}(X,X). \end{aligned}

These uniquely determine the distribution of X, which will be denoted by ${N(\mu,C)}$. Furthermore, for any ${a\in{\mathbb R}^d}$, ${a\cdot X\overset d= N(a\cdot\mu,a^TC a)}$.

Proof: As the components of X are normal and, hence, square-integrable, the mean and covariance matrix are well-defined. By definition of joint normality, ${a\cdot X}$ is normal for any ${a\in{\mathbb R}^d}$ and, by linearity of expectations,

 \displaystyle \begin{aligned} {\mathbb E}[a\cdot X] &=a\cdot{\mathbb E}[X]=a\cdot\mu,\\ {\rm Var}(a\cdot X) &={\rm Cov}(a\cdot X,a\cdot X)\\ &={\mathbb E}[a^TXX^Ta]-{\mathbb E}[a^T X]{\mathbb E}[X^Ta]\\ &=a^TCa. \end{aligned}

So, ${a\cdot X\overset d=N(a\cdot\mu,a^TCa)}$ as required. Finally, as ${\mu}$ and ${C}$ uniquely determine the distribution of ${a\cdot X}$ for all ${a\in{\mathbb R}^d}$, they uniquely determine the distribution of X. See lemma 7 below. ⬜

The proof that the mean and covariance matrix uniquely determine the distribution of a multivariate normal made use of the following simple result concerning the distribution of random d-dimensional vectors. This is not specific to normal distributions and, as it is an interesting result in its own right, I state it as a lemma here.

Lemma 7 Let X be an ${{\mathbb R}^d}$-valued random variable. Then, its distribution is uniquely determined by the distributions of ${a\cdot X}$ for all ${a\in{\mathbb R}^d}$.

Proof: We use the fact that the distribution of X is uniquely determined by its characteristic function

 $\displaystyle a\mapsto{\mathbb E}[e^{ia\cdot X}]$

as ${a}$ varies over ${{\mathbb R}^d}$. This, however, only depends on the distribution of ${a\cdot X}$. ⬜

Linear transformations of multivariate normals are themselves multivariate normals. This follows easily from the definition and is just an alternative statement to lemma 4 above.

Lemma 8 Let X be a d-dimensional multivariate normal and, for some ${r\ge1}$, let ${A\in{\mathbb R}^{r\times d}}$ and ${b\in{\mathbb R}^r}$. Then, ${Y\equiv AX+b}$ is r-dimensional multivariate normal.

Specifically, if ${X\overset d=N(\mu,C)}$ then ${Y\overset d=N(A\mu+b,AC A^T)}$.

Proof: For any ${a\in{\mathbb R}^r}$, we have

 $\displaystyle a\cdot Y=(A^Ta)\cdot X+a\cdot b$

which, by definition of the multivariate normal, is normal. Hence, Y is multivariate normal.

The mean of Y is, using linearity of expectations,

 $\displaystyle {\mathbb E}[Y]=A{\mathbb E}[X]+b=A\mu+b$

and, the covariance matrix is given by,

 \displaystyle \begin{aligned} {\rm Cov}(Y,Y) &={\rm Cov}(AX+b,AX+b)\\ &={\rm Cov}(AX,AX)=A{\rm Cov}(X,X)A^T \end{aligned}

as required. ⬜
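A simulation sketch of lemma 8 (my own illustration, with arbitrarily chosen ${\mu}$, ${C}$, ${A}$ and ${b}$): sampling ${X\overset d=N(\mu,C)}$ and transforming by ${Y=AX+b}$ should reproduce mean ${A\mu+b}$ and covariance ${ACA^T}$ up to sampling error:

```python
import numpy as np

# Check lemma 8 by Monte Carlo: if X ~ N(mu, C) then Y = AX + b has
# mean A mu + b and covariance matrix A C A^T.
rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
C = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])  # maps R^2 -> R^3
b = np.array([0.0, 1.0, -2.0])

X = rng.multivariate_normal(mu, C, size=200_000)
Y = X @ A.T + b

mean_Y = Y.mean(axis=0)
cov_Y = np.cov(Y, rowvar=False)
```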

The moment generating function of multivariate normals can be computed, generalizing the result for the normal distribution in the earlier post.

Lemma 9 If X has the d-dimensional ${N(\mu,C)}$ distribution then ${\exp(a\cdot X)}$ is integrable for all ${a\in{\mathbb C}^d}$ and,

 $\displaystyle {\mathbb E}\left[e^{a\cdot X}\right]=e^{\frac12 a^TCa+a\cdot \mu}.$ (1)

Proof: For ${a\in{\mathbb R}^d}$, ${a\cdot X}$ is normal with mean ${a\cdot\mu}$ and variance ${a^TCa}$, so (1) follows from theorem 2 of the post on the normal distribution.

Next, for ${a\in{\mathbb C}^d}$, ${\lvert e^{a\cdot X}\rvert=e^{b\cdot X}}$ where ${b}$ is the real part of ${a}$. Hence, ${e^{a\cdot X}}$ is integrable. Furthermore, by dominated convergence, the left hand side of (1) is differentiable in ${a}$ with derivative ${{\mathbb E}[Xe^{a\cdot X}]}$. Hence, by analytic continuation, the fact that (1) holds for ${a\in{\mathbb R}^d}$ means that it also holds for all ${a\in{\mathbb C}^d}$. ⬜

Taking ${a}$ to have imaginary components in (1) also gives the characteristic function for the multivariate normal as,

 $\displaystyle {\mathbb E}\left[e^{ia\cdot X}\right]=e^{-\frac12a^TCa+ia^T\mu}.$ (2)
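Formula (2) can be checked by Monte Carlo, comparing the empirical characteristic function with the closed form (an illustrative sketch; the parameters are arbitrary choices of mine):

```python
import numpy as np

# Compare the empirical characteristic function of N(mu, C) samples
# with the closed form exp(-a^T C a / 2 + i a^T mu) from formula (2).
rng = np.random.default_rng(4)
mu = np.array([0.5, -1.0])
C = np.array([[1.0, 0.3], [0.3, 2.0]])
X = rng.multivariate_normal(mu, C, size=400_000)

a = np.array([0.7, -0.2])
empirical = np.mean(np.exp(1j * (X @ a)))
exact = np.exp(-0.5 * (a @ C @ a) + 1j * (a @ mu))
```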

We obtain the following simple characterisation of joint normals as those distributions whose log-characteristic function is quadratic, extending corollary 6 of the previous post to the multivariate case. Here, a function ${q\colon{\mathbb R}^d\rightarrow{\mathbb C}}$ will be said to be quadratic if ${q(x)}$ is a linear combination of the monomial terms ${x_jx_k}$, ${x_j}$, and a constant term.

Lemma 10 An ${{\mathbb R}^d}$-valued random variable X is joint normal if and only if its characteristic function is of the form ${{\mathbb E}[e^{ia\cdot X}]=e^{q(a)}}$, for a quadratic ${q(\cdot)}$.

Proof: By (2), the characteristic function is of the required form when X is joint normal. Suppose, conversely, that the characteristic function is of the required form. Then, for any fixed ${a\in{\mathbb R}^d}$,

 $\displaystyle {\mathbb E}\left[e^{i\lambda a\cdot X}\right]=e^{q(\lambda a)}$

for ${\lambda\in{\mathbb R}}$. As ${q(\lambda a)}$ is quadratic in ${\lambda}$, corollary 6 of the previous post implies that ${a\cdot X}$ is normal and, hence, X is multivariate normal. ⬜

As noted in lemma 3 above, collections of independent normal random variables are joint normal. In particular, we can consider a random vector ${X=(X_1,\ldots,X_d)}$ whose components are independent standard normals. The distribution of X is known as the standard normal on ${{\mathbb R}^d}$, which can be characterized in several ways.

Lemma 11 For an ${{\mathbb R}^d}$-valued random variable ${X=(X_1,\ldots,X_d)}$, the following are equivalent,

1. The components ${X_k}$ are independent standard normal random variables.
2. ${X\overset d=N(0,I_d)}$, where ${I_d}$ is the ${d\times d}$ identity matrix.
3. ${a\cdot X}$ is normal with mean zero and variance ${\lVert a\rVert^2}$ for all ${a\in{\mathbb R}^d}$.
4. X has characteristic function ${{\mathbb E}[e^{ia\cdot X}]=e^{-\frac12\lVert a\rVert^2}}$.
5. X has the probability density,
 $\displaystyle p(x)=(2\pi)^{-\frac d2}e^{-\frac12\lVert x\rVert^2}$

over ${x\in{\mathbb R}^d}$.

If any one (and then all) of these conditions holds then we say that X has the standard d-dimensional normal distribution.

Proof: The equivalence of each of the listed statements is straightforward, and can be proved in many ways. For example:

1 ⇒ 2: Lemma 3 says that X is joint normal. Its components have zero mean and unit variance and, by independence, the covariances are ${{\mathbb E}[X_jX_k]=0}$ for ${j\not=k}$. Hence, X has mean zero and covariance matrix ${I_d}$ as required.

2 ⇒ 3: By theorem 6, ${a\cdot X}$ is normal with mean 0 and variance ${a^TI_da=\lVert a\rVert^2}$.

3 ⇒ 4: The required identity follows immediately from the characteristic function of the normal random variable ${a\cdot X}$.

4 ⇒ 1: This follows immediately from the converse statement which has already been proved, as probability distributions are uniquely determined by their characteristic functions.

1 ⇒ 5: As ${X_1,\ldots,X_d}$ are independent, each with the standard normal probability density ${\varphi(x)=\frac1{\sqrt{2\pi}}e^{-\frac12x^2}}$, the joint probability density is given by the product,

 $\displaystyle p(x)=\prod_{j=1}^d\varphi(x_j)$

as required.

5 ⇒ 1: This follows immediately from the converse statement which has already been proved, as probability distributions are uniquely determined by their density functions. ⬜

Lemmas 8 and 11 open up a straightforward method of generating ${N(\mu,C)}$ distributed random variables for any ${\mu\in{\mathbb R}^d}$ and symmetric positive semidefinite ${C\in{\mathbb R}^{d\times d}}$, which is useful both in the theory and for implementing practical simulation algorithms. We start by decomposing ${C=UU^T}$ for some ${U\in{\mathbb R}^{d\times d}}$. For example, using functional calculus, we can take ${U=\sqrt{C}}$, which is the unique symmetric and positive semidefinite matrix satisfying ${U^2=C}$. Alternatively, and more practically, Cholesky decomposition can be used to determine U. Next, let ${X_1,X_2,\ldots,X_d}$ be independent standard normals so that lemma 11 says that the vector ${X=(X_1,X_2,\ldots,X_d)}$ has the ${N(0,I_d)}$ distribution. Lemma 8 tells us that ${UX+\mu}$ has the ${N(\mu,C)}$ distribution. This is useful for the theory, showing that such distributions do indeed exist.
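The sampling method just described can be sketched in a few lines of NumPy (an illustration of my own, not from the post; note that `np.linalg.cholesky` requires C to be strictly positive definite, while for singular C the square root ${U=\sqrt C}$ from an eigendecomposition can be used instead):

```python
import numpy as np

# Factor C = U U^T by Cholesky, draw rows of independent standard
# normals, and map each through x -> U x + mu, which is N(mu, C) by
# lemmas 8 and 11.  (Cholesky needs C strictly positive definite; for
# singular C, use the matrix square root from an eigendecomposition.)
def sample_multivariate_normal(mu, C, n, rng):
    U = np.linalg.cholesky(C)              # lower triangular, C = U U^T
    X = rng.standard_normal((n, len(mu)))  # rows ~ N(0, I_d)
    return X @ U.T + mu                    # rows ~ N(mu, C)

rng = np.random.default_rng(5)
mu = np.array([1.0, 2.0, -1.0])
C = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
samples = sample_multivariate_normal(mu, C, 300_000, rng)
```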

Theorem 12 For any ${\mu\in{\mathbb R}^d}$ and positive semidefinite ${C\in{\mathbb R}^{d\times d}}$, there exists an ${{\mathbb R}^d}$-valued random variable (defined on some probability space) with the ${N(\mu,C)}$ distribution.

The multivariate normal distribution is sometimes defined by its probability density function, although this does require the covariance matrix to be nonsingular.

Lemma 13 For ${\mu\in{\mathbb R}^d}$ and positive semidefinite ${C\in{\mathbb R}^{d\times d}}$, the ${N(\mu,C)}$ distribution has a probability density if and only if C is nonsingular, in which case it is,

 $\displaystyle p(x)=(2\pi)^{-\frac d2}\lvert C\rvert^{-\frac12}e^{-\frac12 (x-\mu)^TC^{-1}(x-\mu)}$

over ${x\in{\mathbb R}^d}$. Here, ${\lvert C\rvert}$ is the determinant of C.

Proof: Nonsingularity is required since, otherwise, we would have ${Ca=0}$ for some nonzero ${a\in{\mathbb R}^d}$. Then, ${a\cdot X}$ has variance ${a^TCa=0}$, so is almost surely constant and does not have a well-defined probability density.

So, supposing that C is nonsingular, write ${X=UY+\mu}$ where ${UU^T=C}$ and Y has the ${N(0,I_d)}$ distribution. Then, X has the ${N(\mu,C)}$ distribution and Y has probability density given by lemma 11. Hence, for measurable ${f\colon{\mathbb R}^d\rightarrow{\mathbb R}_+}$,

 \displaystyle \begin{aligned} {\mathbb E}[f(X)] &={\mathbb E}[f(UY+\mu)]\\ &=(2\pi)^{-\frac d2}\int f(Uy+\mu)e^{-\frac12\lVert y\rVert^2}dy. \end{aligned}

Making the substitution ${x=Uy+\mu}$ gives ${dx=\lvert U\rvert dy}$ and ${\lVert y\rVert^2=(x-\mu)^TC^{-1}(x-\mu)}$. As ${\lvert U\rvert=\lvert C\rvert^{1/2}}$ we obtain,

 $\displaystyle {\mathbb E}[f(X)]=(2\pi)^{-\frac d2}\lvert C\rvert^{-\frac12}\int f(x)e^{-\frac12(x-\mu)^TC^{-1}(x-\mu)}dx$

as required. ⬜
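As a numerical sanity check of the density in lemma 13 (a sketch of my own for ${d=2}$, where ${(2\pi)^{-d/2}=(2\pi)^{-1}}$, with an arbitrary nonsingular C), the hand-coded ${p(x)}$ should integrate to 1 over a grid covering essentially all of the mass:

```python
import numpy as np

# Hand-coded N(mu, C) density for d = 2 and a Riemann-sum check that it
# integrates to 1 over a grid far wider than the bulk of the mass.
mu = np.array([0.0, 1.0])
C = np.array([[1.0, 0.6], [0.6, 2.0]])
Cinv = np.linalg.inv(C)
detC = np.linalg.det(C)

h = 0.05
grid = np.arange(-8.0, 8.0, h)
U, V = np.meshgrid(grid, grid)
D = np.stack([U - mu[0], V - mu[1]], axis=-1)      # x - mu at each node
quad = np.einsum('...i,ij,...j->...', D, Cinv, D)  # (x-mu)^T C^-1 (x-mu)
p = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(detC))
total = p.sum() * h * h
```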

Let us now move back to the subject of arbitrary collections of joint normal random variables. Theorem 6 can be generalized, stating that the distribution of a joint normal collection of variables is uniquely determined by its means and covariances. Before stating this, let me clarify what is meant by the distribution of such a collection. We could simply consider the finite distributions, which are simply the distributions of the finite subsets of the collection. We can be a bit more sophisticated, but it works out the same.

Let us consider a collection ${\{X_i\}_{i\in I}}$ of real-valued variables defined on a probability space ${(\Omega,\mathcal F,{\mathbb P})}$ as a single variable taking values in the space ${{\mathbb R}^I}$ of functions ${z\colon I\rightarrow{\mathbb R}}$. Specifically, ${\omega\in\Omega}$ maps to the ${z\in{\mathbb R}^I}$ given by ${z_i=X_i(\omega)}$. Next, define the coordinate maps ${Z_i\colon{\mathbb R}^I\rightarrow{\mathbb R}}$ by ${Z_i(z)=z_i}$. Hence, if we use ${\bar X}$ to denote the collection ${\{X_i\}_{i\in I}}$ considered as a map ${\Omega\rightarrow{\mathbb R}^I}$, then this is determined by ${Z_i(\bar X)=X_i}$. Next, let ${\mathcal E}$ be the sigma-algebra generated by the individual coordinates ${Z_i}$. This is the smallest sigma-algebra making each ${Z_i}$ Borel measurable or, equivalently, is generated by sets of the form ${Z_i^{-1}(S)}$ for Borel ${S\subseteq{\mathbb R}}$. As ${Z_i(\bar X)=X_i}$ is Borel measurable, then ${\bar X\colon\Omega\rightarrow{\mathbb R}^I}$ is measurable. The distribution of ${\bar X}$ is then the induced (push-forward) measure ${\mu}$ on ${({\mathbb R}^I,\mathcal E)}$ given by

 $\displaystyle \mu(f)={\mathbb E}[f(\bar X)]$

for all measurable ${f\colon{\mathbb R}^I\rightarrow{\mathbb R}_+}$. I will simply refer to this as the distribution of the collection ${\{X_i\}_{i\in I}}$. Noting that the collection of finite intersections of sets of the form ${Z_i^{-1}(S)}$ for ${i\in I}$ and Borel ${S\subseteq{\mathbb R}}$ is a pi-system generating ${\mathcal E}$, the pi-system lemma states that ${\mu}$ is uniquely determined by its restriction to such sets. Hence, the distribution of ${\{X_i\}_{i\in I}}$ is uniquely determined by its finite distributions. Explicitly, any two collections of random variables ${\{X_i\}_{i\in I}}$ and ${\{Y_i\}_{i\in I}}$ have the same distribution if and only if ${(X_{i_1},\ldots,X_{i_n})}$ and ${(Y_{i_1},\ldots,Y_{i_n})}$ have the same distribution for all finite sequences ${i_1,\ldots,i_n\in I}$.

Theorem 14 The distribution of a joint normal collection of random variables ${\{X_i\}_{i\in I}}$ is uniquely determined by the means ${\mu_i={\mathbb E}[X_i]}$ and covariances ${C_{ij}={\rm Cov}(X_i,X_j)}$ over ${i,j\in I}$.

Proof: Given any finite sequence ${i_1,\ldots,i_n\in I}$ then the random vector ${(X_{i_1},\ldots,X_{i_n})}$ is multivariate normal with means ${{\mathbb E}[X_{i_k}]=\mu_{i_k}}$ and covariances ${{\rm Cov}(X_{i_j},X_{i_k})=C_{i_ji_k}}$ which, by theorem 6, uniquely determines its distribution. ⬜

For example, Brownian motion is a joint normal process, so its distribution is determined by the covariances.

Example 2 Standard Brownian motion ${\{X_t\}_{t\in{\mathbb R}_+}}$ is a continuous stochastic process which is jointly normal with zero mean and covariances,

 $\displaystyle {\mathbb E}[X_sX_t]=s\wedge t.$

By definition, for times ${0=t_0\le t_1 \le \cdots \le t_n}$, the increments

 $\displaystyle X_{t_1}-X_{t_0},\ldots,X_{t_n}-X_{t_{n-1}}$

are independent normals and, as the process values ${X_{t_1},\ldots,X_{t_n}}$ are obtained by summing these increments, X is joint normal. By definition, X has zero mean and, for times ${s\le t}$, ${X_s}$ has variance s and ${X_t-X_s}$ is independent of ${X_s}$, giving the covariance,

 $\displaystyle {\mathbb E}[X_sX_t]={\mathbb E}[X_s^2]+{\mathbb E}[X_s(X_t-X_s)]=s+0=s\wedge t.$

These means and covariances are sufficient to uniquely determine all finite distributions of Brownian motion, by theorem 14. The additional constraint that its sample paths ${t\mapsto X_t}$ are continuous defines an event which need not even be measurable, so continuity is not guaranteed by the finite distributions. For this reason, continuity is stated as an axiom in the definition, separately from the distribution of X.
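The covariance ${{\mathbb E}[X_sX_t]=s\wedge t}$ in example 2 can be verified by simulating Brownian motion as cumulative sums of independent normal increments (an illustrative sketch; the grid and sample sizes are arbitrary choices of mine):

```python
import numpy as np

# Simulate Brownian motion on a grid via cumulative sums of independent
# N(0, dt) increments, then estimate the covariance at s = 0.5, t = 1.5.
rng = np.random.default_rng(6)
dt, n_steps, n_paths = 0.02, 100, 50_000
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(increments, axis=1)   # B[:, k] approximates B_{(k+1)dt}

s_idx, t_idx = 24, 74               # times s = 0.5 and t = 1.5
cov_st = np.mean(B[:, s_idx] * B[:, t_idx])
```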

We note that the covariances ${C_{ij}}$ in theorem 14 are symmetric and, for any finite sequence ${i_1,\ldots,i_n\in I}$ and ${a\in{\mathbb R}^n}$, the linear combination ${\sum_{j=1}^n a_jX_{i_j}}$ is normal with mean ${\sum_{j=1}^na_j\mu_{i_j}}$ and variance ${\sum_{j,k=1}^na_ja_kC_{i_ji_k}}$. In particular, as variances are nonnegative,

 $\displaystyle \sum_{j,k=1}^na_ja_kC_{i_ji_k}\ge0.$

This inequality is expressed by saying that C is positive semidefinite, and is sufficient to guarantee the existence of the joint normal distribution.

Theorem 15 Let ${\{\mu_i\}_{i\in I}}$ and ${\{C_{ij}\}_{i,j\in I}}$ be real numbers such that C is symmetric and positive semidefinite. Then, there exists a joint normal collection of random variables ${\{X_i\}_{i\in I}}$ (defined on some probability space) with means ${{\mathbb E}[X_i]=\mu_i}$ and covariances ${{\rm Cov}(X_i,X_j)=C_{ij}}$.

Proof: We will take the underlying measurable space to be ${({\mathbb R}^I,\mathcal E)}$ as described above. Then, for each finite ${J\subseteq I}$, let ${\mathcal E_J}$ be the sigma-algebra generated by ${Z_j}$ over ${j\in J}$. Theorem 12, together with the uniqueness given by theorem 14, guarantees the existence of a unique probability measure ${\mu_J}$ on ${({\mathbb R}^I,\mathcal E_J)}$ such that ${\{Z_j\}_{j\in J}}$ are joint normal with means ${{\mathbb E}[Z_j]=\mu_j}$ and covariances ${{\rm Cov}(Z_j,Z_k)=C_{jk}}$. For any pair of finite sets ${J,K\subseteq I}$, ${\mu_J}$ and ${\mu_K}$ agree on ${\mathcal E_J\cap\mathcal E_K=\mathcal E_{J\cap K}}$, since they both restrict to ${\mu_{J\cap K}}$. The Kolmogorov extension theorem then guarantees the existence of a unique probability measure ${\mu}$ on ${({\mathbb R}^I,\mathcal E)}$ restricting to ${\mu_J}$ on ${\mathcal E_J}$ for all finite ${J\subseteq I}$. Then, under this measure, ${\{Z_i\}_{i\in I}}$ satisfies the requirements of the theorem since all finite subsets are joint normal with the required means and covariances. ⬜

Theorem 15 can alternatively be expressed using inner product spaces, which has the advantage that positive semidefiniteness is guaranteed. Recall that a semi-inner product on a real vector space V is a map ${V\times V\rightarrow{\mathbb R}}$ satisfying linearity, symmetry and positivity,

 \displaystyle \begin{aligned} &\langle u,av+bw\rangle=a\langle u,v\rangle+b\langle u,w\rangle,\\ &\langle u,v\rangle=\langle v,u\rangle,\\ &\langle v,v\rangle\ge0, \end{aligned}

for all ${u,v,w\in V}$ and ${a,b\in{\mathbb R}}$. To be a true inner product, the positive definite property that ${\langle v,v\rangle}$ is strictly positive for nonzero v should also hold, but this does not matter here, and we consider semi-inner product spaces which consist of a real vector space V together with a semi-inner product.

Theorem 16 Let V be a semi-inner product space. Then there exists a joint normal collection of random variables ${\{X(v)\}_{v\in V}}$ (defined on some probability space) with zero mean and covariances

 $\displaystyle {\mathbb E}[X(u)X(v)]=\langle u,v\rangle.$

Furthermore, this uniquely determines the joint distribution of ${\{X(v)\}_{v\in V}}$.

Proof: For any finite sequence ${v_1,\ldots,v_n\in V}$ and ${a\in{\mathbb R}^n}$, by linearity and positivity,

 $\displaystyle \sum_{j,k=1}^na_ja_k\langle v_j,v_k\rangle=\left\langle\sum_{j=1}^na_jv_j,\sum_{k=1}^na_kv_k\right\rangle\ge0.$

Hence, the existence of the random variables ${\{X(v)\}_{v\in V}}$ is given by theorem 15 and uniqueness by theorem 14. ⬜

In the statement of theorem 16, it would be natural to require ${v\mapsto X(v)}$ to be linear. This was not done, however, as it is automatic, in an almost sure sense. This gives a second method of characterizing the distribution of ${\{X(v)\}_{v\in V}}$ which uses linearity, but does not explicitly require joint normality.

Lemma 17 Let V be a semi-inner product space. Then, a collection ${\{X(v)\}_{v\in V}}$ of random variables satisfies the conclusion of theorem 16 if and only if,

• (linearity) ${X(au+bv)=aX(u)+bX(v)}$ almost surely, for all ${u,v\in V}$ and ${a,b\in{\mathbb R}}$.
• ${X(v)}$ is normal with mean 0 and variance ${\lVert v\rVert^2}$ for all ${v\in V}$.

Proof: First, suppose that the conclusion of theorem 16 holds. Then, for any ${v\in V}$, ${X(v)}$ is normal with mean zero and variance ${\langle v,v\rangle=\lVert v\rVert^2}$ as required. Also, for ${u,v,w\in V}$ and ${a,b\in{\mathbb R}}$,

 \displaystyle \begin{aligned} {\mathbb E}[X(w)(X(au+bv)-aX(u)-bX(v))] &=\langle w,au+bv\rangle-a\langle w,u\rangle-b\langle w,v\rangle\\ &=0. \end{aligned}

In particular, as this holds for ${w=au+bv}$, ${w=u}$, and ${w=v}$, then it holds with ${X(w)}$ replaced by ${X(au+bv)-aX(u)-bX(v)}$, showing that this has zero variance so is equal to zero almost surely.

Conversely, suppose that the properties of the lemma hold. Then, for finite sequences ${v_1,\ldots,v_n\in V}$ and ${a\in{\mathbb R}^n}$,

 $\displaystyle \sum_{j=1}^na_jX(v_j)=X\left(\sum_{j=1}^na_jv_j\right)$

(almost surely) is normal and, hence, ${\{X(v)\}_{v\in V}}$ is joint normal. Also, for ${u,v\in V}$, the covariances can be computed,

 \displaystyle \begin{aligned} 2{\rm Cov}(X(u),X(v)) &={\rm Var}(X(u)+X(v))-{\rm Var}(X(u))-{\rm Var}(X(v))\\ &={\rm Var}(X(u+v))-{\rm Var}(X(u))-{\rm Var}(X(v))\\ &=\lVert u+v\rVert^2-\lVert u\rVert^2-\lVert v\rVert^2\\ &=2\langle u,v\rangle \end{aligned}

as required. ⬜
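A concrete finite-dimensional instance of theorem 16 and lemma 17 (my own illustration, with arbitrary vectors u, v): take ${V={\mathbb R}^m}$ with the Euclidean inner product and set ${X(v)=v\cdot Z}$ for a standard normal m-vector Z. Then ${v\mapsto X(v)}$ is linear by construction and ${{\mathbb E}[X(u)X(v)]=\langle u,v\rangle}$:

```python
import numpy as np

# V = R^m with the dot product; X(v) = v . Z for Z ~ N(0, I_m).
# Each X(v) is then normal with variance |v|^2, the map v -> X(v) is
# exactly linear, and covariances recover the inner product.
rng = np.random.default_rng(7)
m, n = 4, 300_000
Z = rng.standard_normal((n, m))     # n independent copies of Z

u = np.array([1.0, 0.0, 2.0, -1.0])
v = np.array([0.5, 1.0, 0.0, 1.0])
Xu, Xv = Z @ u, Z @ v

cov_uv = np.mean(Xu * Xv)           # should approximate u . v = -0.5
```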

As an example of the use of theorem 16, we show how it can be used to construct standard Brownian motion. This approach was applied in the post on the Kolmogorov continuity theorem to construct Brownian motion on a multidimensional time index set, such as the Brownian sheet.

Example 3 Let V be the space ${L^2({\mathbb R}_+,\mathcal B({\mathbb R}_+),\lambda)}$ with inner product ${\langle f,g\rangle=\lambda(fg)}$, where ${\lambda}$ is the Lebesgue measure on ${{\mathbb R}_+}$. Let ${\{X(f)\}_{f\in V}}$ be a joint normal collection of random variables with zero mean and covariances ${{\mathbb E}[X(f)X(g)]=\langle f,g\rangle}$.

Then, (a continuous version of) ${B_t=X(1_{[0,t]})}$ is standard Brownian motion. Furthermore,

 $\displaystyle X(f)=\int_0^\infty f(t)dB_t$ (3)

almost surely, for ${f\in V}$.

With ${B_t}$ as above, the covariances can be computed,

 \displaystyle \begin{aligned} {\mathbb E}[B_sB_t] &=\langle1_{[0,s]},1_{[0,t]}\rangle\\ &=\int_0^\infty 1_{\{u\le s\}}1_{\{u\le t\}}du\\ &=s\wedge t. \end{aligned}

Hence, B has the same distribution as standard Brownian motion. Furthermore, (3) holds for functions of the form ${f=1_{[0,t]}}$ by construction so, by linearity, it holds for all linear combinations of such functions. As both sides of (3) are isometries in ${f}$ (the left hand side by definition and the right hand side by the Ito isometry), and linear combinations of such indicator functions are dense in V, the identity extends to all ${f\in V}$.
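Finally, a discretized sketch of example 3 (illustrative only; the grid, sample size and test functions are arbitrary choices of mine): approximating the Wiener integral ${X(f)=\int_0^\infty f\,dB}$ by a Riemann sum against simulated Brownian increments, the isometry ${{\mathbb E}[X(f)X(g)]=\langle f,g\rangle}$ can be checked numerically:

```python
import numpy as np

# Approximate X(f) = int f dB by sum_k f(t_k) dB_k over a time grid,
# and check E[X(f)X(g)] ~ <f, g> for f = 1_[0,1) and g(t) = t.
rng = np.random.default_rng(8)
dt, T, n_paths = 0.01, 2.0, 20_000
t = np.arange(0.0, T, dt)
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, len(t)))

f = np.where(t < 1.0, 1.0, 0.0)   # indicator of [0, 1)
g = t.copy()                      # g(t) = t, truncated to [0, 2)

Xf = dB @ f                       # Riemann-sum Wiener integrals
Xg = dB @ g
cov_fg = np.mean(Xf * Xg)
inner_fg = np.sum(f * g) * dt     # discretized <f, g>, close to 1/2
```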