Independence of Normals

A well-known fact about jointly normally distributed random variables is that they are independent if and only if their covariance is zero. In one direction, this statement is trivial: any independent pair of random variables has zero covariance (assuming that they are integrable, so that the covariance is well-defined). The strength of the statement is in the other direction. Knowing the value of the covariance does not, in general, tell us much about the joint distribution so, in the case that they are joint normal, the fact that it determines independence is a rather strong statement.

Theorem 1 A joint normal pair of random variables are independent if and only if their covariance is zero.

Proof: Suppose that X,Y are joint normal, such that {X\overset d= N(\mu_X,\sigma^2_X)} and {Y\overset d=N(\mu_Y,\sigma_Y^2)}, and that their covariance is c. Then, the characteristic function of {(X,Y)} can be computed as

\displaystyle  \begin{aligned} {\mathbb E}\left[e^{iaX+ibY}\right] &=e^{ia\mu_X+ib\mu_Y-\frac12(a^2\sigma_X^2+2abc+b^2\sigma_Y^2)}\\ &=e^{-abc}{\mathbb E}\left[e^{iaX}\right]{\mathbb E}\left[e^{ibY}\right] \end{aligned}

for all {(a,b)\in{\mathbb R}^2}. It is standard that a pair of random variables is independent if and only if their joint characteristic function is equal to the product of their marginal characteristic functions. Here, the factor {e^{-abc}} equals 1 for all {(a,b)} precisely when {c=0}, so X and Y are independent if and only if their covariance is zero. ⬜
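
As a numerical illustration of the factorization used in this proof (a minimal sketch, not part of the original argument, with arbitrarily chosen parameters), the following Python snippet compares the empirical joint characteristic function of a bivariate normal sample with the product of the empirical marginal characteristic functions, for zero and for nonzero covariance.

    import numpy as np

    rng = np.random.default_rng(0)

    def cf_factorization_gap(cov, a=0.7, b=-1.3, n=10**6):
        # Sample (X, Y) bivariate normal with zero means, unit variances, covariance `cov`.
        x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, cov], [cov, 1.0]], size=n).T
        # Empirical joint characteristic function at (a, b) ...
        joint = np.mean(np.exp(1j * (a * x + b * y)))
        # ... versus the product of the empirical marginal characteristic functions.
        product = np.mean(np.exp(1j * a * x)) * np.mean(np.exp(1j * b * y))
        return abs(joint - product)

    print(cf_factorization_gap(0.0))  # near zero: the joint CF factorizes
    print(cf_factorization_gap(0.8))  # clearly positive: X and Y are dependent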

To demonstrate necessity of the joint normality condition, consider the example from the previous post.

Example 1 A pair of standard normal random variables X,Y which have zero covariance but whose sum {X+Y} is not normal.

As their sum is not normal, X and Y cannot be independent, since a sum of independent normals is itself normal. This example was constructed by setting {Y={\rm sgn}(\lvert X\rvert -K)X} for some fixed {K > 0}, which is standard normal whenever X is. As explained in the previous post, the intermediate value theorem ensures that there is a unique value of K making the covariance {{\mathbb E}[XY]} equal to zero.
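
As an aside, the critical value of K is easy to compute numerically. The sketch below (illustrative code, not from the previous post, and assuming scipy is available) uses the identity {{\mathbb E}[XY]={\mathbb E}[X^2;\lvert X\rvert>K]-{\mathbb E}[X^2;\lvert X\rvert<K]} together with a root finder, and also checks that {X+Y} has an atom at zero, so it cannot be normal.

    import numpy as np
    from scipy import integrate, optimize, stats

    def cov_XY(K):
        # E[XY] = E[X^2; |X| > K] - E[X^2; |X| < K] = 1 - 2 E[X^2; |X| < K].
        inner, _ = integrate.quad(lambda x: x**2 * stats.norm.pdf(x), -K, K)
        return 1.0 - 2.0 * inner

    # cov_XY decreases from 1 at K = 0 towards -1 as K grows, so it has a unique root.
    K = optimize.brentq(cov_XY, 0.01, 5.0)
    print(K)                           # the unique K > 0 with E[XY] = 0

    # Y = -X whenever |X| < K, so X + Y has an atom at 0 and cannot be normal.
    print(2 * stats.norm.cdf(K) - 1)   # P(X + Y = 0) = P(|X| < K)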

Theorem 1 generalizes in a straightforward manner to more than two random variables, even to infinitely many.

Theorem 2 Let {\{X_i\}_{i\in I}} be a joint normal collection of random variables. Then, the {X_i} are independent if and only if {{\rm Cov}(X_i,X_j)=0} for all {i\not=j}.

This result can be established by a straightforward extension of the proof given for theorem 1 above. However, it is also a special case of a more general independence result to be given further below, so I leave the proof until then. Using the equivalence of zero covariance and pairwise independence given by theorem 1, theorem 2 can also be expressed without direct reference to covariances.

Theorem 3 Let {\{X_i\}_{i\in I}} be a joint normal collection of random variables. Then, the {X_i} are independent if and only if they are pairwise independent.

Theorem 1 can also be extended in a different direction. Rather than replacing the pair of random variables by an arbitrary collection with pairwise zero covariances, we can instead replace them by a pair of collections of variables.

Theorem 4 Let {\mathcal U} and {\mathcal V} be collections of random variables such that {\mathcal U\cup\mathcal V} is joint normal. Then, {\mathcal U} and {\mathcal V} are independent if and only if X and Y have zero covariance for all {X\in\mathcal U} and {Y\in\mathcal V}.

Just to clarify, this statement does not mean that the random variables in {\mathcal U} are independent of each other but, rather, that the collection {\mathcal U} is independent of {\mathcal V}. As above, the equivalence of zero covariance and pairwise independence given by theorem 1 allows us to state this result without reference to covariances.

Theorem 5 Let {\mathcal U} and {\mathcal V} be collections of random variables such that {\mathcal U\cup\mathcal V} is joint normal. Then, {\mathcal U} and {\mathcal V} are independent if and only if X and Y are independent for all {X\in\mathcal U} and {Y\in\mathcal V}.

Although it is not difficult to generalize the proof of theorem 1 in order to directly prove theorems 4 and 5, they are also just special cases of a more general result which I prove further below. For now, the following example application is very useful in describing a Brownian motion X over the unit interval in terms of its endpoint value {X_1} and an independent Brownian bridge {B}.

Example 2 (Brownian bridge) Let {\{X_t\}_{t\ge0}} be a standard Brownian motion. Then, the process {B_t\equiv X_t-tX_1} over {t\le1} is independent of {X_t} over {t\ge1}.

This example is a direct application of theorem 4 to the collections {\mathcal U=\{B_t\colon t\le1\}} and {\mathcal V=\{X_t\colon t\ge1\}}. Their union is joint normal, since each {B_t} is a linear combination of values of the Gaussian process X. The covariances of the Brownian motion are given by {{\mathbb E}[X_sX_t]=s\wedge t}. Hence, for any {0\le s\le1\le t}, we compute

\displaystyle  {\rm Cov}(B_s,X_t) = {\rm Cov}(X_s,X_t) - s{\rm Cov}(X_1,X_t)=s-s\cdot1=0

as required.
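
This can also be checked by simulation. The sketch below (illustrative code with arbitrarily chosen grid and times) generates Brownian paths, forms the bridge {B_t=X_t-tX_1}, and confirms that the sample covariance between {B_{0.5}} and {X_{1.5}} is close to zero, whereas {{\rm Cov}(X_{0.5},X_{1.5})=0.5}.

    import numpy as np

    rng = np.random.default_rng(1)
    n_paths, n_steps, T = 100_000, 100, 2.0
    dt = T / n_steps

    # Brownian paths on the grid 0, dt, ..., T (one row per path).
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)
    t = np.linspace(0.0, T, n_steps + 1)

    i_half, i_one, i_three_halves = [np.argmin(np.abs(t - u)) for u in (0.5, 1.0, 1.5)]
    B_half = X[:, i_half] - 0.5 * X[:, i_one]               # B_0.5 = X_0.5 - 0.5 * X_1

    print(np.cov(B_half, X[:, i_three_halves])[0, 1])       # close to 0
    print(np.cov(X[:, i_half], X[:, i_three_halves])[0, 1]) # close to 0.5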

The necessity of the joint normality condition in theorems 2, 3, 4, and 5 is demonstrated by the following example.

Example 3 A triple {X_1,X_2,X_3} of standard normals which are pairwise independent (and, hence, pairwise joint normal), but are not all independent (hence, not all joint normal).

This shows that we cannot infer independence from pairwise independence for arbitrary collections of normal random variables, so that the joint normality of {\{X_i\}_{i\in I}} in theorems 2 and 3 is necessary. Furthermore, consider the collections {\mathcal U=\{X_1\}} and {\mathcal V=\{X_2,X_3\}}, both of which are joint normal. Then, X and Y are independent for all {X\in\mathcal U} and {Y\in\mathcal V} even though {\mathcal U} is not independent of {\mathcal V}, demonstrating that joint normality of the union {\mathcal U\cup\mathcal V} is necessary for theorems 4 and 5.

Example 3 can be constructed as follows. Let {Y_1,Y_2,Y_3,\epsilon_1,\epsilon_2} be independent random variables, where the {Y_k} are standard normal and the {\epsilon_k} have the Rademacher distribution, {{\mathbb P}(\epsilon_k=1)={\mathbb P}(\epsilon_k=-1)=1/2}. Then, set {\epsilon_3=\epsilon_1\epsilon_2}, which also has the Rademacher distribution. As the {\epsilon_k} are pairwise independent, the random variables {X_k=\epsilon_k\lvert Y_k\rvert} are also pairwise independent and, by symmetry of the standard normal distribution, are also standard normal. However, the product {\epsilon_1\epsilon_2\epsilon_3} is equal to 1 and, hence,

\displaystyle  X_1X_2X_3=\lvert Y_1Y_2Y_3\rvert > 0

almost surely. If the {X_k} were independent then, by symmetry, this product would be negative with probability {1/2}, so they cannot be independent.
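
The construction is straightforward to simulate. The following sketch (illustrative only) builds the triple as described and confirms that the pairwise sample correlations are negligible while the product {X_1X_2X_3} is always positive, which would be impossible for three independent standard normals.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10**6

    Y = rng.standard_normal((3, n))
    eps1, eps2 = rng.choice([-1, 1], size=(2, n))
    eps3 = eps1 * eps2                    # Rademacher, pairwise independent of eps1, eps2

    X = np.abs(Y) * np.stack([eps1, eps2, eps3])    # X_k = eps_k * |Y_k|

    print(np.corrcoef(X))                           # off-diagonal entries close to 0
    print(bool((X[0] * X[1] * X[2] > 0).all()))     # True: the product is positive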

Theorems 1, 2, 3, 4 and 5 are all special cases of the following 'master' theorem.

Theorem 6 Let {\{\mathcal U_i\}_{i\in I}} be a collection of collections of random variables whose union is joint normal. Then, the collections {\mathcal U_i} are independent if and only if {{\rm Cov}(X,Y)=0} for all {X\in\mathcal U_i} and {Y\in\mathcal U_j} with {i\not=j} in I.

For clarity, I state precisely what the independence property means in this result. For each {i\in I}, let {\mathcal F_i} be the sigma-algebra generated by {\mathcal U_i}, which is just the smallest sigma-algebra on the underlying probability space with respect to which every {X\in\mathcal U_i} is measurable. Then, independence of the collections {\mathcal U_i} is equivalent to independence of these sigma-algebras, so that

\displaystyle  {\mathbb P}(A_1\cap\cdots\cap A_n)={\mathbb P}(A_1)\cdots{\mathbb P}(A_n) (1)

for all finite sequences of pairwise distinct indices {i_1,\ldots,i_n\in I} and all {A_k\in\mathcal F_{i_k}}.

We could prove theorem 6 directly by using the characteristic function of multivariate normals, along similar lines to the proof of theorem 1, although it can get a little messy. I will take an alternative approach and, instead, use theorem 1 proven above to derive the much more general result. This does not require making use of any further properties of joint normals beyond the basic fact that linear combinations of them remain joint normal. To make the leap from theorem 1 to theorem 6, I use the following statement, which applies to arbitrary collections of (real-valued) random variables. For any such collection {\mathcal U}, {{\rm Lin}(\mathcal U)} denotes the collection of finite linear combinations of elements of {\mathcal U}.

Lemma 7 Let {\{\mathcal U_i\}_{i\in I}} be a collection of collections of random variables. The following are equivalent,

  1. The {\mathcal U_i} are independent over {i\in I}.
  2. For any finite sequence {i_1,\ldots,i_n\in I} and {j\in I\setminus\{i_1,\ldots,i_n\}}, every {X\in{\rm Lin}(\bigcup\nolimits_{k=1}^n\mathcal U_{i_k})} and {Y\in{\rm Lin}(\mathcal U_j)} are independent.

Proof: The second statement follows immediately from the first by the definitions, so I just concentrate on the proof that the second statement implies the first. Recalling the definition of independence, we need to show that (1) holds for any finite sequence of pairwise distinct indices {i_1,\ldots,i_n\in I} and {A_k\in\mathcal F_{i_k}}. We use induction on n (the case {n=1} being trivial), so suppose that the result holds for {n-1}, and set {A=A_1\cap\cdots\cap A_{n-1}} and {B=A_n}. We just need to show that

\displaystyle  {\mathbb P}(A\cap B)={\mathbb P}(A){\mathbb P}(B) (2)

since, by the induction hypothesis, (1) follows immediately from this. We show that (2) holds for all A and B in the sigma-algebras generated by {\mathcal U=\mathcal U_{i_1}\cup\cdots\cup\mathcal U_{i_{n-1}}} and {\mathcal U_{i_n}} respectively. Letting {\mathcal A} and {\mathcal B} be the unions of the sigma-algebras generated by the finite subsets of, respectively, {\mathcal U} and {\mathcal U_{i_n}}, these are pi-systems generating the same sigma-algebras and, by the pi-system lemma, it is sufficient to prove the result for {A\in\mathcal A} and {B\in\mathcal B}. Hence, we suppose that A and B are, respectively, in the sigma-algebras generated by the finite sub-collections

\displaystyle  \begin{aligned} &\left\{Y_1,\dots,Y_r\right\}\subseteq\mathcal U,\\ &\left\{Z_1,\dots,Z_s\right\}\subseteq\mathcal U_{i_n}. \end{aligned}

The characteristic function of the random vector {(Y_1,\ldots,Y_r,Z_1,\ldots,Z_s)} is computed by

\displaystyle  {\mathbb E}\left[\exp\left(i(a\cdot Y+b\cdot Z)\right)\right] ={\mathbb E}\left[\exp\left(ia\cdot Y\right)\right]{\mathbb E}\left[\exp\left(ib\cdot Z\right)\right]

for any {a\in{\mathbb R}^r} and {b\in{\mathbb R}^s}, where {Y=(Y_1,\ldots,Y_r)} and {Z=(Z_1,\ldots,Z_s)}. Here, independence of {a\cdot Y\in{\rm Lin}(\mathcal U)} and {b\cdot Z\in{\rm Lin}(\mathcal U_{i_n})} was used, which is guaranteed by the second statement of the lemma. Since the distribution of a random vector is uniquely determined by its characteristic function, this shows that the vectors Y and Z are independent. Hence, A and B are independent, giving (2) and, therefore, (1) as required. ⬜

Finally, I use this to give a proof of the main independence result.

Proof of theorem 6: Choose a finite sequence {i_1,\ldots,i_n\in I} and {j\in I\setminus\{i_1,\ldots,i_n\}}. Setting {\mathcal U=\mathcal U_{i_1}\cup\cdots\cup\mathcal U_{i_n}}, by lemma 7 we just need to show that every {X\in{\rm Lin}(\mathcal U)} and {Y\in{\rm Lin}(\mathcal U_j)} are independent. By assumption, we know that every {X\in\mathcal U} and {Y\in\mathcal U_j} have zero covariance. By linearity of the covariance, this immediately extends to all {X\in{\rm Lin}(\mathcal U)} and {Y\in{\rm Lin}(\mathcal U_j)}. As X and Y are linear combinations from the joint normal collection {\bigcup_{i\in I}\mathcal U_i}, the pair {(X,Y)} is joint normal so, by theorem 1, they are independent, as required. ⬜
