# The Gaussian Correlation Inequality

When I first created this blog, the subject of my initial post was the Gaussian correlation conjecture. Using ${\mu_n}$ to denote the standard n-dimensional Gaussian probability measure, the conjecture states that the inequality

$\displaystyle \mu_n(A\cap B)\ge\mu_n(A)\mu_n(B)$

holds for all symmetric convex subsets A and B of ${{\mathbb R}^n}$. By symmetric, we mean symmetric about the origin, so that ${-x}$ is in A if and only if ${x}$ is in A, and similarly for B. The standard Gaussian measure by definition has zero mean and covariance matrix equal to the ${n\times n}$ identity matrix, so that

$\displaystyle d\mu_n(x)=(2\pi)^{-n/2}e^{-\frac12x^Tx}\,dx,$

with ${dx}$ denoting the Lebesgue measure on ${{\mathbb R}^n}$. However, if it holds for the standard Gaussian measure, then the inequality can also be shown to hold for any centered (i.e., zero mean) Gaussian measure.

At the time of my original post, the Gaussian correlation conjecture was an unsolved mathematical problem, originally arising in the 1950s and formulated in its modern form in the 1970s. However, in the period since that post, the conjecture has been solved! A proof was published by Thomas Royen in 2014 [7]. This seems to have taken some time to come to the notice of much of the mathematical community. In December 2015, Rafał Latała and Dariusz Matlak published a simplified version of Royen’s proof [4]. Although Royen’s original proof was already quite simple, it considered a generalisation of the conjecture to a kind of multivariate gamma distribution. The exposition by Latała and Matlak drops this generality and adds some intermediate lemmas in order to improve readability and accessibility. Since then, the result has become widely known and, recently, has even been reported in the popular press [10,11]. There is an interesting article on Royen’s discovery of his proof at Quanta Magazine [12], including the background information that Royen was a 67-year-old German retiree who supposedly came up with the idea while brushing his teeth one morning. Dick Lipton and Ken Regan have recently written about the history and eventual solution of the conjecture on their blog [5]. As it has now been shown to be true, I will stop referring to the result as a ‘conjecture’ and, instead, use the common alternative name — the Gaussian correlation inequality.

In this post, I will describe some equivalent formulations of the Gaussian correlation inequality, or GCI for short, before describing a general method of attacking this problem which has worked for earlier proofs of special cases. I will then describe Royen’s proof and we will see that it uses the same ideas, but with some key differences.

Probably the most surprising aspect of Royen’s proof is its simplicity. Although it does involve some clever algebraic manipulations, no particularly complex mathematics is required to understand it. It seems incredible that a result which has been outstanding for so long has such a simple solution which eluded all previous attempts. A common approach to the problem is to look at a local version of it and, by taking derivatives, replace it with a more easily manageable statement, which I describe below in inequality (11). This ‘local’ inequality has successfully been used before to prove special cases of GCI, such as the 2-dimensional version. Royen’s approach to the problem is similar and, although he does not rely on inequality (11), we will see that (11) does follow indirectly from his proof. So, it seems that the methods which had been in use long before Royen published his paper were already along the correct lines but, previously, no-one had quite managed to get them to work. It is intriguing that nobody has produced a direct proof of inequality (11) — to my knowledge, at least.

There are various different, but similar, formulations of GCI, and one of the key steps in Royen’s solution is to start with the correct formulation. I will first describe some of the alternative formulations of the inequality. As above, using ${\mu_n}$ to denote the standard Gaussian measure on ${{\mathbb R}^n}$, GCI states that

 $\displaystyle \mu_n\left(A\cap B\right)\ge\mu_n(A)\mu_n(B).$ (1)

for any symmetric convex sets ${A,B\subseteq{\mathbb R}^n}$. In one dimension this is trivial, since the sets ${A,B}$ will be symmetric intervals, so that one contains the other. So, for ${n=1}$, (1) holds with the right hand side replaced by the larger quantity ${\min(\mu_n(A),\mu_n(B))}$. In dimension ${n=2}$, the inequality is already rather difficult to establish, and a proof was first published in 1977 by Pitt [6]. For dimension ${n \ge3}$, no proof of (1) was known prior to Royen’s 2014 paper.

An alternative formulation can be given in terms of probabilities of multidimensional Gaussian random variables lying in specified sets. If ${X,Y}$ are zero-mean jointly Gaussian random vectors taking values in ${{\mathbb R}^m}$ and ${{\mathbb R}^n}$ respectively then,

 $\displaystyle {\mathbb P}\left(X\in A, Y\in B\right)\ge{\mathbb P}\left(X\in A\right){\mathbb P}\left(Y\in B\right).$ (2)

for any symmetric convex sets ${A\subseteq{\mathbb R}^m}$ and ${B\subseteq{\mathbb R}^n}$. This is the form in which the inequality was stated by Das Gupta et al [1], in 1972, in one of the earliest statements of the general conjecture. Inequality (2) is a clear generalization of (1), and reduces to it in the special case with ${m=n}$ and where ${X=Y}$ is standard Gaussian (i.e., with probability measure ${\mu_n}$). It can be shown without much trouble that (1) and (2) are equivalent statements. I will leave the proof of this for the moment, and show below that the various formulations of GCI are equivalent. Dividing through by ${{\mathbb P}(Y\in B)}$, inequality (2) can be expressed in terms of conditional probabilities,

$\displaystyle {\mathbb P}\left(X\in A\vert Y\in B\right)\ge{\mathbb P}\left(X\in A\right).$

So, regardless of the correlations between X and Y, the knowledge that Y is in B increases the probability of finding X in A.
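As a quick numerical sanity check of this conditional form (purely illustrative, and of course no substitute for a proof), we can estimate both sides of (2) by Monte Carlo for a correlated pair of scalar Gaussians; the correlation 0.8, the unit intervals and the sample size are arbitrary choices of mine, assuming numpy is available:

```python
# Monte Carlo sanity check of inequality (2) in its conditional form:
# knowing |Y| <= 1 should increase the chance that |X| <= 1.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.8, 200_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = z1
y = rho * z1 + np.sqrt(1 - rho**2) * z2  # (x, y) jointly Gaussian, Corr = rho

in_A = np.abs(x) <= 1
in_B = np.abs(y) <= 1
joint = np.mean(in_A & in_B)             # estimates P(X in A, Y in B)
product = np.mean(in_A) * np.mean(in_B)  # estimates P(X in A) P(Y in B)
print(joint, product)
assert joint >= product
```

The estimated joint probability comfortably exceeds the product, as (2) predicts.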

GCI can also be expressed using expectations rather than probabilities. A function ${f\colon{\mathbb R}^n\rightarrow{\mathbb R}^+}$ is quasiconcave if for all ${a\in{\mathbb R}^+}$ the set ${\{x\in{\mathbb R}^n\colon f(x)\ge a\}}$ is convex. Equivalently,

$\displaystyle f\left(\lambda x+(1-\lambda)y\right)\ge\min\left(f(x),f(y)\right)$

for all ${x,y\in{\mathbb R}^n}$ and ${0 < \lambda < 1}$. For example, indicator functions of convex sets are quasiconcave. If, as above, X and Y are zero-mean jointly Gaussian random vectors taking values in ${{\mathbb R}^m}$ and ${{\mathbb R}^n}$ respectively, then

 $\displaystyle {\mathbb E}\left[f(X)g(Y)\right]\ge{\mathbb E}\left[f(X)\right]{\mathbb E}\left[g(Y)\right]$ (3)

for all symmetric quasiconcave functions ${f\colon{\mathbb R}^m\rightarrow{\mathbb R}^+}$ and ${g\colon{\mathbb R}^n\rightarrow{\mathbb R}^+}$. For the specific case where ${f=1_A}$ and ${g=1_B}$ are indicator functions of sets A and B, (3) reduces to (2). The formulation of GCI given by (3) is expressed by the statement that ${f(X)}$ and ${g(Y)}$ have nonnegative covariance.
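A similar Monte Carlo illustration works for the expectation form (3): take scalar jointly Gaussian X, Y and two simple symmetric quasiconcave functions. The particular functions and correlation below are arbitrary choices of mine for this sketch:

```python
# Monte Carlo check of formulation (3): for correlated jointly Gaussian X, Y
# and symmetric quasiconcave f, g >= 0, Cov(f(X), g(Y)) should be nonnegative.
import numpy as np

rng = np.random.default_rng(7)
rho, N = 0.8, 400_000
z1 = rng.standard_normal(N)
z2 = rng.standard_normal(N)
x = z1
y = rho * z1 + np.sqrt(1 - rho**2) * z2

f = np.exp(-x**2)          # symmetric, quasiconcave, nonnegative
g = 1.0 / (1.0 + y**2)     # symmetric, quasiconcave, nonnegative
cov = np.mean(f * g) - np.mean(f) * np.mean(g)
print(cov)
assert cov > 0
```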

The final formulation of GCI which I will describe here is expressed in terms of the probabilities that a sequence of Gaussian random variables lie in specified symmetric intervals of ${{\mathbb R}}$. If ${X_1,\ldots,X_n}$ are zero-mean jointly Gaussian real random variables then,

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle&{\mathbb P}\left(\lvert X_1\rvert\le 1,\lvert X_2\rvert\le 1,\ldots,\lvert X_n\rvert\le 1\right)\smallskip\\ &\qquad\ge{\mathbb P}\left(\lvert X_1\rvert\le 1,\ldots,\lvert X_k\rvert\le 1\right){\mathbb P}\left(\lvert X_{k+1}\rvert\le 1,\ldots,\lvert X_n\rvert\le 1\right) \end{array}$ (4)

for ${1\le k\le n}$. This formulation is a special case of (2) applied to the sets ${A\subseteq{\mathbb R}^k}$ and ${B\subseteq{\mathbb R}^{n-k}}$,

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle&A=\left\{x\in{\mathbb R}^k\colon \lvert x_i\rvert\le 1,\ i=1,\ldots,k\right\},\\ \displaystyle&B=\left\{y\in{\mathbb R}^{n-k}\colon \lvert y_i\rvert\le 1,\ i=1,\ldots,n-k\right\}. \end{array}$ (5)

The formulation given by (4) is central to Royen’s proof, but has also been a common form in which the inequality has been stated ever since it was first formulated. The case with ${k=1}$ was proven in 1967 by both Khatri [3] and Šidák [8]. There is one minor technicality which I glossed over in the statements above. I merely required A and B to be symmetric and convex sets but, for the probabilities to be well-defined, the sets should be measurable. Often the additional requirement that they are closed sets is used, ensuring Borel measurability. However, it is not difficult to show that for any symmetric convex set A with closure ${\bar A}$, the difference ${\bar A\setminus A}$ is contained in a set of zero probability with respect to a centered Gaussian measure. Hence, A is already guaranteed to be measurable under the completion of the measure. Furthermore, statements (1) and (2) are unchanged if A and B are replaced by their closures ${\bar A}$ and ${\bar B}$. Similarly, in (3), I did not require the functions ${f,g}$ to be measurable but, by quasiconcavity, they are guaranteed to be measurable under the completion of the probability measure.

As promised, we show that each of the alternative formulations of GCI given above are equivalent.

Lemma 1 Each of the forms of GCI given by (1), (2), (3) and (4) are equivalent.

To be precise, here we mean that each of the inequalities are equivalent when stated in their full generality. That is, if any one holds in all dimensions m and n and for all pairs of symmetric convex sets (or quasiconcave functions), then all of the other statements hold.

Equivalence of (1) and (2): It was already noted above that inequality (1) is a special case of (2), so just the reverse implication remains to be shown. That is, assuming that inequality (1) holds, we need to prove (2). To do this, use the standard result that any centered multidimensional Gaussian vector can be expressed as a linear function of a standard Gaussian vector. In our case, this means that there is a random vector Z taking values in ${{\mathbb R}^k}$ (for some k), with the standard Gaussian distribution ${\mu_k}$, and such that ${X=MZ}$ and ${Y=NZ}$ for ${m\times k}$ and ${n\times k}$ matrices M, N. Then,

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle {\mathbb P}\left(X\in A,Y\in B\right) &= {\mathbb P}\left(Z\in M^{-1}A, Z\in N^{-1}B\right)\smallskip\\ &=\mu_k\left(M^{-1}A\cap N^{-1}B\right). \end{array}$

Applying inequality (1),

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle {\mathbb P}\left(X\in A,Y\in B\right) &\ge \mu_k\left(M^{-1}A\right)\mu_k\left(N^{-1}B\right)\smallskip\\ &={\mathbb P}\left(X\in A\right){\mathbb P}\left(Y\in B\right), \end{array}$

as required.

Equivalence of (2) and (3): As (2) is the special case of (3) where ${f,g}$ are indicator functions of symmetric convex sets, we only need to show that inequality (2) implies (3). We can decompose ${f}$ and ${g}$ as integrals of indicator functions of symmetric convex sets,

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle f(X)=\int_0^\infty1_{\{f(X)\ge x\}}\,dx,\smallskip\\ &\displaystyle g(Y)=\int_0^\infty1_{\{g(Y)\ge y\}}\,dy. \end{array}$ (6)

Multiplying these, taking expectations, and using Fubini’s theorem to commute the integrals with the expectation,

$\displaystyle {\mathbb E}\left[f(X)g(Y)\right]=\int_0^\infty\int_0^\infty{\mathbb P}\left(f(X)\ge x, g(Y)\ge y\right)\,dxdy.$

Applying inequality (2),

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle{\mathbb E}\left[f(X)g(Y)\right] &\displaystyle\ge\int_0^\infty\int_0^\infty{\mathbb P}\left(f(X)\ge x\right){\mathbb P}\left(g(Y)\ge y\right)\,dxdy\smallskip\\ &\displaystyle={\mathbb E}\left[f(X)\right]{\mathbb E}\left[g(Y)\right] \end{array}$

as required.

Equivalence of (2) and (4): It was noted above that (4) is a special case of (2) where the sets A and B are of the form given in equation (5). It only remains to show that inequality (2) follows from the assumption that (4) holds.

Consider the case where ${A\subseteq{\mathbb R}^m}$ and ${B\subseteq{\mathbb R}^n}$ are symmetric convex polytopes. These are convex sets of the form

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle A=\left\{x\in{\mathbb R}^m\colon \lvert a_i\cdot x\rvert\le 1, i=1,\ldots,r\right\},\smallskip\\ &\displaystyle B=\left\{y\in{\mathbb R}^n\colon \lvert b_j\cdot y\rvert\le 1, j=1,\ldots,s\right\}, \end{array}$ (7)

for some ${r,s\in{\mathbb N}}$, ${a_i\in{\mathbb R}^m}$ and ${b_j\in{\mathbb R}^n}$. Defining the joint Gaussian random variables ${\tilde X_i=a_i\cdot X}$ and ${\tilde Y_j=b_j\cdot Y}$,

$\displaystyle {\mathbb P}\left(X\in A,Y\in B\right)={\mathbb P}\left(\lvert\tilde X_i\rvert\le 1, \lvert\tilde Y_j\rvert\le 1, i=1,\ldots,r, j=1,\ldots,s\right).$

Applying inequality (4) to this,

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle {\mathbb P}\left(X\in A,Y\in B\right)&\displaystyle\ge{\mathbb P}\left(\lvert\tilde X_i\rvert\le 1, i=1,\ldots,r\right){\mathbb P}\left(\lvert\tilde Y_j\rvert\le 1, j=1,\ldots,s\right)\smallskip\\ &\displaystyle={\mathbb P}\left(X\in A\right){\mathbb P}\left(Y\in B\right). \end{array}$

This proves (2) for symmetric convex polytopes. It is a consequence of the hyperplane separation theorem that every nonempty symmetric closed convex set is the intersection of a countable sequence of strips of the form ${\{x\colon\lvert a\cdot x\rvert\le 1\}}$ and, hence, is the limit of a decreasing sequence of symmetric convex polytopes. This proves (2) for symmetric closed convex sets. Finally, as was mentioned above, inequality (2) is unchanged if the sets A and B are replaced by their closures, extending (2) to all symmetric convex sets.

#### The Local Approach

One method of attacking the Gaussian correlation inequality is to note that both sides of the inequality can be expressed as the values of some function evaluated at different points of a connected topological space. If it can be shown that we can move from one of the points to the other, along a continuous path in the space on which the function is monotonic, then GCI would follow. I will describe this approach now.

Let us start with a pair of independent n-dimensional random vectors X and Y, each with the standard Gaussian distribution ${\mu_n}$. Inequality (1) can be written as

 $\displaystyle {\mathbb P}(X\in A, X\in B)\ge{\mathbb P}(X\in A,Y\in B)$ (8)

for symmetric convex ${A,B\subseteq{\mathbb R}^n}$. If we can continuously transform the pair ${(X,Y)}$ into ${(X,X)}$ in such a way that the probability of being in ${A\times B}$ is increasing, then (8) will follow. A straightforward method is to define

$\displaystyle Y(t)=tX+\sqrt{1-t^2}Y$

over ${0\le t\le 1}$. Then, ${Y(t)}$ has the standard Gaussian distribution for each t, with ${Y(0)=Y}$ and ${Y(1)=X}$. Note that the covariance matrix

$\displaystyle {\rm Covar}(X,Y(t))={\mathbb E}\left[XY(t)^T\right]$

is just ${tI}$. Throughout this post I use the term ‘increasing’ in the non-strict sense, equivalent to ‘nondecreasing’.
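The stated properties of ${Y(t)}$ are immediate from the stability of the Gaussian family, but they are also easy to confirm numerically. The following sketch (illustrative only; the dimension, value of t and sample size are arbitrary choices) checks that ${Y(t)}$ has unit variances and that ${{\rm Covar}(X,Y(t))=tI}$:

```python
# Numerical check that Y(t) = t X + sqrt(1 - t^2) Y has standard Gaussian
# marginals and cross-covariance t I with X.
import numpy as np

rng = np.random.default_rng(1)
n, dim, t = 500_000, 3, 0.6
X = rng.standard_normal((n, dim))
Y = rng.standard_normal((n, dim))
Yt = t * X + np.sqrt(1 - t**2) * Y

# Each component of Y(t) should have unit variance...
var_err = np.abs(Yt.var(axis=0) - 1.0).max()
# ...and the empirical E[X_i Y(t)_j] matrix should be close to t I.
cov = X.T @ Yt / n
cov_err = np.abs(cov - t * np.eye(dim)).max()
print(var_err, cov_err)
assert var_err < 0.02 and cov_err < 0.02
```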

Conjecture C1 For any symmetric convex sets ${A,B\subseteq{\mathbb R}^n}$, the probability

 $\displaystyle {\mathbb P}\left(X\in A,Y(t)\in B\right)$ (9)

is increasing in t, over the range ${0\le t\le 1}$.

If true, conjecture C1 implies (8) and, hence, the Gaussian correlation inequality. Due to Šidák [9], this conjecture is known to be true if A is a symmetric slab of the form

$\displaystyle A=\left\{x\in{\mathbb R}^n\colon \lvert a\cdot x\rvert\le1\right\}$

for some ${a\in{\mathbb R}^n}$ and, from Pitt [6], is true in dimension ${n=2}$. Although this conjecture is not used by Royen in his 2014 proof of GCI [7], we will see that it does follow from his argument.

One way of directly trying to prove conjecture C1 is to differentiate (9) with respect to t and show that the result is positive. This is more easily done if the indicator functions ${1_A}$, ${1_B}$ are replaced by differentiable functions. If ${f,g\colon{\mathbb R}^n\rightarrow{\mathbb R}}$ are continuously differentiable with bounded derivative, a straightforward application of integration by parts gives

 $\displaystyle \frac{d}{dt}{\mathbb E}\left[f(X)g(Y(t))\right]=\sum_{i=1}^n{\mathbb E}\left[f_{,i}(X)g_{,i}(Y(t))\right].$ (10)

Here ${f_{,i}(x)}$ denotes the partial derivative with respect to the i‘th component of x. By approximating with smooth functions, it can be seen that (10) holds for arbitrary Lipschitz continuous ${f}$, ${g}$, which, by Rademacher’s theorem, are guaranteed to be differentiable almost everywhere. We want to show that (10) is non-negative. As I will show in a moment, if non-negativity always holds at ${t=1}$ then it also holds at ${t < 1}$. As X has distribution ${\mu_n}$, we arrive at the following conjecture.

Conjecture C2 Let ${f,g\colon{\mathbb R}^n\rightarrow{\mathbb R}^+}$ be symmetric quasiconcave and Lipschitz continuous functions. Then,

 $\displaystyle \mu_n\left(\nabla f\cdot\nabla g\right)\ge0$ (11)

where ${\mu_n}$ is the standard Gaussian measure on ${{\mathbb R}^n}$.

Inequality (11) was used by Pitt to prove GCI in two dimensions, and appears as Theorem 1 of his 1977 paper [6]. In fact, it is shown that if ${\phi(x)=\phi(\lVert x\rVert)}$ is a nonnegative decreasing function of ${\lVert x\rVert}$ then

 $\displaystyle \int_{{\mathbb R}^2}\nabla f(x)\cdot\nabla g(x)\phi(x)\,dx\ge0.$ (12)

The special case with ${\phi(x)}$ equal to the Gaussian density is equivalent to inequality (11). As stated by Pitt, his proof of (12) does not extend to greater than 2 dimensions but, if it could be extended to ${{\mathbb R}^n}$ then GCI would follow. It seems unlikely that it extends in this generality to three or more dimensions, unless ${\phi}$ is restricted to being the Gaussian density. Although Royen does not consider this inequality, we will see that (11) is a consequence of his proof. It is interesting that no-one has produced a direct proof of conjecture C2, which could then be used as an alternative proof of GCI.
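To see inequality (11) in action, here is a small Monte Carlo sketch for a hand-picked pair of symmetric quasiconcave Lipschitz functions of Gaussian-bump form, ${f(x)=e^{-x^TPx/2}}$ and ${g(x)=e^{-x^TQx/2}}$. The matrices P and Q are arbitrary positive definite choices of mine, rotated against each other so that ${\nabla f\cdot\nabla g}$ genuinely takes both signs:

```python
# Monte Carlo check of the 'local' inequality (11): E[grad f . grad g] >= 0
# under the standard Gaussian measure.  Since grad f = -Px f and
# grad g = -Qx g, we have grad f . grad g = (Px . Qx) f g.
import numpy as np

rng = np.random.default_rng(2)
P = np.diag([3.0, 0.3])
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
R = np.array([[c, -s], [s, c]])
Q = R @ P @ R.T  # the same ellipse, rotated by 45 degrees

x = rng.standard_normal((400_000, 2))
fg = np.exp(-0.5 * np.einsum('ni,ij,nj->n', x, P + Q, x))  # f(x) g(x)
dot = np.einsum('ni,ni->n', x @ P, x @ Q)                  # (Px) . (Qx)
estimate = np.mean(dot * fg)
print(estimate)
assert estimate > 0
```

Even though the integrand is negative on part of the plane, the Gaussian average comes out positive, as (11) requires.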

Conjecture C1 can be stated in a more general form, which I will do now. Letting X and Y be centered Gaussian random vectors of dimension m and n, use ${C_1}$ and ${C_2}$ to denote their respective covariance matrices. Consider the ${(m+n)\times(m+n)}$ matrices of the form

$\displaystyle C_Q=\left(\begin{array}{cc} C_1 & Q \smallskip\\ Q^T & C_2 \end{array}\right).$

Here, Q is any ${m\times n}$ matrix. When ${C_Q}$ is positive semidefinite, let ${{\mathbb P}_Q}$ denote a probability measure for which ${(X,Y)}$ is a centered Gaussian random vector with covariance matrix ${C_Q}$. Inequality (2) can be written as,

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle{\mathbb P}_Q\left(X\in A,Y\in B\right)&\displaystyle\ge{\mathbb P}_Q\left(X\in A\right){\mathbb P}_Q\left(Y\in B\right)\smallskip\\ &\displaystyle={\mathbb P}_0\left(X\in A\right){\mathbb P}_0\left(Y\in B\right)\smallskip\\ &\displaystyle={\mathbb P}_0\left(X\in A,Y\in B\right). \end{array}$ (13)

The first equality here follows because the vectors X and Y have covariance matrices ${C_1}$ and ${C_2}$ respectively, so their distribution does not depend on Q. Hence, Q can be replaced by the zero matrix. The second equality holds because, under ${{\mathbb P}_0}$, X and Y are independent. This shows that GCI is equivalent to stating that

$\displaystyle Q\mapsto{\mathbb P}_Q\left(X\in A,Y\in B\right)$

has a global minimum at ${Q=0}$.

Inequality (13) would be proved if we can find a continuous curve, ${t\mapsto Q(t)}$ joining 0 to an arbitrary ${Q}$ such that ${C_{Q(t)}}$ is positive semidefinite and ${{\mathbb P}_{Q(t)}(X\in A,Y\in B)}$ is increasing in t. A similar approach was, in effect, used by Hargé [2] in 1999 to prove GCI in arbitrary dimensions for the case where A is a symmetric ellipsoid so,

$\displaystyle A=\left\{x\in{\mathbb R}^m\colon x^TMx\le1\right\}$

for a positive definite ${m\times m}$ matrix M. However, Hargé used a curve specific to the ellipsoid under consideration,

$\displaystyle Q(t)=e^{-M^{-1}t}Q.$

This has the limits ${Q(0)=Q}$ and ${Q(\infty)=0}$, and it can be shown that ${{\mathbb P}_{Q(t)}(X\in A,Y\in B)}$ is decreasing in t.

In the general case, the only canonical choice of curve joining 0 to Q is the line segment ${Q(t)=tQ}$. So, under ${{\mathbb P}_t\equiv{\mathbb P}_{Q(t)}}$ the centered Gaussian vector ${(X,Y)}$ has covariance matrix

 $\displaystyle C(t)=\left(\begin{array}{cc} C_1 & tQ \smallskip\\ tQ^T & C_2 \end{array}\right).$ (14)

We state the following conjecture.

Conjecture C3 For any symmetric convex sets ${A\subseteq{\mathbb R}^m}$, ${B\subseteq{\mathbb R}^n}$, the probability

 $\displaystyle {\mathbb P}_t\left(X\in A,Y\in B\right)$ (15)

is increasing in t over the range ${0\le t\le 1}$.

This has a more general appearance than conjecture C1, but is easily shown to be equivalent. Šidák [9] proved this conjecture for ${m=1}$ and Pitt [6] proved the case with ${m=2}$. Royen’s proof of the Gaussian correlation inequality proceeds by proving C3 for rectangular sets, of the form specified above in (5).
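A simple numerical illustration of conjecture C3 (now, of course, a theorem) is the case ${m=n=1}$ with unit intervals, moving along the segment ${Q(t)=tQ}$: as the correlation ${t\rho}$ between X and Y grows, the joint probability should grow. The value of ${\rho}$, the t-grid and the sample size below are arbitrary choices for this sketch:

```python
# Monte Carlo illustration of C3 for m = n = 1: P(|X|<=1, |Y|<=1) should be
# increasing in t when Corr(X, Y) = t*rho.  Common random numbers are used
# across the t values to reduce Monte Carlo noise.
import numpy as np

rng = np.random.default_rng(3)
rho, N = 0.95, 400_000
z1 = rng.standard_normal(N)
z2 = rng.standard_normal(N)

probs = []
for t in [0.0, 0.5, 1.0]:
    x = z1
    y = t * rho * z1 + np.sqrt(1 - (t * rho) ** 2) * z2  # Corr(x, y) = t*rho
    probs.append(np.mean((np.abs(x) <= 1) & (np.abs(y) <= 1)))

print(probs)
# Up to Monte Carlo noise, the estimates should increase with t.
assert probs[2] > probs[1] > probs[0]
```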

Lemma 2 Conjectures C1, C2 and C3 are equivalent, and imply the Gaussian correlation inequality.

Proof: To be precise, in the statement of this lemma, we mean that if any of the conjectures holds in full generality (i.e., in any number of dimensions) then they all hold. It was already explained above how C1 implies GCI, so I will concentrate on showing equivalence of the three conjectures.

If C1 holds then, representing quasiconcave functions ${f}$ and ${g}$ as integrals over indicator functions as in (6) and commuting the expectation with the integrals,

$\displaystyle {\mathbb E}\left[f(X)g(Y(t))\right]=\int_0^\infty\int_0^\infty{\mathbb P}\left(f(X)\ge x,g(Y(t))\ge y\right)\,dxdy.$

So, the expectation is increasing in t. If f and g are Lipschitz continuous then the derivative (10) with respect to t is nonnegative and, setting ${t=1}$, this gives conjecture C2.

Now, suppose that C2 holds. Let ${X,Y}$ be independent standard Gaussian random vectors of dimension n and ${f,g\colon{\mathbb R}^n\rightarrow{\mathbb R}^+}$ be symmetric, quasiconcave and Lipschitz continuous. Then, ${Z=(X,Y)}$ is a 2n-dimensional standard Gaussian random vector. For a fixed ${0\le t\le1}$, defining functions ${\tilde f,\tilde g\colon{\mathbb R}^{2n}\rightarrow{\mathbb R}^+}$,

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle \tilde f(Z) = f(X),\smallskip\\ &\displaystyle \tilde g(Z)=g(tX+\sqrt{1-t^2}Y). \end{array}$

Then, by C2,

$\displaystyle t\sum_{i=1}^n{\mathbb E}\left[f_{,i}(X)g_{,i}(tX+\sqrt{1-t^2}Y)\right] =\sum_{i=1}^{2n}{\mathbb E}\left[\tilde f_{,i}(Z)\tilde g_{,i}(Z)\right]\ge0,$

the factor of t on the left arising from the chain rule applied to ${\tilde g}$. From (10), the left hand side is t times the derivative of ${{\mathbb E}[f(X)g(Y(t))]}$, so this derivative is nonnegative for ${0 < t\le1}$ and, by continuity, ${{\mathbb E}[f(X)g(Y(t))]}$ is increasing in t.

Consider symmetric convex sets ${A,B\subseteq{\mathbb R}^n}$. Using ${d_A(x)}$ to represent the minimum distance of x from the points of A, we can define symmetric quasiconcave and Lipschitz continuous functions

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle f_k(x)=\max(1-k d_A(x),0),\smallskip\\ &\displaystyle g_k(x)=\max(1-k d_B(x),0), \end{array}$

for each positive integer k. Then, ${{\mathbb E}[f_k(X)g_k(Y(t))]}$ is increasing in t. Taking the limit as k goes to infinity, ${f_k}$ and ${g_k}$ tend to ${1_{\bar A}}$ and ${1_{\bar B}}$ respectively. Dominated convergence implies that ${{\mathbb P}(X\in \bar A,Y(t)\in\bar B)}$ is increasing in t. As ${\bar A\setminus A}$ and ${\bar B\setminus B}$ have zero Gaussian measure, this proves conjecture C1.

Conjecture C1 is just the special case of C3 with ${m=n}$ and ${C_1=C_2=Q}$ being the identity matrix. So, it only remains to show that conjecture C3 follows from C1.

Suppose that X, Y are centered jointly Gaussian random vectors with dimensions m and n respectively, and such that ${(X,Y)}$ has the covariance matrix ${C_Q}$. Then, we can write ${X=M\tilde X}$ and ${Y=N\tilde X}$ for some p-dimensional standard Gaussian random vector ${\tilde X}$ and ${m\times p}$, ${n\times p}$ matrices M, N. Enlarging the probability space if necessary, we can let ${\tilde Y}$ be another p-dimensional standard Gaussian vector independent from ${\tilde X}$. Setting

$\displaystyle \tilde Y(t)=t\tilde X+\sqrt{1-t^2}\tilde Y$

the covariance matrix is

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle {\rm Cov}(M\tilde X,N\tilde Y(t))&\displaystyle=t{\rm Cov}(M\tilde X,N\tilde X)\smallskip\\ &\displaystyle=t{\rm Cov}(X,Y)=tQ. \end{array}$

Therefore, ${(M\tilde X,N\tilde Y(t))}$ has the same probability distribution as ${(X,Y)}$ has under the measure ${{\mathbb P}_t}$. For symmetric convex sets ${A\subseteq{\mathbb R}^m}$ and ${B\subseteq{\mathbb R}^n}$,

$\displaystyle {\mathbb P}_t\left(X\in A,Y\in B\right)= {\mathbb P}\left(\tilde X\in M^{-1}A,\tilde Y(t)\in N^{-1}B\right).$

Conjecture C1 states that the right hand side of this equality is increasing in t and, hence, conjecture C3 follows. ⬜

#### Royen’s Proof of the Gaussian Correlation Inequality

I now describe Royen’s proof of the Gaussian correlation inequality. The formulation best suited to this approach is inequality (4) above,

 $\displaystyle {\mathbb P}\left(\max_{1\le i\le n}\lvert X_i\rvert\le1\right)\ge {\mathbb P}\left(\max_{1\le i\le k}\lvert X_i\rvert\le1\right) {\mathbb P}\left(\max_{k < i\le n}\lvert X_i\rvert\le1\right)$ (16)

for an n-dimensional centered Gaussian random vector X and any ${1\le k\le n}$.

The idea behind the proof is then along similar lines to that described in the ‘local approach’ above. We continuously vary the covariance matrix of X in order to transform the right hand side of (16) into the probability on the left. By differentiating, we aim to show that this gives an increasing function, and (16) will follow.

Laplace transforms will be used to compute the derivatives of the probability as the covariance matrix is varied. Key to this approach is to look at the transforms of the squares, ${X_i^2}$, of the components of X rather than of X itself. The identity

 $\displaystyle {\mathbb E}\left[\exp\left(-X^TAX/2\right)\right]=\lvert1+CA\rvert^{-1/2}$ (17)

holds for any ${n\times n}$ positive semidefinite matrix A, where C is the covariance matrix of X and ${\lvert\cdot\rvert}$ denotes the determinant. This identity can be verified directly by expressing the expectation as an integral over the Gaussian density.
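Identity (17) is also easy to check numerically. The following sketch compares a Monte Carlo estimate of the left hand side with the determinant formula; the particular C and A are arbitrary test choices of mine:

```python
# Numerical check of identity (17): E[exp(-X'AX/2)] = det(I + CA)^(-1/2)
# for a centered Gaussian X with covariance C and positive semidefinite A.
import numpy as np

rng = np.random.default_rng(4)
C = np.array([[2.0, 0.6], [0.6, 1.0]])
A = np.array([[1.0, -0.3], [-0.3, 0.5]])  # positive definite

L = np.linalg.cholesky(C)
X = rng.standard_normal((500_000, 2)) @ L.T  # X ~ N(0, C)
mc = np.mean(np.exp(-0.5 * np.einsum('ni,ij,nj->n', X, A, X)))
exact = np.linalg.det(np.eye(2) + C @ A) ** -0.5
print(mc, exact)
assert abs(mc - exact) < 0.01
```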

Defining the random vector Z taking values in ${{\mathbb R}_+^n}$ by ${Z_i=X_i^2/2}$, (17) gives the Laplace transform

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle {\mathbb E}\left[\exp\left(-\lambda\cdot Z\right)\right]&={\mathbb E}\left[\exp\left(-X^T\Lambda X/2\right)\right]\smallskip\\ &=\lvert1+C\Lambda\rvert^{-1/2}. \end{array}$ (18)

Here, ${\lambda\in{\mathbb R}_+^n}$ and ${\Lambda={\rm diag}(\lambda_1,\ldots,\lambda_n)}$ is the diagonal matrix formed from the components of ${\lambda}$. To handle the determinant in (18), we will make use of the identity

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle\lvert1+C\Lambda\rvert&\displaystyle=1+\sum_{\emptyset\not=J\subseteq[n]}\lvert(C\Lambda)_J\rvert\smallskip\\ &\displaystyle=1+\sum_{\emptyset\not=J\subseteq[n]}\lvert C_J\rvert \lambda^J. \end{array}$ (19)

The summation is over all nonempty subsets J of ${[n]=\{1,2,\ldots,n\}}$, ${\lambda^J}$ denotes the product ${\prod_{i\in J}\lambda_i}$ and ${C_J}$ is the submatrix of C consisting of the elements with row and column indices in J. So, ${\lvert C_J\rvert}$ are the principal minors of C. The first equality in (19) is a classical identity, and follows from expanding out the Leibniz formula for the determinant of ${I+C\Lambda}$, and the second equality uses the fact that the determinant of ${\Lambda_J}$ is ${\lambda^J}$.
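The expansion (19) can be verified exactly by brute force over subsets; the matrix C and vector ${\lambda}$ below are arbitrary test values for this sketch:

```python
# Exact check of the principal-minor expansion (19):
# det(I + C Lambda) = 1 + sum over nonempty J of det(C_J) * prod_{i in J} lambda_i.
import itertools
import numpy as np

rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n))
C = M @ M.T                  # a positive semidefinite test matrix
lam = rng.random(n)
Lam = np.diag(lam)

lhs = np.linalg.det(np.eye(n) + C @ Lam)
rhs = 1.0
for r in range(1, n + 1):
    for J in itertools.combinations(range(n), r):
        J = list(J)
        rhs += np.linalg.det(C[np.ix_(J, J)]) * np.prod(lam[J])

print(lhs, rhs)
assert np.isclose(lhs, rhs)
```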

Let us now consider the covariance matrix to be a differentiable function of time, ${C=C(t)}$. Then, the Laplace transform (18) can be differentiated,

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \frac{d}{dt} {\mathbb E}\left[\exp\left(-\lambda\cdot Z\right)\right]&\displaystyle=-\frac12\lvert1+C\Lambda\rvert^{-3/2}\frac{d}{dt}\lvert1+C\Lambda\rvert\smallskip\\ &\displaystyle=-\frac12\lvert1+C\Lambda\rvert^{-3/2}\sum_{\emptyset\not=J\subseteq[n]}\frac{d}{dt}\lvert C_J\rvert\lambda^J. \end{array}$ (20)

Now, an important step in Royen’s proof of the Gaussian correlation inequality is to note that the right hand side of (20) can also be expressed in terms of a Laplace transform, but with respect to a different distribution than that used for Z. In fact, ${\lvert1+C\Lambda\rvert^{-3/2}}$ is itself a Laplace transform. To show this, for any positive integer d, let ${Y_1,\ldots,Y_d}$ be independent centered Gaussian random vectors each with covariance matrix C. Define the n-dimensional random vector ${\tilde Z}$ with components ${\tilde Z_i=\sum_{j=1}^dY_{j,i}^2/2}$. Expressing ${\tilde Z}$ as the sum of the independent random vectors ${\tilde Z^{(j)}_i=Y_{j,i}^2/2}$, each of which has the same distribution as Z above, its Laplace transform can be computed,

 $\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle {\mathbb E}\left[\exp\left(-\lambda\cdot \tilde Z\right)\right] &\displaystyle={\mathbb E}\left[\prod_{j=1}^d\exp\left(-\lambda\cdot \tilde Z^{(j)}\right)\right]\smallskip\\ &\displaystyle=\prod_{j=1}^d \lvert1+C\Lambda\rvert^{-1/2}\smallskip\\ &\displaystyle=\lvert1+C\Lambda\rvert^{-d/2}. \end{array}$ (21)

Royen refers to this as an n-variate gamma distribution, which I will denote by ${\Gamma(\frac d2,C)}$. It is simply the distribution of the diagonal elements of a Wishart distributed matrix, scaled by a factor of 1/2.
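As a sanity check of (21), the following sketch builds ${\tilde Z}$ from ${d=3}$ independent ${N(0,C)}$ vectors (the value of d used in Lemma 3 below) and compares its empirical Laplace transform at a single point against ${\lvert1+C\Lambda\rvert^{-d/2}}$; C and ${\lambda}$ are arbitrary test values:

```python
# Monte Carlo check of (21): Z~ with components sum_{j<=d} Y_{j,i}^2 / 2,
# built from d independent N(0, C) vectors, has Laplace transform
# det(I + C Lambda)^(-d/2).
import numpy as np

rng = np.random.default_rng(6)
C = np.array([[1.0, 0.4], [0.4, 1.5]])
lam = np.array([0.3, 0.7])
d, N = 3, 400_000

L = np.linalg.cholesky(C)
Y = rng.standard_normal((N, d, 2)) @ L.T   # N samples of d iid N(0, C) vectors
Ztilde = 0.5 * np.sum(Y**2, axis=1)        # shape (N, 2)
mc = np.mean(np.exp(-Ztilde @ lam))
exact = np.linalg.det(np.eye(2) + C @ np.diag(lam)) ** (-d / 2)
print(mc, exact)
assert abs(mc - exact) < 0.01
```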

Putting the above calculations together, we can now compute how the expectations of functions of Z vary as the covariance matrix C is varied. This is stated in Lemma 3 below. I will deviate slightly from Royen’s argument here. Whereas he expresses the time derivative of the probability density of Z in terms of the ${\Gamma(\frac32,C)}$ probability density, I instead look at the derivatives of the expectation of a smooth function. This avoids some technicalities, such as showing that ${\Gamma(\frac32,C)}$ has smooth probability densities and having to restrict to nonsingular covariance matrices.

Equation (22) below can be compared with (10) above. They both express the derivative of the expectation of a function of our random variables in terms of an expectation over partial derivatives of the function. The difference is that now, we look at expectations of a function of ${Z_i=X_i^2/2}$ rather than of X directly, and the expectation on the right hand side is with respect to a different probability measure from the original one. Equation (22) does have a more complicated form than (10), involving coefficients ${c_J}$ which are defined in terms of derivatives of the minors of C. However, these coefficients will be nonnegative in the situation concerning us here, which is all that matters. In the following, I use ${(-\partial)_J}$ to denote the mixed partial derivatives

$\displaystyle (-\partial)_J=\prod_{j\in J}(-\partial/\partial z_j).$

Lemma 3 Let the positive semidefinite matrix ${C=C(t)}$ be continuously differentiable in parameter t, and X be a centered Gaussian random vector with covariance matrix ${C(t)}$ under the measure ${{\mathbb P}_t}$, and set ${Z_i=X_i^2/2}$. Then, for any smooth ${f\colon{\mathbb R}^n\rightarrow{\mathbb R}}$ with compact support, ${{\mathbb E}_t[f(Z)]}$ is continuously differentiable with,

 $\displaystyle \frac{d}{dt}{\mathbb E}_t\left[f(Z)\right]=\sum_{\emptyset\not=J\subseteq[n]} c_J\tilde{\mathbb E}_t\left[(-\partial)_Jf(\tilde Z)\right].$ (22)

Here, ${\tilde{\mathbb E}_t}$ represents expectation under a probability measure for which ${\tilde Z}$ has the ${\Gamma(\frac32,C(t))}$ distribution, and ${c_J\in{\mathbb R}}$ are the coefficients

 $\displaystyle c_J=-\frac12\frac{d}{dt}\lvert C(t)_J\rvert.$ (23)

Proof: In the case where f is of the form ${f(z)=\exp(-\lambda\cdot z)}$ for some fixed ${\lambda\in{\mathbb R}^n_+}$, we have ${(-\partial)_Jf=\lambda^Jf}$ and, hence, (22) is given by substituting (21) for ${d=3}$ into (20) above. The general case then follows by Laplace or Fourier transform inversion, or by approximating f by linear combinations of functions of the form ${\exp(-\lambda\cdot z)}$.

I demonstrate how Fourier transforms allow us to extend to the general case of (22). By analytic continuation, the above can be extended to imaginary values of ${\lambda}$. So, if we set ${g(z,\lambda)=\exp(i\lambda\cdot z)}$ then (22) will hold with ${f(z)=g(z,\lambda)}$ for each fixed ${\lambda\in{\mathbb R}^n}$. If f is smooth with compact support, as in the statement of the lemma, then it can be expressed as

$\displaystyle f(z)=\int_{{\mathbb R}^n}\hat f(\lambda)g(z,\lambda)\,d\lambda$

where the Fourier transform ${\hat f}$ is in the Schwartz space of rapidly decreasing functions on ${{\mathbb R}^n}$. Fubini’s theorem allows us to commute the integral with the expectation and, by dominated convergence, the partial derivatives commute with the integral, giving

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \frac{d}{dt}{\mathbb E}_t[f(Z)] &\displaystyle=\frac{d}{dt}\int_{{\mathbb R}^n}\hat f(\lambda){\mathbb E}_t[g(Z,\lambda)]\,d\lambda\smallskip\\ &\displaystyle=\sum_{\emptyset\not=J\subseteq[n]}c_J\int_{{\mathbb R}^n}\hat f(\lambda)\tilde{\mathbb E}_t[(-\partial)_Jg(\tilde Z,\lambda)]\,d\lambda\smallskip\\ &\displaystyle=\sum_{\emptyset\not=J\subseteq[n]}c_J\tilde{\mathbb E}_t[(-\partial)_Jf(\tilde Z)]. \end{array}$

Boundedness and continuity of ${d{\mathbb E}_t[g(Z,\lambda)]/dt}$ follow from (20) so, by dominated convergence, we see that ${{\mathbb E}_t[f(Z)]}$ is continuously differentiable. ⬜
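As a quick sanity check (my own addition, not part of Royen’s argument), the one-dimensional case of (22) can be verified in closed form. With ${n=1}$, write ${C(t)=v(t)}$ for a scalar variance and take ${f(z)=e^{-z}}$; assuming the standard Laplace transforms ${{\mathbb E}[e^{-X^2/2}]=(1+v)^{-1/2}}$ for ${X\sim N(0,v)}$ and ${\tilde{\mathbb E}[e^{-\tilde Z}]=(1+v)^{-3/2}}$ for ${\tilde Z\sim\Gamma(\frac32,v)}$, both sides of (22) can be computed explicitly:

```python
import math

# One-dimensional check of (22). Here Z = X^2/2 with X ~ N(0, v(t)),
# f(z) = exp(-z), so (-d/dz)f = f, and (23) gives the single
# coefficient c = -v'(t)/2.
def lhs(v, vdot):
    # d/dt E_t[f(Z)] = d/dv (1 + v)^(-1/2) * v'(t), by the chain rule.
    return -0.5 * (1.0 + v) ** (-1.5) * vdot

def rhs(v, vdot):
    # c * E~[(-d/dz)f(Z~)] = (-v'/2) * (1 + v)^(-3/2).
    return (-0.5 * vdot) * (1.0 + v) ** (-1.5)

for v, vdot in [(1.0, -0.7), (0.5, 0.3), (2.0, -1.0)]:
    assert math.isclose(lhs(v, vdot), rhs(v, vdot), rel_tol=1e-12)
```

Both sides reduce to ${-\tfrac12 v'(t)(1+v)^{-3/2}}$, so (22) and (23) agree exactly in this simple case.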

As in the explanation of the local approach to the correlation conjecture above, we consider covariance matrices ${C=C(t)}$ linearly interpolated between the full covariance matrix of X at ${t=1}$ and, at ${t=0}$, the case making the first k components of X independent of the final ${n-k}$ components. Fortunately, in this situation, the coefficients ${c_J}$ introduced in Lemma 3 are always nonnegative.

Lemma 4 Suppose that the covariance matrix ${C(t)}$ is of the form (14),

$\displaystyle C(t)=\left(\begin{array}{cc} C_1 & tQ \smallskip\\ tQ^T & C_2 \end{array}\right),$

for rxr matrix ${C_1}$, sxs matrix ${C_2}$ and rxs matrix ${Q}$, with ${n=r+s}$. It is assumed that ${C(t)}$ is positive semidefinite for all ${0\le t\le 1}$. Then, the coefficients ${c_J}$ defined by (23) are nonnegative.

Proof: For any ${J\subseteq[n]}$, ${C(t)_J}$ can be written as

$\displaystyle C(t)_J=\left(\begin{array}{cc} \tilde C_1 & t\tilde Q \smallskip\\ t\tilde Q^T & \tilde C_2 \end{array}\right),$

where ${\tilde C_1,\tilde C_2,\tilde Q}$ are submatrices of ${C_1,C_2,Q}$ respectively. For the moment, suppose that ${C(t)}$ is strictly positive definite, implying that ${\tilde C_1}$ and ${\tilde C_2}$ are positive definite and, in particular, are invertible. Decompose

$\displaystyle C(t)_J= \left(\begin{array}{cc} \tilde C_1^{1/2} & 0 \smallskip\\ 0 & \tilde C_2^{1/2} \end{array}\right)\left(\begin{array}{cc} I & tR \smallskip\\ tR^T & I \end{array}\right)\left(\begin{array}{cc} \tilde C_1^{1/2} & 0 \smallskip\\ 0 & \tilde C_2^{1/2} \end{array}\right)$

where ${R}$ is the matrix ${\tilde C_1^{-1/2}\tilde Q\tilde C_2^{-1/2}}$. Using the block matrix determinant formula,

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \lvert C(t)_J\rvert&\displaystyle=\lvert \tilde C_1\rvert\,\lvert\tilde C_2\rvert\,\lvert I-t^2R^TR\rvert\smallskip\\ &\displaystyle=\lvert \tilde C_1\rvert\,\lvert\tilde C_2\rvert\,\prod_\alpha(1-\alpha t^2) \end{array}$

where ${\alpha\ge0}$ runs over the eigenvalues of the positive semidefinite matrix ${R^TR}$. Each factor ${1-\alpha t^2}$ is nonincreasing over ${0\le t\le 1}$ and, by the assumption that ${C(t)}$ is strictly positive definite, never vanishes. Since it equals 1 at ${t=0}$ and is continuous, it remains positive. So, ${\lvert C(t)_J\rvert}$ is a product of positive nonincreasing factors and, hence, is decreasing in t.

In the case where ${C(t)}$ is not strictly positive definite, the above argument can be applied to the positive definite matrices ${C(t)+\epsilon I}$ for any ${\epsilon > 0}$ to see that ${\lvert (C(t)+\epsilon I)_J\rvert}$ is decreasing in t and, letting ${\epsilon}$ go to zero, ${\lvert C(t)_J\rvert}$ is decreasing in t, so has nonpositive derivative. Applying this to the definition (23) shows that ${c_J}$ is nonnegative. ⬜
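Lemma 4 is also easy to check numerically. The following sketch (the matrices ${C_1,C_2,Q}$ are arbitrary choices of mine, with ${C(1)}$ diagonally dominant so that ${C(t)}$ stays positive definite on ${[0,1]}$) confirms that every minor ${\lvert C(t)_J\rvert}$ is nonincreasing in t, which by (23) gives ${c_J\ge0}$:

```python
import itertools
import numpy as np

# Example of form (14) with r = s = 2, n = 4. C(1) is symmetric and
# strictly diagonally dominant, hence positive definite, and C(t) is a
# convex combination (1-t)C(0) + tC(1) of positive definite matrices.
C1 = np.array([[1.0, 0.3], [0.3, 1.0]])
C2 = np.array([[1.0, -0.2], [-0.2, 1.0]])
Q = np.array([[0.4, 0.1], [0.0, 0.3]])

def C(t):
    return np.block([[C1, t * Q], [t * Q.T, C2]])

n = 4
ts = np.linspace(0.0, 1.0, 11)
subsets = itertools.chain.from_iterable(
    itertools.combinations(range(n), k) for k in range(1, n + 1))
for J in subsets:
    minors = [np.linalg.det(C(t)[np.ix_(J, J)]) for t in ts]
    # Each minor |C(t)_J| is nonincreasing in t, so (23) gives c_J >= 0.
    assert all(a >= b - 1e-12 for a, b in zip(minors, minors[1:]))
```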

The previous two lemmas can be combined to show that we can continuously transform the probabilities on the right hand side of inequality (16) into the left hand side in such a way that they are increasing. This is a special case of conjecture C3 above.

Lemma 5 Let X be a random vector with values in ${{\mathbb R}^n}$ and, for each ${0\le t\le1}$, let ${{\mathbb P}_t}$ be a probability measure with respect to which X is centered Gaussian with covariance matrix ${C(t)}$ of the form (14). Then,

 $\displaystyle t\mapsto{\mathbb P}_t\left(\max_{1\le i\le n}\lvert X_i\rvert\le1\right)$ (24)

is increasing in t.

Proof: Let Z be the random vector taking values in ${{\mathbb R}_+^n}$ defined by ${Z_i=X_i^2/2}$. Choosing ${\epsilon > 0}$, let ${\phi_\epsilon\colon{\mathbb R}_+\rightarrow{\mathbb R}_+}$ be smooth, decreasing, and satisfy ${\phi_\epsilon(x)=1}$ for ${x\le1/2}$ and ${\phi_\epsilon(x)=0}$ for ${x\ge1/2+\epsilon}$. Define ${f_\epsilon\colon{\mathbb R}^n\rightarrow{\mathbb R}}$ by

$\displaystyle f_\epsilon(z)=\prod_{i=1}^n\phi_\epsilon(\lvert z_i\rvert).$

This is smooth with compact support. For any ${J\subseteq[n]}$ and ${z\in{\mathbb R}_+^n}$ we have,

$\displaystyle (-\partial)_J f_\epsilon(z)=\prod_{i\in J}(-\phi_\epsilon^\prime(z_i))\prod_{i\in[n]\setminus J}\phi_\epsilon(z_i)\ge0.$
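Such a cutoff ${\phi_\epsilon}$ certainly exists; one standard construction glues together ${e^{-1/x}}$ bump factors. The following sketch (the names and this particular formula are my choice, not anything used in the proof) checks the required properties numerically:

```python
import math

# psi is smooth on R, vanishing for x <= 0: the classic exp(-1/x) bump.
def psi(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

def make_phi(eps):
    a, b = 0.5, 0.5 + eps
    def phi(x):
        # Equals 1 for x <= a, 0 for x >= b, smooth and decreasing between:
        # both psi terms are positive exactly on (a, b), and the ratio
        # interpolates monotonically from 1 down to 0.
        return psi(b - x) / (psi(b - x) + psi(x - a))
    return phi

phi = make_phi(0.25)
assert phi(0.4) == 1.0 and phi(0.8) == 0.0
xs = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75]
assert all(phi(x) >= phi(y) for x, y in zip(xs, xs[1:]))
```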

From Lemma 3,

$\displaystyle \frac{d}{dt}{\mathbb E}_t\left[f_\epsilon(Z)\right]=\sum_{\emptyset\not=J\subseteq[n]}c_J\tilde{\mathbb E}_t\left[(-\partial)_Jf_\epsilon(\tilde Z)\right].$

The coefficients ${c_J}$ are nonnegative, by Lemma 4, so the derivative above is nonnegative and ${{\mathbb E}_t[f_\epsilon(Z)]}$ is increasing in t.

Finally, since ${Z_i\le1/2}$ exactly when ${\lvert X_i\rvert\le1}$, letting ${\epsilon}$ go to zero,

$\displaystyle {\mathbb E}_t[f_\epsilon(Z)]\rightarrow{\mathbb P}_t\left(\max_{1\le i\le n}\lvert X_i\rvert\le1\right)$

so that (24) is increasing in t as claimed. ⬜
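The monotonicity in Lemma 5 is easy to check numerically in two dimensions, where the covariance ${\left(\begin{smallmatrix}1&t\\t&1\end{smallmatrix}\right)}$ is of the form (14) with ${r=s=1}$. The probability can be computed by conditioning on ${X_1}$ and integrating numerically (this is an illustration of mine, not part of the proof):

```python
import math
import numpy as np

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rect_prob(t, m=4001):
    """P(|X1| <= 1, |X2| <= 1) for centered Gaussian (X1, X2) with
    covariance [[1, t], [t, 1]]: conditionally on X1 = v, X2 is
    N(tv, 1 - t^2), so integrate the conditional probability against
    the standard normal density over v in [-1, 1] (trapezoid rule)."""
    s = math.sqrt(1.0 - t * t)
    u = np.linspace(-1.0, 1.0, m)
    dens = np.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    inner = np.array([Phi((1.0 - t * v) / s) - Phi((-1.0 - t * v) / s)
                      for v in u])
    g = dens * inner
    h = u[1] - u[0]
    return h * (g.sum() - 0.5 * (g[0] + g[-1]))

ps = [rect_prob(t) for t in np.linspace(0.0, 0.95, 11)]
# Lemma 5 predicts monotonicity in t; at t = 0 the components are
# independent, so the probability factorizes as P(|X1| <= 1)^2.
assert all(a <= b + 1e-9 for a, b in zip(ps, ps[1:]))
assert abs(ps[0] - (2.0 * Phi(1.0) - 1.0) ** 2) < 1e-6
```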

The Gaussian correlation inequality now follows from Lemma 5.

Theorem 6 (Gaussian Correlation Inequality) If X is an n-dimensional centered Gaussian random vector then inequality (16),

$\displaystyle {\mathbb P}\left(\max_{1\le i\le n}\lvert X_i\rvert\le1\right)\ge {\mathbb P}\left(\max_{1\le i\le k}\lvert X_i\rvert\le1\right) {\mathbb P}\left(\max_{k < i\le n}\lvert X_i\rvert\le1\right)$

holds, for each ${1\le k\le n}$.

Proof: Setting ${r=k}$ and ${s=n-k}$, write the covariance matrix of X in the form

$\displaystyle C=\left(\begin{array}{cc} C_1 & Q \smallskip\\ Q^T & C_2 \end{array}\right),$

for rxr matrix ${C_1}$, sxs matrix ${C_2}$ and rxs matrix Q. As this is positive semidefinite, ${C_1}$ and ${C_2}$ are positive semidefinite. Letting ${C(t)}$ be defined by (14), ${C(0)}$ is positive semidefinite and, therefore, so is

$\displaystyle C(t)=(1-t)C(0)+tC(1).$

So, we can define ${{\mathbb P}_t}$ to be a probability measure with respect to which X is centered Gaussian with covariance matrix ${C(t)}$. Using Lemma 5,

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle {\mathbb P}\left(\max_{1\le i\le n}\lvert X_i\rvert\le1\right) &\displaystyle= {\mathbb P}_1\left(\max_{1\le i\le n}\lvert X_i\rvert\le1\right)\smallskip\\ &\displaystyle\ge {\mathbb P}_0\left(\max_{1\le i\le n}\lvert X_i\rvert\le1\right)\smallskip\\ &\displaystyle= {\mathbb P}_0\left(\max_{1\le i\le k}\lvert X_i\rvert\le1\right){\mathbb P}_0\left(\max_{k < i\le n}\lvert X_i\rvert\le1\right)\smallskip\\ &\displaystyle= {\mathbb P}\left(\max_{1\le i\le k}\lvert X_i\rvert\le1\right){\mathbb P}\left(\max_{k < i\le n}\lvert X_i\rvert\le1\right) \end{array}$

as required. ⬜
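As an illustration (my own numerical check, not part of the proof), Theorem 6 can be verified by Monte Carlo simulation for an example covariance matrix, here with ${n=3}$ and ${k=1}$:

```python
import numpy as np

rng = np.random.default_rng(42)
# An example positive definite covariance (my choice; any works).
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
X = rng.multivariate_normal(np.zeros(3), C, size=400_000)
inside = np.abs(X) <= 1.0
p_all = inside.all(axis=1).mean()          # P(max_{1<=i<=3} |X_i| <= 1)
p_first = inside[:, 0].mean()              # P(|X_1| <= 1)
p_rest = inside[:, 1:].all(axis=1).mean()  # P(max_{1<i<=3} |X_i| <= 1)
# Theorem 6: the joint probability dominates the product (up to MC error).
assert p_all >= p_first * p_rest - 0.005
```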

#### The Local Method Revisited

Looking at Royen’s proof, it is very similar in appearance to the local method described previously. Both consider continuously varying the covariance matrix of the jointly Gaussian random vectors according to (14), and attempt to show that the resulting probabilities are increasing. Royen diverges from previous methods in two ways. First, rather than working directly with Gaussian measures, he squares the components of the random vector, bringing in properties of the multivariate gamma distribution. Secondly, he considers GCI in the special form (4). This results in showing that the probability (24) stated in Lemma 5 is increasing, rather than the more general form (15) stated in conjecture C3 above.

However, it follows from Royen’s proof that the conjectures discussed in the local approach above are true. It is surprising, then, that no one had previously managed to solve the Gaussian correlation inequality using that approach.

Theorem 7 Conjectures C1, C2 and C3 are true.

Proof: We already know, as shown in Lemma 2, that these conjectures are equivalent. I will concentrate on proving C1. That is, if X and Y are independent standard Gaussian random vectors of dimension n then, setting ${Y(t)=tX+\sqrt{1-t^2}Y}$, the probability

 $\displaystyle {\mathbb P}\left(X\in A,Y(t)\in B\right)$ (25)

is increasing in t over the range ${0\le t\le1}$. Here, A and B are symmetric convex subsets of ${{\mathbb R}^n}$.

Consider the case where A and B are convex polytopes of the form (7),

$\displaystyle \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle A=\left\{x\in{\mathbb R}^n\colon \lvert a_i\cdot x\rvert\le 1, i=1,\ldots,r\right\},\smallskip\\ &\displaystyle B=\left\{y\in{\mathbb R}^n\colon \lvert b_j\cdot y\rvert\le 1, j=1,\ldots,s\right\}, \end{array}$

for some ${r,s\in{\mathbb N}}$ and ${a_i,b_j\in{\mathbb R}^n}$. Defining the (r + s)-dimensional random vector ${\tilde X(t)}$ by

$\displaystyle \tilde X_i(t)=\begin{cases} a_i\cdot X,&{\rm if\ }1\le i\le r,\smallskip\\ b_{i-r}\cdot Y(t),&{\rm if\ }r < i\le r+s, \end{cases}$

its covariance matrix is seen to be as in (14) with ${(C_1)_{ij}=a_i\cdot a_j}$, ${(C_2)_{ij}=b_i\cdot b_j}$ and ${Q_{ij}=a_i\cdot b_j}$. By Lemma 5,

$\displaystyle {\mathbb P}\left(X\in A,Y(t)\in B\right)={\mathbb P}\left(\max_{1\le i\le r+s}\lvert\tilde X_i(t)\rvert\le1\right)$

is increasing in t.

This proves the result for convex polytopes. As was noted above, every closed symmetric convex set is the limit of a decreasing sequence of convex polytopes. So, by taking limits, (25) is increasing in t for any closed symmetric convex sets A and B. Finally, if the sets are symmetric and convex we can use the fact that their closures ${\bar A}$ and ${\bar B}$ have the same Gaussian measures as A and B (the boundary of a convex set has zero Lebesgue, hence Gaussian, measure) to conclude that (25) is increasing in t. ⬜
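To illustrate (again, my own numerical check rather than part of the proof), conjecture C1 can be tested by simulating ${Y(t)=tX+\sqrt{1-t^2}\,Y}$ for a pair of example symmetric convex polytopes of the form (7):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 500_000
X = rng.standard_normal((N, 2))
Y = rng.standard_normal((N, 2))

# Example symmetric convex polytopes of the form (7) in R^2 (my choice).
def in_A(x):
    return (np.abs(x[:, 0]) <= 1.0) & (np.abs(x[:, 0] + x[:, 1]) <= 1.5)

def in_B(y):
    return (np.abs(y[:, 1]) <= 1.0) & (np.abs(y[:, 0] - y[:, 1]) <= 1.5)

def p(t):
    # Y(t) = tX + sqrt(1-t^2)Y is standard Gaussian with correlation t to X.
    Yt = t * X + np.sqrt(1.0 - t * t) * Y
    return float((in_A(X) & in_B(Yt)).mean())

ps = [p(t) for t in (0.0, 0.5, 1.0)]
# C1 (now a theorem): P(X in A, Y(t) in B) is increasing in t.
# Allow a small tolerance for Monte Carlo error.
assert ps[0] - 0.003 <= ps[1] <= ps[2] + 0.003
```

Reusing the same samples X, Y across all values of t (common random numbers) keeps the Monte Carlo noise in the estimates correlated, so the increasing trend is visible even at this sample size.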

#### References

1. Das Gupta, S., Eaton, M. L., Olkin, I., Perlman, M., Savage, L. J., Sobel, M. (1972) Inequalities on the probability content of convex regions for elliptically contoured distributions. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 2: Probability Theory, University of California Press, 241–265. Link
2. Hargé, Gilles (1999) A Particular Case of Correlation Inequality for the Gaussian Measure. The Annals of Probability. Vol. 27, No. 4, 1939–1951. doi:10.1214/aop/1022874822
3. Khatri, C.G. (1967) On Certain Inequalities for Normal Distributions and their Applications to Simultaneous Confidence Bounds. Ann. Math. Statist. 38, No. 6, 1853–1867. doi:10.1214/aoms/1177698618
4. Latała, R., and Matlak, D. (2017) Royen’s Proof of the Gaussian Correlation Inequality. Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2014–2016, Springer International Publishing, 265–275. doi:10.1007/978-3-319-45282-1_17. Preprint (2015) available at arXiv:1512.08776
5. Lipton, R.J., Regan, K.W. A Great Solution. Gödel’s Lost Letter and P=NP (blog), 30 Apr 2017.
6. Pitt, Loren D. (1977) A Gaussian Correlation Inequality for Symmetric Convex Sets. The Annals of Probability. Vol. 5, No. 3, 470–474. doi:10.1214/aop/1176995808.
7. Royen, T. (2014) A simple proof of the Gaussian correlation conjecture extended to multivariate gamma distributions. Far East Journal of Theoretical Statistics. Vol. 48, Issue 2, 139–145. Preprint available at arXiv:1408.1028
8. Sidak, Z. (1967) Rectangular Confidence Regions for the Means of Multivariate Normal Distributions. Journal of the American Statistical Association. Vol. 62, No. 318, 626–633. doi:10.2307/2283989
9. Sidak, Z. (1968) On Multivariate Normal Probabilities of Rectangles: Their Dependence on Correlations. The Annals of Mathematical Statistics. Vol. 39, No. 5, 1425–1434. Link
10. Retired German man solves one of world’s most complex maths problem with simple proof. Independent, Mon 3 April 2017.
11. Retired 67-year-old man solves one of the world’s most complex maths problems while brushing his teeth using a ‘surprisingly simple’ solution. Daily Mail, Tues 4 April 2017.
12. A Long-Sought Proof, Found and Almost Lost. Quanta Magazine, 28 Mar 2017.

## 3 thoughts on “The Gaussian Correlation Inequality”

1. Kunal Dutta says:

Thanks for this very nice blog post. Royen has subsequently simplified his proof even further, see: https://arxiv.org/abs/1507.00528 . He now looks at the Laplace transform of the cdf rather than the pdf, reducing the proof to half a page!

1. Thanks. Giving the new paper a first look through, it seems that he has further extended his inequalities. Although the proof of Theorem 1 stated there is only half a page, I think this would be rather difficult to follow in isolation, and is rather longer when you add the required lemmas and explanations.

2. NG says:

Are there any interesting open problems remaining in this general area?