The Gaussian Correlation Conjecture 2

Update: This conjecture has now been proved! See A simple proof of the Gaussian correlation conjecture extended to multivariate gamma distributions by T. Royen, and Royen’s proof of the Gaussian correlation inequality by Rafał Latała and Dariusz Matlak.

We continue investigating the Gaussian correlation conjecture in this post. The conjecture states that if {\mu_n} is the standard Gaussian measure on {{\mathbb R}^n} then

\displaystyle  \mu_n(A\cap B)\ge \mu_n(A)\mu_n(B) (1)

for all symmetric and convex sets {A,B}. In this entry, we consider a stronger ‘local’ version of the conjecture, which has the advantage that it can be approached using differential calculus. Inequality (1) can alternatively be stated in terms of integrals,

\displaystyle  \mu_n(fg)\ge\mu_n(f)\mu_n(g). (2)

This is clearly equivalent to (1) when {f,g} are indicator functions of convex symmetric sets. More generally, using linearity, it extends to all nonnegative functions such that {f^{-1}([a,\infty))} and {g^{-1}([a,\infty))} are symmetric and convex subsets of {{\mathbb R}^n} for positive {a}. A class of functions lying between these two extremes, which I consider here, is that of the log-concave functions. The reason for concentrating on this class is that it satisfies the following stability result, due to Prékopa (available here) and Leindler.

Theorem 1 If {f\colon{\mathbb R}^m\times{\mathbb R}^n\rightarrow{\mathbb R}_+} is log-concave then so is {x\mapsto\int f(x,y)dy}.
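Theorem 1 can be illustrated numerically (a sketch; the specific density below is my own choice, not taken from the Prékopa–Leindler papers): take the log-concave function {f(x,y)=e^{-x^2-xy-y^2}} on {{\mathbb R}^2}, approximate the marginal {h(x)=\int f(x,y)\,dy} by quadrature, and check midpoint concavity of {\log h} on a grid.

```python
import numpy as np

# Log-concave f(x, y) = exp(-(x^2 + x*y + y^2)): the quadratic form
# x^2 + x*y + y^2 is positive definite, so f is log-concave on R^2.
def f(x, y):
    return np.exp(-(x**2 + x * y + y**2))

xs = np.linspace(-3.0, 3.0, 121)
ys = np.linspace(-10.0, 10.0, 4001)
dy = ys[1] - ys[0]

# Marginal h(x) = \int f(x, y) dy, approximated by a Riemann sum.
h = np.array([f(x, ys).sum() * dy for x in xs])

# Midpoint concavity check for log h on the uniform grid:
# log h(x_k) >= (log h(x_{k-1}) + log h(x_{k+1})) / 2.
log_h = np.log(h)
ok = np.all(log_h[1:-1] >= 0.5 * (log_h[:-2] + log_h[2:]) - 1e-12)
print("marginal log-concave on grid:", ok)
```

Here the marginal can even be computed exactly, {h(x)=\sqrt{\pi}\,e^{-3x^2/4}}, so {\log h} is a concave parabola and the grid check passes with room to spare.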

The approach used in this post is as follows: given an {n\times n} matrix {Q}, the idea is to consider a probability measure {{\mathbb P}_Q} with respect to which we have joint normal random variables {X_1,\ldots,X_n} and {Y_1,\ldots,Y_n}, all with zero mean and unit variance, and covariances given by

\displaystyle  {\rm Cov}(X_i,X_j)={\rm Cov}(Y_i,Y_j)=0\ {\rm(for\ }i\not=j{\rm)}\ {\rm and}\ {\rm Cov}(X_i,Y_j)=Q_{ij}.

The random variables {(X_1,\ldots,X_n,Y_1,\ldots,Y_n)} then have covariance matrix

\displaystyle  \left(\begin{array}{cc}I_n&Q\\ Q^{\rm t}&I_n\end{array}\right).

This must be positive semidefinite or, equivalently, {\lVert Q\rVert\le 1} in the operator norm. Writing {{\mathbb E}_Q} for expectation with respect to the probability measure {{\mathbb P}_Q}, the Gaussian correlation conjecture is equivalent to the statement

\displaystyle  {\mathbb E}\!_I[f(X)g(Y)]\ge{\mathbb E}\!_0[f(X)g(Y)]. (3)
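The equivalence between positive semidefiniteness of the block covariance matrix and {\lVert Q\rVert\le1} can be checked numerically: the eigenvalues of the block matrix are {1\pm\sigma_i}, where {\sigma_i} are the singular values of {Q}. A quick sketch with a randomly generated {Q}:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random Q, rescaled so its operator norm is exactly 0.9 < 1.
Q = rng.standard_normal((n, n))
Q *= 0.9 / np.linalg.norm(Q, 2)   # ord=2 is the largest singular value

# Block covariance matrix [[I, Q], [Q^t, I]].
C = np.block([[np.eye(n), Q], [Q.T, np.eye(n)]])

# Its eigenvalues are 1 +/- sigma_i for the singular values sigma_i of Q
# (conjugate by diag(U, V) from the SVD Q = U S V^t to see this), so C
# is positive semidefinite precisely when ||Q|| <= 1.
eigs = np.linalg.eigvalsh(C)
sigmas = np.linalg.svd(Q, compute_uv=False)
expected = np.sort(np.concatenate([1 - sigmas, 1 + sigmas]))
print(np.allclose(np.sort(eigs), expected))  # True
print(eigs.min() >= 0)                       # True, since ||Q|| = 0.9
```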

Furthermore, considering {{\mathbb E}_Q[f(X)g(Y)]} as a function of {Q}, its partial derivatives can be calculated as follows,

\displaystyle  \frac{\partial}{\partial Q_{ij}}{\mathbb E}\!_Q\left[f(X)g(Y)\right]={\mathbb E}\!_Q\left[D_if(X)D_jg(Y)\right]. (4)
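Formula (4) can be sanity-checked numerically in one dimension, where it reduces to the classical identity {\frac{d}{dq}{\mathbb E}_q[f(X)g(Y)]={\mathbb E}_q[f^\prime(X)g^\prime(Y)]} for a bivariate normal pair with correlation {q} (a sketch; the test functions below are arbitrary smooth choices):

```python
import numpy as np

# One-dimensional instance of formula (4): for (X, Y) standard bivariate
# normal with correlation q, d/dq E_q[f(X)g(Y)] = E_q[f'(X)g'(Y)].
f  = lambda x: np.exp(-x**2)
df = lambda x: -2.0 * x * np.exp(-x**2)   # f'
g  = lambda y: np.exp(-y**2 / 2)
dg = lambda y: -y * np.exp(-y**2 / 2)     # g'

xs = np.linspace(-8.0, 8.0, 801)
X, Y = np.meshgrid(xs, xs, indexing="ij")
dA = (xs[1] - xs[0]) ** 2

def density(q):
    # Standard bivariate normal density with correlation q.
    z = (X**2 - 2.0 * q * X * Y + Y**2) / (2.0 * (1.0 - q**2))
    return np.exp(-z) / (2.0 * np.pi * np.sqrt(1.0 - q**2))

def E(q, u, v):
    # Quadrature approximation of E_q[u(X) v(Y)].
    return float(np.sum(u(X) * v(Y) * density(q)) * dA)

q, h = 0.3, 1e-5
lhs = (E(q + h, f, g) - E(q - h, f, g)) / (2.0 * h)  # numerical dE/dq
rhs = E(q, df, dg)                                    # E_q[f'(X) g'(Y)]
print(abs(lhs - rhs) < 1e-6)
```

The central difference in {q} and the quadrature value of {{\mathbb E}_q[f^\prime(X)g^\prime(Y)]} agree to well within the stated tolerance.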

Using this, the Taylor expansion to second order about {Q=0} can be written as,

\displaystyle  {\mathbb E}\!_Q\left[f(X)g(Y)\right] = {\mathbb E}\!_0\left[f(X)g(Y)\right]+\frac12Q_{i_1j_1}Q_{i_2j_2}\mu_n(D_{i_1i_2}f)\mu_n(D_{j_1j_2}g)+o(\lVert Q\rVert^2)

(we use the summation convention, where repeated indices in a product are summed over; the first-order term {Q_{ij}\mu_n(D_if)\mu_n(D_jg)} vanishes because, for symmetric {f} and {g}, {\mu_n(D_if)=\mu_n(D_jg)=0}). From Theorem 1, the function {y\mapsto\int f(x+y)\,d\mu_n(x)} is log-concave and, therefore, the matrix {(A_f)_{ij}=-\mu_n(D_{ij}f)} is positive semidefinite. So, to second order in {Q},

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rcl} \displaystyle {\mathbb E}\!_Q\left[f(X)g(Y)\right] - {\mathbb E}\!_0\left[f(X)g(Y)\right] &\displaystyle =&\displaystyle \frac12{\rm Tr}(Q^{\rm t}A_fQA_g)\\ &\displaystyle=&\displaystyle \frac12{\rm Tr}(A_g^{1/2}Q^{\rm t}A_fQA_g^{1/2})\ge 0. \end{array}

If {f,g} are strictly log-concave, so that {A_f,A_g} are positive definite, then this shows that {{\mathbb E}_Q[f(X)g(Y)]} has a local minimum at {Q=0}. This suggests the following conjecture, which is stronger than the Gaussian correlation inequality.

Conjecture: For log-concave functions {f,g} on {{\mathbb R}^n}, the function {Q\mapsto{\mathbb E}_Q[f(X)g(Y)]}, defined over matrices with {\lVert Q\rVert\le 1}, is minimized at {Q=0}. In fact, we conjecture that it never has a (strict) local minimum at any {Q\not=0}.

Certainly, this conjecture would imply (3). Being a stronger version of the Gaussian correlation inequality, we can be less confident that it holds than we are of the original conjecture. However, if it is true then it has one advantage: it can be approached locally with respect to the covariances, considering only infinitesimal changes in {Q}. In fact, it can be shown to be equivalent to the following statement (for any fixed log-concave and differentiable {f}). I do not give the proof here, to save space, but it is just an application of equation (4).

Conjecture: For all log-concave functions {g}, the matrix {\mu_n(D_ifD_jg)} is never symmetric and negative definite.

If this can be proven, then the correlation conjecture follows. As demonstrated by Pitt in 1977, the Gaussian correlation conjecture follows if this matrix always has nonnegative trace (this was used to prove the two-dimensional case of the conjecture), which would certainly imply that it cannot be negative definite. It is easily shown that the matrix does indeed have nonnegative trace if the Gaussian measure is replaced by Lebesgue measure. This is because {\int f(x+y)g(x-y)\,dx} is log-concave and symmetric in {y} (by Theorem 1), and is therefore maximized at {y=0}. Expanding to second order in {y} shows that {\int D_if(x)D_jg(x)\,dx} is symmetric and positive semidefinite. In fact, in two dimensions it can be shown that

\displaystyle  \int_{|x|\le r} D_if(x)D_jg(x)\,dx

has nonnegative trace (see Pitt). From this it follows that {\int D_if(x)D_jg(x)\phi(|x|)\,dx} has nonnegative trace for all decreasing functions {\phi}. In particular, letting {\phi(|x|)} be the Gaussian density, the above conjecture follows. I don’t know whether a similar argument can be carried across to an arbitrary number of dimensions.
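As a minimal one-dimensional sanity check of the Lebesgue-measure statement above (my own example; {f} and {g} are arbitrary choices of even log-concave functions):

```python
import numpy as np

# For even log-concave f, g on R, the 1x1 "matrix" \int f'(x) g'(x) dx
# should be nonnegative. Take f = exp(-x^2), g = exp(-x^4); then
# f'(x) g'(x) = 8 x^4 exp(-x^2 - x^4) >= 0 pointwise, as both
# derivatives have the sign of -x.
xs = np.linspace(-6.0, 6.0, 100001)
dx = xs[1] - xs[0]

df = -2.0 * xs * np.exp(-xs**2)        # f'(x)
dg = -4.0 * xs**3 * np.exp(-xs**4)     # g'(x)

integral = np.sum(df * dg) * dx        # Riemann-sum approximation
print(integral > 0)  # True
```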

Consider, for example, the case where {f(x)} is a function of {x^tMx} for a positive definite matrix {M}. This includes the case where {f} is the indicator function of a centered ellipsoid (proven in 1999 by Hargé) and, as a limit of centered ellipsoids, of a symmetric slab (i.e., the region between two parallel hyperplanes; proven independently in 1967 by Khatri and by Šidák). Say, {f(x)=\theta(x^t M x)}. Then,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rcl} \displaystyle M^{-1}_{ij}\mu_n(D_ifD_jg)&\displaystyle=&\displaystyle 2M^{-1}_{ij}\int \theta^\prime(x^tMx)M_{ik}x_kD_jg(x)\,d\mu_n(x)\\ &\displaystyle =&\displaystyle 2\int \theta^\prime(x^tMx)x_jD_jg(x)\,d\mu_n(x)\ge 0. \end{array}

The final inequality follows because {f} and {g} are necessarily decreasing along lines going radially out from the origin, implying that {\theta^\prime} and {x_jD_jg(x)} are both nonpositive. Consequently, {\mu_n(D_ifD_jg)} cannot be negative definite in this case, and the Gaussian correlation inequality follows for centered ellipsoids.
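The ellipsoid and slab cases can also be illustrated directly, by a Monte Carlo check of inequality (1) in {{\mathbb R}^2} (a sketch; the particular ellipse and slab below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500_000
Z = rng.standard_normal((N, 2))  # samples from the standard Gaussian mu_2

# A: centered ellipse x^2/4 + y^2 <= 1;  B: symmetric slab |x| <= 0.5.
in_A = Z[:, 0]**2 / 4 + Z[:, 1]**2 <= 1
in_B = np.abs(Z[:, 0]) <= 0.5

p_A  = in_A.mean()
p_B  = in_B.mean()
p_AB = (in_A & in_B).mean()

# Gaussian correlation inequality: mu(A ∩ B) >= mu(A) mu(B).
print(p_AB >= p_A * p_B)  # True, with a margin far exceeding MC noise
```

Conditioning on the slab concentrates the first coordinate near zero, which makes membership in the ellipse more likely, so the intersection probability comfortably exceeds the product.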

10 thoughts on “The Gaussian Correlation Conjecture 2”

  1. It is perfectly OK to use “we” about yourself and your tapeworm — which should of course be properly credited.

    By the way, is it just me, or are we lacking an RSS feed for new posts? (I.e. do we have to resort to e-mail notification?)

    1. Thanks.

However, I do see a problem in that paper. How does proving the result for sets with Gaussian measure \mu_d(A) < 10^{-7} imply it for all sets? It mentions “contracting the sets linearly”, but how does that work? Certainly \mu_d(cA) \ge c^d \mu_d(A) for c less than one. Also, for sets contained in \sqrt{d}B_d, \mu_d(cA)\le e^{d/2}c^d\mu_d(A) holds. If I try scaling the sets down so that they have probability no more than 10^{-7}, this leads to a factor of 10^{-7}e^{-d/2} which, rather crucially, depends on d. You can probably do better than that, but I don’t know where the claimed 10^{-14} comes from.

There is something else in Guan’s paper that I do not understand:

On page 2, line 4, he seems to apply the CLT to the set F_{1,D}, but in the definition of F_{1,D} the left-hand side depends on n, so one would need some uniformity of the convergence in the CLT. I do not see that; can anybody help me?

Thomas Schlumprecht:

      That’s a very good point! It does look like a problem. In fact, conditioning on the event \{X_i\in D\} (i.e., the event F_{2,D} in the paper), the law of large numbers implies that the term on the left hand side of the inequality defining F_{1,D} almost surely grows at rate

      \displaystyle\frac{n\mathbb{E}[X_1^2\mid X_1\in D]-n\mathbb{E}[X_1^2\mid X_1\in D]\mu(D)}{\sqrt{n{\rm Var}(X_1^2 I_{X_1\in D})}}.

      This is at least as large as \sqrt{n}\mathbb{E}[X_1^2\vert X_1\in D](1-\mu(D))/(\mathbb{E}[X_1^4\mid X_1\in D]^{1/2}\mu(D)^{1/2}), and is larger than the right hand side of the inequality whenever \mathbb{E}[X_1^2\mid X_1\in D] > 10^{-2}\mathbb{E}[X_1^4\mid X_1\in D]^{1/2}/(1-\mu_d(D)) (and I see no reason why that shouldn’t be the case). Then, \mu_{nd}(F_{1,D}\cap F_{2,D})\sim\mu_{nd}(F_{2,D})=\mu_d(D)^n, which means that the CLT cannot give the claimed bound. Using this expression instead of the CLT gives zero on the right hand side of (2.4), which isn’t very useful, and I don’t see how the rest of the argument can follow.

      So, unless I am mistaken, this does seem like a major problem in the argument.

  3. Hi,

Thanks for your observations; they seem quite clear and serious enough to raise legitimate doubts about the proof. Unfortunately, I couldn’t find the author’s e-mail address, so that he could discuss his argument with you.

    Best Regards
