Homomorphisms of *-Probability Spaces

I previously introduced the concept of a *-probability space as a pair {(\mathcal A,p)} consisting of a *-algebra {\mathcal A} together with a state {p} on it. As we noted, this concept is rather too simplistic to properly capture a noncommutative generalisation of classical probability spaces, and I will later give conditions for {(\mathcal A,p)} to be considered as a true probability space. For now, I continue the investigation of these preprobability spaces, looking at homomorphisms in this post.

A *-homomorphism between *-algebras {\mathcal A} and {\mathcal A^\prime} is a map {\varphi\colon\mathcal A\rightarrow\mathcal A^\prime} preserving the algebra operations,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle \varphi(\lambda a+\mu b)=\lambda\varphi(a)+\mu\varphi(b),\smallskip\\ &\displaystyle \varphi(ab)=\varphi(a)\varphi(b),\smallskip\\ &\displaystyle \varphi(a^*)=\varphi(a)^*, \end{array}

for all {a,b\in\mathcal A} and {\lambda,\mu\in{\mathbb C}}. The term `*-homomorphism’ is used to distinguish it from plain algebra homomorphisms, which need not preserve the involution (the third identity above). Next, I will say that {\varphi} is a homomorphism of *-probability spaces {(\mathcal A,p)} and {(\mathcal A^\prime,p^\prime)} if it is a *-homomorphism from {\mathcal A} to {\mathcal A^\prime} which preserves the state,

\displaystyle  p^\prime(\varphi(a))=p(a),

for all {a\in\mathcal A}.

Now, recall that for any *-probability space {(\mathcal A,p)}, we define a semi-inner product {\langle x,y\rangle=p(x^*y)} on {\mathcal A} and the associated {L^2(p)} seminorm, {\lVert x\rVert_2=\sqrt{p(x^*x)}}. Homomorphisms of *-probability spaces are clearly {L^2}-isometries,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \langle\varphi(x),\varphi(y)\rangle&\displaystyle=p^\prime\left(\varphi(x)^*\varphi(y)\right)=p^\prime\left(\varphi(x^*y)\right)\smallskip\\ &\displaystyle=p(x^*y)=\langle x,y\rangle. \end{array}
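As a concrete sanity check, the following sketch (my own illustration, not part of the original post) verifies the state-preservation and {L^2}-isometry properties numerically, for the *-homomorphism {a\mapsto a\otimes I_2} between matrix algebras, each equipped with its normalized trace as the state.

```python
import numpy as np

rng = np.random.default_rng(0)

# (A, p): 2x2 complex matrices with the normalized trace as state.
# (A', p'): 4x4 complex matrices, also with the normalized trace.
# phi(a) = a (tensor) I_2 is a *-homomorphism preserving the state,
# hence a homomorphism of *-probability spaces.

def p(a):
    return np.trace(a) / a.shape[0]  # normalized trace

def phi(a):
    return np.kron(a, np.eye(2))  # a tensor I_2

def rand_elem(n):
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

x, y = rand_elem(2), rand_elem(2)

# State preservation: p'(phi(a)) = p(a).
assert np.isclose(p(phi(x)), p(x))

# L^2 isometry: <phi(x), phi(y)> = p'(phi(x)* phi(y)) = p(x* y) = <x, y>.
assert np.isclose(p(phi(x).conj().T @ phi(y)), p(x.conj().T @ y))
```

Here the isometry reduces to the algebraic identity {(x\otimes I)^*(y\otimes I)=(x^*y)\otimes I} together with preservation of the normalized trace.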

For each {a\in\mathcal A}, the {L^\infty(p)} seminorm {\lVert a\rVert_\infty} is defined as the operator norm of the left-multiplication map {x\mapsto ax} on {\mathcal A}, considered as a vector space with the {L^2} seminorm. Homomorphisms of *-probability spaces need not be {L^\infty}-isometric, although they do satisfy the following inequality.

Lemma 1 If {\varphi\colon(\mathcal A,p)\rightarrow(\mathcal A^\prime,p^\prime)} is a homomorphism of *-probability spaces then, for any {a\in\mathcal A},

\displaystyle  \lVert\varphi(a)\rVert_\infty\ge\lVert a\rVert_\infty. (1)

Continue reading “Homomorphisms of *-Probability Spaces”

States on *-Algebras

So far, we have been considering positive linear maps on a *-algebra. Taking things a step further, we want to consider positive maps which are normalized so as to correspond to expectations under a probability measure. That is, we require {p(1)=1}, which only makes sense for unital algebras. I use the definitions and notation of the previous post on *-algebras.

Definition 1 A state on a unital *-algebra {\mathcal A} is a positive linear map {p\colon\mathcal A\rightarrow{\mathbb C}} satisfying {p(1)=1}.

Examples 3 and 4 of the previous post can be extended to give states.

Example 1 Let {(X,\mathcal E,\mu)} be a probability space, and {\mathcal A} be the bounded measurable maps {X\rightarrow{\mathbb C}}. Then, integration w.r.t. {\mu} defines a state on {\mathcal A},

\displaystyle  p(f)=\int f d\mu.

Example 2 Let {V} be an inner product space, and {\mathcal A} be a *-subalgebra of the space of linear maps {a\colon V\rightarrow V} as in example 2 of the previous post, containing the identity map {I}. Then, any {\xi\in V} with {\lVert\xi\rVert=1} defines a state on {\mathcal A},

\displaystyle  p(a)=\langle\xi,a\xi\rangle.
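A quick numerical check of Example 2 in the finite-dimensional case {V={\mathbb C}^n}, taking {\mathcal A} to be the full matrix algebra (my own sketch, not part of the post): {p(1)=1} follows from {\lVert\xi\rVert=1}, and positivity from {p(a^*a)=\lVert a\xi\rVert^2}.

```python
import numpy as np

rng = np.random.default_rng(1)

# Vector state p(a) = <xi, a xi> on the *-algebra of n x n complex
# matrices, for a unit vector xi.
n = 3
xi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
xi /= np.linalg.norm(xi)  # normalize so that ||xi|| = 1

def p(a):
    return np.vdot(xi, a @ xi)  # <xi, a xi>; vdot conjugates xi

# Normalization: p(I) = ||xi||^2 = 1.
assert np.isclose(p(np.eye(n)), 1)

# Positivity: p(a* a) = ||a xi||^2 >= 0 for every a.
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
val = p(a.conj().T @ a)
assert np.isclose(val.imag, 0) and val.real >= 0
```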

Continue reading “States on *-Algebras”

*-Algebras

After the previous posts motivating the idea of studying probability spaces by looking at states on algebras, I will now make a start on the theory. The idea is that an abstract algebra can represent the collection of bounded, and complex-valued, random variables, with a state on this algebra taking the place of the probability measure. By allowing the algebra to be noncommutative, we also incorporate quantum probability.

I will take very small first steps in this post, considering only the basic definition of a *-algebra and positive maps. To effectively emulate classical probability theory in this context will involve additional technical requirements. However, that is not the aim here. We take a bare-bones approach, to get a feeling for the underlying constructs, and start with the definition of a *-algebra. I use {\bar\lambda} to denote the complex conjugate of a complex number {\lambda}.

Definition 1 An algebra {\mathcal A} over a field {K} is a {K}-vector space together with a binary product {(a,b)\mapsto ab} satisfying

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle a(bc)=(ab)c,\smallskip\\ &\displaystyle \lambda(ab)=(\lambda a)b=a(\lambda b),\smallskip\\ &\displaystyle a(b+c)=ab+ac,\smallskip\\ &\displaystyle (a+b)c=ac+bc, \end{array}

for all {a,b,c\in\mathcal A} and {\lambda\in K}.

A *-algebra {\mathcal A} is an algebra over {{\mathbb C}} with a unary involution {a\mapsto a^*} satisfying

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle (\lambda a+\mu b)^*=\bar\lambda a^*+\bar\mu b^*,\smallskip\\ &\displaystyle (ab)^*=b^*a^*,\smallskip\\ &\displaystyle a^{**}=a, \end{array}

for all {a,b\in\mathcal A} and {\lambda,\mu\in{\mathbb C}}.

An algebra is called unital if there exists {1\in\mathcal A} such that

\displaystyle  1a=a1=a

for all {a\in\mathcal A}. Then, {1} is called the unit or identity of {\mathcal A}.
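For a concrete instance of these definitions, the {n\times n} complex matrices form a unital *-algebra with the conjugate transpose as involution; the following small numerical check (my own illustration, not from the post) verifies the involution axioms.

```python
import numpy as np

rng = np.random.default_rng(2)

# The n x n complex matrices with the conjugate transpose as involution.
n = 3
def rand_elem():
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

a, b = rand_elem(), rand_elem()
lam, mu = 2 - 1j, 0.5 + 3j

star = lambda m: m.conj().T  # the involution a -> a*

# Anti-linearity: (lam a + mu b)* = conj(lam) a* + conj(mu) b*.
assert np.allclose(star(lam * a + mu * b),
                   np.conj(lam) * star(a) + np.conj(mu) * star(b))
# Product reversal: (ab)* = b* a*.
assert np.allclose(star(a @ b), star(b) @ star(a))
# Involutivity: a** = a.
assert np.allclose(star(star(a)), a)
# The identity matrix is the unit: 1a = a1 = a.
assert np.allclose(np.eye(n) @ a, a) and np.allclose(a @ np.eye(n), a)
```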

Continue reading “*-Algebras”

Algebraic Probability: Quantum Theory

We continue the investigation of representing probability spaces as states on algebras. Whereas, previously, I focused attention on the commutative case and on classical probabilities, in the current post I will look at non-commutative quantum probability.

Quantum theory is concerned with computing probabilities of outcomes of measurements of a physical system, as conducted by an observer. The standard approach is to start with a Hilbert space {\mathcal H}, which is used to represent the states of the system. This is a vector space over the complex numbers, together with an inner product {\langle\cdot,\cdot\rangle}. By definition, this is linear in one argument and anti-linear in the other,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle\langle\phi,\lambda\psi+\mu\chi\rangle=\lambda\langle\phi,\psi\rangle+\mu\langle\phi,\chi\rangle,\smallskip\\ &\displaystyle\langle\lambda\phi+\mu\psi,\chi\rangle=\bar\lambda\langle\phi,\chi\rangle+\bar\mu\langle\psi,\chi\rangle,\smallskip\\ &\displaystyle\langle\psi,\phi\rangle=\overline{\langle\phi,\psi\rangle}, \end{array}

for {\phi,\psi,\chi\in\mathcal H} and {\lambda,\mu\in{\mathbb C}}. Positive definiteness is required, so that {\langle\psi,\psi\rangle > 0} for {\psi\not=0}. I am using the physicists’ convention, where the inner product is linear in the second argument and anti-linear in the first. Furthermore, physicists often use the bra–ket notation {\langle\phi\vert\psi\rangle}, which can be split up into the `bra’ {\langle\phi\vert} and `ket’ {\vert\psi\rangle} considered as elements of the dual space of {\mathcal H} and of {\mathcal H} respectively. For a linear operator {A\colon\mathcal H\rightarrow\mathcal H}, the expression {\langle\phi,A\psi\rangle} is often expressed as {\langle\phi\vert A\vert\psi\rangle} in the physicists’ language. By the Hilbert space definition, {\mathcal H} is complete with respect to the norm {\lVert\psi\rVert=\sqrt{\langle\psi,\psi\rangle}}. Continue reading “Algebraic Probability: Quantum Theory”
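As an illustration of the above conventions, here is a small finite-dimensional sketch with {\mathcal H={\mathbb C}^n} (my own example, not from the post), checking conjugate symmetry, positive definiteness, and the bra–ket matrix element in the physicists' convention.

```python
import numpy as np

rng = np.random.default_rng(3)

# H = C^n with the physicists' inner product <phi, psi> = sum conj(phi_i) psi_i,
# linear in the second argument and anti-linear in the first.
n = 4
phi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

inner = lambda u, v: np.vdot(u, v)  # vdot conjugates its first argument

# Conjugate symmetry: <psi, phi> = conj(<phi, psi>).
assert np.isclose(inner(psi, phi), np.conj(inner(phi, psi)))

# Matrix element <phi| A |psi> = <phi, A psi>; equivalently it equals
# conj(<psi, A* phi>), where A* is the conjugate transpose (adjoint).
assert np.isclose(inner(phi, A @ psi), np.conj(inner(psi, A.conj().T @ phi)))

# Positive definiteness: <psi, psi> = ||psi||^2 > 0 for psi != 0.
assert inner(psi, psi).real > 0
```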

Algebraic Probability (continued)

Continuing on from the previous post, I look at cases where the abstract concept of states on algebras correspond to classical probability measures. Up until now, we have considered commutative real algebras but, before going further, it will help to look instead at algebras over the complex numbers {{\mathbb C}}. In the commutative case, we will see that this is equivalent to using real algebras, but can be more convenient, and in the non-commutative case it is essential. When using complex algebras, we will require the existence of an involution, which can be thought of as a generalisation of complex conjugation.

Recall that, by an algebra {\mathcal A} over a field {K}, we mean that {\mathcal A} is a {K}-vector space together with a binary product operation satisfying associativity, distributivity over addition, compatibility with scalars, and which has a multiplicative identity.

Definition 1 A *-algebra {\mathcal A} is an algebra over {{\mathbb C}} together with an involution, which is a unary operator {\mathcal A\rightarrow\mathcal A}, {a\mapsto a^*}, satisfying,

  1. Anti-linearity: {(\lambda a+\mu b)^*=\bar\lambda a^*+\bar\mu b^*}.
  2. {(ab)^*=b^*a^*}.
  3. {a^{**}=a}.

for all {a,b\in\mathcal A} and {\lambda,\mu\in{\mathbb C}}.

Continue reading “Algebraic Probability (continued)”

Algebraic Probability

The aim of this post is to motivate the idea of representing probability spaces as states on a commutative algebra. We will consider how this abstract construction relates directly to classical probabilities.

In the standard axiomatization of probability theory, due to Kolmogorov, the central construct is a probability space {(\Omega,\mathcal F,{\mathbb P})}. This consists of a sample space {\Omega}, an event space {\mathcal F}, which is a sigma-algebra of subsets of {\Omega}, and a probability measure {{\mathbb P}}. The measure {{\mathbb P}} is defined as a map {{\mathbb P}\colon\mathcal F\rightarrow{\mathbb R}^+} satisfying countable additivity and normalised as {{\mathbb P}(\Omega)=1}.

A measure space allows us to define integrals of real-valued measurable functions or, in the language of probability, expectations of random variables. We construct the set {L^\infty(\Omega,\mathcal F)} of all bounded measurable functions {X\colon\Omega\rightarrow{\mathbb R}}. This is a real vector space and, as it is closed under multiplication, is an algebra. Expectation, by definition, is the unique linear map {L^\infty\rightarrow{\mathbb R}}, {X\mapsto{\mathbb E}[X]} satisfying {{\mathbb E}[1_A]={\mathbb P}(A)} for {A\in\mathcal F} and monotone convergence: if {X_n\in L^\infty} is a nonnegative sequence increasing to a bounded limit {X}, then {{\mathbb E}[X_n]} tends to {{\mathbb E}[X]}.

In the opposite direction, any nonnegative linear map {p\colon L^\infty(\Omega,\mathcal F)\rightarrow{\mathbb R}} satisfying monotone convergence and {p(1)=1} defines a probability measure by {{\mathbb P}(A)=p(1_A)}. This is the unique measure with respect to which expectation agrees with the linear map, {{\mathbb E}=p}. So, probability measures are in one-to-one correspondence with such linear maps, and they can be viewed as one and the same thing. The Kolmogorov definition of a probability space can be thought of as representing the expectation on the subset of {L^\infty} consisting of indicator functions {1_A}. In practice, it is often more convenient to start with a different subset of {L^\infty}. For example, probability measures on {{\mathbb R}^+} can be defined via their Laplace transform, {\mathcal L_{{\mathbb P}}(a)=\int e^{-ax}d{\mathbb P}(x)}, which represents the expectation on exponential functions {x\mapsto e^{-ax}}. Generalising to complex-valued random variables, probability measures on {{\mathbb R}} are often represented by their characteristic function {\varphi(a)=\int e^{iax}d{\mathbb P}(x)}, which is just the expectation of the complex exponentials {x\mapsto e^{iax}}. In fact, by the monotone class theorem, we can uniquely represent probability measures on {(\Omega,\mathcal F)} by the expectations on any subset {\mathcal K\subseteq L^\infty} which is closed under taking products and generates the sigma-algebra {\mathcal F}. Continue reading “Algebraic Probability”
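To illustrate representing a measure by expectations of complex exponentials, the following Monte Carlo sketch (my own, using the standard normal distribution as an example) estimates the characteristic function from samples and compares it with the known closed form {\varphi(a)=e^{-a^2/2}}.

```python
import numpy as np

rng = np.random.default_rng(4)

# Characteristic function phi(a) = E[e^{iaX}], estimated by averaging
# complex exponentials over samples of X. For X standard normal, the
# exact value is exp(-a^2/2).
samples = rng.standard_normal(1_000_000)

def char_fn(a):
    return np.mean(np.exp(1j * a * samples))

for a in (0.0, 0.5, 1.0, 2.0):
    exact = np.exp(-a**2 / 2)
    # Agreement up to Monte Carlo sampling error (~1/sqrt(N)).
    assert abs(char_fn(a) - exact) < 0.01
```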

The Functional Monotone Class Theorem

The monotone class theorem is a very helpful and frequently used tool in measure theory. As measurable functions are a rather general construct, and can be difficult to describe explicitly, it is common to prove results by initially considering just a very simple class of functions. For example, we would start by looking at continuous or piecewise constant functions. Then, the monotone class theorem is used to extend to arbitrary measurable functions. There are different, but related, `monotone class theorems’ which apply, respectively, to sets and to functions. As the theorem for sets was covered in a previous post, this entry will be concerned with the functional version. In fact, even for the functional version, there are various similar, but slightly different, statements of the monotone class theorem. In practice, it is beneficial to use the version which most directly applies to the specific application. So, I will state and prove several different versions in this post. Continue reading “The Functional Monotone Class Theorem”

The Monotone Class Theorem

The monotone class theorem, and closely related {\pi}-system lemma, are simple but fundamental theorems in measure theory, and form an essential step in the proofs of many results. General measurable sets are difficult to describe explicitly so, when proving results in measure theory, it is often necessary to start by considering much simpler sets. The monotone class theorem is then used to extend to arbitrary measurable sets. For example, when proving a result about Borel subsets of {{\mathbb R}}, we may start by considering compact intervals and then apply the monotone class theorem. I include this post on the monotone class theorem for reference. Continue reading “The Monotone Class Theorem”

Essential Suprema

Given a sequence {X_1,X_2,\ldots} of real-valued random variables defined on a probability space {(\Omega,\mathcal F,{\mathbb P})}, it is a standard result that the supremum

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle X\colon\Omega\rightarrow{\mathbb R}\cup\{\infty\},\smallskip\\ &\displaystyle X(\omega)=\sup_nX_n(\omega), \end{array}

is measurable. To ensure that this is well-defined, we need to allow {X} to have values in {{\mathbb R}\cup\{\infty\}}, so that {X(\omega)=\infty} whenever the sequence {X_n(\omega)} is unbounded above. The proof of this fact is simple. We just need to show that {X^{-1}((-\infty,a])} is in {\mathcal F} for all {a\in{\mathbb R}}. Writing,

\displaystyle  X^{-1}((-\infty,a])=\bigcap_nX_n^{-1}((-\infty,a]),

the result follows from the measurability of each {X_n} together with the closure of the sigma-algebra {\mathcal F} under countable intersections.
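The set identity at the heart of this proof can be illustrated numerically; the sketch below (my own, using a finite grid as a stand-in for {\Omega} and finitely many functions in place of the infinite sequence) checks that {\{X\le a\}=\bigcap_n\{X_n\le a\}} pointwise.

```python
import numpy as np

# Stand-in sample space: a finite grid of points in [0, 1].
omega = np.linspace(0, 1, 1000)

# A (finite) family X_n and its pointwise supremum X = sup_n X_n.
Xn = np.array([np.sin(n * omega) / n for n in range(1, 50)])
X = Xn.max(axis=0)

a = 0.3
lhs = X <= a                   # the set {X <= a}
rhs = np.all(Xn <= a, axis=0)  # the intersection of the sets {X_n <= a}
assert np.array_equal(lhs, rhs)
```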

The measurability of the suprema of sequences of random variables is a vital property, used throughout probability theory. However, once we start looking at uncountable collections of random variables things get more complicated. Given a, possibly uncountable, collection of random variables {\mathcal S}, the supremum {S=\sup\mathcal S} is,

\displaystyle  S(\omega)=\sup\left\{X(\omega)\colon X\in\mathcal S\right\}. (1)

However, there are a couple of reasons why this is often not a useful construction:

  • The supremum need not be measurable. For example, consider the probability space {\Omega=[0,1]} with {\mathcal F} the collection of Borel or Lebesgue subsets of {\Omega}, and {{\mathbb P}} the standard Lebesgue measure. For any {a\in[0,1]} define the random variable {X_a(\omega)=1_{\{\omega=a\}}} and, for a subset {A} of {[0,1]}, consider the collection of random variables {\mathcal S=\{X_a\colon a\in A\}}. Its supremum is

    \displaystyle  S(\omega)=1_{\{\omega\in A\}}

    which is not measurable if {A} is a non-measurable set (e.g., a Vitali set).

  • Even if the supremum is measurable, it might not be a useful quantity. Letting {X_a} be the random variables on {(\Omega,\mathcal F,{\mathbb P})} constructed above, consider {\mathcal S=\{X_a\colon a\in[0,1]\}}. Its supremum is the constant function {S=1}. As every {X\in\mathcal S} is almost surely equal to 0, each is almost surely bounded above by the constant function {Y=0}. So, the supremum {S=1} is larger than we may expect, and is not what we want in many cases.
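The second point above can be illustrated numerically. The sketch below (my own, approximating the countable case by a large finite subfamily) shows that while the pointwise supremum over all of {[0,1]} is identically 1, the supremum over a countable subfamily {\{X_{a_k}\}} vanishes almost surely, in line with the essential supremum being 0.

```python
import numpy as np

rng = np.random.default_rng(6)

# The family X_a(omega) = 1_{omega = a} on [0, 1] with Lebesgue measure:
# sup over ALL a in [0,1] is identically 1, but the sup over any countable
# subfamily {X_{a_1}, X_{a_2}, ...} is 0 almost surely.
a_seq = rng.uniform(size=1000)      # a (finite) subfamily of indices a_k
omegas = rng.uniform(size=100_000)  # samples omega drawn from P

# sup_k X_{a_k}(omega) = 1 iff omega equals some a_k, an event of
# probability zero; almost surely no sample lands on it.
sup_vals = np.isin(omegas, a_seq).astype(float)
assert sup_vals.max() == 0.0
```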

The essential supremum can be used to correct these deficiencies, and has been important in several places in my notes. See, for example, the proof of the debut theorem for right-continuous processes. So, I am posting this to use as a reference. Note that there is an alternative use of the term `essential supremum’ to refer to the smallest real number almost surely bounding a specified random variable, which is the one referred to by Wikipedia. This is different from the use here, where we look at a collection of random variables and the essential supremum is itself a random variable.

The essential supremum is really just the supremum taken within the equivalence classes of random variables under the almost sure ordering. Consider the equivalence relation {X\cong Y} if and only if {X=Y} almost surely. Writing {[X]} for the equivalence class of {X}, we can consider the ordering given by {[X]\le[Y]} if {X\le Y} almost surely. Then, the equivalence class of the essential supremum of a collection {\mathcal S} of random variables is the supremum of the equivalence classes of the elements of {\mathcal S}. In order to avoid issues with unbounded sets, we consider random variables taking values in the extended reals {\bar{\mathbb R}={\mathbb R}\cup\{\pm\infty\}}.

Definition 1 An essential supremum of a collection {\mathcal S} of {\bar{\mathbb R}}-valued random variables,

\displaystyle  S = {\rm ess\,sup\,}\mathcal{S}

is the least upper bound of {\mathcal{S}}, using the almost-sure ordering on random variables. That is, {S} is an {\bar{\mathbb R}}-valued random variable satisfying

  • upper bound: {S\ge X} almost surely, for all {X\in\mathcal S}.
  • minimality: for all {\bar{\mathbb R}}-valued random variables {Y} satisfying {Y\ge X} almost surely for all {X\in\mathcal S}, we have {Y\ge S} almost surely.

Continue reading “Essential Suprema”

The Gaussian Correlation Inequality

When I first created this blog, the subject of my initial post was the Gaussian correlation conjecture. Using {\mu_n} to denote the standard n-dimensional Gaussian probability measure, the conjecture states that the inequality

\displaystyle  \mu_n(A\cap B)\ge\mu_n(A)\mu_n(B)

holds for all symmetric convex subsets {A} and {B} of {{\mathbb R}^n}. By symmetric, we mean symmetric about the origin, so that {-x} is in {A} if and only if {x} is in {A}, and similarly for {B}. The standard Gaussian measure by definition has zero mean and covariance matrix equal to the {n\times n} identity matrix, so that

\displaystyle  d\mu_n(x)=(2\pi)^{-n/2}e^{-\frac12x^Tx}\,dx,

with {dx} denoting the Lebesgue measure on {{\mathbb R}^n}. However, if it holds for the standard Gaussian measure, then the inequality can also be shown to hold for any centered (i.e., zero mean) Gaussian measure.
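As a quick numerical illustration (my own sketch, certainly not a proof), a Monte Carlo estimate for two symmetric convex slabs in {{\mathbb R}^2} shows the inequality holding with a clear margin.

```python
import numpy as np

rng = np.random.default_rng(7)

# Monte Carlo check of mu(A n B) >= mu(A) mu(B) for the standard Gaussian
# measure on R^2, with A = {|x_1| <= 1} and the rotated slab
# B = {|x_1 + x_2| <= sqrt(2)}. Both sets are convex and symmetric about 0.
x = rng.standard_normal((1_000_000, 2))

in_A = np.abs(x[:, 0]) <= 1.0
in_B = np.abs(x[:, 0] + x[:, 1]) <= np.sqrt(2)

mu_A = in_A.mean()
mu_B = in_B.mean()
mu_AB = (in_A & in_B).mean()

# The gap is well above the Monte Carlo sampling error here.
assert mu_AB >= mu_A * mu_B + 0.01
```

Because {(x_1+x_2)/\sqrt2} is standard normal and positively correlated with {x_1}, both marginal probabilities equal {\mu_1([-1,1])} while the intersection probability is strictly larger than their product.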

At the time of my original post, the Gaussian correlation conjecture was an unsolved mathematical problem, originally arising in the 1950s and formulated in its modern form in the 1970s. However, in the period since that post, the conjecture has been solved! A proof was published by Thomas Royen in 2014 [7]. This seems to have taken some time to come to the notice of much of the mathematical community. In December 2015, Rafał Latała and Dariusz Matlak published a simplified version of Royen’s proof [4]. Although Royen’s original proof was already fairly simple, it considered a generalisation of the conjecture to a kind of multivariate gamma distribution. The exposition by Latała and Matlak drops this generality and adds some intermediate lemmas in order to improve readability and accessibility. Since then, the result has become widely known and, recently, has even been reported in the popular press [10,11]. There is an interesting article on Royen’s discovery of his proof at Quanta Magazine [12], including the background information that Royen was a 67-year-old German retiree who supposedly came up with the idea while brushing his teeth one morning. Dick Lipton and Ken Regan have recently written about the history and eventual solution of the conjecture on their blog [5]. As it has now been shown to be true, I will stop referring to the result as a `conjecture’ and, instead, use the common alternative name — the Gaussian correlation inequality.

In this post, I will describe some equivalent formulations of the Gaussian correlation inequality, or GCI for short, before describing a general method of attacking this problem which has worked for earlier proofs of special cases. I will then describe Royen’s proof and we will see that it uses the same ideas, but with some key differences. Continue reading “The Gaussian Correlation Inequality”