Multivariate Normal Distributions

I looked at normal random variables in an earlier post, but what does it mean for a sequence of real-valued random variables {X_1,X_2,\ldots,X_n} to be jointly normal? We could simply require each of them to be normal, but this says very little about their joint distribution and is not much help in handling expressions involving more than one of the {X_i} at once. In the case that the random variables are independent, the following result is a very useful property of the normal distribution. All random variables in this post will be real-valued, except where stated otherwise, and we assume that they are defined with respect to some underlying probability space {(\Omega,\mathcal F,{\mathbb P})}.

Lemma 1 Linear combinations of independent normal random variables are again normal.

Proof: More precisely, if {X_1,\ldots,X_n} is a sequence of independent normal random variables and {a_1,\ldots,a_n} are real numbers, then {Y=a_1X_1+\cdots+a_nX_n} is normal. Let us suppose that {X_k} has mean {\mu_k} and variance {\sigma_k^2}. Then, the characteristic function of Y can be computed using the independence property and the characteristic functions of the individual normals,

\displaystyle  \begin{aligned} {\mathbb E}\left[e^{i\lambda Y}\right] &={\mathbb E}\left[\prod_ke^{i\lambda a_k X_k}\right] =\prod_k{\mathbb E}\left[e^{i\lambda a_k X_k}\right]\\ &=\prod_ke^{-\frac12\lambda^2a_k^2\sigma_k^2+i\lambda a_k\mu_k} =e^{-\frac12\lambda^2\sigma^2+i\lambda\mu} \end{aligned}

where we have set {\mu=\sum_ka_k\mu_k} and {\sigma^2=\sum_ka_k^2\sigma_k^2}. This is the characteristic function of a normal random variable with mean {\mu} and variance {\sigma^2}. ⬜
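As a quick sanity check of Lemma 1, here is a minimal simulation sketch. The coefficients, means and standard deviations are arbitrary illustrative choices, not values from the post. It compares the sample mean and variance of {Y} with the predicted {\mu} and {\sigma^2}, and also checks that about 68.3% of samples fall within one standard deviation of the mean, as they should for a normal variable.

```python
import math
import random

# Illustrative parameters (assumptions, not taken from the post).
a = [2.0, -1.0, 3.0]        # coefficients a_k
mus = [1.0, -2.0, 0.5]      # means mu_k
sigmas = [1.0, 2.0, 0.5]    # standard deviations sigma_k

# Mean and variance of Y predicted by Lemma 1.
mu = sum(ak * mk for ak, mk in zip(a, mus))
var = sum(ak ** 2 * sk ** 2 for ak, sk in zip(a, sigmas))
sd = math.sqrt(var)

# Draw independent normals and form the linear combination Y.
rng = random.Random(0)
n = 100_000
samples = [
    sum(ak * rng.gauss(mk, sk) for ak, mk, sk in zip(a, mus, sigmas))
    for _ in range(n)
]

sample_mean = sum(samples) / n
sample_var = sum((y - sample_mean) ** 2 for y in samples) / n

# Fraction of samples within one standard deviation of the mean,
# compared with the exact normal probability P(|Z| < 1).
within_one_sd = sum(abs(y - mu) < sd for y in samples) / n
normal_one_sd = math.erf(1 / math.sqrt(2))
```

Matching the first two moments alone would not show normality, which is why the one-standard-deviation check is included. It is still only a spot check, not a proof.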

The definition of joint normal random variables will include the case of independent normals, so that any linear combination is also normal. We use this result as the defining property for the general multivariate normal case.

Definition 2 A collection {\{X_i\}_{i\in I}} of real-valued random variables is multivariate normal (or joint normal) if and only if all of its finite linear combinations are normal.
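To see why Definition 2 asks for more than each variable being normal, here is a small sketch of a standard counterexample, which is my own illustration and not taken from the post. Take {X} standard normal, an independent sign {\epsilon=\pm1} with equal probabilities, and set {Y=\epsilon X}. By symmetry {Y} is again standard normal, but {X+Y} equals zero with probability 1/2 without being constant, so it cannot be normal and the pair {(X,Y)} is not jointly normal.

```python
import random

rng = random.Random(1)
n = 50_000

zeros = 0
for _ in range(n):
    x = rng.gauss(0.0, 1.0)          # X is standard normal
    eps = rng.choice((1, -1))        # independent random sign
    y = eps * x                      # Y = eps * X is also standard normal
    if x + y == 0.0:                 # happens exactly when eps == -1
        zeros += 1

# Fraction of samples where the linear combination X + Y vanishes.
zero_fraction = zeros / n
```

A nondegenerate normal random variable takes any given value with probability zero, so a `zero_fraction` near 1/2 is incompatible with {X+Y} being normal.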


The Riemann Zeta Function and Probability Distributions

Figure 1: Probability densities used to extend the zeta function

The famous Riemann zeta function was first introduced by Riemann in order to describe the distribution of the prime numbers. It is defined by the infinite sum

\displaystyle  \begin{aligned} \zeta(s) &=1+2^{-s}+3^{-s}+4^{-s}+\cdots\\ &=\sum_{n=1}^\infty n^{-s}, \end{aligned} (1)

which is absolutely convergent for all complex s with real part greater than one. One of its first properties is that, as shown by Riemann, it extends to an analytic function on the entire complex plane, other than a simple pole at {s=1}. By the theory of analytic continuation this extension is necessarily unique, so the importance of the result lies in showing that an extension exists. One way of doing this is to find an alternative expression for the zeta function which is well defined everywhere. For example, it can be expressed as an absolutely convergent integral, as performed by Riemann himself in his original 1859 paper on the subject. This leads to an explicit expression for the zeta function, scaled by an analytic prefactor, as the integral of {x^s} multiplied by a function of x over the range {x > 0}. In fact, this can be done in a way such that the function of x is a probability density function, and hence expresses the Riemann zeta function over the entire complex plane in terms of the generating function {{\mathbb E}[X^s]} of a positive random variable X. The probability distributions involved here are not the standard ones taught to students of probability theory, so they may be new to many people. Although these distributions are intimately related to the Riemann zeta function they also, intriguingly, turn up in seemingly unrelated contexts involving Brownian motion.
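Returning to the series (1), its convergence for real {s > 1} is easy to check numerically. The following is only a sketch: the helper name `zeta_partial` is hypothetical, and the comparison value {\zeta(2)=\pi^2/6} is Euler's classical evaluation rather than something derived in this post.

```python
import math

def zeta_partial(s, terms):
    """Partial sum 1 + 2^{-s} + ... + terms^{-s} of the series (1)."""
    return sum(n ** (-s) for n in range(1, terms + 1))

# Compare a partial sum at s = 2 with the known value pi^2 / 6.
approx = zeta_partial(2.0, 100_000)
exact = math.pi ** 2 / 6
error = exact - approx
```

The remaining error is the tail of the series, which for {s=2} is roughly the reciprocal of the number of terms used. The partial sums say nothing about the extension beyond the half-plane of convergence.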

In this post, I derive two probability distributions related to the extension of the Riemann zeta function, and describe some of their properties. I also show how they can be constructed as the sum of a sequence of gamma distributed random variables. For motivation, some examples are given of where they show up in apparently unrelated areas of probability theory, although I do not give proofs of these statements here. For more information, see the 2001 paper “Probability laws related to the Jacobi theta and Riemann zeta functions, and Brownian excursions” by Biane, Pitman, and Yor.

Manipulating the Normal Distribution

The normal (or Gaussian) distribution is ubiquitous throughout probability theory for various reasons, including the central limit theorem, the fact that it is realistic for many practical applications, and because it satisfies nice properties making it amenable to mathematical manipulation. It is, therefore, one of the first continuous distributions that students encounter at school. As such, it is not something that I have spent much time discussing on this blog, which is usually concerned with more advanced topics. However, normal distributions have many nice properties, and there are methods that can be applied to them, which greatly simplify the manipulation of expressions in which they are involved. While it is usually possible to ignore these, and instead just substitute in the density function and manipulate the resulting integrals, that approach can get very messy. So, I will describe some of the basic results and ideas that I use frequently.

Throughout, I assume the existence of an underlying probability space {(\Omega,\mathcal F,{\mathbb P})}. Recall that a real-valued random variable X has the standard normal distribution if it has a probability density function given by,

\displaystyle  \varphi(x)=\frac1{\sqrt{2\pi}}e^{-\frac{x^2}2}.

For it to function as a probability density, it is necessary that it integrates to one. While it is not obvious that the normalization factor {1/\sqrt{2\pi}} is the correct value for this to be true, it is the one fact that I state here without proof. Wikipedia does list a couple of proofs, which can be referred to. By symmetry, {-X} and {X} have the same distribution, so that they have the same mean and, therefore, {{\mathbb E}[X]=0}.

The derivative of the density function satisfies the useful identity

\displaystyle  \varphi^\prime(x)=-x\varphi(x). (1)

This allows us to quickly verify that standard normal variables have unit variance, by an application of integration by parts.

\displaystyle  \begin{aligned} {\mathbb E}[X^2] &=\int x^2\varphi(x)dx\\ &= -\int x\varphi^\prime(x)dx\\ &=\int\varphi(x)dx-[x\varphi(x)]_{-\infty}^\infty=1 \end{aligned}

Another identity satisfied by the normal density function is,

\displaystyle  \varphi(x+y)=e^{-xy - \frac{y^2}2}\varphi(x) (2)
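For completeness, identity (2) follows by expanding the square in the exponent of the density. This short derivation is added here as a worked check:

```latex
\begin{aligned}
\varphi(x+y)
&=\frac1{\sqrt{2\pi}}e^{-\frac{(x+y)^2}2}
 =\frac1{\sqrt{2\pi}}e^{-\frac{x^2}2-xy-\frac{y^2}2}\\
&=e^{-xy-\frac{y^2}2}\varphi(x).
\end{aligned}
```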

This enables us to prove the following very useful result. In fact, it is difficult to overstate how helpful this result can be. I make use of it frequently when manipulating expressions involving normal variables, as it significantly simplifies the calculations. It is also easy to remember, and simple to derive if needed.

Theorem 1 Let X be standard normal and {f\colon{\mathbb R}\rightarrow{\mathbb R}_+} be measurable. Then, for all {\lambda\in{\mathbb R}},

\displaystyle  \begin{aligned} {\mathbb E}[e^{\lambda X}f(X)] &={\mathbb E}[e^{\lambda X}]{\mathbb E}[f(X+\lambda)]\\ &=e^{\frac{\lambda^2}{2}}{\mathbb E}[f(X+\lambda)]. \end{aligned} (3)
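Identity (3) can be spot-checked numerically. The sketch below uses a plain trapezoidal rule on a truncated range. The test function {f(x)=x^2}, the value {\lambda=0.7} and the helper names are illustrative assumptions. For this choice of f the right hand side also has the closed form {e^{\lambda^2/2}(1+\lambda^2)}, since {{\mathbb E}[(X+\lambda)^2]=1+\lambda^2}.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expectation(g, lo=-12.0, hi=12.0, steps=24_000):
    """Trapezoidal approximation of E[g(X)] for standard normal X."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) * phi(lo) + g(hi) * phi(hi))
    for k in range(1, steps):
        x = lo + k * h
        total += g(x) * phi(x)
    return total * h

lam = 0.7

def f(x):
    # A nonnegative measurable test function (illustrative choice).
    return x * x

# Left and right hand sides of identity (3).
lhs = expectation(lambda x: math.exp(lam * x) * f(x))
rhs = math.exp(lam ** 2 / 2) * expectation(lambda x: f(x + lam))

# Closed form of the right hand side for f(x) = x^2.
closed_form = math.exp(lam ** 2 / 2) * (1 + lam ** 2)
```

The truncation to {[-12,12]} is harmless here because the integrands decay like the normal density. A faster growing f would need a wider range.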


Quantum Coin Tossing


Let me ask the following very simple question. Suppose that I toss a pair of identical coins at the same time. What is the probability of them both coming up heads? There is no catch here; both coins are fair. There are three possible outcomes: both tails, one head and one tail, and both heads. Assuming that it is completely random so that all outcomes are equally likely, then we could argue that each possibility has a one in three chance of occurring, so that the answer to the question is that the probability is 1/3.

Of course, this is wrong! A fair coin has a probability of 1/2 of showing heads and, by independence, standard probability theory says that we should multiply these together for each coin to get the correct answer of {\frac12\times\frac12=\frac14}, which can be verified by experiment. Alternatively, we can note that the outcome of one tail and one head, in reality, consists of two equally likely possibilities. Either the first coin can be a head and the second a tail, or vice-versa. So, there are actually four equally likely possible outcomes, only one of which has both coins showing heads, again giving a probability of 1/4.
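The counting argument for classical coins can be written out as a short enumeration. This is just a sketch of the reasoning above, with the variable names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes for two distinguishable fair coins.
outcomes = list(product("HT", repeat=2))

# Each ordered outcome is equally likely, so count the favourable ones.
favourable = sum(o == ("H", "H") for o in outcomes)
p_both_heads = Fraction(favourable, len(outcomes))

# Forgetting the order leaves three outcomes, which are NOT equally likely.
unordered = {tuple(sorted(o)) for o in outcomes}
```

Here `outcomes` has four entries and `unordered` has three, which is exactly the gap between the correct answer of 1/4 and the naive answer of 1/3.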

Quantum Entanglement States

In an earlier post, I described four simple thought experiments, involving some black boxes and two or more participants. As described there, the results of these experiments were inconsistent with any classical description, assuming that the boxes cannot communicate. However, I also stated that all of these experiments are consistent with quantum probability, and that I would give the mathematical details in a further post. I will do this now.

Quantum Entanglement

Quantum entanglement is one of the most striking differences between the behaviour of the universe described by quantum theory, and that given by classical physics. If two physical systems interact then, even if they later separate, their future evolutions can no longer be considered purely in isolation. Any attempt to describe the systems with classical logic leads inevitably to an apparent link between them, where simply observing one instantaneously impacts the state of the other. This effect remains, regardless of how far apart the systems become.

Figure 1: An EPR-Bohm experiment

As it is a very famous quantum phenomenon, a lot has been written about entanglement in both the scientific and popular literature. However, it does still seem to be frequently misunderstood, with many surrounding misconceptions. I will attempt to explain the effects of entanglement in as straightforward a way as possible, with some very basic thought experiments. These can be followed without any understanding of what physical processes may be going on underneath. They only involve pressing a button on a box and observing the colour of a light bulb mounted on it. In fact, this is one of the features of quantum entanglement. It does not matter how you describe the physical world, whether you think of things as particles, waves, or whatever. Entanglement is an observable property independently of how, or even if, we try to describe the physical processes.

The Khintchine Inequality

For a Rademacher sequence {X=(X_1,X_2,\ldots)} and square summable sequence of real numbers {a=(a_1,a_2,\ldots)}, the Khintchine inequality provides upper and lower bounds for the moments of the random variable,

\displaystyle  a\cdot X=a_1X_1+a_2X_2+\cdots.

We use {\ell^2} for the space of square summable real sequences and

\displaystyle  \lVert a\rVert_2=\left(a_1^2+a_2^2+\cdots\right)^{1/2}

for the associated Banach norm.

Theorem 1 (Khintchine) For each {0 < p < \infty}, there exist positive constants {c_p,C_p} such that,

\displaystyle  c_p\lVert a\rVert_2^p\le{\mathbb E}\left[\lvert a\cdot X\rvert^p\right]\le C_p\lVert a\rVert_2^p, (1)

for all {a\in\ell^2}.
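For a concrete instance of (1), the case {p=4} with finitely many nonzero weights can be computed exactly by averaging over all sign patterns. Expanding the fourth power and using independence gives {{\mathbb E}[(a\cdot X)^4]=3\lVert a\rVert_2^4-2\sum_na_n^4}, which lies between {\lVert a\rVert_2^4} and {3\lVert a\rVert_2^4}. The following sketch checks this for an arbitrary illustrative weight vector. The function name and the weights are my own choices.

```python
from itertools import product

def fourth_moment(a):
    """Exact E[(a . X)^4] for a finite Rademacher sum, by enumeration."""
    total = 0.0
    for signs in product((1, -1), repeat=len(a)):
        total += sum(s * x for s, x in zip(signs, a)) ** 4
    return total / 2 ** len(a)

# Illustrative weights (an assumption, not from the post).
a = [1.0, 0.5, 0.25, 2.0]

norm_sq = sum(x * x for x in a)            # squared l^2 norm of a
moment = fourth_moment(a)
closed_form = 3 * norm_sq ** 2 - 2 * sum(x ** 4 for x in a)
```

So, for {p=4} and finite sums, the constants {c_4=1} and {C_4=3} work in (1). Nothing is claimed here about the best constants for other values of p.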


Rademacher Series

The Rademacher distribution is probably the simplest nontrivial probability distribution that you can imagine. This is a discrete distribution taking only the two possible values {\{1,-1\}}, each occurring with equal probability. A random variable X has the Rademacher distribution if

\displaystyle  {\mathbb P}(X=1)={\mathbb P}(X=-1)=1/2.

A Rademacher sequence is an IID sequence of Rademacher random variables,

\displaystyle  X = (X_1,X_2,X_3,\ldots).

Recall that the partial sums {S_N=\sum_{n=1}^NX_n} of a Rademacher sequence form a simple random walk. Generalizing a bit, we can consider scaling by a sequence of real weights {a_1,a_2,\ldots}, so that {S_N=\sum_{n=1}^Na_nX_n}. I will concentrate on infinite sums, as N goes to infinity, which will clearly include the finite Rademacher sums as the subset with only finitely many nonzero weights.
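For finitely many weights, the distribution of the weighted sum can be computed exactly by enumerating all sign patterns. The sketch below does this with exact fractions. The function name `rademacher_sum_law` and the example weights are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

def rademacher_sum_law(weights):
    """Exact law of sum_n weights[n] * X_n as a dict {value: probability}."""
    law = {}
    p = Fraction(1, 2 ** len(weights))     # each sign pattern is equally likely
    for signs in product((1, -1), repeat=len(weights)):
        value = sum(s * w for s, w in zip(signs, weights))
        law[value] = law.get(value, 0) + p
    return law

# Simple random walk after four steps (all weights equal to one).
walk = rademacher_sum_law([1, 1, 1, 1])

# A weighted sum with weights 1, 2, 3.
weighted = rademacher_sum_law([1, 2, 3])
mean = sum(v * p for v, p in weighted.items())
second_moment = sum(v * v * p for v, p in weighted.items())
```

Since the {X_n} are independent with zero mean and unit variance, the weighted sum has mean zero and second moment {\sum_na_n^2}, which is 14 for the weights above. The enumeration grows as {2^N}, so this is only practical for small N.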

Rademacher series serve as simple prototypes of more general IID series, but also have applications in various areas. Results include concentration and anti-concentration inequalities, and the Khintchine inequality, which imply various properties of {L^p} spaces and of linear maps between them. For example, in my notes constructing the stochastic integral starting from a minimal set of assumptions, the {L^0} version of the Khintchine inequality was required. Rademacher series are also interesting in their own right, and a source of some very simple statements which are nevertheless quite difficult to prove, some of which are still open problems. See, for example, “Some explorations on two conjectures about Rademacher sequences” by Hu, Lan and Sun. As I would like to look at some of these problems in the blog, I include this post to outline the basic constructions. One intriguing aspect of Rademacher series is the way that they mix discrete distributions with combinatorial aspects, and continuous distributions. On the one hand, by the central limit theorem, Rademacher series can often be approximated well by a Gaussian distribution but, on the other hand, they depend on the discrete set of signs of the individual variables in the sum.

Completions of *-Probability Spaces

We previously defined noncommutative probability spaces as a *-algebra together with a nondegenerate state satisfying a completeness property. Justification for the stated definition was twofold. First, an argument similar to the construction of measurable random variables on classical probability spaces was used, by taking all possible limits for which an expectation can reasonably be defined. Second, I stated various natural mathematical properties of this construction, including the existence of completions and their functorial property, which allows us to pass from preprobability spaces, and homomorphisms between these, to the NC probability spaces which they generate. However, the statements were given without proof, so the purpose of the current post is to establish these results. Specifically, I will give proofs of each of the theorems stated in the post on noncommutative probability spaces, with the exception of the two theorems relating commutative *-probability spaces to their classical counterpart (theorems 2 and 10), which will be looked at in a later post.

Noncommutative Probability Spaces

In classical probability theory, we start with a sample space {\Omega}, a collection {\mathcal F} of events, which is a sigma-algebra on {\Omega}, and a probability measure {{\mathbb P}} on {(\Omega,\mathcal F)}. The triple {(\Omega,\mathcal F,{\mathbb P})} is a probability space, and the collection {L^\infty(\Omega,\mathcal F,{\mathbb P})} of bounded complex-valued random variables on the probability space forms a commutative algebra under pointwise addition and products. The measure {{\mathbb P}} defines an expectation, or integral with respect to {{\mathbb P}}, which is a linear map

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle L^\infty(\Omega,\mathcal F,{\mathbb P})\rightarrow{\mathbb C},\smallskip\\ &\displaystyle X\mapsto{\mathbb E}[X]=\int X(\omega)d{\mathbb P}(\omega). \end{array}

In this post I provide definitions of probability spaces from the algebraic viewpoint. Statements of some of their first properties will be given in order to justify and clarify the definitions, although any proofs will be left until later posts. In the algebraic setting, we begin with a *-algebra {\mathcal A}, which takes the place of the collection of bounded random variables from the classical theory. It is not necessary for the algebra to be represented as a space of functions from an underlying sample space. Since the individual points {\omega\in\Omega} constituting the sample space are not required in the theory, this is a pointless approach. By allowing multiplication of `random variables’ in {\mathcal A} to be noncommutative, we incorporate probability spaces which have no counterpart in the classical setting, such as are used in quantum theory. The second and final ingredient is a state on the algebra, taking the place of the classical expectation operator. This is a linear map {p\colon\mathcal A\rightarrow{\mathbb C}} satisfying the positivity constraint {p(a^*a)\ge0} and, when {\mathcal A} is unital, the normalisation condition {p(1)=1}. Algebraic, or noncommutative, probability spaces are completely described by a pair {(\mathcal A,p)} consisting of a *-algebra {\mathcal A} and a state {p}. Noncommutative examples include the *-algebra of bounded linear operators on a Hilbert space with pure state {p(a)=\langle\xi,a\xi\rangle} for a fixed unit vector {\xi}.
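The pure state example can be made concrete in the smallest noncommutative case, the 2×2 complex matrices acting on {{\mathbb C}^2}. The sketch below is only an illustration, with the helper names, the unit vector {\xi} and the matrices chosen arbitrarily. It checks the normalisation {p(1)=1}, the positivity {p(a^*a)\ge0} (which holds because {p(a^*a)=\lVert a\xi\rVert^2}), and that the algebra really is noncommutative.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(a):
    """Conjugate transpose a^* of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def state(a, xi):
    """Pure state p(a) = <xi, a xi>, conjugate-linear in the first slot."""
    return sum(complex(xi[i]).conjugate() * a[i][j] * xi[j]
               for i in range(2) for j in range(2))

# A fixed unit vector xi (illustrative): |0.6|^2 + |0.8i|^2 = 1.
xi = [0.6, 0.8j]

identity = [[1, 0], [0, 1]]
a = [[1, 2], [0, 1j]]
b = [[0, 0], [1, 0]]

p_identity = state(identity, xi)                  # normalisation p(1)
p_positive = state(matmul(adjoint(a), a), xi)     # positivity p(a^* a)
commute = matmul(a, b) == matmul(b, a)            # False: ab != ba
```

In the classical commutative setting `commute` would always be true, so this is the simplest place where the algebraic definition goes beyond ordinary probability spaces.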