Quantum Coin Tossing


Let me ask the following very simple question. Suppose that I toss a pair of identical coins at the same time, then what is the probability of them both coming up heads? There is no catch here, both coins are fair. There are three possible outcomes, both tails, one head and one tail, and both heads. Assuming that it is completely random so that all outcomes are equally likely, then we could argue that each possibility has a one in three chance of occurring, so that the answer to the question is that the probability is 1/3.

Of course, this is wrong! A fair coin has a probability of 1/2 of showing heads and, by independence, standard probability theory says that we should multiply these together for each coin to get the correct answer of {\frac12\times\frac12=\frac14}, which can be verified by experiment. Alternatively, we can note that the outcome of one tail and one head, in reality, consists of two equally likely possibilities. Either the first coin can be a head and the second a tail, or vice-versa. So, there are actually four equally likely possible outcomes, only one of which has both coins showing heads, again giving a probability of 1/4. Continue reading “Quantum Coin Tossing”
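The counting argument can be checked by brute force. The following is a minimal sketch, assuming the two coins are treated as distinguishable so that ordered outcomes are enumerated; the helper name `probability` is only illustrative.

```python
from itertools import product
from fractions import Fraction

# All ordered outcomes for two distinguishable fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def probability(event):
    """Probability of an event when every ordered outcome is equally likely."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

p_both_heads = probability(lambda o: o == ("H", "H"))
# "One head and one tail" is made up of two ordered outcomes, HT and TH.
p_one_each = probability(lambda o: set(o) == {"H", "T"})
print(p_both_heads, p_one_each)
```

Under this enumeration the mixed outcome is twice as likely as either of the other two, which is exactly what the naive three-outcome count misses.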

Local Time Continuity

Figure 1: Brownian motion and its local time surface

The local time of a semimartingale at a level x is a continuous increasing process, giving a measure of the amount of time that the process spends at the given level. As the definition involves stochastic integrals, it was only defined up to probability one. This can cause issues if we want to simultaneously consider local times at all levels. As x can be any real number, it can take uncountably many values and, as a union of uncountably many zero probability sets can have positive measure or, even, be unmeasurable, this is not sufficient to determine the entire local time ‘surface’

\displaystyle  (t,x)\mapsto L^x_t(\omega)

for almost all {\omega\in\Omega}. This is the common issue of choosing good versions of processes. In this case, we already have a continuous version in the time index but, as yet, have not constructed a good version jointly in the time and level. This issue arose in the post on the Ito–Tanaka–Meyer formula, for which we needed to choose a version which is jointly measurable. Although that was sufficient there, joint measurability is still not enough to uniquely determine the full set of local times, up to probability one. The ideal situation is when a version exists which is jointly continuous in both time and level, in which case we should work with this choice. This is always possible for continuous local martingales.

Theorem 1 Let X be a continuous local martingale. Then, the local times

\displaystyle  (t,x)\mapsto L^x_t

have a modification which is jointly continuous in x and t. Furthermore, this is almost surely {\gamma}-Hölder continuous w.r.t. x, for all {\gamma < 1/2} and over all bounded regions for t.

Continue reading “Local Time Continuity”

The Kolmogorov Continuity Theorem

Figure 1: Fractional Brownian motion with H = 1/4, 1/2, 3/4

One of the common themes throughout the theory of continuous-time stochastic processes is the importance of choosing good versions of processes. Specifying the finite-dimensional distributions of a process is not sufficient to determine its sample paths so, if a continuous modification exists, then it makes sense to work with that. A relatively straightforward criterion ensuring the existence of a continuous version is provided by Kolmogorov’s continuity theorem.

For any positive real number {\gamma}, a map {f\colon E\rightarrow F} between metric spaces E and F is said to be {\gamma}-Hölder continuous if there exists a positive constant C satisfying

\displaystyle  d(f(x),f(y))\le Cd(x,y)^\gamma

for all {x,y\in E}. Hölder continuous functions are always continuous and, at least on bounded spaces, Hölder continuity is a stronger property for larger values of the exponent {\gamma}. So, if E is a bounded metric space and {\alpha\le\beta}, then every {\beta}-Hölder continuous map from E is also {\alpha}-Hölder continuous. Note also that 1-Hölder continuity is the same thing as Lipschitz continuity.
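The claim for bounded spaces follows from a one-line estimate. As a sketch, write D for the diameter of E (a symbol introduced here, assumed finite) and suppose that f is {\beta}-Hölder continuous with constant C and that {\alpha\le\beta}:

```latex
\begin{aligned}
d(f(x),f(y)) &\le C\,d(x,y)^{\beta}
  = C\,d(x,y)^{\beta-\alpha}\,d(x,y)^{\alpha} \\
 &\le C D^{\beta-\alpha}\,d(x,y)^{\alpha},
 \qquad D=\sup_{x,y\in E}d(x,y) < \infty.
\end{aligned}
```

So f is {\alpha}-Hölder continuous with constant {CD^{\beta-\alpha}}. Without boundedness the factor {d(x,y)^{\beta-\alpha}} cannot be controlled, which is why the comparison is only stated for bounded spaces.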

Kolmogorov’s theorem gives simple conditions on the pairwise distributions of a process which guarantee the existence of a continuous modification and, moreover, states that the sample paths {t\mapsto X_t} of this modification are almost surely locally Hölder continuous. That is, they are almost surely Hölder continuous on every bounded interval. To start with, we look at real-valued processes. Throughout this post, we work with respect to a probability space {(\Omega,\mathcal F, {\mathbb P})}. There is no need to assume the existence of any filtration, since filtrations play no part in the results here.

Theorem 1 (Kolmogorov) Let {\{X_t\}_{t\ge0}} be a real-valued stochastic process such that there exist positive constants {\alpha,\beta,C} satisfying

\displaystyle  {\mathbb E}\left[\lvert X_t-X_s\rvert^\alpha\right]\le C\lvert t-s\rvert^{1+\beta},

for all {s,t\ge0}. Then, X has a continuous modification which, with probability one, is locally {\gamma}-Hölder continuous for all {0 < \gamma < \beta/\alpha}.
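As an illustration of how the exponents combine, consider standard Brownian motion, which is not treated in the excerpt above but is the usual first example. The increment {X_t-X_s} is normal with mean zero and variance {\lvert t-s\rvert}, so, writing Z for a standard normal random variable (notation introduced here),

```latex
\begin{aligned}
{\mathbb E}\left[\lvert X_t-X_s\rvert^{\alpha}\right]
  &= {\mathbb E}\left[\lvert Z\rvert^{\alpha}\right]\lvert t-s\rvert^{\alpha/2}
  = C\lvert t-s\rvert^{1+\beta},
  \qquad \beta=\frac{\alpha}{2}-1,\ \alpha > 2, \\
\frac{\beta}{\alpha} &= \frac12-\frac1\alpha .
\end{aligned}
```

The theorem then gives local {\gamma}-Hölder continuity for all {\gamma < 1/2-1/\alpha} and, as {\alpha} can be taken arbitrarily large, for all {\gamma < 1/2}.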

Continue reading “The Kolmogorov Continuity Theorem”

The Ito-Tanaka-Meyer Formula

Ito’s lemma is one of the most important and useful results in the theory of stochastic calculus. This is a stochastic generalization of the chain rule, or change of variables formula, and differs from the classical deterministic formulas by the presence of a quadratic variation term. One drawback, which can limit the applicability of Ito’s lemma in some situations, is that it only applies to twice continuously differentiable functions. However, the quadratic variation term can alternatively be expressed using local times, which relaxes the differentiability requirement. This generalization of Ito’s lemma was derived by Tanaka and Meyer, and applies to one dimensional semimartingales.

The local time of a stochastic process X at a fixed level x can be written, very informally, as an integral of a Dirac delta function with respect to the continuous part of the quadratic variation {[X]^{c}},

\displaystyle  L^x_t=\int_0^t\delta(X-x)d[X]^c. (1)

This was explained in an earlier post. As the Dirac delta is only a distribution, and not a true function, equation (1) is not really a well-defined mathematical expression. However, as we saw, with some manipulation a valid expression can be obtained which defines the local time whenever X is a semimartingale.

Going in a slightly different direction, we can try multiplying (1) by a bounded measurable function {f(x)} and integrating over x. Commuting the order of integration on the right hand side, and applying the defining property of the delta function, that {\int f(X-x)\delta(x)dx} is equal to {f(X)}, gives

\displaystyle  \int_{-\infty}^{\infty} L^x_t f(x)dx=\int_0^tf(X)d[X]^c. (2)
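In the same informal spirit as (1), the manipulation leading to (2) can be written out step by step. This is only a heuristic sketch, since it treats the delta function as if it were an ordinary function and exchanges the integrals without justification:

```latex
\begin{aligned}
\int_{-\infty}^{\infty} L^x_t f(x)\,dx
 &= \int_{-\infty}^{\infty}\left(\int_0^t\delta(X_s-x)\,d[X]^c_s\right)f(x)\,dx \\
 &= \int_0^t\left(\int_{-\infty}^{\infty} f(x)\,\delta(X_s-x)\,dx\right)d[X]^c_s \\
 &= \int_0^t f(X_s)\,d[X]^c_s .
\end{aligned}
```

The inner integral in the second line is the defining property of the delta function after a change of variables in x.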

By eliminating the delta function, the right hand side has been transformed into a well-defined expression. In fact, it is now the left side of the identity that is a problem, since the local time was only defined up to probability one at each level x. Ignoring this issue for the moment, recall the version of Ito’s lemma for general non-continuous semimartingales,

\displaystyle  \begin{aligned} f(X_t)=& f(X_0)+\int_0^t f^{\prime}(X_-)dX+\frac12A_t\\ &\quad+\sum_{s\le t}\left(\Delta f(X_s)-f^\prime(X_{s-})\Delta X_s\right). \end{aligned} (3)

where {A_t=\int_0^t f^{\prime\prime}(X)d[X]^c}. Equation (2) allows us to express this quadratic variation term using local times,

\displaystyle  A_t=\int_{-\infty}^{\infty} L^x_t f^{\prime\prime}(x)dx.

The benefit of this form is that, even though it still uses the second derivative of {f}, it is only really necessary for this to exist in a weaker, measure theoretic, sense. Suppose that {f} is convex, or a linear combination of convex functions. Then, its right-hand derivative {f^\prime(x+)} exists, and is itself of locally finite variation. Hence, the Stieltjes integral {\int L^xdf^\prime(x+)} exists. The infinitesimal {df^\prime(x+)} is alternatively written {f^{\prime\prime}(dx)} and, in the twice continuously differentiable case, equals {f^{\prime\prime}(x)dx}. Then,

\displaystyle  A_t=\int _{-\infty}^{\infty} L^x_t f^{\prime\prime}(dx). (4)

Using this expression in (3) gives the Ito-Tanaka-Meyer formula. Continue reading “The Ito-Tanaka-Meyer Formula”

The Stochastic Fubini Theorem

Fubini’s theorem states that, subject to precise conditions, it is possible to switch the order of integration when computing double integrals. In the theory of stochastic calculus, we also encounter double integrals and would like to be able to commute their order. However, since these can involve stochastic integration rather than the usual deterministic case, the classical results are not always applicable. To help with such cases, we could do with a new stochastic version of Fubini’s theorem. Here, I will consider the situation where one integral is of the standard kind with respect to a finite measure, and the other is stochastic. To start, recall the classical Fubini theorem.

Theorem 1 (Fubini) Let {(E,\mathcal E,\mu)} and {(F,\mathcal F,\nu)} be finite measure spaces, and {f\colon E\times F\rightarrow{\mathbb R}} be a bounded {\mathcal E\otimes\mathcal F}-measurable function. Then,

\displaystyle  y\mapsto\int f(x,y)d\mu(x)

is {\mathcal F}-measurable,

\displaystyle  x\mapsto\int f(x,y)d\nu(y)

is {\mathcal E}-measurable, and,

\displaystyle  \int\int f(x,y)d\mu(x)d\nu(y)=\int\int f(x,y)d\nu(y)d\mu(x). (1)
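For measures concentrated on finitely many points, the integrals reduce to weighted sums and the exchange of order can be checked directly. The following is a minimal sketch; the point masses `mu`, `nu` and the function `f` are hypothetical choices made purely for illustration.

```python
from fractions import Fraction

# Hypothetical finite measures given by point masses on E = {0, 1, 2} and F = {0, 1}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(2)}
nu = {0: Fraction(3, 4), 1: Fraction(5)}

def f(x, y):
    """An arbitrary bounded function on E x F."""
    return Fraction(x * x - 3 * y + 1)

# Inner integral over x with respect to mu, outer integral over y with respect to nu.
x_then_y = sum(sum(f(x, y) * mu[x] for x in mu) * nu[y] for y in nu)
# Inner integral over y with respect to nu, outer integral over x with respect to mu.
y_then_x = sum(sum(f(x, y) * nu[y] for y in nu) * mu[x] for x in mu)
print(x_then_y == y_then_x)
```

Exact rational arithmetic is used so that the comparison is not affected by floating point rounding.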

Continue reading “The Stochastic Fubini Theorem”

The Khintchine Inequality

For a Rademacher sequence {X=(X_1,X_2,\ldots)} and square summable sequence of real numbers {a=(a_1,a_2,\ldots)}, the Khintchine inequality provides upper and lower bounds for the moments of the random variable,

\displaystyle  a\cdot X=a_1X_1+a_2X_2+\cdots.

We use {\ell^2} for the space of square summable real sequences and

\displaystyle  \lVert a\rVert_2=\left(a_1^2+a_2^2+\cdots\right)^{1/2}

for the associated Banach norm.

Theorem 1 (Khintchine) For each {0 < p < \infty}, there exist positive constants {c_p,C_p} such that,

\displaystyle  c_p\lVert a\rVert_2^p\le{\mathbb E}\left[\lvert a\cdot X\rvert^p\right]\le C_p\lVert a\rVert_2^p, (1)

for all {a\in\ell^2}.
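For a finite weight sequence the moments can be computed exactly by averaging over all sign patterns, which gives a quick sanity check of (1) for small even p. This is a rough sketch; the weights `a` and the function names are hypothetical, and the enumeration is only practical for short sequences.

```python
from itertools import product

def rademacher_moment(a, p):
    """Exact E[|a.X|^p] for a finite weight list a, averaging over all 2^n sign patterns."""
    total = 0.0
    for signs in product((1, -1), repeat=len(a)):
        total += abs(sum(s * w for s, w in zip(signs, a))) ** p
    return total / 2 ** len(a)

def l2_norm(a):
    """Euclidean norm of the weight list."""
    return sum(w * w for w in a) ** 0.5

a = [3.0, 1.0, 2.0, 0.5]            # hypothetical weights
norm = l2_norm(a)
second = rademacher_moment(a, 2)    # equals norm**2, as the cross terms average to zero
fourth = rademacher_moment(a, 4)    # equals 3*norm**4 - 2*sum(w**4), so is at most 3*norm**4
print(second / norm ** 2, fourth / norm ** 4)
```

For p = 2 the inequality is an equality with both constants equal to 1, and for p = 4 the ratio lies between 1 and 3, consistent with the existence of constants independent of the weights.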

Continue reading “The Khintchine Inequality”

Rademacher Series

The Rademacher distribution is probably the simplest nontrivial probability distribution that you can imagine. This is a discrete distribution taking only the two possible values {\{1,-1\}}, each occurring with equal probability. A random variable X has the Rademacher distribution if

\displaystyle  {\mathbb P}(X=1)={\mathbb P}(X=-1)=1/2.

A Rademacher sequence is an IID sequence of Rademacher random variables,

\displaystyle  X = (X_1,X_2,X_3,\ldots).

Recall that the partial sums {S_N=\sum_{n=1}^NX_n} of a Rademacher sequence form a simple random walk. Generalizing a bit, we can consider scaling by a sequence of real weights {a_1,a_2,\ldots}, so that {S_N=\sum_{n=1}^Na_nX_n}. I will concentrate on infinite sums, as N goes to infinity, which clearly include the finite Rademacher sums as the special case with only finitely many nonzero weights.
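The weighted partial sums are easy to simulate. The following is a small sketch using only the standard library; the function name, the seed, and the choice of weights {a_n=1/n} are assumptions made for illustration.

```python
import random

def rademacher_partial_sums(weights, rng):
    """Return [S_1, ..., S_N] where S_N is the sum of a_n * X_n over n <= N, for IID random signs X_n."""
    sums, total = [], 0.0
    for a_n in weights:
        x_n = rng.choice((1, -1))   # Rademacher variable: +1 or -1, each with probability 1/2
        total += a_n * x_n
        sums.append(total)
    return sums

rng = random.Random(0)              # fixed seed, used only for reproducibility
# Unit weights give a simple random walk.
walk = rademacher_partial_sums([1.0] * 10, rng)
# The weights a_n = 1/n are square summable.
series = rademacher_partial_sums([1.0 / n for n in range(1, 1001)], rng)
print(walk)
print(series[-1])
```

With unit weights each step moves the sum by exactly one, while with the weights 1/n the later terms contribute less and less to the sum.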

Rademacher series serve as simple prototypes of more general IID series, but also have applications in various areas. Results include concentration and anti-concentration inequalities, and the Khintchine inequality, which imply various properties of {L^p} spaces and of linear maps between them. For example, in my notes constructing the stochastic integral starting from a minimal set of assumptions, the {L^0} version of the Khintchine inequality was required. Rademacher series are also interesting in their own right, and a source of some very simple statements which are nevertheless quite difficult to prove, some of which are still open problems. See, for example, Some explorations on two conjectures about Rademacher sequences by Hu, Lan and Sun. As I would like to look at some of these problems in the blog, I include this post to outline the basic constructions. One intriguing aspect of Rademacher series is the way that they mix discrete distributions with combinatorial aspects, and continuous distributions. On the one hand, by the central limit theorem, Rademacher series can often be approximated well by a Gaussian distribution but, on the other hand, they depend on the discrete set of signs of the individual variables in the sum. Continue reading “Rademacher Series”

Pathwise Burkholder-Davis-Gundy Inequalities

As covered earlier in my notes, the Burkholder-Davis-Gundy inequality relates the moments of the maximum of a local martingale M with its quadratic variation,

\displaystyle  c_p^{-1}{\mathbb E}[[M]^{p/2}_\tau]\le{\mathbb E}[\bar M_\tau^p]\le C_p{\mathbb E}[[M]^{p/2}_\tau]. (1)

Here, {\bar M_t\equiv\sup_{s\le t}\lvert M_s\rvert} is the running maximum, {[M]} is the quadratic variation, {\tau} is a stopping time, and the exponent {p} is a real number greater than or equal to 1. Then, {c_p} and {C_p} are positive constants depending on p, but independent of the choice of local martingale and stopping time. Furthermore, for continuous local martingales, which are the focus of this post, the inequality holds for all {p > 0}.

Since the quadratic variation used in my notes, by definition, starts at zero, the BDG inequality also required the local martingale to start at zero. This is not an important restriction, but it can be removed by requiring the quadratic variation to start at {[M]_0=M_0^2}. Henceforth, I will assume that this is the case, which means that if we are working with the definition in my notes then we should add {M_0^2} everywhere to the quadratic variation {[M]}.

In keeping with the theme of the previous post on Doob’s inequalities, such martingale inequalities should have pathwise versions of the form

\displaystyle  c_p^{-1}[M]^{p/2}+\int\alpha dM\le\bar M^p\le C_p[M]^{p/2}+\int\beta dM (2)

for predictable processes {\alpha,\beta}. Inequalities in this form are considerably stronger than (1), since they apply on all sample paths, not just on average. Also, we do not require M to be a local martingale — it is sufficient to be a (continuous) semimartingale. However, in the case where M is a local martingale, the pathwise version (2) does imply the BDG inequality (1), using the fact that stochastic integration preserves the local martingale property.

Lemma 1 Let X and Y be nonnegative increasing measurable processes satisfying {X\le Y-N} for a local (sub)martingale N starting from zero. Then, {{\mathbb E}[X_\tau]\le{\mathbb E}[Y_\tau]} for all stopping times {\tau}.

Proof: Let {\tau_n} be an increasing sequence of bounded stopping times increasing to infinity such that the stopped processes {N^{\tau_n}} are submartingales. Then,

\displaystyle  {\mathbb E}[1_{\{\tau_n\ge\tau\}}X_\tau]\le{\mathbb E}[X_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]-{\mathbb E}[N_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_\tau].

Letting n increase to infinity and using monotone convergence on the left hand side gives the result. ⬜

Moving on to the main statements of this post, I will mention that there are actually many different pathwise versions of the BDG inequalities. I opt for the especially simple statements given in Theorem 2 below. See the papers Pathwise Versions of the Burkholder-Davis-Gundy Inequality by Beiglböck and Siorpaes, and Applications of Pathwise Burkholder-Davis-Gundy inequalities by Siorpaes, for slightly different approaches, although these papers do also effectively contain proofs of (3,4) for the special case of {r=1/2}. As usual, I am using {x\vee y} to represent the maximum of two numbers.

Theorem 2 Let X and Y be nonnegative continuous processes with {X_0=Y_0}. For any {0 < r\le1} we have,

\displaystyle  (1-r)\bar X^r\le (3-2r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y) (3)

and, if X is increasing, this can be improved to,

\displaystyle  \bar X^r\le (2-r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y). (4)

If {r\ge1} and X is increasing then,

\displaystyle  \bar X^r\le r^{r\vee 2}\,\bar Y^r+r^2\int(\bar X\vee\bar Y)^{r-1}d(X-Y). (5)

Continue reading “Pathwise Burkholder-Davis-Gundy Inequalities”

Pathwise Martingale Inequalities

Recall Doob’s inequalities, covered earlier in these notes, which bound expectations of functions of the maximum of a martingale in terms of its terminal distribution. Although these are often applied to martingales, they hold true more generally for cadlag submartingales. Here, I use {\bar X_t\equiv\sup_{s\le t}X_s} to denote the running maximum of a process.

Theorem 1 Let X be a nonnegative cadlag submartingale. Then,

  • {{\mathbb P}\left(\bar X_t \ge K\right)\le K^{-1}{\mathbb E}[X_t]} for all {K > 0}.
  • {\lVert\bar X_t\rVert_p\le (p/(p-1))\lVert X_t\rVert_p} for all {p > 1}.
  • {{\mathbb E}[\bar X_t]\le(e/(e-1)){\mathbb E}[X_t\log X_t+1]}.

In particular, if X is a cadlag martingale, then {\lvert X\rvert} is a submartingale, so theorem 1 applies with {\lvert X\rvert} in place of X.

We also saw the following much stronger (sub)martingale inequality in the post on the maximum maximum of martingales with known terminal distribution.

Theorem 2 Let X be a cadlag submartingale. Then, for any real K and nonnegative real t,

\displaystyle  {\mathbb P}(\bar X_t\ge K)\le\inf_{x < K}\frac{{\mathbb E}[(X_t-x)_+]}{K-x}. (1)

This is particularly sharp, in the sense that for any distribution for {X_t}, there exists a martingale with this terminal distribution for which (1) becomes an equality simultaneously for all values of K. Furthermore, all of the inequalities stated in theorem 1 follow from (1). For example, the first one is obtained by taking {x=0} in (1). The remaining two can also be proved from (1) by integrating over K.
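To spell out the first of these reductions, suppose that X is nonnegative and {K > 0}, so that {x=0} is an allowed choice in the infimum in (1):

```latex
{\mathbb P}(\bar X_t\ge K)
 \le \frac{{\mathbb E}[(X_t-0)_+]}{K-0}
 = K^{-1}{\mathbb E}[X_t],
 \qquad\text{since } (X_t)_+ = X_t \text{ for } X_t\ge0 .
```

This is exactly the first inequality of theorem 1.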

Note that all of the submartingale inequalities above are of the form

\displaystyle  {\mathbb E}[F(\bar X_t)]\le{\mathbb E}[G(X_t)] (2)

for certain choices of functions {F,G\colon{\mathbb R}\rightarrow{\mathbb R}^+}. The aim of this post is to show how they have a more general `pathwise’ form,

\displaystyle  F(\bar X_t)\le G(X_t) - \int_0^t\xi\,dX (3)

for some nonnegative predictable process {\xi}. It is relatively straightforward to show that (2) follows from (3) by noting that the integral is a submartingale and, hence, has nonnegative expectation. To be rigorous, there are some integrability considerations to deal with, so a proof will be included later in this post.

Inequality (3) is required to hold almost everywhere, and not just in expectation, so is a considerably stronger statement than the standard martingale inequalities. Furthermore, it is not necessary for X to be a submartingale for (3) to make sense, as it holds for all semimartingales. We can go further, and even drop the requirement that X is a semimartingale. As we will see, in the examples covered in this post, {\xi_t} will be of the form {h(\bar X_{t-})} for an increasing right-continuous function {h\colon{\mathbb R}\rightarrow{\mathbb R}}, so integration by parts can be used,

\displaystyle  \int h(\bar X_-)\,dX = h(\bar X)X-h(\bar X_0)X_0 - \int X\,dh(\bar X). (4)

The right hand side of (4) is well-defined for any cadlag real-valued process, by using the pathwise Lebesgue–Stieltjes integral with respect to the increasing process {h(\bar X)}, so can be used as the definition of {\int h(\bar X_-)dX}. In the case where X is a semimartingale, integration by parts ensures that this agrees with the stochastic integral {\int\xi\,dX}. Since we now have an interpretation of (3) in a pathwise sense for all cadlag processes X, it is no longer required to suppose that X is a submartingale, a semimartingale, or even require the existence of an underlying probability space. All that is necessary is for {t\mapsto X_t} to be a cadlag real-valued function. Hence, we reduce the martingale inequalities to straightforward results of real analysis which do not require any probability theory and, consequently, are much more general. I state the precise pathwise generalizations of Doob’s inequalities now, leaving the proof until later in the post. As the first inequality of theorem 1 is just the special case of (1) with {x=0}, we do not need to explicitly include this here.

Theorem 3 Let X be a cadlag process and t be a nonnegative time.

  1. For real {K > x},
    \displaystyle  1_{\{\bar X_t\ge K\}}\le\frac{(X_t-x)_+}{K-x}-\int_0^t\xi\,dX (5)

    where {\xi=(K-x)^{-1}1_{\{\bar X_-\ge K\}}}.

  2. If X is nonnegative and p,q are positive reals with {p^{-1}+q^{-1}=1} then,
    \displaystyle  \bar X_t^p\le q^p X^p_t-\int_0^t\xi dX (6)

    where {\xi=pq\bar X_-^{p-1}}.

  3. If X is nonnegative then,
    \displaystyle  \bar X_t\le\frac{e}{e-1}\left( X_t \log X_t +1\right)-\int_0^t\xi\,dX (7)

    where {\xi=\frac{e}{e-1}\log(\bar X_-\vee1)}.
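Since inequality (5) is purely pathwise, it can be tested numerically on arbitrary discrete paths. The sketch below treats a finite sequence as a piecewise-constant cadlag function, so that the integral becomes a sum of {\xi} times the jumps of X, with {\xi} evaluated using the running maximum strictly before each jump. The function name, the Gaussian random walk paths, and the values of K and x are illustrative assumptions.

```python
import random

def pathwise_gap(path, K, x):
    """Right side minus left side of (5) at the final time of a discrete path.

    The integral of xi dX is the sum of xi_k * (X_k - X_{k-1}), where xi_k
    is (K - x)^{-1} if the running maximum before step k is at least K,
    and zero otherwise.
    """
    assert K > x
    running_max = path[0]
    integral = 0.0
    for prev, curr in zip(path, path[1:]):
        xi = 1.0 / (K - x) if running_max >= K else 0.0
        integral += xi * (curr - prev)
        running_max = max(running_max, curr)
    lhs = 1.0 if running_max >= K else 0.0
    rhs = max(path[-1] - x, 0.0) / (K - x) - integral
    return rhs - lhs

rng = random.Random(1)              # fixed seed, used only for reproducibility
gaps = []
for _ in range(1000):
    path = [0.0]
    for _ in range(50):
        path.append(path[-1] + rng.gauss(0.0, 1.0))
    gaps.append(pathwise_gap(path, K=2.0, x=-1.0))
# The gap should be nonnegative on every path, up to floating point rounding.
print(min(gaps) >= -1e-9)
```

No martingale property is used anywhere in the check, which reflects the point that (5) holds path by path for any cadlag function.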

Continue reading “Pathwise Martingale Inequalities”

Semimartingale Local Times

Figure 1: Brownian motion B with local time L and auxiliary Brownian motion W

For a stochastic process X taking values in a state space E, its local time at a point {x\in E} is a measure of the time spent at x. For a continuous time stochastic process, we could try and simply compute the Lebesgue measure of the time at the level,

\displaystyle  L^x_t=\int_0^t1_{\{X_s=x\}}ds. (1)

For processes which hit the level {x} and stick there for some time, this makes some sense. However, if X is a standard Brownian motion, it will always give zero, so is not helpful. Even though X will hit every real value infinitely often, continuity of the normal distribution gives {{\mathbb P}(X_s=x)=0} at each positive time, so that {L^x_t} defined by (1) will have zero expectation.

Rather than the indicator function of {\{X=x\}} as in (1), an alternative is to use the Dirac delta function,

\displaystyle  L^x_t=\int_0^t\delta(X_s-x)\,ds. (2)

Unfortunately, the Dirac delta is not a true function, it is a distribution, so (2) is not a well-defined expression. However, if it can be made rigorous, then it does seem to have some of the properties we would want. For example, the expectation {{\mathbb E}[\delta(X_s-x)]} can be interpreted as the probability density of {X_s} evaluated at {x}, which has a positive and finite value, so it should lead to positive and finite local times. Equation (2) still relies on the Lebesgue measure over the time index, so will not behave as we may expect under time changes, and will not make sense for processes without a continuous probability density. A better approach is to integrate with respect to the quadratic variation,

\displaystyle  L^x_t=\int_0^t\delta(X_s-x)d[X]_s (3)

which, for Brownian motion, amounts to the same thing. Although (3) is still not a well-defined expression, since it still involves the Dirac delta, the idea is to come up with a definition which amounts to the same thing in spirit. Important properties that it should satisfy are that it is an adapted, continuous and increasing process with increments supported on the set {\{X=x\}},

\displaystyle  L^x_t=\int_0^t1_{\{X_s=x\}}dL^x_s.
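One common way to picture the delta function heuristic numerically is to replace {\delta} by the normalized indicator of a small window around the level, so that the local time is approximated by the time spent within {\epsilon} of x divided by {2\epsilon}. The following is a rough simulation sketch for Brownian motion, where integrating against the quadratic variation is the same as integrating against time; the function name, step count, window width, and seed are all illustrative assumptions, and the result is only an approximation on a discretized path.

```python
import math
import random

def local_time_estimate(level, eps, n_steps, t_end, rng):
    """Estimate the local time at `level` as (time spent within eps of level) / (2 * eps)."""
    dt = t_end / n_steps
    b, occupation = 0.0, 0.0
    for _ in range(n_steps):
        if abs(b - level) < eps:
            occupation += dt        # for Brownian motion, d[B]_s = ds
        b += rng.gauss(0.0, math.sqrt(dt))
    return occupation / (2.0 * eps)

rng = random.Random(42)             # fixed seed, used only for reproducibility
estimate = local_time_estimate(level=0.0, eps=0.05, n_steps=20000, t_end=1.0, rng=rng)
far_away = local_time_estimate(level=100.0, eps=0.05, n_steps=20000, t_end=1.0, rng=rng)
print(estimate, far_away)
```

The estimate at a level which the path never approaches is zero, in line with the requirement that the local time only increases on the set {\{X=x\}}.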

Local times are a very useful and interesting part of stochastic calculus, and find important applications in excursion theory, stochastic integration and stochastic differential equations. However, I have not covered this subject in my notes, so I do this now. Recalling Ito’s lemma for a function {f(X)} of a semimartingale X, this involves a term of the form {\int f^{\prime\prime}(X)d[X]} and, hence, requires {f} to be twice differentiable. If we were to try to apply the Ito formula for functions which are not twice differentiable, then {f^{\prime\prime}} can be understood in terms of distributions, and delta functions can appear, which brings local times into the picture. In the opposite direction, which I take in this post, we can try to generalise Ito’s formula and invert this to give a meaning to (3). Continue reading “Semimartingale Local Times”