Is this a Blog?

Well, according to the tagline, it is a “random mathematical blog”, and is hosted on the popular blogging platform, WordPress.com. According to Wikipedia, a blog is a

discussion or informational website published on the World Wide Web consisting of discrete, often informal diary-style text entries (posts).

Most of the posts on this site are not especially discrete, consist of some quite formal maths, and are not at all diary style. According to wix.com, blogs are

regularly updated websites that provide essential insights into a certain topic.

I like to think that this site provides essential insights into stochastic calculus and probability theory, but how regular is regular? Daily? To qualify, it should probably be weekly, at least. Here, I can go a month or so without updating. It has sometimes been significantly longer between updates. I try to write posts containing proper rigorous mathematics, and to explain things as well as I can. The style that I aim for in many of the posts here is not unlike what you might see in published maths papers or in textbooks. These take some time, and I cannot just rush out a post containing detailed proofs and explanations of mathematical theory. It is not my job, just something that I like to do. I like to work through advanced maths subjects, prove results, and get a good understanding of them. It would be possible to post weekly, but the style would not be the same and the mathematical content would be much reduced and at a shallower level, which is not what I would enjoy doing.

So, no, this is not a blog!

I should probably change the tagline. This is a random mathematical website. I would however like to change the style of the site a bit. The idea is to update with detailed maths posts, as always, but not on a weekly basis. However, I am inclined to also do some weekly updates. Just short updates on mathematical subjects, or anything related to this website. I’ll see how it goes…

The Khintchine Inequality

For a Rademacher sequence {X=(X_1,X_2,\ldots)} and square summable sequence of real numbers {a=(a_1,a_2,\ldots)}, the Khintchine inequality provides upper and lower bounds for the moments of the random variable,

\displaystyle  a\cdot X=a_1X_1+a_2X_2+\cdots.

We use {\ell^2} for the space of square summable real sequences and

\displaystyle  \lVert a\rVert_2=\left(a_1^2+a_2^2+\cdots\right)^{1/2}

for the associated Banach norm.

Theorem 1 (Khintchine) For each {0 < p < \infty}, there exist positive constants {c_p,C_p} such that,

\displaystyle  c_p\lVert a\rVert_2^p\le{\mathbb E}\left[\lvert a\cdot X\rvert^p\right]\le C_p\lVert a\rVert_2^p, (1)

for all {a\in\ell^2}.
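
For a concrete illustration at {p=4}, the moment can be computed exactly by enumerating all sign patterns of a finite weight sequence. The sketch below (with an arbitrary choice of weights) checks the well-known identity {{\mathbb E}[(a\cdot X)^4]=3\lVert a\rVert_2^4-2\sum_na_n^4}, which gives the explicit constants {c_4=1} and {C_4=3} in (1).

```python
# Exact check of the Khintchine inequality at p = 4 by enumerating all
# 2^n equally likely sign vectors of a finite Rademacher sum.
# The weights below are arbitrary illustrative choices.
from itertools import product

a = [0.9, -0.4, 0.3, 0.25, -0.1, 0.05]     # finitely many nonzero weights
norm2_sq = sum(x * x for x in a)           # ||a||_2^2

def moment(p):
    # E|a.X|^p, averaged over every sign pattern of the Rademacher sequence
    n = len(a)
    total = sum(abs(sum(e * x for e, x in zip(signs, a))) ** p
                for signs in product((1, -1), repeat=n))
    return total / 2 ** n

# Known exact fourth moment: E[(a.X)^4] = 3||a||_2^4 - 2*sum(a_n^4)
exact = 3 * norm2_sq ** 2 - 2 * sum(x ** 4 for x in a)
print(moment(4), exact)
# Since sum(a_n^4) <= ||a||_2^4, this yields c_4 = 1 and C_4 = 3 in (1):
print(norm2_sq ** 2 <= moment(4) <= 3 * norm2_sq ** 2)
```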

Continue reading “The Khintchine Inequality”

Rademacher Series

The Rademacher distribution is probably the simplest nontrivial probability distribution that you can imagine. This is a discrete distribution taking only the two possible values {\{1,-1\}}, each occurring with equal probability. A random variable X has the Rademacher distribution if

\displaystyle  {\mathbb P}(X=1)={\mathbb P}(X=-1)=1/2.

A Rademacher sequence is an IID sequence of Rademacher random variables,

\displaystyle  X = (X_1,X_2,X_3,\ldots).

Recall that the partial sums {S_N=\sum_{n=1}^NX_n} of a Rademacher sequence form a simple random walk. Generalizing a bit, we can consider scaling by a sequence of real weights {a_1,a_2,\ldots}, so that {S_N=\sum_{n=1}^Na_nX_n}. I will concentrate on infinite sums, as N goes to infinity, which clearly include the finite Rademacher sums as the subset with only finitely many nonzero weights.
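
As a quick simulation sketch of both cases: unweighted partial sums give a simple random walk, while the weights {a_n=2^{-n}} give a series whose limit is well known to be uniform on {[-1,1]} (so its variance is {1/3=\sum_n4^{-n}}). The sample sizes below are arbitrary choices.

```python
# Simulate a Rademacher sequence, its random-walk partial sums, and the
# weighted sums S_N = sum a_n X_n with a_n = 2^{-n}, whose limit is
# uniform on [-1, 1].
import numpy as np

rng = np.random.default_rng(1)
n, paths = 20, 100_000
X = rng.choice([-1, 1], size=(paths, n))   # IID Rademacher variables

S = X.cumsum(axis=1)                       # simple random walk S_N
a = 0.5 ** np.arange(1, n + 1)             # weights a_n = 2^{-n}
T = X @ a                                  # weighted sums, approx Uniform(-1,1)

print(S[:, -1].mean(), S[:, -1].var())     # mean ~ 0, variance ~ n
print(T.var())                             # ~ 1/3, the Uniform(-1,1) variance
```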

Rademacher series serve as simple prototypes of more general IID series, but also have applications in various areas. Results include concentration and anti-concentration inequalities, and the Khintchine inequality, which imply various properties of {L^p} spaces and of linear maps between them. For example, in my notes constructing the stochastic integral starting from a minimal set of assumptions, the {L^0} version of the Khintchine inequality was required. Rademacher series are also interesting in their own right, and a source of some very simple statements which are nevertheless quite difficult to prove, some of which are still open problems. See, for example, Some explorations on two conjectures about Rademacher sequences by Hu, Lan and Sun. As I would like to look at some of these problems in the blog, I include this post to outline the basic constructions. One intriguing aspect of Rademacher series is the way that they mix discrete distributions, with their combinatorial aspects, and continuous distributions. On the one hand, by the central limit theorem, Rademacher series can often be approximated well by a Gaussian distribution but, on the other hand, they depend on the discrete set of signs of the individual variables in the sum. Continue reading “Rademacher Series”

Pathwise Burkholder-Davis-Gundy Inequalities

As covered earlier in my notes, the Burkholder-Davis-Gundy inequality relates the moments of the maximum of a local martingale M to its quadratic variation,

\displaystyle  c_p^{-1}{\mathbb E}[[M]^{p/2}_\tau]\le{\mathbb E}[\bar M_\tau^p]\le C_p{\mathbb E}[[M]^{p/2}_\tau]. (1)

Here, {\bar M_t\equiv\sup_{s\le t}\lvert M_s\rvert} is the running maximum, {[M]} is the quadratic variation, {\tau} is a stopping time, and the exponent {p} is a real number greater than or equal to 1. Then, {c_p} and {C_p} are positive constants depending on p, but independent of the choice of local martingale and stopping time. Furthermore, for continuous local martingales, which are the focus of this post, the inequality holds for all {p > 0}.
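For standard Brownian motion, {[M]_t=t}, so (1) says that {{\mathbb E}[\bar M_1^p]} lies between {c_p} and {C_p}. A Monte Carlo sketch of these expectations (with arbitrary path counts and grid sizes, and a small downward bias from discretizing the maximum):

```python
# Monte Carlo sketch of the BDG bounds for standard Brownian motion on
# [0,1], where [M]_t = t, so (1) reads c_p <= E[max_s |B_s|^p] <= C_p.
import numpy as np

rng = np.random.default_rng(2)
paths, steps = 2_000, 1_000
dt = 1.0 / steps

dB = rng.normal(0.0, np.sqrt(dt), size=(paths, steps))
B = dB.cumsum(axis=1)                  # Brownian paths on a grid
running_max = np.abs(B).max(axis=1)    # discretized \bar M_1

for p in (1.0, 2.0, 4.0):
    print(p, (running_max ** p).mean())   # sits between c_p and C_p
```

For {p=2}, for example, Doob's inequality gives the crude bounds {1\le{\mathbb E}[\bar M_1^2]\le4}, consistent with the estimate printed above.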

Since the quadratic variation used in my notes, by definition, starts at zero, the BDG inequality also required the local martingale to start at zero. This is not an important restriction, but it can be removed by requiring the quadratic variation to start at {[M]_0=M_0^2}. Henceforth, I will assume that this is the case, which means that if we are working with the definition in my notes then we should add {M_0^2} everywhere to the quadratic variation {[M]}.

In keeping with the theme of the previous post on Doob’s inequalities, such martingale inequalities should have pathwise versions of the form

\displaystyle  c_p^{-1}[M]^{p/2}+\int\alpha dM\le\bar M^p\le C_p[M]^{p/2}+\int\beta dM (2)

for predictable processes {\alpha,\beta}. Inequalities in this form are considerably stronger than (1), since they apply on all sample paths, not just on average. Also, we do not require M to be a local martingale — it is sufficient to be a (continuous) semimartingale. However, in the case where M is a local martingale, the pathwise version (2) does imply the BDG inequality (1), using the fact that stochastic integration preserves the local martingale property.

Lemma 1 Let X and Y be nonnegative increasing measurable processes satisfying {X\le Y-N} for a local (sub)martingale N starting from zero. Then, {{\mathbb E}[X_\tau]\le{\mathbb E}[Y_\tau]} for all stopping times {\tau}.

Proof: Let {\tau_n} be a sequence of bounded stopping times increasing to infinity such that the stopped processes {N^{\tau_n}} are submartingales. Then,

\displaystyle  {\mathbb E}[1_{\{\tau_n\ge\tau\}}X_\tau]\le{\mathbb E}[X_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]-{\mathbb E}[N_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_\tau].

Letting n increase to infinity and using monotone convergence on the left hand side gives the result. ⬜

Moving on to the main statements of this post, I will mention that there are actually many different pathwise versions of the BDG inequalities. I opt for the especially simple statements given in Theorem 2 below. See the papers Pathwise Versions of the Burkholder-Davis-Gundy Inequality by Beiglböck and Siorpaes, and Applications of Pathwise Burkholder-Davis-Gundy inequalities by Siorpaes, for slightly different approaches, although these papers do also effectively contain proofs of (3) and (4) for the special case of {r=1/2}. As usual, I am using {x\vee y} to represent the maximum of two numbers.

Theorem 2 Let X and Y be nonnegative continuous processes with {X_0=Y_0}. For any {0 < r\le1} we have,

\displaystyle  (1-r)\bar X^r\le (3-2r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y) (3)

and, if X is increasing, this can be improved to,

\displaystyle  \bar X^r\le (2-r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y). (4)

If {r\ge1} and X is increasing then,

\displaystyle  \bar X^r\le r^{r\vee 2}\,\bar Y^r+r^2\int(\bar X\vee\bar Y)^{r-1}d(X-Y). (5)
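
Since these inequalities hold pathwise, they can be checked directly on deterministic paths. The sketch below evaluates both sides of (4) with {r=1/2} on a fine grid, approximating the integral by a left-endpoint Riemann-Stieltjes sum (the particular paths, with {X} increasing and {X_0=Y_0=1}, are arbitrary choices, and a small tolerance absorbs the discretization error).

```python
# Numerical sketch checking inequality (4) with r = 1/2 on discretized
# continuous paths: X increasing, X_0 = Y_0 = 1. The integral is a
# left-endpoint Riemann-Stieltjes sum on a fine grid.
import numpy as np

t = np.linspace(0.0, 1.0, 100_001)
X = 1.0 + t + 0.2 * (1 - np.cos(4 * t))        # increasing, X_0 = 1
Y = 1.0 + t ** 2 + 0.5 * np.sin(5 * t) ** 2    # continuous, Y_0 = 1

r = 0.5
Xbar = np.maximum.accumulate(X)                # running maximum of X
Ybar = np.maximum.accumulate(Y)                # running maximum of Y
integrand = (np.maximum(Xbar, Ybar) ** (r - 1))[:-1]
integral = (integrand * np.diff(X - Y)).sum()  # int (Xbar v Ybar)^{r-1} d(X-Y)

lhs = Xbar[-1] ** r
rhs = (2 - r) * Ybar[-1] ** r + r * integral
print(lhs, rhs)   # lhs <= rhs, up to discretization error
```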

Continue reading “Pathwise Burkholder-Davis-Gundy Inequalities”

Pathwise Martingale Inequalities

Recall Doob’s inequalities, covered earlier in these notes, which bound expectations of functions of the maximum of a martingale in terms of its terminal distribution. Although these are often applied to martingales, they hold true more generally for cadlag submartingales. Here, I use {\bar X_t\equiv\sup_{s\le t}X_s} to denote the running maximum of a process.

Theorem 1 Let X be a nonnegative cadlag submartingale. Then,

  • {{\mathbb P}\left(\bar X_t \ge K\right)\le K^{-1}{\mathbb E}[X_t]} for all {K > 0}.
  • {\lVert\bar X_t\rVert_p\le (p/(p-1))\lVert X_t\rVert_p} for all {p > 1}.
  • {{\mathbb E}[\bar X_t]\le(e/(e-1)){\mathbb E}[X_t\log X_t+1]}.

In particular, for a cadlag martingale X, {\lvert X\rvert} is a submartingale, so theorem 1 applies with {\lvert X\rvert} in place of X.

We also saw the following much stronger (sub)martingale inequality in the post on the maximum maximum of martingales with known terminal distribution.

Theorem 2 Let X be a cadlag submartingale. Then, for any real K and nonnegative real t,

\displaystyle  {\mathbb P}(\bar X_t\ge K)\le\inf_{x < K}\frac{{\mathbb E}[(X_t-x)_+]}{K-x}. (1)

This is particularly sharp, in the sense that for any distribution for {X_t}, there exists a martingale with this terminal distribution for which (1) becomes an equality simultaneously for all values of K. Furthermore, all of the inequalities stated in theorem 1 follow from (1). For example, the first one is obtained by taking {x=0} in (1). The remaining two can also be proved from (1) by integrating over K.

Note that all of the submartingale inequalities above are of the form

\displaystyle  {\mathbb E}[F(\bar X_t)]\le{\mathbb E}[G(X_t)] (2)

for certain choices of functions {F,G\colon{\mathbb R}\rightarrow{\mathbb R}^+}. The aim of this post is to show how they have a more general ‘pathwise’ form,

\displaystyle  F(\bar X_t)\le G(X_t) - \int_0^t\xi\,dX (3)

for some nonnegative predictable process {\xi}. It is relatively straightforward to show that (2) follows from (3) by noting that the integral is a submartingale and, hence, has nonnegative expectation. To be rigorous, there are some integrability considerations to deal with, so a proof will be included later in this post.

Inequality (3) is required to hold almost everywhere, and not just in expectation, so is a considerably stronger statement than the standard martingale inequalities. Furthermore, it is not necessary for X to be a submartingale for (3) to make sense, as it holds for all semimartingales. We can go further, and even drop the requirement that X is a semimartingale. As we will see, in the examples covered in this post, {\xi_t} will be of the form {h(\bar X_{t-})} for an increasing right-continuous function {h\colon{\mathbb R}\rightarrow{\mathbb R}}, so integration by parts can be used,

\displaystyle  \int h(\bar X_-)\,dX = h(\bar X)X-h(\bar X_0)X_0 - \int X\,dh(\bar X). (4)

The right hand side of (4) is well-defined for any cadlag real-valued process, by using the pathwise Lebesgue–Stieltjes integral with respect to the increasing process {h(\bar X)}, so can be used as the definition of {\int h(\bar X_-)dX}. In the case where X is a semimartingale, integration by parts ensures that this agrees with the stochastic integral {\int\xi\,dX}. Since we now have an interpretation of (3) in a pathwise sense for all cadlag processes X, it is no longer required to suppose that X is a submartingale, a semimartingale, or even require the existence of an underlying probability space. All that is necessary is for {t\mapsto X_t} to be a cadlag real-valued function. Hence, we reduce the martingale inequalities to straightforward results of real analysis which do not require any probability theory and, consequently, are much more general. I state the precise pathwise generalizations of Doob’s inequalities now, leaving the proof until later in the post. As the first inequality of theorem 1 is just the special case of (1) with {x=0}, we do not need to explicitly include it here.
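
In discrete time the integration-by-parts formula (4) is an exact telescoping identity, which can be checked directly on any path (the path and the increasing function {h} below are arbitrary choices):

```python
# Exact discrete analogue of the integration-by-parts identity (4): for
# any path x and increasing function h, the left-endpoint sum for
# "int h(xbar_-) dX" equals h(xbar)x - h(xbar_0)x_0 minus the sum for
# "int x dh(xbar)". This telescopes, so it holds to rounding error.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1000).cumsum() + 5.0   # an arbitrary path
xbar = np.maximum.accumulate(x)                # running maximum
h = lambda u: np.log(np.maximum(u, 1.0))       # an increasing function

lhs = (h(xbar[:-1]) * np.diff(x)).sum()        # "int h(xbar_-) dX"
rhs = h(xbar[-1]) * x[-1] - h(xbar[0]) * x[0] \
      - (x[1:] * np.diff(h(xbar))).sum()       # "h(xbar)X - h(xbar_0)X_0 - int X dh(xbar)"
print(lhs, rhs)
```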

Theorem 3 Let X be a cadlag process and t be a nonnegative time.

  1. For real {K > x},
    \displaystyle  1_{\{\bar X_t\ge K\}}\le\frac{(X_t-x)_+}{K-x}-\int_0^t\xi\,dX (5)

    where {\xi=(K-x)^{-1}1_{\{\bar X_-\ge K\}}}.

  2. If X is nonnegative and p,q are positive reals with {p^{-1}+q^{-1}=1} then,
\displaystyle  \bar X_t^p\le q^p X^p_t-\int_0^t\xi\,dX (6)

    where {\xi=pq\bar X_-^{p-1}}.

  3. If X is nonnegative then,
    \displaystyle  \bar X_t\le\frac{e}{e-1}\left( X_t \log X_t +1\right)-\int_0^t\xi\,dX (7)

    where {\xi=\frac{e}{e-1}\log(\bar X_-\vee1)}.
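
The discrete-time analogue of (5) holds exactly: once the running maximum reaches K, the sum approximating the integral telescopes to {X_t-X_\tau} for the first hitting index {\tau}, giving the inequality path by path. A quick sketch checking this on simulated random-walk paths (the choices of K, x and the paths are arbitrary):

```python
# Check the discrete analogue of inequality (5) pathwise on random-walk
# paths, with xi evaluated at left endpoints as in theorem 3.
import numpy as np

rng = np.random.default_rng(4)
K, x0 = 3.0, 0.0                            # K > x, as in theorem 3
ok = True
for _ in range(200):
    X = rng.choice([-1.0, 1.0], size=500).cumsum()
    Xbar = np.maximum.accumulate(X)
    xi = (Xbar[:-1] >= K) / (K - x0)        # xi = 1_{Xbar_- >= K}/(K - x)
    lhs = float(Xbar[-1] >= K)
    rhs = max(X[-1] - x0, 0.0) / (K - x0) - (xi * np.diff(X)).sum()
    ok &= lhs <= rhs + 1e-12
print(ok)
```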

Continue reading “Pathwise Martingale Inequalities”

Semimartingale Local Times

Figure 1: Brownian motion B with local time L and auxiliary Brownian motion W

For a stochastic process X taking values in a state space E, its local time at a point {x\in E} is a measure of the time spent at x. For a continuous time stochastic process, we could try simply computing the Lebesgue measure of the time spent at the level,

\displaystyle  L^x_t=\int_0^t1_{\{X_s=x\}}ds. (1)

For processes which hit the level {x} and stick there for some time, this makes some sense. However, if X is a standard Brownian motion, it will always give zero, so is not helpful. Even though X will hit every real value infinitely often, continuity of the normal distribution gives {{\mathbb P}(X_s=x)=0} at each positive time, so that {L^x_t} defined by (1) will have zero expectation.

Rather than the indicator function of {\{X=x\}} as in (1), an alternative is to use the Dirac delta function,

\displaystyle  L^x_t=\int_0^t\delta(X_s-x)\,ds. (2)

Unfortunately, the Dirac delta is not a true function but a distribution, so (2) is not a well-defined expression. However, if it can be made rigorous, then it does seem to have some of the properties we would want. For example, the expectation {{\mathbb E}[\delta(X_s-x)]} can be interpreted as the probability density of {X_s} evaluated at {x}, which has a positive and finite value, so it should lead to positive and finite local times. Equation (2) still relies on the Lebesgue measure over the time index, so will not behave as we may expect under time changes, and will not make sense for processes without a continuous probability density. A better approach is to integrate with respect to the quadratic variation,

\displaystyle  L^x_t=\int_0^t\delta(X_s-x)d[X]_s (3)

which, for Brownian motion, amounts to the same thing. Although (3) is still not a well-defined expression, since it still involves the Dirac delta, the idea is to come up with a definition which amounts to the same thing in spirit. Important properties that it should satisfy are that it is an adapted, continuous and increasing process with increments supported on the set {\{X=x\}},

\displaystyle  L^x_t=\int_0^t1_{\{X_s=x\}}dL^x_s.
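
One way to make the delta in (3) concrete numerically is to mollify it, replacing {\delta} by {(2\epsilon)^{-1}1_{(-\epsilon,\epsilon)}}. For Brownian motion, where {d[X]_s=ds}, this approximates the local time by the normalized occupation time of a small interval. By Lévy's theorem, {L^0_1} has the same law as {\lvert B_1\rvert}, so its mean should be near {\sqrt{2/\pi}\approx0.80}; the sketch below (with arbitrary step sizes and {\epsilon}) checks this by Monte Carlo.

```python
# Monte Carlo sketch of Brownian local time at 0 via a mollified delta:
# L ~ (1/2eps) * (time spent in (-eps, eps) on [0,1]). By Levy's theorem
# L^0_1 has the law of |B_1|, so E[L^0_1] = sqrt(2/pi) ~ 0.80.
import numpy as np

rng = np.random.default_rng(5)
paths, steps, eps = 300, 20_000, 0.05
dt = 1.0 / steps

B = rng.normal(0.0, np.sqrt(dt), size=(paths, steps)).cumsum(axis=1)
occupation = (np.abs(B) < eps).sum(axis=1) * dt   # time spent near 0
L = occupation / (2 * eps)                        # approximate local time

print(L.mean())   # ~ sqrt(2/pi) ~ 0.80
```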

Local times are a very useful and interesting part of stochastic calculus, and find important applications to excursion theory, stochastic integration and stochastic differential equations. However, I have not covered this subject in my notes, so I do this now. Recalling Ito’s lemma for a function {f(X)} of a semimartingale X, this involves a term of the form {\int f^{\prime\prime}(X)d[X]} and, hence, requires {f} to be twice differentiable. If we were to try to apply the Ito formula for functions which are not twice differentiable, then {f^{\prime\prime}} can be understood in terms of distributions, and delta functions can appear, which brings local times into the picture. In the opposite direction, which I take in this post, we can try to generalise Ito’s formula and invert this to give a meaning to (3). Continue reading “Semimartingale Local Times”

A Process With Hidden Drift

Consider a stochastic process X of the form

\displaystyle  X_t=W_t+\int_0^t\xi_sds, (1)

for a standard Brownian motion W and predictable process {\xi}, defined with respect to a filtered probability space {(\Omega,\mathcal F,\{\mathcal F_t\}_{t\in{\mathbb R}_+},{\mathbb P})}. For this to make sense, we must assume that {\int_0^t\lvert\xi_s\rvert ds} is almost surely finite at all times, and I will suppose that {\mathcal F_\cdot} is the filtration generated by W.

The question is whether the drift {\xi} can be backed out from knowledge of the process X alone. As I will show with an example, this is not possible. In fact, in our example, X will itself be a standard Brownian motion, even though the drift {\xi} is non-trivial (that is, {\int\xi dt} is not almost surely zero). In this case X has exactly the same distribution as W, so cannot be distinguished from the driftless case with {\xi=0} by looking at the distribution of X alone.

On the face of it, this seems rather counter-intuitive. By standard semimartingale decomposition, it is known that we can always decompose

\displaystyle  X=M+A (2)

for a unique continuous local martingale M starting from zero, and unique continuous FV process A. By uniqueness, {M=W} and {A=\int\xi dt}. This allows us to back out the drift {\xi} and, in particular, if the drift is non-trivial then X cannot be a martingale. However, in the semimartingale decomposition, it is required that M is a martingale with respect to the original filtration {\mathcal F_\cdot}. If we do not know the filtration {\mathcal F_\cdot}, then it might not be possible to construct decomposition (2) from knowledge of X alone. As mentioned above, we will give an example where X is a standard Brownian motion which, in particular, means that it is a martingale under its natural filtration. By the semimartingale decomposition result, it is not possible for X to be an {\mathcal F_\cdot}-martingale. A consequence of this is that the natural filtration of X must be strictly smaller than the natural filtration of W.

The inspiration for this post was a comment by Gabe posing the following question: If we take {\mathbb F} to be the filtration generated by a standard Brownian motion W in {(\Omega,\mathcal F,{\mathbb P})}, and we define {\tilde W_t=W_t+\int_0^t\Theta_udu}, can we find an {\mathbb F}-adapted {\Theta} such that the filtration generated by {\tilde W} is smaller than {\mathbb F}? Our example gives an affirmative answer. Continue reading “A Process With Hidden Drift”

Completions of *-Probability Spaces

We previously defined noncommutative probability spaces as a *-algebra together with a nondegenerate state satisfying a completeness property. Justification for the stated definition was twofold. First, an argument similar to the construction of measurable random variables on classical probability spaces was used, by taking all possible limits for which an expectation can reasonably be defined. Second, I stated various natural mathematical properties of this construction, including the existence of completions and their functorial property, which allows us to pass from preprobability spaces, and homomorphisms between these, to the NC probability spaces which they generate. However, the statements were given without proof, so the purpose of the current post is to establish these results. Specifically, I will give proofs of each of the theorems stated in the post on noncommutative probability spaces, with the exception of the two theorems relating commutative *-probability spaces to their classical counterpart (theorems 2 and 10), which will be looked at in a later post. Continue reading “Completions of *-Probability Spaces”

Noncommutative Probability Spaces

In classical probability theory, we start with a sample space {\Omega}, a collection {\mathcal F} of events, which is a sigma-algebra on {\Omega}, and a probability measure {{\mathbb P}} on {(\Omega,\mathcal F)}. The triple {(\Omega,\mathcal F,{\mathbb P})} is a probability space, and the collection {L^\infty(\Omega,\mathcal F,{\mathbb P})} of bounded complex-valued random variables on the probability space forms a commutative algebra under pointwise addition and products. The measure {{\mathbb P}} defines an expectation, or integral with respect to {{\mathbb P}}, which is a linear map

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle L^\infty(\Omega,\mathcal F,{\mathbb P})\rightarrow{\mathbb C},\smallskip\\ &\displaystyle X\mapsto{\mathbb E}[X]=\int X(\omega)d{\mathbb P}(\omega). \end{array}

In this post I provide definitions of probability spaces from the algebraic viewpoint. Statements of some of their first properties will be given in order to justify and clarify the definitions, although any proofs will be left until later posts. In the algebraic setting, we begin with a *-algebra {\mathcal A}, which takes the place of the collection of bounded random variables from the classical theory. It is not necessary for the algebra to be represented as a space of functions from an underlying sample space. Since the individual points {\omega\in\Omega} constituting the sample space are not required in the theory, this is a pointless approach. By allowing multiplication of ‘random variables’ in {\mathcal A} to be noncommutative, we incorporate probability spaces which have no counterpart in the classical setting, such as those used in quantum theory. The second and final ingredient is a state on the algebra, taking the place of the classical expectation operator. This is a linear map {p\colon\mathcal A\rightarrow{\mathbb C}} satisfying the positivity constraint {p(a^*a)\ge0} and, when {\mathcal A} is unital, the normalisation condition {p(1)=1}. Algebraic, or noncommutative, probability spaces are completely described by a pair {(\mathcal A,p)} consisting of a *-algebra {\mathcal A} and a state {p}. Noncommutative examples include the *-algebra of bounded linear operators on a Hilbert space with pure state {p(a)=\langle\xi,a\xi\rangle} for a fixed unit vector {\xi}. Continue reading “Noncommutative Probability Spaces”
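
The noncommutative example mentioned above is easy to check numerically in finite dimensions. The sketch below takes the *-algebra of 2x2 complex matrices with the pure state {p(a)=\langle\xi,a\xi\rangle} for an arbitrarily chosen unit vector {\xi}, and verifies the normalisation {p(1)=1} and the positivity {p(a^*a)\ge0} on random matrices.

```python
# Numerical sketch of a noncommutative probability space: the pure state
# p(a) = <xi, a xi> on the *-algebra of 2x2 complex matrices. We check
# normalisation p(1) = 1 and positivity p(a*a) = ||a xi||^2 >= 0.
import numpy as np

rng = np.random.default_rng(6)
xi = np.array([0.6, 0.8j])            # a unit vector: |0.6|^2 + |0.8|^2 = 1
p = lambda a: np.vdot(xi, a @ xi)     # the state p(a) = <xi, a xi>

mats = rng.standard_normal((100, 2, 2)) + 1j * rng.standard_normal((100, 2, 2))
vals = np.array([p(a.conj().T @ a) for a in mats])   # p(a*a) for random a

print(p(np.eye(2)))                   # normalisation: p(1) = 1
print(vals.real.min())                # positivity: all values >= 0
```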

The GNS Representation

As is well known, the space of bounded linear operators on any Hilbert space forms a *-algebra, and (pure) states on this algebra are defined by unit vectors. Considering a Hilbert space {\mathcal H}, the space of bounded linear operators {\mathcal H\rightarrow\mathcal H} is denoted as {B(\mathcal H)}. This forms an algebra under the usual pointwise addition and scalar multiplication operations, with composition as the product, and involution of the algebra is given by the operator adjoint,

\displaystyle  \langle x,a^*y\rangle=\langle ax,y\rangle

for any {a\in B(\mathcal H)} and all {x,y\in\mathcal H}. A unit vector {\xi\in\mathcal H} defines a state {p\colon B(\mathcal H)\rightarrow{\mathbb C}} by {p(a)=\langle\xi,a\xi\rangle}.

The Gelfand-Naimark-Segal (GNS) representation allows us to go in the opposite direction and, starting from a state on an abstract *-algebra, realise this as a pure state on a *-subalgebra of {B(\mathcal H)} for some Hilbert space {\mathcal H}.

Consider a *-algebra {\mathcal A} and positive linear map {p\colon\mathcal A\rightarrow{\mathbb C}}. Recall that this defines a semi-inner product on the *-algebra {\mathcal A}, given by {\langle x,y\rangle=p(x^*y)}. The associated seminorm is denoted by {\lVert x\rVert_2=\sqrt{p(x^*x)}}, which we refer to as the {L^2}-seminorm. Also, every {a\in\mathcal A} defines a linear operator on {\mathcal A} by left-multiplication, {x\mapsto ax}. We use {\lVert a\rVert_\infty} to denote its operator norm, and refer to this as the {L^\infty}-seminorm. An element {a\in\mathcal A} is bounded if {\lVert a\rVert_\infty} is finite, and we say that {(\mathcal A,p)} is bounded if every {a\in\mathcal A} is bounded.

Theorem 1 Let {(\mathcal A,p)} be a bounded *-probability space. Then, there exists a triple {(\mathcal H,\pi,\xi)} where,

  • {\mathcal H} is a Hilbert space.
  • {\pi\colon\mathcal A\rightarrow B(\mathcal H)} is a *-homomorphism.
  • {\xi\in\mathcal H} satisfies {p(a)=\langle\xi,\pi(a)\xi\rangle} for all {a\in\mathcal A}.
  • {\xi} is cyclic for {\mathcal A}, so that {\{\pi(a)\xi\colon a\in\mathcal A\}} is dense in {\mathcal H}.

Furthermore, this representation is unique up to isomorphism: if {(\mathcal H^\prime,\pi^\prime,\xi^\prime)} is any other such triple, then there exists a unique invertible linear isometry of Hilbert spaces {L\colon\mathcal H\rightarrow\mathcal H^\prime} such that

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle \pi^\prime(a)=L\pi(a)L^{-1},\smallskip\\ &\displaystyle \xi^\prime=L\xi. \end{array}
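
In finite dimensions the construction is concrete enough to check numerically. As an illustrative sketch (not the general proof), take {\mathcal A=M_2({\mathbb C})} with the faithful state {p(a)={\rm tr}(\rho a)} for an arbitrarily chosen density matrix {\rho}: the semi-inner product is {\langle x,y\rangle=p(x^*y)}, {\pi(a)} is left multiplication, and {\xi} is the identity matrix, which is cyclic.

```python
# Sketch of the GNS construction for A = M_2(C) with the faithful state
# p(a) = tr(rho a): inner product <x, y> = p(x* y), pi(a) = left
# multiplication, cyclic vector xi = identity. We verify the state
# property p(a) = <xi, pi(a) xi> and the adjoint relation
# <x, pi(a) y> = <pi(a*) x, y>. The density matrix is an arbitrary choice.
import numpy as np

rho = np.diag([0.7, 0.3])                   # faithful state: p(a) = tr(rho a)
p = lambda a: np.trace(rho @ a)
inner = lambda x, y: p(x.conj().T @ y)      # <x, y> = p(x* y)

rng = np.random.default_rng(7)
a, x, y = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
           for _ in range(3))
xi = np.eye(2)                              # cyclic vector: the identity

print(np.allclose(p(a), inner(xi, a @ xi)))                    # p(a) = <xi, pi(a) xi>
print(np.allclose(inner(x, a @ y), inner(a.conj().T @ x, y)))  # adjoint relation
```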

Continue reading “The GNS Representation”