In these notes, the approach taken to stochastic calculus revolves around stochastic integration and the theory of semimartingales. An alternative starting point would be to consider Markov processes. Although I do not take the second approach, all of the special processes considered in the current section are Markov, so it seems like a good idea to introduce the basic definitions and properties now. In fact, all of the special processes considered (Brownian motion, Poisson processes, Lévy processes, Bessel processes) satisfy the much stronger property of being Feller processes, which I will define in the next post.

Intuitively speaking, a process *X* is Markov if, given its whole past up until some time *s*, the future behaviour depends only on its state at time *s*. To make this precise, let us suppose that *X* takes values in a measurable space $(E,\mathcal{E})$ and, to denote the past, let $\mathcal{F}_t$ be the sigma-algebra generated by $\{X_s\colon s\le t\}$. The Markov property then says that, for any times $s\le t$ and bounded measurable function $f\colon E\to\mathbb{R}$, the expected value of $f(X_t)$ conditional on $\mathcal{F}_s$ is a function of $X_s$. Equivalently,

$$\mathbb{E}\left[f(X_t)\mid\mathcal{F}_s\right]=\mathbb{E}\left[f(X_t)\mid X_s\right]\tag{1}$$

(almost surely). More generally, this idea makes sense with respect to any filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge 0},\mathbb{P})$. A process *X* is Markov with respect to $\{\mathcal{F}_t\}_{t\ge 0}$ if it is adapted and (1) holds for times $s\le t$.

Continuous-time Markov processes are usually defined in terms of *transition functions*. These specify how the distribution of $X_t$ is determined by its value at an earlier time *s*. To state the definition of transition functions, it is necessary to introduce the concept of transition probabilities.

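**Definition 1** A *(transition) kernel* *N* on a measurable space $(E,\mathcal{E})$ is a map

$$N\colon E\times\mathcal{E}\to\mathbb{R}_+\cup\{\infty\},\qquad (x,A)\mapsto N(x,A)$$

such that

- for each $x\in E$, the map $A\mapsto N(x,A)$ is a measure.
- for each $A\in\mathcal{E}$, the map $x\mapsto N(x,A)$ is measurable.

If, furthermore, $N(x,E)=1$ for all $x\in E$, then *N* is a *transition probability*.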

A transition probability, then, associates to each $x\in E$ a probability measure on $(E,\mathcal{E})$. This can be used to describe how the conditional distribution of a process at a time *t* depends on its value at an earlier time *s*.

Given any such kernel *N* and $x\in E$, we denote the integral of a measurable function $f\colon E\to\mathbb{R}_+$ with respect to the measure $N(x,\cdot)$ by

$$Nf(x)\equiv\int f(y)\,N(x,dy).$$

Then, $Nf$ is itself a measurable function from *E* to $\mathbb{R}_+\cup\{\infty\}$. For the case of an indicator function $f=1_A$ of any $A\in\mathcal{E}$, $Nf(x)=N(x,A)$ is measurable by definition. This extends to positive linear combinations of such indicator functions (the *simple* functions) and, then, by monotone convergence, measurability of $Nf$ extends to all nonnegative measurable *f*. In the case where *N* is a transition probability, $Nf$ is well-defined and bounded for all bounded and measurable $f\colon E\to\mathbb{R}$.

Two kernels *M* and *N* can be combined by first applying *N* followed by *M* to get the kernel *MN*,

$$MN(x,A)=\int M(x,dy)\,N(y,A).$$

Suppose that *M*, *N* are transition probabilities describing how a process *X* goes from its state at time *s* to its conditional distribution at time *t* and from its state at time *t* to a distribution at time *u* respectively, for $s<t<u$. Then, the combination *MN* describes how it transitions from its state at time *s* to *u*. This is just the tower rule for conditional expectation,

$$\mathbb{E}\left[f(X_u)\mid\mathcal{F}_s\right]=\mathbb{E}\left[\mathbb{E}\left[f(X_u)\mid\mathcal{F}_t\right]\mid\mathcal{F}_s\right]=\mathbb{E}\left[Nf(X_t)\mid\mathcal{F}_s\right]=MNf(X_s).$$
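On a finite state space a kernel is just a matrix, so the composition rule can be checked by hand. The following is only a sketch under that assumption: the state space $\{0,1,2\}$, the matrices `M` and `N`, the function `f` and the helper names are all made up for illustration.

```python
# Finite state space E = {0, 1, 2}.
# A kernel N is stored as a matrix with N[x][y] = N(x, {y}).

def apply_kernel(N, f):
    """Return Nf, where Nf(x) = sum over y of N(x, {y}) * f(y)."""
    return [sum(N[x][y] * f[y] for y in range(len(f))) for x in range(len(N))]

def compose(M, N):
    """Return MN, where MN(x, {z}) = sum over y of M(x, {y}) * N(y, {z})."""
    return [[sum(M[x][y] * N[y][z] for y in range(len(N)))
             for z in range(len(N[0]))]
            for x in range(len(M))]

# Hypothetical transition probabilities (every row sums to 1):
# M takes the state at time s to time t, N takes time t to time u.
M = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
N = [[0.75, 0.25, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
f = [1.0, -2.0, 4.0]  # an arbitrary bounded function on E

MN = compose(M, N)
lhs = apply_kernel(MN, f)                  # (MN)f
rhs = apply_kernel(M, apply_kernel(N, f))  # M(Nf), as in the tower rule
```

Here `lhs` and `rhs` agree, and the rows of `MN` again sum to 1, so the composition of two transition probabilities is a transition probability.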

A Markov process is defined by a collection of transition probabilities $P_{s,t}$, one for each $s\le t$, describing how it goes from its state at time *s* to a distribution at time *t*. I only consider the homogeneous case here, meaning that $P_{s,t}$ depends only on the size $t-s$ of the time increment and not explicitly on the start or end times *s*, *t*, so the notation $P_{s,t}$ can be replaced by $P_{t-s}$. This is not much of a restriction because, given an inhomogeneous Markov process *X*, it is always possible to look at its *space-time process* $(t,X_t)$ taking values in $\mathbb{R}_+\times E$, which will be homogeneous Markov.

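**Definition 2** A homogeneous transition function on $(E,\mathcal{E})$ is a collection $\{P_t\}_{t\ge 0}$ of transition probabilities on $(E,\mathcal{E})$ such that $P_sP_t=P_{s+t}$ for all $s,t\ge 0$.

A process *X* is *Markov* with transition function $\{P_t\}_{t\ge 0}$, and with respect to a filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge 0},\mathbb{P})$, if it is adapted and

$$\mathbb{E}\left[f(X_t)\mid\mathcal{F}_s\right]=P_{t-s}f(X_s)$$

(almost surely), for all times $s<t$.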

The identity $P_sP_t=P_{s+t}$ is known as the Chapman-Kolmogorov equation, and is required so that the transition probabilities are consistent with the tower rule for conditional expectations. Alternatively, $\{P_t\}_{t\ge 0}$ forms a *semigroup*.
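To see the Chapman-Kolmogorov equation in a case where everything is explicit, consider a two-state continuous-time chain. This example is not taken from the notes above: the jump rates `a` and `b` are arbitrary, and the closed form used for the transition matrix is the standard one for a chain on $\{0,1\}$ jumping from 0 to 1 at rate `a` and from 1 to 0 at rate `b`. The sketch simply checks the semigroup identity numerically.

```python
import math

def P(t, a=1.0, b=2.0):
    """Transition matrix P_t of a two-state chain with rates a (0 -> 1) and b (1 -> 0)."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]

def matmul(A, B):
    """Product of two 2x2 matrices, i.e. composition of the two kernels."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

s, t = 0.3, 1.1
lhs = matmul(P(s), P(t))   # P_s P_t
rhs = P(s + t)             # P_{s+t}
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Up to floating-point rounding, `max_err` is zero, and each `P(t)` has rows summing to 1 with `P(0)` equal to the identity.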

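As an example, standard Brownian motion, *B*, has the defining property that $B_t-B_s$ is normal with mean 0 and variance $t-s$ independently of $\mathcal{F}_s$, for times $s<t$. Equivalently, it is a Markov process with the transition function

$$P_tf(x)=\frac{1}{\sqrt{2\pi t}}\int f(y)\,e^{-\frac{(y-x)^2}{2t}}\,dy$$

for $t>0$, with $P_0f=f$.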
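As a rough numerical illustration, take $P_t(x,dy)$ to be the normal distribution with mean *x* and variance *t*, as in the Brownian example. The sketch below approximates $P_tf(x)$ by the trapezoid rule and compares $P_s(P_tf)$ with $P_{s+t}f$ for $f=\cos$, for which $P_tf(x)=e^{-t/2}\cos x$ is known in closed form. The function names, the truncation at ten standard deviations and the grid size are my own choices, not anything canonical.

```python
import math

def heat_kernel(t, x, y):
    """Density at y of the normal distribution with mean x and variance t."""
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def P(t, f, x, width=10.0, n=400):
    """Approximate P_t f(x) = integral of f(y) * heat_kernel(t, x, y) dy (trapezoid rule)."""
    half = width * math.sqrt(t)   # integrate over x +/- `width` standard deviations
    h = 2 * half / n
    total = 0.0
    for i in range(n + 1):
        y = x - half + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(y) * heat_kernel(t, x, y)
    return total * h

s, t, x = 0.5, 1.5, 0.3
f = math.cos

lhs = P(s, lambda y: P(t, f, y), x)        # P_s (P_t f)(x)
rhs = P(s + t, f, x)                       # P_{s+t} f(x)
exact = math.exp(-(s + t) / 2) * math.cos(x)
```

All three of `lhs`, `rhs` and `exact` agree to many decimal places, which is the Chapman-Kolmogorov equation for this particular transition function.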

In practice, other than for a small number of special processes, it is not possible to write down transition functions explicitly. Instead, the processes are defined as solutions to stochastic differential equations, or the transition function is defined via an infinitesimal generator.

The distribution of a Markov process is determined uniquely by its transition function and *initial distribution*.

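**Lemma 3** Suppose that *X* is a Markov process on $(E,\mathcal{E})$ with transition function *P* such that $X_0$ has distribution $\mu$. Then, for any times $0=t_0<t_1<\cdots<t_n$ and bounded measurable function $f\colon E^{n+1}\to\mathbb{R}$,

$$\mathbb{E}\left[f(X_{t_0},X_{t_1},\ldots,X_{t_n})\right]=\int\mu(dx_0)\int P_{t_1-t_0}(x_0,dx_1)\cdots\int P_{t_n-t_{n-1}}(x_{n-1},dx_n)\,f(x_0,x_1,\ldots,x_n).\tag{2}$$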

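*Proof:* Let us start by showing that for $n\ge 1$,

$$\mathbb{E}\left[f(X_{t_0},\ldots,X_{t_n})\mid\mathcal{F}_{t_{n-1}}\right]=g(X_{t_0},\ldots,X_{t_{n-1}})\tag{3}$$

where $g\colon E^n\to\mathbb{R}$ is the measurable function

$$g(x_0,\ldots,x_{n-1})=\int f(x_0,\ldots,x_{n-1},x_n)\,P_{t_n-t_{n-1}}(x_{n-1},dx_n).$$

In the case where $f$ is just a product $f(x_0,\ldots,x_n)=u(x_0,\ldots,x_{n-1})v(x_n)$, then

$$g(x_0,\ldots,x_{n-1})=u(x_0,\ldots,x_{n-1})\,P_{t_n-t_{n-1}}v(x_{n-1})$$

and (3) follows from the Markov property $\mathbb{E}[v(X_{t_n})\mid\mathcal{F}_{t_{n-1}}]=P_{t_n-t_{n-1}}v(X_{t_{n-1}})$. The extension of (3) to arbitrary bounded measurable $f$ is now just a standard application of the monotone class theorem.

The proof of (2) follows from an induction on *n*. For the case $n=0$, (2) just reduces to the statement that $X_0$ has distribution $\mu$. So, suppose that $n\ge 1$ and (2) holds for *n* replaced by *n*-1. Then, using (3),

$$\mathbb{E}\left[f(X_{t_0},\ldots,X_{t_n})\right]=\mathbb{E}\left[g(X_{t_0},\ldots,X_{t_{n-1}})\right]=\int\mu(dx_0)\int P_{t_1-t_0}(x_0,dx_1)\cdots\int P_{t_{n-1}-t_{n-2}}(x_{n-2},dx_{n-1})\,g(x_0,\ldots,x_{n-1}).$$

Substituting in the expression for *g* gives the result. ⬜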

The *finite distributions* of a process *X* taking values in *E* are defined to be the distributions, in $E^n$, of $(X_{t_1},X_{t_2},\ldots,X_{t_n})$ for all finite sets of times $t_1,t_2,\ldots,t_n\in\mathbb{R}_+$. Expression (2) describes the finite distributions of a Markov process solely in terms of its initial distribution and transition function.
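On a finite state space the iterated integrals in (2) become iterated sums over paths, which makes the formula easy to experiment with. The sketch below is only an illustration under assumptions of my own: a two-state chain with arbitrary jump rates `a` and `b` and its standard closed-form transition matrix, plus a made-up initial distribution `mu`. It computes a finite-dimensional expectation by weighting each path by the initial distribution and the successive transition probabilities.

```python
import itertools
import math

def P(t, a=1.0, b=2.0):
    """Transition matrix P_t of a two-state chain with rates a (0 -> 1) and b (1 -> 0)."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]

def expectation(mu, times, f):
    """E[f(X_{t_0}, ..., X_{t_n})] via the iterated sums in (2); requires times[0] == 0."""
    total = 0.0
    for path in itertools.product(range(2), repeat=len(times)):
        weight = mu[path[0]]                      # initial distribution mu(dx_0)
        for k in range(1, len(times)):
            step = times[k] - times[k - 1]
            weight *= P(step)[path[k - 1]][path[k]]
        total += weight * f(*path)
    return total

mu = [0.25, 0.75]            # hypothetical initial distribution
times = [0.0, 0.4, 1.0]

total_mass = expectation(mu, times, lambda *xs: 1.0)
# P(X_1 = 1) computed from the joint law at times (0, 0.4, 1) ...
p_joint = expectation(mu, times, lambda x0, x1, x2: float(x2 == 1))
# ... and directly from the initial distribution and P_1.
p_direct = sum(mu[x] * P(1.0)[x][1] for x in range(2))
```

The path weights sum to 1, and `p_joint` matches `p_direct`, reflecting the consistency of the finite distributions when an intermediate time is dropped.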

**Corollary 4** Suppose that *X* and *Y* are Markov processes, each with the same transition function and such that $X_0$ and $Y_0$ have the same distribution on $(E,\mathcal{E})$. Then, *X* and *Y* have the same finite distributions.

It is important to know that Markov processes do indeed exist for any given transition function and initial distribution. This is a consequence of the Kolmogorov extension theorem.

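**Theorem 5** Let $(E,\mathcal{E})$ be a measurable space, and $\Omega\equiv E^{\mathbb{R}_+}$ be the space of functions $\omega\colon\mathbb{R}_+\to E$. Denote its coordinate process by *X*,

$$X_t\colon\Omega\to E,\qquad \omega\mapsto X_t(\omega)\equiv\omega(t).$$

Also, let $\mathcal{F}^0$ be the sigma-algebra generated by $\{X_t\colon t\in\mathbb{R}_+\}$ and, for each $t\ge 0$, let $\mathcal{F}^0_t$ be the sigma-algebra generated by $\{X_s\colon s\le t\}$. So, $\{\mathcal{F}^0_t\}_{t\ge 0}$ is a filtration on the measurable space $(\Omega,\mathcal{F}^0)$ with respect to which *X* is adapted.

Then, for every transition function $\{P_t\}_{t\ge 0}$ and probability distribution $\mu$ on *E*, there is a unique probability measure $\mathbb{P}$ on $(\Omega,\mathcal{F}^0)$ under which *X* is a Markov process with transition function $\{P_t\}_{t\ge 0}$ and initial distribution $\mu$.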

The superscripts '0' just denote the fact that we are using the uncompleted sigma-algebras. Once the probability measure has been defined, it is standard practice to complete the filtration, which does not affect the Markov property.

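*Proof:* For any finite subset $S\subset\mathbb{R}_+$, let $\mathcal{F}_S$ denote the sigma-algebra generated by $\{X_t\colon t\in S\}$. If $S=\{t_0,t_1,\ldots,t_n\}$ for times $0=t_0<t_1<\cdots<t_n$, let $\mathbb{P}_S$ denote the probability measure on $(\Omega,\mathcal{F}_S)$ given by (2). Note that, if $S'=S\setminus\{t_k\}$ for some $0<k<n$ then the identity

$$\int P_{t_k-t_{k-1}}(x_{k-1},dx_k)\,P_{t_{k+1}-t_k}(x_k,A)=P_{t_{k+1}-t_{k-1}}(x_{k-1},A)$$

shows that, if $f(x_0,\ldots,x_n)$ is independent of $x_k$, expression (2) for $\mathbb{P}_S$ reduces to the definition of $\mathbb{P}_{S'}$. Similarly, if $S'=S\setminus\{t_n\}$ then the fact that $P_{t_n-t_{n-1}}1=1$ shows that definition (2) reduces to that of $\mathbb{P}_{S'}$ whenever $f$ does not depend on $x_n$. In either case, $\mathbb{P}_{S'}$ is the restriction of $\mathbb{P}_S$ to $\mathcal{F}_{S'}$.

By successively removing elements from *S*, this shows that $\mathbb{P}_{S'}$ is the restriction of $\mathbb{P}_S$ to $\mathcal{F}_{S'}$ for any $S'\subseteq S$ with $0\in S'$. The measures $\mathbb{P}_S$ for finite subsets $S\subset\mathbb{R}_+$ with $0\in S$ are therefore consistent, and the Kolmogorov extension theorem implies the existence of a unique measure $\mathbb{P}$ on $(\Omega,\mathcal{F}^0)$ agreeing with $\mathbb{P}_S$ on $\mathcal{F}_S$ for all such finite subsets $S$.

We have now shown the existence and uniqueness of the probability measure satisfying (2). Only the Markov property remains. So, consider times $s<t$ and bounded measurable function $f\colon E\to\mathbb{R}$. Also, let *Z* be the $\mathcal{F}^0_s$-measurable random variable $Z=g(X_{t_0},\ldots,X_{t_n})$ for some $0=t_0<t_1<\cdots<t_n=s$ and bounded measurable $g\colon E^{n+1}\to\mathbb{R}$. Then, applying (2) to $Zf(X_t)$ gives

$$\mathbb{E}\left[Zf(X_t)\right]=\int\mu(dx_0)\int P_{t_1-t_0}(x_0,dx_1)\cdots\int P_{t_n-t_{n-1}}(x_{n-1},dx_n)\,g(x_0,\ldots,x_n)\,P_{t-s}f(x_n)=\mathbb{E}\left[ZP_{t-s}f(X_s)\right].$$

By the monotone class theorem, this extends to all bounded and $\mathcal{F}^0_s$-measurable random variables *Z*, giving $\mathbb{E}[f(X_t)\mid\mathcal{F}^0_s]=P_{t-s}f(X_s)$ as required. ⬜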

The unique measure with respect to which *X* is Markov with the given transition function and initial distribution $\mu$ is denoted by $\mathbb{P}_\mu$, and expectation with respect to this measure is denoted by $\mathbb{E}_\mu$. In particular, if $\mu$ consists of a single mass of weight 1 at a point $x\in E$ then we write $\mathbb{P}_x\equiv\mathbb{P}_\mu$ and, similarly, write $\mathbb{E}_x$ for $\mathbb{E}_\mu$. It can be seen that, if *Z* is a bounded random variable, the map $x\mapsto\mathbb{E}_x[Z]$ is measurable and

$$\mathbb{E}_\mu[Z]=\int\mathbb{E}_x[Z]\,\mu(dx)$$

for an arbitrary probability measure $\mu$ on $(E,\mathcal{E})$. In fact, this follows from expression (2).

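It is sometimes useful to generalize Definition 2 to *sub-Markovian* transition functions. These are sets of kernels $\{P_t\}_{t\ge 0}$ satisfying the identity $P_sP_t=P_{s+t}$ as above but, instead of requiring that the $P_t$ are transition probabilities, the inequality $P_t1\le 1$ is imposed. Although such transition functions do not have probabilities summing to 1, they can be represented by Markov processes which are *killed* at some time. To do this, we adjoin a new point $\Delta$, called the *cemetery* or *coffin state*, to *E* to form a new state space $E_\Delta=E\cup\{\Delta\}$. The sigma-algebra $\mathcal{E}_\Delta$ consists of the sets $A\subseteq E_\Delta$ such that $A\cap E$ is in $\mathcal{E}$. Then, a new (Markovian) transition function $\{\tilde P_t\}_{t\ge 0}$ can be defined on $(E_\Delta,\mathcal{E}_\Delta)$ by

$$\tilde P_t(x,A)=\begin{cases}P_t(x,A\cap E)+\left(1-P_t(x,E)\right)1_{\{\Delta\in A\}},&\text{if }x\in E,\\ 1_{\{\Delta\in A\}},&\text{if }x=\Delta.\end{cases}\tag{4}$$

It can be checked that the identity $\tilde P_s\tilde P_t=\tilde P_{s+t}$ is satisfied, and that $\tilde P_t1=1$. Then, $\tilde P$ represents a process which, over an interval $[s,t]$, has probability $1-P_{t-s}1(X_s)$ of being killed, in which case it jumps to the state $\Delta$ and remains there.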
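The cemetery-state construction is easy to try out on a small example. The sketch below assumes a two-state chain (arbitrary rates `a`, `b`, with its standard closed-form transition matrix) killed at a constant rate `c`, so that the sub-Markovian kernels are $e^{-ct}$ times a genuine transition matrix. None of these specifics come from the notes; they are just a convenient case where the extended transition function can be written as a 3×3 matrix, with index 2 playing the role of the cemetery state.

```python
import math

def sub_P(t, a=1.0, b=2.0, c=0.5):
    """Sub-Markovian kernel: two-state chain (rates a, b) killed at constant rate c."""
    r = a + b
    e = math.exp(-r * t)
    k = math.exp(-c * t)   # probability of surviving up to time t
    return [[k * (b + a * e) / r, k * (a - a * e) / r],
            [k * (b - b * e) / r, k * (a + b * e) / r]]

def extend(Pt):
    """Adjoin the cemetery state as the last index, following (4)."""
    rows = [row + [1.0 - sum(row)] for row in Pt]   # missing mass goes to the cemetery
    rows.append([0.0] * len(Pt) + [1.0])            # the cemetery state is absorbing
    return rows

def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s, t = 0.3, 1.1
lhs = matmul(extend(sub_P(s)), extend(sub_P(t)))   # extended P_s times extended P_t
rhs = extend(sub_P(s + t))                         # extended P_{s+t}
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
```

The extended matrices have rows summing to 1 and still satisfy the semigroup identity, while the original `sub_P(t)` rows sum to $e^{-ct}<1$ for $t>0$.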

Why does the function f need to be nonnegative in the transition kernel case, while it is bounded in the transition probability case?

For general transition kernels, integrals with respect to arbitrary measurable functions are not well defined. You need to restrict either to integrable or to nonnegative functions. For transition probabilities, bounded functions are integrable, so can be used without imposing an additional integrability condition.

It seems to me that there is a small mistake in the proof of Lemma 3 in the case where f is just a product of u and v. Why does P_t_n,t_n-1 depend on only one variable?

I did mix up u with v in a couple of places, so I fixed this. I’m not sure if this helps with your question, but it is looking ok to me now.

Yeah, I forgot that you only consider the space and time homogeneous case. Could you maybe add a few words on which properties the functions f must have in the defining property of a Markov process, and why (or if) it is equivalent to the equation (the transition kernel exists if the state space is Borel)?

Skip the first sentence of my previous comment.

I’ll have a look through this post again when I have some time and maybe clear up the points you mention.

Does every Markov process (defined by Eq(1)) have a transition function? I understood the converse is true (by Theorem 5). But can we construct a transition function from a given Markov process (defined by Eq(1))?