In these notes, the approach taken to stochastic calculus revolves around stochastic integration and the theory of semimartingales. An alternative starting point would be to consider Markov processes. Although I do not take the second approach, all of the special processes considered in the current section are Markov, so it seems like a good idea to introduce the basic definitions and properties now. In fact, all of the special processes considered (Brownian motion, Poisson processes, Lévy processes, Bessel processes) satisfy the much stronger property of being Feller processes, which I will define in the next post.
Intuitively speaking, a process X is Markov if, given its whole past up until some time s, the future behaviour depends only on its state at time s. To make this precise, let us suppose that X takes values in a measurable space $(E,\mathcal{E})$ and, to denote the past, let $\mathcal{F}_s=\sigma(X_u\colon u\le s)$ be the sigma-algebra generated by $\{X_u\colon u\le s\}$. The Markov property then says that, for any times $s\le t$ and bounded measurable function $f\colon E\to{\mathbb R}$, the expected value of $f(X_t)$ conditional on $\mathcal{F}_s$ is a function of $X_s$. Equivalently,

$$\mathbb{E}\left[f(X_t)\mid\mathcal{F}_s\right]=\mathbb{E}\left[f(X_t)\mid X_s\right]\qquad(1)$$

(almost surely). More generally, this idea makes sense with respect to any filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},\mathbb{P})$. A process X is Markov with respect to $\{\mathcal{F}_t\}_{t\ge0}$ if it is adapted and (1) holds for times $s\le t$.
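As a quick illustration (a standard fact, added here rather than taken from the original text): any adapted process with independent increments satisfies (1). Indeed, if $X_t-X_s$ is independent of $\mathcal{F}_s$ for all $s\le t$, then

$$\mathbb{E}\left[f(X_t)\mid\mathcal{F}_s\right]=g(X_s),\qquad g(x)\equiv\mathbb{E}\left[f(x+X_t-X_s)\right],$$

which is a function of $X_s$ as required.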
Continuous-time Markov processes are usually defined in terms of transition functions. These specify how the distribution of $X_t$ is determined by its value at an earlier time s. To state the definition of transition functions, it is necessary to introduce the concept of transition probabilities.
Definition 1 A (transition) kernel N on a measurable space $(E,\mathcal{E})$ is a map

$$N\colon E\times\mathcal{E}\to\mathbb{R}_+$$

such that
- for each $x\in E$, the map $A\mapsto N(x,A)$ is a measure.
- for each $A\in\mathcal{E}$, the map $x\mapsto N(x,A)$ is measurable.

If, furthermore, $N(x,E)=1$ for all $x\in E$, then N is a transition probability.
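For a concrete picture (my example, not part of the definition): when $E=\{1,\ldots,n\}$ is finite and $\mathcal{E}$ is its power set, a kernel is determined by the matrix $N_{ij}=N(i,\{j\})$ with nonnegative entries, since

$$N(i,A)=\sum_{j\in A}N_{ij},$$

and N is a transition probability exactly when each row sums to one, that is, when $(N_{ij})$ is a row-stochastic matrix.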
A transition probability, then, associates to each $x\in E$ a probability measure $N(x,\cdot)$ on $(E,\mathcal{E})$. This can be used to describe how the conditional distribution of a process at a time t depends on its value at an earlier time s.
Given any such kernel N, we denote the integral of a nonnegative measurable function $f\colon E\to\mathbb{R}_+$ with respect to the measure $N(x,\cdot)$ by

$$Nf(x)=\int f(y)\,N(x,dy).$$

Then, $Nf$ is itself a measurable function from E to $\mathbb{R}_+$. For the case of an indicator function $f=1_A$ of any $A\in\mathcal{E}$, $Nf(x)=N(x,A)$ is measurable by definition. This extends to positive linear combinations of such indicator functions (the simple functions) and, then, by monotone convergence, measurability of $Nf$ extends to all nonnegative measurable f. In the case where N is a transition probability, $Nf$ is well-defined and bounded for all bounded and measurable $f\colon E\to\mathbb{R}$.
Two kernels M and N can be combined by first applying N followed by M to get the kernel MN,

$$MN(x,A)=\int N(y,A)\,M(x,dy),$$

so that $(MN)f=M(Nf)$. Suppose that M, N are transition probabilities describing how a process X goes from its state at time s to its conditional distribution at time t and from its state at time t to a distribution at time u respectively, for times $s\le t\le u$. Then, the combination MN describes how it transitions from its state at time s to u. This is just the tower rule for conditional expectation,

$$\mathbb{E}\left[f(X_u)\mid\mathcal{F}_s\right]=\mathbb{E}\left[\mathbb{E}\left[f(X_u)\mid\mathcal{F}_t\right]\mid\mathcal{F}_s\right]=\mathbb{E}\left[Nf(X_t)\mid\mathcal{F}_s\right]=MNf(X_s).$$
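On a finite state space these operations reduce to linear algebra: $Nf$ is a matrix-vector product and MN an ordinary matrix product. A minimal sketch (my illustration; the states and matrices are made up):

```python
import numpy as np

# Two transition probabilities on E = {0, 1, 2}, as row-stochastic matrices.
M = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
N = np.array([[0.9, 0.1, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])

f = np.array([1.0, -2.0, 4.0])  # a bounded measurable function on E

Nf = N @ f   # (Nf)(x) = sum_y N(x, y) f(y)
MN = M @ N   # kernel composition: first apply N, then M

assert np.allclose(M @ Nf, MN @ f)     # (MN)f = M(Nf)
assert np.allclose(MN.sum(axis=1), 1)  # MN is again a transition probability
```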
A Markov process is defined by a collection of transition probabilities $P_{s,t}$, one for each $s\le t$, describing how it goes from its state at time s to a distribution at time t. I only consider the homogeneous case here, meaning that $P_{s,t}$ depends only on the size t–s of the time increment and not explicitly on the start or end times s, t, so the notation $P_{s,t}$ can be replaced by $P_{t-s}$. This is not much of a restriction because, given an inhomogeneous Markov process X, it is always possible to look at its space-time process $\tilde X_t=(t,X_t)$ taking values in $\mathbb{R}_+\times E$, which will be homogeneous Markov.
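To spell out why the space-time process is homogeneous (my elaboration, not from the original text): its transition function acts on functions of both coordinates, with the time coordinate advancing deterministically and absorbing the inhomogeneity,

$$\tilde P_t f(s,x)=\int f(s+t,y)\,P_{s,s+t}(x,dy),$$

which depends on the start time s only through the state $(s,x)$ of $\tilde X$, and not on the pair of times separately.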
Definition 2 A homogeneous transition function on $(E,\mathcal{E})$ is a collection $\{P_t\}_{t\ge0}$ of transition probabilities on $(E,\mathcal{E})$ such that $P_sP_t=P_{s+t}$ for all $s,t\ge0$.

A process X is Markov with transition function $\{P_t\}$, and with respect to a filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},\mathbb{P})$, if it is adapted and

$$\mathbb{E}\left[f(X_t)\mid\mathcal{F}_s\right]=P_{t-s}f(X_s)$$

(almost surely), for all times $s\le t$ and bounded measurable functions $f\colon E\to\mathbb{R}$.
The identity $P_sP_t=P_{s+t}$ is known as the Chapman-Kolmogorov equation, and is required so that the transition probabilities are consistent with the tower rule for conditional expectations. Alternatively, it states that $\{P_t\}_{t\ge0}$ forms a semigroup.
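Written out in kernel form (just unwinding the definition of composition given above), the Chapman-Kolmogorov equation says that

$$P_{s+t}(x,A)=\int P_t(y,A)\,P_s(x,dy)$$

for all $x\in E$ and $A\in\mathcal{E}$.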
As an example, standard Brownian motion, B, has the defining property that $B_t-B_s$ is normal with mean 0 and variance t–s independently of $\mathcal{F}_s$, for times $s\le t$. Equivalently, it is a Markov process with the transition function

$$P_tf(x)=\int_{-\infty}^\infty f(y)\frac{1}{\sqrt{2\pi t}}e^{-\frac{(y-x)^2}{2t}}\,dy.$$
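As a numerical sanity check (my own sketch, not from the original text), the semigroup property $P_sP_t=P_{s+t}$ for this Gaussian kernel can be tested by Monte Carlo: composing independent normal increments of variances s and t matches a single increment of variance s + t.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
s, t, x = 0.5, 1.5, 0.3
f = np.cos  # a bounded measurable test function

# P_{s+t} f(x): one Gaussian step of variance s + t.
one_step = f(x + np.sqrt(s + t) * rng.standard_normal(n)).mean()

# P_s P_t f(x): a variance-s step followed by an independent variance-t step.
two_steps = f(x + np.sqrt(s) * rng.standard_normal(n)
                + np.sqrt(t) * rng.standard_normal(n)).mean()

print(one_step, two_steps)  # agree up to Monte Carlo error
# Exact value for comparison: P_t cos(x) = exp(-t/2) cos(x).
print(np.exp(-(s + t) / 2) * np.cos(x))
```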
In practice, other than for a small number of special processes, it is not possible to write down transition functions explicitly. Instead, the processes are defined as solutions to stochastic differential equations, or the transition function is defined via an infinitesimal generator.
The distribution of a Markov process is determined uniquely by its transition function and initial distribution.
Lemma 3 Suppose that X is a Markov process on $(E,\mathcal{E})$ with transition function P such that $X_0$ has distribution $\mu$. Then, for any times $0=t_0\le t_1\le\cdots\le t_n$ and bounded measurable function $f\colon E^{n+1}\to\mathbb{R}$,

$$\mathbb{E}\left[f(X_{t_0},X_{t_1},\ldots,X_{t_n})\right]=\int f(x_0,x_1,\ldots,x_n)\,P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots P_{t_1-t_0}(x_0,dx_1)\,\mu(dx_0).\qquad(2)$$
Proof: Let us start by showing that, for all bounded measurable $f\colon E^{n+1}\to\mathbb{R}$,

$$\mathbb{E}\left[f(X_{t_0},\ldots,X_{t_n})\right]=\mathbb{E}\left[g(X_{t_0},\ldots,X_{t_{n-1}})\right]\qquad(3)$$

where $g\colon E^n\to\mathbb{R}$ is the measurable function

$$g(x_0,\ldots,x_{n-1})=\int f(x_0,\ldots,x_{n-1},y)\,P_{t_n-t_{n-1}}(x_{n-1},dy).$$

In the case where f is just a product $f(x_0,\ldots,x_n)=u(x_0,\ldots,x_{n-1})v(x_n)$, then $g=u\cdot P_{t_n-t_{n-1}}v$ and (3) follows from the Markov property, $\mathbb{E}[v(X_{t_n})\mid\mathcal{F}_{t_{n-1}}]=P_{t_n-t_{n-1}}v(X_{t_{n-1}})$. The extension of (3) to arbitrary bounded measurable f is now just a standard application of the monotone class theorem.

The proof of (2) follows from an induction on n. For the case n = 0, (2) just reduces to the statement that $X_0$ has distribution $\mu$. So, suppose that $n\ge1$ and (2) holds for n replaced by n–1. Then, using (3),

$$\mathbb{E}\left[f(X_{t_0},\ldots,X_{t_n})\right]=\mathbb{E}\left[g(X_{t_0},\ldots,X_{t_{n-1}})\right]=\int g(x_0,\ldots,x_{n-1})\,P_{t_{n-1}-t_{n-2}}(x_{n-2},dx_{n-1})\cdots P_{t_1-t_0}(x_0,dx_1)\,\mu(dx_0).$$

Substituting in the expression for g gives the result. ⬜
The finite distributions of a process X taking values in E are defined to be the distributions, in $E^n$, of $(X_{t_1},\ldots,X_{t_n})$ for all finite sets of times $t_1<t_2<\cdots<t_n$. Expression (2) describes the finite distributions of a Markov process solely in terms of its initial distribution and transition function.
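Expression (2) also gives a recipe for simulation (a sketch of mine, using Brownian motion since its transition function can be sampled exactly): draw $X_{t_0}\sim\mu$, then iteratively draw $X_{t_k}$ from $P_{t_k-t_{k-1}}(X_{t_{k-1}},\cdot)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_brownian_at(times, mu_sampler, n_paths=5):
    """Sample (X_{t_0}, ..., X_{t_n}) using only the initial law and transition function."""
    times = np.asarray(times)
    x = mu_sampler(n_paths)  # X_{t_0} ~ mu, taking t_0 = times[0]
    path = [x]
    for dt in np.diff(times):
        # Brownian transition: P_dt(x, .) is Normal(x, dt).
        x = x + np.sqrt(dt) * rng.standard_normal(n_paths)
        path.append(x)
    return np.stack(path, axis=1)  # shape (n_paths, len(times))

samples = sample_brownian_at([0.0, 0.5, 1.0, 2.0], mu_sampler=lambda n: np.zeros(n))
print(samples)
```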
Corollary 4 Suppose that X and Y are Markov processes, each with the same transition function $\{P_t\}$ and such that $X_0$ and $Y_0$ have the same distribution $\mu$ on $(E,\mathcal{E})$.

Then, X and Y have the same finite distributions.
It is important to know that Markov processes do indeed exist for any given transition function and initial distribution. This is a consequence of the Kolmogorov extension theorem.
Theorem 5 Let $(E,\mathcal{E})$ be a measurable space, and $E^{\mathbb{R}_+}$ be the space of functions $\omega\colon\mathbb{R}_+\to E$. Denote its coordinate process by X,

$$X_t(\omega)=\omega(t).$$

Also, let $\mathcal{F}^0$ be the sigma-algebra generated by $\{X_t\colon t\in\mathbb{R}_+\}$ and, for each $t\in\mathbb{R}_+$, let $\mathcal{F}^0_t$ be the sigma-algebra generated by $\{X_s\colon s\le t\}$. So, $\{\mathcal{F}^0_t\}_{t\ge0}$ is a filtration on the measurable space $(E^{\mathbb{R}_+},\mathcal{F}^0)$ with respect to which X is adapted.

Then, for every transition function $\{P_t\}_{t\ge0}$ and probability distribution $\mu$ on E, there is a unique probability measure $\mathbb{P}$ on $(E^{\mathbb{R}_+},\mathcal{F}^0)$ under which X is a Markov process with transition function $\{P_t\}$ and initial distribution $\mu$.
The superscripts ‘0’ just denote the fact that we are using the uncompleted sigma-algebras. Once the probability measure has been defined, it is standard practice to complete the filtration, which does not affect the Markov property.
Proof: For any finite subset $S\subseteq\mathbb{R}_+$, let $\mathcal{F}_S$ denote the sigma-algebra generated by $\{X_t\colon t\in S\}$. If $S=\{t_0,t_1,\ldots,t_n\}$ for times $0=t_0\le t_1<\cdots<t_n$, let $\mathbb{P}_S$ denote the probability measure on $(E^{\mathbb{R}_+},\mathcal{F}_S)$ given by (2). Note that, if $S'=S\setminus\{t_k\}$ for some $0<k<n$ then the identity

$$P_{t_k-t_{k-1}}P_{t_{k+1}-t_k}=P_{t_{k+1}-t_{k-1}}$$

shows that, if f is independent of $x_k$, expression (2) for $\mathbb{E}[f(X_{t_0},\ldots,X_{t_n})]$ reduces to the definition of $\mathbb{P}_{S'}$. Similarly, if $S'=S\setminus\{t_n\}$ then the fact that $P_{t_n-t_{n-1}}(x,E)=1$ shows that definition (2) reduces to that of $\mathbb{P}_{S'}$ whenever f does not depend on $x_n$. In either case, $\mathbb{P}_{S'}$ is the restriction of $\mathbb{P}_S$ to $\mathcal{F}_{S'}$.
By successively removing elements from S, this shows that $\mathbb{P}_{S'}$ is the restriction of $\mathbb{P}_S$ to $\mathcal{F}_{S'}$ for any $S'\subseteq S$ with $0\in S'$. The measures $\mathbb{P}_S$ for finite subsets $S\subset\mathbb{R}_+$ with $0\in S$ are therefore consistent, and the Kolmogorov extension theorem implies the existence of a unique measure $\mathbb{P}$ on $(E^{\mathbb{R}_+},\mathcal{F}^0)$ agreeing with $\mathbb{P}_S$ on $\mathcal{F}_S$ for all such finite subsets $S$.
We have now shown the existence and uniqueness of the probability measure $\mathbb{P}$ satisfying (2). Only the Markov property remains. So, consider times $s\le t$ and bounded measurable function $f\colon E\to\mathbb{R}$. Also, let Z be the $\mathcal{F}^0_s$-measurable random variable $Z=g(X_{s_1},\ldots,X_{s_n})$ for some times $0\le s_1<\cdots<s_n\le s$ and bounded measurable function $g\colon E^n\to\mathbb{R}$. Then, applying (2) to the set of times $\{s_1,\ldots,s_n,s,t\}$ gives

$$\mathbb{E}\left[Zf(X_t)\right]=\mathbb{E}\left[ZP_{t-s}f(X_s)\right].$$

By the monotone class theorem, this extends to all bounded and $\mathcal{F}^0_s$-measurable random variables Z, giving

$$\mathbb{E}\left[f(X_t)\mid\mathcal{F}^0_s\right]=P_{t-s}f(X_s)$$

as required. ⬜
The unique measure with respect to which X is Markov with the given transition function and initial distribution $\mu$ is denoted by $\mathbb{P}^\mu$, and expectation with respect to this measure is denoted by $\mathbb{E}^\mu$. In particular, if $\mu=\delta_x$ consists of a single mass of weight 1 at a point $x\in E$ then we write $\mathbb{P}^x$ and, similarly, write $\mathbb{E}^x$ for $\mathbb{E}^{\delta_x}$. It can be seen that, if Z is a bounded random variable, the map $x\mapsto\mathbb{E}^x[Z]$ is measurable and

$$\mathbb{E}^\mu[Z]=\int\mathbb{E}^x[Z]\,\mu(dx)$$

for an arbitrary probability measure $\mu$ on $(E,\mathcal{E})$. In fact, this follows from expression (2).
It is sometimes useful to generalize Definition 2 to sub-Markovian transition functions. These are sets of kernels $\{P_t\}_{t\ge0}$ satisfying the identity $P_sP_t=P_{s+t}$ as above but, instead of requiring that $P_t$ are transition probabilities, the inequality $P_t(x,E)\le1$ is imposed instead. Although such transition functions do not have probabilities summing to 1, they can be represented by Markov processes which are killed at some time. To do this, we adjoin a new point $\Delta$, called the cemetery or coffin state, to E to form a new state space $E_\Delta=E\cup\{\Delta\}$. The sigma-algebra $\mathcal{E}_\Delta$ consists of the sets $A\subseteq E_\Delta$ such that $A\setminus\{\Delta\}$ is in $\mathcal{E}$. Then, a new (Markovian) transition function $\{\tilde P_t\}$ can be defined on $(E_\Delta,\mathcal{E}_\Delta)$ by

$$\tilde P_t(x,A)=\begin{cases}P_t(x,A\setminus\{\Delta\})+\left(1-P_t(x,E)\right)1_{\{\Delta\in A\}},&\text{if }x\in E,\\ 1_{\{\Delta\in A\}},&\text{if }x=\Delta.\end{cases}\qquad(4)$$

It can be checked that the identity $\tilde P_s\tilde P_t=\tilde P_{s+t}$ is satisfied, and that $\tilde P_t(x,E_\Delta)=1$. Then, $\tilde P$ represents a process which, over an interval $[s,t]$, has probability $1-P_{t-s}(X_s,E)$ of being killed, in which case it jumps to the state $\Delta$ and remains there.
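A small simulation sketch of this construction (mine, with a made-up sub-stochastic matrix) on a finite state space: any mass missing from a row of the sub-Markovian kernel is sent to the coffin state, exactly as in (4).

```python
import numpy as np

rng = np.random.default_rng(2)

# A sub-Markovian kernel on E = {0, 1}: rows sum to at most 1.
P = np.array([[0.6, 0.3],    # from state 0, killed with probability 0.1
              [0.2, 0.5]])   # from state 1, killed with probability 0.3

# Extend to a Markovian kernel on E_Delta = {0, 1, Delta} as in (4).
kill = 1 - P.sum(axis=1)
P_tilde = np.block([[P, kill[:, None]],
                    [np.zeros((1, 2)), np.ones((1, 1))]])  # Delta is absorbing
assert np.allclose(P_tilde.sum(axis=1), 1)

# Simulate: once killed, the chain jumps to the coffin state (index 2) and stays.
x = 0
for _ in range(10):
    x = rng.choice(3, p=P_tilde[x])
print("final state:", x)  # 2 means the process was killed
```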