Random Walks & Stochastic Processes

Why is everything so normal?

Most people have heard of the normal distribution. Its pdf is a Gaussian function, specified by two parameters $\mu$ and $\sigma$:

p(x;\mu,\sigma)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-(x-\mu)^{2}/2\sigma^{2}}

You can check that $\int dx\,p(x;\mu,\sigma)=1$ using that

\int dx\, e^{-x^{2}/2}=\sqrt{2\pi}

an identity which is worth memorizing, even in times of the internet. The function is bell-shaped with a peak at $\mu$ and an approximate width of $\sigma$.
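For completeness, here is a sketch of the standard trick for verifying this identity (an addition for the reader, not part of the original argument): square the integral and switch to polar coordinates,

\left(\int dx\, e^{-x^{2}/2}\right)^{2}=\int dx\, dy\, e^{-(x^{2}+y^{2})/2}=\int_{0}^{2\pi}d\phi\int_{0}^{\infty}dr\, r\, e^{-r^{2}/2}=2\pi,

so the one-dimensional integral equals $\sqrt{2\pi}$.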

The shape of the Gaussian (red) and Cauchy (blue) distributions. They look very similar, but are quite different.

Now it’s true that this pdf appears everywhere in science. But why? Why not, for example, the Cauchy distribution, which looks like this:

p(x;\mu,\sigma)=\frac{1}{\pi}\,\frac{\sigma}{\sigma^{2}+(x-\mu)^{2}}

This is also a bell-shaped curve with a maximum at $\mu$ and a width proportional to $\sigma$. It turns out that this pdf has neither a mean nor a variance, because the heavy tails make the corresponding integrals diverge. But that alone can’t be the reason why the Gaussian normal distribution is so widespread.
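As a quick numerical illustration of this point (a minimal sketch added here, not part of the original text; it assumes numpy and uses arbitrary sample sizes), the running mean of Cauchy samples never settles down, while that of Gaussian samples does:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

gauss = rng.normal(0.0, 1.0, size=N)
cauchy = rng.standard_cauchy(size=N)

# The Gaussian sample mean settles near 0; the Cauchy "sample mean" keeps jumping around,
# because the Cauchy distribution has no mean to converge to.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9}:  Gaussian mean {gauss[:n].mean():+8.4f}   "
          f"Cauchy mean {cauchy[:n].mean():+10.2f}")
```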

Sums of random variables

Here’s the reason. Let’s look at a set of random numbers $X_{n}$ with $n=1,\dots,N$ and consider their sum

Y_{N}=\sum_{n=1}^{N}X_{n}

and let’s say, for the sake of simplicity, that all the $X_{n}$ are independent and drawn from the same distribution $p(x)$, which we assume to be even (this assumption isn’t necessary but makes things easier):

p(x)=p(-x).

For an even pdf the mean is zero, so $\left\langle X\right\rangle =0$. Let’s assume that $p(x)$ has some width

\left\langle X^{2}\right\rangle =\sigma^{2}.

Other than that, we make no assumptions concerning the shape of the distribution. Now we would like to compute the statistics of $Y_{N}$. First, the mean has to vanish as well, because

\left\langle Y\right\rangle =\left\langle \sum_{n=1}^{N}X_{n}\right\rangle =\sum_{n=1}^{N}\left\langle X_{n}\right\rangle =0.

The variance is a bit more tedious to compute:

\left\langle Y^{2}\right\rangle =\sum_{n=1}^{N}\sum_{m=1}^{N}\left\langle X_{n}X_{m}\right\rangle
=\sum_{n=1}^{N}\sum_{m\neq n}\left\langle X_{n}X_{m}\right\rangle +\sum_{n=1}^{N}\left\langle X_{n}^{2}\right\rangle
=\sum_{n=1}^{N}\sum_{m\neq n}\left\langle X_{n}\right\rangle \left\langle X_{m}\right\rangle +\sum_{n=1}^{N}\sigma^{2}
=0+N\sigma^{2}
=N\sigma^{2}

Here we first split the double sum into a double sum that omits the diagonal ($m=n$) and a sum that captures the diagonal elements. Because $X_{n}$ and $X_{m}$ are independent if $m\neq n$, the expectation $\left\langle X_{n}X_{m}\right\rangle$ factorizes. And because $\left\langle X_{n}\right\rangle =0$, the first double sum vanishes. What remains are the diagonal terms, each of which contributes a $\sigma^{2}$. Nice. So the variance increases with $N$.

Now if we interpret the individual $X_{n}$ as random steps on the $x$-axis drawn from the pdf $p(x)$, then $Y_{N}$ is the position after $N$ steps. The quantity

\sqrt{\left\langle Y^{2}\right\rangle }=\sqrt{N}\,\sigma

is the typical (root-mean-square) distance from the origin after $N$ steps. So a random walk like this one is typically a distance $\sqrt{N}\,\sigma$ away from the origin after $N$ steps. That means I need to wait four times longer to move away twice as far.

Now, although we discussed this in one dimension, the above result also holds in any dimension. A random walker that has taken $N$ steps is typically a distance $\sim\sqrt{N}$ away from its original starting point. You can see this in Panel 2, which shows a cloud of random walkers initially at the origin and the root-mean-square $\sqrt{\left\langle Y^{2}\right\rangle }$ of the position as a function of $N$.

The left panel depicts the positions of $M=1000$ random walkers that all start at the origin. The right panel depicts their mean distance to the origin, which scales as $\sqrt{t}$, where $t$ is the number of steps.
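A small simulation sketch of the $\sqrt{N}\,\sigma$ scaling (my own illustration, not the interactive panel; it assumes numpy and uses uniform single steps with variance $\sigma^{2}$):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 1000            # number of independent walkers
sigma = 1.0         # single-step standard deviation

# Uniform steps on [-a, a] have variance a^2 / 3, so choose a to match sigma^2.
a = np.sqrt(3.0) * sigma

for N in (10, 100, 1000, 10000):
    steps = rng.uniform(-a, a, size=(M, N))
    Y = steps.sum(axis=1)                       # positions after N steps
    rms = np.sqrt(np.mean(Y**2))                # root-mean-square distance from the origin
    print(f"N = {N:>6}:  rms = {rms:7.2f}   sqrt(N)*sigma = {np.sqrt(N) * sigma:7.2f}")
```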

The central limit theorem

There’s more: We can define a new variable

Z_{N}=\frac{Y_{N}}{\sigma\sqrt{N}}

which is just the position divided by the scaling factor $\sigma\sqrt{N}$. Naively, the pdf of this random variable should depend on the specific functional properties of the single-step pdf $p(x)$. However, as we increase the step number $N$, the pdf of $Z_{N}$ approaches a Gaussian and looks like this:

p(z)=\frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}.

This is called the central limit theorem. Sloppily, we can say that the sum of many independent, identically distributed random variables will be distributed like a Gaussian. For the original variable $Y_{N}$ this implies

p(y)=\frac{1}{\sqrt{2\pi\sigma^{2}N}}\,e^{-y^{2}/2\sigma^{2}N}

The central limit theorem is the reason why the Gaussian distribution is so abundant. Whenever we have increments that are independent, or random forces that impact a system in one way or another, we can expect the outcome variable to be normally distributed. In a sense, all the information and structure in the statistics of the single steps, that is, all the functional characteristics of $p(x)$, are washed away. Again, this is also true in any dimension. Even if two different random walks, each defined by its own single-step pdf, initially look very different, as $N$ increases the trajectories of the walks not only appear increasingly similar but the statistics of the position always approach a Gaussian distribution; see the panel below for a geometric interpretation of the central limit theorem in two dimensions.

The central limit theorem: The simulation shows the trajectories of four different random walks that all start at the origin. Each random walk is characterized by a probability distribution $p(x,y)$ for making a single step, as illustrated next to the toggles on the right. The distributions are chosen to have identical variance. As the step number increases, the geometric differences between the walks disappear and on large scales the walks are no longer distinguishable.
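To see the central limit theorem at work numerically, here is a minimal sketch (my addition, assuming numpy; the $\pm 1$ step distribution is an arbitrary choice) that rescales sums of decidedly non-Gaussian steps and watches their moments approach those of a standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 100_000                      # number of independent sums

# Single steps from a non-Gaussian pdf: +1 or -1 with equal probability
# (mean 0, variance sigma^2 = 1).
for N in (1, 2, 10, 100):
    X = rng.choice([-1.0, 1.0], size=(M, N))
    Z = X.sum(axis=1) / np.sqrt(N)              # rescaled sum Z_N = Y_N / (sigma sqrt(N))
    # A Gaussian has mean 0, variance 1 and kurtosis 3; Z_N approaches these values.
    kurtosis = np.mean(Z**4) / Z.var() ** 2
    print(f"N = {N:>4}:  mean {Z.mean():+.3f}   var {Z.var():.3f}   kurtosis {kurtosis:.3f}")
```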

Continuous Time

In our interpretation we identified $n$ as a temporal variable, the step number. If we now say that

t_{N}=N\,\Delta t

we can perform a continuous-time limit. We first identify

Y(t_{N})=Y_{N}

and when we look at the variance

\left\langle Y_{N}^{2}\right\rangle =\left\langle Y(t_{N})^{2}\right\rangle =N\sigma^{2}

we get

\left\langle Y(t_{N})^{2}\right\rangle =t_{N}\,\sigma^{2}/\Delta t

The solution $p(x,t)$ of the diffusion equation $\partial_{t}p(x,t)=D\partial_{x}^{2}p(x,t)$ for $D=1$ and an initially sharply peaked $p(x,0)$.

We can now let the number of steps $N\rightarrow\infty$ and $\Delta t\rightarrow 0$, keeping $t_{N}=t$ fixed. In this case we also have to decrease the variance of the single steps such that

\sigma^{2}/\Delta t\rightarrow D

which yields

\left\langle Y(t)^{2}\right\rangle =Dt

and the pdf becomes

p(y,t)=\frac{1}{\sqrt{2\pi Dt}}\,e^{-y^{2}/2Dt}

where we have made the time dependence explicit in the arguments of the pdf. So this is a bell curve that spreads out, its width increasing as $\sqrt{t}$. It’s good practice to differentiate $p(x,t)$ (we now use the letter $x$ for the position) with respect to $t$ and twice with respect to $x$, because then we find that

\partial_{t}p(x,t)=\frac{1}{2}D\,\partial_{x}^{2}p(x,t)

which is the diffusion equation. Below, we will derive it in a different way. This equation is very important, as it can be used as the foundation for more complex systems in which particles of different types diffuse in space and interact, eventually yielding reaction-diffusion systems.
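If you don’t want to take the derivatives by hand, here is a minimal symbolic check of this statement (my own addition; it assumes the sympy library is available):

```python
import sympy as sp

x, t, D = sp.symbols("x t D", positive=True)

# The spreading Gaussian from the text (using x for the position).
p = 1 / sp.sqrt(2 * sp.pi * D * t) * sp.exp(-x**2 / (2 * D * t))

# Residual of the diffusion equation  dp/dt - (D/2) d^2 p / dx^2 ; it should simplify to 0.
residual = sp.diff(p, t) - sp.Rational(1, 2) * D * sp.diff(p, x, 2)
print(sp.simplify(residual))
```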

Stochastic differential equations

The Ornstein-Uhlenbeck process

Now let’s look at the 1D dynamical system

\dot{x}=-kx=f(x)

This is pretty much the simplest dynamical system we can think of. Physically this might describe a mass on a spring immersed in a viscous liquid like oil, which yields an overdamped movement. It will always equilibrate to the $x=0$ stationary state. Let’s unravel this into small time steps again:

x(t+\Delta t)=x(t)-\Delta t\, k\, x(t)

So in the small time interval the location changes by $-\Delta t\,k\,x(t)$. Now let’s assume that in this time interval the mass is exposed to millions, billions and trillions of elastic collisions with small molecules of small mass that transfer energy and momentum to the object and change the position randomly by very tiny increments $\pm\delta$. Because of what we said above, the accumulated random displacement, which we write as $\sigma\Delta W$, is distributed like a Gaussian with zero mean and a certain width. The typical size of that displacement is given by

\left\langle \left(\sigma\Delta W\right)^{2}\right\rangle =\sigma^{2}\,\Delta t

because $\Delta t$ is the time interval that accumulates all the zillions of little kicks. So we think of the time evolution as

x(t+\Delta t)=x(t)-\Delta t\, k\, x(t)+\sigma\,\Delta W

where $\Delta W$ is drawn from a normal pdf with a variance of $\Delta t$:

p(w)=\frac{1}{\sqrt{2\pi\Delta t}}\,e^{-w^{2}/2\Delta t}

Now we can make $\Delta t$ as small as we want, as long as we guarantee that very many small kicks happen within this small time interval. Then we get

dX=-kX\,dt+\sigma\, dW

Note that we avoid dividing by $dt$. Some people do that; we don’t. The reason is that it makes sense to talk about $dW$ but not about $dW/dt$.

The Ornstein-Uhlenbeck process: This process is governed by one of the simplest Langevin equations: $dX=-kX\,dt+\sigma\,dW$. A linear force with force constant $k$ pulls the particle back to the origin, while Gaussian noise of strength $\sigma$ drives it away from the origin.

The process above is known as the Ornstein-Uhlenbeck process. You can explore it in the panel above. The equation above is called a stochastic differential equation, specifically a Langevin equation.

In the Ornstein-Uhlenbeck process, the position variable is driven towards the origin by the linear force term but wiggles around that stable stationary state, driven by the stochastic noise. The larger the noise strength $\sigma$, the more wiggle; the stronger the spring constant $k$, the smaller the wiggle.
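Here is a minimal Euler-Maruyama sketch of the process (my own illustration, assuming numpy; the parameter values are arbitrary). It also checks the stationary variance $\sigma^{2}/2k$ discussed in the next section:

```python
import numpy as np

rng = np.random.default_rng(3)

k, sigma = 1.0, 0.5              # spring constant and noise strength
dt, T = 0.01, 2000.0             # time step and total simulated time
n_steps = int(T / dt)

x = np.zeros(n_steps)
for i in range(1, n_steps):
    dW = rng.normal(0.0, np.sqrt(dt))                   # Gaussian increment, variance dt
    x[i] = x[i - 1] - k * x[i - 1] * dt + sigma * dW    # dX = -kX dt + sigma dW

# Drop the initial transient and compare with the stationary variance sigma^2 / (2k).
burn = int(10.0 / dt)
print("sample variance :", x[burn:].var())
print("sigma^2 / (2 k) :", sigma**2 / (2.0 * k))
```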

Stationary Ornstein-Uhlenbeck Process

If you start an ensemble of Ornstein-Uhlenbeck processes at some fixed position, say $x(0)=x_{0}$, eventually all trajectories will equilibrate to a stationary distribution around the mean $\left\langle X(t)\right\rangle =0$. The variance is given by

\left\langle X(t)^{2}\right\rangle =\sigma^{2}/2k

so it increases with the noise strength and decreases with the strength of the linear force. The distribution around the mean is Gaussian,

p(x)=\frac{1}{\sqrt{\pi\sigma^{2}/k}}\,e^{-kx^{2}/\sigma^{2}}

and the autocorrelation function is given by

\left\langle X(t)X(0)\right\rangle =\frac{\sigma^{2}}{2k}\,e^{-kt}

in equilibrium. Note that the stationary distribution can be written as

p(x)\propto e^{-2V(x)/\sigma^{2}}

where

V(x)=\frac{1}{2}kx^{2}

is the potential function of the linear force $f(x)=-kx=-V^{\prime}(x)$. This is a general result. For example, if we look at…

…the double well potential:

In general, a stochastic differential equation often comes in this shape:

dX=f(X)\,dt+\sigma\, dW

where $f(x)$ is the deterministic force. Let’s look at an interesting example. First, let’s assume that we can compute the potential of the force $f(x)$ according to

f(x)=-V^{\prime}(x)

Let’s say that $f(x)=kx-x^{3}$ with a parameter $k$ that can be positive or negative. The potential is therefore

V(x)=-\frac{1}{2}kx^{2}+\frac{1}{4}x^{4}

Now, if we do an analysis of the deterministic system

\dot{x}=kx-x^{3}=x(k-x^{2})

we see that this system has only one stable fixed point at $x=0$ if $k<0$. If $k>0$, this fixed point becomes unstable and two additional stable fixed points emerge at $x=\pm\sqrt{k}$, a pitchfork bifurcation. We can now run simulations of the system

dX=(kX-X^{3})\,dt+\sigma\, dW

to investigate the impact of noise. We find that, if $k>0$, trajectories concentrate around one or the other stable fixed point and wiggle around it. Accordingly, the stationary distribution will have two peaks, because it is given by

p(x)\sim e^{-2V(x)/\sigma^{2}}

A particle that is trapped in the basin of attraction of one fixed point can be moved to the other by the noise that is driving the system. The larger the noise, the more frequent the excursions to the “other side”. These excursions also become easier as $k$ becomes smaller and smaller; eventually, when $k<0$, the system only has one fixed point at the origin and all trajectories concentrate there.

Diffusion in a double-well potential.
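A minimal simulation sketch of this (again my own illustration, assuming numpy and arbitrary parameter values) shows the bimodal statistics and compares the position histogram with $e^{-2V(x)/\sigma^{2}}$:

```python
import numpy as np

rng = np.random.default_rng(4)

k, sigma = 1.0, 0.7              # k > 0: two stable fixed points at +/- sqrt(k)
dt, n_steps = 0.01, 500_000

x = np.empty(n_steps)
x[0] = np.sqrt(k)                # start in the right well
for i in range(1, n_steps):
    f = k * x[i - 1] - x[i - 1] ** 3                               # f(x) = kx - x^3
    x[i] = x[i - 1] + f * dt + sigma * rng.normal(0.0, np.sqrt(dt))

# The wells are symmetric, so the particle should spend about half its time on each side.
print("fraction of time right of the barrier:", np.mean(x > 0))

# Compare the position histogram with the stationary density ~ exp(-2 V(x) / sigma^2).
V = lambda y: -0.5 * k * y**2 + 0.25 * y**4
hist, edges = np.histogram(x, bins=np.linspace(-2.5, 2.5, 51), density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
theory = np.exp(-2.0 * V(centers) / sigma**2)
theory /= theory.sum() * (centers[1] - centers[0])   # normalize on the same grid
print("max deviation from exp(-2V/sigma^2):", np.max(np.abs(hist - theory)))
```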

The Wiener Process

What if $f=0$? Then the Langevin equation is particularly simple:

dX=\sigma\, dW

This means we have a process

X(t+\Delta t)=X(t)+\sigma\,\Delta W(t)

So we are adding a Gaussian increment at every step, like a random walk in continuous time. This also means that at time $t$ we have

X(t)=\sigma W(t)

where $W(t)$ is just the result of adding many Gaussian increments,

W(t)=\sum_{n=1}^{N}\Delta W(t_{n})=\int_{0}^{t}dW.

And in analogy to the random walk we have

\left\langle W^{2}(t)\right\rangle =t

with a probability density

p(w,t)=\frac{1}{\sqrt{2\pi t}}\,e^{-w^{2}/2t}

as we discussed before. The process $W(t)$ is known as the Wiener process.
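As a quick sanity check (a sketch I have added, assuming numpy; step size and path count are arbitrary), we can build Wiener paths by accumulating Gaussian increments and verify that $\left\langle W^{2}(t)\right\rangle$ grows like $t$:

```python
import numpy as np

rng = np.random.default_rng(5)

dt, n_steps, n_paths = 0.01, 1000, 5000

# Wiener paths: cumulative sums of Gaussian increments with variance dt.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)

t = dt * np.arange(1, n_steps + 1)
for i in (99, 499, 999):
    print(f"t = {t[i]:5.2f}   <W^2(t)> = {np.mean(W[:, i] ** 2):5.2f}")
```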

Another road to the diffusion equation

The spreading Gaussian,

p(x,t)=\frac{1}{\sqrt{2\pi Dt}}\,e^{-x^{2}/2Dt}

as mentioned earlier, fulfills the diffusion equation

\partial_{t}p(x,t)=\frac{1}{2}D\,\partial_{x}^{2}p(x,t).

Here’s another way of deriving it.

Let’s split the line that defines the $x$ coordinate into intervals that we interpret as containers at locations $x_{n}=n\Delta x$, each container having a width $\Delta x$. Now let’s assume that there are many particles distributed anywhere on the line. All the particles in the interval $\left[x_{n},x_{n}+\Delta x\right]$ belong to that container. We can count the particles in container $n$ and denote this number by $u_{n}=u(x_{n})$. Because particles can move between containers, we have $u_{n}=u_{n}(t)$, or equivalently $u(x_{n})=u(x_{n},t)$. Now let’s assume that every particle at $n$ can randomly move to the container next door, i.e. to $n+1$ or $n-1$. That means that the container at $n$, in a small time interval $\Delta t$, may lose particles to the neighboring containers, so that

\Delta u_{n}^{-}(t)=-2\gamma\, u_{n}(t)\,\Delta t

where $\gamma$ is a rate constant that quantifies the rate at which a particle randomly moves to a given neighboring site (it can leave to either side, hence the factor of 2). The concentration $u_{n}$ can also increase due to incoming particles from the left or the right, so that

\Delta u_{n}^{+}(t)=\gamma\, u_{n+1}(t)\,\Delta t+\gamma\, u_{n-1}(t)\,\Delta t

So we have

u_{n}(t+\Delta t)=u_{n}(t)+\Delta u_{n}^{-}(t)+\Delta u_{n}^{+}(t)

which reads

u_{n}(t+\Delta t)=u_{n}(t)+\gamma\,\Delta t\left(u_{n+1}(t)+u_{n-1}(t)-2u_{n}(t)\right).

Dividing by $\Delta t$ and letting $\Delta t\rightarrow 0$, this yields

\partial_{t}u(x_{n},t)=\gamma\,\Delta x^{2}\left(\frac{u(x_{n}+\Delta x,t)-2u(x_{n},t)+u(x_{n}-\Delta x,t)}{\Delta x^{2}}\right)

Now, if we let

\gamma\,\Delta x^{2}=D/2

and perform the limit $\Delta x\rightarrow 0$, we get

\partial_{t}u(x,t)=\frac{1}{2}D\,\partial_{x}^{2}u(x,t)

which again is the diffusion equation. Here, however, we derived it not by looking at a single randomly moving particle but rather by using particle numbers, i.e. concentrations.
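Here is a minimal numerical sketch of the container picture (my own addition, assuming numpy; $\gamma$, $\Delta x$ and $\Delta t$ are arbitrary choices): particles hop between neighboring containers, and the resulting concentration profile is compared with the spreading Gaussian for $D=2\gamma\Delta x^{2}$:

```python
import numpy as np

dx, gamma = 0.1, 1.0             # container width and hopping rate per neighbor
dt, n_steps = 0.001, 5000        # explicit stepping is stable as long as 2*gamma*dt <= 1

x = np.arange(-5.0, 5.0, dx)
u = np.zeros_like(x)
u[len(x) // 2] = 1.0 / dx        # all particles start in the central container (density 1/dx)

for _ in range(n_steps):
    # gain from both neighbors minus loss to both neighbors
    u = u + gamma * dt * (np.roll(u, 1) + np.roll(u, -1) - 2.0 * u)

# Compare with the spreading Gaussian for D = 2 * gamma * dx^2 at time t = n_steps * dt.
t = n_steps * dt
D = 2.0 * gamma * dx**2
p = np.exp(-x**2 / (2.0 * D * t)) / np.sqrt(2.0 * np.pi * D * t)
print("max deviation from the Gaussian solution:", np.max(np.abs(u - p)))
```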

Random Events - Poisson process

Quite often in dynamical processes we have a situation in which random events occur as time progresses. For example, let’s say time is at $t=t_{0}$ and it advances to a later time $t_{0}+\Delta t$, where $\Delta t$ is as small as we like. Now let’s assume that there’s a small probability that in this time interval $\left[t_{0},t_{0}+\Delta t\right]$ an event occurs, e.g. a collision of two particles, the birth of an animal, the death of an animal or the firing of a neuronal spike. Let’s call this probability

p=\alpha\,\Delta t

The proportionality constant $\alpha$ is called a probability rate. Now let’s assume that every time we advance time by $\Delta t$ an event can occur. Let’s ask: what is the probability that an event doesn’t occur for a time $t$ and then occurs exactly in the interval $\left[t_{0}+t,t_{0}+t+\Delta t\right]$? We can split the waiting time into $N$ small segments of duration $\Delta t$, so that $N\Delta t=t$. The probability that no event occurs during this time is

\left(1-p\right)^{N}

multiplied by the probability that the event occurs in $\left[t_{0}+t,t_{0}+t+\Delta t\right]$, which is $p$, so

\left(1-p\right)^{N}p=\left(1-\alpha\,\Delta t\right)^{N}\alpha\,\Delta t=\left(1-\alpha t/N\right)^{N}\alpha\,\Delta t

In the limit $N\rightarrow\infty$ we can use $\left(1-\alpha t/N\right)^{N}\rightarrow e^{-\alpha t}$, and with $\Delta t\rightarrow dt$ this becomes

p(t)\,dt=\alpha\, e^{-\alpha t}\,dt

So the probability density for the time interval between events is an exponential pdf. This process is known as the Poisson process.
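A minimal simulation sketch of this (my own addition, assuming numpy; the rate and step size are arbitrary) generates events with probability $p=\alpha\Delta t$ per step and checks that the waiting times look exponential:

```python
import numpy as np

rng = np.random.default_rng(6)

alpha, dt = 2.0, 0.001           # probability rate and small time step
n_steps = 2_000_000

# In every small interval dt an event occurs with probability p = alpha * dt.
events = rng.random(n_steps) < alpha * dt
times = np.nonzero(events)[0] * dt
waits = np.diff(times)           # waiting times between consecutive events

print("mean waiting time:", waits.mean())    # should be close to 1/alpha
print("std of waiting times:", waits.std())  # for an exponential pdf this equals the mean
```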