0:00

[MUSIC]

In the previous videos we familiarised ourselves with the Monte Carlo part.

So now, let's talk about the Markov chain part,

which is actually a really cool idea on how to generate samples from

complicated multi-dimensional distributions efficiently.

So what is a Markov chain?

Imagine a frog which sits on a water lily.

And this frog lives in discrete time:

at each timestamp, at each second, for example, it makes a decision

whether to stay on its current lily or to jump to the other one.

And its decisions are predefined in a probabilistic way.

So if it's right now on the left water lily,

it will stay there with probability 0.3, and it will move with probability 0.7.

And if it's on the right one, it will stay or move with probability 0.5 each.

Now we can say that we have kind of a dynamic system which has a state.

The state shows us where the frog is now.

This dynamic system evolves over time; its state changes.

And the rules by which this state changes are as follows:

for example, the probability to transition from the left state to the left state is 0.3, and so on.

So you can define the behavior of your dynamic system by using these transition

probabilities.

So we can simulate our dynamic system.

We can see that, for example, at timestamp 1 the frog was on the left,

and then maybe jumped to the right at timestamp 2.

And then maybe it stayed on the right.

Then maybe it went to the left, and then, for example, to the right.

And we can write down the path through the state space which our dynamic system made.

So it was left, right, right, left, right.
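As a side note, this kind of chain is very easy to simulate; here is a minimal Python sketch of the frog's walk (the state names and the probabilities are taken from the example above, everything else is my own illustration):

```python
import random

# Probability of staying on the current lily: from "left" the frog stays
# with probability 0.3 (and jumps with 0.7); from "right" it stays or
# jumps with probability 0.5 each.
P_STAY = {"left": 0.3, "right": 0.5}

def simulate(start, n_steps, rng=random):
    """Simulate the frog for n_steps and return the visited states."""
    state = start
    path = [state]
    for _ in range(n_steps):
        if rng.random() < P_STAY[state]:
            pass  # stay on the same lily
        else:
            state = "right" if state == "left" else "left"  # jump over
        path.append(state)
    return path

path = simulate("left", 4)
print(path)  # e.g. ['left', 'right', 'right', 'left', 'right']
```

Each run produces one random path through the state space, like the left, right, right, left, right path above.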

And this thing is called a Markov chain, because the next

state depends only on the previous state.

So the frog doesn't care where it was ten time steps ago;

the only thing it cares about is where it is now.

To make the decision, it just needs to know what the current state is.

Okay, so we can ask ourselves a question:

where will the frog be after 10 timestamps, for example?

So let's say that we started from timestamp 1, and the frog was on the left.

So with probability 1, it was on the left.

With probability 0, it was on the right.

What is the probability that it will be on the left or

on the right at timestamp 2?

Well, it's 0.3 and 0.7, right?

2:55

Okay, and what about timestamp 3?

It's going to be trickier, because we don't know where the frog was before.

But using the rule of sum of probabilities, we can check all

the possibilities for the previous state and

use them to compute the next probabilities.

So the law of sum tells us that the probability of the state at timestamp

3 equals the probability of the state at timestamp 3,

given that at the second timestamp the frog was on the left,

times the probability it was on the left at the second timestamp,

plus the same thing, considering that

it was on the right at the second timestamp.

And we know all these values, because the conditional

probability is just the transition probability,

so the probability that the frog moves from left to right, for example,

and the probability of x2 was computed in the previous line,

so it is what we have just computed.

And we can write it down. So the probability of being on the left

at the third timestamp

will be 0.3, which is

the probability of transitioning from left to left,

times the probability 0.3 of being on the left,

which is from the previous line of the table,

plus the prior probability of being

on the right at the previous timestamp,

which is 0.7, times the transition probability

of moving from right to left, which is 0.5.

And we can write the same thing for

being in the right state at timestamp 3, and

we'll get some numbers which are around 0.44 and 0.56.
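As a quick numeric check of this sum rule (the probabilities are the ones from the example, the variable names are mine):

```python
# Distribution at timestamp 2, computed earlier: [0.3, 0.7].
p2_left, p2_right = 0.3, 0.7

# Law of total probability for timestamp 3:
# P(x3=left)  = P(left|left)*P(x2=left)  + P(left|right)*P(x2=right)
p3_left = 0.3 * p2_left + 0.5 * p2_right
# P(x3=right) = P(right|left)*P(x2=left) + P(right|right)*P(x2=right)
p3_right = 0.7 * p2_left + 0.5 * p2_right

print(round(p3_left, 2), round(p3_right, 2))  # 0.44 0.56
```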

And we can continue this table for as long as we want.

And we'll find out that the table kind of

converges to numbers around 0.42 and 0.58.

So after 1 million steps,

the probability that the frog is on the left is almost exactly 0.42,

and the probability that the frog is on the right is almost exactly 0.58.
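The whole table can be reproduced by repeatedly applying the transition probabilities to the current distribution; a minimal sketch (the matrix layout is my own, the numbers are from the example):

```python
# One row of the table per iteration: p_{t+1}[j] = sum_i p_t[i] * T[i][j].
T = [[0.3, 0.7],   # from left:  P(next=left), P(next=right)
     [0.5, 0.5]]   # from right: P(next=left), P(next=right)

p = [1.0, 0.0]     # timestamp 1: the frog starts on the left for sure
for _ in range(50):
    p = [p[0] * T[0][0] + p[1] * T[1][0],
         p[0] * T[0][1] + p[1] * T[1][1]]

print([round(x, 4) for x in p])  # [0.4167, 0.5833], i.e. about 0.42 and 0.58
```

The exact limiting values for this chain work out to 5/12 and 7/12, which round to the 0.42 and 0.58 quoted in the lecture.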

Which basically means that if we simulate our frog for long enough and

then write down where it ended up at the one millionth step,

then we will get a sample from this discrete distribution over the values.

So with probability 0.42 we'll get left, and with probability 0.58 we'll get right.

So let's look at how it works.

Say we simulate our frog for like 1 million steps,

and the last state was, for example, left, okay?

We write down this left, and then simulate another path through the state space.

So we begin the simulation again, and the last state was, for example, right.

We write it down, and after repeating this several times,

we have a sample from the distribution of the millionth step of

our dynamic system, given that the frog started on the left.

And this thing will be close to the probabilities 0.42 and 0.58.

So even here we can see that we have two out of five lefts,

and three out of five rights, which are close to 0.42 and 0.58.
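This restart-and-record procedure can be sketched like this (the run lengths and counts are my own choices, shortened from the 1 million steps in the lecture just to keep the sketch fast):

```python
import random

rng = random.Random(0)
# P(next state is left | current state): on the left the frog stays
# with probability 0.3, on the right it moves to the left with 0.5.
P_LEFT_NEXT = {"left": 0.3, "right": 0.5}

def final_state(n_steps):
    """Run one independent simulation and return only its last state."""
    state = "left"                 # every run starts on the left
    for _ in range(n_steps):
        state = "left" if rng.random() < P_LEFT_NEXT[state] else "right"
    return state

# 100 steps per run is already plenty for this tiny chain to converge.
samples = [final_state(100) for _ in range(5000)]
frac_left = samples.count("left") / len(samples)
print(frac_left)  # close to 0.42
```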

Of course, this is a really lousy way to

generate samples from this particular two-valued distribution;

there's a much simpler way, which we have just discussed in the previous video.

But note that this approach, to build a dynamic system, simulate it, and

see where it ends up, works even for more complicated cases.

So what if you have, for example, ten lilies?

Or 1 billion lilies?

If you want to sample from your distribution in the naive way,

and you have a billion values for your variable, you'll have to spend

an amount of time proportional to 1 billion to generate one sample from it.

If you use this idea of simulating some dynamic system and

then writing down the last position,

and if you need not that many steps, say 1,000, and hope that everything has

converged to the true distribution, then you can get a random number from this

complicated distribution without the trouble

of the naive approach.

Or maybe the frog can be continuous, and

can jump from any point on a 2D plane to any other point.

In this case, you will be able to generate samples from

a continuous distribution by simulating your dynamic system.

So the point is that you can still sample efficiently,

even if you have this continuous dynamic system,

by just simulating the path on this 2D plane and writing down the last position.

So the idea of Markov chain simulation applied

to generating samples of random values is as follows.

We have a distribution we want to sample from.

It may be continuous, it may be discrete.

We build a Markov chain that converges to this distribution, okay?

Then we start from some initialization, x0, and then we simulate

our Markov chain for long enough, for example for 1,000 iterations.

So we start with x0, then we generate x1

by using the transition probability, and so on.

And then we hope that after 1,000 steps,

we will converge to the true distribution p(x),

because we built our Markov chain this way.

Then, after these 1,000 steps,

we can throw away the first 1,000 samples, the first 1,000 states,

and start writing down the states that come after them,

because they all will be from the distribution p(x),

or at least close to it, if our Markov chain converged close enough.
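The recipe above (initialize, burn in, then collect) can be sketched as follows, with the frog chain standing in for a chain constructed so that its stationary distribution is the target p(x):

```python
import random

rng = random.Random(1)
# Stand-in chain from the frog example; a real application would use a
# chain built so that it converges to the desired p(x).
P_LEFT_NEXT = {"left": 0.3, "right": 0.5}

def step(state):
    """One transition of the Markov chain."""
    return "left" if rng.random() < P_LEFT_NEXT[state] else "right"

state = "left"            # x0: an arbitrary initialization
for _ in range(1000):     # burn-in: simulate and throw these states away
    state = step(state)

samples = []
for _ in range(5000):     # after burn-in, record every visited state
    state = step(state)
    samples.append(state)

frac_left = samples.count("left") / len(samples)
print(frac_left)  # close to the stationary value of about 0.42
```

Unlike restarting the chain for every sample, consecutive states here are correlated, but each one is still approximately distributed according to the stationary distribution.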

9:42

Okay, so this is the general idea.

And in the following videos we'll discuss how to build a Markov chain with this property,

so how to build a Markov chain that converges to the distribution you want to

sample from.

Now let's first discuss a little bit whether a Markov chain converges anywhere at all.

So in which cases it does converge, and in which it doesn't.

Well, the first observation here is that a Markov chain doesn't

have to converge to anything.

So for example, if you take a dynamic system like this,

where the frog always swaps the lily at each timestamp,

and you look at the table, it will look like this.

The probability of being on the left at the first timestamp was 1, and

the probability of being on the right was 0.

And then you swap things: the frog will certainly be on the right at the next timestamp,

then certainly on the left at the timestamp after that, and so on.

This table will never converge;

it will always swap the places.

So this thing doesn't work for us.

We want something that converges to some distribution.
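A minimal check of this non-convergence, using the same propagation rule as before but with the always-jump transition matrix (the matrix layout is my own):

```python
# The always-jump chain: the transition matrix just swaps the two states.
T = [[0.0, 1.0],   # from left the frog always jumps to the right
     [1.0, 0.0]]   # from right it always jumps to the left

p = [1.0, 0.0]     # start on the left with certainty
for _ in range(4):
    p = [p[0] * T[0][0] + p[1] * T[1][0],
         p[0] * T[0][1] + p[1] * T[1][1]]
    print(p)       # alternates [0, 1], [1, 0], [0, 1], ... forever
```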

And the question of whether a given Markov chain

converges to something is actually pretty well studied.

But first of all, we'll need a definition: what does it mean for

a Markov chain to converge to something in mathematical terms?

This is something called a stationary distribution.

A distribution pi is called stationary for a

Markov chain T if the following equation holds,

which basically tells us that if you start from the distribution pi and

marginalize out the current position x, then you will get

the distribution over the position at the next timestamp,

and this has to be pi again.

So basically, if you sample a state from pi,

and then make just one transition, so one timestamp of the Markov chain,

then the distribution you will get after that is, again, pi.

This means that if we encounter this

pi at any point of our sampling, then we will stay at it.

We will not change this distribution.
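In symbols, the equation reads pi(x') = sum over x of T(x -> x') * pi(x). For the frog chain this can be checked directly; the candidate pi = (5/12, 7/12) below is my own computation from the transition probabilities, not a value stated in the lecture:

```python
T = [[0.3, 0.7],        # transition probabilities of the frog chain
     [0.5, 0.5]]
pi = [5 / 12, 7 / 12]   # candidate stationary distribution, about (0.4167, 0.5833)

# One step of the chain starting from pi:
pi_next = [pi[0] * T[0][0] + pi[1] * T[1][0],
           pi[0] * T[0][1] + pi[1] * T[1][1]]

# pi_next equals pi again (up to floating point), so pi is stationary.
print(pi_next)
```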

And there is a nice theorem that tells us that if the transition probability

is non-zero for any two states, so if we can jump from any state to any other one

with non-zero probability, then there exists a unique pi which is stationary,

and we will converge to this pi from any starting point.

So if we build a Markov chain that has non-zero transition probabilities,

and that has as its stationary distribution the one we

want to sample from, then no matter where we start,

if we simulate our Markov chain long enough,

we will eventually get samples from the desired distribution pi.