In terms of the sizes of the cliques and the sepsets, it's probably good to use exact inference. You run into fewer problems this way, and if it fits in memory, it's probably going to be very fast.

If one is not in this fortunate situation, we've discussed different types of algorithms that are approximate. There's the range of algorithms that do message passing over the graph, of which loopy message passing is one of the more common ones, but not the only one. And then there is the class of sampling methods that sample from the distribution.

And these are different categories of algorithms.
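To make the message-passing category concrete, here is a minimal sketch of loopy belief propagation on a toy pairwise model. The 3-cycle structure and all potential values are invented for illustration, not taken from the lecture:

```python
import itertools
import numpy as np

# A minimal loopy belief propagation sketch on a 3-cycle of binary variables.
# All potentials are made-up illustrative numbers.
phi = [np.array([1.0, 2.0]), np.array([2.0, 1.0]), np.array([1.0, 1.0])]
psi = np.array([[1.2, 1.0], [1.0, 1.2]])  # weak attractive potential on every edge
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# Directed messages m[(i, j)][x_j], initialized to uniform.
msgs = {(i, j): np.full(2, 0.5) for i in nbrs for j in nbrs[i]}

for _ in range(200):
    new = {}
    for (i, j) in msgs:
        incoming = np.prod([msgs[(k, i)] for k in nbrs[i] if k != j], axis=0)
        m = (phi[i] * incoming) @ psi  # marginalize out x_i
        new[(i, j)] = m / m.sum()
    delta = max(np.abs(new[k] - msgs[k]).max() for k in msgs)
    msgs = new
    if delta < 1e-10:  # messages converged
        break

# Beliefs: local potential times all incoming messages, normalized.
beliefs = []
for i in nbrs:
    b = phi[i] * np.prod([msgs[(k, i)] for k in nbrs[i]], axis=0)
    beliefs.append(b / b.sum())

# Exact marginals by brute-force enumeration, feasible only on a tiny model.
joint = np.zeros((2, 2, 2))
for x in itertools.product((0, 1), repeat=3):
    p = phi[0][x[0]] * phi[1][x[1]] * phi[2][x[2]]
    p *= psi[x[0], x[1]] * psi[x[1], x[2]] * psi[x[2], x[0]]
    joint[x] = p
joint /= joint.sum()
exact = [joint.sum(axis=(1, 2)), joint.sum(axis=(0, 2)), joint.sum(axis=(0, 1))]
```

On this weakly coupled cycle, the resulting beliefs land very close to the exact marginals; with stronger or opposing potentials, they may not.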

And frankly, it's often difficult to tell in advance which of them is going to work better for a given class of models. We'll talk a little bit about factors that might favor one of these algorithms over another in a subsequent slide, in a couple of minutes. What about algorithms for MAP?

Once again, we have algorithms that perform exact inference, and in this case it's actually a broader spectrum than in the case of computing marginals. So in addition to cases where we have low tree width, which is the category for which marginal algorithms work, we've also seen other examples, such as those with associative or regular potentials, and multiple other cases.

And once again, if you can perform exact inference, that's really often the best way to go. It's often going to give you the best performance if it's tractable.

Then there's the class of methods that are based on optimization.

And those can be exact methods, which puts us in this category, but more often we're going to find ourselves in the case where we have to use some kind of approximate method, such as the dual decomposition algorithm that we've talked about. And those methods often, as we've said, lend themselves to at least being able to estimate the performance of our algorithm relative to the optimal answer, even if we can't get to the optimal answer.
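The following is a toy sketch of that idea, assuming the standard dual decomposition setup: the objective splits into "slaves" that are each solved exactly, a Lagrange multiplier pushes them toward agreement, and the sum of slave optima upper-bounds the true MAP value. All numbers and the step size are invented:

```python
# Toy dual decomposition sketch (numbers invented for illustration): a MAP
# objective f(x) = f1(x) + f2(x) over one shared binary variable x, split
# into two "slaves" that each solve their own small problem exactly.
f1 = [0.0, 1.0]  # slave 1 prefers x = 1
f2 = [2.0, 0.0]  # slave 2 prefers x = 0; the true MAP is x = 0, score 2

lam = 0.0  # Lagrange multiplier pushing the slaves toward agreement
best_bound = float("inf")
for t in range(50):
    s1 = [f1[x] + lam * x for x in (0, 1)]  # slave 1's adjusted scores
    s2 = [f2[x] - lam * x for x in (0, 1)]  # slave 2's adjusted scores
    x1 = max((0, 1), key=lambda x: s1[x])
    x2 = max((0, 1), key=lambda x: s2[x])
    # Sum of slave optima upper-bounds the true MAP value at every step.
    best_bound = min(best_bound, max(s1) + max(s2))
    if x1 == x2:  # slaves agree: we have a consistent assignment
        break
    lam -= 0.5 * (x1 - x2)  # subgradient step on the disagreement
```

The bound is what gives the guarantee mentioned above: comparing the score of the current assignment against `best_bound` tells us how far, at worst, we are from the optimum.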

Finally, there is the range of algorithms that perform search over the space.

And this can be simple hill-climbing search, which is a fairly straightforward application of standard search methodologies, but it's also quite common to use some kind of sampling.

Like Markov chain Monte Carlo sampling, to explore a range of different assignments in the space, and then select among those the one that has the highest score, or the highest log-probability. That is a quite commonly used technique. It doesn't provide the same set of guarantees that you might get from the optimization methods, but it's often very easy to implement, and so it's quite frequently used.
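A minimal sketch of this sampling-based MAP search, on an invented 3-variable model: run Gibbs sampling and simply remember the highest-scoring assignment visited.

```python
import itertools
import math
import random

# Sketch of MCMC-based MAP search (model and numbers invented): run Gibbs
# sampling over a tiny 3-variable chain and keep the highest-scoring
# assignment seen along the way.
random.seed(0)

theta01 = [[0.0, 1.0], [1.0, 0.0]]  # edge (x0, x1) prefers disagreement
theta12 = [[2.0, 0.0], [0.0, 2.0]]  # edge (x1, x2) prefers agreement

def score(x):  # unnormalized log-probability of an assignment
    return theta01[x[0]][x[1]] + theta12[x[1]][x[2]]

x = [random.randint(0, 1) for _ in range(3)]
best, best_score = list(x), score(x)
for _ in range(200):  # Gibbs sweeps
    for i in range(3):
        w = []
        for v in (0, 1):  # conditional weight of each value of x[i]
            x[i] = v
            w.append(math.exp(score(x)))
        x[i] = 0 if random.random() < w[0] / (w[0] + w[1]) else 1
        if score(x) > best_score:
            best, best_score = list(x), score(x)

# Brute force over all 8 assignments, for comparison on this tiny model.
exact_best = max(itertools.product((0, 1), repeat=3), key=score)
```

On this easy toy model the sampler finds the optimum quickly, but, as noted, nothing certifies that in general: the method reports the best assignment seen, with no bound on how far it is from the true MAP.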

So if we're resorting to the use of approximate inference, what are some of the issues that might make our lives more complicated, or might favor one algorithm versus another? One complication has to do with the connectivity structure: just how densely connected the model is.

And by and large, the more densely connected the model is, the worse it is for message-passing algorithms. Message-passing algorithms don't like densely connected models, because the messages are propagated over very short cycles, and that can cause issues both with convergence and with accuracy. So, lack of convergence and lack of accuracy. Sampling methods are less affected by the connectivity structure.

The second complicating factor is the strength of influence.

That is, the extent to which nodes that are connected to each other have tight coupling in terms of their preference for certain combinations of values.

In general, the stronger the influence, the harder it is for both categories of algorithms, because it creates strong coupling between variables. That can complicate both message-passing algorithms and sampling algorithms because, for example, if you think about plain Gibbs sampling, strong coupling makes it very difficult to move away from the current configuration to a different one.
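Here is a small demonstration of that failure mode, with invented potentials: two binary variables that strongly prefer to agree have two modes, (0, 0) and (1, 1), but single-site Gibbs started in one mode essentially never reaches the other.

```python
import math
import random

# Illustration (potentials invented): two binary variables with a very strong
# preference to agree.  The modes are (0, 0) and (1, 1), but single-site
# Gibbs started at (0, 0) must pass through the extremely improbable states
# (0, 1) or (1, 0) to reach the other mode, so in practice it never moves.
random.seed(0)
COUPLING = 20.0

def logp(a, b):  # unnormalized log-probability of the pair
    return COUPLING if a == b else 0.0

x = [0, 0]
at_start_mode = 0
n_updates = 2000
for step in range(n_updates):
    i = step % 2  # resample one variable at a time
    other = x[1 - i]
    p1 = math.exp(logp(1, other)) / (math.exp(logp(0, other)) + math.exp(logp(1, other)))
    x[i] = 1 if random.random() < p1 else 0
    at_start_mode += (x == [0, 0])

frac_at_start = at_start_mode / n_updates
```

The per-update probability of flipping either variable is about e⁻²⁰, so the chain stays in its starting mode even though the two modes are equally probable.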

Strength of influence becomes an even worse problem when the influences go in different directions. So for example, if you have loops where one path is tending to make the variables take on one configuration, and a different path is trying to make them take on a different configuration. For example, as we saw in the misconception network, where one path wants a pair of variables to agree on their values and the other wants them to disagree on their values.

That really does create significant problems for both classes of methods.

And the reason for that, and this I think is at the heart of what makes approximate inference hard, is the shape of the likelihood function: if you have multiple peaks in the likelihood function, that makes life difficult for most approximate algorithms, and the sharper these peaks, the more complicated things get. And multiple peaks are generated by strong influences that go in opposite directions. So that's really where a lot of the complexity of approximate inference comes from. Okay, so now assume you have a model that has these problematic issues.

Well, so how do we address them? First, look at the network and identify the problem regions, that is, subsets of variables that are tightly coupled and maybe subject to opposing influences.

And then we try and think of how we might make the inference in these problem regions more exact. So if we have some tightly coupled region, maybe with opposing influences, how do we prevent our approximate inference algorithm from falling into the trap of this particular area giving imprecise answers, or failing to converge?

So for example, in cluster graph methods, we can put this entire problem region in a cluster. It costs us something on the computational side, because we have to deal with larger clusters.

But it might be well worth it in terms of the improvement of performance.

In sampling methods, you might consider having proposal moves over multiple variables. So, instead of sampling individual variables, we can sample this entire block, using, again, a somewhat more expensive sampling procedure; but that might end up being well worthwhile in terms of the overall performance profile of the algorithm. And finally, if we're thinking of this in the context of a MAP problem, we can put this entire set of variables in a single slave, which again costs us something on the computational side, but can significantly improve the convergence profile of the algorithm.
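The block-sampling fix can be sketched on the same kind of invented strongly coupled pair: instead of resampling one variable at a time, enumerate the joint states of the block and draw the pair directly from its exact joint distribution, so both modes are reached easily.

```python
import math
import random

# Block sampling of an invented strongly coupled pair: enumerate the joint
# states of the block (x1, x2) and draw directly from the exact joint
# distribution, rather than one variable at a time.
random.seed(0)
COUPLING = 20.0
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [math.exp(COUPLING if a == b else 0.0) for a, b in states]
total = sum(weights)

counts = {s: 0 for s in states}
for _ in range(2000):
    r = random.random() * total  # categorical draw over the four joint states
    for s, w in zip(states, weights):
        r -= w
        if r < 0:
            counts[s] += 1
            break

# By symmetry, each of the two modes should now get about half of the draws.
frac_other_mode = counts[(1, 1)] / 2000
```

The cost is enumerating (or otherwise sampling from) the joint over the block, which grows with block size; that is the computational trade-off described above.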

So really when we're faced with a problematic model, one that doesn't

immediately succumb to traditional inference techniques, it turns out that

the design of inference is often a bit of an art.

That is, we need to study our model in depth, think about how to deal with different pieces of it, and which inference method is best equipped to handle the different pieces. And we often find that for complicated models, you can get the best performance from a combination of different inference algorithms, where some pieces are handled using exact inference, such as these larger blocks, in the context perhaps of an approximate inference method such as sampling or belief propagation.

And so one needs to think creatively about the inference problem in the

context of these more problematic models.