[SOUND] So let's plug in these model masses into the ranking function to see what we will get, okay? This is a general smoothing. So a general ranking function for smoothing with subtraction and you have seen this before. And now we have a very specific smoothing method, the JM smoothing method. So now let's see what what's a value for office of D here. And what's the value for p sub c here? Right, so we may need to decide this in order to figure out the exact form of the ranking function. And we also need to figure out of course alpha. So let's see. Well this ratio is basically this, right, so, here, this is the probability of c board on the top, and this is the probability of unseen war or, in other words basically 11 times basically the alpha here, this, so it's easy to see that. This can be then rewritten as this. Very simple. So we can plug this into here. And then here, what's the value for alpha? What do you think? So it would be just lambda, right? And what would happen if we plug in this value here, if this is lambda. What can we say about this? Does it depend on the document? No, so it can be ignored. Right? So we'll end up having this ranking function shown here. And in this case you can easy to see, this a precisely a vector space model because this part is a sum over all the matched query terms, this is an element of the query map. What do you think is a element of the document up there? Well it's this, right. So that's our document left element. And let's further examine what's inside of this logarithm. Well one plus this. So it's going to be nonnegative, this log of this, it's going to be at least 1, right? And these, this is a parameter, so lambda is parameter. And let's look at this. Now this is a TF. Now we see very clearly this TF weighting here. And the larger the count is, the higher the weighting will be. We also see IDF weighting, which is given by this. And we see docking the lan's relationship here. So all these heuristics are captured in this formula. What's interesting that we kind of have got this weighting function automatically by making various assumptions. Whereas in the vector space model, we had to go through those heuristic design in order to get this. And in this case note that there's a specific form. And when you see whether this form actually makes sense. All right so what do you think is the denominator here, hm? This is a math of document. Total number of words, multiplied by the probability of the word given by the collection, right? So this actually can be interpreted as expected account over word. If we're going to draw, a word, from the connection that we model. And, we're going to draw as many as the number of words in the document. If you do that, the expected account of a word, w, would be precisely given by this denominator. So, this ratio basically, is comparing the actual count, here. The actual count of the word in the document with expected count given by this product if the word is in fact following the distribution in the clutch this. And if this counter is larger than the expected counter in this part, this ratio would be larger than one. So that's actually a very interesting interpretation, right? It's very natural and intuitive, it makes a lot of sense. And this is one advantage of using this kind of probabilistic reasoning where we have made explicit assumptions. And, we know precisely why we have a logarithm here. And, why we have these probabilities here. And, we also have a formula that intuitively makes a lot of sense and does TF-IDF weighting and documenting and some others. Let's look at the, the Dirichlet Prior Smoothing. It's very similar to the case of JM smoothing. In this case, the smoothing parameter is mu and that's different from lambda that we saw before. But the format looks very similar. The form of the function looks very similar. So we still have linear operation here. And when we compute this ratio, one will find that is that the ratio is equal to this. And what's interesting here is that we are doing another comparison here now. We're comparing the actual count. Which is the expected account of the world if we sampled meal worlds according to the collection world probability. So note that it's interesting we don't even see docking the lens here and lighter in the JMs model. All right so this of course should be plugged into this part. So you might wonder, so where is docking lens. Interestingly the docking lens is here in alpha sub d so this would be plugged into this part. As a result what we get is the following function here and this is again a sum over all the match query words. And we're against the queer, the query, time frequency here. And you can interpret this as the element of a document vector, but this is no longer a single dot product, right? Because we have this part, I know that n is the name of the query, right? So that just means if we score this function, we have to take a sum over all the query words, and then do some adjustment of the score based on the document. But it's still, it's still clear that it does documents lens modulation because this lens is in the denominator so a longer document will have a lower weight here. And we can also see it has tf here and now idf. Only that this time the form of the formula is different from the previous one in JMs one. But intuitively it still implements TFIDF waiting and document lens rendition again, the form of the function is dictated by the probabilistic reasoning and assumptions that we have made. Now there are also disadvantages of this approach. And that is, there's no guarantee that there's such a form of the formula will actually work well. So if we look about at this geo function, all those TF-IDF waiting and document lens rendition for example it's unclear whether we have sub-linear transformation. Unfortunately we can see here there is a logarithm function here. So we do have also the, so it's here right? So we do have the sublinear transformation, but we do not intentionally do that. That means there's no guarantee that we will end up in this, in this way. Suppose we don't have logarithm, then there's no sub-linear transformation. As we discussed before, perhaps the formula is not going to work so well. So that's an example of the gap between a formal model like this and the relevance that we have to model, which is really a subject motion that is tied to users. So it doesn't mean we cannot fix this. For example, imagine if we did not have this logarithm, right? So we can take a risk and we're going to add one, or we can even add double logarithm. But then, it would mean that the function is no longer a proper risk model. So the consequence of the modification is no longer as predictable as what we have been doing now. So, that's also why, for example, PM45 remains very competitive and still, open channel how to use public risk models as they arrive, better model than the PM25. In particular how do we use query like how to derive a model and that would work consistently better than DM 25. Currently we still cannot do that. Still interesting open question. So to summarize this part, we've talked about the two smoothing methods. Jelinek-Mercer which is doing the fixed coefficient linear interpolation. Dirichlet Prior this is what add a pseudo counts to every word and is doing adaptive interpolation in that the coefficient would be larger for shorter documents. In most cases we can see, by using these smoothing methods, we will be able to reach a retrieval function where the assumptions are clearly articulate. So they are less heuristic. Explaining the results also show that these, retrieval functions. Also are very effective and they are comparable to BM 25 or pm lens adultation. So this is a major advantage of probably smaller where we don't have to do a lot of heuristic design. Yet in the end that we naturally implemented TF-IDF weighting and doc length normalization. Each of these functions also has precise ones smoothing parameter. In this case of course we still need to set this smoothing parameter. There are also methods that can be used to estimate these parameters. So overall, this shows by using a probabilistic model, we follow very different strategies then the vector space model. Yet, in the end, we end up uh,with some retrievable functions that look very similar to the vector space model. With some advantages in having assumptions clearly stated. And then, the form dictated by a probabilistic model. Now, this also concludes our discussion of the query likelihood probabilistic model. And let's recall what assumptions we have made in order to derive the functions that we have seen in this lecture. Well we basically have made four assumptions that I listed here. The first assumption is that the relevance can be modeled by the query likelihood. And the second assumption with med is, are query words are generated independently that allows us to decompose the probability of the whole query into a product of probabilities of old words in the query. And then, the third assumption that we have made is, if a word is not seen, the document or in the late, its probability proportional to its probability in the collection. That's a smoothing with a collection ama model. And finally, we made one of these two assumptions about the smoothing. So we either used JM smoothing or Dirichlet prior smoothing. If we make these four assumptions then we have no choice but to take the form of the retrieval function that we have seen earlier. Fortunately the function has a nice property in that it implements TF-IDF weighting and document machine and these functions also work very well. So in that sense, these functions are less heuristic compared with the vector space model. And there are many extensions of this, this basic model and you can find the discussion of them in the reference at the end of this lecture. [MUSIC]