Now, we'll talk a little bit about accuracy and representativeness.

Since randomisation is our gold standard,

we will strive to ensure randomisation as much as possible;

because randomisation also gives us representativeness.

And representativeness is when the surveillance results

accurately reflect the trends in the target population.

So, if we look at these targets here,

we can see that this one has a high accuracy and a high precision.

This means that we are on the target.

It means that our sample actually reflects the true value in the population.

But it also means that we are quite certain about it.

So this is very good. If we get there, it's very good.

Often, that is not possible,

but it is what we aim at in the end.

Here for instance, we have a low accuracy and a high precision.

This is probably like the worst case,

because we are off target.

Our sample is way off the true value.

And we are also very certain that it's up here.

So we are really misjudging the sample estimates that we get.

Here, it's a bit better.

We have a low precision,

but our accuracy is high because we are within the target.

Even though we're not too sure about it because of the low precision,

the true value lies within the range of our sample estimates.

And finally, over here we are both off target and also not very certain.

And this is of course also kind of a worst case scenario.

Precision means how certain we are.

So the closer together the bullets are here,

the more certain we are.

And precision is very much controlled by sample size,

which I will talk a little bit about now.
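
The four targets can be put into numbers with a small sketch. Everything here is an assumption made for illustration: the true value of 0.30 and the four lists of estimates are invented, accuracy is summarised as bias (how far the average estimate sits from the true value), and precision as the spread of the estimates.

```python
from statistics import mean, pstdev

TRUE_VALUE = 0.30  # hypothetical true prevalence in the target population

# Hypothetical estimates from repeated surveys, one list per target.
scenarios = {
    "high accuracy, high precision": [0.29, 0.30, 0.31, 0.30],
    "low accuracy, high precision": [0.49, 0.50, 0.51, 0.50],
    "high accuracy, low precision": [0.10, 0.50, 0.20, 0.40],
    "low accuracy, low precision": [0.30, 0.70, 0.40, 0.60],
}


def bias_and_spread(estimates, true_value):
    """Return (bias, spread) for a list of estimates.

    bias:   offset of the average estimate from the true value (accuracy)
    spread: standard deviation of the estimates (precision)
    """
    bias = mean(estimates) - true_value
    spread = pstdev(estimates)
    return bias, spread


for name, estimates in scenarios.items():
    bias, spread = bias_and_spread(estimates, TRUE_VALUE)
    print(f"{name}: bias={bias:+.2f}, spread={spread:.2f}")
```

A bias near zero corresponds to being on target, and a small spread corresponds to the bullets sitting close together.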

So, I'm often asked by people who are designing a study: what is the appropriate sample size?

And they really just want a number like, 47.

But of course you cannot just give them a number without considering a lot of things.

You need to really consider the surveillance objectives.

So, do we want to detect a disease for instance only or do we also want to follow trends,

and what is the expected occurrence in the population?

For instance, what is the expected prevalence?

If it's very low and we want to be sure to detect it,

obviously we need to take more samples than if the prevalence is high.

And how it's measured is also important:

whether we're measuring cases

or counts, gene counts and so on,

will also influence how many samples we need to take.

Variability between and within epidemiological units should also be considered.

The more variation we have between individuals or sampling units,

the more samples we need to take in order to show that there is actually a difference.

And the same goes for the genetic variation

between microorganisms that we want to survey.

So sometimes if we are on fairly new ground, as we often are with metagenomic projects,

we really need pilot studies to investigate the variation before we

settle on a sample size.

This just shows the basic statistics behind sample sizes.

Each line shows, for a different true prevalence,

how many samples are needed to get a certain probability,

over here, of actually detecting a disease if it is present.

So for instance, if we have this here,

the red one is the true prevalence of five percent.

How many samples do we need to detect

a disease occurring in that population with 95 percent probability?

We will need around 60 samples.

And for instance, if we take the one with a prevalence of

one percent, the purple one, and we have 95 percent and we kind of go down here,

we will need around 300 samples, so the green one.

So, as you can see, the probability of detection depends heavily on

sample size, and if you have a very low prevalence you will need

much more than a thousand samples to actually get a high probability of detection.
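
The numbers read off the graph can be reproduced with a short calculation. This is a minimal sketch under assumptions that are not stated in the lecture: simple random sampling from a large population and a perfect test, so that the probability of finding no positive among n samples is (1 - prevalence) to the power n. The function name `samples_to_detect` is made up for this example.

```python
import math


def samples_to_detect(prevalence, confidence=0.95):
    """Smallest n such that P(at least one positive among n samples) >= confidence."""
    if not 0 < prevalence < 1 or not 0 < confidence < 1:
        raise ValueError("prevalence and confidence must be strictly between 0 and 1")
    # P(no positive in n samples) = (1 - prevalence) ** n, so we solve
    # 1 - (1 - prevalence) ** n >= confidence for n and round up.
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))


for prevalence in (0.05, 0.01, 0.001):
    n = samples_to_detect(prevalence)
    print(f"prevalence {prevalence:.1%}: about {n} samples")
```

Under these assumptions, a prevalence of five percent gives 59 samples and one percent gives 299, in line with the "around 60" and "around 300" read from the graph, and a prevalence of 0.1 percent needs close to 3,000.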

Also, sometimes you want to show if there's

a difference in prevalence between two populations.

So here, we have two populations;

one where the true value is point two and the other is point four,

but if we only take ten samples and then look at

our margin of error, our uncertainty about the true value,

we really cannot distinguish these two populations.

So, we cannot say that they are different.

On the other hand, if we take 100 samples then suddenly we get much more precise,

our margin of error gets much smaller,

and we can clearly see that there is a difference between the two populations.
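
The same comparison can be sketched numerically. The assumptions here are mine and not from the slide: a 95 percent interval based on the normal approximation (which is crude at ten samples), and "can be distinguished" taken to mean that the two intervals do not overlap, which is a conservative rule of thumb and not a formal test.

```python
import math

Z_95 = 1.96  # approximate normal quantile for a 95% interval


def margin_of_error(p, n, z=Z_95):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def interval(p, n):
    """Approximate 95% interval, clipped to the valid range [0, 1]."""
    moe = margin_of_error(p, n)
    return max(0.0, p - moe), min(1.0, p + moe)


def intervals_overlap(p1, p2, n):
    """True if the intervals around p1 and p2 share any values."""
    low1, high1 = interval(p1, n)
    low2, high2 = interval(p2, n)
    return low1 <= high2 and low2 <= high1


for n in (10, 100):
    low_a, high_a = interval(0.2, n)
    low_b, high_b = interval(0.4, n)
    print(f"n={n}: 0.2 -> ({low_a:.2f}, {high_a:.2f}), "
          f"0.4 -> ({low_b:.2f}, {high_b:.2f}), "
          f"overlap={intervals_overlap(0.2, 0.4, n)}")
```

With ten samples the two intervals overlap, and with 100 samples they separate, because the margin of error shrinks with the square root of the sample size.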

In microbiological studies and also in animal studies,

sometimes it can be an advantage to pool samples,

because it saves cost and time.

So by pooling we mean that,

if we for instance want to say something about the farm,

we want to take samples from the pigs,

for instance random samples from different pigs.

And then we pool all these samples into

a single sample, or maybe a few samples.

Either way, we pool many samples into fewer samples and then we analyze

the fewer samples because it's more time efficient and cost efficient.

However, the disadvantage of this can be that we reduce the methodological sensitivity.

Because, if we now say it's only this pig that has

the infection that we're looking for, then when we pool all the samples we

are actually diluting the samples, and our chance of

actually detecting that infection might then be lowered.

However, if we want to say something at

the farm level or at a higher epidemiological unit level,

then it makes sense to pool if the loss of sensitivity is not too big.

And particularly also if we're only interested in detection and are not

really interested in knowing how many pigs in the farm are infected,

but really in whether it's present in the farm or not.

Very briefly, just to illustrate the effect of pooling.

This graph shows the expected prevalence

when we have 60 samples and the number of positive pools that we find.

And the dark red one really corresponds to taking 60 individual samples.

So, the pool size is just one.

And then we have three, five, twelve and thirty.

And basically, what this shows is kind of logical:

as you increase your pool size,

the more samples you pool together,

the more uncertain you get about the results.

So a pool size of three is obviously more similar to the original with one

than a pool size of 30.

If you take 60 samples and have only analyzed two pools,

then you are very uncertain.

So, this is of course something to

consider if you're deciding to do a pooled sampling study.
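
Why large pools give such coarse results can be sketched as follows. This is not necessarily how the graph was produced. It assumes equal pool sizes and a perfect test with no loss of sensitivity from dilution, in which case a pool is positive with probability 1 - (1 - prevalence) to the power of the pool size, and the prevalence can be estimated back from the fraction of positive pools. Both function names are invented for this example.

```python
def pool_positive_probability(prevalence, pool_size):
    """P(a pool holds at least one positive sample), assuming a perfect test."""
    return 1 - (1 - prevalence) ** pool_size


def prevalence_from_pools(positive_pools, n_pools, pool_size):
    """Individual-level prevalence estimate from the fraction of positive pools."""
    fraction_positive = positive_pools / n_pools
    return 1 - (1 - fraction_positive) ** (1 / pool_size)


TOTAL_SAMPLES = 60

for pool_size in (1, 3, 5, 12, 30):
    n_pools = TOTAL_SAMPLES // pool_size
    # One possible estimate for each possible number of positive pools.
    estimates = [round(prevalence_from_pools(x, n_pools, pool_size), 3)
                 for x in range(n_pools + 1)]
    print(f"pool size {pool_size}: {n_pools} pools, "
          f"{len(estimates)} possible estimates")
    if n_pools <= 5:
        print("   possible estimates:", estimates)
```

With a pool size of 30 there are only two pools, so only three results are possible (zero, one or two positive pools), and the estimate jumps from roughly two percent to 100 percent between one and two positive pools.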

Finally, about sensitivity and specificity of surveillance.

Sensitive surveillance is characterized by

the ability to detect in a timely manner changes over time and space,

but also change in the genetic patterns of

the microbiological population that you're looking for.

And over here, we see that this is

the true status of the population, whether it's present or

absent, and these are the surveillance results.

So, whether the surveillance units (let's say again,

it's pig herds) are found positive or negative.

So, if the surveillance accurately identifies true positives,

and it does that the vast majority of the time, it has a high sensitivity, because

sensitivity is defined as the true positives

divided by the true positives plus the false negatives.

On the other hand, specificity is defined

as the true negatives divided by the false positives plus the true negatives.

So it says something about how good the surveillance system actually is at

classifying as negative those that are truly negative.
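
As a small worked example of the two definitions, here is a sketch with invented counts: 200 hypothetical pig herds, of which 50 are truly positive and 150 are truly negative. The counts and function names are assumptions for illustration only.

```python
def sensitivity(true_positives, false_negatives):
    """Share of truly positive units that the surveillance finds positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of truly negative units that the surveillance finds negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical surveillance results for 200 pig herds.
tp, fn = 45, 5     # 50 truly positive herds: 45 detected, 5 missed
tn, fp = 135, 15   # 150 truly negative herds: 135 cleared, 15 flagged wrongly

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

In this made-up table, both values come out at 0.90: the surveillance misses one in ten infected herds and wrongly flags one in ten uninfected herds.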

Sensitivity and specificity, particularly sensitivity,

depend on occurrence and sample size.

Again, if we want to detect something with a low prevalence and we want to

have many true positives then we also need to increase the sample size.

But lab methods and also

the bioinformatic approaches that you will hear more about are important,

because lab methods may produce false negatives and false positives.

In bioinformatics, we rely on mapping

to reference genes, and sometimes this can also give

false positive and false negative results, depending on

how much of the genome we require to match

the reference gene in order to conclude that,

for instance, the microorganism is present in the sample or not.

But you will hear much more about that later in the course.

This is just to introduce the terms sensitivity and specificity.

These are just the references that I've used,

and also references for images and figures. And then I just want to

say thank you for listening in.