In this section, we will talk about statistics and bioinformatics applied in evidence-based approaches. After attending this lesson, you should be able to define biometry and biostatistics. You should know the difference between descriptive and inferential statistics, the typical types of data and effect measures in evidence-based approaches, what a p-value, α, β, and power are in inferential statistics, and why it is not allowed to compare p-values in evidence-based approaches. In general, statistics can be defined as the collection, analysis, interpretation, and presentation of data. Biostatistics, sometimes also called biometry, is the branch of statistics that deals with the application of statistical analysis to biological data. In general, biometry, or biostatistics, can be divided into four parts: study design; descriptive methods, meaning statistics that quantitatively describe and summarize features of a collection of information; inferential statistics, which normally means testing hypotheses and deriving population estimates, so things that normally end with a p-value; and tools of data mining. As a start, I would like to look a little bit into the data scales of biometrical data, and then, in a next step, apply that to the specific types of data and effect measures in evidence-based approaches. In general, biometrical data can be divided into two groups: categorical and numerical. So let's first talk a bit about numerical data. You can have continuous data, which can take any value on a continuous scale. Typical examples are OD (Optical Density) measurements, height, and blood pressure. And on the other hand, you have discrete measurements, that means things that you can count, for example the number of lectures attended, days ill per year, or the fraction of cells that have been transformed in an assay.
This is called a discrete distribution, and it is characterized by integer numbers. On the other hand, you have categorical data, which is divided into nominal and ordinal scales. On the nominal scale, you have categories that have no order and cannot be ranked; they are on an equal level with each other, for example sex, blood group, or hair colour. On the other hand, you have ordinal data, which have clearly given ranks, for example a disease stage, a cancer stage, the toxic potency of a substance, or things that can in general be classified as bad, medium, good. After this general introduction of data types, I would now like to look a little more closely at the different types of data, give you some examples, and discuss the possible effect measures in evidence-based approaches and meta-analyses. In total, we have five categories of data, very much following what we have seen on the previous slide. First of all, you can have dichotomous data, which means categorical data such as yes/no, dead/alive, or clinical improvement yes or no. These data can normally be given in so-called 2×2 cross tables, and you will see examples of 2×2 cross tables later. Typical effect measures for that would be the relative risk, the odds ratio, or the risk difference. The relative risk is defined as the risk in the experimental group divided by the risk in the control group. The odds ratio is defined as the odds in the experimental group divided by the odds in the control group. And the risk difference is the risk in the experimental group minus the risk in the control group. The second type of data is continuous data, like weight, blood pressure, IQ, things like that. Typical effect measures for that would be the mean difference, or more precisely said, the difference of the means, or standardized mean differences.
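The three effect measures for dichotomous data can be sketched directly from a 2×2 cross table. A minimal Python illustration, with invented counts (the cell values a, b, c, d are assumptions for the example, not data from any real study):

```python
# Hypothetical 2x2 cross table (counts invented for illustration):
#                 event   no event
# experimental     a=10       b=90
# control          c=20       d=80

def effect_measures(a, b, c, d):
    risk_exp = a / (a + b)          # risk in the experimental group
    risk_ctl = c / (c + d)          # risk in the control group
    rr = risk_exp / risk_ctl        # relative risk
    odds_ratio = (a / b) / (c / d)  # odds ratio
    rd = risk_exp - risk_ctl        # risk difference
    return rr, odds_ratio, rd

rr, odds_ratio, rd = effect_measures(10, 90, 20, 80)
print(rr)          # 0.5: half the risk in the experimental group
print(odds_ratio)  # about 0.44
print(rd)          # -0.1: ten percentage points less risk
```

Note that the relative risk (0.5) and the odds ratio (about 0.44) differ even on the same table; they only approximate each other when events are rare.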
The third category is ordinal data, meaning things that can be classified and ranked, like mild, moderate, or severe. Typical effect measures for that would be proportional odds ratios; alternatively, the data are simply treated like continuous data, or, as a third way, the categories are reduced to dichotomous data and the effect measures we have seen for dichotomous data are applied. The fourth type of data are so-called counts and rates. These are in general called Poisson data, because they follow a Poisson distribution, and this is normally used for events that can happen more than once during the exposure period, such as adverse reactions. Typical effect measures for that are rates and rate ratios, or the data are again simply treated like continuous data, as we have seen before. The fifth category is time-to-event data. This is also called survival analysis, and such data are normally characterized by being censored: not all participants in a study show an event, meaning, for example, that not all the participants in the study have died before the end of the study. A typical example would be cancer occurrence over time, and the effect measures you normally use for that are hazard ratios, or you collapse the data at one time point and thereby produce dichotomous data. You should all have heard about the p-value. Now I would like to formally define the underlying statistical test theory, and why a p-value cannot be used in evidence-based approaches to compare the outcomes of different studies. So what is a statistical test? The basis of a statistical test are hypotheses: a null hypothesis, called H0, and normally an alternative hypothesis, called H1. A typical question underlying these two hypotheses would be: do females have different shoe sizes than males?
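For the counts-and-rates category, the rate and rate ratio calculations are straightforward. A short sketch with invented event counts and person-times (all numbers are assumptions for illustration):

```python
# Rates and a rate ratio for count (Poisson) data.
# Event counts and observation times are invented for illustration.
events_exp, time_exp = 30, 1000.0   # e.g. 30 adverse events over 1000 person-years
events_ctl, time_ctl = 15, 1500.0   # e.g. 15 adverse events over 1500 person-years

rate_exp = events_exp / time_exp    # 0.03 events per person-year
rate_ctl = events_ctl / time_ctl    # 0.01 events per person-year
rate_ratio = rate_exp / rate_ctl    # three times the control rate
print(rate_exp, rate_ctl, rate_ratio)
```

Dividing by person-time rather than by the number of participants is what distinguishes a rate from a risk: the same person can contribute several events during the exposure period.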
So the null hypothesis in this case would be that the arithmetic mean of the female shoe size is the same as the arithmetic mean of the male shoe size, as you can see here as H0, versus the alternative hypothesis that the arithmetic mean of the female shoe size is different from the arithmetic mean of the male shoe size. So, a statistical test is a decision procedure. You have two choices in the end: either you stay with, and still believe in, the null hypothesis and its validity, or you leave the null hypothesis and decide for the validity of the alternative hypothesis. If you stay with the null hypothesis, this is called a weak decision, and if you are able to leave the null hypothesis and believe in the validity of the alternative hypothesis, this is called a strong decision. But why is that so? Let's have a look at the 2×2 table given on the lower right side of the slide. You can see four entries, arranged in two columns and two rows. You can make two correct decisions: in reality the null hypothesis is true and you decided to stay with the null hypothesis, or the alternative hypothesis is true and you decided for the alternative hypothesis. And you can make two wrong decisions: the Alpha error, the so-called false alarm, which means that in reality the null hypothesis is true but you decided to believe that the alternative hypothesis is true; and the Beta error, the so-called missed alarm, which means that the alternative hypothesis is true but you missed deciding for the alternative hypothesis and stayed with the null hypothesis. In statistical testing, we have a dilemma: we can only calculate and estimate the Alpha error, but not the Beta error.
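The shoe-size decision procedure can be sketched in a few lines of Python. The samples are invented for illustration, and a two-sample z-test with the standard library's NormalDist stands in for the test; for samples this small, a t-test would be the more exact choice:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical shoe-size samples (numbers invented for illustration).
female = [38, 39, 37, 40, 38, 39, 37, 38, 40, 39]
male   = [43, 44, 42, 45, 43, 44, 42, 43, 45, 44]

# Two-sample z-test as a sketch of the decision procedure.
n1, n2 = len(female), len(male)
se = (stdev(female) ** 2 / n1 + stdev(male) ** 2 / n2) ** 0.5
z = (mean(female) - mean(male)) / se
p = 2 * NormalDist().cdf(-abs(z))   # two-sided p-value, the estimate of Alpha

alpha = 0.05
if p < alpha:
    decision = "reject H0"   # strong decision: believe in H1
else:
    decision = "retain H0"   # weak decision: the Beta error stays unknown
print(decision, p)
```

With these invented samples the means differ by five sizes, so the test rejects H0; had it retained H0, the code could say nothing about how likely a Beta error was, which is exactly the asymmetry described above.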
So, the famous p-value is an estimate of the Alpha error, and if this estimate is lower than a given threshold, for example 5%, then we no longer believe in the null hypothesis and we decide that the alternative hypothesis is true. However, as I already said, Beta cannot be estimated in parallel. And one minus Beta is the power, which means that if we stay with the null hypothesis, we do not know how big the Beta error is. We have no idea about it, which means we do not know whether we have made the right decision to stay with the null hypothesis or whether we have committed a Beta error, meaning we have missed the chance. This is the first row in the 2×2 table. One additional word: in principle, we have two types of statistical tests. One is the parametric test, which means you assume a distribution in the data. The second is the so-called non-parametric test, which means you first transform the data into ranks and then do the statistics on the ranks; this kind of statistics is much more robust against assumptions about the distribution of the data. A typical example is the Mann-Whitney test. The reason I am raising this point is that the power of most studies is unknown, but very likely much less than one. And in this situation, the fickle p-value generates irreproducible results, which means the p-value is a random manifestation and has a lot of variability between studies. Therefore, p-values are not the right evidence estimates of studies to be used and compared in evidence-based approaches; other effect size measures should be used instead, as described before. Here you can see a sentence originally written in German by a Swiss physician called Paracelsus: "Alle Dinge sind Gift und nichts ist ohne Gift; allein die Dosis macht, dass ein Ding kein Gift ist." All things are poison and nothing is without poison; only the dose makes a thing not a poison.
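The fickleness of the p-value under modest power can be made visible by simulation. The sketch below assumes a fixed true effect (0.8 SD, groups of n=10, known SD of 1, so a z-test applies); all of these numbers are invented for illustration. Every simulated study tests the same true effect, yet only a minority come out "significant", and the p-values scatter wildly between studies:

```python
import random
from statistics import NormalDist, mean

random.seed(42)
norm = NormalDist()

# Simulate many identical small studies: a true effect of 0.8 SD exists,
# but with n=10 per group the power is only modest under these assumptions.
def one_study(n=10, delta=0.8):
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(delta, 1.0) for _ in range(n)]
    se = (1.0 / n + 1.0 / n) ** 0.5      # known SD = 1, hence a z-test
    z = (mean(b) - mean(a)) / se
    return 2 * norm.cdf(-abs(z))         # two-sided p-value

pvals = [one_study() for _ in range(2000)]
power = sum(p < 0.05 for p in pvals) / len(pvals)
print(power)  # roughly 0.4-0.5 under these assumed numbers
```

The same underlying effect thus yields p = 0.003 in one study and p = 0.4 in the next, which is why the comparison across studies should rest on effect size measures, not on p-values.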
In comparison to evidence-based medicine approaches, the dose plays a very important role in evidence-based toxicology. Normally, you do not have results for just one dose but for several doses. So you need to take very much into account what the doses in the studies are and what the different responses at the different doses are. If you do an evidence-based toxicology approach, you need to think very much about the dose and how to deal with it.