[MUSIC] Hello, this is Thomas Hartung from the Bloomberg School of Public Health. Welcome to our lecture series on evidence-based toxicology. Today, we are going to talk about something fundamental in science, which is quality, and one of the most important reasons for low-quality science is bias. So, we want to discuss how we can handle risk of bias in evidence-based toxicology to improve our risk assessments and the risk management which follows from them. In the first part of the lecture, I would like to introduce the concept of risk of bias and the role it plays in the critical appraisal of studies. In several of the lectures of this series, you have heard about systematic reviews and the different steps they take. What we are doing today is fundamental to the fifth step, which is the assessment of study quality, and here several aspects are important: the risk of bias, the quality of the evidence, and the strength of the evidence. I'm going to talk today mainly about risk of bias, and somewhat about quality of evidence. But I will not touch on strength of evidence. I think this is a self-evident concept, and we have to handle it also in the integration and meta-analysis of evidence, which is part of other lectures. Bias is a problem in science, and we should be clear that bias is different from imprecision, different from aspects of quality and from aspects of reporting, but it overlaps with all of them. Imprecision is the consequence of random error due to sampling variation. So, it is inherent in the methodologies we are using, and it results in variability. It will impact the confidence intervals of our measures, so we have a way of handling it quantitatively. The quality of a study is very closely linked to biases, because bias will typically lead to low quality, but we can also have bias in well-conducted studies, and there can be flaws in studies which do not introduce a bias because they simply do not affect the result. And there is also a close relationship to reporting. You can use perfect methods but describe them poorly, and this will in the end determine the utility of your work, and will give it a low quality score or make it appear biased even though it was well done and just not properly described. As you know, we are developing evidence-based toxicology in the context of evidence-based medicine, with the Cochrane Collaboration as the forerunner of this. So, for this reason I want to introduce now study validity, in order to understand what bias is about in the way Cochrane understands it. Study validity has two components: it has an external validity and an internal validity aspect. The external validity describes to which extent the study we are analyzing can be generalized and applied, so this depends obviously on the purpose. For a given purpose, one study can be valid, while it is not valid for another purpose. So, this is framing our assessment, and it depends on the intended use whether this external validity can be assigned to a study or not. Today we want to discuss internal validity, which is the inherent quality, the extent to which a study minimizes systematic errors or biases, so that it is done at high quality, the best possible we can do. So, this resonates with the methodological or study quality, the extent to which the highest standards have been applied. And this includes aspects which are unlikely to impact on the internal validity directly.
Examples of this are: have I obtained an ethical approval for the study? This will not change the quality, but it is an important aspect, obviously, for the legitimacy of the study. And there are aspects like: have I done a prior sample size calculation? I can have a perfectly powered study with enough animals or patients without assessing and calculating beforehand how big my sample needs to be, but by calculating it I avoid the problem of finding out afterwards that I should have used more study subjects. And there are also the aspects of reporting; again, reporting and quality are independent, but I can spoil the best study with a bad report. So, why should we assess internal validity? Because studies of low methodological quality are more likely biased. They are more likely to give us wrong results. This is an example here of a study which was assessing myocardial infarction. The topic is not important to us. But this study did show that those experiments where no randomization took place were likely to deliver higher effects, falsely high effects. And in the same way, if the examiner did know which was the treatment group and which was the non-treated group, so no concealment took place, again there was likely a higher study effect. And this demonstrates that, consciously and unconsciously, if we do know what the treatment is, and if we are not taking care that we have absolutely identical study populations, we might observe quite different results. And this is very important for our systematic review, because the conclusions of a systematic review depend directly on this methodological quality, and the entire assessment may be misleading if it is based on internally invalid studies. The old saying: garbage in, garbage out, trash in, trash out. We have to be careful what we include in our systematic assessments. So, how to assess internal validity? And we put a sign here, it's a slippery slope; it's a difficult thing, and it's a permanent struggle to do so. Because what really is an internally valid and unbiased study, and how to distinguish it from the gray zone of the little mistakes, is a permanent striving for the better. Because only if we have an internally valid study can we expect to see a true cause-effect relationship of the intervention or exposure we are studying. And I would like just to alert you here again that in our lecture on causation, we will talk more about cause-and-effect relationships, how fundamental they are to science, and how important this is in the context of evidence-based toxicology. So, when we assess internal validity, we need to identify the non-random differences between groups, anything which can, besides the treatment or the exposure, be important and impact the result of our investigation. So, we need to identify aspects which potentially result in bias. We most probably will not be able to identify what effect they actually had, but we can assess whether they have been considered and controlled for. And this requires that for each aspect of the study which could result in systematic errors by the researchers, we identify whether there is a risk of no, low, high, or unknown bias. And this is the process I want to describe, the tools which are used and which are increasingly translated from evidence-based medicine to evidence-based toxicology. In the area of evidence-based medicine, the actual impact of the various biases has been systematically studied, among others by the Agency for Healthcare Research and Quality. And there is a document, Empirical Evidence of Bias in Trials Measuring Treatment Differences.
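Before we look at that empirical evidence, let me make the idea of per-aspect judgments a little more concrete. Here is a minimal sketch, in Python, of how the judgments for a single study could be recorded; the domain names, the study, and the judgments are purely illustrative and are not taken from any particular tool.

```python
# Minimal sketch: per-domain risk-of-bias judgments for one study.
# Domains, the study, and the judgments are illustrative only.

ALLOWED = {"no", "low", "high", "unknown"}  # the risk categories named in the lecture

def record_judgments(study_id, judgments):
    """Validate and return a per-domain risk-of-bias record for one study."""
    for domain, judgment in judgments.items():
        if judgment not in ALLOWED:
            raise ValueError(f"{study_id}: invalid judgment '{judgment}' for '{domain}'")
    return {"study": study_id, "judgments": dict(judgments)}

example = record_judgments(
    "Study_A",  # hypothetical study
    {
        "random sequence generation": "low",
        "allocation concealment": "unknown",
        "blinding of outcome assessment": "high",
        "incomplete outcome data": "low",
        "selective reporting": "unknown",
    },
)
print(example)
```

The only point of the sketch is that each aspect gets an explicit judgment, so that the judgments can later be compared between reviewers and summarized across studies.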
That AHRQ report is a very nice resource because it shows empirically what the impact of various biases in clinical trials is. So the area of randomized clinical trials, as we know the very core of evidence-based medicine, has identified four critical biases which dominate the discussion. These are selection bias, performance bias, detection bias, and attrition bias, and we want to discuss these four in the next slides. So, we do have different types of biases, which I am going to describe in a little more detail, and measures to reduce them. And we can assess whether these measures have been taken. So, what is a selection bias? It is any systematic difference in the baseline characteristics of the groups compared. If I do not have exactly the same kind of patients being treated or not treated, if I am not carefully assigning the animals to the different treatment groups so that homogeneous groups result, I am running into the problem that there is an additional impact on the outcome. So it is randomization and allocation concealment, the hiding of the treatment assignment, which help us to minimize selection bias. And then we have performance biases. Any systematic difference in the exposure to factors other than the intervention or exposure of interest plays a role here. We will come up with some examples in a second, and again, randomization and blinding are very important, because a lot of this comes from our handling: if we do not know which animal, which patient, and by extension which of our cell cultures, is actually being treated and which is not, that helps a lot to avoid introducing this type of bias. Then there are detection biases: any systematic difference in how we assess the outcomes, how we measure the outcomes. Randomization and blinding also help us here, because if I do not know what was treated how, I will not bring in my personal biases, consciously or unconsciously. Think of the pathologist evaluating a tissue; it makes a tremendous difference whether he or she knows what was treated, what was diseased, and what was not. Then there is attrition bias: any systematic difference between the groups with regard to dropouts and how they were handled, and with regard to missing outcome data. If patients who have been treated are more likely to stop the study because they had side effects, this will lead to attrition and will change the composition of the groups we are comparing. So it is important that such dropouts, everything which was lost to analysis, are reported properly. And then we have reporting bias, which addresses systematic differences between reported and unreported outcomes. If we are not honestly and completely reporting the outcomes, we are introducing bias; the over-reporting of positive results is a typical example here. And here the publication of protocols, the registration of protocols, which is increasingly a standard in clinical medicine, is something very important, because we tend not to publish things which have had no effect, and this introduces a tremendous reporting bias. And we should at least be aware that animal studies or preclinical mechanistic studies with cell cultures and others are just as prone to bias in our reporting. And there are other biases, systematic differences we do not even know about, the known unknowns and the unknown unknowns, as they have been termed by Rumsfeld.
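Since randomization, allocation concealment, and blinding came up as the key countermeasures for several of these biases, here is a minimal sketch, assuming a simple hypothetical animal experiment, of how a concealed random allocation could be generated; the animal identifiers, group names, and codes are of course only illustrative.

```python
import random

def concealed_allocation(subject_ids, groups, seed=42):
    """Randomly allocate subjects to groups and hide the assignment behind codes.

    The group key is meant to be held by someone not involved in handling
    or scoring the animals, a very simplified form of allocation concealment
    and blinding.
    """
    rng = random.Random(seed)          # fixed seed only so the sketch is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)

    key = {}     # subject -> group, kept by an independent person
    codes = {}   # subject -> neutral code, the only label experimenters see
    for i, subject in enumerate(shuffled):
        key[subject] = groups[i % len(groups)]
        codes[subject] = f"code-{i:03d}"
    return codes, key

codes, key = concealed_allocation(
    [f"rat-{n}" for n in range(1, 13)],    # 12 hypothetical animals
    ["control", "low dose", "high dose"],  # hypothetical treatment groups
)
```

The point is only that the allocation sequence is generated at random before the experiment, and that the people who handle and score the animals work with the neutral codes rather than with the group names.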
There are many more things, but I think we can leave it for the moment at these major influential factors, because for those we do not know about, we cannot really intervene. Let's go through some examples to introduce the concepts a little bit more in detail, to put some flesh on the bones. Selection biases: any allocation of individuals, animals, or exposures to the treatment groups which is not entirely random is such an example. Let's say, for example, that from a general population we would compare three different exposures. Each of the exposures here has its own control group. But if, for money-saving reasons, we were to eliminate two of the control groups and use only the third control group as the control for all of our comparisons, we are introducing a selection bias, here most probably for economic reasons. We are saving money by not testing, but it introduces a bias if this control group is not identical to the two other control groups. So there are measures to help here, and these are, first of all, the random allocation process. We need to generate a sequence for allocating our study subjects to the different groups. We have to describe the relevant group characteristics and demonstrate similarities, showing that we have properly randomized them into treatment groups, let's say that the weight of our patients, the age of the patients, the severity of disease is not different, or, in the case of our animals, the weight of the animals, the proper distribution of gender, and so on. We have to implement measures to guarantee randomness, among others by allocation concealment. So, let's take a few examples for the different study types. In a human randomized trial, think of assignment by envelopes which might have a different weight for control and treatment. Because this gives a piece of information to the people involved, the enrolling physician may tend to assign the treatment to healthier patients, and this is something we obviously must avoid. Human observational studies: to study risk factors of a rare disease, patients are enrolled nationwide. If the controls are enrolled locally, because I have easy access to controls, and they are not representative of the patient population I have recruited, I have made my life easier but the outcome of the study worse. In experimental animal studies, when rats show differences in activity and the inactive rats are assigned to one group while the more active ones are assigned to another, this would introduce a bias, similarly with different weights or different ages. In mechanistic studies, cell culture work: various flasks of cells are grown to confluence, but the confluence varies. If the more confluent cells, for example, which differ in their cell-to-cell communication because the density of the cells is different, are assigned to treatment rather than to controls, we might have an impact. So you see, in each and every study type we can make the mistake of selection bias and introduce additional factors. Performance biases. This refers to differences in the care of the study groups, for example, if we are treating animals differently in the way we are housing them, depending on what treatment they receive, or in the amount of attention we give them, for example observing the animals more carefully because we are expecting certain effects of the treatment and we want to see them.
This can lead to the respective results, and again, randomization and blinding of caregivers and researchers are critical here, to avoid that such conscious and unconscious biases take place. You should note that in some cases this is not possible. If you have interventions which lead to obvious signs, like surgery, or to compromised health as an effect of exposure, then it is very difficult to control for this performance bias. So there is a high risk of bias if no such blinding can take place. An example from an experimental animal study for such performance biases: treated animals were placed on a higher shelf in the animal room, and control animals were placed on the lowest shelf. So the environmental conditions might vary; temperature and light intensity were systematically different between the two groups and could possibly lead to systematic effect differences. Detection biases: this refers to knowledge of the treatment or exposure, which may affect the outcome assessor's work, especially for subjective measurements; pathology is again a good example. If I know what to expect or to look for, I am more likely to find it in these types of settings. And the tools, no different from the others, are again random selection and blinding of outcome assessors: if I do not know which were the treated animals and which the untreated animals, I base my judgment only on the factual findings I am confronted with. Again, this is not always possible. Blinding of outcome assessors can, for example, not be possible if mortality is an outcome, or if the intervention leaves obvious marks on the study group. But it is again important to be aware of and report this possible detection bias. An example from an experimental animal study: the outcomes are first assessed for the controls and then for the treatment animals. A typical mistake, because if the treatment was effective, the outcome assessor will come to suspect the treatment from the non-random assessment sequence. The blinding is compromised, and this might introduce significant differences in the outcome assessments, because the assessor will suddenly notice: so far they were all the same, and now I am suddenly observing certain effects. And this then reinforces the assignment of effects. Attrition biases refer to incomplete outcome data, or to incomplete accounting of all individuals or animals which originally entered the study. Here, complete tracking of all intended-to-treat patients or animals is important. It is important to report the attrition and also the justification for any exclusion. Were animals humanely killed because of the consequences of exposure or treatment? How was this determined? Were their data included in the data set, and how have they then been handled? When we have incomplete treatment groups at the end of our study, what have we done with the dropouts, that is, the animals we had to sacrifice during the treatment? Again, an example: a group of treated animals showing aggressive behavior was not included in the data analysis. There might be reasons to do so, but inclusion of this group would have led, or could possibly have led, to a different outcome assessment, and it is important that such decisions are at least justified and documented. And then reporting biases. Selective reporting, the omission of certain outcomes, is the issue here. We do not really have tools for this for our animal studies or for our mechanistic cell culture type of work.
But in the clinical field, it is more and more common to publish and register protocols in advance, so that the scientific community is aware that certain treatment studies took place. It can then better understand the publication bias of studies reporting positive results, because it knows out of how many trials these are actually reported. There was a pretty interesting article in the British Medical Journal which interviewed trialists, trying to understand how they handled outcome reporting and how they could be biased. And this is just one example of somebody reporting: "Yeah, when we looked at the data it actually showed an increase in harm amongst those who got the active treatment, and we ditched it because we weren't expecting it, and we were concerned that the presentation of these data would have an impact on people's understanding of the study findings." So, this is obviously unacceptable. It demonstrates, however, that people are not always conscious of the consequences of not reporting certain sub-findings, or of not reporting the overall study outcome once it is negative or does not meet expectations. So, how does a risk of bias assessment look in practice? Simply explained, it is the four-eyes principle. It is not "I have looked at it twice," as this cartoon says; it is, obviously, that everything has to be done by two people, two reviewers independently assessing each study. If conflicting results come out, you need either consensus or a third person in order to resolve this. It is important that risk of bias is condensed into something which is analyzable, or visualized; figures and tables are very helpful here, and we will see some examples in the next section. It is also worthwhile to consider the data synthesis, for example carrying out sensitivity analyses, which means trying to understand which of the parameters are critically influencing the results and which are not. But this is something we have to leave for more specialized discussions. In 2014, a study published in Environmental Health Perspectives showed us how risk of bias assessments could translate to toxicology. They actually carried out a systematic review of proposals for risk of bias assessment and concluded in the end that the review highlights a number of risk of bias assessment criteria that have been empirically tested in animal research, including randomization, concealment of allocation, blinding, and accounting for all animals. In addition, there is a need for empirically testing additional methodological criteria and assessing the validity and reliability of a standard risk of bias assessment instrument. So, they took a snapshot of where we were in the year 2014 with regard to risk of bias tools for the animal studies which are so typical for toxicological assessments. And it was a systematic review, so you have already seen similar types of graphs before. They identified 3,731 studies which were potentially relevant based on the search criteria. They were able to do a first screening, excluding the large majority of articles based on review of the abstracts, which left them with 88 citations for which a full-text evaluation then took place. These studies were then subjected to the inclusion criteria, excluding studies which reviewed a preexisting instrument only and, I think, those only reporting on the application of an instrument. Two more were included after assessment of the bibliographies of these studies, leaving a total of 30 hits.
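Coming back for a moment to the four-eyes principle and to condensing risk of bias into figures and tables: as a purely illustrative sketch, assuming two reviewers have each filled in per-domain judgments like the ones shown earlier, one could flag their disagreements for consensus and tally the agreed judgments per domain roughly like this; the studies and judgments are invented.

```python
from collections import Counter

# Hypothetical judgments by two independent reviewers: study -> domain -> judgment.
reviewer_1 = {
    "Study_A": {"randomization": "low", "blinding": "high"},
    "Study_B": {"randomization": "unknown", "blinding": "low"},
}
reviewer_2 = {
    "Study_A": {"randomization": "low", "blinding": "unknown"},
    "Study_B": {"randomization": "unknown", "blinding": "low"},
}

# Disagreements to be resolved by consensus or a third reviewer.
disagreements = [
    (study, domain, r1_judgment, reviewer_2[study][domain])
    for study, domains in reviewer_1.items()
    for domain, r1_judgment in domains.items()
    if reviewer_2[study][domain] != r1_judgment
]

# Counts of agreed judgments per domain, the raw material for a summary table or figure.
tally = Counter(
    (domain, judgment)
    for study, domains in reviewer_1.items()
    for domain, judgment in domains.items()
    if reviewer_2[study][domain] == judgment
)

print("needs consensus:", disagreements)
print("per-domain counts:", dict(tally))
```

In a real systematic review one would of course summarize the agreed, consensus judgments, but the principle, two independent assessments, documented disagreements, and a condensed per-domain overview, is the same.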
These 30 studies have been systematically analyzed for what they consider as risks of bias in animal studies in toxicology. I do not expect you to, and you cannot even, read this slide, but what you might be able to see is that a lot of the risks of bias which were identified were not used in all of the different studies. Yes, Y, indicates that it has been considered in a given study, represented by the different lines, and no that it has not. And we see a lot of no's, which means there is not yet consensus about which biases to consider when assessing animal studies. I would not like to leave you with this assessment without telling you that it also initiated some discussions; it was only a first starter for discussion. This is a response published in the same year by some members of the Evidence-based Toxicology Collaboration, which commented on further publications which had not been considered here and which add to the discussion of risks of bias. And you will hear later on that the group actually embarked on a broader assessment of quality assessment tools in the field of toxicology. And you will come now, in the next sections of our lecture, to a variety of tools which help us to assess internal validity, to assess this risk of bias. I am going to talk about three of them in more detail. We will talk first of all about the Cochrane risk of bias tool, which was developed for randomized trials but which is the grandmother of all of these tools, the one to which everybody refers and which is being developed further for other types of uses. I will then talk about SYRCLE's Risk of Bias tool, which was specifically developed for experimental animal studies. And I will talk about the National Toxicology Program's Office of Health Assessment and Translation (OHAT) Risk of Bias tool, which considers both experimental and human studies and which tries to integrate the different evidence streams we are using in toxicology. Noteworthy, there is not yet a risk of bias tool for mechanistic and in vitro studies. So these types of tools will have to be developed, hopefully, with the increasing use and integration of evidence from in vitro studies, cell culture experiments. [MUSIC]