I'm chatting now with Gordon Guyatt. He is a Distinguished University Professor in the Departments of Health Research Methods, Evidence, and Impact and of Medicine at McMaster University in Hamilton, Ontario, Canada. Dr. Guyatt played a major role in the development of GRADE (Grading of Recommendations Assessment, Development and Evaluation), an approach to assessing the quality of evidence and the strength of recommendations, among other landmarks in evidence-based medicine. Dr. Guyatt, can you share with us the main motivations for the development of GRADE in the year 2000?
>> So, approximately a decade, maybe two, before GRADE started meeting in 2000, the organizations that develop clinical practice guidelines had decided that it might be a good idea to have a formal rating of quality of evidence and strength of recommendations. Once one organization got this idea, lots of organizations got it, and that was probably a wise idea. But it had a downside: every organization developed its own system. There were literally dozens of systems for rating quality of evidence and strength of recommendations, and all of them were different. Some were more sensible and some less sensible, but there were so many that no clinician could possibly make sense of it. It was a very confusing environment. So a group of people got together, guideline developers, systematic review authors, and methodologists, and they asked: "Do you think it would be possible for us to develop a system so well thought out, so coherent, and reasonably easy to use that everybody would buy in and it would replace these dozens of systems? If that were possible, and it was a good system, it would clearly make things much easier to understand for users of guidelines, and it would advance the whole science of guideline methodology." So that was the motivation.
>> One challenge I like to give students is to see whether they can find a domain of quality of evidence that GRADE does not cover. Once you decided it was possible to arrive at this single tool, how did the work proceed that gave us such a complete tool for grading the certainty of evidence?
>> Well, the first task the GRADE Working Group set itself was to consider all the issues relevant to the quality, or certainty, of evidence. The first thing we addressed was study design. One of the goals was to make the system as simple as possible, so we said, all right, we're only going to have two types of study design: randomized trials and observational studies. That's it. And we decided there would be four levels of quality of evidence: high, moderate, low, and very low. Randomized trials would start as high and observational studies would start as low. Then we asked what broad categories, or domains, might make us lower the quality of the evidence, and we eventually ended up with five domains: risk of bias, imprecision, inconsistency, indirectness, and publication bias. Then we thought: sometimes observational studies can lead to high-quality evidence, so what would make us rate up the quality of evidence from observational studies? We came up with two key domains: a large or very large treatment effect, and a dose-response gradient. And how did it happen that we ended up with a comprehensive approach? The answer is that we met for four years before we came up with it. A group of us met, typically three times a year, at different places in the world, and interacted in between, so there were many... oh! And we went through literally several hundred systematic reviews, looking at them and asking how the system works when we apply it. So the answer to why it was comprehensive in the end is that it took a long time, it had a lot of smart people working on it, and we worked through many, many examples to make sure the system was working properly.
>> So it's not luck, it's just work!
>> Yeah, that's right: a lot of thought and a lot of hard work!
>> That's amazing! Do you think systematic reviews of interventions adopt GRADE to inform the quality of evidence as they should?
>> Well, you know, our goal would be for every systematic review ever published to use GRADE, and we're a long way from that. But it's a very big deal that the Cochrane Collaboration asks all its reviewers to use GRADE. It's a huge deal to have the leading guideline, sorry, the leading systematic review organization in the world say you've got to be using GRADE. We now have over 110 organizations in the world using GRADE, and they include very prestigious organizations like the World Health Organization. I've mentioned the Cochrane Collaboration. There's a leading electronic textbook that clinicians use worldwide, called UpToDate, which has over 10,000 graded recommendations, and its competitor, DynaMed, is also using GRADE. Some major guideline organizations, like the Scottish Intercollegiate Guidelines Network, or SIGN, switched from their prior system to GRADE. Leading American organizations like the American College of Physicians and the American Thoracic Society are using it. So the bottom line is that the goal of having every systematic review group and every guideline group in the world using GRADE is actually in sight, and I think very few people now would even raise the question of whether GRADE is the preeminent approach. I told you how carefully it was thought out in the first place, and we now have six papers in the BMJ to help users of GRADE understand it, plus a series in the Journal of Clinical Epidemiology that is so far up to 22 papers describing the details. This goes way beyond what anybody else who looks at quality of evidence or strength of recommendations has produced, so we have this very detailed guidance. There's no question that it's the preeminent system. With more than 110 organizations, including Cochrane, we consider ourselves a pretty good success, and I think gradually everybody will be using it.
>> Yeah, it's like a wave: since those major players are on board, eventually everybody will be. In your experience, what are the main difficulties authors or researchers have using GRADE? Which domain do you believe is the most difficult to assess?
>> Well, the most difficult domain for everybody, including the most expert people, is publication bias, because publication bias deals with what is there but you don't know is there. It's difficult to look for something that might be there when you don't know it's there. So, you know, our wording for the other domains is "no serious concerns", "serious concerns", or "very serious concerns"; for publication bias we say "undetected"...
>> Yeah, or "highly suspected".
>> Yeah, that's right, "undetected" or "highly suspected". So it's different wording, and it acknowledges that publication bias is the most challenging; you're absolutely right. In the end, each of the domains has its challenges. Risk of bias is relatively straightforward, but it still has the challenge of how bad the risk of bias has to be, and how many studies have to be affected, before you rate down. For imprecision: how wide does the confidence interval have to be before you rate down? And so on. So there are challenges in all the domains, but publication bias is the most difficult.
>> Okay!
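The rating logic described in this conversation (two study designs with different starting levels, five domains for rating down, two for rating up, and the "no serious / serious / very serious concerns" wording) can be sketched as a small function. This is a minimal, hypothetical illustration under simplifying assumptions, not an official GRADE tool: the name `rate_certainty`, the 0/1/2 encoding of the concern levels, and the rule that only observational evidence is rated up are our own choices. GRADE guidance also describes a third reason for rating up (plausible residual confounding), which is not modeled here, and real GRADE ratings remain structured judgments rather than arithmetic.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Certainty(IntEnum):
    """The four GRADE levels of quality (certainty) of evidence."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# The five domains for rating down named in the interview.
DOWNGRADE_DOMAINS = (
    "risk_of_bias",
    "imprecision",
    "inconsistency",
    "indirectness",
    "publication_bias",
)

# Starting level by design: randomized trials start high, observational low.
STARTING_LEVEL = {
    "randomized": Certainty.HIGH,
    "observational": Certainty.LOW,
}


def rate_certainty(
    design: str,
    downgrades: Optional[Mapping[str, int]] = None,
    large_effect: int = 0,
    dose_response: bool = False,
) -> Certainty:
    """Hypothetical helper: combine per-domain judgments into one level.

    downgrades maps a domain to 0 (no serious concerns), 1 (serious
    concerns) or 2 (very serious concerns). large_effect is 0 (none),
    1 (large) or 2 (very large). Both encodings are assumptions of
    this sketch, not part of any official GRADE software.
    """
    if design not in STARTING_LEVEL:
        raise ValueError(f"unknown study design: {design!r}")
    downgrades = dict(downgrades or {})
    for domain, steps in downgrades.items():
        if domain not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        if steps not in (0, 1, 2):
            raise ValueError(f"{domain}: steps must be 0, 1 or 2")
    if large_effect not in (0, 1, 2):
        raise ValueError("large_effect must be 0, 1 or 2")

    # Rate down one step per "serious" and two per "very serious" concern.
    level = int(STARTING_LEVEL[design]) - sum(downgrades.values())
    # Simplification: only observational evidence is rated up here.
    if design == "observational":
        level += large_effect + (1 if dose_response else 0)
    # Clamp the result to the four-level scale.
    return Certainty(max(Certainty.VERY_LOW, min(Certainty.HIGH, level)))
```

For example, `rate_certainty("randomized", {"risk_of_bias": 1, "imprecision": 1})` returns `Certainty.LOW`, and `rate_certainty("observational", large_effect=2)` returns `Certainty.HIGH`. Using an `IntEnum` keeps the levels ordered, so the clamp at the end is plain integer comparison.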
>> Since, as we were discussing, it is a method that involves judgment, the same body of evidence may receive divergent gradings in different systematic reviews, for example. What are your views about this, and how should we critically assess the use of GRADE itself?
>> Well, invariably, in any system of this sort judgment is involved, and there are going to be close calls. There will be situations where it might be perfectly reasonable to call it high or moderate, or moderate or low. Indeed, one of the things GRADE has thought about is changing our four categories to seven: in other words, high, high/moderate, moderate, moderate/low, low, low/very low, and very low. Because when you think about it, quality of evidence is a continuum, and our thresholds, our cuts between high and moderate, moderate and low, and low and very low, are somewhat arbitrary. Sometimes it's really close, and it's perfectly reasonable to say high and perfectly reasonable to say moderate. So that's one reason for disagreement that is inevitable when there are close calls, and you can have close calls in each domain: under risk of bias, imprecision, and so on.
So it's no surprise that there's going to be disagreement, totally legitimate disagreement, whenever there's a close call. And then you have some people who frankly don't really know what they're doing when they apply GRADE, and that might be another reason for disagreement. But any time there's judgment... One of the points I make about GRADE is this: GRADE does not mean that there's a right answer, and GRADE does not mean that everybody's going to agree. What it means is that we have an explicit and transparent approach to making the judgments, so that everybody's on the same page: you're speaking the same language and making your decisions according to the same criteria. It's very reasonable that people will disagree, but they're making decisions on the same criteria. That's the great merit of the approach.
>> Yeah, you're right! Do you want to highlight any current or planned development of GRADE that you find relevant to evidence-based public policy implementation and other fields of decision-making in public health?
>> Yes. Well, GRADE continues to work, and as a matter of fact the success of the system has attracted a lot of people who want to apply it to new areas or to their particular area. We call them GRADE project groups, and there are close to 20 GRADE project groups working now. There are still lots of challenges.
So when GRADE started, it was just about therapy: treatment one versus treatment two, or one screening test or screening procedure versus another. It was really focused on therapeutic decisions. Then we found that the GRADE criteria worked quite well for diagnosis, both for diagnostic impact (what's the impact of using test one versus test two?) and for diagnostic accuracy. We also found that the system works well for prognosis, and prognosis is one of the GRADE areas I'm working in now. We published one paper in the BMJ about prognosis for broad populations, and we've now produced GRADE guidance, which we're trying to get published, for individual risk factors: you know, age is a risk factor for something, so what's our quality of evidence about that? And we're now just starting to work on quality of evidence for prognostic systems, for clinical decision rules. So that is one area of work on GRADE. There's still work going on with GRADE for diagnostic tests. People are exploring GRADE for public health, since GRADE really started with a clinical focus, so people are thinking about how to apply it there. People are also thinking about how to apply GRADE to animal studies, which is very different: how can the principles be applied to animal studies? Another area I'm working on, where I think we have some potentially very exciting developments, is network meta-analysis. Network meta-analysis only came around a decade ago; we published our first paper about GRADE for network meta-analysis in 2014, and because it's new, we are of course learning all the time. We've already published updated GRADE guidance for network meta-analysis, which we think improves it, and another couple of papers on specific issues. We are now working on what I think might be a breakthrough: how to present the summary that comes out of a network meta-analysis in a way that makes sense to clinicians and is logically coherent, because at the moment we don't.
The phrase we use is "we think we may have cracked the nut", and we hope to publish something about this in the next few months.
>> Certainly you will help a lot of people!
>> I'm quite excited about that; I think it could make a big difference.
>> Yes, you're right! So what would be your tips for researchers who develop systematic reviews, use GRADE, and are interested in rating the quality, or certainty, of evidence?
>> Well, if you want to do it right, you need to read the Journal of Clinical Epidemiology series very carefully. The first 13 articles are for systematic reviewers, so, systematic reviewers, if you want to do it right, you have to read those articles carefully. The next two, 14 and 15, are for guideline panels moving from certainty, or quality, of evidence to recommendations. Then we're up to 22 now, and there will be more, on very specific issues like diagnosis and so on. You can look at those, but the really important ones for systematic reviews are the first 13, and then those two more. And the other thing is... oh! Join the GRADE Working Group. Anybody can join; I think there are about a thousand members of the GRADE Working Group. And I think many of us are open to questions, so if you're working with GRADE on a systematic review or a guideline and you're stuck, you can get in touch with some of the leaders of the GRADE Working Group and say, "Hey, could you give us some help?" I don't mind getting such emails; I'm interested in helping people out. If you have a methodological issue, we're interested in methodological issues, and if you have a question about GRADE, we can help sort it out.
>> So thank you, Gordon, for sharing such a rich discussion with us! Bye!
>> Bye-bye!