Agreement Between Raters

There are a number of statistics that can be used to assess inter-rater reliability, and different statistics are suited to different types of measurement. Options include the joint probability of agreement, Cohen's kappa, Scott's pi and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient, the intra-class correlation (ICC), and Krippendorff's alpha.

To better understand why the various measures of agreement and association differ across approaches, we conducted a simulation study examining their performance. We generated 1,000 random datasets for each simulation scenario. For each simulated dataset, we generated random effects for 250 subjects and 100 raters from N(0, 5) and N(0, 1) distributions, respectively. Following equation (1), we used the cumulative distribution function of the standard normal to construct the probability that each subject would be assigned to category c, for c = 1, …, 5. Using these probabilities, the classification of each subject's test result was randomly assigned to one of the categories c = 1, …, 5. We simulated seven scenarios that varied in the underlying prevalence of disease, ranging from low prevalence, with 80% of subjects in category 1 and 5% in category 5, to high prevalence, with 5% of subjects in category 1 and 80% in category 5. The prevalence for each of the seven scenarios is shown in Table 4. For each simulated dataset, the following measures of agreement and association were calculated: the average pairwise Cohen's kappa, Fleiss' kappa, and Nelson's model-based approach for agreement; and the average pairwise weighted Cohen's kappa (with quadratic weights), the ICC, and Nelson's model-based approach (with quadratic weights) for association.
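The data-generating step above can be sketched in Python. Since equation (1) is not reproduced in this excerpt, the cutpoints below are illustrative placeholders; the sketch assumes a probit-style model in which the standard normal CDF turns a subject's random effect into category probabilities, as the text describes.

```python
import math
import random

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_ratings(n_subjects=250, n_raters=100,
                     cutpoints=(-1.5, -0.5, 0.5, 1.5), seed=0):
    """Simulate ordinal ratings (categories 1..5) for each subject-rater pair.

    The cutpoints are hypothetical; the actual thresholds come from the
    paper's equation (1), which is not shown in the text.
    """
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, math.sqrt(5.0))  # subject random effect, N(0, 5)
        # P(category <= c) = Phi(t_c - b); successive differences give
        # the probability of each category 1..5
        cum = [phi(t - b) for t in cutpoints] + [1.0]
        probs = [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, 5)]
        # the rater-level N(0, 1) residual is absorbed by the probit link,
        # so each rater's category is drawn directly from these probabilities
        row = rng.choices(range(1, 6), weights=probs, k=n_raters)
        ratings.append(row)
    return ratings
```

Each call returns a 250 × 100 table of categories, one row per subject, from which the agreement statistics can then be computed.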

The Mielke method was not applied because of the large number of raters (J = 100).

If two instruments or techniques are used to measure the same variable on a continuous scale, Bland-Altman plots can be used to estimate agreement. This plot graphs the difference between the two measurements (Y axis) against the mean of the two measurements (X axis). It therefore offers a graphical representation of the bias (the mean difference between the two observers or techniques) together with the 95% limits of agreement, given by the mean difference ± 1.96 × the standard deviation of the differences.

The field in which you work determines the acceptable level of agreement. If it is a sporting competition, you might accept 60% agreement to nominate a winner. However, if you are looking at data from oncologists deciding on a course of treatment, you need much higher agreement, above 90%. In general, anything above 75% is considered acceptable in most fields.

In 1968, Cohen introduced a weighted kappa, the proportion of weighted observed agreement corrected for chance agreement [25]. The general form of the statistic is similar to that of the unweighted version: κ_w = (p_o(w) − p_e(w)) / (1 − p_e(w)), where p_o(w) and p_e(w) are the weighted observed and chance-expected proportions of agreement.

This dataset is considered a classic example for evaluating agreement among several raters, each classifying a sample of subjects' test results on an ordinal scale [34].
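The Bland-Altman bias and 95% limits of agreement described above can be computed in a few lines. This is a minimal sketch; the function name and the sample measurements are ours, chosen for illustration.

```python
import math

def bland_altman_limits(x, y):
    """Bias and 95% limits of agreement for paired continuous measurements.

    Returns (bias, lower, upper), where the limits are
    bias +/- 1.96 * the sample SD of the paired differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# two hypothetical instruments measuring the same quantity
method_a = [10.0, 12.0, 11.0, 13.0]
method_b = [9.0, 11.0, 12.0, 12.0]
bias, lo, hi = bland_altman_limits(method_a, method_b)
# diffs are [1, 1, -1, 1]: bias = 0.5, SD = 1.0, limits = (-1.46, 2.46)
```

Plotting the differences against the pairwise means, with horizontal lines at the bias and the two limits, yields the Bland-Altman plot itself.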

It serves as an ideal dataset because each of the 118 subjects' histological slides is evaluated by each of the seven raters, providing balanced (complete) data with a relatively small number of raters (J = 7). We were able to apply all methods to this optimal dataset (a subset is given in Appendix II). All agreement estimates ranged from 0.127 to 0.366, indicating slight to fair agreement among the seven pathologists [16]. [Table 3] Again, the average pairwise Cohen's kappa and Fleiss' kappa gave comparable estimates (0.366 and 0.354, respectively). The Mielke method gave a much lower estimate of agreement (0.127), and Nelson's model-based approach gave an estimate (0.266) lower than the Cohen and Fleiss kappas but higher than Mielke's, indicating slight agreement among the seven pathologists.
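The average pairwise Cohen's kappa reported above can be sketched as follows: compute the unweighted kappa for every pair of raters, then average over all J(J−1)/2 pairs. The function names are ours, not from the paper.

```python
from itertools import combinations

def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    n = len(r1)
    categories = set(r1) | set(r2)
    # observed proportion of exact agreement
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # chance-expected agreement from each rater's marginal frequencies
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

def average_pairwise_kappa(ratings_by_rater):
    """Mean Cohen's kappa over all rater pairs (J raters -> J*(J-1)/2 pairs)."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(ratings_by_rater, 2)]
    return sum(kappas) / len(kappas)
```

For the pathology data this would be called with seven lists of 118 category labels, one list per rater; Fleiss' kappa, by contrast, is computed once from the full subject-by-category count table rather than pair by pair.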