CohenKappa.Rd
Computes the agreement rates Cohen's kappa and weighted kappa, together with their confidence intervals.
CohenKappa(x, y = NULL, weights = c("Unweighted", "Equal-Spacing", "Fleiss-Cohen"),
conf.level = NA, ...)
x: can either be a numeric vector or a confusion matrix. In the latter case x must be a square matrix.

y: NULL (default) or a vector with compatible dimensions to x. If y is provided, table(x, y, ...) is calculated. In order to get a square matrix, x and y are coerced to factors with synchronized levels. (Note that the vector interface cannot be used together with weights.)

weights: either one of "Unweighted" (default), "Equal-Spacing" or "Fleiss-Cohen", which will calculate the weights accordingly, or a user-specified matrix having the same dimensions as x containing the weights for each cell.

conf.level: confidence level of the interval. If set to NA (the default), no confidence intervals will be calculated.

...: further arguments are passed to the function table, allowing e.g. to set useNA. This refers only to the vector interface.
Cohen's kappa is the diagonal sum of the (possibly weighted) relative frequencies, corrected for expected values and standardized by its maximum value.
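Written out, with p_o denoting the observed (weighted) proportion of agreement and p_e the proportion expected by chance from the marginal totals, this is $$\kappa = \frac{p_o - p_e}{1 - p_e}$$ so that 1 indicates perfect agreement and 0 agreement no better than chance.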
The equal-spacing weights (see Cicchetti and Allison 1971) are defined by $$1 - \frac{|i - j|}{r - 1},$$ r being the number of columns/rows, and the Fleiss-Cohen weights by $$1 - \frac{(i - j)^2}{(r - 1)^2}.$$ The latter attaches greater importance to closer disagreements.
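For instance, the Fleiss-Cohen weights for r = 4 categories can be generated directly from this formula (a minimal sketch using outer()):

r <- 4
1 - outer(1:r, 1:r, function(i, j) (i - j)^2) / (r - 1)^2
#>           [,1]      [,2]      [,3]      [,4]
#> [1,] 1.0000000 0.8888889 0.5555556 0.0000000
#> [2,] 0.8888889 1.0000000 0.8888889 0.5555556
#> [3,] 0.5555556 0.8888889 1.0000000 0.8888889
#> [4,] 0.0000000 0.5555556 0.8888889 1.0000000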
Data can be passed to the function either as matrix or data.frame in x, or as two numeric vectors x and y. In the latter case table(x, y, ...) is calculated, and so NAs are handled the same way as table does. Note that tables are by default calculated without NAs; the argument useNA can be passed via the ... argument.
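As a small sketch with made-up data, forwarding useNA lets NA count as a rating category of its own:

x <- c("a", "b", NA, "a")   # illustrative data, not from the examples below
y <- c("a", NA, "b", "a")
CohenKappa(x, y, useNA="ifany")
#> [1] 0.2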
The vector interface (x, y) is only supported for the calculation of unweighted kappa, because a confusion table for two factors with different levels cannot be constructed safely in a way that is independent of the order of the levels in x and y; weights might therefore lead to inconsistent results. The function will raise an error in this case.
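Should weighted kappa nevertheless be needed for two raw vectors x and y, one workaround (a sketch; the category labels lvl are hypothetical and their order is fixed by hand) is to build the square confusion table explicitly and use the matrix interface:

lvl <- c("low", "mid", "high")   # hypothetical category labels in a fixed order
tab <- table(factor(x, levels=lvl), factor(y, levels=lvl))
CohenKappa(tab, weights="Equal-Spacing")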
if no confidence intervals are requested: the estimate as numeric value

else a named numeric vector with 3 elements:

kappa: the estimate
lwr.ci: lower bound of the confidence interval
upr.ci: upper bound of the confidence interval
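For instance (a small sketch, assuming the square confusion matrix m defined in the examples below), the elements can be picked out by name:

ck <- CohenKappa(m, conf.level=0.95)
ck["lwr.ci"]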
Cohen, J. (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Everitt, B.S. (1968) Moments of the statistics kappa and weighted kappa. The British Journal of Mathematical and Statistical Psychology, 21, 97-103.
Fleiss, J.L., Cohen, J., Everitt, B.S. (1969) Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323-327.
Cicchetti, D.V., Allison, T. (1971) A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11, 101-109.
# from Bortz et al. (1990) Verteilungsfreie Methoden in der Biostatistik, Springer, p. 459
m <- matrix(c(53, 5, 2,
11, 14, 5,
1, 6, 3), nrow=3, byrow=TRUE,
dimnames = list(rater1 = c("V","N","P"), rater2 = c("V","N","P")) )
# confusion matrix interface
CohenKappa(m, weights="Unweighted")
#> [1] 0.4285714
# vector interface
x <- Untable(m)
CohenKappa(x$rater1, x$rater2, weights="Unweighted")
#> [1] 0.4285714
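# consistent with the Details above, weights are refused by the vector interface;
# the following call (commented out) would raise an error:
# CohenKappa(x$rater1, x$rater2, weights="Equal-Spacing")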
# pairwise Kappa
rating <- data.frame(
rtr1 = c(4,2,2,5,2, 1,3,1,1,5, 1,1,2,1,2, 3,1,1,2,1, 5,2,2,1,1, 2,1,2,1,5),
rtr2 = c(4,2,3,5,2, 1,3,1,1,5, 4,2,2,4,2, 3,1,1,2,3, 5,4,2,1,4, 2,1,2,3,5),
rtr3 = c(4,2,3,5,2, 3,3,3,4,5, 4,4,2,4,4, 3,1,1,4,3, 5,4,4,4,4, 2,1,4,3,5),
rtr4 = c(4,5,3,5,4, 3,3,3,4,5, 4,4,3,4,4, 3,4,1,4,5, 5,4,5,4,4, 2,1,4,3,5),
rtr5 = c(4,5,3,5,4, 3,5,3,4,5, 4,4,3,4,4, 3,5,1,4,5, 5,4,5,4,4, 2,5,4,3,5),
rtr6 = c(4,5,5,5,4, 3,5,4,4,5, 4,4,3,4,5, 5,5,2,4,5, 5,4,5,4,5, 4,5,4,3,5)
)
PairApply(rating, FUN=CohenKappa, symmetric=TRUE)
#> rtr1 rtr2 rtr3 rtr4 rtr5 rtr6
#> rtr1 1.00000000 0.6511628 0.3838254 0.2583436 0.1881919 0.08088235
#> rtr2 0.65116279 1.0000000 0.6311475 0.4392523 0.3633952 0.17105263
#> rtr3 0.38382542 0.6311475 1.0000000 0.7260274 0.6401799 0.33333333
#> rtr4 0.25834363 0.4392523 0.7260274 1.0000000 0.8569157 0.51923077
#> rtr5 0.18819188 0.3633952 0.6401799 0.8569157 1.0000000 0.64824121
#> rtr6 0.08088235 0.1710526 0.3333333 0.5192308 0.6482412 1.00000000
# Weighted Kappa
cats <- c("<10%", "11-20%", "21-30%", "31-40%", "41-50%", ">50%")
m <- matrix(c(5,8,1,2,4,2, 3,5,3,5,5,0, 1,2,6,11,2,1,
0,1,5,4,3,3, 0,0,1,2,5,2, 0,0,1,2,1,4), nrow=6, byrow=TRUE,
dimnames = list(rater1 = cats, rater2 = cats) )
CohenKappa(m, weights="Equal-Spacing")
#> [1] 0.3156685
# supply an explicit weight matrix
ncol(m)
#> [1] 6
(wm <- outer(1:ncol(m), 1:ncol(m), function(x, y) {
1 - ((abs(x-y)) / (ncol(m)-1)) } ))
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1.0 0.8 0.6 0.4 0.2 0.0
#> [2,] 0.8 1.0 0.8 0.6 0.4 0.2
#> [3,] 0.6 0.8 1.0 0.8 0.6 0.4
#> [4,] 0.4 0.6 0.8 1.0 0.8 0.6
#> [5,] 0.2 0.4 0.6 0.8 1.0 0.8
#> [6,] 0.0 0.2 0.4 0.6 0.8 1.0
CohenKappa(m, weights=wm, conf.level=0.95)
#> kappa lwr.ci upr.ci
#> 0.3156685 0.1968117 0.4345252
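# the explicit weight matrix reproduces the equal-spacing estimate above,
# now with a confidence interval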
# however, Fleiss, Cohen and Everitt weight similarities
fleiss <- matrix(c(
106, 10, 4,
22, 28, 10,
2, 12, 6
), ncol=3, byrow=TRUE)
# Fleiss weights the similarities
weights <- matrix(c(
1.0000, 0.0000, 0.4444,
0.0000, 1.0000, 0.6666,
0.4444, 0.6666, 1.0000
), ncol=3)
CohenKappa(fleiss, weights=weights)
#> [1] 0.5070508
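# as a quick sanity check, this value can be reproduced by hand from the
# definition of weighted kappa (a base-R sketch):
n  <- sum(fleiss)
po <- sum(weights * fleiss) / n                                    # observed weighted agreement
pe <- sum(weights * outer(rowSums(fleiss), colSums(fleiss))) / n^2 # chance-expected weighted agreement
(po - pe) / (1 - pe)
#> [1] 0.5070508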