Intraclass Correlations (ICC1, ICC2, ICC3 from Shrout and Fleiss)

The intraclass correlation is used as a measure of association when studying the reliability of raters. Shrout and Fleiss (1979) outline six different estimates that depend on the particular experimental design. All six are implemented, with optional confidence limits.

ICC(x, type = c("all", "ICC1", "ICC2", "ICC3", "ICC1k", "ICC2k", "ICC3k"),
    conf.level = NA, na.rm = FALSE)

# S3 method for class 'ICC'
print(x, digits = 3, ...)

Arguments

x: \(n \times m\) matrix or data frame, with n subjects (in rows) and m raters (in columns).
type: one of "all", "ICC1", "ICC2", "ICC3", "ICC1k", "ICC2k", "ICC3k". See details.
conf.level: confidence level of the interval. If set to NA (which is the default) no confidence intervals will be calculated.
na.rm: logical, indicating whether NA values should be stripped before the computation proceeds. If set to TRUE only the complete cases of the ratings will be used. Defaults to FALSE.
digits: number of digits to use in printing
...: further arguments to be passed to or from methods.

Details

Shrout and Fleiss (1979) consider six cases of reliability of ratings done by k raters on n targets.

ICC1	Each target is rated by a different judge and the judges are selected at random.
	(This is a one-way ANOVA random effects model and is found by (MSB - MSW)/(MSB + (nr-1)*MSW))
ICC2	A random sample of k judges rate each target. The measure is one of absolute agreement
	in the ratings. Found as (MSB - MSE)/(MSB + (nr-1)*MSE + nr*(MSJ-MSE)/nc)
ICC3	A fixed set of k judges rate each target. There is no generalization to a larger population
	of judges. Found as (MSB - MSE)/(MSB + (nr-1)*MSE)

Here MSB is the mean square between targets, MSW the mean square within targets, MSJ the mean square between judges, MSE the residual mean square, nr the number of judges (k) and nc the number of targets (n).
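
As a rough check on the formulas above, the mean squares can be computed by hand in base R. This is only a sketch and not the function's internal code; the variable names (`n`, `k`, `SSB`, `icc1`, ...) are illustrative choices, with `k` playing the role of nr and `n` the role of nc.

```r
# Ratings from Shrout and Fleiss (1979): 6 targets (rows), 4 judges (columns)
sf <- matrix(c(
   9, 2, 5, 8,
   6, 1, 3, 2,
   8, 4, 6, 8,
   7, 1, 2, 6,
  10, 5, 6, 9,
   6, 2, 4, 7),
  ncol = 4, byrow = TRUE)

n <- nrow(sf)      # number of targets (nc in the formulas)
k <- ncol(sf)      # number of judges  (nr in the formulas)
grand <- mean(sf)  # grand mean of all ratings

# Sums of squares for the two-way layout without replication
SSB <- k * sum((rowMeans(sf) - grand)^2)  # between targets
SSJ <- n * sum((colMeans(sf) - grand)^2)  # between judges
SST <- sum((sf - grand)^2)                # total
SSW <- SST - SSB                          # within targets
SSE <- SSW - SSJ                          # residual

# Mean squares
MSB <- SSB / (n - 1)
MSJ <- SSJ / (k - 1)
MSW <- SSW / (n * (k - 1))
MSE <- SSE / ((n - 1) * (k - 1))

# Single-rating coefficients as given in the list above
icc1 <- (MSB - MSW) / (MSB + (k - 1) * MSW)
icc2 <- (MSB - MSE) / (MSB + (k - 1) * MSE + k * (MSJ - MSE) / n)
icc3 <- (MSB - MSE) / (MSB + (k - 1) * MSE)

# Roughly 0.166, 0.290 and 0.715, in line with the est column of ICC(sf)
round(c(ICC1 = icc1, ICC2 = icc2, ICC3 = icc3), 3)
```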

Then, for each of these cases, is reliability to be estimated for a single rating or for the average of k ratings? (The single-rating case is equivalent to the average intercorrelation, the k-rating case to the Spearman-Brown adjusted reliability.)
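
The Spearman-Brown relationship can be sketched as follows. The single-rating values are copied (rounded) from the example output of ICC(sf) on this page, and `spearman_brown` is a hypothetical helper defined only for this illustration.

```r
# Single-rating ICCs for the Shrout and Fleiss (1979) data,
# copied (rounded to 3 digits) from the example output of ICC(sf)
single <- c(ICC1 = 0.166, ICC2 = 0.290, ICC3 = 0.715)
k <- 4  # number of judges

# Hypothetical helper: Spearman-Brown step-up, giving the reliability
# of the mean of k ratings from the reliability r of a single rating
spearman_brown <- function(r, k) k * r / (1 + (k - 1) * r)

# Roughly 0.443, 0.620 and 0.909, close to the ICC1k, ICC2k and ICC3k rows
averaged <- spearman_brown(single, k)
round(averaged, 3)
```

Because the inputs are rounded, the stepped-up values may differ from the printed k-rater coefficients in the last digit.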

ICC1 is sensitive to differences in means between raters and is a measure of absolute agreement.

ICC2 and ICC3 remove mean differences between judges, but are sensitive to interactions of judges by targets.
The difference between ICC2 and ICC3 is whether raters are seen as fixed or random effects.
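
To illustrate the sensitivity to rater means described above, the sketch below adds a constant to one judge's ratings and recomputes ICC1 and ICC3 from the mean squares. `icc_single` is a hypothetical helper written for this example, not part of the package, and the shift of 5 points is arbitrary.

```r
# Ratings from Shrout and Fleiss (1979): 6 targets (rows), 4 judges (columns)
sf <- matrix(c(
   9, 2, 5, 8,
   6, 1, 3, 2,
   8, 4, 6, 8,
   7, 1, 2, 6,
  10, 5, 6, 9,
   6, 2, 4, 7),
  ncol = 4, byrow = TRUE)

# Hypothetical helper: single-rating ICC1 and ICC3 from the mean squares
icc_single <- function(x) {
  n <- nrow(x); k <- ncol(x); grand <- mean(x)
  SSB <- k * sum((rowMeans(x) - grand)^2)   # between targets
  SSJ <- n * sum((colMeans(x) - grand)^2)   # between judges
  SSW <- sum((x - grand)^2) - SSB           # within targets
  MSB <- SSB / (n - 1)
  MSW <- SSW / (n * (k - 1))
  MSE <- (SSW - SSJ) / ((n - 1) * (k - 1))  # residual
  c(ICC1 = (MSB - MSW) / (MSB + (k - 1) * MSW),
    ICC3 = (MSB - MSE) / (MSB + (k - 1) * MSE))
}

# Judge 1 now scores every target 5 points higher (arbitrary shift)
shifted <- sf
shifted[, 1] <- shifted[, 1] + 5

before <- icc_single(sf)
after  <- icc_single(shifted)

# ICC3 is unchanged by the shift, while ICC1 drops
rbind(before, after)
```

The shift leaves the between-target and residual mean squares untouched and only inflates the between-judge variation, which enters ICC1 through MSW but does not enter ICC3.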

ICC1k, ICC2k, ICC3k reflect the means of k raters.

The intraclass correlation is used if raters are all of the same "class", that is, if there is no logical way of distinguishing them. Examples include correlations between pairs of twins and correlations between raters. If the variables are logically distinguishable (e.g., different items on a test), then the more typical coefficient is based upon the inter-class correlation (e.g., a Pearson r) and a statistic such as alpha or omega might be used.

Value

If type is set to "all", then the result will be

results: A matrix of 6 rows and 8 columns, including the ICCs, F test, p values, and confidence limits
summary: The anova summary table
stats: The anova statistics
MSW: Mean Square Within based upon the anova
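
The F-val, df1, df2 and p-val columns in the example output on this page are consistent with F ratios of the between-target mean square against MSW (for ICC1 and ICC1k) or against MSE (for the other four coefficients). The following base R sketch assumes that construction; it is an illustration and not the function's own code.

```r
# Ratings from Shrout and Fleiss (1979): 6 targets (rows), 4 judges (columns)
sf <- matrix(c(
   9, 2, 5, 8,
   6, 1, 3, 2,
   8, 4, 6, 8,
   7, 1, 2, 6,
  10, 5, 6, 9,
   6, 2, 4, 7),
  ncol = 4, byrow = TRUE)

n <- nrow(sf); k <- ncol(sf); grand <- mean(sf)

SSB <- k * sum((rowMeans(sf) - grand)^2)  # between targets
SSJ <- n * sum((colMeans(sf) - grand)^2)  # between judges
SSW <- sum((sf - grand)^2) - SSB          # within targets

MSB <- SSB / (n - 1)
MSW <- SSW / (n * (k - 1))
MSE <- (SSW - SSJ) / ((n - 1) * (k - 1))

# One-way case (assumed to underlie the ICC1 and ICC1k rows)
F_oneway <- MSB / MSW
p_oneway <- pf(F_oneway, n - 1, n * (k - 1), lower.tail = FALSE)

# Two-way case (assumed to underlie the ICC2, ICC3, ICC2k and ICC3k rows)
F_twoway <- MSB / MSE
p_twoway <- pf(F_twoway, n - 1, (n - 1) * (k - 1), lower.tail = FALSE)

data.frame(F   = c(F_oneway, F_twoway),
           df1 = n - 1,
           df2 = c(n * (k - 1), (n - 1) * (k - 1)),
           p   = c(p_oneway, p_twoway),
           row.names = c("one-way", "two-way"))
```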

If a specific type has been selected and no confidence intervals are requested (conf.level = NA), the result will be the estimate as a single numeric value.

Otherwise, the result will be a named numeric vector with 3 elements:

ICCx: estimate (name is the selected type of coefficient)
lwr.ci: lower confidence limit
upr.ci: upper confidence limit
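
A short usage sketch of the two return shapes for a specific type. It assumes that the ICC function documented here is exported by the DescTools package, which this page does not state explicitly, so the package name is an assumption and the code is guarded accordingly.

```r
# Assumption: the ICC() documented on this page is provided by DescTools.
if (requireNamespace("DescTools", quietly = TRUE)) {
  # Ratings from Shrout and Fleiss (1979): 6 targets (rows), 4 judges (columns)
  sf <- matrix(c(
     9, 2, 5, 8,
     6, 1, 3, 2,
     8, 4, 6, 8,
     7, 1, 2, 6,
    10, 5, 6, 9,
     6, 2, 4, 7),
    ncol = 4, byrow = TRUE)

  # Specific type, default conf.level = NA: the estimate only
  est <- DescTools::ICC(sf, type = "ICC2")
  print(est)

  # Specific type with a 95% confidence level: estimate, lwr.ci, upr.ci
  est_ci <- DescTools::ICC(sf, type = "ICC2", conf.level = 0.95)
  print(est_ci)
}
```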

References

Shrout, P. E., Fleiss, J. L. (1979) Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.

McGraw, K. O., Wong, S. P. (1996) Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30-46. + errata on page 390.

Revelle, W. (in prep) An introduction to psychometric theory with applications in R. Springer. (Working draft available at http://personality-project.org/r/book/)

Author

William Revelle <revelle@northwestern.edu>, some editorial amendments Andri Signorell <andri@signorell.net>

Note

The results for the lower and upper bounds for ICC(2,k) do not match those of SPSS 9 or 10, but do match the definitions of Shrout and Fleiss. SPSS seems to have been using the formula in McGraw and Wong, but not the errata on p. 390. This seems to have been fixed in more recent releases (15).

Examples

sf <- matrix(c(
      9, 2, 5, 8,
      6, 1, 3, 2,
      8, 4, 6, 8,
      7, 1, 2, 6,
      10,5, 6, 9,
      6, 2, 4, 7),
      ncol=4, byrow=TRUE,
      dimnames=list(paste("S", 1:6, sep=""), paste("J", 1:4, sep=""))
)

sf  #example from Shrout and Fleiss (1979)
#>    J1 J2 J3 J4
#> S1  9  2  5  8
#> S2  6  1  3  2
#> S3  8  4  6  8
#> S4  7  1  2  6
#> S5 10  5  6  9
#> S6  6  2  4  7
ICC(sf)
#> 
#> Intraclass correlation coefficients 
#>                          type   est F-val df1 df2    p-val lwr.ci upr.ci
#> Single_raters_absolute   ICC1 0.166  1.79   5  18 0.164769     NA     NA
#> Single_random_raters     ICC2 0.290 11.03   5  15 0.000135     NA     NA
#> Single_fixed_raters      ICC3 0.715 11.03   5  15 0.000135     NA     NA
#> Average_raters_absolute ICC1k 0.443  1.79   5  18 0.164769     NA     NA
#> Average_random_raters   ICC2k 0.620 11.03   5  15 0.000135     NA     NA
#> Average_fixed_raters    ICC3k 0.909 11.03   5  15 0.000135     NA     NA
#> 
#>  Number of subjects = 6     Number of raters = 4