Goodman Kruskal Lambda

Calculate symmetric and asymmetric Goodman Kruskal lambda and their confidence intervals. Lambda is a measure of proportional reduction in error in cross tabulation analysis. For any sample with a nominal independent variable and a nominal dependent variable (or ones that can be treated nominally), it indicates the extent to which the modal categories and frequencies for each value of the independent variable differ from the overall modal category and frequency, i.e. the one obtained for all values of the independent variable together.

Lambda(x, y = NULL, direction = c("symmetric", "row", "column"), conf.level = NA, ...)

Arguments

x: a numeric vector, a matrix or a table.
y: NULL (default) or a vector with compatible dimensions to x. If y is provided, table(x, y, ...) is calculated.
direction: type of lambda. One of "symmetric" (default), "row" or "column" (abbreviations are allowed). If direction is set to "row", Lambda(R|C) is reported, i.e. the row variable is predicted from the column variable. See details.
conf.level: confidence level for the returned confidence interval, restricted to lie between 0 and 1. With the default NA, no confidence interval is calculated.
...: further arguments are passed to the function table, allowing e.g. to set useNA = c("no", "ifany", "always").

Details

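Asymmetric lambda is interpreted as the probable improvement in predicting the column variable Y given knowledge of the row variable X. This is Lambda(C|R), i.e. direction = "column"; Lambda(R|C), i.e. direction = "row", reverses the roles of the two variables.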
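The nondirectional (symmetric) lambda combines the two asymmetric lambdas, Lambda(C|R) and Lambda(R|C), by pooling their numerators and denominators. It therefore lies between the two, but is in general not their arithmetic mean (in the example below it is 0.2076, whereas the mean of the two asymmetric values is 0.2082). Lambda (asymmetric and symmetric) ranges from 0 to 1.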
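
The calculation described above can be sketched by hand in base R. This is only an illustration under the usual textbook definitions of Goodman Kruskal lambda, not the package's source code; lambda_by_hand() is a hypothetical helper name introduced here. The symmetric value is obtained by pooling the error reductions and baseline errors of both directions.

```r
# Hand calculation of lambda from a contingency table, base R only.
# lambda_by_hand() is a hypothetical helper, not part of any package.
lambda_by_hand <- function(tab) {
  n        <- sum(tab)
  rmax_sum <- sum(apply(tab, 1, max))  # sum of the largest cell in each row
  cmax_sum <- sum(apply(tab, 2, max))  # sum of the largest cell in each column
  max_rsum <- max(rowSums(tab))        # frequency of the modal row category
  max_csum <- max(colSums(tab))        # frequency of the modal column category

  c(
    # Lambda(C|R): predict the column variable from the row variable
    column    = (rmax_sum - max_csum) / (n - max_csum),
    # Lambda(R|C): predict the row variable from the column variable
    row       = (cmax_sum - max_rsum) / (n - max_rsum),
    # symmetric: pooled error reduction over pooled baseline error
    symmetric = (rmax_sum + cmax_sum - max_csum - max_rsum) /
                (2 * n - max_csum - max_rsum)
  )
}

# Same counts as the Goodman Kruskal (1954) table in the Examples section
m <- cbind(c(1768, 946, 115), c(807, 1387, 438),
           c(189, 746, 288), c(47, 53, 16))
res <- lambda_by_hand(m)
print(round(res, 7))
```

For this table the three values should agree with the results of Lambda(m, direction = ...) listed in the Examples section.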

Data can be passed to the function either as a matrix or data.frame in x, or as two numeric vectors x and y. In the latter case table(x, y, ...) is calculated, so NAs are handled the same way as table handles them. Note that tables are by default calculated without NAs, which departs from the package's general rule of not omitting NAs silently. The argument useNA can be passed on to table via the ... argument.
PairApply can be used to calculate pairwise lambdas.
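
The effect of useNA can be seen on the underlying table alone. The following is a minimal base R sketch with made-up vectors x and y (illustrative data, not from any reference); it only shows what table() does with missing values, which, according to the description above, is what the function receives when two vectors are supplied.

```r
# Illustrative vectors with one missing value each (made-up data)
x <- c("a", "b", "a", NA, "b", "a")
y <- c("u", "v", "u", "v", NA, "v")

# Default (useNA = "no"): pairs containing an NA are dropped silently
tab_default <- table(x, y)

# useNA = "ifany": NA is kept as a category of its own
tab_with_na <- table(x, y, useNA = "ifany")

print(tab_default)
print(tab_with_na)
print(c(default = sum(tab_default), with_na = sum(tab_with_na)))
```

Only the four complete pairs enter the default table, while all six observations are counted once NA is kept as a category. Whether treating NA as a category is meaningful for lambda depends on the data at hand.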

Value

If no confidence interval is requested: the estimate as a numeric value.

Otherwise: a named numeric vector with 3 elements

lambda: estimate
lwr.ci: lower limit of the confidence interval
upr.ci: upper limit of the confidence interval

References

Agresti, A. (2002) Categorical Data Analysis. John Wiley & Sons.

Goodman, L. A., Kruskal, W. H. (1979) Measures of Association for Cross Classifications. New York: Springer-Verlag (contains articles appearing in J. Amer. Statist. Assoc. in 1954, 1959, 1963, 1972).
http://www.nssl.noaa.gov/users/brooks/public_html/feda/papers/goodmankruskal1.pdf (might be outdated)

Liebetrau, A. M. (1983) Measures of Association. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-004. Newbury Park, CA: Sage, pp. 17–24.

Author

Andri Signorell <andri@signorell.net> based on code from Antti Arppe <antti.arppe@helsinki.fi>,
Nanina Anderegg (confidence interval symmetric lambda)

Examples

# example from Goodman Kruskal (1954)
m <- as.table(cbind(c(1768,946,115), c(807,1387,438), c(189,746,288), c(47,53,16)))
dimnames(m) <- list(paste("A", 1:3), paste("B", 1:4))
m
#>      B 1  B 2  B 3  B 4
#> A 1 1768  807  189   47
#> A 2  946 1387  746   53
#> A 3  115  438  288   16

# direction default is "symmetric"
Lambda(m)
#> [1] 0.2076188
Lambda(m, conf.level=0.95)
#>    lambda    lwr.ci    upr.ci 
#> 0.2076188 0.1871747 0.2280629 

Lambda(m, direction="row")
#> [1] 0.2241003
Lambda(m, direction="column")
#> [1] 0.1923949