Generate Dummy Codes for a Factor

Generate a matrix of dummy codes (class indicators) for a given factor.

Dummy(x, method = c("treatment", "sum", "helmert", "poly", "full"),
      base = 1, levels = NULL)

Arguments

x: a factor or a vector of class labels, one per case.
method: the type of coding to produce. One of "treatment", "sum", "helmert", "poly" or "full"; "treatment" is the default. Abbreviations are accepted.
The option "full" returns the full set of class indicators, i.e. one indicator column for each level of x. Note that this set is redundant in lm() and related model functions, because its columns are perfectly collinear with the intercept.
base: an integer or a level name specifying which level is used as the baseline group.
levels: an optional vector of the values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x.
This argument is passed directly to factor().
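The method names mirror base R's contrast functions (contr.treatment(), contr.sum(), contr.helmert(), contr.poly()), and the "sum" example further down matches contr.sum() row for row. The following is a minimal base-R sketch, assuming each method simply selects, for every case, the row of the corresponding contrast matrix that belongs to its level (with diag() standing in for "full"). dummy_sketch() is a hypothetical name for illustration, not the DescTools implementation.

```r
# Hypothetical helper (assumption, not DescTools code): build dummy codes by
# indexing a base R contrast matrix with the factor's integer codes.
dummy_sketch <- function(x, method = c("treatment", "sum", "helmert", "poly", "full"),
                         base = 1) {
  f <- factor(x)
  k <- nlevels(f)
  # Allow the baseline to be given as a level name, as in Dummy(y, base = "Max")
  if (!is.numeric(base)) base <- match(base, levels(f))
  method <- match.arg(method)
  cm <- switch(method,
    treatment = contr.treatment(k, base = base),
    sum       = contr.sum(k),
    helmert   = contr.helmert(k),
    poly      = contr.poly(k),
    full      = diag(k)
  )
  # Row i of the result is the contrast row for level f[i]
  res <- cm[as.integer(f), , drop = FALSE]
  # Column labels follow the pattern shown in the Examples section
  cols <- if (method == "full") levels(f) else levels(f)[-base]
  dimnames(res) <- list(seq_along(x), cols)
  res
}

x <- c("red", "blue", "green", "blue", "green", "red", "red", "blue")
dummy_sketch(x, method = "sum")
dummy_sketch(x, base = "green")
```

Indexing a small k-by-(k-1) contrast matrix by the factor codes keeps the sketch short and makes the role of base explicit: it only changes which row of contr.treatment() is all zeros.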

Details

To convert dummy codes back into the original class labels, see the approach in the examples below.
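Reverting can also be done without a row-wise apply(). The sketch below assumes treatment or full coding (each row contains at most one 1) and a "base" attribute holding the baseline level, as in the Dummy() output shown in the examples; it does not apply to "sum", "helmert" or "poly" codes. undummy_sketch() is a hypothetical helper, and the matrices are built by hand so the block runs with base R only.

```r
# Hypothetical helper (assumption, not part of DescTools): recover class
# labels from a treatment- or full-coded dummy matrix in a vectorised way.
undummy_sketch <- function(m, base = attr(m, "base")) {
  # Rows without any 1 belong to the baseline level (treatment coding only)
  is_base <- rowSums(m == 1) == 0
  out <- character(nrow(m))
  # max.col() returns, per row, the column holding the single 1
  out[!is_base] <- colnames(m)[max.col(m[!is_base, , drop = FALSE],
                                       ties.method = "first")]
  out[is_base] <- base
  out
}

y <- c("Max", "Max", "Max", "Max", "Max", "Bill", "Bill", "Bill")

# Treatment coding with "Bill" as baseline, built by hand
m <- matrix(as.integer(y == "Max"), ncol = 1, dimnames = list(NULL, "Max"))
attr(m, "base") <- "Bill"
undummy_sketch(m)

# Full coding: one column per level, no baseline
mf <- cbind(Bill = as.integer(y == "Bill"), Max = as.integer(y == "Max"))
undummy_sketch(mf, base = NA)
```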

Value

A matrix of dummy codes. It has one row for each element of x and one column fewer than the number of levels of x (or of the levels supplied in the levels argument). The baseline level is stored in the attribute "base".

With method = "full", the number of columns equals the number of levels, and the "base" attribute is NA.
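The dimensions described above can be checked against base R's model.matrix(), assuming the default options("contrasts") setting (contr.treatment for unordered factors). This sketch also shows why the full indicator set is redundant in lm(): its columns add up to the intercept column in every row.

```r
x <- c("red", "blue", "green", "blue", "green", "red", "red", "blue")
f <- factor(x)

# Treatment coding: drop the intercept column of the default model matrix
treat <- model.matrix(~ f)[, -1, drop = FALSE]
dim(treat)   # 8 2: length(x) rows, nlevels(f) - 1 columns

# Full indicator set: remove the intercept instead of a level
full <- model.matrix(~ f - 1)
dim(full)    # 8 3: length(x) rows, nlevels(f) columns

# Each row of the full set sums to 1, i.e. the columns reproduce the
# intercept, so a model with an intercept cannot use all of them.
all(rowSums(full) == 1)
```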

References

Venables, W. N. and Ripley, B. D. (2002): Modern Applied Statistics with S. Fourth edition. Springer.

Author

Andri Signorell <andri@signorell.net>

Examples

x <- c("red","blue","green","blue","green","red","red","blue")
Dummy(x)
#>   green red
#> 1     0   1
#> 2     0   0
#> 3     1   0
#> 4     0   0
#> 5     1   0
#> 6     0   1
#> 7     0   1
#> 8     0   0
#> attr(,"base")
#> [1] "blue"
Dummy(x, base=2)
#>   blue red
#> 1    0   1
#> 2    1   0
#> 3    0   0
#> 4    1   0
#> 5    0   0
#> 6    0   1
#> 7    0   1
#> 8    1   0
#> attr(,"base")
#> [1] "green"

Dummy(x, method="sum")
#>   green red
#> 1    -1  -1
#> 2     1   0
#> 3     0   1
#> 4     1   0
#> 5     0   1
#> 6    -1  -1
#> 7    -1  -1
#> 8     1   0
#> attr(,"base")
#> [1] "blue"


y <- c("Max","Max","Max","Max","Max","Bill","Bill","Bill")

Dummy(y)
#>   Max
#> 1   1
#> 2   1
#> 3   1
#> 4   1
#> 5   1
#> 6   0
#> 7   0
#> 8   0
#> attr(,"base")
#> [1] "Bill"
Dummy(y, base="Max")
#>   Bill
#> 1    0
#> 2    0
#> 3    0
#> 4    0
#> 5    0
#> 6    1
#> 7    1
#> 8    1
#> attr(,"base")
#> [1] "Max"

Dummy(y, base="Max", method="full")
#>   Bill Max
#> 1    0   1
#> 2    0   1
#> 3    0   1
#> 4    0   1
#> 5    0   1
#> 6    1   0
#> 7    1   0
#> 8    1   0
#> attr(,"base")
#> [1] NA


# "Undummy" (revert the dummy coding)
m <- Dummy(y, method="full")
m
#>   Bill Max
#> 1    0   1
#> 2    0   1
#> 3    0   1
#> 4    0   1
#> 5    0   1
#> 6    1   0
#> 7    1   0
#> 8    1   0
#> attr(,"base")
#> [1] NA
z <- apply(m, 1, function(x) colnames(m)[x==1])
z
#>      1      2      3      4      5      6      7      8 
#>  "Max"  "Max"  "Max"  "Max"  "Max" "Bill" "Bill" "Bill" 
identical(y, as.vector(z))
#> [1] TRUE

m <- Dummy(y)
m
#>   Max
#> 1   1
#> 2   1
#> 3   1
#> 4   1
#> 5   1
#> 6   0
#> 7   0
#> 8   0
#> attr(,"base")
#> [1] "Bill"
# rows without a 1 belong to the baseline level stored in attr(m, "base")
z <- apply(m, 1, function(x) if (sum(x) == 0) attr(m, "base") else colnames(m)[x == 1])
z
#>      1      2      3      4      5      6      7      8 
#>  "Max"  "Max"  "Max"  "Max"  "Max" "Bill" "Bill" "Bill"