Entropy.Rd
Computes the Shannon entropy and the mutual information of two variables. The entropy quantifies the expected value of the information contained in a vector. The mutual information measures the mutual dependence between two random variables.
Entropy(x, y = NULL, base = 2, ...)
MutInf(x, y, base = 2, ...)
x: a vector or a matrix of numerical or categorical type. If only x is supplied, it will be interpreted as a contingency table.
y: a vector of the same type and dimension as x. If y is not NULL, the entropy of table(x, y, ...) will be calculated.
base: base of the logarithm to be used, defaults to 2.
...: further arguments, passed on to the function table, allowing e.g. useNA to be set.
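For example, the dots can be used to control how table() treats missing values (a small made-up illustration; the vectors here are assumptions, not from the original page):
x <- c("a", "b", NA, "a", "b", "c")
y <- c("b", "a", "b", NA, "b", "c")
Entropy(x, y)                   # NA pairs are silently dropped by table()
Entropy(x, y, useNA = "ifany")  # NA is counted as an additional category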
The Shannon entropy equation provides a way to estimate the average minimum number of bits needed to encode a string of symbols, based on the frequency of the symbols.
It is given by the formula \(H = -\sum_i p_i \log(p_i)\), where \(p_i\) is the
probability of character number \(i\) showing up in a stream of characters of the given "script".
The entropy ranges from 0 to Inf.
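The formula can be checked directly in base R; for a uniform distribution over eight symbols it gives 3 bits, matching the first example below:
p <- rep(1/8, 8)
-sum(p * log2(p))
#> [1] 3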
a numeric value.
Shannon, Claude E. (July/October 1948). A Mathematical Theory of Communication, Bell System Technical Journal 27 (3): 379-423.
Ihara, Shunsuke (1993) Information theory for continuous systems, World Scientific. p. 2. ISBN 978-981-02-0985-8.
See also the package entropy, which implements various estimators of entropy.
Entropy(as.matrix(rep(1/8, 8)))
#> [1] 3
# http://r.789695.n4.nabble.com/entropy-package-how-to-compute-mutual-information-td4385339.html
x <- as.factor(c("a","b","a","c","b","c"))
y <- as.factor(c("b","a","a","c","c","b"))
Entropy(table(x), base=exp(1))
#> [1] 1.098612
Entropy(table(y), base=exp(1))
#> [1] 1.098612
Entropy(x, y, base=exp(1))
#> [1] 1.791759
# Mutual information is H(x) + H(y) - H(x, y)
Entropy(table(x), base=exp(1)) + Entropy(table(y), base=exp(1)) - Entropy(x, y, base=exp(1))
#> [1] 0.4054651
MutInf(x, y, base=exp(1))
#> [1] 0.4054651
Entropy(table(x)) + Entropy(table(y)) - Entropy(x, y)
#> [1] 0.5849625
MutInf(x, y, base=2)
#> [1] 0.5849625
# http://en.wikipedia.org/wiki/Cluster_labeling
tab <- matrix(c(60,10000,200,500000), nrow=2, byrow=TRUE)
MutInf(tab, base=2)
#> [1] 0.0002806552
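The same value follows from the entropy identity used above, applied to the table margins (rowSums() and colSums() give the marginal counts):
Entropy(rowSums(tab)) + Entropy(colSums(tab)) - Entropy(tab)
# same result as MutInf(tab, base = 2) above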
d.frm <- Untable(as.table(tab))
str(d.frm)
#> 'data.frame': 510260 obs. of 2 variables:
#> $ Var1: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
#> $ Var2: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
#> - attr(*, "out.attrs")=List of 2
#> ..$ dim : int [1:2] 2 2
#> ..$ dimnames:List of 2
#> .. ..$ Var1: chr [1:2] "Var1=A" "Var1=B"
#> .. ..$ Var2: chr [1:2] "Var2=A" "Var2=B"
MutInf(d.frm[,1], d.frm[,2])
#> [1] 0.0002806552
table(d.frm[,1], d.frm[,2])
#>
#> A B
#> A 60 10000
#> B 200 500000
MutInf(table(d.frm[,1], d.frm[,2]))
#> [1] 0.0002806552
# Ranking mutual information can help to describe clusters
# (x, grp and tab are placeholders here: a matrix of term indicators,
# a vector of cluster memberships and their contingency table;
# a runnable sketch follows after this block)
#
# r.mi <- MutInf(x, grp)
# attributes(r.mi)$dimnames <- attributes(tab)$dimnames
#
# # calculating ranks of mutual information
# r.mi_r <- apply( -r.mi, 2, rank, na.last=TRUE )
# # show only first 6 ranks
# r.mi_r6 <- ifelse( r.mi_r < 7, r.mi_r, NA)
# attributes(r.mi_r6)$dimnames <- attributes(tab)$dimnames
# r.mi_r6
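A minimal runnable version of the sketch above, on simulated data; the cluster labels, the binary term indicators, and all object names are assumptions made up for this illustration:
set.seed(42)
n   <- 300
grp <- factor(sample(c("clust.A", "clust.B", "clust.C"), n, replace = TRUE))

# six binary term indicators with varying association to the clusters
terms <- sapply(1:6, function(j)
           rbinom(n, 1, c(0.1 * j, 0.5, 0.9 - 0.1 * j)[as.integer(grp)]))
colnames(terms) <- paste0("term", 1:6)

# mutual information of every term with every cluster indicator
r.mi <- sapply(levels(grp), function(g)
          apply(terms, 2, function(tt) MutInf(tt, grp == g)))

# rank terms within each cluster, 1 = most informative
r.mi_r <- apply(-r.mi, 2, rank, na.last = TRUE)
r.mi_r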