`GTest.Rd`

`GTest` performs likelihood ratio chi-squared tests (G-tests) for contingency tables and for goodness of fit.

```
GTest(x, y = NULL, correct = c("none", "williams", "yates"),
      p = rep(1/length(x), length(x)), rescale.p = FALSE)
```

- x
a numeric vector or matrix. `x` and `y` can also both be factors.

- y
a numeric vector; ignored if `x` is a matrix. If `x` is a factor, `y` should be a factor of the same length.

- correct
one of `"none"` (default), `"williams"`, `"yates"`. See Details.

- p
a vector of probabilities of the same length as `x`. An error is given if any entry of `p` is negative.

- rescale.p
a logical scalar; if `TRUE` then `p` is rescaled (if necessary) to sum to 1. If `rescale.p` is `FALSE` and `p` does not sum to 1, an error is given.

The G-test, also called the likelihood ratio test, is asymptotically equivalent to the Pearson chi-squared test but is not usually used when analyzing 2x2 tables. It is used in logistic regression and loglinear modeling, which involve contingency tables. The G-test is also reported in the standard summary of `Desc` for tables.
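As a rough illustration of the likelihood ratio statistic, the following base-R sketch computes G = 2 Σ O log(O / E) for a two-way table without calling `GTest`. The helper name `g_independence` is hypothetical and not part of any package; the sketch assumes all observed counts are positive and applies no correction.

```r
# Hypothetical helper (not part of DescTools): uncorrected G statistic for a
# two-way table. Assumes every observed count is positive.
g_independence <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  # Expected counts under independence: (row total * column total) / n
  expected <- outer(rowSums(tab), colSums(tab)) / n
  g <- 2 * sum(tab * log(tab / expected))
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  list(statistic = g, parameter = df,
       p.value = pchisq(g, df, lower.tail = FALSE),
       expected = expected)
}

M <- rbind(c(762, 327, 468), c(484, 239, 477))
res <- g_independence(M)
round(res$statistic, 3)
```

For this table the statistic should agree with the value G = 30.017 reported in the example output at the end of this page.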

If `x` is a matrix with one row or column, or if `x` is a vector and `y` is not given, then a *goodness-of-fit test* is performed (`x` is treated as a one-dimensional contingency table). The entries of `x` must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in `p`, or are all equal if `p` is not given.
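The goodness-of-fit case can be sketched in base R in the same way. The helper `g_gof` below is hypothetical (not part of DescTools) and omits any correction; it only illustrates how the expected counts follow from `p`.

```r
# Hypothetical helper (not part of DescTools): uncorrected goodness-of-fit G.
g_gof <- function(x, p = rep(1 / length(x), length(x))) {
  stopifnot(all(x >= 0), all(p >= 0), isTRUE(all.equal(sum(p), 1)))
  expected <- sum(x) * p          # expected counts under the null
  pos <- x > 0                    # a zero count contributes 0 to the sum
  g <- 2 * sum(x[pos] * log(x[pos] / expected[pos]))
  df <- length(x) - 1
  c(G = g, df = df, p.value = pchisq(g, df, lower.tail = FALSE))
}

gof <- g_gof(c(A = 20, B = 15, C = 25))   # p defaults to equal probabilities
round(gof, 4)
```

With equal probabilities the expected count is 20 in each category, and the result should match the `GTest(x)` output shown in the examples (G = 2.5267, df = 2, p-value = 0.2827).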

If `x` is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of `x` must be non-negative integers. Otherwise, `x` and `y` must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. The G-test is then performed on the null hypothesis that the joint distribution of the cell counts in a two-dimensional contingency table is the product of the row and column marginals.
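The preprocessing described for vector or factor input can be sketched in base R as follows. This mirrors the description above rather than the function's actual source, and the data are made up for illustration.

```r
# Hypothetical raw data with one missing value in each vector
x <- c("a", "b", "a", NA, "b", "a")
y <- c("u", "v", "v", "u", NA, "u")

ok <- complete.cases(x, y)                   # drop cases with a missing value
tab <- table(factor(x[ok]), factor(y[ok]))   # coerce to factors, cross-tabulate
tab
```

Only the four complete cases enter the table; a table built this way is the kind of two-dimensional input the independence test works on.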

For the test of independence, Yates' correction is taken from Mike Camann's 2x2 G-test function. For the goodness-of-fit test, Yates' correction is applied as described in Zar (2000).
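The `"williams"` option is not spelled out here. For orientation only, the sketch below computes Williams' correction factor q for an r x c table in the textbook form (as given, for example, in Sokal and Rohlf); the adjusted statistic is G / q. That `GTest` uses exactly this expression is an assumption, and the helper name `williams_q` is hypothetical.

```r
# Hypothetical helper: textbook Williams' correction factor for an r x c table.
# Assumption: this is shown for illustration and is not taken from GTest itself.
williams_q <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  1 + (n * sum(1 / rowSums(tab)) - 1) * (n * sum(1 / colSums(tab)) - 1) /
    (6 * n * (r - 1) * (k - 1))
}

M <- rbind(c(762, 327, 468), c(484, 239, 477))
q <- williams_q(M)
q   # slightly above 1, so G / q is slightly smaller than G
```

For large samples q is very close to 1, so the correction matters mainly when counts are small.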

A list with class `"htest"` containing the following components:

- statistic
the value of the chi-squared test statistic.

- parameter
the degrees of freedom of the approximate chi-squared distribution of the test statistic, `NA` if the p-value is computed by Monte Carlo simulation.

- p.value
the p-value for the test.

- method
a character string indicating the type of test performed, and whether Monte Carlo simulation or continuity correction was used.

- data.name
a character string giving the name(s) of the data.

- observed
the observed counts.

- expected
the expected counts under the null hypothesis.
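Because the result is an ordinary `"htest"` list, its components can be read with `$`. The sketch below uses `chisq.test()` from base R as a stand-in, since it returns an object of the same class with the same component names; the same access pattern should apply to a `GTest` result.

```r
# Stand-in: chisq.test() from base R also returns an "htest" object with
# statistic, parameter, p.value, observed and expected components.
M <- rbind(c(762, 327, 468), c(484, 239, 477))
res <- chisq.test(M)

class(res)                      # "htest"
stat <- unname(res$statistic)   # test statistic
df   <- unname(res$parameter)   # degrees of freedom

# Recompute the p-value from the statistic and the degrees of freedom
p_manual <- pchisq(stat, df, lower.tail = FALSE)
all.equal(p_manual, res$p.value)
```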

Hope, A. C. A. (1968)
A simplified Monte Carlo significance test procedure.
*J. Roy. Statist. Soc. B* **30**, 582–598.

Patefield, W. M. (1981)
Algorithm AS159. An efficient method of generating r x c tables
with given row and column totals.
*Applied Statistics* **30**, 91–97.

Agresti, A. (2007)
*An Introduction to Categorical Data Analysis, 2nd ed.*,
New York: John Wiley & Sons.
Page 38.

Sokal, R. R., F. J. Rohlf (2012) *Biometry: the principles and practice of statistics in biological research*. 4th edition. W. H. Freeman and Co.: New York. 937 pp.

```
library(DescTools)

## From Agresti (2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))
(Xsq <- GTest(M)) # Prints test summary
#>
#> Log likelihood ratio (G-test) test of independence without correction
#>
#> data: M
#> G = 30.017, X-squared df = 2, p-value = 0.0000003034
#>
Xsq$observed # observed counts (same as M)
#> party
#> gender Democrat Independent Republican
#> M 762 327 468
#> F 484 239 477
Xsq$expected # expected counts under the null
#> Democrat Independent Republican
#> M 703.6714 319.6453 533.6834
#> F 542.3286 246.3547 411.3166
## Testing for population probabilities
## Case A. Tabulated data
x <- c(A = 20, B = 15, C = 25)
GTest(x)
#>
#> Log likelihood ratio (G-test) goodness of fit test
#>
#> data: x
#> G = 2.5267, X-squared df = 2, p-value = 0.2827
#>
GTest(as.table(x)) # the same
#>
#> Log likelihood ratio (G-test) goodness of fit test
#>
#> data: as.table(x)
#> G = 2.5267, X-squared df = 2, p-value = 0.2827
#>
x <- c(89, 37, 30, 28, 2)
p <- c(40, 20, 20, 15, 5)
try(
  GTest(x, p = p)   # gives an error: p does not sum to 1
)
#> Error in GTest(x, p = p) : probabilities must sum to 1.
# works once p sums to 1
p <- c(0.40, 0.20, 0.20, 0.19, 0.01)
# The expected count in category 5 is 1.86 < 5, so the chi-squared
# approximation may be doubtful, but the test still runs.
GTest(x, p = p)
#>
#> Log likelihood ratio (G-test) goodness of fit test
#>
#> data: x
#> G = 5.8414, X-squared df = 4, p-value = 0.2113
#>
## Case B. Raw data
x <- trunc(5 * runif(100))
GTest(table(x)) # NOT 'GTest(x)'!
#>
#> Log likelihood ratio (G-test) goodness of fit test
#>
#> data: table(x)
#> G = 5.2361, X-squared df = 4, p-value = 0.2639
#>
```