Create a table summarizing continuous, categorical and dichotomous variables, optionally stratified by one or more variables, while performing appropriate statistical tests.
a data.frame containing all the variables to be included in the table.
the grouping variable.
logical. If set to TRUE (default), a row with the group sizes will be
inserted as the first row of the table.
a vector of column names for the result table.
a vector of variable names to be placed in the first column instead of the real names.
logical (default TRUE), defines whether the results should also be
displayed for the whole, ungrouped variable.
the character on whose position the strings will be aligned.
Left alignment can be requested by setting align = "\\l", right
alignment by "\\r" and center alignment by "\\c". Mind the
backslashes: if they are omitted, the strings are aligned on the
character l, r or c respectively. The default value is "\\l",
thus left alignment.
the function to be used as location and dispersion measure for
numeric (including integer) variables (mean/sd is the default;
alternatives such as median/IQR are possible by defining a
function). See examples.
a list of functions to be used to test the variables. The elements must be
named "num", "cat" and "dich", and each must be defined as a
function with arguments (x, g) that returns something similar to a
p-value. Use TEST = NA to suppress the tests. (See examples.)
one of "high" (default) or "low", defining
which value of a dichotomous numeric or logical variable should be reported.
Usually this will be 1 or TRUE. Setting it to "low"
will report the lower value 0 or FALSE.
format codes for absolute, numeric and percentage values, and for the p-values of the tests.
a character matrix
In research, the characteristics of study populations are often summarized in some kind of "Table 1", containing descriptive statistics of the variables used, such as mean and standard deviation for continuous variables and proportions for categorical variables. In many cases, groups are also compared within the framework of the scientific question.
Creating such a table can be very time consuming, and there is a need for a
flexible function that helps to solve the task. TOne() is designed
to be easy to use with sensible defaults, yet flexible enough to allow
free definition of the essential design elements.
This is done by breaking down the descriptive task into three types of variables: quantitative (numeric, integer), qualitative (factor, character) and dichotomous variables (the latter having exactly two values or levels). Depending on the variable type, the descriptives and the corresponding tests are chosen. By default, mean and sd are used to describe numeric variables.
FUN = function(x) gettextf("%s / %s",
  Format(mean(x, na.rm = TRUE), digits = 1),
  Format(sd(x, na.rm = TRUE), digits = 3))
Their difference is tested with the Kruskal-Wallis test. For categorical
variables the absolute and relative frequencies are calculated and tested
with a chi-square test.
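The three-way split described above can be sketched in base R. The helper var_type() is hypothetical and not part of DescTools; it only illustrates the rule "exactly two values or levels means dichotomous" and is not a claim about how TOne() decides internally.

```r
# Hypothetical helper (not a DescTools function): classify a column as
# numeric ("num"), categorical ("cat") or dichotomous ("dich").
var_type <- function(x) {
  n_distinct <- length(unique(x[!is.na(x)]))  # distinct non-missing values
  if (n_distinct == 2) {
    "dich"            # exactly two values or levels
  } else if (is.numeric(x)) {
    "num"             # numeric or integer
  } else {
    "cat"             # factor, character, ...
  }
}

# Classify every column of two built-in data sets
sapply(iris, var_type)
sapply(mtcars[, c("mpg", "cyl", "am")], var_type)
```

Under this rule a numeric 0/1 column such as mtcars$am counts as dichotomous, while a numeric column with three distinct values such as mtcars$cyl stays numeric.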
The tests can be changed with the argument TEST. They must be organised
as a list containing elements named "num", "cat" and "dich". Each element
supplies a function with arguments (x, g), returning something similar to a
p-value.
TEST = list(
  num  = list(fun = function(x, g){summary(aov(x ~ g))[[1]][1, "Pr(>F)"]},
              lbl = "ANOVA"),
  cat  = list(fun = function(x, g){chisq.test(table(x, g))$p.val},
              lbl = "Chi-Square test"),
  dich = list(fun = function(x, g){fisher.test(table(x, g))$p.val},
              lbl = "Fisher exact test"))
The legend text of the test, which is appended to the table together with
the significance codes, can be set with the element lbl.
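As a further illustration of the (x, g) contract, the following sketch builds an alternative test list from base R functions only. The choice of tests and the names welch_anova, chisq_sim, fisher_p and my_tests are assumptions made for this example, not DescTools defaults. Each function can be checked on its own before it is handed to TOne().

```r
# Each test takes the variable x and the grouping g and returns a p-value.
# Welch's one-way test (oneway.test() defaults to var.equal = FALSE)
welch_anova <- function(x, g) oneway.test(x ~ g)$p.value
# Chi-square test with a simulated p-value, which avoids the
# "approximation may be incorrect" warning for sparse tables
chisq_sim <- function(x, g) {
  chisq.test(table(x, g), simulate.p.value = TRUE, B = 2000)$p.value
}
fisher_p <- function(x, g) fisher.test(table(x, g))$p.value

my_tests <- list(
  num  = list(fun = welch_anova, lbl = "Welch one-way test"),
  cat  = list(fun = chisq_sim,   lbl = "Chi-Square test (simulated p-value)"),
  dich = list(fun = fisher_p,    lbl = "Fisher exact test")
)

# Standalone checks on built-in data
p_num  <- my_tests$num$fun(iris$Sepal.Length, iris$Species)
p_cat  <- my_tests$cat$fun(mtcars$cyl, mtcars$gear)
p_dich <- my_tests$dich$fun(mtcars$am, mtcars$vs)
c(num = p_num, cat = p_cat, dich = p_dich)

# With DescTools attached, the list would be passed as: TOne(..., TEST = my_tests)
```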
Great importance was attached to the free definition of the number formats.
By default, the optionally definable format templates of DescTools are used.
Deviations from this can be freely passed as arguments to the function.
Formats can be defined for integers, floating point numbers, percentages
and for the p-values of statistical tests. All options of the function
Format() are available and can be provided as a list.
See examples which show several different implementations.
fmt = list(abs = Fmt("abs"), num = Fmt("num"), per = Fmt("per"), pval =
as.fmt(fmt = "*", na.form = " "))
The function returns a character matrix as result, which can easily be
subset or combined with other matrices. An interface for ToWrd() is
available such that the matrix can be transferred to MS-Word. Both font
and alignment are freely selectable in the Word table.
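Because the result is a character matrix, ordinary matrix operations apply. The sketch below uses two small hand-made matrices as stand-ins for TOne() results; the cell values are invented purely for illustration. It is an assumption worth checking that attributes of a real result, such as the legend, survive these operations.

```r
# Stand-ins for TOne() results: plain character matrices with made-up values
t_a <- matrix(c("age",       "50.1 (9.8)",
                "sex (= f)", "12 (40.0%)"),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("var", "total")))
t_b <- matrix(c("bmi", "24.3 (3.1)"),
              ncol = 2,
              dimnames = list(NULL, c("var", "total")))

# Stack two tables that share the same columns
combined <- rbind(t_a, t_b)

# Keep only selected rows, addressed by their label in the first column
numeric_rows <- combined[combined[, "var"] %in% c("age", "bmi"), , drop = FALSE]
numeric_rows
```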
WrdTable(), ToWrd.TOne()
options(scipen = 8)
opt <- DescToolsOptions()
# define some special formats for count data, percentages and numeric results
# (those will be supported by TOne)
Fmt(abs = as.fmt(digits = 0, big.mark = "'")) # counts
#> $abs
#> Description: Number format for counts
#> Definition: digits=0, big.mark="'"
#> Example: 314'159
#>
#> $per
#> Description: Percentage number format
#> Definition: digits=1, fmt='%'
#> Example: 31415926.5%
#>
#> $num
#> Description: Number format for floats
#> Definition: digits=3, big.mark="'"
#> Example: 314'159.265
#>
#> $nob
#> Description: Number format
#> Definition: digits=5, na.form='nodat'
#> Example: 314159.26536
#>
Fmt(per = as.fmt(digits = 1, fmt = "%")) # percentages
#> $abs
#> Description: Number format
#> Definition: digits=0, big.mark="'"
#> Example: 314'159
#>
#> $per
#> Description: Percentage number format
#> Definition: digits=1, fmt='%'
#> Example: 31415926.5%
#>
#> $num
#> Description: Number format for floats
#> Definition: digits=3, big.mark="'"
#> Example: 314'159.265
#>
#> $nob
#> Description: Number format
#> Definition: digits=5, na.form='nodat'
#> Example: 314159.26536
#>
Fmt(num = as.fmt(digits = 1, big.mark = "'")) # numeric
#> $abs
#> Description: Number format
#> Definition: digits=0, big.mark="'"
#> Example: 314'159
#>
#> $per
#> Description: Number format
#> Definition: digits=1, fmt='%'
#> Example: 31415926.5%
#>
#> $num
#> Description: Number format for floats
#> Definition: digits=3, big.mark="'"
#> Example: 314'159.265
#>
#> $nob
#> Description: Number format
#> Definition: digits=5, na.form='nodat'
#> Example: 314159.26536
#>
TOne(x = d.pizza[, c("temperature", "delivery_min", "driver", "wine_ordered")],
grp = d.pizza$quality)
#>
#> var total low medium high
#> n 1'008 156 (15.5%) 356 (35.3%) 496 (49.2%)
#> temperature 47.9 (9.9) 32.9 (7.8) 45.6 (7.4) 53.6 (6.5) *** '
#> delivery_min 25.7 (10.8) 33.9 (11.7) 26.5 (10.1) 22.6 (9.5) *** '
#> driver *** ""
#> Butcher 79 (8.0%) 10 (6.5%) 36 (10.1%) 33 (6.7%)
#> Carpenter 225 (22.6%) 59 (38.1%) 90 (25.4%) 76 (15.4%)
#> Carter 196 (19.4%) 11 (7.1%) 72 (20.3%) 113 (22.9%)
#> Farmer 94 (9.7%) 10 (6.5%) 26 (7.3%) 58 (11.7%)
#> Hunter 130 (13.0%) 8 (5.2%) 43 (12.1%) 79 (16.0%)
#> Miller 109 (10.4%) 16 (10.3%) 35 (9.9%) 58 (11.7%)
#> Taylor 171 (16.9%) 41 (26.5%) 53 (14.9%) 77 (15.6%)
#> wine_ordered (= 1) 161 (16.1%) 32 (20.8%) 63 (17.9%) 66 (13.4%) . ""
#> ---
#> ') Kruskal-Wallis test, ") Fisher exact test, "") Chi-Square test
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
# the same but no groups now...
TOne(x = d.pizza[, c("temperature", "delivery_min", "driver", "wine_ordered")])
#>
#> var total
#> n 1'209
#> temperature 47.9 (9.9)
#> delivery_min 25.7 (10.8)
#> driver
#> Butcher 96 (8.0%)
#> Carpenter 272 (22.6%)
#> Carter 234 (19.4%)
#> Farmer 117 (9.7%)
#> Hunter 156 (13.0%)
#> Miller 125 (10.4%)
#> Taylor 204 (16.9%)
#> wine_ordered (= 1) 187 (15.6%)
#>
# define median/IQR as describing functions for the numeric variables
TOne(iris[, -5], iris[, 5],
FUN = function(x) {
gettextf("%s / %s",
Format(median(x, na.rm = TRUE), digits = 1),
Format(IQR(x, na.rm = TRUE), digits = 3))
}
)
#>
#> var total setosa versicolor virginica
#> n 150 50 (33.3%) 50 (33.3%) 50 (33.3%)
#> Sepal.Length 5.8 / 1.300 5.0 / 0.400 5.9 / 0.700 6.5 / 0.675 *** '
#> Sepal.Width 3.0 / 0.500 3.4 / 0.475 2.8 / 0.475 3.0 / 0.375 *** '
#> Petal.Length 4.3 / 3.500 1.5 / 0.175 4.3 / 0.600 5.5 / 0.775 *** '
#> Petal.Width 1.3 / 1.500 0.2 / 0.100 1.3 / 0.300 2.0 / 0.500 *** '
#> ---
#> ') Kruskal-Wallis test, ") Fisher exact test, "") Chi-Square test
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
# replace kruskal.test by ANOVA and report the p.value
# Change tests for all the types
TOne(x = iris[, -5], grp = iris[, 5],
FUN = function(x) gettextf("%s / %s",
Format(mean(x, na.rm = TRUE), digits = 1),
Format(sd(x, na.rm = TRUE), digits = 3)),
TEST = list(
num = list(fun = function(x, g){summary(aov(x ~ g))[[1]][1, "Pr(>F)"]},
lbl = "ANOVA"),
cat = list(fun = function(x, g){chisq.test(table(x, g))$p.val},
lbl = "Chi-Square test"),
dich = list(fun = function(x, g){fisher.test(table(x, g))$p.val},
lbl = "Fisher exact test")),
fmt = list(abs = Fmt("abs"), num = Fmt("num"), per = Fmt("per"),
pval = as.fmt(fmt = "*", na.form = " "))
)
#>
#> var total setosa versicolor virginica
#> n 150 50 (33.3%) 50 (33.3%) 50 (33.3%)
#> Sepal.Length 5.8 / 0.828 5.0 / 0.352 5.9 / 0.516 6.6 / 0.636 *** '
#> Sepal.Width 3.1 / 0.436 3.4 / 0.379 2.8 / 0.314 3.0 / 0.322 *** '
#> Petal.Length 3.8 / 1.765 1.5 / 0.174 4.3 / 0.470 5.6 / 0.552 *** '
#> Petal.Width 1.2 / 0.762 0.2 / 0.105 1.3 / 0.198 2.0 / 0.275 *** '
#> ---
#> ') ANOVA, ") Fisher exact test, "") Chi-Square test
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
t1 <- TOne(x = d.pizza[,c("temperature", "driver", "rabate")],
grp = d.pizza$area,
align = " ",
total = FALSE,
FUN = function(x) gettextf("%s / %s (%s)",
Format(mean(x, na.rm = TRUE), digits = 1),
Format(sd(x, na.rm = TRUE), digits = 3),
Format(median(x, na.rm = TRUE), digits = 1)),
TEST = NA,
fmt = list(abs = as.fmt(big.mark = " ", digits=0),
num = as.fmt(big.mark = " ", digits=1),
per = as.fmt(fmt=function(x)
StrPad(Format(x, fmt="%", d=1), width=5, adj = "r")),
pval = as.fmt(fmt = "*", na.form = " "))
)
# add a user-defined legend
attr(t1, "legend") <- "numeric: mean / sd (median)), factor: n (n%)"
t1
#>
#> var Brent Camden Westminster
#> n 474 (39.5%) 344 (28.7%) 381 (31.8%)
#> temperature 51.1 / 8.734 (53.4) 47.4 / 10.111 (50.3) 44.3 / 9.836 (45.9)
#> driver
#> Butcher 72 (15.2%) 1 ( 0.3%) 22 ( 5.8%)
#> Carpenter 29 ( 6.1%) 19 ( 5.6%) 221 (58.2%)
#> Carter 177 (37.4%) 47 (13.8%) 5 ( 1.3%)
#> Farmer 19 ( 4.0%) 87 (25.5%) 11 ( 2.9%)
#> Hunter 128 (27.1%) 4 ( 1.2%) 24 ( 6.3%)
#> Miller 6 ( 1.3%) 41 (12.0%) 77 (20.3%)
#> Taylor 42 ( 8.9%) 142 (41.6%) 20 ( 5.3%)
#> rabate (= TRUE) 235 (50.3%) 172 (50.3%) 184 (48.7%)
#> ---
#> numeric: mean / sd (median)), factor: n (n%)
#>
# dichotomous integer or logical values can be reported by the high or low value
x <- sample(x = c(0, 1), size = 100, prob = c(0.3, 0.7), replace = TRUE)
y <- sample(x = c(0, 1), size = 100, prob = c(0.3, 0.7), replace = TRUE) == 1
z <- factor(sample(x = c(0, 1), size = 100, prob = c(0.3, 0.7), replace = TRUE))
g <- sample(x = letters[1:4], size = 100, replace = TRUE)
d.set <- data.frame(x = x, y = y, z = z, g = g)
TOne(d.set[1:3], d.set$g, intref = "low")
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#>
#> var total a b c d
#> n 100 23 (23.0%) 23 (23.0%) 34 (34.0%) 20 (20.0%)
#> x (= 0) 23 (23.0%) 7 (30.4%) 3 (13.0%) 11 (32.4%) 2 (10.0%) ""
#> y (= FALSE) 23 (23.0%) 6 (26.1%) 2 (8.7%) 10 (29.4%) 5 (25.0%) ""
#> z (= 0) 24 (24.0%) 6 (26.1%) 5 (21.7%) 11 (32.4%) 2 (10.0%) ""
#> ---
#> ') Kruskal-Wallis test, ") Fisher exact test, "") Chi-Square test
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
TOne(d.set[1:3], d.set$g, intref = "high")
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#>
#> var total a b c d
#> n 100 23 (23.0%) 23 (23.0%) 34 (34.0%) 20 (20.0%)
#> x (= 1) 77 (77.0%) 16 (69.6%) 20 (87.0%) 23 (67.6%) 18 (90.0%) ""
#> y (= TRUE) 77 (77.0%) 17 (73.9%) 21 (91.3%) 24 (70.6%) 15 (75.0%) ""
#> z (= 1) 76 (76.0%) 17 (73.9%) 18 (78.3%) 23 (67.6%) 18 (90.0%) ""
#> ---
#> ') Kruskal-Wallis test, ") Fisher exact test, "") Chi-Square test
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
# intref does not affect factors; use relevel() to change the reported level
TOne(data.frame(z = relevel(z, "1")), g)
#> Warning: Chi-squared approximation may be incorrect
#>
#> var total a b c d
#> n 100 23 (23.0%) 23 (23.0%) 34 (34.0%) 20 (20.0%)
#> z (= 0) 24 (24.0%) 6 (26.1%) 5 (21.7%) 11 (32.4%) 2 (10.0%) ""
#> ---
#> ') Kruskal-Wallis test, ") Fisher exact test, "") Chi-Square test
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
TOne(data.frame(z = z), g)
#> Warning: Chi-squared approximation may be incorrect
#>
#> var total a b c d
#> n 100 23 (23.0%) 23 (23.0%) 34 (34.0%) 20 (20.0%)
#> z (= 1) 76 (76.0%) 17 (73.9%) 18 (78.3%) 23 (67.6%) 18 (90.0%) ""
#> ---
#> ') Kruskal-Wallis test, ") Fisher exact test, "") Chi-Square test
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
options(opt)
if (FALSE) { # \dontrun{
# Send the whole stuff to Word
wrd <- GetNewWrd()
ToWrd(
TOne(x = d.pizza[, c("temperature", "delivery_min", "driver", "wine_ordered")],
grp = d.pizza$quality,
fmt = list(num=Fmt("num", digits=1))
),
font = list(name="Arial narrow", size=8),
align = c("l","r") # this will be recycled: left-right-left-right ...
)
} # }