Frequency Table for a Single Variable

Calculates the absolute and relative frequencies of a vector x. Continuous (numeric) variables are cut using the same logic as the function hist; categorical variables are tabulated with table. The result contains single and cumulative frequencies, both as absolute counts and as percentages.

Freq(x, breaks = hist(x, plot = FALSE)$breaks, include.lowest = TRUE,
     ord = c("level", "desc", "asc", "name"),
     useNA = c("no", "ifany", "always"), ...)

# S3 method for class 'Freq'
print(x, digits = NULL, ...)

Arguments

x: the variable to be described; can be of any atomic type.
breaks: either a numeric vector of two or more cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut. Default taken from the function hist(). This is ignored if x is not of numeric type.
include.lowest: logical, indicating if an x[i] equal to the lowest (or highest, for right = FALSE) "breaks" value should be included. Ignored if x is not of numeric type.
ord: how the result should be ordered. Default is "level"; the other choices are by frequency ("desc" or "asc") or by the name of the levels ("name"). The argument can be abbreviated. This is ignored if x is numeric.
useNA: one of "no", "ifany", "always". Defines whether to include extra NA levels in the table. Defaults to "no", which is also the table() default.
digits: integer, determining the number of digits used to format the relative frequencies.
...: further arguments are passed to the function cut(). Use dig.lab to control the format of numeric group names. Use the argument right to define if the intervals should be closed on the right (and open on the left) or vice versa.
In print.Freq the dots are not used.
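
Because the dots are handed on to cut(), the effect of right and dig.lab can be previewed with base R alone. The following is a minimal sketch that calls cut() directly rather than Freq(); the vectors x and brk are made-up illustration data.

```r
# Illustration data (hypothetical, not taken from d.pizza)
x   <- c(5, 10, 15, 20)
brk <- c(0, 10, 20, 30)

# Default (right = TRUE): intervals are closed on the right,
# so the value 10 falls into (0,10]
right_closed <- cut(x, breaks = brk)

# right = FALSE: intervals are closed on the left,
# so the value 10 falls into [10,20)
left_closed <- cut(x, breaks = brk, right = FALSE)

as.character(right_closed[2])
as.character(left_closed[2])

# dig.lab sets the number of significant digits used for the labels;
# raising it avoids scientific notation for larger break values
wide_labels <- levels(cut(1500, breaks = c(0, 1000, 2000), dig.lab = 4))
wide_labels
```

According to the argument description above, the same arguments given in a Freq() call are forwarded to cut() in this way.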

Details

By default only the valid cases are considered for the frequencies, that is, NA values are excluded. (This matches the default behavior of the R function table, which seemed a reasonable reference.) To include the NAs, set the useNA argument to either "ifany" or "always".
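
The three useNA settings can be compared on the base function table(), which this help page names as the reference for the default. This is a small sketch on a made-up vector v; it does not call Freq() itself.

```r
# Hypothetical illustration vector with one missing value
v <- c("a", "b", NA, "a")

# useNA = "no" (the default): the NA is dropped, 3 cases are counted
t_no <- table(v)

# useNA = "ifany": an NA cell is added because v contains an NA
t_ifany <- table(v, useNA = "ifany")

# useNA = "always": an NA cell is reported even when no NA is present
t_always <- table(c("a", "b"), useNA = "always")

t_no
t_ifany
t_always
```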

For numeric variables, if breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created that cover the single value.) See cut.
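
The break construction described above can be reproduced by hand. The sketch below uses base R only and made-up values for x and k; it follows the rule as stated in this section (equal-length pieces, outer limits moved out by 0.1% of the range) and then checks that cut() assigns the same groups.

```r
# Hypothetical data and number of intervals
x <- c(1, 5, 10)
k <- 3

rx <- range(x)   # smallest and largest value
dx <- diff(rx)   # width of the data range

# k intervals of equal length need k + 1 break points
brk <- seq(rx[1], rx[2], length.out = k + 1)

# move the two outer limits away by 0.1% of the range
brk[c(1, k + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
brk

# grouping by the hand-made breaks and by a single number of breaks
by_hand   <- as.integer(cut(x, breaks = brk))
by_number <- as.integer(cut(x, breaks = k))
```

Both extreme values, 1 and 10, lie strictly inside the outer limits, so neither is lost at an open interval boundary.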

Value

an object of class "Freq", which is essentially a data.frame with 5 columns (the class provides a dedicated print method), containing the following components:

level: factor. The levels of the grouping variable.
freq: integer. The absolute frequencies.
perc: numeric. The relative frequencies (percent).
cumfreq: integer. The cumulative sum of the absolute frequencies.
cumperc: numeric. The cumulative sum of the relative frequencies.
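
How the five columns relate to one another can be sketched with base R, using the same counts as the last example in the Examples section. This is an illustration, not the implementation of Freq(): the name res is hypothetical, and storing perc as a proportion (formatted as a percentage only when printed) is an assumption.

```r
# Counts from the Examples section; as.table() names the cells A, B, C, D
tab  <- as.table(c(2, 4, 12, 8))
freq <- as.vector(tab)

# relative frequencies, kept here as proportions (assumption)
perc <- freq / sum(freq)

res <- data.frame(
  level   = factor(names(tab)),
  freq    = freq,
  perc    = perc,
  cumfreq = cumsum(freq),   # running total of the counts
  cumperc = cumsum(perc)    # running total of the proportions
)
res
```

The last row of cumfreq equals the number of cases counted and the last row of cumperc equals 1, that is, 100%.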

Author

Andri Signorell <andri@signorell.net>

Examples

data(d.pizza)

# result is a data.frame
d.freq <- Freq(d.pizza$price)
d.freq
#>         level  freq   perc  cumfreq  cumperc
#> 1      [0,10]     4   0.3%        4     0.3%
#> 2     (10,20]    96   8.0%      100     8.4%
#> 3     (20,30]   183  15.3%      283    23.6%
#> 4     (30,40]   147  12.3%      430    35.9%
#> 5     (40,50]   263  22.0%      693    57.9%
#> 6     (50,60]   169  14.1%      862    72.0%
#> 7     (60,70]   119   9.9%      981    82.0%
#> 8     (70,80]   109   9.1%    1'090    91.1%
#> 9     (80,90]    68   5.7%    1'158    96.7%
#> 10   (90,100]    22   1.8%    1'180    98.6%
#> 11  (100,110]     7   0.6%    1'187    99.2%
#> 12  (110,120]     6   0.5%    1'193    99.7%
#> 13  (120,130]     3   0.3%    1'196    99.9%
#> 14  (130,140]     1   0.1%    1'197   100.0%

# by default the percent values are printed with one decimal place, as above;
# the number of digits can be set in the print method
print(d.freq, digits=5)
#>         level  freq       perc  cumfreq     cumperc
#> 1      [0,10]     4   0.33417%        4    0.33417%
#> 2     (10,20]    96   8.02005%      100    8.35422%
#> 3     (20,30]   183  15.28822%      283   23.64244%
#> 4     (30,40]   147  12.28070%      430   35.92314%
#> 5     (40,50]   263  21.97160%      693   57.89474%
#> 6     (50,60]   169  14.11863%      862   72.01337%
#> 7     (60,70]   119   9.94152%      981   81.95489%
#> 8     (70,80]   109   9.10610%    1'090   91.06099%
#> 9     (80,90]    68   5.68087%    1'158   96.74185%
#> 10   (90,100]    22   1.83793%    1'180   98.57978%
#> 11  (100,110]     7   0.58480%    1'187   99.16458%
#> 12  (110,120]     6   0.50125%    1'193   99.66583%
#> 13  (120,130]     3   0.25063%    1'196   99.91646%
#> 14  (130,140]     1   0.08354%    1'197  100.00000%

# sorted by frequency
Freq(d.pizza$driver, ord="desc")
#>        level  freq   perc  cumfreq  cumperc
#> 1  Carpenter   272  22.6%      272    22.6%
#> 2     Carter   234  19.4%      506    42.0%
#> 3     Taylor   204  16.9%      710    59.0%
#> 4     Hunter   156  13.0%      866    71.9%
#> 5     Miller   125  10.4%      991    82.3%
#> 6     Farmer   117   9.7%    1'108    92.0%
#> 7    Butcher    96   8.0%    1'204   100.0%

# sorted by name using all the observations, that is, including NAs
Freq(d.pizza$driver, ord="name", useNA="ifany")
#>        level  freq   perc  cumfreq  cumperc
#> 1    Butcher    96   7.9%       96     7.9%
#> 2  Carpenter   272  22.5%      368    30.4%
#> 3     Carter   234  19.4%      602    49.8%
#> 4     Farmer   117   9.7%      719    59.5%
#> 5     Hunter   156  12.9%      875    72.4%
#> 6     Miller   125  10.3%    1'000    82.7%
#> 7     Taylor   204  16.9%    1'204    99.6%
#> 8       <NA>     5   0.4%    1'209   100.0%

# percentages and cumulative frequencies for a vector of count data
Freq(as.table(c(2,4,12,8)))
#>    level  freq   perc  cumfreq  cumperc
#> 1      A     2   7.7%        2     7.7%
#> 2      B     4  15.4%        6    23.1%
#> 3      C    12  46.2%       18    69.2%
#> 4      D     8  30.8%       26   100.0%