Freq.Rd
Calculates absolute and relative frequencies of a vector x. Continuous (numeric) variables will be cut using the same logic as used by the function hist. Categorical variables will be aggregated by table. The result will contain single and cumulative frequencies for both absolute values and percentages.
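The four frequency columns can be derived from a plain table() result. The following is a minimal base R sketch of that idea, not the package's actual implementation; the helper name freq_sketch is hypothetical, and the column names are taken from the printed output in the examples.

```r
# Hypothetical helper (not part of any package): builds the five columns
# from a named vector of counts, using base R only.
freq_sketch <- function(counts) {
  perc <- counts / sum(counts)            # relative frequencies as proportions
  data.frame(
    level   = factor(names(counts), levels = names(counts)),
    freq    = as.integer(counts),
    perc    = as.numeric(perc),
    cumfreq = cumsum(as.integer(counts)),
    cumperc = cumsum(as.numeric(perc)),
    row.names = NULL
  )
}

# A categorical vector is first aggregated with table()
x <- c(rep("A", 2), rep("B", 4), rep("C", 12), rep("D", 8))
tab <- table(x)

res <- freq_sketch(setNames(as.vector(tab), names(tab)))
print(res)
```

The sketch stores proportions rather than formatted percentages; formatting is left to the print step.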
x: the variable to be described; can be any atomic type.
breaks: either a numeric vector of two or more cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut. The default is taken from the function hist(). This is ignored if x is not of numeric type.
include.lowest: logical, indicating if an x[i] equal to the lowest (or highest, for right = FALSE) "breaks" value should be included. Ignored if x is not of numeric type.
ord: how should the result be ordered? Default is "level"; other choices are by frequency ("descending" or "ascending") or by name of the levels ("name"). The argument can be abbreviated. This is ignored if x is numeric.
useNA: one of "no", "ifany", "always". Defines whether to include extra NA levels in the table. Defaults to "no", which is the table() default too.
digits: integer, determining the number of digits used to format the relative frequencies.
...: further arguments are passed to the function cut(). Use dig.lab to control the format of numeric group names. Use the argument right to define if the intervals should be closed on the right (and open on the left) or vice versa. In print.Freq the dots are not used.
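How the break-related arguments interact can be tried out directly with hist() and cut(). This is an illustrative base R sketch under the assumption that the binning behaves as the argument descriptions state; the data vector is made up for the example.

```r
x <- c(1.2, 3.5, 4.0, 7.8, 9.9, 12.4, 15.0, 18.3, 20.0)

# Break points as hist() would choose them (no plot is drawn)
brk <- hist(x, plot = FALSE)$breaks

# right = TRUE (the cut() default): intervals are (a, b];
# include.lowest = TRUE additionally closes the first interval on the left
bins_right <- cut(x, breaks = brk, include.lowest = TRUE)

# right = FALSE: intervals are [a, b); include.lowest = TRUE now closes
# the last interval on the right instead
bins_left <- cut(x, breaks = brk, include.lowest = TRUE, right = FALSE)

table(bins_right)
table(bins_left)

# A single number asks cut() for that many equal-length intervals
bins_three <- cut(x, breaks = 3)
nlevels(bins_three)
```

With include.lowest = TRUE, a value lying exactly on the outermost break is still assigned to an interval instead of becoming NA.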
By default only the valid cases are considered for the frequencies, that is, NA values are excluded. (This is in accordance with the default behavior of the R function table, which seemed a reasonable reference.) If the NAs should be included, set the useNA argument to either "ifany" or "always".
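The three useNA settings can be compared on a small vector using table() itself, which is the reference behavior named above. A minimal sketch with made-up data:

```r
x <- c("a", "b", NA, "a", "b", "b")

tab_no     <- table(x)                    # default useNA = "no": NA is dropped
tab_ifany  <- table(x, useNA = "ifany")   # NA level added because NAs occur
tab_always <- table(c("a", "b"), useNA = "always")  # NA level even with zero count

sum(tab_no)      # 5 valid cases
sum(tab_ifany)   # 6, including the NA
tab_always
```

Note that including NAs changes the denominator, so all percentages shift slightly, as the driver examples further down show.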
For numeric variables, if breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created that cover the single value.) See cut.
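The break construction described above can be reproduced by hand and compared with what cut() does for a single number. This is a sketch following the description, with made-up data and hypothetical variable names:

```r
x <- c(2, 4, 7, 12, 15, 20)
n_int <- 4                       # number of intervals requested

# Manual construction, following the description
rx <- range(x)
dx <- diff(rx)
brk <- seq(rx[1], rx[2], length.out = n_int + 1)  # equal-length pieces
brk[1] <- rx[1] - dx / 1000                        # move lower limit out by 0.1%
brk[n_int + 1] <- rx[2] + dx / 1000                # move upper limit out by 0.1%

manual <- table(cut(x, breaks = brk))
auto   <- table(cut(x, breaks = n_int))

as.vector(manual)
as.vector(auto)

# Constant vector: cut() still returns intervals covering the single value
const_bins <- cut(rep(3, 5), breaks = 2)
table(const_bins)
```

Because the outer limits are pushed slightly beyond the data range, the minimum and maximum are binned even without include.lowest.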
An object of class "Freq", which is basically a data.frame with 5 columns (and its own print method), containing the following components:
level: factor. The levels of the grouping variable.
freq: integer. The absolute frequencies.
perc: numeric. The relative frequencies (percent).
cumfreq: integer. The cumulative sum of the absolute frequencies.
cumperc: numeric. The cumulative sum of the relative frequencies.
data(d.pizza)
# result is a data.frame
d.freq <- Freq(d.pizza$price)
d.freq
#>         level  freq   perc  cumfreq  cumperc
#>  1     [0,10]     4   0.3%        4     0.3%
#>  2    (10,20]    96   8.0%      100     8.4%
#>  3    (20,30]   183  15.3%      283    23.6%
#>  4    (30,40]   147  12.3%      430    35.9%
#>  5    (40,50]   263  22.0%      693    57.9%
#>  6    (50,60]   169  14.1%      862    72.0%
#>  7    (60,70]   119   9.9%      981    82.0%
#>  8    (70,80]   109   9.1%    1'090    91.1%
#>  9    (80,90]    68   5.7%    1'158    96.7%
#> 10   (90,100]    22   1.8%    1'180    98.6%
#> 11  (100,110]     7   0.6%    1'187    99.2%
#> 12  (110,120]     6   0.5%    1'193    99.7%
#> 13  (120,130]     3   0.3%    1'196    99.9%
#> 14  (130,140]     1   0.1%    1'197   100.0%
# by default the percent values are printed with one decimal place (see above),
# but the number of digits can be set in the print function
print(d.freq, digits=5)
#>         level  freq       perc  cumfreq     cumperc
#>  1     [0,10]     4   0.33417%        4    0.33417%
#>  2    (10,20]    96   8.02005%      100    8.35422%
#>  3    (20,30]   183  15.28822%      283   23.64244%
#>  4    (30,40]   147  12.28070%      430   35.92314%
#>  5    (40,50]   263  21.97160%      693   57.89474%
#>  6    (50,60]   169  14.11863%      862   72.01337%
#>  7    (60,70]   119   9.94152%      981   81.95489%
#>  8    (70,80]   109   9.10610%    1'090   91.06099%
#>  9    (80,90]    68   5.68087%    1'158   96.74185%
#> 10   (90,100]    22   1.83793%    1'180   98.57978%
#> 11  (100,110]     7   0.58480%    1'187   99.16458%
#> 12  (110,120]     6   0.50125%    1'193   99.66583%
#> 13  (120,130]     3   0.25063%    1'196   99.91646%
#> 14  (130,140]     1   0.08354%    1'197  100.00000%
# sorted by frequency
Freq(d.pizza$driver, ord="desc")
#>        level  freq   perc  cumfreq  cumperc
#> 1  Carpenter   272  22.6%      272    22.6%
#> 2     Carter   234  19.4%      506    42.0%
#> 3     Taylor   204  16.9%      710    59.0%
#> 4     Hunter   156  13.0%      866    71.9%
#> 5     Miller   125  10.4%      991    82.3%
#> 6     Farmer   117   9.7%    1'108    92.0%
#> 7    Butcher    96   8.0%    1'204   100.0%
# sorted by name using all the observations, i.e. including NAs
Freq(d.pizza$driver, ord="name", useNA="ifany")
#>        level  freq   perc  cumfreq  cumperc
#> 1    Butcher    96   7.9%       96     7.9%
#> 2  Carpenter   272  22.5%      368    30.4%
#> 3     Carter   234  19.4%      602    49.8%
#> 4     Farmer   117   9.7%      719    59.5%
#> 5     Hunter   156  12.9%      875    72.4%
#> 6     Miller   125  10.3%    1'000    82.7%
#> 7     Taylor   204  16.9%    1'204    99.6%
#> 8       <NA>     5   0.4%    1'209   100.0%
# percentages and cumulative frequencies for a vector of count data
Freq(as.table(c(2,4,12,8)))
#>    level  freq   perc  cumfreq  cumperc
#> 1      A     2   7.7%        2     7.7%
#> 2      B     4  15.4%        6    23.1%
#> 3      C    12  46.2%       18    69.2%
#> 4      D     8  30.8%       26   100.0%