Large.Rd
Find the k smallest or k largest values of a vector x
and return the values, optionally together with their frequencies.
Small(x, k = 5, unique = FALSE, na.last = NA)
Large(x, k = 5, unique = FALSE, na.last = NA)
HighLow(x, nlow = 5, nhigh = nlow, na.last = NA)
x: a numeric vector.
k: an integer > 0 defining how many extreme values should be returned. Default is k = 5. If k > length(x), all values are returned.
unique: logical, defining whether only unique values should be considered. If set to TRUE, a list with the k extreme values and their frequencies is returned. Default is FALSE (as unique is a rather expensive function).
na.last: controls the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed.
nlow: a single integer. The number of the smallest elements of the vector to be printed. Defaults to 5.
nhigh: a single integer. The number of the largest elements of the vector to be printed. Defaults to nlow.
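The three-way na.last convention described above matches the one used by base R's sort(). The following minimal sketch uses only base sort() to illustrate the three settings; it assumes that Small and Large treat na.last the same way, which the argument description suggests but which is not verified here.

```r
# Base R only: illustrates the na.last convention with sort().
# Assumption: Small()/Large() follow the same convention.
x <- c(3, NA, 1, 2)

na_end   <- sort(x, na.last = TRUE)   # NA is placed last
na_start <- sort(x, na.last = FALSE)  # NA is placed first
na_gone  <- sort(x, na.last = NA)     # NA is removed

print(na_end)
print(na_start)
print(na_gone)
```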
This does not seem to be a difficult problem at first sight. We could simply tabulate and sort the vector and then take the first or last k values. However, sorting and tabulating the whole vector when only a few extreme values are of interest is a considerable waste of resources, and this approach already becomes impractical for medium vector lengths (~10^5). Several approaches to this problem have been discussed elsewhere. The present implementation is based on highly efficient C++ code and has proved to be very fast.
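To make the contrast concrete, here is a rough base R sketch of the naive full-sort approach next to a partial sort. It is not the package's C++ implementation; the helper names naive_large and partial_large are hypothetical and exist only for this illustration.

```r
# Hypothetical helpers for illustration; not part of the package.

# Naive baseline: sort everything, then keep the last k values.
naive_large <- function(x, k = 5) {
  x <- x[!is.na(x)]
  tail(sort(x), k)
}

# Partial sort: only the last k positions are guaranteed to hold
# the values they would hold after a full sort.
partial_large <- function(x, k = 5) {
  x <- x[!is.na(x)]
  n <- length(x)
  k <- min(k, n)
  if (k == 0) return(x[0])
  idx <- (n - k + 1):n
  sort(x, partial = idx)[idx]
}

set.seed(1)
x <- sample(1:10, 1000, replace = TRUE)
naive_large(x, 3)
partial_large(x, 3)
```

Both helpers return the same values; the second merely avoids ordering the part of the vector that is never reported. Base R falls back to a full sort when more than 10 partial indices are requested, so the saving applies to small k only.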
HighLow combines the two functions above and reports the nlow lowest and nhigh highest values together with their frequencies in parentheses. It is used for describing univariate variables and is useful for checking the ends of the vector, where wrong values often accumulate in real data. It is in essence a printing routine for the highest and the lowest values of x.
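The kind of output described here can be approximated in base R as follows. This is only a sketch: high_low_sketch is a hypothetical helper, it tabulates the whole vector (the slow approach discussed above), and showing a count only when a value occurs more than once is an assumption taken from the example output, not from the package source.

```r
# Hypothetical helper for illustration; not the package's implementation.
high_low_sketch <- function(x, nlow = 5, nhigh = nlow) {
  tab  <- table(x)            # unique values in ascending order, NAs dropped
  vals <- names(tab)
  freq <- as.integer(tab)
  # Assumption: append "(n)" only for values that occur more than once
  lab <- ifelse(freq > 1, paste0(vals, " (", freq, ")"), vals)
  paste0("lowest : ", paste(head(lab, nlow), collapse = ", "), "\n",
         "highest: ", paste(tail(lab, nhigh), collapse = ", "), "\n")
}

cat(high_low_sketch(c(1, 1, 2, 3, 7, 9, 9, 9), nlow = 2, nhigh = 2))
#> lowest : 1 (2), 2
#> highest: 7, 9 (3)
```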
If unique is set to FALSE: a vector with the k most extreme values;
otherwise: a list containing the k most extreme values and their frequencies.
x <- sample(1:10, 1000, replace=TRUE)
Large(x, 3)
#> [1] 10 10 10
Large(x, k=3, unique=TRUE)
#> $value
#> [1] 8 9 10
#>
#> $frequency
#> [1] 85 99 106
#>
# works fine up to length(x) ~ 1e6
x <- runif(1000000)
Small(x, 3, unique=TRUE)
#> $value
#> [1] 0.0000006391201 0.0000011108350 0.0000048894435
#>
#> $frequency
#> [1] 1 1 1
#>
Small(x, 3, unique=FALSE)
#> [1] 0.0000006391201 0.0000011108350 0.0000048894435
# Both ends
cat(HighLow(d.pizza$temperature, na.last=NA))
#> lowest : 19.3, 19.4, 20, 20.2 (2), 20.35
#> highest: 63.8, 64.1, 64.6, 64.7, 64.8