Winsorize.Rd
Winsorizing a vector means replacing a predefined amount of the smallest and/or largest values by less extreme values. The substitute values are the most extreme values that are retained.
Winsorize(x, minval = NULL, maxval = NULL, probs = c(0.05, 0.95),
na.rm = FALSE, type = 7)
x: a numeric vector to be winsorized.
minval: the low border; all values lower than this will be replaced by this value. The default (NULL) uses the quantile of x at probs[1], which is the 5%-quantile with the default probs.
maxval: the high border; all values larger than this will be replaced by this value. The default (NULL) uses the quantile of x at probs[2], which is the 95%-quantile with the default probs.
probs: numeric vector of probabilities with values in [0,1], as used in quantile.
na.rm: should NAs be omitted when calculating the quantiles? Note that NAs in x are preserved and left unchanged anyway.
type: an integer between 1 and 9 selecting which of the nine quantile algorithms detailed in quantile is to be used.
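When minval and maxval are left at NULL, the borders come from quantile() applied to x with the given probs, na.rm and type. The following is a minimal base-R sketch of that computation, assuming the documented defaults; the helper name `default_borders` is hypothetical and not part of the package. It also reproduces the type 7 border by hand to show how the interpolation works.

```r
# Hypothetical helper: compute the borders the way the NULL defaults describe,
# i.e. the quantiles of x at probs using the chosen algorithm type.
default_borders <- function(x, probs = c(0.05, 0.95), na.rm = FALSE, type = 7) {
  q <- quantile(x, probs = probs, na.rm = na.rm, type = type, names = FALSE)
  c(minval = q[1], maxval = q[2])
}

set.seed(1234)
x <- rnorm(10)
x[1] <- x[1] * 10

# Type 7 interpolates between order statistics at position h = (n - 1) * p + 1;
# for n = 10 and p = 0.05 this gives h = 1.45.
xs <- sort(x)
h <- (length(x) - 1) * 0.05 + 1
manual_min <- xs[floor(h)] + (h - floor(h)) * (xs[floor(h) + 1] - xs[floor(h)])

default_borders(x)  # about -7.694 and 0.824, the values seen in the example output
all.equal(manual_min, default_borders(x)[["minval"]])
```

If x contains NAs and na.rm = FALSE, quantile() stops with an error, so na.rm = TRUE is needed whenever the default borders are computed from data with missing values.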
A vector of the same length as the original data x containing the winsorized data.
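The winsorized vector is obtained by applying elementwise
$$g(x) = \left\{\begin{array}{ll} a &\text{for } x \le a\\ x &\text{for } a < x < b\\ b &\text{for } x \ge b \end{array}\right. $$
where $a$ = minval and $b$ = maxval. With symmetric borders $a = -c$ and $b = c$ the middle branch reads $|x| < c$.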
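The piecewise function g is elementwise clipping, so it can be sketched in base R with pmax() and pmin(). This is a minimal illustration assuming the borders are already known; the name `winsorize_sketch` is hypothetical and this is not presented as the package's implementation. Because pmax() and pmin() propagate NA by default, missing values stay NA and the result keeps the length of x.

```r
# Hypothetical sketch of g(x): clip every element of x to [minval, maxval].
winsorize_sketch <- function(x, minval, maxval) {
  stopifnot(minval <= maxval)
  # raise values below minval, then lower values above maxval
  pmin(pmax(x, minval), maxval)
}

x <- c(-10, -1, 0, NA, 2, 50)
winsorize_sketch(x, minval = -1, maxval = 2)
#> [1] -1 -1  0 NA  2  2
```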
You may also want to consider standardizing (possibly robustly) the data before you perform a winsorization.
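One way to follow this advice, sketched under assumptions: standardize robustly with the median and mad(), clip the standardized values symmetrically at a cutoff, then transform back. The helper name `winsorize_robust` and the cutoff of 2 are illustrative choices, not package defaults; the sketch also assumes mad(x) is positive.

```r
# Hypothetical helper: winsorize on a robust z-scale, then map back.
winsorize_robust <- function(x, cutoff = 2, na.rm = TRUE) {
  center <- median(x, na.rm = na.rm)
  scale  <- mad(x, na.rm = na.rm)      # MAD, scaled by 1.4826 by default
  stopifnot(scale > 0)                 # assumption: data are not mostly tied
  z <- (x - center) / scale            # robust z-scores
  z <- pmin(pmax(z, -cutoff), cutoff)  # symmetric clipping with borders -cutoff, cutoff
  center + z * scale                   # back to the original scale
}

set.seed(1234)
x <- rnorm(10)
x[1] <- x[1] * 10
winsorize_robust(x)
```

For these data only the outlier x[1] lies outside median(x) +/- 2 * mad(x), so it is the only value replaced; the other values come back unchanged up to floating-point rounding.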
## generate data
set.seed(1234) # for reproducibility
x <- rnorm(10) # standard normal
x[1] <- x[1] * 10 # introduce outlier
## Winsorize data
x
#> [1] -12.071 0.277 1.084 -2.346 0.429 0.506 -0.575 -0.547 -0.564
#> [10] -0.890
Winsorize(x)
#> [1] -7.694 0.277 0.824 -2.346 0.429 0.506 -0.575 -0.547 -0.564 -0.890
# use Small() and Large() if a fixed number of values should be winsorized
# (here k = 3: the 2 most extreme values on each side are replaced by the 3rd):
Winsorize(x, minval=tail(Small(x, k=3), 1), maxval=head(Large(x, k=3), 1))
#> [1] -0.890 0.277 0.429 -0.890 0.429 0.429 -0.575 -0.547 -0.564 -0.890
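Without Small() and Large(), the same fixed-count winsorization can be sketched in base R with sort(): the k-th smallest and k-th largest values become the borders, so the k - 1 most extreme values on each side are replaced. The helper `winsorize_k` is hypothetical; it ignores NAs when choosing the borders and assumes x has at least k non-missing values.

```r
# Hypothetical helper: the k-th most extreme values on each side become
# the most extreme values retained.
winsorize_k <- function(x, k = 3) {
  s <- sort(x)                  # sort() drops NAs by default
  stopifnot(k >= 1, k <= length(s))
  lo <- s[k]                    # k-th smallest value
  hi <- s[length(s) - k + 1]    # k-th largest value
  pmin(pmax(x, lo), hi)
}

set.seed(1234)
x <- rnorm(10)
x[1] <- x[1] * 10
round(winsorize_k(x, k = 3), 3)
#> [1] -0.890  0.277  0.429 -0.890  0.429  0.429 -0.575 -0.547 -0.564 -0.890
```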