Create a factor variable using the quantiles of a continuous variable.

CutQ(x, breaks = quantile(x, seq(0, 1, by = 0.25), na.rm = TRUE), 
     labels = NULL, na.rm = FALSE, ...)

Arguments

x

continuous variable.

breaks

the breaks for creating groups. By default the quartiles are used, i.e. the quantiles at the probabilities seq(0, 1, by = 0.25). See quantile for details. If breaks is given as a single integer, it is interpreted as the intended number of groups, e.g. breaks=10 will return x cut into deciles.

labels

labels for the levels of the resulting category. By default, labels are defined as Q1, Q2, and so on, up to the number of intervals, length(breaks) - 1. The parameter is passed to cut, so if labels is set to FALSE, simple integer codes are returned instead of a factor.

na.rm

Logical indicating whether missing values should be removed when computing quantiles. Defaults to FALSE.

...

Optional arguments passed to cut.
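
The effect of a single-integer breaks value and of the labels argument can be approximated in base R. The following is a minimal sketch, not the code of CutQ itself: the variable names (n_groups, probs, labs, codes) are illustrative, and include.lowest = TRUE is an assumption made here so that the minimum of x falls into the first group.

```r
set.seed(1234)
x <- rnorm(1000)

# A single integer is read as the number of groups, so 10 groups
# need 11 equally spaced probabilities from 0 to 1.
n_groups <- 10
probs <- seq(0, 1, length.out = n_groups + 1)
breaks <- quantile(x, probs = probs, na.rm = TRUE)

# Default-style labels: Q1, Q2, ... up to length(breaks) - 1.
labs <- paste0("Q", seq_len(length(breaks) - 1))
deciles <- cut(x, breaks = breaks, labels = labs, include.lowest = TRUE)
levels(deciles)

# labels = FALSE makes cut() return integer codes instead of a factor.
codes <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
is.integer(codes)
```

With continuous data such as this, all break values are distinct, so cut can use them directly.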

Details

This function uses quantile to obtain the specified quantiles of x, then calls cut to create a factor variable using the intervals specified by these quantiles.

It properly handles cases where more than one quantile takes the same value, as in the last example below. Note that in this case, there will be fewer factor levels than the specified number of quantile intervals.
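
The quantile-then-cut approach can be sketched in base R as follows. This is only an illustration under stated assumptions: cut_by_quantiles is a hypothetical helper, not part of any package, and dropping duplicated breaks with unique is one simple way to cope with ties. CutQ may treat ties differently, so the intervals produced here need not match its output.

```r
# Hypothetical helper (not part of DescTools): quantile() followed by cut().
cut_by_quantiles <- function(x, probs = seq(0, 1, by = 0.25)) {
  # Compute the requested quantiles, ignoring missing values.
  q <- quantile(x, probs = probs, na.rm = TRUE)
  # Tied quantiles would hand cut() duplicated breaks, which is an error,
  # so each break value is kept only once.
  q <- unique(q)
  # include.lowest = TRUE keeps the minimum of x in the first interval.
  cut(x, breaks = q, include.lowest = TRUE)
}

set.seed(1234)
x <- rnorm(1000)
f <- cut_by_quantiles(x)
nlevels(f)   # one level per quartile interval

# Rounding creates ties, so several deciles coincide.
x_tied <- round(x)
g <- cut_by_quantiles(x_tied, probs = seq(0, 1, by = 0.1))
nlevels(g)   # fewer than 10 levels
```

Without the unique step, the call to cut on the rounded data would stop with an error about non-unique breaks, which is the situation the tie handling is meant to avoid.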

Value

Factor variable with one level for each quantile interval given by breaks.

Author

Gregory R. Warnes <greg@warnes.net>, with some slight modifications by Andri Signorell <andri@signorell.net>

See also

cut, quantile

Examples

# create example data
set.seed(1234)
x <- rnorm(1000)

# cut into quartiles
quartiles <- CutQ(x)
table(quartiles)
#> quartiles
#>  Q1  Q2  Q3  Q4 
#> 250 250 250 250 

# cut into deciles
deciles <- CutQ(x, breaks=10, labels=NULL)
table(deciles)
#> deciles
#>  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9 Q10 
#> 100 100 100 100 100 100 100 100 100 100 

# show handling of 'tied' quantiles.
x <- round(x)  # discretize to create ties
stem(x)        # display the ties
#> 
#>   The decimal point is at the |
#> 
#>   -3 | 0000000000000
#>   -2 | 
#>   -2 | 000000000000000000000000000000000000000000000000
#>   -1 | 
#>   -1 | 00000000000000000000000000000000000000000000000000000000000000000000+181
#>   -0 | 
#>   -0 | 
#>    0 | 00000000000000000000000000000000000000000000000000000000000000000000+310
#>    0 | 
#>    1 | 00000000000000000000000000000000000000000000000000000000000000000000+140
#>    1 | 
#>    2 | 000000000000000000000000000000000000000000000000000000000000000
#>    2 | 
#>    3 | 00000
#> 
deciles <- CutQ(x, breaks=10)

table(deciles) # note that there are only 5 groups (not 10) due to duplicates
#> deciles
#> [-3,-1)      -1       0       1   (1,3] 
#>      61     261     390     220      68 