Evaluates a Function Groupwise

Split the vector x into partitions and apply the function to each partition separately. Computation restarts for each partition.
The logic is the same as the OLAP functions in SQL, e.g. SUM(x) OVER (PARTITION BY group).
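As a point of comparison, the SQL window SUM(x) OVER (PARTITION BY g) can be sketched in base R with ave(). This assumes the same x and g vectors as the first example and does not call DoBy itself.

```r
# Same layout as the Examples: x = 1,2,3,4 repeated, g = groups of three rows
x <- rep(1:4, 3)
g <- gl(4, 3, labels = letters[1:4])

# ave() applies FUN within each level of g and writes the group result back
# to every row of that group, like SUM(x) OVER (PARTITION BY g)
grp_sum <- ave(x, g, FUN = sum)
print(grp_sum)
```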

DoBy(x, ...)

# S3 method for class 'formula'
DoBy(formula, data = parent.frame(), subset, na.action,
     vnames = NULL, ...)
# Default S3 method
DoBy(x, by, FUN, vnames = NULL, collapse = FALSE, ...)

Arguments

x: a vector to be operated on.
by: a list of one or more factors, each of the same length as x. Elements that are not factors are coerced to factors with as.factor().
FUN: the function to apply to each combination of factor levels.
formula: a formula of the form lhs ~ rhs where lhs gives the data values and rhs the corresponding groups.
data: an optional matrix or data frame (or similar: see model.frame) containing the variables in formula. By default the variables are taken from parent.frame().
subset: an optional vector specifying a subset of observations to be used.
na.action: a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
vnames: names for the new variables.
collapse: logical, determining if the results should be collapsed to groups. Default is FALSE.
...: optional arguments to FUN; see the Note section.
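When by holds several factors, each combination of levels that actually occurs forms one cell. A base-R sketch of the same idea, assuming the x, g and m columns used in the Examples: ave() also accepts several grouping factors, and interaction() makes the cells visible. This does not call DoBy.

```r
x <- rep(1:4, 3)
g <- gl(4, 3, labels = letters[1:4])
m <- gl(3, 4, labels = LETTERS[1:3])

# FUN is evaluated once per (g, m) combination; the result is spread back
# to every row of that combination
cell_mean <- ave(x, g, m, FUN = mean)

# drop = TRUE keeps only the combinations present: a.A, b.A, b.B, c.B, c.C, d.C
cells <- interaction(g, m, drop = TRUE)
print(data.frame(x, g, m, cell = cells, cell_mean))
```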

Note

Optional arguments to FUN supplied by the ... argument are not divided into cells. It is therefore inappropriate for FUN to expect additional arguments with the same length as x.
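Because ... is passed to FUN unsplit, a per-observation argument such as a weight vector of length(x) would reach every group whole; weighted.mean(), for instance, stops with a length mismatch in that case. A minimal base-R workaround, assuming hypothetical weights w: split the row indices by group so that x and w are subset together.

```r
x <- rep(1:4, 3)
g <- gl(4, 3, labels = letters[1:4])
w <- rep(c(1, 2, 1), 4)   # hypothetical per-observation weights, same length as x

# Split row indices rather than values, so x[i] and w[i] stay aligned per group
idx <- split(seq_along(x), g)
grp_wmean <- sapply(idx, function(i) weighted.mean(x[i], w[i]))
print(grp_wmean)

# Spread the group results back to row level (one value per element of x)
row_wmean <- grp_wmean[as.integer(g)]
```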

Details

This is largely equivalent to the function ave, with the arguments organized differently and some added flexibility.
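The distinction that the collapse argument draws can be illustrated in base R by analogy: ave() keeps one result per element of x (comparable to collapse = FALSE), while tapply() returns one result per group (comparable to collapsing to groups). This is an illustration under that assumption, not DoBy's own implementation.

```r
x <- rep(1:4, 3)
g <- gl(4, 3, labels = letters[1:4])

per_row   <- ave(x, g, FUN = max)   # length 12: group maximum repeated on each row
per_group <- tapply(x, g, max)      # length 4: one maximum per level of g

print(per_row)
print(per_group)
```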

Value

a data.frame with one row per element of x, containing the groupwise results of FUN and the grouping factors used.
The attribute response denotes the name of the response variable in case the formula interface was used.
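To make the shape of this value concrete, here is a sketch of a hypothetical helper, do_by_sketch(), which is not part of DescTools. It assumes the layout described above (one row per element of x, the grouping factors, then the result) and uses its own column names (x, by1, ..., result); DoBy names its columns differently, as the Examples show.

```r
# Hypothetical helper: mimics the documented shape of DoBy's value using ave()
do_by_sketch <- function(x, by, FUN) {
  if (!is.list(by)) by <- list(by)                     # a single factor becomes a list
  if (is.null(names(by))) names(by) <- paste0("by", seq_along(by))
  by <- lapply(by, as.factor)                          # coerce groups with as.factor()
  res <- do.call(ave, c(list(x), by, list(FUN = FUN))) # one result per element of x
  out <- data.frame(x = x, by, result = res)
  stopifnot(nrow(out) == length(x))                    # same number of rows as x
  out
}

d <- do_by_sketch(rep(1:4, 3), gl(4, 3, labels = letters[1:4]), sum)
print(d)
```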

Author

Andri Signorell <andri@signorell.net>

Examples

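# note: v is drawn with sample() and no seed is set, so v and every result
# derived from it will differ between runs from the output printed below
d.frm <- data.frame(x=rep(1:4,3), v=sample(x=1:3, size=12, replace=TRUE),
                    g=gl(4,3,labels=letters[1:4]), m=gl(3,4,labels=LETTERS[1:3]))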

# SQL-OLAP: sum() over (partition by g)
DoBy(d.frm$x, d.frm$g, FUN=sum)
#>    d.frm$x by sum.d.frm$x
#> 1        1  a           6
#> 2        2  a           6
#> 3        3  a           6
#> 4        4  b           7
#> 5        1  b           7
#> 6        2  b           7
#> 7        3  c           8
#> 8        4  c           8
#> 9        1  c           8
#> 10       2  d           9
#> 11       3  d           9
#> 12       4  d           9
# DoBy(d.frm$x, FUN=sum)

# more than one grouping variable is passed as a list, as in tapply():
DoBy(d.frm$x, list(d.frm$g, d.frm$m), mean)
#>    d.frm$x
#> 1        1
#> 2        2
#> 3        3
#> 4        4
#> 5        1
#> 6        2
#> 7        3
#> 8        4
#> 9        1
#> 10       2
#> 11       3
#> 12       4
#>    structure.c.1L..1L..1L..2L..2L..2L..3L..3L..3L..4L..4L..4L...levels...c..a...
#> 1                                                                              a
#> 2                                                                              a
#> 3                                                                              a
#> 4                                                                              b
#> 5                                                                              b
#> 6                                                                              b
#> 7                                                                              c
#> 8                                                                              c
#> 9                                                                              c
#> 10                                                                             d
#> 11                                                                             d
#> 12                                                                             d
#>    structure.c.1L..1L..1L..1L..2L..2L..2L..2L..3L..3L..3L..3L...levels...c..A...
#> 1                                                                              A
#> 2                                                                              A
#> 3                                                                              A
#> 4                                                                              A
#> 5                                                                              B
#> 6                                                                              B
#> 7                                                                              B
#> 8                                                                              B
#> 9                                                                              C
#> 10                                                                             C
#> 11                                                                             C
#> 12                                                                             C
#>    mean.d.frm$x
#> 1           2.0
#> 2           2.0
#> 3           2.0
#> 4           4.0
#> 5           1.5
#> 6           1.5
#> 7           3.5
#> 8           3.5
#> 9           1.0
#> 10          3.0
#> 11          3.0
#> 12          3.0

# count
# DoBy() returns a data.frame; [[3]] extracts the result column (the third
# column when a single grouping factor is used), so d.frm gets a plain column
d.frm$count <- DoBy(d.frm$x, d.frm$g, length)[[3]]

# rank
d.frm$rank <- DoBy(d.frm$v, d.frm$g, rank)[[3]]
d.frm$dense_rank <- DoBy(d.frm$v, d.frm$g, Rank, ties.method="dense")[[3]]
d.frm$rank_desc <- DoBy(d.frm$x, d.frm$g, function(x) rank(-x))[[3]]

# row_number: ranks with ties broken by position (order() would return a
# permutation of indices, not a number for each row)
d.frm$row_number <- DoBy(d.frm$v, d.frm$g, function(x) rank(x, ties.method="first"))[[3]]
d.frm
#>    x v g m count rank dense_rank rank_desc row_number
#> 1  1 3 a A     3  3.0          2         3          3
#> 2  2 2 a A     3  1.5          1         2          1
#> 3  3 2 a A     3  1.5          1         1          2
#> 4  4 1 b A     3  2.0          1         1          1
#> 5  1 1 b B     3  2.0          1         3          2
#> 6  2 1 b B     3  2.0          1         2          3
#> 7  3 2 c B     3  1.0          1         2          1
#> 8  4 3 c B     3  2.5          2         1          2
#> 9  1 3 c C     3  2.5          2         3          3
#> 10 2 3 d C     3  3.0          3         3          3
#> 11 3 2 d C     3  2.0          2         2          2
#> 12 4 1 d C     3  1.0          1         1          1
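For readers without DescTools, the same SQL window functions can be sketched with base R only. Two choices here are assumptions of the sketch: SQL RANK() is mapped to rank(ties.method = "min") (R's default "average" gives the fractional ranks seen above), and DENSE_RANK() is computed with match() against the sorted unique values instead of DescTools' Rank(). The v values are copied from the printed output above so the result is reproducible.

```r
# v copied from the printed output above; g as in the Examples
v <- c(3, 2, 2, 1, 1, 1, 2, 3, 3, 3, 2, 1)
g <- gl(4, 3, labels = letters[1:4])

cnt       <- ave(v, g, FUN = length)                                       # COUNT(*) OVER (PARTITION BY g)
sql_rank  <- ave(v, g, FUN = function(z) rank(z, ties.method = "min"))     # RANK()
dense_rnk <- ave(v, g, FUN = function(z) match(z, sort(unique(z))))        # DENSE_RANK()
row_num   <- ave(v, g, FUN = function(z) rank(z, ties.method = "first"))   # ROW_NUMBER()

print(data.frame(v, g, cnt, sql_rank, dense_rnk, row_num))
```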