# R – Getting the top values by group


Here's a sample data frame:

```r
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)
```

I want the subset of `d` containing the rows with the top 5 values of `x` for each value of `grp`.

Using base-R, my approach would be something like:

```r
ordered <- d[order(d$x, decreasing = TRUE), ]
splits <- split(ordered, ordered$grp)
# head() keeps the first 6 rows by default, hence 6 rows per group below
heads <- lapply(splits, head)
do.call(rbind, heads)
##              x grp
## 1.19 0.8879631   1
## 1.4  0.8844818   1
## 1.12 0.8596197   1
## 1.26 0.8481809   1
## 1.18 0.8461516   1
## 1.29 0.8317092   1
## 2.31 0.9751049   2
## 2.34 0.9269764   2
## 2.57 0.8964114   2
## 2.58 0.8896466   2
## 2.45 0.8888834   2
## 2.35 0.8706823   2
## 3.74 0.9884852   3
## 3.73 0.9837653   3
## 3.83 0.9375398   3
## 3.64 0.9229036   3
## 3.69 0.8021373   3
## 3.86 0.7418946   3
```

Using `dplyr`, I expected this to work:

```r
d %>%
  arrange_(~ desc(x)) %>%
  group_by_(~ grp) %>%
  head(n = 5)
```

but it only returns the overall top 5 rows.

Swapping `head` for `top_n` returns the whole of `d`.

```r
d %>%
  arrange_(~ desc(x)) %>%
  group_by_(~ grp) %>%
  top_n(n = 5)
```

How do I get the correct subset?

From dplyr 1.0.0, "`slice_min()` and `slice_max()` select the rows with the minimum or maximum values of a variable, taking over from the confusing `top_n()`."

```r
d %>% group_by(grp) %>% slice_max(order_by = x, n = 5)
# # A tibble: 15 x 2
# # Groups:   grp [3]
#        x grp
#    <dbl> <fct>
#  1 0.994 1
#  2 0.957 1
#  3 0.955 1
#  4 0.940 1
#  5 0.900 1
#  6 0.963 2
#  7 0.902 2
#  8 0.895 2
#  9 0.858 2
# 10 0.799 2
# 11 0.985 3
# 12 0.893 3
# 13 0.886 3
# 14 0.815 3
# 15 0.812 3
```
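One detail worth knowing: `slice_max()` keeps ties by default (`with_ties = TRUE`), so a group can return more than `n` rows when several rows share the cut-off value. The sketch below illustrates this on a small made-up data frame; `toy`, `keep_ties` and `exact_n` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows tied at the cut-off value 3.
toy <- data.frame(
  x   = c(5, 3, 3, 1, 4, 2),
  grp = c("a", "a", "a", "a", "b", "b")
)

# Default behaviour (with_ties = TRUE): rows tied at the cut-off are all kept,
# so group "a" contributes three rows even though n = 2.
keep_ties <- toy %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 2)

# with_ties = FALSE: at most n rows per group.
exact_n <- toy %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 2, with_ties = FALSE)

nrow(keep_ties)
nrow(exact_n)
```

With `runif()` data like `d`, ties are very unlikely, so both settings should give 5 rows per group there.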

Pre-`dplyr 1.0.0` using `top_n`:

From `?top_n`, about the `wt` argument:

"The variable to use for ordering [...] defaults to the last variable in the tbl".

The last variable in your data set is "grp", which is not the variable you wish to rank. Within each group every row has the same value of "grp", so all rows tie for first place and are all kept, which is why your `top_n` attempt "returns the whole of d". Thus, if you wish to rank by "x" in your data set, you need to specify `wt = x`.

```r
d %>%
  group_by(grp) %>%
  top_n(n = 5, wt = x)
```
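For comparison, the same per-group selection can be sketched in base R without splitting and re-binding. This is only one possible approach, not part of the original answer: it ranks `x` within each group with `ave()` and keeps ranks 1 to 5. The names `rank_in_grp` and `top5` are made up for this example, and `ties.method = "min"` is chosen here so that ties at the cut-off are kept, as `top_n()` does.

```r
set.seed(123)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Rank x within each group, largest value first (rank 1 = group maximum).
# Negating x turns rank()'s ascending order into a descending one.
rank_in_grp <- ave(-d$x, d$grp,
                   FUN = function(v) rank(v, ties.method = "min"))

# Keep the rows ranked 1 to 5 in their group.
top5 <- d[rank_in_grp <= 5, ]

# Sort by group, then by descending x, for readability.
top5 <- top5[order(top5$grp, -top5$x), ]
top5
```

Unlike the `split()` / `lapply()` approach, this keeps the original row names, which can be handy for tracing rows back to `d`.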

### Data:

```r
set.seed(123)
d <- data.frame(
  x = runif(90),
  grp = gl(3, 30))
```