group by then summarize in r

To group a dataset by one or more columns and apply a summary function to each group in R, you can use the dplyr or data.table package. Here's an example using dplyr:

main.r
library(dplyr)
data %>%
  group_by(column_name) %>%
  summarise(new_column = mean(column_to_summarise))

This code groups the data frame named data by column_name and then calculates the mean of column_to_summarise for each group using the summarise() function. The result is a new data frame (a tibble) with one row per group, containing column_name and a new column called new_column.
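As a runnable illustration of the same pattern, here is a minimal sketch that groups by two columns and computes two summaries at once. It assumes the built-in mtcars dataset as stand-in data; the names mean_mpg and n_cars are illustrative choices, not part of the original snippet.

```r
library(dplyr)

# mtcars ships with base R; cyl and gear serve as the grouping columns here
result <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(
    mean_mpg = mean(mpg),  # mean of mpg within each cyl/gear group
    n_cars   = n(),        # number of rows in each group
    .groups  = "drop"      # return an ungrouped tibble
  )

print(result)
```

Setting .groups = "drop" is optional, but it avoids the message dplyr prints when a result is still grouped by the first column.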

If using the data.table package, the code would look like this:

main.r
library(data.table)
data <- as.data.table(data)
data[, .(new_column = mean(column_to_summarise)), by=.(column_name)]

Here .() is data.table's alias for list(). In the second position inside the brackets it builds the named result column, and in the by argument it lists the columns to group by, in this case column_name.
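The same idea extends to several grouping columns and several summaries. The following is a minimal sketch, again assuming the built-in mtcars dataset as stand-in data; dt, mean_mpg, and n_cars are illustrative names.

```r
library(data.table)

# Convert the built-in mtcars data frame to a data.table
dt <- as.data.table(mtcars)

# .N is data.table's built-in symbol for the number of rows in each group
result_dt <- dt[, .(mean_mpg = mean(mpg), n_cars = .N), by = .(cyl, gear)]

print(result_dt)
```

Unlike the dplyr version, the groups appear in order of first occurrence in the data; use keyby instead of by if a sorted result is wanted.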

data.table also accepts a shorter form of the same call. Note that the package is named data.table; library(datatable) will not load it:

main.r
library(data.table)
data <- as.data.table(data)
data[, mean(column_to_summarise), by=.(column_name)]

This is the same data.table syntax as above, except that the summary expression is not wrapped in .(). Because the result is left unnamed, data.table gives the summary column the default name V1. In both forms, grouping happens inside the [] operator rather than through a separate function call.
