calculate the mean by group with missing data in R

To calculate the mean by group in R, group the data with the group_by() function from the dplyr package and compute the mean of each group inside summarize(). To handle missing data, pass na.rm = TRUE to mean(); without it, mean() returns NA for any group that contains a missing value.
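As a quick illustration of what na.rm changes, here is a minimal sketch on a single vector. The vector x is a made-up example and is not part of the original snippet.

```r
# hypothetical vector with one missing value
x <- c(10, 20, NA, 30)

# by default, a single NA makes the whole result NA
default_mean <- mean(x)

# with na.rm = TRUE the NA is dropped before averaging
clean_mean <- mean(x, na.rm = TRUE)

print(default_mean)  # NA
print(clean_mean)    # 20
```

The same argument works unchanged when mean() is called inside summarize() on grouped data.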

Here's an example using the mtcars dataset:

main.r
library(dplyr)

# create some missing data in the mpg column only,
# so that the grouping column (cyl) stays complete
mtcars[1:8, "mpg"] <- NA

# calculate the mean mpg by group (cyl), ignoring missing values
mtcars %>%
  group_by(cyl) %>%
  summarize(mpg_mean = mean(mpg, na.rm = TRUE))

Output:

# A tibble: 3 x 2
    cyl mpg_mean
  <dbl>    <dbl>
1     4     27.3
2     6     18.9
3     8     14.9

We can see that the missing data was ignored and the mean mpg was calculated for each group (cyl) separately.
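When values are dropped silently, it can help to report how many observations each mean is based on. The following is a sketch under stated assumptions: the data frame scores and the column names value_mean, n_used, and n_missing are hypothetical and chosen only for illustration.

```r
library(dplyr)

# hypothetical toy data: group "a" has one missing value,
# group "c" contains only a missing value
scores <- data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  value = c(1, 3, NA, 10, 20, NA)
)

result <- scores %>%
  group_by(group) %>%
  summarize(
    value_mean = mean(value, na.rm = TRUE),  # mean of the non-missing values
    n_used     = sum(!is.na(value)),         # how many values went into the mean
    n_missing  = sum(is.na(value))           # how many values were dropped
  )

print(result)
```

Note that a group whose values are all missing (group "c" here) gets NaN rather than NA, because mean() is applied to an empty vector once the missing values are removed. The n_used column makes such groups easy to spot.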
