how to run a Shapiro-Wilk test on a specific set of columns from a table in r

To perform the Shapiro-Wilk test on a specific set of columns from a dataset in R, we first need to extract those columns and store them in a separate dataframe. For example, given a dataframe df with columns "col1", "col2", "col3", we can create a new dataframe df_sub with only "col2" and "col3" using df_sub <- df[, c("col2", "col3")].
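One detail worth noting when subsetting: selecting a single column with df[, "col2"] returns a plain vector rather than a dataframe, which would break a later loop over names(df_sub). A minimal sketch of keeping the dataframe structure with drop = FALSE, using example data of the same shape as in this post (the names one_vec and one_df are illustrative only):

```r
# example data: two normal columns and one Poisson column
df <- data.frame(col1 = rnorm(100),
                 col2 = rnorm(100, mean = 5),
                 col3 = rpois(100, lambda = 3))

# selecting one column by name drops to a numeric vector by default
one_vec <- df[, "col2"]
is.data.frame(one_vec)   # FALSE

# drop = FALSE keeps a one-column dataframe
one_df <- df[, "col2", drop = FALSE]
is.data.frame(one_df)    # TRUE

# selecting two or more columns returns a dataframe
df_sub <- df[, c("col2", "col3")]
```

This applies to base R data.frame objects; other table classes may behave differently.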

Once we have our subsetted data, we can use the shapiro.test() function to perform the Shapiro-Wilk test on each column. We can do this in a loop, or using apply functions. Here is an example using a loop:

main.r
# create a dataframe with some numeric data
df <- data.frame(col1 = rnorm(100),
                 col2 = rnorm(100, mean = 5),
                 col3 = rpois(100, lambda = 3))

# extract columns 2 and 3 into a new dataframe
df_sub <- df[, c("col2", "col3")]

# loop through each column and perform Shapiro-Wilk test
for (col in names(df_sub)) {
  test_result <- shapiro.test(df_sub[[col]])
  print(paste("Shapiro-Wilk test for", col))
  print(test_result)
}

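This will print the result of each test, showing the test statistic W and the p-value. shapiro.test() does not itself state whether the data are normally distributed: a p-value below the chosen significance level (commonly 0.05) is evidence against normality, while a larger p-value means the test found no evidence against it.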
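The object returned by shapiro.test() is a list of class "htest", so the statistic and p-value can be pulled out by name instead of being read from the printed output. A minimal sketch, where the 0.05 threshold (alpha) and the seed are assumptions chosen for illustration, not something the test prescribes:

```r
set.seed(1)  # arbitrary seed so the example is reproducible
x <- rnorm(100, mean = 5)

res <- shapiro.test(x)

w_stat <- unname(res$statistic)  # the W statistic
p_val  <- res$p.value            # the p-value

alpha <- 0.05  # assumed significance level
if (p_val < alpha) {
  cat("Reject normality at alpha =", alpha, "\n")
} else {
  cat("No evidence against normality at alpha =", alpha, "\n")
}
```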

Alternatively, we can use lapply() to apply the shapiro.test() function to each column:

main.r
# create a dataframe with some numeric data
df <- data.frame(col1 = rnorm(100),
                 col2 = rnorm(100, mean = 5),
                 col3 = rpois(100, lambda = 3))

# extract columns 2 and 3 into a new dataframe
df_sub <- df[, c("col2", "col3")]

# apply Shapiro-Wilk test to each column using lapply
lapply(df_sub, shapiro.test)

This returns a named list with one test result per column. Printed at the console, it shows the same information as the loop approach above.
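Because lapply() returns a list of full test objects, it can be convenient to reduce them to one row per column. A sketch using vapply(); the names results and summary_tbl, and the seed, are illustrative assumptions:

```r
set.seed(1)  # arbitrary seed so the example is reproducible
df <- data.frame(col1 = rnorm(100),
                 col2 = rnorm(100, mean = 5),
                 col3 = rpois(100, lambda = 3))
df_sub <- df[, c("col2", "col3")]

# one "htest" object per column
results <- lapply(df_sub, shapiro.test)

# pull the W statistic and p-value out of each result
summary_tbl <- data.frame(
  column  = names(results),
  W       = vapply(results, function(r) unname(r$statistic), numeric(1)),
  p_value = vapply(results, function(r) r$p.value, numeric(1)),
  row.names = NULL
)
print(summary_tbl)
```

vapply() is used here instead of sapply() so that each extracted value is checked to be a single number. Note that shapiro.test() accepts between 3 and 5000 non-missing values per column.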
