extract significant variables in r

There are many ways to extract significant variables in R, depending on the specific problem and method you are using. Here are some examples:

  1. Linear regression:

```r
fit <- lm(y ~ ., data = mydata)
summary(fit)$coef[summary(fit)$coef[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
```

This will fit a linear regression model with response variable y and all other variables in mydata, and extract the coefficients of all significant predictors (with p-value < 0.05) from the summary output.
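As a concrete sketch, the same pattern applied to the built-in mtcars dataset (mydata and y above are placeholders):

```r
# Regress mpg on all other mtcars columns and keep predictors with p < 0.05
fit <- lm(mpg ~ ., data = mtcars)
coefs <- summary(fit)$coef
coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]  # drop = FALSE keeps matrix shape even for one row
```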

  2. Random forest:

```r
library(randomForest)
fit <- randomForest(y ~ ., data = mydata, importance = TRUE)
importance(fit, type = 1)
```

This will fit a random forest model with response variable y and all other variables in mydata, and extract the permutation importance scores (mean decrease in accuracy) of all predictors. The type argument selects the importance measure: type = 1 is permutation importance and requires fitting with importance = TRUE, while type = 2 is the node-impurity (Gini) importance.
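A runnable sketch on the built-in mtcars dataset (a stand-in for mydata); note that importance = TRUE must be set at fit time for the permutation measure:

```r
library(randomForest)
set.seed(42)  # random forests are stochastic; fix the seed for reproducibility
fit <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
importance(fit, type = 1)  # %IncMSE: permutation importance for regression
```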

  3. Lasso regression:

```r
library(glmnet)
fit <- cv.glmnet(x, y, alpha = 1)
coef(fit, s = "lambda.min")
```

This will fit a Lasso regression model with response variable y and predictor matrix x, using cross-validation to select the regularization parameter. The coef call returns a sparse matrix of coefficients at the lambda value that minimizes cross-validated error; predictors shrunk exactly to zero appear as ".", so the remaining non-zero entries are the selected variables.
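For illustration, x and y can be built from mtcars; cv.glmnet expects a numeric matrix of predictors rather than a data frame:

```r
library(glmnet)
set.seed(1)  # cross-validation folds are assigned randomly
x <- as.matrix(mtcars[, -1])  # all predictors as a numeric matrix
y <- mtcars$mpg               # response
fit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 selects the Lasso penalty
coef(fit, s = "lambda.min")        # "." entries were shrunk exactly to zero
```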

  4. Principal component analysis (PCA):

```r
fit <- prcomp(mydata, scale. = TRUE)
summary(fit)$importance["Proportion of Variance", ]
```

This will perform PCA on mydata, centering the variables and scaling them to unit variance, and extract the proportion of variance explained by each principal component. Note that "Proportion of Variance" is a row, not a column, of the importance matrix returned by summary. PCA does not select variables directly, but the loadings in fit$rotation show which variables contribute most to the first few components.
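A short sketch on mtcars showing both the variance explained and the loadings, which indicate each variable's weight on the leading components:

```r
fit <- prcomp(mtcars, scale. = TRUE)  # center and scale to unit variance
summary(fit)$importance["Proportion of Variance", ]  # variance share per PC
fit$rotation[, 1:2]  # loadings: how strongly each variable weighs on PC1/PC2
```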

These are just a few examples of the many methods available for variable selection in R. The best approach will depend on the specific problem and data, and often involves a combination of different methods.
