decision tree in R

A decision tree is a supervised learning algorithm that can be used for both regression and classification tasks. Decision trees are popular in the data mining and machine learning communities because they are simple to build and easy to interpret.

To build a decision tree in R, we can use the rpart package. Here’s an example using the built-in mtcars dataset:

main.r
# load the rpart package
library(rpart)

# build the decision tree
fit <- rpart(mpg ~ ., data=mtcars)

# plot the decision tree
plot(fit)
text(fit)

The first line loads the rpart package into our R session. We then build the decision tree using the rpart function. The formula mpg ~ . tells R to predict the mpg variable using all other variables in the mtcars dataset.
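As a quick check on what the mpg ~ . shorthand expands to, the sketch below fits the same tree with every predictor written out and compares the two fits. The name fit_explicit is introduced here purely for illustration.

```r
library(rpart)

# shorthand: every column except mpg is used as a predictor
fit <- rpart(mpg ~ ., data = mtcars)

# the same model with the predictors listed explicitly
fit_explicit <- rpart(
  mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
  data = mtcars
)

# both fits should give the same predictions on the training data
all.equal(predict(fit), predict(fit_explicit))

# mpg is numeric, so rpart fits a regression ("anova") tree by default
fit$method
```

Listing the predictors by hand is mainly useful when only some of the columns should be considered for splits.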

The plot function draws the branches of the tree, and the text function adds the split conditions and leaf values as labels. With the default settings the labels are often cramped or clipped at the edges of the plot, so it can help to adjust arguments such as uniform and margin in plot, or cex in text, to make the tree more readable.
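A minimal sketch of such adjustments, using arguments accepted by the plot and text methods for rpart objects. The specific values are only starting points chosen for illustration.

```r
library(rpart)

fit <- rpart(mpg ~ ., data = mtcars)

# uniform = TRUE spaces the levels evenly; margin leaves room around the tree
plot(fit, uniform = TRUE, margin = 0.1)

# use.n = TRUE adds the number of observations at each leaf; cex shrinks the text
text(fit, use.n = TRUE, cex = 0.8)
```

If labels still overlap, a smaller cex or a larger plotting device is usually the next thing to try.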

Once we have our decision tree, we can use the predict function to make predictions on new data. For example:

main.r
# make predictions on new data
# only the predictors are needed; the response (mpg) is what we want to estimate
newdata <- data.frame(cyl=8, disp=307, hp=130, drat=3.9, wt=3.84, qsec=15.4, vs=0, am=0, gear=3, carb=4)
predict(fit, newdata)

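This code creates a one-row data frame of predictor values, then uses the predict function to estimate mpg from the decision tree we built earlier. For a regression tree, the estimate is the mean mpg of the training observations in the leaf that the new row falls into.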
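To see where the predicted values come from, the sketch below recomputes them by hand from the fitted object. It relies on the where and frame components that rpart stores on the fit; leaf_means is a name introduced here for illustration.

```r
library(rpart)

fit <- rpart(mpg ~ ., data = mtcars)

# fit$where gives, for each training row, the row of fit$frame holding its leaf
head(fit$where)

# mean mpg of the training cars that ended up in each leaf
leaf_means <- tapply(mtcars$mpg, fit$where, mean)
leaf_means

# predictions on the training data are the fitted values stored for those leaves
all.equal(unname(predict(fit)), fit$frame$yval[fit$where])
```

Because every row in a leaf gets the same value, a regression tree can only ever return as many distinct predictions as it has leaves.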

Finally, we can draw the tree with the ggplot2 package to customize its appearance. ggplot2 cannot plot an rpart object directly and has no fortify method for one, so the tree first has to be converted into data frames. One option, assuming the ggdendro package is installed, is its dendro_data function:

main.r
# load ggplot2, plus ggdendro to extract the tree layout
library(ggplot2)
library(ggdendro)

# convert the rpart fit into data frames of segments and labels
tree_data <- dendro_data(fit)

# plot the decision tree using ggplot2
ggplot() +
  geom_segment(data=tree_data$segments, aes(x=x, y=y, xend=xend, yend=yend)) +
  geom_text(data=tree_data$labels, aes(x=x, y=y, label=label), vjust=-0.5, size=3.5) +
  geom_text(data=tree_data$leaf_labels, aes(x=x, y=y, label=label), vjust=1.5, size=3.5) +
  theme_dendro() +
  labs(title="Decision Tree")

This code uses dendro_data to convert the decision tree into a list of data frames that ggplot2 can plot: segments holds the coordinates of the branches, labels holds the split conditions, and leaf_labels holds the predicted value at each leaf. We then use geom_segment to draw the branches and geom_text to add the labels to the splits and leaves.

The theme_dendro function removes the axes and background grid, which carry no meaning in a tree layout.

Finally, we use labs to add a title to the plot.
