Decision trees are powerful models that can be used for both regression and classification tasks. They are popular in the data mining and machine learning communities for their simplicity and interpretability.
To build a decision tree in R, we can use the `rpart` package. Here's an example using the built-in `mtcars` dataset:
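A minimal version of that example (the object name `tree` is our choice; the walkthrough below refers to these lines):

```r
# Load the rpart package into the R session
library(rpart)

# Fit a regression tree: mpg ~ . predicts mpg from all other variables
tree <- rpart(mpg ~ ., data = mtcars)

# Draw the tree and label each split
plot(tree)
text(tree)
```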
The first line loads the `rpart` package into our R session. We then build the decision tree with the `rpart()` function. The formula `mpg ~ .` tells R to predict the `mpg` variable using all the other variables in the `mtcars` dataset.
The `plot()` function creates a graphical representation of the decision tree, while the `text()` function adds labels to each split. In some cases, the tree may be too big to fit on one page, so we can adjust the plot settings (for example, the label size via the `cex` argument of `text()`) to make it more readable.
Once we have our decision tree, we can use the `predict()` function to make predictions on new data. For example:
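A sketch of that prediction step, assuming the fitted tree from above is stored in `tree`; the values below are made up for illustration:

```r
# Hypothetical new observation with a value for every predictor in mtcars
new_car <- data.frame(cyl = 6, disp = 200, hp = 120, drat = 3.9, wt = 2.8,
                      qsec = 18, vs = 0, am = 1, gear = 4, carb = 2)

# Estimate mpg for the new observation using the fitted tree
predict(tree, newdata = new_car)
```

`predict()` routes the new observation down the tree and returns the mean `mpg` of the training cases in the leaf it lands in.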
This code creates a new dataset with values for each variable, then uses the `predict()` function to estimate the `mpg` value based on the decision tree we built earlier.
Finally, we can use the `ggplot2` package to customize the plot of the decision tree, as shown in the code below:
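One way to do this is with the `ggdendro` package, whose `dendro_data()` function has a method for `rpart` objects. The sketch below assumes the fitted tree from earlier is stored in `tree`; the title text is our own:

```r
library(ggplot2)
library(ggdendro)  # dendro_data() converts the rpart tree into data frames

ddata <- dendro_data(tree)

ggplot() +
  # Draw the branches of the tree
  geom_segment(data = ddata$segments,
               aes(x = x, y = y, xend = xend, yend = yend)) +
  # Label the internal splits and the leaves
  geom_text(data = ddata$labels,
            aes(x = x, y = y, label = label), vjust = -0.5, size = 3) +
  geom_text(data = ddata$leaf_labels,
            aes(x = x, y = y, label = label), vjust = 1, size = 3) +
  theme_dendro() +
  labs(title = "Decision tree for mpg in mtcars")
```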
Note that `ggplot2` cannot plot an `rpart` object directly, so this code first uses the `ggdendro` package's `dendro_data()` function to convert the decision tree into data frames of segments and labels that `ggplot2` can work with. We then use `geom_segment()` to draw the branches of the tree, and `geom_text()` to add labels at each split and leaf. The `theme_dendro()` function removes the axes and grid lines, which are not meaningful for a tree diagram. Finally, we use `labs()` to add a title to the plot.