The following walks through an implementation of a Naive Bayes spam classifier in Python (main.py).
In this implementation, we first define a helper function, make_Dictionary, that builds a dictionary of the most frequently occurring words in the training data. Each email in the training set is then converted to a feature vector by the extract_features function, which returns a matrix of feature vectors and a vector of labels.
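The two helpers described above might look roughly like the following. This is a minimal sketch, not the original listing: the function names come from the text, but the signatures, the vocab_size parameter, the whitespace tokenization, and the in-memory toy emails are all assumptions made for illustration (the original may well read emails from files).

```python
from collections import Counter


def make_Dictionary(emails, vocab_size=3000):
    """Return the vocab_size most common words across all emails.

    Assumption: emails is a list of raw strings; words are split on
    whitespace, lowercased, and kept only if alphabetic and longer
    than one character.
    """
    counts = Counter()
    for text in emails:
        counts.update(
            w for w in text.lower().split() if w.isalpha() and len(w) > 1
        )
    return [word for word, _ in counts.most_common(vocab_size)]


def extract_features(emails, labels, dictionary):
    """Build a word-count matrix (one row per email) and a label vector."""
    index = {word: i for i, word in enumerate(dictionary)}
    matrix = []
    for text in emails:
        row = [0] * len(dictionary)
        for w in text.lower().split():
            if w in index:
                row[index[w]] += 1
        matrix.append(row)
    return matrix, list(labels)


# Hypothetical toy data: 1 = spam, 0 = not spam.
emails = ["win money now", "meeting at noon", "win a free prize now"]
labels = [1, 0, 1]

dictionary = make_Dictionary(emails, vocab_size=5)
features, y = extract_features(emails, labels, dictionary)
```

Each row of features counts how often each dictionary word appears in one email, which is the kind of input a multinomial model expects.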
We then split the feature vectors and labels into training and testing sets using train_test_split from the sklearn library (sklearn.model_selection). We create an instance of the Multinomial Naive Bayes classifier with MultinomialNB() and train it on the training data by calling its fit method.
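As a rough illustration of the split-and-train step, the sketch below uses a small hand-made count matrix in place of real email features. The data, the test_size of 0.25, the random_state, and the stratify argument are assumptions chosen for the example, not values taken from the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: 8 emails x 4 vocabulary words (word counts).
X = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [4, 0, 2, 0],
    [3, 1, 1, 0],
    [0, 2, 0, 3],
    [0, 3, 1, 2],
    [1, 2, 0, 4],
    [0, 1, 0, 3],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = spam, 0 = not spam

# Hold out 25% of the emails for testing; stratify keeps both classes
# represented in each split, and random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit a multinomial Naive Bayes model on the training portion only.
model = MultinomialNB()
model.fit(X_train, y_train)
```

Keeping the test rows out of fit is what makes the later accuracy figure an estimate of performance on unseen emails.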
Finally, we use the trained classifier to make predictions on the test data and compute the accuracy of our model using accuracy_score from sklearn.metrics.
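The prediction and scoring step can be sketched as follows. The training and test arrays here are hypothetical stand-ins for the real split, so the resulting score says nothing about the original model.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count features; 1 = spam, 0 = not spam.
X_train = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [4, 0, 2, 0],
    [0, 2, 0, 3],
    [0, 3, 1, 2],
    [1, 2, 0, 4],
])
y_train = np.array([1, 1, 1, 0, 0, 0])
X_test = np.array([
    [3, 1, 1, 0],
    [0, 1, 0, 3],
])
y_test = np.array([1, 0])

model = MultinomialNB().fit(X_train, y_train)

# Predict a label for each held-out email, then compare with the truth.
y_pred = model.predict(X_test)

# accuracy_score returns the fraction of predictions that match y_test.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

Accuracy alone can be misleading when spam and non-spam are unevenly represented, so it is usually worth checking the class balance of the test set as well.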