Publication:
Performance Analysis of Naïve Bayes Classification, Support Vector Machines and Neural Networks for Spam Categorization

Publisher

Springer Science and Business Media LLC

Abstract

Spam mail recognition is a growing field that brings together natural language processing and machine learning, as it is in essence a two-class classification of natural language texts. An important feature of spam recognition is that it is a cost-sensitive classification: misclassifying a non-spam mail as spam is generally a more severe error than misclassifying a spam mail as non-spam. To be comparable, the methods applied to this field should all be evaluated on the same corpus and within the same cost-sensitive framework. In this paper, the performances of Support Vector Machines (SVM), Neural Networks (NN) and Naive Bayes (NB) techniques are compared using a publicly available corpus (LINGSPAM) for different cost scenarios. The training time complexities of the methods are also evaluated. The results show that NN performs significantly better than the other two methods while having acceptable training times. NB gives better results than SVM when the cost is extremely high, while in all other cases SVM outperforms NB.
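
To make the cost-sensitive framing concrete, the following is a minimal sketch of one possible cost-weighted accuracy measure. The abstract does not state which measure the paper uses, so the function name `weighted_accuracy`, the `cost` parameter, and the label encoding (0 for non-spam, 1 for spam) are all assumptions made for illustration. Each non-spam message is counted `cost` times, so flagging a legitimate mail as spam lowers the score more than missing a spam mail.

```python
def weighted_accuracy(y_true, y_pred, cost=1.0):
    """Accuracy in which every non-spam message counts `cost` times.

    Assumed encoding (not taken from the paper): 0 = non-spam, 1 = spam.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    legit_total = sum(1 for t, _ in pairs if t == 0)
    spam_total = sum(1 for t, _ in pairs if t == 1)
    legit_correct = sum(1 for t, p in pairs if t == 0 and p == 0)
    spam_correct = sum(1 for t, p in pairs if t == 1 and p == 1)

    # Non-spam messages are weighted by `cost`; spam messages have weight 1.
    return (cost * legit_correct + spam_correct) / (cost * legit_total + spam_total)


# Two hypothetical classifiers, each making exactly one mistake.
y_true = [0, 0, 0, 1, 1]
blocks_legit = [0, 0, 1, 1, 1]  # one non-spam mail flagged as spam
misses_spam = [0, 0, 0, 1, 0]   # one spam mail let through

for cost in (1, 9):
    print(cost,
          weighted_accuracy(y_true, blocks_legit, cost),
          weighted_accuracy(y_true, misses_spam, cost))
```

With `cost=1` the two hypothetical classifiers score the same. As `cost` grows, the one that blocks a legitimate mail falls behind, which is why a ranking of methods can change between cost scenarios.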
