An Investigation of Authorship Authentication in Short Messages from a Social Networking Site

Jenny S Li, Pace University

Abstract

An authorship authentication system was presented in this research to assess authorship authentication in short texts extracted from sample posts on a social networking site. Authorship authentication is an increasingly important security problem for social networking sites. Stylometry is a well-established technique for attributing a disputed document to its author, but authorship authentication of short texts from social networking sites is a comparatively new and less explored domain. The goal of this research is to determine the degree to which social networking posts can be authenticated as coming from the purported user rather than from an intruder. Facebook data was used for illustration.

The proposed solution is an authorship authentication system that combines 233 features (227 stylometric features and 6 social-network-specific features), a Support Vector Machine (SVM) with a linear kernel, and the leave-one-out method. Various sets of stylometric and ad hoc social-network-specific features were developed to classify short messages from thirty Facebook authors as authentic or non-authentic using the SVM.

The challenges of applying traditional stylometry to short messages were discussed. The full set of 233 features achieved the best accuracy, 79.6%, outperforming every one of its subsets. Adding the social-network-specific features to the stylometric features improved accuracy only marginally; however, users who adopted these features were more distinguishable in their writing styles. The test results showed the impact of sample size, feature set, and user writing style on the effectiveness of authorship authentication, with varying degrees of success compared to previous studies of authorship authentication in short text. The proposed stylometric features and method were also tested on 300 samples of long-form book text. SVM achieved a higher accuracy rate than k-Nearest Neighbor (k-NN) on the Facebook data, while k-NN achieved a higher accuracy rate than SVM on the book data. Finally, several commonly used classification methods were compared on the Facebook data to assess their performance for short-text authorship authentication. A decision tree achieved the best accuracy rate, followed by SVM with a linear kernel.
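The abstract names the ingredients of the system (stylometric plus social-network-specific features, a linear-kernel SVM, leave-one-out evaluation) without defining the features. The Python sketch below shows one way such a pipeline could fit together, under loudly stated assumptions: the 18 features are a hypothetical stand-in, not the dissertation's 233; the SVM is a hand-written Pegasos-style primal solver, swapped in only to keep the example dependency-free; and the messages are invented toy data. The function names `extract_features`, `train_linear_svm`, `predict`, and `leave_one_out_accuracy` are hypothetical.

```python
import random
import re

# HYPOTHETICAL feature set: a few simple stylometric features plus a few
# social-network-style markers. These are NOT the dissertation's 233 features.
FUNCTION_WORDS = ("the", "a", "and", "to", "of", "i", "you", "it")


def extract_features(msg):
    words = re.findall(r"[A-Za-z']+", msg)
    n_chars = max(len(msg), 1)
    n_words = max(len(words), 1)
    lower = [w.lower() for w in words]
    stylometric = [
        len(msg) / 100.0,                                # message length (scaled)
        sum(len(w) for w in words) / n_words / 10.0,     # mean word length (scaled)
        sum(c.isupper() for c in msg) / n_chars,         # uppercase ratio
        sum(c in ".,;:!?" for c in msg) / n_chars,       # punctuation ratio
        sum(c.isdigit() for c in msg) / n_chars,         # digit ratio
    ] + [lower.count(fw) / n_words for fw in FUNCTION_WORDS]  # function-word rates
    social = [  # hypothetical social-network markers (presence flags)
        float(bool(re.search(r"[:;]-?[)(DP]", msg))),   # emoticon such as :) or ;-D
        float("#" in msg),                              # hashtag
        float("@" in msg),                              # mention
        float(bool(re.search(r"https?://", msg))),      # URL
        float(bool(re.search(r"(.)\1\1", msg))),        # elongation such as "sooo"
    ]
    return stylometric + social


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the primal hinge-loss objective.
    Labels must be +1 / -1. A constant 1.0 feature acts as a (regularized) bias."""
    rng = random.Random(seed)
    Xb = [x + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # shuffled pass over data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]      # shrink (regularization)
            if margin < 1:                              # hinge-loss violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


def leave_one_out_accuracy(messages, labels):
    """Hold out each message once, train on the rest, test on the held-out one."""
    X = [extract_features(m) for m in messages]
    correct = 0
    for i in range(len(X)):
        w = train_linear_svm(X[:i] + X[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(w, X[i]) == labels[i]
    return correct / len(X)


if __name__ == "__main__":
    # Invented toy messages: +1 = purported author, -1 = other users.
    msgs = [
        "lol sooo tired :) #mondays", "omg cant wait!!! :D", "haha yes :) see u @jen",
        "Good morning. The meeting is at noon.", "I will send the report to you today.",
        "Thanks, and see you at the conference.",
    ]
    labels = [1, 1, 1, -1, -1, -1]
    print("LOO accuracy:", leave_one_out_accuracy(msgs, labels))
```

In practice a library SVM (for example scikit-learn's `SVC(kernel="linear")` or `LinearSVC`) would replace the hand-written solver, and swapping the classifier inside `leave_one_out_accuracy` is one way the k-NN and decision-tree comparisons mentioned above could be reproduced.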

Subject Area

Web studies; Computer science

Recommended Citation

Jenny S Li, "An Investigation of Authorship Authentication in Short Messages from a Social Networking Site" (January 1, 2015). ETD Collection for Pace University. Paper AAI3711057.
http://digitalcommons.pace.edu/dissertations/AAI3711057
