A Dictionary-Based Approach to Identifying Malicious Machine-Generated Text

Tianyu Wang, Pace University

Abstract

The primary focus of my dissertation is to study how to apply Artificial Intelligence (AI) in cybersecurity to identify malicious machine-generated text. I define the scope of malicious machine-generated text in two forms: 1) malicious domain names, and 2) misinformation content in social media. I analyze machine-generated textual content and investigate the relationships between the linguistic and information characteristics of dynamically generated content. Thus, my research problem is to distinguish textual content generated by machines (bots) from content generated by humans. More specifically, the research problem can be divided into several sub-problems. First, after testing 39 DGA-family domain names and comparing results with other similar research, my method, which utilizes N-gram-based dictionary features drawn from Alexa and an English dictionary, outperformed the compared methods in detecting malicious domain names generated by domain generation algorithms (DGAs). Second, I proposed a method that combines a sequence of word frequencies with the information entropy of the content to generate features for machine learning algorithms in social media account detection. By detecting such accounts early, this method can stop the spread of false information in a timely manner. Third, I presented an algorithm that incorporates a new similarity-based feature, extracted only from the text content, together with content-based features and user-based features in an LSTM model. Furthermore, to extend the rationale of ensemble learning, I combined two LSTM models by using a conditional meta-classifier in spam detection. The exhaustive experiment showed that this model outperforms all other baselines on an imbalanced dataset and achieves results comparable to a modern model on a balanced dataset.
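
The two feature ideas named in the abstract, N-gram dictionary features for domain names and information entropy of text content, can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the dissertation's implementation: the function names, the character trigram size, and the tiny word list standing in for the Alexa and English dictionaries are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_ngram_dictionary(words, n=3):
    """Collect every character n-gram seen in a reference word list.

    In practice the word list might come from popular domain names and an
    English dictionary; here it is whatever the caller supplies (assumption).
    """
    return {gram for word in words for gram in char_ngrams(word.lower(), n)}


def dictionary_hit_ratio(label, ngram_dict, n=3):
    """Fraction of the label's n-grams that appear in the reference dictionary.

    Human-chosen names tend to share n-grams with real words, so a low ratio
    is one possible signal of an algorithmically generated name.
    """
    grams = char_ngrams(label.lower(), n)
    if not grams:
        return 0.0
    return sum(gram in ngram_dict for gram in grams) / len(grams)


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical distribution of tokens.

    tokens may be a string (character entropy) or a list of words.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical stand-in for a real reference dictionary.
    reference = build_ngram_dictionary(["google", "example", "network"])
    for name in ["google", "xqzjkv"]:
        print(name, dictionary_hit_ratio(name, reference), shannon_entropy(name))
```

Features like these would typically be concatenated into a vector and passed to a classifier; which classifier and which additional features are used is a design choice this sketch does not attempt to reproduce.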

Subject Area

Computer science|Artificial intelligence|Information science

Recommended Citation

Wang, Tianyu, "A Dictionary-Based Approach to Identifying Malicious Machine-Generated Text" (2021). ETD Collection for Pace University. AAI28651802.
https://digitalcommons.pace.edu/dissertations/AAI28651802
