Machine Learning and Text Analysis Using Clustering, Classification, Categorization for Applied Industry Research and Its Effect on Trends and Prediction Analysis of a Doctor of Professionals Studies in Computing Dissertation Categories

Ashley Haigler, Pace University

Abstract

An industry research survey showed that understanding dissertation research categories has not been a focus of many researchers and institutions. This research expands on machine learning methodologies, using two similar datasets to answer three questions: (1) Is there a way to track the trends in the dissertation categories of Pace University’s Doctor of Professional Studies (DPS) students over the past 20 years using classification and categorization? (2) Is it possible to determine dissertation trends in a professional doctorate program based on computing technology research? (3) Can we predict trends in DPS dissertation categories by analyzing computing industry research? These questions were answered using machine learning and text analysis, which produced a four-step research methodology, also referred to as the framework. The steps were clustering, classification, categorization, and predictive analysis, applied to 113 DPS dissertation abstracts and compared against 98,393 IEEE research abstracts to predict future DPS categorization, looking specifically at the words in the predicted category. The findings for the three questions are as follows. (1) A trend in the DPS data was found by putting 240 clustered DPS words into the classification model by year over the last 20 years, showing each year’s categorization results. The trend shows the Algorithms category used most in the first five years, then Agile Practicing for the next five years; finally, Cloud Technology becomes more dominant in later years. (2) There are dissertation trends in a professional doctorate program based on computing technology research. To obtain this trend, the same DPS words were run through a second classification model. In the first five years, the Algorithm category was the strongest; for the next five years, Cloud Computing became dominant; in the following five years, the trend changed to Software Development; and lastly, Cloud Computing started trending. (3) For the last research question, a DPS dissertation category trend was predicted by analyzing computing industry research with IBM Watson’s predictive analysis tool. It predicted Cloud Computing as the DPS category for the next five years, with the top four words in this category being Data, Compute, Develop, and Cloud. Pace’s DPS program can now use this four-step process to extend its programs and help students find research categories using categorization and the words in the text categories.

Subject Area

Computer science|Information Technology|Computer Engineering|Education

Recommended Citation

Haigler, Ashley, "Machine Learning and Text Analysis Using Clustering, Classification, Categorization for Applied Industry Research and Its Effect on Trends and Prediction Analysis of a Doctor of Professionals Studies in Computing Dissertation Categories" (2021). ETD Collection for Pace University. AAI28323851.
https://digitalcommons.pace.edu/dissertations/AAI28323851
