Ronald Kroening


B.S. in Computer Science

Advisor: Dr. Francis Parisi



This paper applies five machine learning models to classify celestial bodies orbiting a star as exoplanets or false positives. We use a recurrent neural network (RNN), a logistic regression model (LR), and a Random Forest classifier (RF), along with two stacked models that use logistic regression as the base. The data set was built to improve access to balanced data in the form of extracted features and time series graphs, and to explore potential solutions to shortcomings identified in prior work, specifically those relating to logistic regression models. Training data was assembled from Astronet, a pipeline that included data from the Kepler satellite and the Transiting Exoplanet Survey Satellite (TESS). A total of 6,000 stars were analyzed: 3,000 systems without any confirmed exoplanets and 3,000 with a confirmed exoplanet. The Random Forest model and one of the logistic regression models were optimized for performance with GridSearch. The two stacked models used logistic regression as the base, with a C4.5 decision tree and a support vector machine, respectively, stacked on top of its predictions. In addition, a recurrent neural network was trained on a smaller subset of the data to keep the computational cost manageable, with the class split remaining the same. The metrics used for analysis were accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUCROC). All trained models reached at least 90% accuracy, and the stacking methods improved on plain logistic regression. The results also showed the benefit of using long short-term memory (LSTM) units in recurrent neural networks to address the vanishing gradient problem.
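
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how they can be computed for a binary exoplanet / false-positive classifier. It is not the code used in this work: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration, and the AUCROC is computed with the standard pairwise-ranking formulation rather than any particular library.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = exoplanet and 0 = false positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 score from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUCROC as the probability that a random positive scores above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUCROC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: a balanced set of 4 exoplanets and 4 false positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]  # assumed model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

With a balanced class split such as the one described above, accuracy is not dominated by a majority class, which is one reason it can be read alongside precision and recall without further correction.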