Pump and Dump Detection using Machine Learning
By Nour Khellia

Outline
01 Descriptive Analysis
02 Data Cleaning
03 Data Visualization
04 Exploratory Analysis
05 Models

Descriptive Analysis
Dataset: Stock Market Transactions
14 columns, 533641 rows

Data Cleaning
01 Drop all rows with NaN values
02 Remove the unused columns: LastShares, LastPx, LastMkt, timeStamp, MsgseqNum
03 Convert two columns to numerical values: Side, OrdStatus
04 Normalize the values so that they can be compared

Data Visualization
[Figures]

Exploratory Analysis
Heat map
Correlation matrix

Feature Engineering
Daily returns
Price standard deviation
Volatilities
Price plot
Daily volatility
Senders are grouped by hour
Data is resampled into 5-minute intervals
Volatility and Fraud columns are added

Dataset Transformation
Drop the rows with NaN values

Models

ANN Model
X and Y
Splitting the dataset into training and test sets
Feature scaling
Building the model
Accuracy: 0.996
Making the prediction and evaluating the model

Clustering Model
2 clusters
Accuracy: 0.99721

Supervised Learning

KNN Model
Import the libraries
Visualize the class (Fraud) column
Splitting the positive and negative classes
Creating training and testing sets
Accuracy: 0.954

SVM Model
Training on our dataset
Building our model

Conclusion
Accuracy (unsupervised learning, supervised learning)
ANN: 0.996
Clustering: 0.997
KNN: 0.954
SVM: 0.789

Thank you!
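
Appendix: a sketch of the cleaning and resampling steps
The four data cleaning steps and the 5-minute resampling step named in the deck could look roughly like this in pandas. This is a minimal sketch, not the code used in the presentation. The column names LastShares, LastPx, LastMkt, timeStamp, MsgseqNum, Side and OrdStatus come from the slides; everything else is an assumption made for illustration: the function names, the integer-code encoding, min-max normalization, the definition of volatility as a rolling standard deviation of returns, and the 12-bar window.

```python
import pandas as pd

# Columns the deck lists as unused.
UNUSED_COLUMNS = ["LastShares", "LastPx", "LastMkt", "timeStamp", "MsgseqNum"]
# Columns the deck converts to numerical values.
CATEGORICAL_COLUMNS = ["Side", "OrdStatus"]


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps from the deck (hypothetical helper)."""
    # 01 Drop all rows with NaN values.
    out = df.dropna().copy()
    # 02 Remove the unused columns, ignoring any that are absent.
    out = out.drop(columns=UNUSED_COLUMNS, errors="ignore")
    # 03 Convert Side and OrdStatus to integer codes (assumed encoding).
    for col in CATEGORICAL_COLUMNS:
        if col in out.columns:
            out[col] = out[col].astype("category").cat.codes
    # 04 Min-max normalize every numeric column to [0, 1] (assumed scheme).
    for col in out.select_dtypes(include="number").columns:
        lo, hi = out[col].min(), out[col].max()
        # A constant column has no spread, so map it to 0.0.
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out


def five_minute_volatility(prices: pd.Series, window: int = 12) -> pd.DataFrame:
    """Resample a timestamp-indexed price series into 5-minute bars and add
    returns and rolling volatility (assumed: standard deviation of returns)."""
    # Last observed price in each 5-minute bin; empty bins are dropped.
    bars = prices.resample("5min").last().dropna()
    returns = bars.pct_change()
    volatility = returns.rolling(window).std()
    return pd.DataFrame(
        {"price": bars, "return": returns, "volatility": volatility}
    )
```

Note that the resampling helper needs a timestamp index, so under this sketch the time information would have to be set aside before timeStamp is dropped in the cleaning step. How the Fraud column is derived from volatility is not stated in the slides, so it is left out here.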