Uploaded by choura rania

Pump and Dump Detection using Machine Learning
By Nour Khellia
Outline
01 Descriptive Analysis
02 Data Cleaning
03 Data Visualization
04 Exploratory Analysis
05 Models
Descriptive Analysis
Dataset
Stock Market Transactions Dataset:
14 columns
533,641 rows
Data Cleaning
Data Cleaning
01 Drop all rows with NaN values
02 Remove the unused columns: LastShares, LastPx, LastMkt, timeStamp, MsgseqNum
03 Convert two columns to numerical values: Side, OrdStatus
04 Normalize the values so that they can be compared
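The four cleaning steps above can be sketched with pandas. This is a minimal illustration, not the code from the slides. Only the names LastShares, LastPx, LastMkt, timeStamp, MsgseqNum, Side and OrdStatus come from the deck. The Price and OrderQty columns, the sample values, the category encoding and the choice of min-max normalization are all assumptions.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample. "Price" and "OrderQty" are hypothetical columns,
# and the string values of Side / OrdStatus are assumed for illustration.
raw = pd.DataFrame({
    "Price": [10.0, 10.5, np.nan, 11.0, 12.0],
    "OrderQty": [100, 200, 150, 400, 300],
    "Side": ["BUY", "SELL", "BUY", "SELL", "BUY"],
    "OrdStatus": ["NEW", "FILLED", "NEW", "FILLED", "NEW"],
    "LastShares": [0, 200, 0, 400, 0],
    "LastPx": [0.0, 10.5, 0.0, 11.0, 0.0],
    "LastMkt": ["X"] * 5,
    "timeStamp": pd.date_range("2020-01-01", periods=5, freq="min"),
    "MsgseqNum": [1, 2, 3, 4, 5],
})

# Columns listed as unused on the slide.
UNUSED = ["LastShares", "LastPx", "LastMkt", "timeStamp", "MsgseqNum"]


def clean(df):
    # 01: drop every row that contains a NaN.
    out = df.dropna()
    # 02: remove the unused columns.
    out = out.drop(columns=UNUSED).copy()
    # 03: turn the two categorical columns into integer codes.
    for col in ["Side", "OrdStatus"]:
        out[col] = out[col].astype("category").cat.codes
    # 04: min-max normalization to [0, 1] (assumed; the slides only say "normalize").
    out = out.astype(float)
    span = (out.max() - out.min()).replace(0, 1)  # avoid dividing by zero
    return (out - out.min()) / span


cleaned = clean(raw)
print(cleaned)
```

Min-max scaling puts every column on the same [0, 1] range, which matches the stated goal of making the values comparable. Standardization (zero mean, unit variance) would be an equally plausible reading of the slide.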
Data Visualization
Exploratory Analysis
Heat map:
Correlation matrix:
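A correlation matrix like the one named above can be computed with pandas. The sketch below uses synthetic data. The column names Price, OrderQty and Notional are hypothetical and do not come from the slides. A heat map is simply this matrix rendered as a colored grid by a plotting library.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; all three column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
price = rng.normal(100, 5, n)
df = pd.DataFrame({
    "Price": price,
    "OrderQty": rng.integers(1, 1000, n).astype(float),
    "Notional": price * 10 + rng.normal(0, 1, n),  # built to track Price
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(3))

# Find the most strongly correlated pair, ignoring the diagonal.
values = corr.abs().to_numpy().copy()
np.fill_diagonal(values, 0.0)
i, j = np.unravel_index(values.argmax(), values.shape)
strongest = (corr.index[i], corr.columns[j])
print("Strongest pair:", strongest)
```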
Feature engineering
Daily returns
Price standard deviation
Volatilities
Price plot
Daily volatility
Senders are grouped by hour
Data is resampled into 5-minute intervals
Volatility and Fraud columns are added
Volatility:
Fraud:
Dataset transformation
Dataset:
Drop the rows with NaN values:
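The resampling, volatility and labeling steps can be sketched as follows. The slides do not show how Volatility or Fraud were actually defined, so everything specific here is an assumption: returns are taken per 5-minute bar, volatility is a rolling standard deviation over six bars, and the Fraud rule (volatility above mean plus two standard deviations) is purely a placeholder.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute prices with an artificial pump followed by a sudden dump.
rng = np.random.default_rng(1)
idx = pd.date_range("2020-01-02 09:00", periods=600, freq="min")
price = 100 + np.cumsum(rng.normal(0, 0.05, len(idx)))
price[300:320] += np.linspace(0, 8, 20)  # price ramps up, then falls back
ticks = pd.DataFrame({"Price": price}, index=idx)

# Resample to 5-minute bars, keeping the last price seen in each bar.
bars = ticks["Price"].resample("5min").last().to_frame("Price")

# Return of each bar relative to the previous one.
bars["Return"] = bars["Price"].pct_change()

# Volatility: rolling standard deviation of returns (window length assumed).
bars["Volatility"] = bars["Return"].rolling(window=6).std()

# Placeholder labeling rule, NOT taken from the slides.
threshold = bars["Volatility"].mean() + 2 * bars["Volatility"].std()
bars["Fraud"] = (bars["Volatility"] > threshold).astype(int)

# The first rows have no return or volatility yet, so drop them.
bars = bars.dropna()
print(bars["Fraud"].value_counts())
```

Rolling statistics always leave NaN values at the start of the series, which is why the row-dropping step comes after the new columns are added.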
Models
ANN MODEL
Model
X and Y:
Splitting the dataset into train and test parts:
Feature scaling:
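These three preparation steps could look like the sketch below, written with scikit-learn. The feature columns, the synthetic data, the 80/20 split ratio and the use of StandardScaler are assumptions, since the slides only name the steps.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the transformed dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "Price": rng.normal(100, 5, n),
    "Return": rng.normal(0, 0.01, n),
    "Volatility": rng.gamma(2.0, 0.005, n),
})
# Placeholder label: the 5% most volatile rows are marked as fraud.
data["Fraud"] = (data["Volatility"] > data["Volatility"].quantile(0.95)).astype(int)

# X holds the features, y the target column.
X = data.drop(columns="Fraud").to_numpy()
y = data["Fraud"].to_numpy()

# Hold out 20% for testing; stratify keeps the fraud ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler on the training part only, then apply it to both parts.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Fitting the scaler on the training set alone keeps information about the test set from leaking into the model.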
Building the model:
Accuracy: 0.996 (99.6%)
Making the prediction and evaluating the model:
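The build, predict and evaluate steps can be illustrated with a small neural network. The slides do not say which framework was used, so scikit-learn's MLPClassifier is swapped in here as a stand-in. The layer sizes, the synthetic imbalanced data and the resulting score are assumptions and do not reproduce the reported accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the fraud dataset (about 5% positives).
X, y = make_classification(
    n_samples=2000, n_features=6, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Building the model: two small hidden layers (sizes are an assumption).
model = MLPClassifier(hidden_layer_sizes=(6, 6), activation="relu",
                      max_iter=500, random_state=0)
model.fit(X_train, y_train)

# Making the prediction and evaluating the model.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Accuracy:", round(acc, 3))
print("Confusion matrix:\n", cm)
```

The confusion matrix is printed alongside the accuracy because, when fraud rows are rare, a model can score high accuracy while missing most of the positive class.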
Clustering
2 clusters
Accuracy: 0.99721
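A two-cluster model can be scored against known labels as sketched below. The slides do not name the clustering algorithm, so KMeans is an assumption, and the data is synthetic. Because cluster ids are arbitrary, the accuracy is taken over the better of the two possible cluster-to-label mappings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Two well-separated synthetic groups standing in for normal and fraud rows.
X, y = make_blobs(n_samples=500, centers=[[0, 0], [6, 6]],
                  cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Fit KMeans with 2 clusters (algorithm choice is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)

# Cluster 0 may correspond to label 1 or label 0, so try both mappings.
acc = max(np.mean(clusters == y), np.mean(clusters != y))
print("Accuracy:", round(float(acc), 4))
```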
Supervised Learning
KNN
Import the libraries:
Visualize the Class (Fraud) column:
Splitting the positive and the negative classes
Creating training and testing sets:
Accuracy: 0.954 (95.4%)
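The KNN workflow above can be sketched as follows: inspect the class column, split the positive and negative classes separately, rebuild the training and testing sets, then fit the classifier. The feature names, the synthetic data, the 80/20 ratio and n_neighbors=5 are assumptions. The helper split_frame is hypothetical and written only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data: 900 normal rows and 100 fraud rows (hypothetical features).
rng = np.random.default_rng(0)
neg = pd.DataFrame({"Return": rng.normal(0.00, 0.01, 900),
                    "Volatility": rng.normal(0.01, 0.003, 900), "Fraud": 0})
pos = pd.DataFrame({"Return": rng.normal(0.03, 0.01, 100),
                    "Volatility": rng.normal(0.03, 0.005, 100), "Fraud": 1})
data = pd.concat([neg, pos], ignore_index=True)

# Inspect the class balance of the Fraud column.
print(data["Fraud"].value_counts())

# Split the positive and the negative classes.
positives = data[data["Fraud"] == 1]
negatives = data[data["Fraud"] == 0]


def split_frame(frame, frac=0.8, seed=0):
    """Hypothetical helper: random train/test split of one class."""
    train = frame.sample(frac=frac, random_state=seed)
    return train, frame.drop(train.index)


# Create training and testing sets with the same class ratio in each.
pos_train, pos_test = split_frame(positives)
neg_train, neg_test = split_frame(negatives)
train = pd.concat([pos_train, neg_train])
test = pd.concat([pos_test, neg_test])

# KNN is distance based, so the features are scaled first.
features = ["Return", "Volatility"]
scaler = StandardScaler().fit(train[features])
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(train[features]), train["Fraud"])

y_pred = knn.predict(scaler.transform(test[features]))
acc = accuracy_score(test["Fraud"], y_pred)
print("Accuracy:", round(acc, 3))
```

Splitting each class on its own guarantees that the rare positive class appears in both the training and the testing set.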
SVM
Training our dataset:
Building our model:
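A support vector classifier can be trained and scored as in the sketch below. The kernel, the hyperparameters and the synthetic data are assumptions, since the slides do not show the configuration, and the score printed here is unrelated to the reported result.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the fraud dataset.
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling and the classifier are chained so that the scaler is fit
# on the training data only. RBF kernel and C=1.0 are assumed settings.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)

# Mean accuracy on the held-out test set.
acc = svm.score(X_test, y_test)
print("Accuracy:", round(acc, 3))
```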
Conclusion
Accuracy:
Unsupervised Learning:
Clustering: 0.997 (99.7%)
Supervised Learning:
ANN: 0.996 (99.6%)
KNN: 0.954 (95.4%)
SVM: 0.789 (78.9%)
THANK YOU!