Confusion matrix and Cyber crime case

RajSaundatikar
Jun 6, 2021

What is a confusion matrix?

A confusion matrix is a performance measure for machine learning classification problems in which the output can belong to two or more classes. It is widely used in data mining to evaluate a classifier. It records the actual classes of the samples alongside the classes predicted by the classification system, and the system's performance is usually evaluated from the counts in this matrix. For a two-class (binary) problem it is a 2×2 table holding the four combinations of predicted and actual values.

It is extremely useful for measuring recall, precision, specificity and accuracy. When it is recomputed at different decision thresholds, it is also the basis of the ROC curve and its AUC (area under the curve).

True Positive (TP): you predicted positive and the actual class is positive.

True Negative (TN): you predicted negative and the actual class is negative.

False Positive (FP, Type I error): you predicted positive but the actual class is negative.

False Negative (FN, Type II error): you predicted negative but the actual class is positive.

In these names, Positive and Negative describe the predicted value, while True and False describe whether that prediction matches the actual value.
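To make the four definitions concrete, here is a minimal Python sketch that counts TP, TN, FP and FN for a binary problem. It assumes integer labels with 1 as the positive class; the function name `confusion_counts` and the sample labels are made up for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative (Type I error)
        elif actual == positive:
            fn += 1  # predicted negative, actually positive (Type II error)
        else:
            tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Made-up labels for illustration (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```

Every sample lands in exactly one of the four cells, so the counts always add up to the number of samples.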

From the confusion matrix we can compute accuracy, precision and recall. Accuracy is the proportion of all predictions that are correct:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision measures how accurate the positive predictions are, i.e. the fraction of samples predicted as positive that are actually positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (also called sensitivity) measures the model's ability to find the samples of a given class, i.e. the fraction of actually positive samples that the model predicts as positive:

$$\text{Recall} = \frac{TP}{TP + FN}$$
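The formulas accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP) and recall = TP / (TP + FN) translate directly into code. The sketch below is a hypothetical helper, `classification_metrics`, which also adds specificity, TN / (TN + FP), from the list of measures mentioned earlier. Returning 0.0 when a denominator is zero is an assumed convention, not something the article specifies.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and specificity from binary counts."""

    def ratio(num: int, den: int) -> float:
        # Assumed convention: report 0.0 instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


metrics = classification_metrics(tp=6, tn=8, fp=2, fn=4)
print(metrics)
# {'accuracy': 0.7, 'precision': 0.75, 'recall': 0.6, 'specificity': 0.8}
```

The values are fractions between 0 and 1; multiply by 100 to report them as percentages.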

Precision and recall can be expressed either as percentages (0–100%) or as numbers between 0 and 1. A classifier is considered good when both its precision and its recall are high.
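A single confusion matrix corresponds to one decision threshold and therefore to one point on the ROC curve. The curve is obtained by recomputing the matrix at many thresholds and plotting the true positive rate (recall) against the false positive rate, FP / (FP + TN). The AUC is the area under that curve. The sketch below assumes the model outputs a score per sample (higher means more likely positive) and integrates with the trapezoidal rule; `roc_points`, `roc_auc` and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from thresholding `scores` at every distinct value.

    A sample is predicted positive when score >= threshold. Labels must be
    1 (positive) or 0 (negative), and both classes must be present.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Confusion-matrix counts at this threshold (only TP and FP are needed).
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))  # FPR = FP / (FP + TN), TPR = recall
    return points


def roc_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and model scores for illustration.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(roc_points(y_true, scores)), 4))  # 0.8889
```

An AUC of 1.0 means every positive is scored above every negative, while 0.5 is what random scoring gives on average.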

Cyber Crime Case Study

Cybercrime threats are growing day by day. Currently, there is no foolproof, systematic and reliable tool for reviewing cybercrimes, because records are often not maintained at the concerned offices. This happens for various reasons: victims' assumptions about how the police will respond, users' lack of awareness of the IT (Information Technology) Act provisions on cybercrime, and victims not realizing that they have been victimized. Educating people about cybercrime, and identifying and recording area-wise cybercrime rates, could also help in reducing and classifying cybercrimes.

Results and Analysis

The proposed system was designed and developed using data from sources such as Kaggle and CERT-In. The dataset contains more than 2000 records of incidents that occurred in India during 2012–2017, each with eight attributes: incident, offender, victim, harm, year, location, age of the offender and cybercrime. These records were used to build and test the proposed computational system. Table 3 shows the data after rows with missing values in the incident column were removed, with an added column in which the cybercrime attribute is encoded as an integer.
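The two preprocessing steps described for Table 3 (dropping rows whose incident value is missing and adding an integer-encoded cybercrime column) could look roughly like the sketch below. The article does not show its code, so the field names, the `preprocess` helper, the `cybercrime_code` column name and the sample rows are all assumptions. With pandas, `DataFrame.dropna(subset=[...])` and `pandas.factorize` would do the same job.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def preprocess(rows: List[Row]) -> Tuple[List[Row], Dict[str, int]]:
    """Drop rows with a missing 'incident' value and add 'cybercrime_code'.

    Codes are assigned in order of first appearance (0, 1, 2, ...).
    """
    codes: Dict[str, int] = {}
    cleaned: List[Row] = []
    for row in rows:
        incident = row.get("incident")
        if incident is None or str(incident).strip() == "":
            continue  # treat None and blank strings as missing
        label = row["cybercrime"]
        if label not in codes:
            codes[label] = len(codes)  # next unused integer
        cleaned.append({**row, "cybercrime_code": codes[label]})
    return cleaned, codes


# Made-up rows for illustration only; they are not from the article's dataset.
rows = [
    {"incident": "phishing email", "cybercrime": "fraud"},
    {"incident": None, "cybercrime": "fraud"},
    {"incident": "website defaced", "cybercrime": "hacking"},
    {"incident": "fake invoice", "cybercrime": "fraud"},
]
cleaned, codes = preprocess(rows)
print(codes)                                    # {'fraud': 0, 'hacking': 1}
print([r["cybercrime_code"] for r in cleaned])  # [0, 1, 0]
```

Keeping the `codes` mapping makes it possible to translate predicted integers back into crime types when reading the confusion matrix.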

Confusion matrix for the cyber crime case

The figure above shows the confusion matrix for the model when 80% of the data was used for training and 20% for testing. It shows how many test cases were classified correctly and how many incorrectly, so we can read off the true positives, true negatives, false positives and false negatives produced by the model.
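The evaluation described above, an 80/20 train/test split followed by a confusion matrix on the test set, can be sketched with the Python standard library alone. The article does not name its classifier, so the model-fitting step is omitted; `holdout_split`, `confusion_matrix`, `one_vs_rest_counts` and the example labels are hypothetical. Because the cybercrime column is integer-encoded and may have more than two values, the matrix is n × n, and the four counts are read per class by treating that class as positive and all others as negative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(items: Sequence[T], test_size: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and return (train, test) with `test_size` held out."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """matrix[actual][predicted] counts for labels encoded as 0..n_classes-1."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def one_vs_rest_counts(matrix: List[List[int]], k: int) -> Tuple[int, int, int, int]:
    """(TP, TN, FP, FN) for class k, treating every other class as negative."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actually k, predicted another class
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


train, test = holdout_split(list(range(10)), test_size=0.2)
print(len(train), len(test))  # 8 2

# Made-up test-set labels and predictions for three encoded crime types.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)                         # [[1, 1, 0], [0, 1, 1], [0, 0, 2]]
print(one_vs_rest_counts(cm, 2))  # (2, 3, 1, 0)
```

In practice, scikit-learn's `sklearn.model_selection.train_test_split` (with `test_size=0.2`) and `sklearn.metrics.confusion_matrix` provide the same functionality with more options.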

Conclusion

In today's world, cybercrime offenses are happening at an alarming rate. As internet use grows, many offenders use it as a means of communication to commit crimes. This case study is useful for predicting cases and for taking precautionary steps against cybercrime in the hot-spot locations it identifies.
