Confusion Matrix and Cyber Attacks
Cyber crime is defined as an illegal activity that involves the use of a computer or other digital device and a network. It mostly targets information that is personal or of high importance to an individual, organization, or government, and whose exposure can cause serious threats, infrastructure damage, financial loss, and even loss of life.
According to Gartner, the worldwide information security market is forecast to reach $170.4 billion in 2022. Around 88% of organizations worldwide experienced spear phishing attempts in 2019. Data breaches exposed 36 billion records in the first half of 2020. 86% of breaches were financially motivated and 10% were motivated by espionage. 45% of breaches featured hacking, 17% involved malware and 22% involved phishing.
15% of breaches involved healthcare organizations, 10% the financial industry, and 16% the public sector. The healthcare industry lost an estimated $25 billion to ransomware attacks in 2019. The average cost of a financial services data breach is $5.85 million.
~ Source: Varonis
Thus, detecting the various cyber attacks in a network is essential, and this is where machine learning models come into play for building an effective Intrusion Detection System (IDS). A binary classification model can be used to identify what is happening in the network, i.e., whether there is an attack or not.
Understanding the raw security data is the first step in building an intelligent security model that can make predictions about future incidents. The two categories here are normal and anomaly. Taking the selected security features into account and performing all the preprocessing steps, we train a model that can detect whether a test case is normal or an anomaly. One of the metrics used to evaluate the model is the confusion matrix.
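As an illustration, here is a minimal sketch of such a binary classifier built with scikit-learn. The CSV file name, the column names, and the choice of logistic regression are assumptions made for the example, not details taken from a specific dataset.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical preprocessed dataset: numeric security features plus a
# "label" column containing the values "normal" and "anomaly".
df = pd.read_csv("network_traffic.csv")
X = df.drop(columns=["label"])
y = df["label"]

# Hold out part of the data so the model is evaluated on unseen records.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scale the features and fit a simple binary classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predicted classes ("normal" or "anomaly") for the held-out records.
y_pred = model.predict(X_test)
```

The predictions y_pred, compared against y_test, are exactly what feeds the confusion matrix discussed next.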
Confusion matrix
A confusion matrix is a table that is used to determine the performance of a classification model. We compare the predicted values for the test data with the true values known to us. From this we know how many cases are classified correctly and how many are classified incorrectly. For a two-class problem, the matrix has the four cells described by the terms below.
Let’s understand the terms used here:
- In a two-class problem such as the attack state, we assign the event normal as “positive” and anomaly as “negative”.
- “True Positive” for correctly predicted event values.
- “False Positive” for incorrectly predicted event values.
- “True Negative” for correctly predicted no-event values.
- “False Negative” for incorrectly predicted no-event values.
Confusion matrices have two types of errors: Type I and Type II.
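As a minimal sketch of how these four cells are obtained in practice, the snippet below uses scikit-learn's confusion_matrix on a few made-up labels, keeping the convention above that normal is the positive class:

```python
from sklearn.metrics import confusion_matrix

# Illustrative true and predicted labels for a handful of packets.
y_true = ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly"]
y_pred = ["normal", "anomaly", "anomaly", "normal", "normal", "anomaly"]

# With labels=["normal", "anomaly"], rows are actual classes and columns are
# predicted classes, in that order. Since "normal" is the positive class here,
# the four cells unpack as TP, FN, FP, TN.
cm = confusion_matrix(y_true, y_pred, labels=["normal", "anomaly"])
tp, fn, fp, tn = cm.ravel()
print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=2, FN=1, FP=1, TN=2
```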
Now let's look at these terms and their significance in the light of cyber attack prediction for a better understanding.
An IDS, or Intrusion Detection System, checks for any malicious activity on a system. It monitors the packets coming over the internet and, using an ML model, predicts whether each one is normal or an anomaly.
Let's say our model in the IDS examined a total of 165 packets and produced the confusion matrix summarized by the counts below (a short sketch right after this list reproduces these numbers).
- “Positive” -> Model predicted no attack.
- “Negative” -> Model predicted attack.
- True Negative: Out of the 55 packets for which the model predicted an attack, 50 predictions were true, meaning an attack actually took place 50 times. Thanks to these predictions, the Security Operations Centre (SOC) receives a notification and can prevent the attack.
- False Negative: Out of those 55 predicted attacks, 5 times the attack didn't actually happen. These can be considered “false alarms” and are also Type II errors.
- True Positive: The model predicted 110 times that no attack would take place, and 100 of those times no attack actually happened. These are correct predictions.
- False Positive: 10 times an attack actually took place when the model had predicted that no attack would happen. This is also called a Type I error.
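The tiny sketch below simply restates these counts in code, following the same convention (positive = no attack), and checks that the totals quoted above add up:

```python
# Counts from the IDS example, where "positive" means the model predicted no attack.
tp = 100  # predicted no attack, and no attack actually happened
fn = 5    # predicted an attack, but no attack happened (false alarm, Type II)
fp = 10   # predicted no attack, but an attack actually took place (Type I)
tn = 50   # predicted an attack, and an attack actually took place

# Sanity checks against the totals mentioned in the text.
assert tp + fn + fp + tn == 165   # all packets examined
assert tn + fn == 55              # packets for which an attack was predicted
assert tp + fp == 110             # packets for which no attack was predicted
print("Counts are consistent with the example.")
```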
Type I error:
This type of error can prove to be very dangerous. The system predicted no attack, but in reality an attack takes place; no notification reaches the security team, so nothing can be done to prevent it. The False Positive cases above fall into this category, and thus one of the aims of the model is to minimize this value.
Type II error:
This type of error is not very dangerous: the system is actually safe, but the model predicted an attack. The team gets notified and checks for malicious activity, which causes no harm. These cases can be termed false alarms.
We can use the confusion matrix to calculate various metrics (a short numeric example follows the list):
1. Accuracy: The values of the confusion matrix are used to calculate the accuracy of the model. It is the ratio of all correct predictions to all predictions (total values).
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision: (true positives / predicted positives) = TP / (TP + FP)
3. Recall: (true positives / all actual positives) = TP / (TP + FN)
4. Specificity: (true negatives / all actual negatives) = TN / (TN + FP)
5. Misclassification: (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
It can also be calculated as 1 - Accuracy.
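For a concrete feel, the sketch below plugs the counts from the 165-packet IDS example into these formulas; it uses only the numbers already given above:

```python
# Counts from the 165-packet IDS example (positive = no attack).
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn

accuracy = (tp + tn) / total              # (100 + 50) / 165 ≈ 0.909
precision = tp / (tp + fp)                # 100 / 110 ≈ 0.909
recall = tp / (tp + fn)                   # 100 / 105 ≈ 0.952
specificity = tn / (tn + fp)              # 50 / 60 ≈ 0.833
misclassification = (fp + fn) / total     # 15 / 165 ≈ 0.091

print(f"Accuracy:          {accuracy:.3f}")
print(f"Precision:         {precision:.3f}")
print(f"Recall:            {recall:.3f}")
print(f"Specificity:       {specificity:.3f}")
print(f"Misclassification: {misclassification:.3f}")
print(f"1 - Accuracy:      {1 - accuracy:.3f}")  # equals the misclassification rate
```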
!! Thank you for reading !!
Reference: Article on IDS