Towards Safe Cyber Practices: Developing a Proactive Cyber-Threat Intelligence System for Dark Web Forum Content by Identifying Cybercrimes

Authors: Sangher, K.S., Singh, A., Pandey, H.M. and Kumar, V.

Journal: Information (Switzerland)

Volume: 14

Issue: 6

eISSN: 2078-2489

DOI: 10.3390/info14060349

Abstract:

The untraceable part of the Deep Web, also known as the Dark Web, is one of the most used “secretive spaces” for all sorts of illegal and criminal activities by terrorists, cybercriminals, spies, and offenders. Identifying actions, products, and offenders on the Dark Web is challenging due to its size, intractability, and anonymity. It is therefore crucial to deploy tools and techniques capable of identifying Dark Web activities to support law enforcement agencies. This study proposes classification models based on four deep learning architectures (RNN, CNN, LSTM, and Transformer) that use pre-trained word embedding representations to identify illicit activities related to cybercrimes on Dark Web forums. We used the Agora dataset derived from the DarkNet market archive, which lists 109 activities by category. The listings in the dataset are vaguely described, and several data points are untagged, which rules out the automatic labeling of category items as target classes. To overcome this constraint, we applied a carefully designed human annotation scheme that takes all attributes into account to infer the context. We conducted comprehensive evaluations to assess the performance of our proposed approach. Our proposed BERT-based classification model achieved an accuracy of 96%. Given the imbalance in the experimental data, our results indicate the advantage of our tailored data preprocessing strategies and validate our annotation scheme. Thus, in real-world scenarios, law enforcement agencies can use our work to analyze Dark Web forums and identify cybercrimes, and it can pave the way for more sophisticated systems as requirements demand.

https://eprints.bournemouth.ac.uk/38702/

Source: Scopus

Towards Safe Cyber Practices: Developing a Proactive Cyber-Threat Intelligence System for Dark Web Forum Content by Identifying Cybercrimes

Authors: Sangher, K.S., Singh, A., Pandey, H.M. and Kumar, V.

Journal: Information

Volume: 14

Issue: 6

eISSN: 2078-2489

DOI: 10.3390/info14060349

https://eprints.bournemouth.ac.uk/38702/

Source: Web of Science (Lite)

Towards Safe Cyber Practices: Developing Proactive Cyber Threat Intelligence System for Dark Web Forums Content By Employing Deep Learning Approaches

Authors: Pandey, H., Sangher, K.S., Singh, A. and Kumar, V.

Journal: Information (Section: Information Systems)

Publisher: MDPI

eISSN: 2078-2489

Abstract:

The untraceable part of the Deep Web, also known as the Dark Web, is one of the most used "secretive spaces" for all sorts of illegal and criminal activities by terrorists, cybercriminals, spies, and offenders. Identifying actions, products, and offenders on the Dark Web is challenging due to its size, intractability, and anonymity. It is therefore crucial to deploy tools and techniques capable of identifying Dark Web activities to support law enforcement agencies. This study proposes classification models based on four deep learning architectures (RNN, CNN, LSTM, and Transformer) that use pre-trained word embedding representations to identify illicit activities related to cybercrimes on Dark Web forums. We used the Agora dataset, derived from the DarkNet market archive, which lists 109 activities by category. The listings in the dataset are vaguely described, and several data points are untagged, which rules out the automatic labeling of category items as target classes. To overcome this constraint, we applied a carefully designed human annotation scheme that takes all attributes into account to infer the context. We conducted comprehensive evaluations to assess the performance of our proposed approaches. Our proposed BERT-based classification model achieved an accuracy of 96%. Given the imbalance in the experimental data, our results indicate the advantage of our tailored data preprocessing strategies and validate our annotation scheme. Thus, in real-world scenarios, law enforcement agencies can use our work to analyze Dark Web forums and identify cybercrimes, and it can pave the way for more sophisticated systems as requirements demand.

https://eprints.bournemouth.ac.uk/38702/

https://www.mdpi.com/journal/information/sections/information_systems

Source: Manual

Towards Safe Cyber Practices: Developing Proactive Cyber Threat Intelligence System for Dark Web Forums Content By Employing Deep Learning Approaches

Authors: Sangher, K.S., Singh, A., Pandey, H.M. and Kumar, V.

Journal: Information (Section: Information Systems)

Volume: 14

Issue: 6

Publisher: MDPI

eISSN: 2078-2489

https://eprints.bournemouth.ac.uk/38702/

Source: BURO EPrints