Protein secondary structure prediction using data-partitioning combined with stacked convolutional neural networks and bidirectional gated recurrent units

Authors: Sofi, M.A. and Wani, M.A.

Journal: International Journal of Information Technology (Singapore)

Volume: 14

Issue: 5

Pages: 2285-2295

eISSN: 2511-2112

ISSN: 2511-2104

DOI: 10.1007/s41870-022-00978-x

Abstract:

Protein secondary structure prediction (PSSP) is one of the challenging tasks in computational biology. Deep neural networks have significantly improved the accuracy of predicted secondary structures in recent years. However, most current top-performing methods transform protein sequences to a fixed length by truncation or padding, which results in the loss of structural information and of the significance of protein length. In this paper, we present a data-partitioning approach combined with stacked convolutional neural networks and bidirectional gated recurrent units (S-CNN-BGRU) for PSSP. We divide the protein dataset into partitions based on sequence length and build multiple deep learning models aligned with these partitions. The proposed approach handles protein sequences of any length efficiently, without truncating amino acids. Experiments are conducted on the TR13104 and CB513 datasets, and best mean accuracies of 74.98% and 73.56%, respectively, are achieved for 8-class secondary structure prediction. Empirical results indicate the efficacy of the proposed method over state-of-the-art methods.
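The abstract does not state the partition boundaries or the padding scheme, so the following is only a minimal sketch of the general idea of length-based partitioning without truncation, not the authors' implementation. It assumes hypothetical upper bounds of 100, 200 and 400 residues plus an overflow partition for longer chains, and a placeholder padding symbol `-`; the names `BOUNDARIES`, `partition_index` and `partition_and_pad` are illustrative, and the S-CNN-BGRU models that would be trained on each partition are omitted.

```python
from typing import Dict, List, Sequence

# Hypothetical upper bounds (in residues) for each partition; the paper's
# actual partition limits are not given in the abstract.
BOUNDARIES = (100, 200, 400)


def partition_index(length: int, boundaries: Sequence[int] = BOUNDARIES) -> int:
    """Return the index of the first partition whose upper bound >= length.

    Lengths beyond the last bound fall into an extra overflow partition.
    """
    for i, bound in enumerate(boundaries):
        if length <= bound:
            return i
    return len(boundaries)


def partition_and_pad(seqs: Sequence[str],
                      boundaries: Sequence[int] = BOUNDARIES,
                      pad: str = "-") -> Dict[int, List[str]]:
    """Group sequences by length and pad each group to a common width.

    No residues are ever truncated: bounded partitions are padded to their
    upper bound, and the overflow partition is padded to its longest member.
    """
    groups: Dict[int, List[str]] = {}
    for s in seqs:
        groups.setdefault(partition_index(len(s), boundaries), []).append(s)

    padded: Dict[int, List[str]] = {}
    for idx, members in groups.items():
        if idx < len(boundaries):
            width = boundaries[idx]
        else:
            width = max(len(s) for s in members)
        padded[idx] = [s.ljust(width, pad) for s in members]
    return padded


if __name__ == "__main__":
    demo = ["M" * 50, "A" * 150, "G" * 90, "K" * 520]
    for idx, members in sorted(partition_and_pad(demo).items()):
        # A separate model per partition would see inputs of equal width.
        print(idx, len(members), len(members[0]))
```

Padding each partition only up to its own bound keeps short chains from being inflated to the length of the longest protein in the dataset, while the overflow partition ensures that very long chains are kept whole rather than cut.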

Source: Scopus