PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach

Piyali Chatterjee, Subhadip Basu, Julian Zubek, Mahantapas Kundu, Mita Nasipuri, Dariusz Plewczyński


The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers—decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron—were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
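The consensus step and the two reported metrics can be illustrated with a short sketch. The abstract does not define the n-star rule, so the code assumes it means "label a residue positive when at least n of the classifiers vote positive". The function names, the 0/1 label encoding, and the toy votes are hypothetical and are not taken from the PDP-CON source code. Accuracy and F-measure follow their standard definitions.

```python
from typing import List, Sequence, Tuple


def n_star_consensus(votes: Sequence[Sequence[int]], n: int) -> List[int]:
    """Combine per-residue 0/1 votes from several classifiers.

    votes[c][i] is the label classifier c assigns to residue i.
    ASSUMPTION: a residue is labelled 1 when at least n classifiers vote 1.
    """
    if not votes:
        raise ValueError("at least one classifier is required")
    length = len(votes[0])
    if any(len(v) != length for v in votes):
        raise ValueError("all classifiers must label the same residues")
    if not 1 <= n <= len(votes):
        raise ValueError("n must lie between 1 and the number of classifiers")
    return [1 if sum(v[i] for v in votes) >= n else 0 for i in range(length)]


def accuracy_and_f_measure(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F-measure) with label 1 treated as the positive class."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    accuracy = (tp + tn) / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, f_measure


# Toy votes from six hypothetical classifiers over eight residues.
votes = [
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0, 1],
]
truth = [0, 0, 1, 1, 1, 0, 0, 0]

for n in (1, 3, 5):
    consensus = n_star_consensus(votes, n)
    print(n, consensus, accuracy_and_f_measure(consensus, truth))
```

Under this reading, raising n makes the consensus stricter: fewer residues are labelled positive, which tends to trade recall for precision.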

Authors:
Piyali Chatterjee - [Netaji Subhash Engineering College]
Subhadip Basu - [Jadavpur University]
Julian Zubek - [Institute of Computer Science of the Polish Academy of Sciences]
Mahantapas Kundu - [Jadavpur University]
Mita Nasipuri - [Jadavpur University]
Dariusz Plewczyński (FMIS / DIPS) - Department of Information Processing Systems
Journal series: Journal of Molecular Modeling, ISSN 1610-2940, e-ISSN 0948-5023
Issue year: 2016
ASJC Classification: 1604 Inorganic Chemistry; 1703 Computational Theory and Mathematics; 1605 Organic Chemistry; 1606 Physical and Theoretical Chemistry; 1706 Computer Science Applications; 1503 Catalysis
Language: en (English)
Score (nominal): 25
Score source: journalList
Score: Ministerial score = 20.0, 04-06-2020, ArticleFromJournal; Ministerial score (2013-2016) = 25.0, 04-06-2020, ArticleFromJournal
Publication indicators: Scopus Citations = 7; WoS Citations = 6; Scopus SNIP (Source Normalised Impact per Paper): 2016 = 0.531; WoS Impact Factor: 2016 = 1.425 (2) - 2016 = 1.54 (5)