Identification of divergent protein domains by combining HMM-HMM comparisons and co-occurrence detection

Abstract: Identification of protein domains is a key step in understanding protein function. Hidden Markov Models (HMMs) have proved to be a powerful tool for this task. The Pfam database notably provides a large collection of HMMs that are widely used to annotate proteins in sequenced organisms. This is done via sequence/HMM comparisons. However, this approach may lack sensitivity when searching for domains in divergent species. Recently, methods for HMM/HMM comparisons have been proposed and proved to be more sensitive than sequence/HMM approaches in certain cases. However, these approaches are usually not used for protein domain discovery at the genome scale, and the benefit that could be expected from using them for this problem has not been investigated. Using proteins of P. falciparum and L. major as examples, we investigate the extent to which HMM/HMM comparisons can identify new domain occurrences not already identified by sequence/HMM approaches. We show that although HMM/HMM comparisons are much more sensitive than sequence/HMM comparisons, they are not sufficiently accurate to be used as a standalone complement to sequence/HMM approaches at the genome scale. Hence, we propose to use domain co-occurrence (the general tendency of a domain to appear preferentially alongside certain other domains in proteins) to improve the accuracy of the approach. We show that the combination of HMM/HMM comparisons and co-occurrence domain detection boosts protein annotations. At an estimated False Discovery Rate of 5%, it revealed 901 and 1098 new domains in Plasmodium and Leishmania proteins, respectively. Manual inspection of a subset of these predictions shows that they include several domain families that were missing in the two organisms. All new domain occurrences have been integrated into the EuPathDomains database, along with the GO annotations that can be deduced.
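
The co-occurrence idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy domain identifiers (PF_A, PF_B, PF_C) and the simple count threshold `min_count` are all assumptions made for illustration, and the threshold stands in for whatever statistical test and False Discovery Rate estimation the paper actually uses. The sketch learns which domain pairs appear together in a reference set of annotated proteins, then keeps a low-confidence candidate domain only if it forms such a pair with a domain already known in the same protein.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(annotated, min_count=2):
    """Return unordered domain pairs seen together in at least min_count proteins.

    annotated maps a protein id to an iterable of its domain ids.
    The count threshold is an illustrative stand-in for a statistical test.
    """
    counts = Counter()
    for domains in annotated.values():
        # Sort so each unordered pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(domains)), 2):
            counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def certify(candidates, known, pairs):
    """Keep candidate domains that co-occur with a known domain of the same protein.

    candidates and known map a protein id to a set of domain ids.
    """
    kept = {}
    for protein, cands in candidates.items():
        context = known.get(protein, set())
        supported = {
            c
            for c in cands
            if any(tuple(sorted((c, k))) in pairs for k in context if k != c)
        }
        if supported:
            kept[protein] = supported
    return kept


# Toy reference annotations (hypothetical domain ids, not real Pfam accessions).
reference = {
    "r1": ["PF_A", "PF_B"],
    "r2": ["PF_A", "PF_B"],
    "r3": ["PF_A", "PF_C"],
}
pairs = cooccurring_pairs(reference, min_count=2)

# Domains already found by a sequence/HMM search, and weaker candidate hits
# such as those an HMM/HMM comparison might propose.
known = {"p1": {"PF_A"}, "p2": {"PF_C"}}
candidates = {"p1": {"PF_B", "PF_C"}, "p2": {"PF_B"}}

certified = certify(candidates, known, pairs)
```

In this toy run only the PF_B candidate in protein p1 is retained, because PF_A and PF_B are the only pair seen together at least twice in the reference set. In a real setting the candidates would come from an HMM/HMM comparison tool and the retained set would still need an error-rate estimate before being reported.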
Document type:
Journal article
PLoS ONE, Public Library of Science, 2014, 9 (6), pp.e95275. 〈10.1371/journal.pone.0095275〉

https://hal-riip.archives-ouvertes.fr/pasteur-01060276
Contributor: Institut Pasteur Tunis
Submitted on: Wednesday, September 3, 2014 - 11:53:23
Last modified on: Friday, November 23, 2018 - 15:16:02
Document(s) archived on: Friday, April 14, 2017 - 11:58:51

File

PDF.pdf
Publisher files allowed on an open archive

Citation

Amel Ghouila, Isabelle Florent, Fatma Zahra Guerfali, Nicolas Terrapon, Dhafer Laouini, et al. Identification of divergent protein domains by combining HMM-HMM comparisons and co-occurrence detection. PLoS ONE, Public Library of Science, 2014, 9 (6), pp.e95275. 〈10.1371/journal.pone.0095275〉. 〈pasteur-01060276〉
