EC:3.1.30.2 - FACTA Search


Query: EC:3.1.30.2 (endonuclease)
18,621 document(s) hit in 31,850,051 MEDLINE articles (0.00 seconds)

The LAGLIDADG and HNH families of site-specific DNA endonucleases encoded by viruses, bacteriophages as well as archaeal, eukaryotic nuclear and organellar genomes are characterized by the sequence motifs 'LAGLIDADG' and 'HNH', respectively. These endonucleases have been shown to occur in different environments: LAGLIDADG endonucleases are found in inteins, archaeal and group I introns and as free-standing open reading frames (ORFs); HNH endonucleases occur in group I and group II introns and as ORFs. Here, statistical models (hidden Markov models, HMMs) that encompass both the conserved motifs and more variable regions of these families have been created and employed to characterize known and potential new family members. A number of new, putative LAGLIDADG and HNH endonucleases have been identified, including an intein-encoded HNH sequence. Analysis of an HMM-generated multiple alignment of 130 LAGLIDADG family members and the three-dimensional structure of the I-CreI endonuclease has enabled definition of the core elements of the repeated domain (approximately 90 residues) that is present in this family of proteins. A conserved negatively charged residue is proposed to be involved in catalysis. Phylogenetic analysis of the two families indicates a lack of exchange of endonucleases between different mobile elements (environments) and between hosts from different phylogenetic kingdoms. However, there does appear to have been considerable exchange of endonuclease domains amongst elements of the same type. Such events are suggested to be important for the formation of elements of new specificity.
...
PMID:Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases and identification of an intein that encodes a site-specific endonuclease of the HNH family. 935 75

The HNHc (SMART ID: SM00507) domain (SCOP nomenclature: HNH family) can be subclassified into at least eight subsets by iterative refinement of HMM profiles. An initial clustering of 323 proteins containing the HNHc domain helped identify the subsets. The subsets could be differentiated on the basis of the pattern of occurrence of seven defining features. Domain association is also different between the subsets. The subsets show organism as well as domain-based clustering, suggestive of propagation by both duplication and horizontal transfer events. Structure-based sequence analysis of the subsets led to the identification of common structural and sequence motifs in the HNH family with the other three families under the His-Me endonuclease superfamily.
...
PMID:HNH family subclassification leads to identification of commonality in the His-Me endonuclease superfamily. 1469 Dec 43

Lipocalins are functionally diverse proteins that are composed of 120-180 amino acid residues. Members of this family have several important biological functions, including ligand transport, cryptic coloration, sensory transduction, endonuclease activity, stress response activity in plants, odorant binding, prostaglandin biosynthesis, cellular homeostasis regulation, immunity and immunotherapy. Identifying lipocalins from protein sequence is challenging because their pairwise sequence identity is poor and often falls below the twilight zone. So far, no specific method has been reported to identify lipocalins from primary sequence. In this paper, we report a support vector machine (SVM) approach to predict lipocalins from protein sequence using sequence-derived properties. LipoPred was trained using a dataset consisting of 325 lipocalin proteins and 325 non-lipocalin proteins, and evaluated on an independent set of 140 lipocalin proteins and 21,447 non-lipocalin proteins. LipoPred achieved 88.61% accuracy with 89.26% sensitivity, 85.27% specificity and a Matthews correlation coefficient (MCC) of 0.74. When applied to the test dataset, LipoPred achieved 84.25% accuracy with 88.57% sensitivity, 84.22% specificity and an MCC of 0.16. LipoPred performed better than the PSI-BLAST, HMM and SVM-Prot methods. Out of 218 lipocalins, LipoPred correctly predicted 194 proteins, including 39 lipocalins that are non-homologous to any protein in the SWISSPROT database. This result shows that LipoPred is potentially useful for predicting lipocalin proteins that have no sequence homologs in the sequence databases. Further, the successful prediction of nine hypothetical lipocalin proteins and five new members of the lipocalin family shows that LipoPred can be used to identify and annotate new lipocalin proteins from sequence databases. The LipoPred software and dataset are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/lipopred.htm.
...
PMID:Identification of functionally diverse lipocalin proteins from sequence information using support vector machine. 2018 53

Protein alignments are commonly used to evaluate the similarity of protein residues, and the derived consensus sequence is used for identifying functional units (e.g., domains). Traditional consensus-building models fail to account for interpositional dependencies - functionally required covariation of residues that tend to appear simultaneously throughout evolution and across the phylogenetic tree. These relationships can reveal important clues about the processes of protein folding, thermostability, and the formation of functional sites, which in turn can be used to inform the engineering of synthetic proteins. Unfortunately, these relationships essentially form sub-motifs which cannot be predicted by simple "majority rule" or even HMM-based consensus models, and the result can be a biologically invalid "consensus" which is not only never seen in nature but is less viable than any extant protein. We have developed a visual analytics tool, StickWRLD, which creates an interactive 3D representation of a protein alignment and clearly displays covarying residues. The user has the ability to pan and zoom, as well as dynamically change the statistical threshold underlying the identification of covariants. StickWRLD has previously been successfully used to identify functionally required covarying residues in proteins such as Adenylate Kinase and in DNA sequences such as endonuclease target sites.
...
PMID:Optimization of Synthetic Proteins: Identification of Interpositional Dependencies Indicating Structurally and/or Functionally Linked Residues. 2627 77
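
To make the idea of interpositional dependencies concrete, here is a minimal sketch that scores every pair of alignment columns and keeps those above a user-chosen threshold. It uses mutual information as the covariation statistic, which is a swapped-in choice for illustration: the abstract does not say which statistic StickWRLD applies. The function names and the four-sequence toy alignment are also hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_joint = count / n
        p_indep = (freq_a[a] / n) * (freq_b[b] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi

def covarying_pairs(alignment, threshold):
    """List (i, j, score) for column pairs whose score reaches the threshold."""
    length = len(alignment[0])
    hits = []
    for i, j in combinations(range(length), 2):
        score = mutual_information(column(alignment, i), column(alignment, j))
        if score >= threshold:
            hits.append((i, j, score))
    return hits

# Toy alignment (hypothetical): positions 1 and 2 always change together
# (K with L, R with M), while position 3 varies independently.
toy_alignment = ["AKLE", "AKLD", "ARME", "ARMD"]

pairs = covarying_pairs(toy_alignment, threshold=0.5)
print(pairs)
```

In the toy alignment, a per-column majority rule considers each position in isolation and so has no way of knowing that K pairs with L and R pairs with M. The pairwise score exposes that link, and raising or lowering `threshold` mirrors the adjustable statistical cutoff the abstract describes.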