Query: UNIPROT:P06889 (Mol)
630,302 document(s) hit in 31,850,051 MEDLINE articles (0.00 seconds)
In this study, starting with a newly introduced concept of data complexity ("empirical data complexity"), we specify the concept of complexity more concretely in relation to mathematical modeling and introduce "model-based complexity (MBC)". Inductive inference based on the minimum model-based complexity method is then applied to the reconstruction of molecular evolutionary trees from DNA sequences. We find that the minimum MBC method has good asymptotic properties as DNA sequence lengths approach infinity, and that it compensates for the bias of the maximum likelihood method due to differences in tree topology complexity. The efficiency of the minimum MBC method for reconstructing molecular trees is studied by computer simulation, and the results suggest that this method is superior to the traditional maximum likelihood method or its modification by Akaike's AIC.
Proc Int Conf Intell Syst Mol Biol 1997
PMID:Inference of molecular phylogenetic tree based on minimum model-based complexity method. 932 56
In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches of MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, simple MP or ML tree search algorithms, such as the stepwise addition (SA) plus nearest neighbor interchange (NNI) search and the SA plus subtree pruning and regrafting (SPR) search, are as efficient as extensive search algorithms, such as the SA plus tree bisection-reconnection (TBR) search, in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model of phylogenetic inference generally shows better performance than the HKY model, even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that, at least in the present case, selecting a substitution model by using the likelihood ratio test or the AIC index is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often shows better performance than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.
Mol Biol Evol 2000 Aug
PMID:Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used. 1090 45
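The entry above contrasts the uncorrected p distance with model-based distance measures. The following Python is a minimal illustrative sketch, not code from the study. It computes the p distance between two aligned sequences and the Jukes-Cantor (JC) corrected distance d = -(3/4) ln(1 - 4p/3). The sketch makes several assumptions: it skips gap sites, its function names are hypothetical, and it omits the HKY distance.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ.

    Sites with a gap ('-') in either sequence are skipped (an assumption
    of this sketch, not a rule taken from the study).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor_distance(p: float) -> float:
    """JC69-corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy alignment: 2 differences over 10 sites.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
d = jukes_cantor_distance(p)
print(f"p = {p:.3f}, JC d = {d:.4f}")
```

For p > 0 the JC correction always exceeds p, because it accounts for multiple substitutions at the same site. The abstract reports that, in its simulations, the simpler uncorrected measure nonetheless often performed better.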
Models of sequence evolution play an important role in molecular evolutionary studies. The use of inappropriate models of evolution may bias the results of the analysis and lead to erroneous conclusions. Several procedures for selecting the best-fit model of evolution for the data at hand have been proposed, like the likelihood ratio test (LRT) and the Akaike (AIC) and Bayesian (BIC) information criteria. The relative performance of these model-selecting algorithms has not yet been studied under a range of different model trees. In this study, the influence of branch length variation upon model selection is characterized. This is done by simulating sequence alignments under a known model of nucleotide substitution, and recording how often this true model is recovered by different model-fitting strategies. Results of this study agree with previous simulations and suggest that model selection is reasonably accurate. However, different model selection methods showed distinct levels of accuracy. Some LRT approaches showed better performance than the AIC or BIC information criteria. Within the LRTs, model selection is affected by the complexity of the initial model selected for the comparisons, and only slightly by the order in which different parameters are added to the model. A specific hierarchy of LRTs, which starts from a simple model of evolution, performed overall better than other possible LRT hierarchies, or than the AIC or BIC.
J Mol Evol 2001 May
PMID:The effect of branch length variation on the selection of models of molecular evolution. 1144 47
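The entry above compares likelihood ratio tests with AIC and BIC. Below is a minimal Python sketch of the three scores. It uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters and n is the sample size. Taking n to be the number of alignment sites is an assumption of this sketch. The log-likelihoods and parameter counts are hypothetical. The chi-square tail probability uses a closed form that holds only for an even number of degrees of freedom.

```python
import math


def aic(lnl: float, k: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * lnl


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian information criterion: BIC = k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * lnl


def lrt_statistic(lnl_simple: float, lnl_complex: float) -> float:
    """Likelihood ratio statistic 2 (ln L1 - ln L0) for nested models."""
    return 2 * (lnl_complex - lnl_simple)


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with even df = 2m:
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical nested models fitted to an alignment of n = 1000 sites.
n_sites = 1000
simple = {"lnl": -5210.4, "k": 1}
complex_ = {"lnl": -5190.2, "k": 5}

stat = lrt_statistic(simple["lnl"], complex_["lnl"])
p_value = chi2_sf_even_df(stat, df=complex_["k"] - simple["k"])
print("LRT:", round(stat, 2), "p =", p_value)
for name, m in (("simple", simple), ("complex", complex_)):
    print(name, "AIC", round(aic(m["lnl"], m["k"]), 2),
          "BIC", round(bic(m["lnl"], m["k"], n_sites), 2))
```

All three criteria favor the complex model in this toy case. The study's point is that, on simulated data, the criteria do not always agree and they differ in accuracy.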
We propose an original statistical method to estimate how the occurrences of a given process along a genome (genes or motifs, for instance) may be influenced by the occurrences of a second process. More precisely, the aim is to detect avoided and/or favored distances between two motifs, for instance, suggesting possible interactions at a molecular level. For this, we consider occurrences along the genome as point processes and we use the so-called Hawkes' model. In such a model, the intensity at position t depends linearly on the distances to past occurrences of both processes via two unknown profile functions to estimate. We perform a nonparametric estimation of both profiles by using B-spline decompositions and a constrained maximum likelihood method. Finally, we use the AIC criterion for model selection. Simulations show the excellent behavior of our estimation procedure. We then apply it to study (i) the dependence between gene occurrences along the E. coli genome and the occurrences of a motif known to be part of the major promoter for this bacterium, and (ii) the dependence between the yeast S. cerevisiae genes and the occurrences of putative polyadenylation signals. The results are coherent with known biological properties or previous predictions, suggesting that this method can be of great interest for functional motif detection or for improving knowledge of some biological mechanisms.
Stat Appl Genet Mol Biol 2005
PMID:FADO: a statistical method to detect favored or avoided distances between occurrences of motifs using the Hawkes' model. 1664 42
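The Hawkes model described in the FADO entry above can be illustrated with a short Python sketch. The sketch covers only the intensity, written in the usual linear Hawkes form: lambda(t) = mu + sum over earlier occurrences U of the first process of g(t - U) + sum over earlier occurrences V of the second process of h(t - V). It is not the FADO implementation, and it rests on several assumptions:

- The constant baseline mu is part of this sketch's assumed form.
- The profiles are piecewise-constant (degree-0 B-splines) with hand-picked coefficients. The abstract instead estimates them nonparametrically with B-spline decompositions and constrained maximum likelihood.
- Positivity of the intensity is not enforced.
- All names, positions and values are hypothetical.

```python
from typing import Callable, Sequence

Profile = Callable[[float], float]


def step_profile(coefs: Sequence[float], bin_width: float) -> Profile:
    """Piecewise-constant profile (a degree-0 B-spline expansion).

    Returns coefs[i] for distances in [i*bin_width, (i+1)*bin_width),
    and 0 for d <= 0 or beyond the last bin.
    """
    def profile(d: float) -> float:
        if d <= 0:
            return 0.0
        idx = int(d // bin_width)
        return coefs[idx] if idx < len(coefs) else 0.0
    return profile


def hawkes_intensity(t: float, own: Sequence[float], other: Sequence[float],
                     mu: float, g: Profile, h: Profile) -> float:
    """Linear Hawkes intensity at position t:
    mu + sum_{U < t} g(t - U) + sum_{V < t} h(t - V)."""
    total = mu
    total += sum(g(t - u) for u in own if u < t)    # past own occurrences
    total += sum(h(t - v) for v in other if v < t)  # past occurrences of the other process
    return total


# Hypothetical toy data: gene positions (own process) and motif positions (other).
genes = [100.0, 400.0]
motifs = [90.0, 350.0]
g = step_profile([0.0, 0.0], bin_width=50.0)      # no self-excitation
h = step_profile([0.004, 0.002], bin_width=50.0)  # raises gene intensity 0-100 bp after a motif

for pos in (95.0, 420.0, 500.0):
    print(pos, round(hawkes_intensity(pos, genes, motifs, 0.001, g, h), 6))
```

A negative coefficient in h would represent an avoided distance. In that case a constraint, such as the constrained maximum likelihood mentioned in the abstract, is needed to keep the intensity nonnegative.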
We examine the application of statistical model selection methods to reverse-engineering the control of galactose utilization in yeast from DNA microarray experiment data. In these experiments, relationships among gene expression values are revealed through modifications of galactose sugar level and genetic perturbations through knockouts. For each gene variable, we select predictors using a variety of methods, taking into account the variance in each measurement. These methods include maximization of log-likelihood with Cp, AIC, and BIC penalties, bootstrap and cross-validation error estimation, and coefficient shrinkage via the Lasso.
Stat Appl Genet Mol Biol 2005
PMID:Reverse engineering galactose regulation in yeast through model selection. 1664 46
Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code; solvent exposure; secondary and tertiary structure; protein function; etc. These impact the substitution pattern and, in most cases, a single replacement matrix is not enough to represent all the complexity of the evolutionary processes. This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP, and test their performance using independent alignments from TREEBASE. We compare unsupervised learning approaches, where the site categories are unknown, to supervised ones, where the estimation uses the known category of each site, based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site with TREEBASE test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307-1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 of the 57 alignments, and significantly worse for only 1 alignment. Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures impacts not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from http://atgc.lirmm.fr/mixtures.
...
PMID:Phylogenetic mixture models for proteins. 1885 96
jModelTest is a bioinformatic tool for choosing among different models of nucleotide substitution. The program implements five different model selection strategies, including hierarchical and dynamical likelihood ratio tests (hLRT and dLRT), Akaike and Bayesian information criteria (AIC and BIC), and a performance-based decision theory method (DT). The output includes estimates of model selection uncertainty, parameter importance, and model-averaged parameter estimates, including model-averaged phylogenies. jModelTest is a Java program that runs under Mac OSX, Windows, and Unix systems with a Java Runtime Environment installed, and it can be freely downloaded from http://darwin.uvigo.es.
Methods Mol Biol 2009
PMID:Selection of models of DNA evolution with jModelTest. 1937 41
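According to the jModelTest entry above, the program's output includes model selection uncertainty, parameter importance and model-averaged estimates. These quantities are commonly computed from Akaike weights, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i = AIC_i - min AIC. The Python sketch below illustrates that common formulation. It is not jModelTest's source code, and the model names, AIC scores and parameter estimates are hypothetical.

```python
import math


def akaike_weights(aic_scores: dict[str, float]) -> dict[str, float]:
    """Akaike weights: relative likelihoods exp(-delta/2), normalised to sum to 1."""
    best = min(aic_scores.values())
    rel = {m: math.exp(-(a - best) / 2.0) for m, a in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def parameter_importance(weights: dict[str, float], models_with_param: set[str]) -> float:
    """Sum of the weights of the models that include the parameter."""
    return sum(w for m, w in weights.items() if m in models_with_param)


def model_averaged(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of a parameter over the models that estimate it,
    with weights renormalised over those models."""
    wsum = sum(weights[m] for m in estimates)
    return sum(weights[m] * est for m, est in estimates.items()) / wsum


# Hypothetical AIC scores for three candidate models.
scores = {"JC": 10422.8, "HKY+G": 10395.1, "GTR+G": 10390.4}
w = akaike_weights(scores)
# Hypothetical gamma shape estimates from the two models that include +G.
alpha = {"HKY+G": 0.42, "GTR+G": 0.47}
print({m: round(x, 4) for m, x in w.items()})
print("importance of +G:", round(parameter_importance(w, set(alpha)), 4))
print("model-averaged alpha:", round(model_averaged(alpha, w), 4))
```

Subtracting the minimum AIC before exponentiating keeps the computation numerically stable. Raw AIC values in the thousands would otherwise underflow exp().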
The strength and weakness of microarray technology can both be attributed to the enormous amount of information it generates. To fully realize the benefit of microarray technology for testing differentially expressed genes and for classification, there is a need to minimize the number of irrelevant genes present in microarray data. A major interest is to use probe-level data to call genes informative or noninformative based on the trade-off between the array-to-array variability and the measurement error. Existing work in this direction includes filtering likely uninformative sets of hybridization (FLUSH; Calza et al., 2007) and I/NI calls for the exclusion of noninformative genes using FARMS (I/NI calls; Talloen et al., 2007; Hochreiter et al., 2006). In this paper, we propose a linear mixed model as a more flexible method that performs as well as I/NI calls and outperforms FLUSH. We also introduce other criteria for gene filtering, such as R2 and intra-cluster correlation. Additionally, we include some objective criteria based on likelihood ratio testing, the Akaike information criterion (AIC; Akaike, 1973) and the Bayesian information criterion (BIC; Schwarz, 1978). Based on the HGU-133A Spiked-in data set, it is shown that the linear mixed model approach outperforms FLUSH, a method that filters genes based on a quantile regression. The linear model is equivalent to a factor analysis model when either the factor loadings are set to a constant with the variance of the latent factor equal to one, or the factor loadings are set to one together with unconstrained variance of the latent factor. Filtering based on conditional variance calls a probe set informative when the intensity of one or more probes is consistent across the arrays, while filtering using R2 or intra-cluster correlation calls a probe set informative only when the average intensity of a probe set is consistent across the arrays. Filtering based on the likelihood ratio test, AIC and BIC is less stringent than filtering based on the other criteria.
Stat Appl Genet Mol Biol 2010
PMID:Informative or noninformative calls for gene expression: a latent variable approach. 2019 54
In phylogenetic analyses of molecular sequence data, partitioning involves estimating independent models of molecular evolution for different sets of sites in a sequence alignment. Choosing an appropriate partitioning scheme is an important step in most analyses because it can affect the accuracy of phylogenetic reconstruction. Despite this, partitioning schemes are often chosen without explicit statistical justification. Here, we describe two new objective methods for the combined selection of best-fit partitioning schemes and nucleotide substitution models. These methods allow millions of partitioning schemes to be compared in realistic time frames and so permit the objective selection of partitioning schemes even for large multilocus DNA data sets. We demonstrate that these methods significantly outperform previous approaches, including both the ad hoc selection of partitioning schemes (e.g., partitioning by gene or codon position) and a recently proposed hierarchical clustering method. We have implemented these methods in an open-source program, PartitionFinder. This program allows users to select partitioning schemes and substitution models using a range of information-theoretic metrics (e.g., the Bayesian information criterion, Akaike information criterion [AIC], and corrected AIC). We hope that PartitionFinder will encourage the objective selection of partitioning schemes and thus lead to improvements in phylogenetic analyses. PartitionFinder is written in Python and runs under Mac OSX 10.4 and above. The program, source code, and a detailed manual are freely available from www.robertlanfear.com/partitionfinder.
Mol Biol Evol 2012 Jun
PMID:Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. 2231 68
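According to the PartitionFinder entry above, the program ranks partitioning schemes with BIC, AIC and corrected AIC (AICc). The following Python is a rough sketch of that kind of comparison. It assumes that subsets are modelled independently, so a scheme's log-likelihood and parameter count are sums over its subsets. It uses the textbook formula AICc = AIC + 2k(k+1)/(n - k - 1), taking n as the number of sites. This is not PartitionFinder's code: how the program counts parameters (for example, with linked branch lengths) and which sample size it uses are not stated here, and every number below is hypothetical.

```python
import math
from typing import Sequence, Tuple

Subset = Tuple[float, int]  # (log-likelihood, number of free parameters)


def aicc(lnl: float, k: int, n: int) -> float:
    """Corrected AIC: 2k - 2 ln L + 2k(k+1)/(n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return 2 * k - 2 * lnl + (2 * k * (k + 1)) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * lnl


def score_scheme(subsets: Sequence[Subset], n_sites: int) -> dict:
    """Score a scheme whose subsets are fitted independently (assumed)."""
    lnl = sum(s[0] for s in subsets)
    k = sum(s[1] for s in subsets)
    return {"lnL": lnl, "k": k, "AICc": aicc(lnl, k, n_sites), "BIC": bic(lnl, k, n_sites)}


n = 1000  # hypothetical alignment length
schemes = {
    "unpartitioned": [(-5190.2, 5)],
    "by codon position": [(-1500.0, 5), (-1400.0, 5), (-2250.0, 5)],
}
scored = {name: score_scheme(subs, n) for name, subs in schemes.items()}
best = min(scored, key=lambda name: scored[name]["AICc"])
for name, s in scored.items():
    print(f"{name}: AICc={s['AICc']:.2f} BIC={s['BIC']:.2f}")
print("lowest AICc:", best)
```

Because a scheme's score is built from per-subset results in this sketch, subset scores could be cached and reused across the many schemes that share them.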
The interactions between the chemosensors, 3-amino-5-(4,5,6,7-tetrahydro-1H-indol-2-yl)isoxazole-4-carboxamide (AIC) derivatives, and different anions (F(-), Cl(-), Br(-), AcO(-), and H(2)PO(4)(-)) have been theoretically investigated using DFT approaches. The unique selectivity of AIC derivatives for F(-) turned out to be ascribable to their ability to deprotonate the host sensors. Frontier molecular orbital (FMO) analyses have shown that the vertical electronic transitions of absorption and emission for the sensing signals are characterized as intramolecular charge transfer (ICT). The study of substituent effects suggests that all the substituted derivatives are expected to be promising candidates for fluoride chemosensors in both UV-vis and fluorescence spectra. The exception is the derivative with a benzo[d]thieno[3,2-b]thiophene fragment, which can serve only as a ratiometric fluorescent fluoride chemosensor.
Int J Mol Sci 2012
PMID:A DFT study of pyrrole-isoxazole derivatives as chemosensors for fluoride anion. 2310 33