Query: UNIPROT:P06889 (Mol)
630,302 document(s) hit in 31,850,051 MEDLINE articles (0.00 seconds)
It is becoming increasingly common in quantitative structure/activity relationship (QSAR) analyses to use external test sets to evaluate the likely stability and predictivity of the models obtained. In some cases, such as those involving variable selection, an internal test set--i.e., a cross-validation set--is also used. Care is sometimes taken to ensure that the subsets used exhibit response and/or property distributions similar to those of the data set as a whole, but more often the individual observations are simply assigned 'at random.' In the special case of MLR without variable selection, it can be analytically demonstrated that this strategy is inferior to others. Most particularly, D-optimal design performs better if the form of the regression equation is known and the variables involved are well behaved. This report introduces an alternative, non-parametric approach termed 'boosted leave-many-out' (boosted LMO) cross-validation. In this method, relatively small training sets are chosen by applying optimizable k-dissimilarity selection (OptiSim) using a small subsample size (k = 4, in this case), with the unselected observations being reserved as a test set for the corresponding reduced model. Predictive errors for the full model are then estimated by aggregating results over several such analyses. The countervailing effects of training and test set size, diversity, and representativeness on PLS model statistics are described for CoMFA analysis of a large data set of COX2 inhibitors.
J Comput Aided Mol Des
PMID:Boosted leave-many-out cross-validation: the effect of training and test set diversity on PLS statistics. 1367 92
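The selection-and-aggregation loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the selection step is a simplified OptiSim-style rule (no dissimilarity threshold or recycling of rejected candidates), a single-descriptor least-squares line stands in for PLS, and errors are aggregated by pooling squared residuals. The names `optisim_select`, `fit_line`, and `boosted_lmo_rmse` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x for one descriptor (a stand-in for PLS)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def optisim_select(points, n_select, k=4, seed=0):
    """Simplified OptiSim-style selection (assumption: no threshold, no recycling).

    At each step, draw a random subsample of up to k unselected candidates and
    keep the one whose nearest already-selected neighbour is farthest away.
    """
    rng = random.Random(seed)
    remaining = list(range(len(points)))
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while len(selected) < n_select and remaining:
        subsample = rng.sample(remaining, min(k, len(remaining)))
        best = max(
            subsample,
            key=lambda i: min(abs(points[i] - points[j]) for j in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def boosted_lmo_rmse(xs, ys, n_train, n_rounds=10, k=4):
    """Pool held-out squared errors over several small, diverse training sets."""
    squared_errors = []
    for round_index in range(n_rounds):
        train = optisim_select(xs, n_train, k=k, seed=round_index)
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        held_out = [i for i in range(len(xs)) if i not in train]
        squared_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Toy data: an exactly linear response, so every reduced model predicts perfectly.
xs = [float(i) for i in range(20)]
ys = [3.0 * x - 2.0 for x in xs]
rmse = boosted_lmo_rmse(xs, ys, n_train=5)
```

On real data the training-set size, the subsample size k, and the number of rounds would all affect the estimate, which is the trade-off the abstract refers to.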
The blood-brain permeation of a structurally diverse set of 281 compounds was modeled using linear regression and a multivariate genetic partial least squares (G/PLS) approach. Key structural features affecting the logarithm of blood-brain partitioning (logBB) were captured through statistically significant quantitative structure-activity relationship (QSAR) models. These relationships reveal the importance of logP, polar surface area, and a variety of electrotopological indices for accurate predictions of logBB. The best models reveal an excellent correlation (r > 0.9) for a training set of 58 compounds. Likewise, the comparison of the average logBB values obtained from an ensemble of QSAR models with experimental values also verifies the statistical quality of the models (r > 0.9). The models provide good agreement (r approximately 0.7) between the predicted logBB values for 34 molecules in the external validation set and the experimental values. To further validate the models for use during the drug discovery process, a prediction set of 181 drugs with reported CNS penetration data was used. A >70% success rate is obtained by using any of the QSAR models in the qualitative prediction for CNS permeable (active) drugs. A lower success rate (approximately 60%) was obtained for the best model for CNS impermeable (inactive) drugs. Combining the predictions obtained from all the models (consensus) did not significantly improve the discrimination of CNS active and CNS inactive molecules. Finally, using the therapeutic classification as a guiding tool, the CNS penetration capability of over 2000 compounds in the Synthline database was estimated. The results were very similar to the smaller set of 181 compounds.
J Comput Aided Mol Des 2003 Oct
PMID:Computational models to predict blood-brain barrier permeation and CNS activity. 1506 64
Proteochemometrics was applied in the analysis of the binding of organic compounds to wild-type and chimeric melanocortin receptors. Thirteen chimeric melanocortin receptors were designed based on statistical molecular design; each chimera contained parts from three of the MC(1,3-5) receptors. The binding affinities of 18 compounds were determined for these chimeric melanocortin receptors and the four wild-type melanocortin receptors. The data for 14 of these compounds were correlated to the physicochemical and structural descriptors of compounds, binary descriptors of receptor sequences, and cross-terms derived from ligand and receptor descriptors to obtain a proteochemometric model (correlation was performed using partial least-squares projections to latent structures; PLS). A well fitted mathematical model (R(2) = 0.92) with high predictive ability (Q(2) = 0.79) was obtained. In a further validation of the model, the predictive ability for ligands (Q(2)lig = 0.68) and receptors (Q(2)rec = 0.76) was estimated. The model was moreover validated by external prediction by using the data for the four additional compounds that had not at all been included in the proteochemometric model; the analysis yielded a Q(2)ext = 0.73. An interpretation of the results using PLS coefficients revealed the influence of particular properties of organic compounds on their affinity to melanocortin receptors. Three-dimensional models of melanocortin receptors were also created, and physicochemical properties of the amino acids inside the receptors' transmembrane cavity were correlated to the PLS modeling results. The importance of particular amino acids for selective binding of organic compounds was estimated and used to outline the ligand recognition site in the melanocortin receptors.
Mol Pharmacol 2005 Jan
PMID:Proteochemometric mapping of the interaction of organic compounds with melanocortin receptor subtypes. 1547 82
A pharmacophore model for the sigma-2 receptor was derived using GRIND (GRid INdependent Descriptors) descriptors arising from a 3D-level procedure whose main prerogative is that it does not require ligand alignment. PLS models for sigma-2 affinity (sigma-2 model: r2=0.83, q2=0.63) and sigma-1/sigma-2 selectivity (r2=0.72, q2=0.46) were derived using a series of alpha-tropanyl derivatives. The models provide pictures of the virtual receptor site (VRS) significant enough to attain a qualitative pharmacophoric representation of the sigma receptor. They give the internal geometrical relationships within two hydrophobic areas (hydrophobic-1 and -2) and a H-bond donor receptor region with which ligands establish non-covalent bonds.
J Comput Aided Mol Des 2004 May
PMID:GRIND-derived pharmacophore model for a series of alpha-tropanyl derivative ligands of the sigma-2 receptor. 1559 62
An automated PLS engine, WB-PLS, was applied to 1632 QSAR series with at least 25 compounds per series extracted from WOMBAT (WOrld of Molecular BioAcTivity). WB-PLS extracts a single Y variable per series, as well as pre-computed X variables from a table. The table contained 2D descriptors, the drug-like MDL 320 keys as implemented in the Mesa A&C Fingerprint module, and in-house generated topological-pharmacophore SMARTS counts and fingerprints. Each descriptor type was treated as a block, with or without scaling. Cross-validation, variable importance on projections (VIP) above 0.8 and q2 ≥ 0.3 were applied for model significance. Among cross-validation methods, leave-one-in-seven-out (CV7) is a better measure of model significance, compared to leave-one-out (measuring redundancy) and leave-half-out (too restrictive). SMARTS counts overlap with 2D descriptors (having a more quantitative nature), whereas MDL keys overlap with in-house fingerprints (both are more qualitative). The SMARTS counts is the most effective descriptor system, when compared to the other three. At the individual level, size-related descriptors and topological indices (in the 2D property space), and branched SMARTS, aromatic and ring atom types and halogens are found to be most relevant according to the VIP criterion.
J Comput Aided Mol Des
PMID:An automated PLS search for biologically relevant QSAR descriptors. 1572 45
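The q2 significance criterion and the CV7 scheme mentioned in this abstract can be illustrated with a short sketch. It rests on several assumptions: CV7 is read as seven cross-validation groups, each left out once; q2 is computed as 1 - PRESS / SS_total with SS_total taken about the overall mean; and a single-descriptor least-squares line replaces PLS. The function names are hypothetical and not part of WB-PLS.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x for one descriptor (a stand-in for PLS)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def q2_cross_validated(xs, ys, n_groups=7):
    """q2 = 1 - PRESS / SS_total, leaving out every n_groups-th compound in turn."""
    press = 0.0
    for group in range(n_groups):
        train = [i for i in range(len(xs)) if i % n_groups != group]
        test = [i for i in range(len(xs)) if i % n_groups == group]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Accumulate squared prediction errors for the held-out compounds.
        press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    overall_mean = mean(ys)
    ss_total = sum((y - overall_mean) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Toy series: an exact line, then the same line with alternating +/-1 noise.
xs = [float(i) for i in range(14)]
clean = [2.0 * x + 1.0 for x in xs]
noisy = [y + (1.0 if i % 2 == 0 else -1.0) for i, y in enumerate(clean)]

q2_clean = q2_cross_validated(xs, clean)
q2_noisy = q2_cross_validated(xs, noisy)
is_significant = q2_noisy >= 0.3  # threshold quoted in the abstract
```

Setting `n_groups` to the number of compounds gives leave-one-out, and `n_groups=2` gives a leave-half-out variant, the two alternatives the abstract compares against CV7.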
Binding affinity data [Bioorg Med Chem (2004) 12:613-623] of thiazole and thiadiazole derivatives (n = 30) for the human adenosine A3 receptor subtype have been subjected to 3D-QSAR (Quantitative structure-activity relationships) analyses by molecular shape analysis (MSA) and molecular field analysis (MFA) techniques using Cerius2 Version 4.8. In the case of the MSA, the major steps were (1) generation of conformers and energy minimization; (2) hypothesizing an active conformer (global minimum of the most active compound); (3) selecting a candidate shape-reference compound (based on the active conformation); (4) performing pairwise molecular superimposition using the maximum common subgroup (MCSG) method; (5) measuring molecular shape commonality using MSA descriptors; (6) determining other molecular features by calculating spatial, electronic and conformational parameters; (7) selection of conformers; (8) generation of QSAR equations by genetic function algorithm (GFA) or stepwise regression. The best 3D-QSAR equation (MSA) obtained from GFA technique shows 70.0% predicted variance (leave-one-out) and 77.7% explained variance. This equation shows the importance of Jurs descriptors (atomic charge weighted positive surface area, relative negative charge and relative positive charge surface area), partial moment of inertia, energy of the most stable conformer and the ratio of common overlap steric volume to volume of individual molecules. In the case of stepwise regression, the best relation showed 46.1% predicted variance and 72.3% explained variance. In the case of MFA, the major steps were (1) generating conformers and energy minimization; (2) matching atoms using a maximum common substructure (MCS) search and aligning molecules using the default options; (3) setting MFA preferences (rectangular grid with 2 Å step size, charges by the Gasteiger algorithm, H+ and CH3 as probes); (4) creating the field; (5) analysis by the Genetic partial least squares (G/PLS) method. The equation obtained was of excellent statistical quality: 96.1% explained variance and 71.6% predicted variance. Statistically reliable 3D-QSAR models obtained from this study suggest that these techniques could be useful to design potent A3 receptor antagonists.
J Mol Model 2005 Nov
PMID:Exploring 3D-QSAR of thiazole and thiadiazole derivatives as potent and selective human adenosine A3 receptor antagonists+. 1592 17
A new application of the fractional wavelet transform (FWT) was proposed for the simultaneous determination of ampicillin (AP) and sulbactam (SB) in a pharmaceutical combination for injection. FWT approach is a new powerful tool for removing noise and irrelevant information from the absorption spectra. Cardinal information having higher peak amplitude, eliminated noise, sharp peaks with shrinking width of spectral range was obtained by the application of FWT procedure to the original absorption spectra. In this paper, FWT approach was subjected to the data vector of the UV-signals obtained from AP and SB in the wavelength range of 211.5-313.8 nm. Derivative transform was applied to the original absorption signal together with its FWT generalization. The calibration graphs for AP and SB were obtained by measuring the FWT and usual derivative amplitudes at zero-crossing points. The method validation was carried out by using the synthetic mixture analysis. Our proposed FWT approach was compared with the usual derivative spectrophotometry and chemometric methods (CLS, PCR and PLS) and a good agreement was reported.
Spectrochim Acta A Mol Biomol Spectrosc 2006 Mar 01
PMID:A new fractional wavelet approach for the simultaneous determination of ampicillin sodium and sulbactam sodium in a binary mixture. 1602 81
The cytochrome P450 (CYP) enzyme superfamily plays a major role in the metabolism of commercially available drugs. Inhibition of these enzymes by a drug may result in a plasma level increase of another drug, thus leading to unwanted drug-drug interactions when two or more drugs are coadministered. Therefore, fast and reliable in silico methods predicting CYP inhibition from calculated molecular properties are an important tool which can be applied to assess both already synthesized as well as virtual compounds. We have studied the performance of support vector machines (SVMs) to classify compounds according to their potency to inhibit CYP3A4. The data set for model generation consists of more than 1300 structurally diverse drug-like research molecules which were divided into training and test sets. The predictive power of SVMs crucially depends on a careful selection of parameters specifying the kernel function and the penalty for misclassifications. In this study we have investigated a procedure to identify a valid set of SVM parameters which is based on a sampling of the parameter space on a regular grid. From this set of parameters, either single SVMs or SVM committees were trained to distinguish between strong and weak inhibitors or to achieve a more realistic three-class assignment, with one class representing medium inhibitors. This workflow was studied for several kernel functions and descriptor sets. All SVM models performed significantly better than PLS-DA models which were generated from the corresponding descriptor sets. As a very promising result, simple two-dimensional (2D) descriptors yield a three-class model which correctly classifies more than 70% of the test set. Our work illustrates that SVMs used in combination with simple 2D descriptors provide a very effective and reliable tool which allows a fast assessment of CYP3A4 inhibition potency in an early in silico filtering process.
J Comput Aided Mol Des 2005 Mar
PMID:A support vector machine approach to classify human cytochrome P450 3A4 inhibitors. 1605 71
The affinities of 177 nonameric peptides binding to the HLA-A*0201 molecule were measured using a FACS-based MHC stabilisation assay and analysed using chemometrics. Their structures were described by global and local descriptors; QSAR models were derived by genetic algorithm, stepwise regression and PLS. The global molecular descriptors included molecular connectivity chi indices, kappa shape indices, E-state indices, molecular properties like molecular weight and log P, and three-dimensional descriptors like polarizability, surface area and volume. The local descriptors were of two types. The first used a binary string to indicate the presence of each amino acid type at each position of the peptide. The second was also position-dependent but used five z-scales to describe the main physicochemical properties of the amino acids forming the peptides. The models were developed using a representative training set of 131 peptides and validated using an independent test set of 46 peptides. It was found that the global descriptors could not explain the variance in the training set nor predict the affinities of the test set accurately. Both types of local descriptors gave QSAR models with better explained variance and predictive ability. The results suggest that, in their interactions with the MHC molecule, the peptide acts as a complicated ensemble of multiple amino acids mutually potentiating each other.
J Comput Aided Mol Des 2005 Mar
PMID:Towards the chemometric dissection of peptide--HLA-A*0201 binding affinity: comparison of local and global QSAR models. 1605 72
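The first type of local descriptor in this abstract, a binary string marking which amino acid sits at each peptide position, might look like the sketch below. The residue ordering in `AMINO_ACIDS`, the function name `binary_descriptor`, and the example nonamer are assumptions made for illustration; the abstract does not specify the encoding layout.

```python
# Assumed ordering of the 20 standard residues (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_descriptor(peptide):
    """Return a flat 0/1 list with one 20-bit block per peptide position."""
    bits = []
    for residue in peptide:
        block = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a non-standard residue.
        block[AMINO_ACIDS.index(residue)] = 1
        bits.extend(block)
    return bits


# A nonamer chosen only to illustrate the encoding.
example = binary_descriptor("GILGFVFTL")
n_bits = len(example)      # 9 positions x 20 residue types
n_set = sum(example)       # exactly one bit set per position
```

A nonamer therefore yields 180 binary variables, of which nine are set; the z-scale alternative in the abstract would instead give five real-valued variables per position.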
We examined "descriptor collision" for several chemical fingerprint systems (MDL 320, Daylight, SMDL), and for a 2D-based descriptor set. For large databases (ChemNavigator and WOMBAT), the smallest collision rate remains around 5%. We systematically increase the "descriptor collision" rate (here termed "descriptor confusion"), in order to design a set of "descriptors to mask chemical structures", DMCS. If effective, a DMCS system would not allow third parties to determine the original chemical structures used to derive the DMCS set (i.e., reverse engineering). Using SMDL keys, the "confusion" rate is increased to 45.6% by eliminating those keys that have a low frequency of occurrence in WOMBAT structures. We applied an automated PLS engine, WB-PLS [Olah et al., J. Comput. Aided Mol. Des., 18 (2004) 437], to 1277 series of structures from 948 targets in WOMBAT, in order to validate the biological relevance of the SMDL descriptors as a potential DMCS set. The "reduced set" of SMDL descriptors has a small loss of modeling power (around 20%) compared to the initial descriptor set, while the collision rate is significantly increased. These results indicate that the development of an effective DMCS is possible. If well documented, DMCS systems would encourage private sector data release (e.g., related to water solubility) and directly benefit public sector science.
J Comput Aided Mol Des
PMID:Descriptor collision and confusion: toward the design of descriptors to mask chemical structures. 1632 10
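One way to picture the collision and confusion idea from this abstract is the toy sketch below. It assumes that the collision rate is the fraction of structures whose fingerprint is identical to that of at least one other structure, which may differ from the definition used in the paper, and it models "eliminating keys with a low frequency of occurrence" as dropping bit positions set in fewer than `min_count` structures. All names and the four toy fingerprints are hypothetical.

```python
from collections import Counter


def collision_rate(fingerprints):
    """Fraction of structures sharing a fingerprint with at least one other."""
    counts = Counter(fingerprints)
    collided = sum(n for n in counts.values() if n > 1)
    return collided / len(fingerprints)


def drop_rare_keys(fingerprints, min_count):
    """Remove key positions that are set in fewer than min_count structures."""
    n_keys = len(fingerprints[0])
    keep = [
        j for j in range(n_keys)
        if sum(fp[j] for fp in fingerprints) >= min_count
    ]
    return [tuple(fp[j] for j in keep) for fp in fingerprints]


# Four toy structures described by four binary keys each.
toy = [(1, 0, 1, 0), (1, 0, 1, 1), (0, 1, 1, 0), (0, 1, 0, 0)]

before = collision_rate(toy)                            # all fingerprints unique
after = collision_rate(drop_rare_keys(toy, min_count=2))  # rare last key removed
```

Dropping the rarely set fourth key makes the first two toy structures indistinguishable, so the rate rises from 0.0 to 0.5, mirroring in miniature the deliberate increase in confusion the abstract describes.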