Frequently Asked Questions
General/Miscellaneous
OMEGA
NOTE: pertains to Omega v2.2 unless otherwise noted.
Zap
FRED
ROCS
VIDA
NOTE: pertains to Vida v2.0 unless otherwise noted.
Filter
OEShape
OEChem
OEWrappers - Python
Lexichem
PVM and Parallelization
Quacpac
Chemistry
Answers
General/Miscellaneous
-
How do I download OpenEye software?
Either
-
Go to the download page linked to the OpenEye home page, or
-
Use anonymous FTP to ftp.eyesopen.com and cd to pub.
-
How do I install my license file oe_license.txt?
The license file should be identified by the environment variable $OE_LICENSE.
The recommended location is $OE_DIR/etc/oe_license.txt, typically
/usr/local/openeye/etc/oe_license.txt.
A license file named oe_license.txt can also be found if it is
either (1) present in the current working directory or (2) present in the
directory defined by the environment variable $OE_DIR. However,
these methods are less reliable and are not generally recommended.
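The lookup described above can be sketched in Python. This is not OpenEye's code. The text does not state a precedence among the three locations, so the order used here ($OE_LICENSE, then the current directory, then $OE_DIR) is an assumption.

```python
import os

LICENSE_NAME = "oe_license.txt"


def find_license(environ=None, cwd=None):
    """Return the path of the first license file found, or None.

    Assumed search order (a sketch, not OpenEye's actual logic):
    1. the file named by $OE_LICENSE
    2. oe_license.txt in the current working directory
    3. oe_license.txt in the directory named by $OE_DIR
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd

    candidates = []
    if environ.get("OE_LICENSE"):
        candidates.append(environ["OE_LICENSE"])
    candidates.append(os.path.join(cwd, LICENSE_NAME))
    if environ.get("OE_DIR"):
        candidates.append(os.path.join(environ["OE_DIR"], LICENSE_NAME))

    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Calling `find_license()` with no arguments uses the real environment and working directory. This is a quick way to see which file a process would be likely to pick up.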
-
What's the deal with CYGWIN?
CYGWIN is a Unix environment for Windows, available at http://www.cygwin.com/.
Some OpenEye products are compiled for Windows using CYGWIN; however, a CYGWIN
installation should not be required at runtime. One difference between CYGWIN-built
and native Windows-built software is how absolute file paths may be specified:
Unix-style or Windows-style.
-
Why doesn't my license work? (#1)
If a Windows mail program was used to receive the license, one possibility is
that carriage return characters (ascii 13) were inadvertently added to the file.
Remove these using the dos2unix command or the following Perl command:
perl -pi -e 's/\r\n*/\n/g;' oe_license.txt
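Where neither dos2unix nor Perl is available, a rough Python equivalent follows. It is a sketch that rewrites the file in place and converts CRLF pairs and stray CR characters to LF. Unlike the Perl one-liner, it does not collapse blank lines that follow a CR.

```python
def strip_carriage_returns(path):
    """Rewrite a text file in place with CRLF/CR line endings converted to LF.

    Returns True if the file was changed.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Convert Windows CRLF pairs first, then any remaining bare CR characters.
    cleaned = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(path, "wb") as f:
        f.write(cleaned)
    return cleaned != data
```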
-
Why doesn't my license work? (#2)
In general, OE licenses may be concatenated and remain valid. However, some
exceptions to this rule exist. Old-format and new-format licenses cannot be
combined. Within an old-format license file, Zap licenses may interfere with
other licenses that follow them, and vice versa: other product licenses may
interfere with Zap licenses that follow them. To be safe, keep Zap licenses in a
separate file from other product licenses when using old-format license files.
-
Where/how should I install OpenEye software?
All software packages are installed into a directory named openeye,
typically at /usr/local/openeye, and defined by environment
variable $OE_DIR. Typical example:
$ pwd
/usr/local
$ tar xzf $HOME/omega-1.8.1-redhat-3.0WS-g++3.3-i586.tar.gz
$ tar xzf $HOME/szybki-1.0-redhat-3.0WS-g++3.4-i586.tar.gz
Several subdirectories are used consistently among packages:
/usr/local/openeye/                            $OE_DIR
/usr/local/openeye/arch/                       platform-dependent executables and libs
/usr/local/openeye/bin/                        common place for executables, typically links
                                               to platform-dependent files in arch
/usr/local/openeye/data/                       data files used by apps and libs
/usr/local/openeye/docs/                       documentation: HTML, PDF, etc.
/usr/local/openeye/etc/                        license files, config files, etc.
/usr/local/openeye/etc/oe_license.txt          new-format license (defined by $OE_LICENSE)
/usr/local/openeye/toolkits/examples/          example C++ source code
/usr/local/openeye/toolkits/include/           include files for libs
/usr/local/openeye/toolkits/lib/               static and shared libs, typically links to
                                               platform-dependent files in arch
/usr/local/openeye/wrappers/python/            Python toolkit wrappers, e.g., PyOEChem package
/usr/local/openeye/wrappers/python/examples/   example Python source code
/usr/local/openeye/wrappers/java/              Java toolkit wrappers
/usr/local/openeye/wrappers/java/examples/     example Java source code
-
What are the differences between OpenEye (OEChem) SMARTS and Daylight SMARTS?
The caret (^) indicates hybridization: ^3 means sp3 and ^2 means sp2. This feature
is used in the Omega torlib.txt file. The other difference is the semantics of the
primitive [R<n>], which in Daylight means the number of SSSR rings an atom is in.
In OEChem, it means the number of ring bonds (e.g., benzene atoms are all [R2]),
while [R] means any non-zero number of ring bonds. This discrepancy is
unfortunate but stems from the weakness of SSSR (as explained in the OEChem
manual), which is not a rigorous concept and whose algorithms are arbitrary and
non-deterministic.
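The OEChem ring-bond semantics can be made concrete without OEChem. The following sketch counts ring bonds per atom on a plain adjacency-list graph. A bond counts as a ring bond exactly when its two atoms stay connected after the bond is removed.

In naphthalene, the two fusion atoms have three ring bonds, so under the semantics above they would match [R3]. Under Daylight's SSSR semantics they lie in two rings.

```python
from collections import deque


def connected_without_edge(adj, u, v):
    """Breadth-first search from u to v that ignores the direct bond u-v."""
    seen = {u}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if (a, b) in ((u, v), (v, u)):
                continue  # skip the bond being tested
            if b == v:
                return True
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return False


def ring_bond_counts(adj):
    """Map each atom to its number of ring bonds (OEChem-style [R<n>])."""
    counts = {a: 0 for a in adj}
    for u in adj:
        for v in adj[u]:
            if u < v and connected_without_edge(adj, u, v):
                counts[u] += 1
                counts[v] += 1
    return counts


# Naphthalene carbons 0-9; atoms 4 and 9 are the ring-fusion atoms.
naphthalene = {
    0: [1, 9], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5, 9],
    5: [4, 6], 6: [5, 7], 7: [6, 8], 8: [7, 9], 9: [8, 0, 4],
}
print(ring_bond_counts(naphthalene))
# {0: 2, 1: 2, 2: 2, 3: 2, 4: 3, 5: 2, 6: 2, 7: 2, 8: 2, 9: 3}
```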
-
Windows Error: msvcp71.dll and msvcr71.dll are missing.
These DLLs comprise the needed runtime environment for applications built with
MS Visual C++ 7.1, such as many OpenEye windows applications. They may be freely
obtained and used, and are available for our customers' convenience at ftp://ftp.eyesopen.com/pub/misc/MSVC71runtime.zip.
-
Windows Application Error... "The application failed to initialize properly."
This can mean that a needed DLL is present but not executable. Set the
executable permission on the DLL.
-
How should I cite OpenEye software in a publication?
Sample citation:
OEChem, version 1.3.4, OpenEye Scientific Software, Inc., Santa Fe, NM, USA, www.eyesopen.com,
2005.
-
Windows Error: libmmd.dll is missing.
This DLL is the runtime library needed by applications built with the
Intel C++ Compiler. It may be freely obtained and used, and is available for
our customers' convenience at ftp://ftp.eyesopen.com/pub/misc/INTELlibmmd.zip.
-
My MOL2 file is interpreted strangely. What is wrong?
Please see MOL2 Files for Dummies, Roger Sayle, CUPV, Feb. 2004.
-
What elements are handled by MMFF94?
MMFF94 handles a specific list of atom types, where a type is defined by the
element and its valence state. For typical organic chemistry environments the
following elements are handled:
C, N, O, F, S, P, Cl, Br, I, Si, H
Also, the following ions:
Fe+2, Fe+3, F-, Cl-, Br-, Li+, Na+, K+, Zn+2, Ca+2, Cu+1, Cu+2, Mg+2
OpenEye applications that use MMFF94, such as Omega and Szybki, are similarly
limited.
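A pre-check along these lines can flag molecules before an MMFF94-based run. The sketch hard-codes the element and ion lists above and represents each atom as a (symbol, formal charge) pair. That representation is an assumption. The check also ignores valence state, which real MMFF94 atom typing depends on.

```python
# Elements MMFF94 handles in typical organic environments (from the list above).
ORGANIC = {"C", "N", "O", "F", "S", "P", "Cl", "Br", "I", "Si", "H"}

# Ions as (element, formal charge) pairs (from the list above).
IONS = {
    ("Fe", 2), ("Fe", 3), ("F", -1), ("Cl", -1), ("Br", -1),
    ("Li", 1), ("Na", 1), ("K", 1), ("Zn", 2), ("Ca", 2),
    ("Cu", 1), ("Cu", 2), ("Mg", 2),
}


def unsupported_atoms(atoms):
    """Return the (symbol, charge) atoms that fall outside the MMFF94 lists.

    Simplification: any atom of an organic element is accepted regardless of
    charge or valence; other elements must match a listed ion exactly.
    """
    return [(sym, q) for sym, q in atoms
            if sym not in ORGANIC and (sym, q) not in IONS]
```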
-
What is CentOS? How is it related to RedHat Enterprise Linux?
As stated at www.centos.org:
CentOS is an Enterprise-class Linux Distribution derived from sources freely
provided to the public by a prominent North American Enterprise Linux vendor.
CentOS conforms fully with the upstream vendors redistribution policy and aims
to be 100% binary compatible.
So, OpenEye provides RedHat Enterprise Linux (RHEL) compatible versions in some
cases by means of CentOS.
OMEGA
NOTE: pertains to Omega v2.2 unless otherwise noted.
-
Does the order of rules matter in torlib.txt?
Yes. Earlier rules take precedence if a subsequent rule conflicts.
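This precedence behaves like a first-match lookup. In the sketch below, each rule pairs a matcher with a list of torsion angles. Real torlib.txt rules use SMARTS patterns. Here they are replaced by hypothetical Python predicates over a simple bond description, and the angle values are made up for illustration.

```python
def torsions_for(bond, rules):
    """Return the torsion angles of the first rule that matches the bond.

    Rules are checked in file order, so an earlier rule wins over any later,
    conflicting one. Returns None if no rule matches.
    """
    for matches, angles in rules:
        if matches(bond):
            return angles
    return None


# Hypothetical rules; a bond is described here as a pair of hybridizations.
rules = [
    (lambda b: b == ("sp3", "sp3"), [60, 180, 300]),  # specific rule first
    (lambda b: "sp3" in b, [0, 120, 240]),            # general; conflicts for sp3-sp3
    (lambda b: True, [0, 180]),                       # catch-all
]
```

Because the first match wins, placing a general rule above a specific one would make the specific rule unreachable for the bonds they share.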
-
Why use .oeb (OEBinary) or .oeb.gz as output format?
The OEB format is particularly well suited to Omega output and to input by
downstream applications.
-
OEB, and even more so gzipped OEB (.oeb.gz), is compact. This is particularly
important when parallelizing with PVM.
-
OEB explicitly stores multiconformer molecules as such. Alternatives such as
SDF do not and thus are fundamentally less reliable at preserving these
relationships.
-
OEB, like SDF, allows storage of generic data such as energies and scores.
-
Omega 2.0+ requires OEB output in PVM mode.
-
As of Omega 2.2, -rotorOffsetCompress is true by default, which reduces the size
of output OEB files.
-
Which Omega?
If you enter "omega " and get something like:
This is Omega, Version 3.14159--1.8 (Web2C 7.3.1)
Copyright (c) 1994--1999 John Plaice and Yannis Haralambous
... you're running the wrong program! This Omega is a Unicode variant of TeX!
Try specifying the full path of the OpenEye Omega.
-
Can Omega enumerate all the stereoisomers of an input molecule?
Omega cannot by itself permute the stereo configurations of an input molecule.
The recommended approach is to enumerate the desired stereoisomers prior to the
Omega run. Omega will infer R/S from the 2D SD file if possible, and keep that
configuration. Included with Omega is an auxiliary program, "flipper", which can
enumerate stereoisomers for this purpose. In addition, OEChem includes source
code examples for enumerating stereoisomers, which can be customized.
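The combinatorics behind stereoisomer enumeration are simple: n centers of unspecified configuration give up to 2**n isomers. The sketch below enumerates R/S label assignments only. It does not build molecules and does not merge symmetry-equivalent results such as meso forms. The center labels are hypothetical, and this is not how flipper works internally.

```python
from itertools import product


def enumerate_stereo(specified, unspecified):
    """Yield every full R/S assignment, keeping specified centers fixed.

    specified:   dict mapping center label -> "R" or "S" (left unchanged)
    unspecified: list of center labels to permute over both configurations
    Symmetry-equivalent assignments (e.g. meso forms) are NOT merged.
    """
    for combo in product("RS", repeat=len(unspecified)):
        assignment = dict(specified)
        assignment.update(zip(unspecified, combo))
        yield assignment
```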
-
What elements can Omega handle?
C, N, O, F, S, P, Cl, Br, I, Si and H. Molecules containing other elements will
be skipped.
-
What does the caret (^) symbol mean in the SMARTS in torlib.txt?
The caret indicates hybridization: ^3 means sp3 and ^2 means sp2. This is one of
only two differences between OpenEye SMARTS and Daylight SMARTS; see the OEChem
manual for details.
-
What energies are calculated and reported by Omega?
The "-ewindow" value is applied relative to the lowest-energy conformer. The
energies written to output formats that can contain energies (.mol2 and .sdf)
are standard MMFF94 energies, in kcal/mol.
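To show how a window relative to the lowest-energy conformer selects conformers, here is a sketch over (conformer id, energy in kcal/mol) pairs. It is not Omega's code. Treating the boundary as inclusive is an assumption.

```python
def apply_energy_window(conformers, ewindow):
    """Keep conformers within ewindow kcal/mol of the lowest-energy one.

    conformers: list of (conformer_id, energy_kcal_per_mol) pairs.
    Only the gap to the minimum matters, not the absolute energies.
    The boundary is treated as inclusive (an assumption).
    """
    if not conformers:
        return []
    e_min = min(energy for _, energy in conformers)
    return [(cid, e) for cid, e in conformers if e - e_min <= ewindow]
```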
-
What force field is used by Omega?
The options -buildff and -searchff are
both set by default to mmff94s_NoEstat, which is designed to
improve reproduction of aqueous solution phase conformations. Other choices:
mmff, mmff_NoEstat, mmff_Trunc,
mmff94s, mmff94s_NoEstat, mmff94s_Trunc.
-
Stochastic versus deterministic: when can Omega results vary for the same inputs?
First and foremost, note that inconsistent results between separate Omega
executions do not imply that results are incorrect relative to the design goals
or stipulated parameters (RMSD, energy, etc.). There are no known cases where the
inconsistencies discussed reflect actual errors. However, for several reasons it
is desirable to have numerically identical results for a constant program version,
input data and parameters, independent of input order, format, platform, time,
phase of the moon, etc.
Although the number of conformers per molecule is a conspicuous result, it can be
a deceptive measure. Given the combinatorial nature of conformer enumeration,
that number should probably be regarded on a logarithmic scale. For example,
440 vs. 220 can result from just one more torsion angle value at one rotor.
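The arithmetic behind that example is straightforward. Before any RMSD or energy pruning, the number of enumerated conformers is at most the product of the number of torsion values at each rotor. One extra value at a rotor that had a single value therefore doubles the total. The per-rotor counts below are hypothetical.

```python
from math import prod


def max_conformers(torsion_counts):
    """Upper bound on enumerated conformers: product of per-rotor value counts."""
    return prod(torsion_counts)


before = [1, 4, 5, 11]  # hypothetical torsion-value counts for four rotors
after = [2, 4, 5, 11]   # one more torsion value at the first rotor
print(max_conformers(before), max_conformers(after))  # 220 440
```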
The distance geometry 2D->3D method used in Omega involves a stochastic step,
which can produce somewhat different results for the same input molecule. Of
course, "stochastic" on a digital computer usually, and in this case, means
pseudo-random numbers generated from a seed. It is a design goal that results not
vary, or vary minimally, due to the stochastic step.
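Seeding is what makes a "stochastic" step repeatable, as this minimal illustration shows. It uses Python's `random.Random` purely for illustration; it is not the generator Omega uses. The same seed always reproduces the same sequence.

```python
import random


def random_displacements(seed, n=5):
    """Return n pseudo-random displacements drawn from a private, seeded generator.

    A private Random instance keeps the sequence independent of any other
    code that draws from the module-level generator.
    """
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]
```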
In Omega 2.1+, the input molecule atom order is canonicalized. This eliminates
another cause of variation in results.
In Omega, use of the stochastic step is limited because the program relies on a
fragment database, which can be large and comprehensive. For any user dataset,
the supplied fragment database can be augmented until it covers every fragment
in the dataset, so the stochastic distance geometry algorithm need never be
invoked during the Omega run itself (only by makefraglib).
One other possible source of inconsistent results is variation in floating point
precision among platforms. These inconsistencies are minimized in Omega by use
of platform-independent pseudo-random number generators and other methods, but
they can exist.
-
How fast is Omega?
Using Omega 2.1 on a computer with two dual-core 2.6 GHz CPUs, 780,287 molecules
from the Chembridge public dataset were processed in 3 days, 7 hrs, 54 min and
31 sec. That's 0.369 sec/mol overall, and 1.47 sec/mol on each of the four
processor cores.
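The quoted rates follow from the wall-clock time. This check assumes all four cores of the two dual-core CPUs were busy for the whole run.

```python
wall_seconds = 3 * 86400 + 7 * 3600 + 54 * 60 + 31  # 3 days, 7 h, 54 min, 31 s
molecules = 780287
cores = 4  # two dual-core CPUs

overall = wall_seconds / molecules  # wall-clock seconds per molecule
per_core = overall * cores          # seconds per molecule on a single core
print(f"{overall:.3f} s/mol overall, {per_core:.2f} s/mol per core")
```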
Zap
-
Is there a way to download all of ZAP at once?
Currently no. The ZAP library for each platform is downloaded separately from
example source code and platform-specific compiled binaries. However, here is a
bundle of all the example source code: zap_example_src.tgz.
-
Additional documentation...?
See the Sybyl/Zap interface manual written by Glen Kellogg, which is an excellent
primer on Poisson-Boltzmann (PB) electrostatics.
FRED
-
Does FRED consider a molecule's internal potential?
Scoring functions generally do not have an internal potential. Thus it is
possible for a high energy conformation to score well. Conformation quality
assurance is left to the conformation generation program (Omega).
-
What does "Unable to dock" mean? Shouldn't FRED report the best fit regardless?
The 'unable to dock molecule X' message indicates that the initial exhaustive
search routines could not place the molecule in the pocket (i.e., there are no
poses to score). FRED creates a negative image of the receptor site, defined as
the active site box (see the -box flag), minus positions where a probe atom
clashes with the protein (see the -clash_checking flag) and minus positions
that have a very low shape score (see the -neg_img_size flag). FRED filters its
initial exhaustive set of poses through this negative image, and any pose with
atoms outside the image is rejected. The message means that this filter rejected
every pose of the initial exhaustive ensemble.
Increasing -neg_img_size has a very good chance of fixing the issue. If you have
a very tightly bound ligand, decreasing -clash_checking may also help; however,
this will allow the ligand to clash more with the protein (and if you need to do
this to reproduce an experimentally determined structure, that structure likely
has some fairly significant clashes). Finally, you may simply need to increase
the size of your box (the -addbox flag provides a simple mechanism for this, or
you can just create a new box). As of v2.0, -neg_img_size specifies a size
relative to the box.
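The rejection step can be pictured as a grid lookup: a pose survives only if every one of its atoms falls in an allowed cell of the negative image. The sketch below uses a hypothetical 1 Å grid and a set of allowed cell indices. It shows the filtering logic only, not FRED's implementation.

```python
import math

SPACING = 1.0  # hypothetical grid spacing in angstroms


def cell(xyz, spacing=SPACING):
    """Index of the grid cell that contains a point."""
    return tuple(math.floor(c / spacing) for c in xyz)


def pose_fits(pose, negative_image, spacing=SPACING):
    """True if every atom of the pose lies in an allowed cell of the image."""
    return all(cell(atom, spacing) in negative_image for atom in pose)


def surviving_poses(poses, negative_image, spacing=SPACING):
    """Filter poses; an empty result corresponds to 'unable to dock'."""
    return [p for p in poses if pose_fits(p, negative_image, spacing)]
```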
-
How is FRED related to other OpenEye technologies and products?
The FRED algorithm is based on a "Gaussian docking function", which is derived
from the same work as OpenEye's OEShape toolkit (Grant, Nicholls, et al.).
Much of the theory of shape and surfaces thus applies to FRED as well. FRED also
integrates Zap Poisson-Boltzmann electrostatics via the zap_bind scoring function.
-
What is the fredA/fredPA program?
fredA (now called fredPA) is short for FRED Analysis, and is designed to take a
bound ligand, redock it and then report the RMSDs of the redocked ligand poses
relative to the bound pose, plus other details of the redocking process.
Conformations are recalculated internally using Omega.
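A redocked-pose RMSD is conventionally computed in place, without re-superposition, because both poses share the receptor's coordinate frame. The minimal sketch below follows that convention and assumes the atoms of both poses are listed in the same order. It ignores symmetry-equivalent atom mappings. Whether fredPA does anything beyond this is not covered here.

```python
import math


def pose_rmsd(pose, reference):
    """In-place RMSD (angstroms) between two equally ordered coordinate lists.

    No superposition is done: both poses are assumed to be in the receptor's
    frame, as for redocking. Atom i in pose corresponds to atom i in reference.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = sum(
        (a - b) ** 2
        for p, r in zip(pose, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(total / len(pose))
```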
-
How is the best box size determined?
Not too small, but not too big. Here's the explanation. FRED generates a
negative image of the receptor site; any pose that does not fit inside this
image is rejected by FRED. If this image is too small FRED will reject every
possible pose, resulting in a "Failed to Dock" message for that
molecule. The image is constructed as follows:
1) Start with the entire enclosed volume of the box.
2) Remove positions within the box that clash with the protein.
3) Remove positions with poor shape score.
Steps 1 & 2 are straightforward. Step 3 needs some explanation. "Poor shape
score" means a poor score with respect to the Gaussian Shape Scoring Function, and
FRED selects a cutoff value for the shape score of a probe atom. Any position
where the probe atom has a shape score below the cutoff value is removed from
the negative image. The cutoff value for the scores isn't directly specified.
What is specified is total volume of the negative image of the receptor site,
via option -excvol. FRED chooses a cutoff such that it creates an image with the
specified volume. If you increase your box size, FRED will increase the cutoff
score such that the size of the negative image remains constant. This can cause
it to reject poses it would accept with a smaller box, because positions that
previously had acceptable scores are now rejected. To increase box size and
retain previously acceptable poses, increase -excvol also.
Another reason not to use too large a box is simply that FRED will require more
time and steps to enumerate all possible poses in the box.
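A toy model shows how box size interacts with a fixed image volume. Each grid point in the box has a probe shape score, and the cutoff is chosen so that the best-scoring points fill the requested volume. The scores, point names and voxel volume are hypothetical, and FRED's actual procedure may differ in detail.

```python
def choose_cutoff(scores, target_volume, voxel_volume):
    """Return (cutoff, kept_points) so the kept points fill target_volume.

    scores: dict mapping grid point -> probe shape score (higher is better).
    Keeps the n best-scoring points, where n = target_volume / voxel_volume.
    """
    if not scores:
        raise ValueError("no grid points")
    n_keep = max(1, round(target_volume / voxel_volume))
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = set(ranked[:n_keep])
    cutoff = min(scores[p] for p in kept)
    return cutoff, kept


small_box = {"p1": 0.9, "p2": 0.7, "p3": 0.5, "p4": 0.2}
large_box = dict(small_box, p5=0.95, p6=0.8)  # enlarging the box adds points
print(choose_cutoff(small_box, 3.0, 1.0)[0])  # 0.5
print(choose_cutoff(large_box, 3.0, 1.0)[0])  # 0.8: p2 and p3 now excluded
```

In this toy model, raising the target volume to 5.0 for the large box brings the cutoff back to 0.5, so p2 and p3 are kept again. This mirrors the advice to increase -excvol along with the box size.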
-
Scoring function references:
PLP
- D.K. Gehlhaar, G.M. Verkhivker, P.A. Rejto, C.J. Sherman, D.B. Fogel, L.J.
Fogel and S.T. Freer, "Molecular recognition of the inhibitor AG-1343 by
HIV-1 protease: Conformationally flexible docking by evolutionary programming",
Chemistry & Biology, 2, 317-324 (1995).
Chemscore
- M.D. Eldridge, C.W. Murray, T.R. Auton, G.V. Paolini and R.P. Mee, "Empirical
scoring functions: I. The development of a fast empirical scoring function to
estimate the binding affinity of ligands in receptor complexes", J. Computer-Aided
Molecular Design, 11, 425-445 (1997).
- C.A. Baxter, C.W. Murray, D.E. Clark, D.R. Westhead and M.D. Eldridge, "Flexible
docking using TABU search and an empirical estimate of binding affinity",
Proteins, 33, 367-382 (1998).
Screenscore
- M. Stahl and M. Rarey, "Detailed Analysis of Scoring Functions for
Virtual Screening", J. Med. Chem., 44, 1035-1042 (2001).
ShapeGauss, ChemGauss
- M.R. McGann, H.R. Almond, A. Nicholls, J.A. Grant and F.K. Brown, "Gaussian
Docking Functions", Biopolymers, 68, 76-90 (2003).
Zap_Bind
- J.A. Grant, B.T. Pickup and A. Nicholls, "A Smooth Permittivity Function for
Poisson-Boltzmann Solvation Methods", J. Computational Chemistry, 22(6),
608-640 (2001).
-
What is the receptor box?
A Fred box is rectangular, aligned with the Cartesian axes, and defined by a
molecule file whose minimum and maximum atom coordinates give the X, Y, and Z
ranges. Thus, a minimal box file consists of only two atoms, but any
molecule file can be used (such as that of a bound ligand).
-
Can one atom satisfy multiple constraints?
Yes. If multiple defined constraint volumes overlap, one atom located in the
overlap volume can satisfy all the constraints (if the SMARTS patterns are
matched).
-
How can a receptor box be defined?
Regarding Fred 2.0+ box generation, here are some choices:
-
Use Vida 1.1.2. (Vida 1.3 has some bugs in the Fred setup functionality.) Be
sure to rename the resulting box file with the correct suffix. ".box" really
should be ".pdb".
-
Use the .xyz format (see example below, four lines only) and edit by hand. Only
two points are needed to define a rectangular box (xmin,ymin,zmin) and (xmax,
ymax, zmax).
2
test box
C -1.00000 2.00000 -4.00000
C 15.00000 9.50000 7.50000
-
Use any existing bound ligand molecule file. The dimensions of the box will be
determined by the minimum and maximum x/y/z values of all atoms in the molecule.
Use the "-addbox" parameter to increase the extent of each of the six sides of
the box, if desired.
-
Use the PyOEChem example program mol2box.py
to generate a box file based on a bound ligand. This code can vary the extents
of all six sides of the box independently.
-
Future: Use a special graphical tool designed for Fred setup, now (August 2005)
in development. It may be bundled with Fred.
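The two-atom .xyz box file shown in the list above can be generated by a script. This minimal sketch copies the example's layout (atom count, title line, two dummy C atoms with five decimal places).

```python
def box_xyz(lo, hi, title="test box"):
    """Return XYZ-format text for a two-atom box file with corners lo and hi."""
    lines = ["2", title]
    for corner in (lo, hi):
        lines.append("C " + " ".join(f"{v:.5f}" for v in corner))
    return "\n".join(lines) + "\n"
```

For example, `open("box.xyz", "w").write(box_xyz((-1, 2, -4), (15, 9.5, 7.5)))` reproduces the example file above. The file name box.xyz is a placeholder.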
ROCS
-
What's with the SUBTAN column?
The SUBTAN column in the Rocs report file stands for the 'substructure Tanimoto'.
The subtan is calculated by performing a shape superposition of 'fit' molecules
onto a reference molecule (query molecule). Once superimposed, atoms in the 'fit'
molecule that are greater than 2.5 angstroms away from the closest atom of the
reference molecule are discarded, and a shape Tanimoto is calculated with the
remaining atoms. The subtan score was an attempt to provide a minimal facility for
superstructure shape searches. In some cases it is possible to find a
superstructure of the query molecule in a database. In practice, this is an
imperfect solution, and subshape searching can often be done in better ways, such
as via Tversky similarity.
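The atom-selection step behind SUBTAN can be sketched as follows, assuming the fit molecule has already been superimposed on the reference. The shape Tanimoto would then be computed on the returned atoms. That step is not shown, and this is not ROCS's code.

```python
import math

SUBTAN_CUTOFF = 2.5  # angstroms, per the description above


def atoms_near_reference(fit_atoms, ref_atoms, cutoff=SUBTAN_CUTOFF):
    """Keep fit atoms whose nearest reference atom is within cutoff."""
    kept = []
    for a in fit_atoms:
        nearest = min(math.dist(a, r) for r in ref_atoms)
        if nearest <= cutoff:
            kept.append(a)
    return kept
```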
-
What's the difference between the -chemff and -optchem options?
Both options involve the same chemical force field, defined by SMARTS patterns in
an external file. -chemff defines and includes a chemical force field for final
scoring after optimization; its argument is the file defining the force field.
-optchem includes the defined chemical force field in overlay optimization, so
the resulting aligned coordinates will be affected. Thus, -optchem requires
-chemff to define the field. Using -chemff alone means optimization of the
alignment will be based on shape only.
-
Does ROCS consider hydrogens?
No. This is intentional: the prevailing judgment is that hydrogens do not
contribute meaningfully to shape comparison in the context of ROCS. The OEShape
Toolkit does provide the ability to include hydrogens.
-
Is the ROCS shape similarity calculation analytical or grid based?
Grid based. The OEShape Toolkit provides several levels of accuracy, including a
high-precision approximation to the analytical result. For speed, ROCS uses
grid-based shape comparison.
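A toy model shows the analytical-versus-grid distinction. Treat each atom as a spherical Gaussian, the approach behind the Gaussian shape work cited in this FAQ. The first-order overlap volume of two molecules is then a sum of closed-form pair terms, and a grid sum approximates the same integral. The shape Tanimoto is V_AB / (V_AA + V_BB - V_AB).

The amplitude and exponent below are illustrative values, not the parameters ROCS or OEShape use. With this grid spacing the two results agree closely, and the grid estimate converges to the analytical one as the spacing shrinks.

```python
import math

# Illustrative Gaussian parameters (amplitude, exponent in 1/A^2), NOT the
# values ROCS or OEShape actually use.
P, ALPHA = 1.0, 1.0


def pair_overlap(a, b):
    """Analytic overlap integral of two atomic Gaussians P*exp(-ALPHA*r^2)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return P * P * (math.pi / (2 * ALPHA)) ** 1.5 * math.exp(-ALPHA * d2 / 2)


def analytic_overlap(mol_a, mol_b):
    """First-order overlap volume: sum over all atom pairs."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def density(pt, mol):
    """Molecular density: sum of atomic Gaussians at point pt."""
    return sum(P * math.exp(-ALPHA * sum((x - y) ** 2 for x, y in zip(pt, c)))
               for c in mol)


def grid_overlaps(mol_a, mol_b, lo=-4.0, hi=6.0, h=0.5):
    """Grid estimates of (V_AA, V_BB, V_AB) from summed density products."""
    axis = [lo + i * h for i in range(int(round((hi - lo) / h)) + 1)]
    vaa = vbb = vab = 0.0
    for x in axis:
        for y in axis:
            for z in axis:
                da, db = density((x, y, z), mol_a), density((x, y, z), mol_b)
                vaa += da * da
                vbb += db * db
                vab += da * db
    cell = h ** 3
    return vaa * cell, vbb * cell, vab * cell


def tanimoto(vaa, vbb, vab):
    """Shape Tanimoto from self- and cross-overlap volumes."""
    return vab / (vaa + vbb - vab)


mol_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
mol_b = [(0.2, 0.1, 0.0), (1.4, 0.5, 0.0)]
exact = tanimoto(analytic_overlap(mol_a, mol_a), analytic_overlap(mol_b, mol_b),
                 analytic_overlap(mol_a, mol_b))
approx = tanimoto(*grid_overlaps(mol_a, mol_b))
print(f"analytic {exact:.6f}  grid {approx:.6f}")
```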
VIDA
NOTE: pertains to Vida v2.0 unless otherwise noted.
-
Does VIDA require a license?
As of version 2.0, Vida utilizes the same licensing system as other recently
released programs. A single common license file normally identified by
environment variable OE_LICENSE is required. If a license
file is not found, Vida will prompt the user for its location, and store that
location for future use.
-
How do I set environment variable OE_DIR or OE_LICENSE on Windows?
On Win2k/WinXP, use Start->Settings->Control Panel->System->Advanced->Environment
Variables.
-
Resolving VIDA/Windows error: "The dynamic link library MSVCP60.dll could not be
found."
MSVCP60.dll is the Microsoft Visual C++ version 6 runtime library, which is
available publicly from Microsoft bundled in Vcredist.exe. See microsoft.com.
-
Where can I install a custom startup script?
With Vida 2.0, a mechanism was introduced whereby Vida-API (Python) commands may
be invoked at startup. At present, this file must be named startup.py
and must be located as follows:
- Windows:
C:\Documents and Settings\username\OpenEye\VIDA2\startup.py
- Unix/linux:
$HOME/.OpenEye/VIDA2/startup.py
Note that the example pdbopen.py provided with Vida can be
used as a startup script. This adds a menu button to download a PDB file by its
ID.
-
Will Vida work with all video cards?
The answer is no. At this time, the following cards are known to be compatible:
-
NVidia (several)
-
ATI FireGL V5000
These are not compatible:
For best results, use the latest available graphics driver software. Some
graphics driver download sites:
The amount of RAM allocated to the video card should be 128 MB or more for best
results. Most integrated graphics cards use system RAM instead of having their
own video RAM; on Windows, the amount should be configurable in the display
settings of the Control Panel.
Filter
-
What's the difference between filter_lead.txt and filter_drug.txt?
For lead candidate filtering, filter_lead.txt is recommended, and corresponds
with the default settings. It should filter out compounds a modeler wouldn't want
to show a medicinal chemist. filter_drug.txt is more discriminating and should
filter out everything that doesn't look like a known drug (including many useful
lead compounds).
OEShape Toolkit
-
How are ROCS and the OEShape toolkit related?
ROCS is based on and built upon the OEShape Toolkit and uses the shape
similarity measure in the toolkit API. The OEShape Toolkit includes source code
examples which are essentially minimalistic ROCS implementations. At the time of
this writing, the ROCS product is included when the OEShape Toolkit is licensed.
OEChem
-
What C++ compiler is required?
OEChem distributions are generally labelled clearly to indicate the
corresponding compiler and version. Additional compiler support information can
be found at the OpenEye Platform Support
document, in the accompanying technical notes.
-
How should OEChem be installed?
As of OEChem 1.2, OEChem and other products are installed in a consistent
way into a directory named "openeye", typically /usr/local/openeye. See also
"Where/how should I install OpenEye software?"
No further installation is needed for the C++ package, but if code is to be
compiled in user directories, then /usr/local/openeye/oechem/include needs to be
in the include path and /usr/local/openeye/lib/ needs to be in the link path.
This can be accomplished simply by using the Makefile provided with the example
code.
For the Python "openeye" package, further installation is required as
described in the INSTALL file. Either $PYTHONPATH must be redefined to include
the PyOEChem directory openeye/wrappers/python/ (recommended),
or the openeye directory must be copied recursively to the Python site-packages
directory. Also, on Unix/Linux platforms, $LD_LIBRARY_PATH
must be redefined to include the openeye/wrappers/libs/
directory. On MacOS the corresponding environment variable is $DYLD_LIBRARY_PATH.
On IRIX, also set $LD_LIBRARYN32_PATH and $LD_LIBRARY64_PATH.
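The environment changes described above can be collected in one place. This sketch returns the variables to set for a given platform. It assumes OpenEye is installed under /usr/local/openeye and uses only the directories and variable names mentioned in this answer. The platform keys are hypothetical labels.

```python
import os

# Library-path variables named in the answer above, by platform.
LIB_VARS = {
    "linux": ["LD_LIBRARY_PATH"],
    "darwin": ["DYLD_LIBRARY_PATH"],
    "irix": ["LD_LIBRARY_PATH", "LD_LIBRARYN32_PATH", "LD_LIBRARY64_PATH"],
}


def prepend_path(existing, new_dir):
    """Prepend new_dir to a colon-separated path list, avoiding duplicates."""
    parts = [p for p in (existing or "").split(":") if p and p != new_dir]
    return ":".join([new_dir] + parts)


def pyoechem_env(platform, environ, oe_root="/usr/local/openeye"):
    """Return the variables to set for Py-OEChem on a Unix-like platform."""
    updates = {
        "PYTHONPATH": prepend_path(environ.get("PYTHONPATH"),
                                   os.path.join(oe_root, "wrappers", "python")),
    }
    libdir = os.path.join(oe_root, "wrappers", "libs")
    for var in LIB_VARS[platform]:
        updates[var] = prepend_path(environ.get(var), libdir)
    return updates
```

Use the result to write shell settings or to build a child process's environment, for example via the `env` argument of `subprocess.run`. Changing the library path inside an already-running Python process generally does not affect that process's own shared-library loading.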
-
OEChem or OELib?
Matt Stahl explains the history and motivation behind OEChem: OEChem
or OELib?
-
When reading an MDL file, what does this mean: "Warning: Stereochemistry
corrected on atom number N"?
The problem is that the configuration specified by up and down bonds around a
tetrahedral center in a 2D depiction may be ambiguous or blatantly incorrect.
An excellent review of the issues can be found in the CORINA manual:
http://msdlocal.ebi.ac.uk/docs/chem_comp/corina.ps, pp. 19-21. The "Stereochemistry
corrected" warning is issued when OEChem takes a "best guess" at what the chemist
intended for configurations that CORINA lists as "incorrect".
The "Invalid stereochemistry specified on atom number N" warning is for an
arrangement of wedge-and-hash bonds that even OEChem can't interpret, e.g.,
four up bonds from one atom.
-
What file formats does OEChem handle?
Here's the current status (OEChem 1.4.2):
code  ext    format                 read?  write?
1     smi    SMILES                 yes    yes
2     mdl    MDL Mol                yes    yes
3     pdb    PDB                    yes    yes
4     mol2   Tripos MOL2            yes    yes
5     bin    OEBinary v1            yes    no
6     tdt    Daylight TDT           no     no
7     ism    Isomeric SMILES        yes    yes
8     mol2h  MOL2 with H            yes    yes
9     sdf    MDL SDF                yes    yes
10    can    Canonical SMILES       yes    yes
11    mf     Molecular Formula      no     yes
12    xyz    XYZ                    yes    yes
13    fasta  FASTA                  yes    yes
14    mopac  MOPAC                  no     yes
15    oeb    OEBinary v2            yes    yes
16    mmod   Macromodel             yes    yes
17    sln    Tripos SLN             no     yes
18    rdf    MDL RDF                yes    no
19    cdx    ChemDraw CDX           yes    yes
20    skc    MDL ISIS Sketch File   yes    no
-
What is the difference between the high- and low-level molecule file writers?
There is an important functional difference between the high-level molecule
writing function OEWriteMolecule() (or, equivalently, the overloaded << operator
in C++) and the low-level format-specific writers such as OEWriteMDLFile(). With
the high-level writers, OEChem takes responsibility for normalizing the state of
the molecule to correspond with the output format. For example, before writing an
MDL file, OEChem will apply the MDL aromaticity model; before writing a
canonical SMILES, OEChem will apply the Daylight aromaticity model, etc. The
low-level writers, in contrast, require the programmer to apply these
normalizations manually; in exchange, they provide the flexibility to do
otherwise and to control chemical content and file format independently. For
example, it is possible to write an MDL file with the Tripos aromaticity model,
or a Kekule canonical SMILES. For details on which normalizations are applied
for each file format, consult the theory manual.
A related issue is the respective roles of the I/O "flavors", which are specified
by the SetFlavor() method of an oemolistream or oemolostream, and the "flags",
which are specified via an optional argument to the low-level I/O functions. The
low-level flags specify choices that must be made at the time of file I/O, such
as whether PDB TER records delimit molecules or molecule components. The flavors
do not affect the low-level functions, and their effects are a superset of the
effects of the flags: flavors additionally control molecule standardization
functions such as aromaticity model perception.
Low-level molecular input functions:
-
OEReadCDXFile
-
OEReadFASTAFile
-
OEReadMacroModelFile
-
OEReadMDLFile
-
OEReadMol2File
-
OEReadMolecule
-
OEReadMOPACFile
-
OEReadOldBinary
-
OEReadPDBFile
-
OEReadSketchFile
-
OEReadXYZFile
Low-level molecular output functions:
-
OECreateAbsSmiString
-
OECreateCanSmiString
-
OECreateIsoSmiString
-
OECreateSlnString
-
OECreateSmiString
-
OEWriteCDXFile
-
OEWriteFASTAFile
-
OEWriteMacroModelFile
-
OEWriteMDLFile
-
OEWriteMOPACInputFile
-
OEWriteMol2File
-
OEWritePDBFile
-
OEWriteMolecule
-
OEWriteXYZFile
-
Windows and MS Visual Studio/C/C++: How do I get started?
First, be sure to see the README pertaining to Windows development available in
the OEChem section of the download page, which
describes the required compiler versions. In the MSVC distribution, the Makefile
at openeye/examples/oechem/ contains the recommended compilation flags for use
with MS Visual Studio/C/C++ (although most developers will not use make). As of
OEChem 1.3.3, example project files are provided with the distribution.
Note also that zlib must be installed; it can be obtained from zlib.org.
Also: see Using OEChem and
Ogham with Microsoft Visual Studio .NET [PDF]
-
Why doesn't OEGetSDData(mol,"foo") work on my OEMol?
OEMol objects, such as those returned by GetOEMols(), are multiconformer
molecules. When an OEMol is read from an SDF file, the SD data is attached to the
conformations, not to the parent molecule. So this will work (input.sdf is a
placeholder file name):
from openeye.oechem import *
ims = oemolistream("input.sdf")
for mol in ims.GetOEMols():
    for conf in mol.GetConfs():
        print("This conf name is %s" % OEGetSDData(conf, "Name"))
In contrast, OEGraphMol objects are single-conformer molecules and should be
used when dealing with single-conformer files. Then this will work (the stream
is reopened so reading starts from the first record):
ims = oemolistream("input.sdf")
for mol in ims.GetOEGraphMols():
    print("This molecule name is %s" % OEGetSDData(mol, "Name"))
Note that this is true even when a "multiconformer molecule" has only one
conformation, or its conformations are actually 2D, or even all zeroes.
This may seem complicated, but this approach is consistent with OEChem's ability
to handle multiconformer molecules across multiple formats.
-
Is OEChem thread safe?
OEChem is designed to be thread safe, i.e., re-entrant, but as of v1.4.0 there
are known areas where OEChem is not re-entrant, namely, OEThrow, and generic
data (e.g. SetData/GetData). In addition, C++ STL streams are normally not
thread safe, so OEChem streams are also not thread safe. Multithreaded code can
be written by selective use of OEChem, but the forbidden zones are not yet well
established. At the very least, separate threads should not operate on the same
molecules.
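The FAQ does not establish which parts of OEChem are safe to use from several threads. The pattern below is therefore shown generically and follows the advice above conservatively: each thread works only on its own data, and all writes to a shared, non-thread-safe stream pass through a single lock. The list standing in for an output stream and the string records are hypothetical placeholders.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a shared output stream that is NOT safe to use from several
# threads at once (hypothetical placeholder).
output = []
write_lock = threading.Lock()


def process(record):
    """Do per-record work on thread-private data, then serialize the write."""
    result = record + ":done"  # placeholder for real per-molecule work
    with write_lock:           # only one thread touches the shared stream
        output.append(result)
    return result


records = ["mol-a", "mol-b", "mol-c", "mol-d"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, records))
```

`pool.map` returns results in input order. The order of entries in `output` depends on thread scheduling.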
-
How should I get started using SMIRKS and OEChem
reaction processing?
See the SMIRKS Primer.
OEWrappers - Python
-
Why Python?
Python is an object oriented interpreted language which fits well as a wrapper
around the native C++ OEChem API. Almost all the C++ constructs and concepts
translate directly into Python. Python is highly recommended for its power,
convenience, and extensibility. If you haven't already, try it, you'll like it!
-
What Python version is required?
As of OEChem version 1.3.3 (May 2005), both Python 2.3 and 2.4 are supported,
and support for Python 2.2 has been dropped. Python packages are specific to the
Python X.Y release (e.g., 2.3 or 2.4), so installers should make sure that a
supported release is installed. Bug-fix releases (e.g., 2.4.1 vs. 2.4.2) should
not require specific builds.
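A quick pre-flight check can confirm that the interpreter's X.Y release is one of the supported ones. The supported set below reflects only the OEChem 1.3.3 note above and is illustrative.

```python
import sys

SUPPORTED = {(2, 3), (2, 4)}  # per the OEChem 1.3.3 note above


def python_supported(version_info=sys.version_info):
    """True if the interpreter's X.Y release matches a supported build.

    Only the first two components matter: 2.4.1 and 2.4.3 use the same build.
    """
    return tuple(version_info[:2]) in SUPPORTED
```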
-
How can I run Py-OEChem from CYGWIN?
Py-OEChem is not available for use with the Python bundled with CYGWIN. However,
by installing "Python for windows" from python.org
and running that executable (normally /cygdrive/c/python23/python.exe)
then Py-OEChem scripts may be run from CYGWIN. The one caveat is that absolute
file pathnames must be specified in DOS format -- with backslashes, etc. --
though relative pathnames can be in CYGWIN/UNIX format.
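Converting a CYGWIN absolute path under /cygdrive to the DOS form needed here is mechanical. This sketch handles only /cygdrive/<letter>/... paths and returns everything else unchanged. CYGWIN's own `cygpath -w` utility covers the general case, including mount points.

```python
def cygwin_to_dos(path):
    """Convert /cygdrive/c/dir/file to C:\\dir\\file; return other paths unchanged."""
    prefix = "/cygdrive/"
    if not path.startswith(prefix):
        return path
    drive, _, tail = path[len(prefix):].partition("/")
    if len(drive) != 1 or not drive.isalpha():
        return path
    return drive.upper() + ":\\" + tail.replace("/", "\\")
```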
Lexichem
-
What is the difference between OpenEye, Systematic, and
IUPAC names?
The core issue is that the IUPAC standards for chemical nomenclature are a
moving target. They were first published in 1979, revised in 1993, and are
currently in final draft for a third revision in 2005. As time goes by, IUPAC
attempts a form of social manipulation on geological timescales, deprecating
common names that have traditionally been tolerated and moving toward ever
more systematic names.
The bullseye of this moving target is what the 2005 standard calls the PIN
(preferred IUPAC name), which is often the single name blessed by IUPAC for the
compound. In addition to this, the standard allows what are called acceptable
names; for example, many of the names preferred in the earlier 1979 and 1993
standards now have acceptable status.
The three namestyles in the question roughly define the extremes of these IUPAC
categories. The IUPAC namestyle in Ogham is intended to follow the 2005 PIN
precisely. Future releases of Ogham may even have explicit IUPAC2005, IUPAC93,
and IUPAC79 namestyles that implement those standards more accurately.
The "Systematic" namestyle produces the fully systematic name and anticipates
where the IUPAC standards are headed. The "OpenEye" namestyle uses IUPAC-allowed
names that are more familiar to chemists. Conceptually, OpenEye names are closer
to the names found in chemical supplier catalogs or in the IUPAC93 and IUPAC79
standards.
A good rule of thumb is that OpenEye names are typically shorter than IUPAC
names. Apart from whether a name is still "allowed" by IUPAC, the border between
OpenEye names and traditional names (the final resting place for archaic names
of historical interest) is blurred.
Some examples of the differences:
| SMILES | OpenEye | IUPAC | Systematic |
|---|---|---|---|
| O | water | oxidane | oxidane |
| C#C | acetylene | acetylene | ethyne |
| *Nc1ccccc1 | anilino | anilino | phenylamino |
| *C(=O)C | acetyl | acetyl | ethanoyl |
| *O[N+]#[C-] | fulminato | isocyanooxy | isocyanooxy |
| CC(=O)C | acetone | propan-2-one | propan-2-one |
| C12C3C4C1C5C4C3C25 | cubane | BLAH | BLAH |
| C(=O)O | formic acid | formic acid | methanoic acid |
As an example of a traditional name, *S is "sulfanyl" in OpenEye, IUPAC, and
Systematic, but "mercapto" in traditional nomenclature.
As these examples show, OpenEye names, which are pretty much the names chemists
have been using until now, are "reasonable" names. Naming water "oxidane" and
disallowing "indane" and "cubane" might push chemists' tolerance to the limit.
After all, these are still the preferred names in government agencies until the
new standard is ratified and comes into effect (in 2005).
PVM and Parallelization
-
What factors affect the performance of OpenEye tools
parallelized with PVM?
Several factors can reduce parallel scalability below the theoretical maximum of
N times single-CPU speed, where N is the number of CPUs. The master node is
devoted to traffic control rather than computation, so the first slave (second
CPU overall) provides no immediate advantage over a single-CPU run. The time
required for communication between master and slaves also hurts performance.
This effect increases with (1) more slaves, (2) a slower network, (3) a faster
single-CPU program, and/or (4) less compact file formats. The point of
diminishing returns will therefore differ depending on the program, input files,
and network. Verbose file formats such as SDF or MOL2 should generally be
avoided; use compressed OEBinary (.oeb.gz) if possible. In general, ROCS is
faster than Omega, which is faster than FRED, so the number of slaves at the
point of diminishing returns increases in that order.
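The trade-off above can be illustrated with a deliberately simple throughput model. This is an assumption for illustration, not a description of how OpenEye's PVM scheduling actually works: each molecule is assumed to cost t_compute seconds on a slave and t_comm seconds of the master's time to send and receive (larger for a slower network or a bulkier file format). The slaves together could process (N-1)/t_compute molecules per second, but the master can service at most 1/t_comm, so the speedup stops growing once N-1 exceeds t_compute/t_comm. The function names are hypothetical.

```python
def speedup(n_cpus, t_compute, t_comm):
    """Toy speedup over a single-CPU run for n_cpus total CPUs.

    One CPU is the master (traffic control only), so n_cpus - 1 slaves
    compute. The master can hand out at most one molecule every t_comm
    seconds, which caps throughput no matter how many slaves are added.
    """
    slaves = max(n_cpus - 1, 0)
    slave_rate = slaves / t_compute      # molecules/s the slaves could do
    master_rate = 1.0 / t_comm           # molecules/s the master can feed
    single_cpu_rate = 1.0 / t_compute    # baseline: one CPU, no master
    return min(slave_rate, master_rate) / single_cpu_rate


def saturation_point(t_compute, t_comm):
    """Number of slaves beyond which extra CPUs give no further gain."""
    return t_compute / t_comm
```

Under this model, halving t_compute (a faster program) halves the number of useful slaves, while halving t_comm (a faster network or compressed OEBinary) doubles it, which matches the qualitative ordering described above.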
-
What is the meaning of these error messages?
Warning: Could not start pvm daemon on host 'foobar'
Warning: Unable to launch slave on host foobar, no pvm daemon
Often the cause of these errors is simply that the master is unable to connect
to the slave via rsh or ssh (whichever is defined by $PVM_RSH). From the master,
you should be able to run the command "rsh foobar 'echo $PVM_ROOT'" and see the
slave's PVM_ROOT printed without being prompted for a password. Another
possibility is that PVM is not installed correctly.
Warning: Slave 1 on host foobar died and could not be restarted
This implies that the slave executable launched but then exited improperly. If
error messages were generated, they may be found on the slave in /tmp/pvml.<uid>.
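The manual rsh check can be repeated for every slave host with a short script. This is a sketch written for Python 3.7 or later (for subprocess.run's capture_output); it assumes $PVM_RSH holds a plain command name such as rsh or ssh (falling back to rsh if unset), and the helper names and host names are hypothetical rather than part of PVM.

```python
import os
import subprocess


def remote_check_command(host, env=None):
    """Build the argv for the manual test: $PVM_RSH host 'echo $PVM_ROOT'."""
    env = os.environ if env is None else env
    rsh = env.get("PVM_RSH", "rsh")
    # In the shell version, the single quotes stop the *local* shell from
    # expanding $PVM_ROOT. With an argv list no local shell is involved,
    # so the string reaches the remote shell unexpanded anyway.
    return [rsh, host, "echo $PVM_ROOT"]


def check_hosts(hosts):
    """Print each host's remote $PVM_ROOT, or report that the check failed."""
    for host in hosts:
        cmd = remote_check_command(host)
        try:
            out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing rsh/ssh binary, or a hang (e.g. waiting for a password).
            print("%s: could not run %s (%s)" % (host, cmd[0], exc))
            continue
        root = out.stdout.strip()
        if out.returncode != 0 or not root:
            print("%s: FAILED (exit code %d, empty PVM_ROOT?)" % (host, out.returncode))
        else:
            print("%s: PVM_ROOT=%s" % (host, root))
```

For example, check_hosts(["foobar", "mindy"]) would run the check against two placeholder hosts.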
-
What is the correct slave hostname?
Run the command-line utility pvm on the master to find the answer and to run
other diagnostics:
$ pvm
pvm> conf
conf
3 hosts, 2 data formats
HOST DTID ARCH SPEED DSIG
beavis.mypharma.com 40000 DARWIN 1000 0x0658eb59
butthead 80000 LINUX 1000 0x00408841
mork 140000 LINUXI386 1000 0x00408841
pvm> add mindy
add mindy
0 successful
HOST DTID
mindy Can't start pvmd
Auto-Diagnosing Failed Hosts...
mindy...
Verifying Local Path to "rsh"...
...
From this output we see that nodes butthead and mork
must be specified by their simple hostnames, whereas beavis
requires its fully qualified hostname with domain. The add
command tests and diagnoses the master-slave connection.
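If the conf listing needs to be checked by a script (for example, to confirm which spelling of each hostname to use when configuring slaves), it can be parsed as below. The parser is a hypothetical helper that assumes the column layout shown above (HOST DTID ARCH SPEED DSIG); the console's exact formatting may differ between PVM versions.

```python
def parse_conf(text):
    """Return {hostname: arch} from the pvm console's 'conf' output.

    Assumes a header line starting with HOST, followed by one
    whitespace-separated row per host (HOST DTID ARCH SPEED DSIG).
    Parsing stops at the next 'pvm>' prompt or a short line.
    """
    hosts = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if not in_table:
            in_table = fields[0] == "HOST"
            continue
        if fields[0].startswith("pvm>") or len(fields) < 5:
            break  # end of the host table
        hosts[fields[0]] = fields[2]  # hostname exactly as PVM knows it
    return hosts
```

Applied to the session above, this should yield beavis.mypharma.com (DARWIN), butthead (LINUX), and mork (LINUXI386).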
-
Installing PVM
(This answer is adapted from the Fred 2.2 manual.)
PVM (Parallel Virtual Machine) is a freely available library for running
processes on more than one processor on one or more machines. OE applications
can take advantage of PVM to distribute jobs over multiple processors.
To do this, PVM must be installed on all the machines OE applications will be
distributed over. The PVM source is freely available from
http://www.csm.ornl.gov/pvm/pvm_home.html,
and many Linux distributions, and some Unix versions, include PVM by default.
At the time of this writing (December 2006), OE applications are built with
PVM version 3.4.4, but should also work with PVM version 3.4.3. PVM is not
supported on Windows.
To use OE applications with PVM you must do one of the following:
-
Place a link or copy of the OE application executable in
$PVM_ROOT/bin/$PVM_ARCH
-
Define the environment variable
PVM_PATH, which names the
directory in which the OE application executable resides.
The environment variables PVM_ROOT and PVM_ARCH
should be defined globally as part of the PVM installation. PVM_PATH
is generally a user-defined environment variable and must be defined for all
shells (i.e., it is not enough to define it only in the shell from which the OE
application was launched).
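The two setup options can be checked with a short script. This is a sketch, assuming the search locations are exactly the two described above ($PVM_ROOT/bin/$PVM_ARCH, or the directory named by $PVM_PATH) and that the executable name is one from the list of PVM-enabled applications (e.g. rocs); the function names are hypothetical, and the check order is arbitrary because this FAQ does not say which location PVM consults first.

```python
import os


def candidate_dirs(env):
    """Directories where, per this FAQ, PVM may find an OE executable."""
    dirs = []
    if env.get("PVM_PATH"):
        dirs.append(env["PVM_PATH"])
    if env.get("PVM_ROOT") and env.get("PVM_ARCH"):
        dirs.append(os.path.join(env["PVM_ROOT"], "bin", env["PVM_ARCH"]))
    return dirs


def find_executable(name, env=None):
    """Return the first candidate path where `name` is an executable file.

    os.path.isfile follows symbolic links, so a link placed in
    $PVM_ROOT/bin/$PVM_ARCH counts as well as a copy.
    """
    env = os.environ if env is None else env
    for d in candidate_dirs(env):
        path = os.path.join(d, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```

Because PVM_PATH must be visible in every shell, the same check should also succeed when run in a non-interactive login on each slave host, not just in the launching shell.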
NOTE: There is no separate slave executable. The executable distributed for OE
applications serves as a PVM master, a PVM slave, and a single-processor
version.
OE applications currently PVM-enabled:
-
eon
-
fred
-
omega
-
rocs
-
szybki
QuacPac
Chemistry
-
Anilinic nitrogen conformation: a notorious exception.
The handling of anilinic nitrogen (trivalent, three-connected, and single-bonded
to an aromatic ring) is problematic, since it is neither fully sp3 nor fully sp2
hybridized: it is not planar with respect to its three attached atoms, but it is
not tetrahedral either. This assertion is borne out to some degree by X-ray
crystallographic data, though crystal structures will reflect averaging over the
two interconverting pyramidal chiralities. Since two chiralities (R and S) are
possible, the planar conformation can serve as a useful "average". This is the
basis of the MMFF94S variant of MMFF94 (Halgren et al.). Since a bound ligand
will tend to adopt one conformation, this approach makes less sense for
reproducing a bioactive conformation. OpenEye software, in general, will reflect
this nonplanar understanding of anilinic nitrogen, but may also allow for the
planar model (e.g., both MMFF94 and MMFF94S are available).
REF: "New Parameterization of the Cornell et al. Empirical Force Field Covering
Amino Group Nonplanarity in Nucleic Acid Bases", Ryjacek, Kubar and Hobza,
J. Comput. Chem. 24: 1891-1901, 2003.
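One common way to quantify how far a three-connected nitrogen departs from planarity is the sum of its three bond angles: 360° for a planar (sp2-like) center and about 328.4° (3 × 109.47°) for an ideal tetrahedral (sp3-like) one, with anilinic nitrogens falling in between. The sketch below computes this sum from Cartesian coordinates. It is a generic geometric measure, not the method any OpenEye program uses, and the function names are hypothetical.

```python
import math


def angle_deg(center, a, b):
    """Angle a-center-b in degrees, from 3D coordinate tuples."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    dot = sum(u[i] * v[i] for i in range(3))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] so rounding cannot push acos outside its domain.
    cos_t = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.degrees(math.acos(cos_t))


def angle_sum(n, x1, x2, x3):
    """Sum of the three bond angles at nitrogen n with neighbors x1..x3.

    About 360 degrees means planar; about 328.4 degrees means ideal
    tetrahedral geometry around the nitrogen.
    """
    return angle_deg(n, x1, x2) + angle_deg(n, x1, x3) + angle_deg(n, x2, x3)
```

A planar (MMFF94S-style) anilinic nitrogen would give a sum at or near 360°, while a pyramidal one gives a noticeably smaller value.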