class oemolistream : public oemolstreambase
The OEChem oemolistream class provides a stream-like abstraction for
reading molecules from files, strings or standard in (std::cin).
The oemolistream maintains the format and flavor of molecular
reading for the stream. It also manages the conversion between
multi-conformer molecules and single conformer molecules in cases where
the molecule read into is not compatible with the file format (in the
sense of a multi-conformer file format being read into a single-conformer
molecule, or a single-conformer file format being read into a multi-conformer
molecule).
The OEChem oemolistreams are capable of uncompressing gzip files
while reading.
oemolistream()
explicit oemolistream(const char *fname)
explicit oemolistream(const std::string &fname)
explicit oemolistream(OEPlatform::oeistream *istr, bool owned = true)
The oemolistream class supports several forms of constructor.
The form without any arguments creates a new oemolistream that
is connected to the process's standard in (std::cin). The forms
that take a single string argument (either a const char* or a
const std::string&) open the file specified by the given filename.
The final form above, which takes an OEPlatform::oeistream pointer, can be
used to create a new oemolistream from an existing oeistream.
The optional second argument indicates whether the new
oemolistream now "owns" the given oeistream and is
therefore responsible for closing and destroying it when the oemolistream
itself is closed or destroyed.
To associate a file or a stream with an oemolistream after it
has been created, see the oemolistream::open method.
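The effect of the owned argument can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: StreamHolder and CountingStream are hypothetical names invented for illustration, they are not OEChem classes, and a std::istream stands in for the OEPlatform::oeistream. The model shows only who destroys the wrapped stream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical stand-in for an input stream. It counts destructions so the
// effect of the ownership flag can be observed.
struct CountingStream : std::istringstream {
    explicit CountingStream(int &counter)
        : std::istringstream("CCO\n"), destroyed(counter) {}
    ~CountingStream() override { ++destroyed; }
    int &destroyed;
};

// Hypothetical wrapper, NOT the OEChem class: it models only the owned flag.
class StreamHolder {
public:
    explicit StreamHolder(std::istream *istr, bool owned = true)
        : istr_(istr), owned_(owned) {}
    StreamHolder(const StreamHolder &) = delete;
    StreamHolder &operator=(const StreamHolder &) = delete;
    ~StreamHolder() { close(); }

    // Safe to call repeatedly: the pointer is dropped on the first call.
    void close() {
        if (owned_) delete istr_;  // only an owning holder destroys the stream
        istr_ = nullptr;
    }

private:
    std::istream *istr_;
    bool owned_;
};

// Returns how many times the holder destroyed the wrapped stream.
int DestroyedByHolder(bool owned) {
    int destroyed = 0;
    CountingStream *raw = new CountingStream(destroyed);
    {
        StreamHolder holder(raw, owned);
        holder.close();
        holder.close();  // second call is a no-op
    }
    int byHolder = destroyed;
    if (!owned) delete raw;  // the caller keeps responsibility for the stream
    return byHolder;
}
```

In this model an owning holder destroys the stream exactly once, while a non-owning holder leaves it for the caller to clean up.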
bool operator >> (OEMolBase &mol)
bool operator >> (OEQMolBase &mol)
bool operator >> (OEMCMolBase &mol)
bool operator >> (OEMol &mol)
bool operator >> (OEGraphMol &mol)
bool operator >> (OEQMol &mol)
Read a molecule from an input oemolstream. The molecule is read from
the input oemolstream in the file format currently associated with that
oemolstream. This method is equivalent to the OEReadMolecule
function. The return value indicates whether the read operation was
successful.
This (high-level) method automatically clears the molecule before reading and
skips empty or invalid molecules in the input stream. By default, it
automatically calls OEFindRingAtomsAndBonds and
OEAssignAromaticFlags to
assign the "in ring" and "aromatic" properties of atoms and bonds as a
convenience to the user. OEChem also contains low-level file I/O APIs that
allow finer control over the variants of molecular file formats read and
written. Access to these variants is also available via the
SetFlavor method.
void close()
Close an input oemolstream. The oemolistream::close method may be
safely called multiple times. This method is called from within the
oemolstream destructor and therefore it is not necessary to call
this explicitly under most circumstances.
unsigned int GetFlavor(unsigned int format) const
Returns the file flavor associated with the format for the input oemolstream.
The format arguments are a set of unsigned integers defined by the
OEFormat namespace. The flavors are a set of unsigned integer
bitmasks defined in the OEIFlavor namespace. A different set of
bitmasks is defined and stored for each input format. The input flavor
for any format can be set using the oemolistream::SetFlavor method.
The default flavors are automatically set by the oemolistream
constructors.
unsigned int GetFormat() const
Return the file format associated with an input oemolstream. The set of
unsigned integer values valid for this property are defined by the
OEFormat namespace. By default, when reading from standard in
(std::cin), the associated file format is OEFormat::SMILES.
The file format property of an input oemolstream may be set using the
oemolistream::SetFormat method. Note that the file format property
is also set automatically by oemolistream::open based upon the file
extension of the specified filename.
bool Getgz()
Returns whether the input oemolstream is reading gzip-compressed data.
This value can be changed with the oemolistream::Setgz method.
bool open()
bool open(const char *fname)
bool open(const std::string &fname)
Open a file for reading with an input oemolstream. The fname
argument specifies the filename of the file to be opened. The form with no
arguments specifies that the input oemolstream should read from
standard in (std::cin). In this case the format defaults to SMILES.
If an argument is used, open sets the file format property of the input
oemolstream, based upon the extension of the given filename. If the file
extension isn't recognized, a warning is issued and the file format is set
to OEFormat::UNDEFINED. If the filename ends with ".gz", the
oemolistream will decompress it on-the-fly. The filename-based file
format may be overridden by calling oemolistream::SetFormat
explicitly with the desired file format. If the filename consists only of a
file extension (for example ".oeb.gz"), then std::cin is opened with the
format specified by the given extensions.
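The filename rules described for open can be sketched as a small stand-alone function. This is a minimal illustration, not the toolkit's implementation: OpenPlan and PlanOpen are hypothetical names, the format is kept as the bare extension string rather than an OEFormat constant, and the case-sensitive matching of ".gz" is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type. The real stream stores its format as an unsigned
// integer from the OEFormat namespace, which is not reproduced here.
struct OpenPlan {
    std::string formatExt;  // e.g. "sdf", or "" if no extension was found
    bool gz;                // decompress on-the-fly
    bool useStdin;          // filename was an extension only
};

static bool EndsWith(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper that derives the open behaviour from a filename.
OpenPlan PlanOpen(const std::string &fname) {
    OpenPlan plan{"", false, false};

    // Work on the last path component only.
    std::string::size_type slash = fname.find_last_of("/\\");
    std::string base =
        (slash == std::string::npos) ? fname : fname.substr(slash + 1);

    // Assumption: a bare name starting with '.' is an extension-only name.
    plan.useStdin =
        (slash == std::string::npos) && !base.empty() && base[0] == '.';

    // A trailing ".gz" selects decompression and is removed before the
    // format extension is looked up.
    if (EndsWith(base, ".gz")) {
        plan.gz = true;
        base.erase(base.size() - 3);
    }

    std::string::size_type dot = base.rfind('.');
    if (dot != std::string::npos) plan.formatExt = base.substr(dot + 1);
    return plan;
}
```

Under these assumptions, "ligands.sdf.gz" yields the extension "sdf" with decompression enabled, and ".oeb.gz" additionally selects standard in. A name without any extension yields an empty extension, which corresponds to the unrecognized case described above.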
bool openstring(const unsigned char *buffer, unsigned int len)
bool openstring(const std::string &str)
The openstring methods of an oemolistream allow the
input oemolstream to read from a buffer in memory, instead of from a
file or standard in (std::cin). The buffer to be read from
is specified either directly, by a pointer to the input file's contents
and a length, or as an STL string (const std::string&).
If the contents of the buffer have been compressed with gzip, the
Setgz method should be called before calling openstring.
Internally, the openstring methods make a copy of the
specified file contents, allowing the oemolistream to continue
to function independently of whether the original buffer is later
modified or deallocated.
bool SetConfTest(const OEConfTest &)
Sets the functor class which is used to compare incoming graphs to
determine whether they should be placed as conformers of a multi-conformer
molecule or be returned individually as single molecules. The default
conformer test never places separate graphs into a multi-conformer
molecule. There are several pre-defined OEConfTest objects, including
OEAbsoluteConfTest, OEIsomericConfTest and
OEAbsCanonicalConfTest.
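The role of the conformer test can be illustrated with a conceptual model. This is a sketch under stated assumptions and does not use the OEChem types: Record, ConfTest, NeverSame, SameGraph and GroupConformers are hypothetical names, a string stands in for the molecular graph, and the model assumes that only consecutive records are compared.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record: a graph identifier (for example a canonical SMILES)
// plus an id. Not an OEChem type.
struct Record {
    std::string graph;
    int id;
};

// Hypothetical functor type: true if two records belong to one molecule.
using ConfTest = std::function<bool(const Record &, const Record &)>;

// Models the documented default: separate graphs are never merged.
bool NeverSame(const Record &, const Record &) { return false; }

// Merges records whose graph identifiers match exactly.
bool SameGraph(const Record &a, const Record &b) { return a.graph == b.graph; }

// Groups consecutive records into "multi-conformer molecules" by comparing
// each incoming record with the first record of the current group.
std::vector<std::vector<Record>> GroupConformers(
    const std::vector<Record> &in, const ConfTest &test) {
    std::vector<std::vector<Record>> out;
    for (const Record &r : in) {
        if (!out.empty() && test(out.back().front(), r)) {
            out.back().push_back(r);  // another conformer of the same molecule
        } else {
            out.push_back(std::vector<Record>(1, r));  // start a new molecule
        }
    }
    return out;
}
```

With the never-merging test every record comes back as its own molecule. With a graph-matching test, adjacent records that share a graph are collected together, while a matching record that appears later in the input starts a new group.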
bool SetFlavor(unsigned int format, unsigned int flavor)
Set the file flavor for a given format associated with this input
oemolstream. The set of unsigned integer formats is defined by the
OEFormat namespace. The set of unsigned integer flavor bitmasks
is defined by the OEIFlavor namespace. The current flavor can
be queried using the oemolistream::GetFlavor method. Each format
has its own specific flavor which must be set separately. The
oemolistream constructors set the flavors for all of the formats
to their default state.
bool SetFormat(unsigned int format)
Set the file format associated with an input oemolstream. The set of
unsigned integer values valid for this property are defined by the
OEFormat namespace. By default, when reading from standard in
(std::cin), the associated file format is OEFormat::SMILES.
The file format property of an input oemolstream may be retrieved using the
oemolistream::GetFormat method. Note that the file format property
is also set automatically by oemolistream::open based upon the file
extension of the specified filename.
bool Setgz(bool gz)
Specify that the contents of the input oemolstream are to be treated
as compressed by GNU gzip, and decompressed on-the-fly. Usually the
"gz" property of an oemolistream is implied automatically
from the file extension used to open the stream for reading. The current
"gz" property of an oemolistream can be retrieved using the
Getgz method.
void seek(oefpos_t pos)
Moves the position of the next valid read to the position indicated. This function takes account of gzip streams and molecule caching.
oefpos_t size()
Returns the size of the input stream, if applicable to the current stream. The return type is a portable file-system pointer type.
oefpos_t tell()
Returns the current position of the next read. This function accounts for molecule caching. Note: If you are reading an "oeb" file that was written as multi-conformer molecules and is being read into single-conformer molecules, all of the conformers are read into the cache at once, and the returned position will point to the beginning of a multi-conformer molecule rather than to a conformer inside a molecule.