External Files

Input Files

Typically, a user will find it most convenient to generate all BROOD input files using the graphical interface. Even if one wants to run BROOD on a cluster or another machine where the graphical interface is not convenient, the graphical interface can be used to generate all necessary input files, which can then be moved to the other machine and used to execute BROOD there. BROOD requires one input file, a query fragment file. With a default installation, the BROOD application can identify and search the default fragment database.

Query Files

There are two parts to a BROOD query. The first is the molecule that the user would like to modify. The second is a specification of the portion of the molecule (fragment of the molecule) that the user would like BROOD to change. Formally, only the fragment is required for a BROOD search, but in most cases the results are much more interesting when a molecule is provided as this allows BROOD to build analog molecules rather than simply provide a list of similar fragments.

The queryFragment file is the molecule file that contains the fragment to be replaced in the original molecule. For user convenience, this query fragment may be specified either with or without 3D coordinates. If no 3D coordinates are specified, they will either be taken from the -queryMolecule molecule (if one with 3D coordinates is specified) or generated as a low-energy MMFF conformer. If one doesn’t want to use the BROOD graphical interface, the recommended method for creating a query is to sketch the appropriate fragment (with attachment points) in an available sketching program, then save that file for use as a BROOD query file. For those familiar with the SMILES format, writing queries in SMILES can be easy and convenient.

Fragment Database

As mentioned in the theory section, BROOD comes with a pregenerated, multiconformer fragment database. This database is made up of fragments of known molecules that contain 1-15 heavy atoms and 1-3 attachment sites.

The default database contains over 6 million fragments selected from 65 million unique fragments generated by fragmenting over 20 million known molecules. This database takes only a few minutes to search on a modern desktop computer. It is suitable for many preliminary enquiries, but it excludes many medicinally interesting fragments.

BROOD can also search user-generated fragment files. Details on how to create a custom database are included in a later chapter (vide infra).

Example Fragment Files

The input query or a database fragment for bioisostere searching must be a molecular fragment with one or more “attachment points”. An attachment point represents a bond from the fragment to the rest of the molecule. Inside OEChem, this is represented as a bond to an atom with atomic number 0 (termed a “dummy atom”). If it is necessary to uniquely distinguish these dummy atoms, map indices can be used via the OEAtomBase::SetMapIdx() and OEAtomBase::GetMapIdx() API.

The input query does not need to have 3D coordinates (they will be generated on-the-fly). This makes SMILES and 2D SDF file formats the most convenient for query fragment input. On the other hand, the fragment database should contain 3D coordinates with multiple conformers per fragment. In the examples below, a simple amide fragment is shown with generic (amide) and uniquely labelled (amideR) dummy atoms.

SMILES format:

*C(=O)N* amide
[*:1]C(=O)N[*:2] amideR
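
Before handing a SMILES query to BROOD, it can be useful to check how many attachment points it contains and whether they carry map indices. The following Python snippet is a minimal sketch of such a check. It uses a regular expression rather than OEChem, the function name attachment_points is hypothetical, and the pattern only recognizes the wildcard spellings used in the examples above (* and [*:n]), so it should not be mistaken for a full SMILES parser.

```python
import re

# Matches a bracketed wildcard with an optional map index, e.g. [*:1] or [*],
# or a bare wildcard "*". Assumption: no other wildcard spellings are in use.
_ATTACHMENT = re.compile(r"\[\*(?::(\d+))?\]|\*")


def attachment_points(smiles):
    """Return one entry per attachment point: its map index, or None if unlabelled."""
    return [
        int(match.group(1)) if match.group(1) else None
        for match in _ATTACHMENT.finditer(smiles)
    ]


# The two example lines from the SMILES listing above: "<smiles> <title>".
for line in ["*C(=O)N* amide", "[*:1]C(=O)N[*:2] amideR"]:
    smiles, title = line.split(None, 1)
    print(title, attachment_points(smiles))
```

An empty list would indicate a fragment without any attachment point, which is not a valid bioisostere query.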

SDF format:

amide
  -OEChem-01040614232D

  5  4  0     0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 *   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 *   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
  4  5  1  0  0  0  0
M  END
$$$$
amideR
  -OEChem-01040614232D

  5  4  0     0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
  4  5  1  0  0  0  0
M  RGP  2   1   1   5   2
M  END
$$$$
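
In the amideR record, the "M  RGP" property line is what gives the two R# atoms their distinct labels: after the entry count, it lists pairs of atom number and R-group number. The Python snippet below is a minimal sketch of reading that line. The function name parse_rgp is hypothetical, and the code splits on whitespace rather than on the fixed-width columns of the molfile format, which is an assumption that holds for the example above.

```python
def parse_rgp(line):
    """Map atom number -> R-group number for one 'M  RGP' property line."""
    tokens = line.split()
    if tokens[:2] != ["M", "RGP"]:
        raise ValueError("not an 'M  RGP' line: %r" % line)
    count = int(tokens[2])  # number of (atom, R-group) pairs that follow
    values = [int(token) for token in tokens[3:3 + 2 * count]]
    if len(values) != 2 * count:
        raise ValueError("expected %d pairs in %r" % (count, line))
    # Even positions are atom numbers, odd positions are R-group numbers.
    return dict(zip(values[0::2], values[1::2]))


# The property line from the amideR record above.
print(parse_rgp("M  RGP  2   1   1   5   2"))
```

For the amideR record this maps atom 1 to R-group 1 and atom 5 to R-group 2, matching the [*:1] and [*:2] labels in the SMILES form.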

Output Files

BROOD generates five informational output files in addition to the hitlist. All of these files begin with the prefix specified by the -prefix flag, which by default is “brood”. These files include:

  • brood.info, the info file
  • brood.log, the log file
  • brood.param, the param file
  • brood.rpt, the report file
  • brood.txt, the spreadsheet file

By default, there is one hitlist and the content of the hitlist will be determined by the command flags.

  • brood.hitlist.oeb.gz

If the -ET flag is used, the hitlist will include molecules electrostatically similar to the query.

If the -queryMolecule flag is used, the hitlist will contain analogs of the query molecule, while if no query molecule is specified, the hitlist will contain fragments similar to the query fragment specified by -queryFrag.

Info File

By default, an information file titled brood.info is written or updated after every 500 fragments processed. Reading this info file allows a user to monitor the progress of the database search while it is occurring, and it serves as a record of the performance of that search after execution is completed. An example info file is shown below.

*********** Progress Update ***********
Percent of Database Processed    = 100%
Total packets read               = 8
Packets suitable for Processing  = 1
Number of Fragments Overlayed    = 1
Number of Fragments Eliminated   = 0
Number of Fragments Processed    = 1
Number in color hitlist          = 1
Number of Warnings               = 2
Number of Errors                 = 0
Processed fragments/sec          = 0
Elapsed Time (sec)               = 7
***************************************

If any warnings or errors are noted in the info file, it is strongly suggested that a user check in the log file to determine the nature of the problem!
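
The check for warnings and errors can be automated when many jobs are being monitored. The Python snippet below is a minimal sketch that assumes the info file keeps the "name = value" layout of the example above. The function names parse_info and needs_attention are hypothetical, and the sample text is an excerpt of that example.

```python
def parse_info(text):
    """Collect 'name = value' lines of an info file into a dict of strings."""
    info = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip the starred banner lines and blank lines
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


def needs_attention(info):
    """True if the info file reports at least one warning or error."""
    warnings = int(info.get("Number of Warnings", 0))
    errors = int(info.get("Number of Errors", 0))
    return warnings > 0 or errors > 0


SAMPLE = """\
*********** Progress Update ***********
Percent of Database Processed    = 100%
Number of Warnings               = 2
Number of Errors                 = 0
***************************************
"""

info = parse_info(SAMPLE)
if needs_attention(info):
    print("Warnings or errors reported; check the log file.")
```

In practice the text would be read from the info file on disk (brood.info with the default prefix) instead of from the SAMPLE string.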

Param File

BROOD’s command-line interface can be efficiently run using the -param command line parameter followed by the name of a parameter file. Param files are files that contain one command-line parameter on each line. Every execution of BROOD generates a .param file called brood.param. This file contains all of the parameters used by BROOD. Further, this file can be used in subsequent runs with the -param flag either with or without user modifications.

The .param file is particularly useful if you want to use the graphical interface to set up a job but execute the job on another machine (such as a cluster). The .param file can be moved to a different machine along with the -queryMolecule file it specifies, and it can be used to execute exactly the same run as would have been executed from within the graphical interface.

The following execution will generate a .param file called brood.param which is listed below. This file could be edited or used “as-is” with subsequent runs.

prompt> brood -query amide.smi -db drug.fragments.oeb.gz

Listing of brood.param file contents:

#Interface settings

#Execute Options :
  #-param (Not set, no default)
  #-chunk (Not set, no default)

#Brood :

  #Input :
      -queryFrag  q.ism
      #-queryMol (Not set, no default)
      -db  foo
      #-prot (Not set, no default)
      #-cpddb  none
      #-param (Not set, no default)

  #Output :
      #-prefix  brood
      #-dots  true
      #-log (Not set, no default)
      #-info (Not set, no default)
      #-report (Not set, no default)
      #-format  oeb.gz
      #-idea  true

  #Control parameters :
      #-ET  false
      #-txt  true
      #-linkOnly  false
      #-quickLook  false
      #-maxfrag  0
      #-ringOnly  true
      #-sdtag  verbose
      #-bondOrder  true
      #-checkBond  true
      #-attachmentCutoff  0.780000
      #-shapeCutoff  0.600000
      #-fromCT  false
      #-fileChrg  false
      #-maxHit  1000
      #-title (Not set, no default)
      #-interval  1000000
      #-hitinterval  2000000
      #-attachmentScale  1.500000
      #-cff (Not set, no default)

  #Property Filters :
      #-property  true
      #-molWt  75.000000
      #-logp  2.000000
      #-psa  60
      #-rotbond  3
      #-hvyAtom  4
      #-LipinskiDon  3
      #-LipinskiAcc  5

The saved file can then be supplied to a later run with the -param flag:

prompt> brood -param brood.param

This would result in exactly the same execution as the one which generated the file above.
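
When comparing runs, it can help to see which parameters in a param file are actually set and which are commented-out defaults. The Python snippet below is a minimal sketch that assumes the layout of the listing above: one parameter per line, with lines beginning with # treated as comments. The function name active_parameters is hypothetical, and the sample text is an excerpt of that listing.

```python
def active_parameters(text):
    """Return {flag: value} for every non-comment line of a param file."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, section header, or commented-out default
        parts = line.split(None, 1)
        flag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[flag] = value
    return params


SAMPLE = """\
#Brood :

  #Input :
      -queryFrag  q.ism
      #-queryMol (Not set, no default)
      -db  foo

  #Output :
      #-prefix  brood
"""

print(active_parameters(SAMPLE))
```

For the excerpt above, only -queryFrag and -db are reported, since every other line is a comment.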

Log File

The log file contains all of the critical information about the execution in one place. It begins with a copy of the param file and finishes with the final info file output. In between, it contains all the warnings and errors that might have occurred during execution. The log file gives a user a single place to check what job was run, whether it executed properly, and how long it took.

Report File

The report file contains a detailed listing of the similarity scores for every molecule in the database. By default, the file is titled brood.rpt and contains 1 line for the column titles and 1 line for each molecule in the database. The report file contains columns for

  • database fragment SMILES
  • database fragment title
  • query SMILES
  • number of attachment bonds
  • structural rms value
  • attachment score
  • shape Tanimoto
  • color Tanimoto
  • combo score (shape + color)
  • et attachment score
  • et shape Tanimoto
  • electrostatic Tanimoto
  • et combo score (shape + et)
  • a comment regarding the disposition of the fragment (if and why it failed to be scored)

If a particular score was not calculated for a given fragment, “-” will be found in the report file under the corresponding column. Since all the data for a fragment is contained in a single line, there is only one report file regardless of the number or type of hitlists generated. This file is in tab-separated format and can easily be imported into a spreadsheet program for further analysis of the results.
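
The report file can also be read programmatically. The Python snippet below is a minimal sketch that relies only on what is stated above: the file is tab-separated, the first line holds the column titles, and “-” marks a score that was not calculated. The function name read_report is hypothetical. The column titles and values in the sample are invented placeholders for illustration and are not actual BROOD output.

```python
import csv
import io


def read_report(handle):
    """Read a tab-separated report; '-' entries become None."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {name: (None if value == "-" else value) for name, value in row.items()}
        for row in reader
    ]


# Invented sample: the real column titles and scores will differ.
SAMPLE = (
    "fragment_smiles\tfragment_title\tshape_tanimoto\tcolor_tanimoto\n"
    "*C(=O)N*\tamide\t0.91\t0.85\n"
    "*C(=O)O*\tester\t0.88\t-\n"
)

rows = read_report(io.StringIO(SAMPLE))
for row in rows:
    print(row["fragment_title"], row["color_tanimoto"])
```

With a real report, the io.StringIO object would be replaced by an open file handle, and the keys would be whatever titles appear on the first line of that file.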

Hitlist File

As discussed above, a hitlist is generated for each execution of BROOD. All hitlists are written in the file format specified by -format, which defaults to gzipped OEB format (.oeb.gz). If the -queryMolecule flag is specified, the hitlist will contain complete molecules; otherwise it will contain fragments. The first entry in the hitlist is the query molecule or query fragment. Each subsequent analog molecule (fragment) in the hitlist is oriented in its optimum overlay on the query molecule (fragment). For the “struc” hitlist, this overlay is the optimum overlay of the attachment point atoms. In addition, by default, the similarity scores and physical property data for each molecule (fragment) are attached as SD tags (OEB and SD formats only). When no query molecule is specified but the -ET flag is set, the attachment vectors of the fragments in the hitlist are replaced by methyl groups. This facilitates calculation of electrostatic potentials in data visualization programs such as VIDA.

By default, the hitlist files are written periodically while the search is being carried out (see -hitinterval). This allows a user to examine results at intermediate stages without waiting for the entire search to complete.

If the -queryMolecule flag specifies a 2D input molecule, then all of the analog molecules in the hitlist will likewise be in 2D format, though the fragment similarity search will have been carried out in 3D. When using the graphical interface with a 2D input molecule, a 3D molecule is generated prior to processing and the output format will always be 3D.