Usage

Getting Started

If you are just getting started with BROOD, we highly recommend that you use the graphical interface. Even if you prefer the command line, walking through the graphical interface once will give you a good overview of the workflow involved in running BROOD.

The basic idea behind BROOD is that you have a lead molecule and would like to generate analogs with somewhat different properties by changing a portion of the molecule. Normally, you will specify both the lead molecule (with the -queryMolecule flag) and the fragment of the molecule that you would like to modify (with the -queryFrag flag).

BROOD searches the default fragment database for fragments with shape and electrostatics similar to your query fragment, substitutes each of them for the query fragment in your query molecule, and generates a hitlist of analog molecules. To carry out a search like this, use the following command line:

prompt> brood -queryMol myMol.sdf -queryFrag myFrag.sdf

This command searches the default database for fragments similar to myFrag.sdf, substitutes them into the molecule found in myMol.sdf, and writes a list of analogs to the default output file brood.oeb. The input molecule can be in any standard file format; however, it is easiest to specify the fragment in SDF or SMILES format. For information on query files, see the section Query Files. Example SDF and SMILES files can be found in Example Fragment Files.
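For scripted use, the same search can be launched from Python. The following is a minimal sketch and not part of BROOD: the helper name build_brood_command is hypothetical, and it assumes the brood executable is on the PATH. It uses only the -queryMol and -queryFrag flags shown above.

```python
import shlex


def build_brood_command(query_mol, query_frag, executable="brood"):
    """Return the argument list for a basic BROOD search.

    `executable` is an assumption: replace it with the full path to your
    BROOD installation if brood is not on the PATH.
    """
    return [executable, "-queryMol", query_mol, "-queryFrag", query_frag]


cmd = build_brood_command("myMol.sdf", "myFrag.sdf")
print(shlex.join(cmd))
# To actually run the search: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.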

The output file brood.oeb can be viewed in VIDA. We recommend you use the VIDA extension that ships with the BROOD application. This extension allows you to simultaneously view the 3D structure of the analogs generated by BROOD and several of their physical properties. Further, it aids in the exploration of the clusters in the BROOD hitlist. If that is not possible, you should examine the SD data attached to each molecule in the viewer of your choice. In addition, a .txt file of all the molecular data is written that can be imported into most spreadsheet programs for detailed examination.
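If a spreadsheet is not convenient, the tab-separated .txt file can also be read with a short script. The sketch below makes two assumptions that should be checked against a real file: that the first line is a header row, and that the column names are as shown (the names SMILES and Score in the sample are hypothetical).

```python
import csv
import io


def read_hitlist(handle):
    """Read a tab-separated hitlist into a list of dicts, one per analog.

    Assumes the first line of the file is a header row naming the columns.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical sample standing in for a real BROOD .txt file.
sample = io.StringIO("SMILES\tScore\nCC(=O)NC\t0.91\nCC(=O)OC\t0.84\n")
rows = read_hitlist(sample)
print(len(rows), rows[0]["SMILES"])
```

With a real file, pass an open file object instead of the in-memory sample, for example one obtained from open(path, newline="").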

Command Line Interface

Help

Executing BROOD with no arguments will result in:

prompt> brood
Brood 2.0.0, 20091231
  OEChem version 1.8.0, 20091231
  Platform: centos-5.4-i586-x64
  OpenEye Scientific Software, Inc.

Supported Run Modes:
  Single processor
  MPI Multiprocessor

No argument specified on the command line
Required parameters:
    -queryFrag : Input query filename
For more help type:
  brood --help

A description of the command line interface can be obtained by executing BROOD with the --help option.

prompt> brood --help

will generate the following output:

Help functions:
  brood --help simple      : Get a list of simple parameters
  brood --help all         : Get a complete list of parameters
  brood --help defaults    : List the defaults for all parameters
  brood --help <parameter> : Get detailed help on a parameter
  brood --help html        : Create an html help file for this program

The defaults for each command-line parameter can be listed with brood --help defaults.

Simple Help

To see only the most important command-line options, use --help simple.

prompt> brood --help simple

will generate the following output:

Brood 2.0.0, 20100105
  OEChem version 1.8.0, 20091211
  Platform: microsoft-win32-msvc9-MD-x86
  OpenEye Scientific Software, Inc.

  Supported Run Modes:
         Single processor
         MPI Multiprocessor
Simple parameter list
    Execute Options
      -param : A parameter file

    Brood
      Input
        -queryMol : Query molecule for building analogs
        -queryFrag : Query fragment to use as search template
        -db : Fragment database to search

      Control parameters
        -quickLook : Do a brief search and return a quick set of results
        -ringOnly : Only select fragments with a ring in the attachment path

      Property Filters
        -property : Filter fragments by property

Complete Help

To see all of the command-line options, use --help all.

prompt> brood --help all

will generate the following output:

Brood 2.0.0, 20100116
  OEChem version 1.8.0, 20091229
  Platform: microsoft-win32-msvc9-MD-x86
  OpenEye Scientific Software, Inc.

  Supported Run Modes:
         Single processor
         MPI Multiprocessor

This executable supports single processor execution
Complete parameter list
    Execute Options
      -param : A parameter file
      -chunk : Number of input chunks to be created

    Brood
      Input
        -queryFrag : Query fragment to use as search template (required)
        -queryMol : Query molecule for building analogs (not required)
        -db : Fragment database to search
        -prot : Macro molecule for bump-check of fragments and build analogs
        -cpddb : Database of known compounds to use for synthetic reference
        -param : Control parameter file

      Output
        -prefix : Prefix for generic output files
        -dots : Write a dot to the terminal for every 500 cpds processed
        -log : Write to specified log file (override -prefix)
        -info : Write to specified info file (override -prefix)
        -report : Write complete output in table form (override -prefix)
        -format : Molecular output format
        -txt : Generate tab separated hitlist for reading into spreadsheets
        -idea : Generate cluster information for hitlists

      Control parameters
        -quickLook : Do a brief search and return a quick set of results
        -ringOnly : Only select fragments with ring in attachment path
        -ET : Generate electrostatic Tanimoto hitlist.
        -linkOnly : Identify linkers that mimic geometry attachment ONLY
                    (Caveat-like).
        -sdtag : Add bioisostere scores as SD Tags (SDF and OEB only)
        -checkBond : Check for medicinally acceptable attachment bonds
        -maxHit : Size of hitlists (1-5000)
        -title : Add scores to molecule title with this delimeter

      Advanced parameters
        -bondOrder : Require same attachment bond order
        -attachmentCutoff : Minimum acceptable attachment point tanimoto
        -shapeCutoff : Minimum acceptable shape tanimoto
        -attachmentScale : Scale factor weighting the importance of attachment
                           points
        -fromCT : Generate query conformer from the connection table.
        -fileChrg : Take partial charges from the input molecule.
        -interval : Update info file every N molecules
        -hitinterval : Write intermediate hitlist files every N molecules
        -maxfrag : Maximum number of fragments to search

      Property Selection
        -property : Filter fragments by property
        -molWt : Molecular weight less than current +/- value
        -logp : LogP less than current +/- value
        -psa : PSA less than current +/- value
        -rotbond : Rotatable bond less than current +/- value
        -hvyAtom : Heavy atom less than current +/- value
        -LipinskiDon : Heavy atom less than current +/- value
        -LipinskiAcc : Heavy atom less than current +/- value

Required Parameters

-queryMolecule (-queryMol)
Molecular file containing the primary molecule of interest, that is, the molecule for which you would like to generate analogs. This parameter was formerly known as -buildMolecule. If the query molecule already has a query fragment specified from the graphical interface, this is the only necessary parameter.
-queryFragment (-queryFrag)
File containing the molecular connection table for the fragment to be replaced in the query molecule. If you use the graphical interface to generate a query, the query fragment is stored directly on the query molecule, and this parameter is no longer required. It is also possible to search with a fragment alone, without a primary molecule. In that case, this is the only required parameter.

Optional Parameters

Input Parameters

-db (-database)
Specifies the directory containing the BROOD database files. Additional details on the files contained in the directory can be found in the section on the Fragment Database as well as in the section on Database Preparation.
-prot (-protein)
Specifies an optional protein file. If a protein is given, fragments and analogs are checked for clashes with the protein after they have been overlaid on the query fragment or molecule. Any analogs with clashes will be eliminated. Thus, if you have a co-crystal structure of your ligand of interest, this flag restricts BROOD to analogs that can fit in the active site.
-param
This flag specifies a parameter file that contains all of the command-line parameters in a simple text file. The parameter file is automatically written to “-prefix.param” with every execution. It is a record of the input that was used and can be used to rerun the exact same process. It can also be altered by hand to modify a prior execution. If a parameter is set in the param file and on the command line, the command line setting takes precedence. More information is available in the section on the parameter files below.
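The precedence rule for -param can be pictured as a simple dictionary merge. The sketch below is illustrative only: it does not parse the actual param file format (which is described elsewhere), and the function name and sample values are hypothetical.

```python
def effective_settings(param_file_settings, command_line_settings):
    """Merge two flag-to-value mappings; command-line values win on conflict."""
    merged = dict(param_file_settings)
    merged.update(command_line_settings)
    return merged


# Hypothetical values: the param file asks for 1000 hits, the command line for 2500.
from_file = {"-prefix": "4dfr", "-maxHit": "1000"}
from_cli = {"-maxHit": "2500"}
settings = effective_settings(from_file, from_cli)
print(settings)
```

Settings that appear only in the param file are kept unchanged; only flags repeated on the command line are overridden.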

Output Parameters

-prefix
This string flag determines the prefix of the info, log, report, param and output files. For instance, if -prefix is set to foo, then the output files will include foo.info, foo.log, foo.report, and foo.param. [default = brood]
-dots
When this flag is set, the program will write a single dot (.) to the terminal (stdout/cout) for every 500 fragments that are processed. [default = false]
-log
This flag can be used to override the -prefix flag to specify the filename of the log file. The log file contains general information concerning the program’s execution including a duplicate of the splash screen, all the parameters used in the execution, all the warnings and errors generated during the execution and a summary of the run.
-info
This flag can be used to override the -prefix flag to specify the filename of the info file. The info file contains running totals of the progress of the run. Examining the info file is the best means of checking on the progress of an execution.
-report
This flag can be used to override the -prefix flag to specify the filename of the report file. This file contains a one-line-per-molecule summary of the entire calculation. This data file is ready for import into a spreadsheet program for easy examination. The table contains complete entries for all of the molecules in the input file regardless of whether or not they are in a hitlist.
-txt
If set, this parameter causes BROOD to write a .txt file containing a tab-separated hitlist, with one line per analog molecule. Each line of the file will contain the molecule specified as SMILES and all of the associated scores and physical property data. This file is designed to be easily imported into a number of spreadsheets for detailed examination, particularly if you are not able to use the spreadsheet built into VIDA.
-format
This flag determines the file format (vide infra) of the molecular output of the hitlist. The preceding “.” is optional (e.g. both “.sdf” and “sdf” will work). [default = oeb.gz]
-idea
This flag determines if the hitlist will be organized according to a reduced graph hierarchy. This is quite useful for clustering similar analogs and thus allowing a user to quickly scan the different analog families and drill-down into the most interesting clusters without needing to examine hundreds of analogs.
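The relationship between -prefix and the individual -log, -info and -report overrides can be sketched as follows. This is an illustration, not BROOD code: the function name is hypothetical, and the suffixes are taken from the -prefix description above, so confirm them against the files your own run produces.

```python
def output_files(prefix="brood", log=None, info=None, report=None):
    """Return the info, log, report and param filenames for a run.

    Suffixes follow the -prefix description above; an explicit -log,
    -info or -report value replaces the prefix-derived name.
    """
    return {
        "info": info or f"{prefix}.info",
        "log": log or f"{prefix}.log",
        "report": report or f"{prefix}.report",
        "param": f"{prefix}.param",
    }


files = output_files(prefix="foo", report="myRpt")
print(files)
```

Only the file that was explicitly overridden changes its name; the others continue to follow the prefix.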

Control Parameters

-quickLook (-quick)
This integer flag specifies the number of seconds the search will run before it returns. BROOD is designed to identify as many interesting hits as possible as early as possible. Thus, while a thorough search may take minutes to hours, preliminary results can be generated in one or two minutes with the -quickLook 90 command.
-ringOnly (-ring)
If this flag is set, all the fragments placed into the hitlist will contain a ring system. As discussed in the theory section, this is useful for identifying more rigid bioisosteric fragments. [default = true]
-ET (-et)
This boolean flag determines whether or not the electrostatic Tanimoto similarity is calculated. [default = false]
-linkOnly (-linkonly, -struc, -struct)
If this flag is set, then the shape and chemistry of the fragment are ignored and ONLY the attachment point geometry (and constraints) are used to identify similar fragments. This is similar to Bartlett’s Caveat algorithm, one of the first algorithms in this genre. This type of search can be useful when trying to bridge one or more fragments without any prior knowledge, such as when linking two fragments in fragment-based design.
-sdtag
This flag indicates whether the scoring information will be attached to output molecules as SD tag data. The possible values are “false”, which indicates that no SD data will be attached, and “verbose”, which indicates that all of the sub-scores will be attached. For further details on the scoring labels, please see the section on the Report File (vide infra). This parameter will only work for .sdf or .oeb file formats. [default = verbose]
-checkBond (-checkbond)
If this flag is set to true, BROOD will check all bonds formed between new fragments and the rest of the query molecule and exclude fragments that form unusual bond types. [default = true]
-maxHit (-maxhit)
This flag determines the number of compounds saved in the hitlist. Allowable range is 1-5000. [default = 1000]
-title
When this string parameter is set, the score of each fragment in each hitlist will be appended to the title of the molecule. The parameter value is the delimiter between the original title and the addendum.
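When generating command lines from a script, it can help to check the documented -maxHit range before launching a long search. The sketch below is a hypothetical wrapper, not part of BROOD; it uses only the -maxHit and -quickLook flags and the 1-5000 range described above.

```python
def check_max_hit(value):
    """Return value as an int if it lies in the documented 1-5000 range."""
    n = int(value)
    if not 1 <= n <= 5000:
        raise ValueError(f"-maxHit must be between 1 and 5000, got {n}")
    return n


def control_args(max_hit=1000, quick_look=None):
    """Build the -maxHit (and optional -quickLook) arguments for a run."""
    args = ["-maxHit", str(check_max_hit(max_hit))]
    if quick_look is not None:
        # -quickLook takes the search time in seconds.
        args += ["-quickLook", str(int(quick_look))]
    return args


print(control_args(max_hit=2500, quick_look=90))
```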

Advanced Parameters

-bondOrder (-bo)
When this boolean flag is set, only fragments with the same attachment bond orders as the query fragment are allowed. [default = true]
-attachmentCutoff (-attachcut)
Minimum acceptable attachment point cutoff. The cutoff is for the shape-overlap score of the beginning and ending atom of each attachment bond. The default was chosen empirically to ensure that all fragments in the hitlist would have sensible attachment geometries. This value does not affect runs when the -linkOnly flag is set. [default = 0.78]
-shapeCutoff
Indicates the minimum shape Tanimoto score required for a fragment to appear in the color, elect or queryAnalog hitlists. This cutoff is useful for cases when few shape-similar fragments exist in the database being searched. [default = 0.6]
-attachmentScale (-attach, -aScale)
This floating point value determines the balance between the chemical color score and the attachment point scores. Higher values indicate more weighting for the attachment-point alignment. This parameter has complex effects; please use it with care. [default = 1.5]
-fromCT
This flag indicates that the 3D conformer of the query molecule should be generated from the molecular graph, independent of the conformer in the input file. If this flag is false, BROOD will attempt to read the query molecule as a 3D structure. If the input format is 2D in nature, BROOD will generate a 3D conformer for the query molecule regardless of this flag (i.e. a user can specify the query without a 3D structure). Please note that the database file should always contain 3D structures. [default = false]
-fileChrg
When this flag is true, the partial charges for electrostatic Tanimoto calculations are taken from the input files. If this flag is false, or the input file does not contain partial charges, MMFF charges will be used. [default = false]
-interval
This is the interval at which data is written to the info file. The info file contains running totals of the progress of the run. Examining the info file is the best means of checking on the progress of an execution. If this flag is 50, then the info file is re-written every 50 molecules. Early in searches, the file may be written more frequently. [default = 5000]
-hitinterval
This integer flag indicates the interval at which intermediate copies of the hitlists should be written to disk. This allows a user to examine preliminary results while the database search is still executing. If the value is set to 0, the intermediate files will not be written. Early in searches, the file may be written more frequently. [default = 3000]
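The effect of the two cutoffs in this section can be pictured as a pair of minimum-score tests. The sketch below is a simplified illustration for default runs (without -linkOnly), not BROOD's actual implementation; the function name and the sample scores are hypothetical, and treating a score exactly equal to the cutoff as acceptable is an assumption.

```python
def passes_cutoffs(attachment_score, shape_tanimoto,
                   attachment_cutoff=0.78, shape_cutoff=0.6):
    """True if a fragment meets both minimum scores (default, non -linkOnly runs).

    Assumption: a score equal to the cutoff is treated as acceptable.
    """
    return (attachment_score >= attachment_cutoff
            and shape_tanimoto >= shape_cutoff)


# Hypothetical (name, attachment score, shape Tanimoto) triples.
fragments = [("fragA", 0.85, 0.72), ("fragB", 0.70, 0.90), ("fragC", 0.80, 0.55)]
kept = [name for name, att, shape in fragments if passes_cutoffs(att, shape)]
print(kept)
```

In this sample, fragB fails the attachment cutoff and fragC fails the shape cutoff, so only fragA is kept.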

Property selection

-property (-prop)
This boolean flag indicates whether ANY of the molecular properties should be used to eliminate compounds from consideration. While these filters can be useful, if a user expects to see a particular fragment or analog and it does not appear in the hitlist, it will often have been eliminated by the property filters. The Report File will indicate which property filter, if any, affected each fragment. If set to false, all of the related flags below become irrelevant. [default = true]
-molWt (-molecularweight)
This flag indicates the maximum molecular weight of any analog as an offset from the query molecule (or query fragment). Analogs higher in molecular weight than this cutoff will be eliminated from consideration. A value of 100 would indicate a new analog could weigh up to 100 amu more than the query molecule, while a value of -100 would indicate any new analog would have to weigh at least 100 amu less than the query molecule. [default = 75]
-logp (-logP, -LogP)
This flag indicates the maximum logP of any analog as an offset from the query molecule (or query fragment). Analogs higher in logP than this cutoff will be eliminated from consideration. A value of 2.5 would indicate a new analog could have a logP up to 2.5 log units higher than the query molecule, while a value of -1.5 would indicate any new analog would have to have a logP at least 1.5 log units less than the query molecule [Wang-1997]. [default = 2.0]
-psa (-tpsa, -tPSA)
This flag indicates the maximum polar surface area (PSA) of any analog as an offset from the query molecule (or query fragment). Analogs higher in PSA than this cutoff will be eliminated from consideration. A value of 60 would indicate a new analog could have a PSA up to 60 more than the query molecule, while a value of -40 would indicate any new analog would have to have a PSA of at least 40 less than the query molecule [Clark-1999] [Ertl-2000]. [default = 60]
-rotbond (-RotBond, -RotatableBond)
This flag indicates the maximum number of rotatable bonds of any analog as an offset from the query molecule (or query fragment). Analogs with a higher number of rotatable bonds than this cutoff will be eliminated from consideration. A value of 2 would indicate a new analog could have up to 2 more rotatable bonds than the query molecule, while a value of -3 would indicate any new analog would need to have at least 3 fewer rotatable bonds than the query molecule. [default = 3]
-hvyAtom (-heavyAtom, -HeavyAtom, -hvyatm)
This flag indicates the maximum number of heavy atoms of any analog as an offset from the query molecule (or query fragment). Analogs with more heavy atoms than this cutoff will be eliminated from consideration. A value of 4 would indicate a new analog could have up to 4 heavy atoms more than the query molecule, while a value of -2 would indicate any new analog would have to have at least 2 fewer heavy atoms than the query molecule. [default = 4]
-LipinskiDon (-LipDon, -donor, -donors, -Donors)
This flag indicates the maximum number of h-bond donors of any analog as an offset from the query molecule (or query fragment). Analogs with a higher number of h-bond donors than this cutoff will be eliminated from consideration. A value of 2 would indicate a new analog could have up to 2 more h-bond donors than the query molecule, while a value of -3 would indicate any new analog would need to have at least 3 fewer h-bond donors than the query molecule. For the purpose of this measure, h-bond donors are determined by the method of Lipinski [Lipinski-1997]. [default = 3]
-LipinskiAcc (-LipAcc, -acceptor, -acceptors, -Acceptors)
This flag indicates the maximum number of h-bond acceptors of any analog as an offset from the query molecule (or query fragment). Analogs with a higher number of h-bond acceptors than this cutoff will be eliminated from consideration. A value of 2 would indicate a new analog could have up to 2 more h-bond acceptors than the query molecule, while a value of -3 would indicate any new analog would need to have at least 3 fewer h-bond acceptors than the query molecule. For the purpose of this measure, h-bond acceptors are determined by the method of Lipinski [Lipinski-1997]. [default = 5]
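All of the property flags in this section follow the same pattern: an analog is eliminated when its value is higher than the query value plus the offset. The sketch below illustrates that rule with the documented defaults. It is not BROOD's implementation; the function name and the property values in the example are hypothetical.

```python
# Default offsets as documented above (flag name -> allowed increase over the query).
DEFAULT_OFFSETS = {
    "molWt": 75, "logp": 2.0, "psa": 60, "rotbond": 3,
    "hvyAtom": 4, "LipinskiDon": 3, "LipinskiAcc": 5,
}


def failed_properties(query, analog, offsets=DEFAULT_OFFSETS):
    """Return the properties for which the analog exceeds query + offset."""
    return [name for name, offset in offsets.items()
            if analog[name] > query[name] + offset]


# Hypothetical property values for a query molecule and one analog.
query = {"molWt": 300.0, "logp": 2.1, "psa": 70.0, "rotbond": 4,
         "hvyAtom": 22, "LipinskiDon": 1, "LipinskiAcc": 4}
analog = dict(query, molWt=380.0, rotbond=6)
print(failed_properties(query, analog))
```

Here the analog is 80 amu heavier than the query, which exceeds the default molecular weight offset of 75, while its two extra rotatable bonds stay within the default offset of 3. A negative offset works the same way: with an offset of -100 and a query weight of 300, any analog heavier than 200 is rejected.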

Pharmacokinetic selection

-lipinski (-Lipinski)
This integer flag indicates the number of allowed violations of Lipinski’s rules [Lipinski-1997]. In his effort to separate out the molecules that progressed through clinical trials, Lipinski determined that one violation was acceptable, but two were not. Any negative value will turn off this filter. [default = 1]
-martin (-Marin)
This floating point parameter (range 0.0-1.0) indicates the minimum allowable probability that F will be >10% in rats according to the QSAR model developed and published by Yvonne Martin [Martin-2005]. A value of 0.0 will allow all compounds to pass. [default = 0.5]
-eganEgg (-egan, -Egan, -pharmacopia)
This boolean parameter determines whether analog compounds will be required to fulfill the “Egan egg” measure of bioavailability. This measure was published by Bill Egan while at Pharmacopeia [Egan-2000], and rejects compounds with a LogP > 5.88 or a PSA > 131.6. [default = true]
-veber (-Veber, -gsk, -GSK)
This boolean parameter determines whether analog compounds will be required to fulfill the measure of bioavailability Veber published at GSK [Veber-2002]. His measure of bioavailability eliminates compounds with a PSA > 140 or more than 10 rotatable bonds. [default = false]
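The threshold tests behind these filters can be sketched in a few lines. This is an illustration only, not BROOD's implementation. The Egan and Veber thresholds are the ones quoted above; the four Lipinski limits (molecular weight 500, logP 5, 5 donors, 10 acceptors) are not stated in this section and are taken from the commonly cited rule of five. All function names and the example values are hypothetical.

```python
def lipinski_violations(mol_wt, logp, donors, acceptors):
    """Count violations of the commonly cited rule-of-five limits."""
    return sum([mol_wt > 500, logp > 5, donors > 5, acceptors > 10])


def passes_lipinski(violations, allowed=1):
    """A negative `allowed` switches the filter off, as described for -lipinski."""
    return allowed < 0 or violations <= allowed


def passes_egan(logp, psa):
    """Egan thresholds as quoted above: reject LogP > 5.88 or PSA > 131.6."""
    return not (logp > 5.88 or psa > 131.6)


def passes_veber(psa, rotatable_bonds):
    """Veber thresholds as quoted above: reject PSA > 140 or > 10 rotatable bonds."""
    return not (psa > 140 or rotatable_bonds > 10)


# Hypothetical analog: slightly too heavy, moderately polar.
n = lipinski_violations(mol_wt=520.0, logp=4.2, donors=2, acceptors=6)
print(n, passes_lipinski(n), passes_egan(4.2, 135.0), passes_veber(135.0, 8))
```

The example analog has one Lipinski violation, which the default setting tolerates. Its PSA of 135 is above the Egan limit but below the Veber limit, so it would be rejected only when the Egan filter is active.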

Example Executions

This section has a series of example BROOD command-line executions. Each example is followed by a brief description of its behavior.

If you would like to execute the following examples as written, the appropriate paths to the executable file and the database file must be included. In addition, the file amide.smi will need to be in the working directory. This can be accomplished with the following command:

prompt> echo "*C(=O)NC*" > amide.smi

This file can now be used as the query for each case below.

prompt> brood amide.smi
prompt> brood -queryFrag amide.smi -db brood.v200.db

These two commands will yield identical results. They execute BROOD with the default parameters. The file amide.smi is opened in SMILES format as the query, and the database brood.v200.db is read in database format. The default hitlist will be written to brood.hitlist.oeb.gz, using the default -prefix argument “brood”. Similarly, the informational output files brood.info, brood.log, brood.param and brood.rpt will also be written.

prompt> brood -queryFrag amide.smi -db brood.v200.db -prefix 4dfr

This command is the same as the previous one, except that the prefix of all of the output files has been changed from “brood” to “4dfr” (for example, the log file will be written to 4dfr.log rather than brood.log).

prompt> brood -queryFrag amide.smi -db brood.v200.db -prefix 4dfr -report myRpt

This executes BROOD as above; however, the -report argument overrides the -prefix argument and the report file is written to the file “myRpt” rather than the file “4dfr.rpt”.

prompt> brood -param 4dfr.param

This execution of BROOD will read all the command-line arguments from the file “4dfr.param”. Every time BROOD is executed, a param file is generated that can be used to exactly reproduce the run (vide infra).

prompt> brood -param 4dfr.param -maxHit 2500

This command line will execute BROOD with the parameters from “4dfr.param”, but the -maxHit parameter will be overridden to a value of 2500. This indicates that 2500 compounds will be stored in each of the four hitlists.

prompt> brood -queryFrag amide.smi -db brood.v200.db -format sdf -sdtag

This execution will be as before, except now the hitlist file will be written in .sdf format. Further, the score of each of the ligands in the file will be attached to the molecule as SD Tag data. Note: the default for the -sdtag is verbose, so writing this information is on by default. In addition, the default format is .oeb, which also handles SD Tag data properly.

prompt> brood -queryFrag amide.smi -db brood.v200.db -dots

The execution will be as before except a single ‘.’ will be written to the screen periodically as database fragments are processed. This gives an easy visual measure of the progress of the execution.

prompt> brood -queryFrag amide.smi -db brood.v200.db -queryMol mol.smi

The -queryMol flag indicates that the query fragment in amide.smi will be located in the molecule specified by mol.smi. Each of the similar fragments in the hitlist will be used to replace that fragment in the whole molecule to generate new analog molecules. Since the molecule specified with the -queryMol flag in this instance is 2D, all of the final molecules will be generated with 2D coordinates.

prompt> brood -queryFrag amide.smi -db brood.v200.db -queryMol mol.mol2

This command line acts similarly to the one above. In this case the molecule specified by the -queryMol parameter has 3D coordinates. This causes the query’s 3D coordinates to be copied out of the -queryMol molecule (rather than being generated as a minimized MMFF structure). Further, the constructed analog molecules will be generated with minimal perturbation to the 3D geometry of the -queryMol molecule.

prompt> brood -queryFrag amide.sdf -db brood.v200.db -queryMol mol.smi

Again, this execution is similar to the one above. In this case, the query fragment has 3D structure but the -queryMol molecule does not. The 3D query will be carried out with the coordinates specified in the amide.sdf file, but the built molecules will all be generated with only 2D coordinates.

prompt> brood -queryFrag amide.sdf -db brood.v200.db -queryMol mol.mol2

In this example of a -queryMol execution, both the query fragment and the -queryMol molecule have 3D coordinates. Here, the input 3D coordinates of the query fragment are discarded and the 3D coordinates that the query fragment has inside the -queryMol molecule are used to both carry out the search and to build the analog molecules.

prompt> brood -queryFrag amide.sdf -db brood.v200.db -queryMol mol.mol2 -prot spf3.pdb

In this final example of a -queryMol execution, both the query fragment and the -queryMol molecule have 3D coordinates as above. BROOD reads the protein from spf3.pdb. Each analog of the query molecule is built and then tested for clashes with the protein. Analogs that clash are eliminated from the hitlist.
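The -queryMol examples above cover the four combinations of 2D and 3D input. They can be summarized as a small decision table, sketched below. The function name is hypothetical, and the sketch assumes -fromCT is left at its default of false.

```python
def coordinate_plan(frag_is_3d, mol_is_3d):
    """Return (source of the query-fragment coordinates, dimensionality of analogs)."""
    if mol_is_3d:
        # The fragment's pose inside the query molecule is used, and any
        # 3D coordinates in the fragment file are discarded.
        return ("query molecule", "3D")
    if frag_is_3d:
        # The search uses the fragment file's coordinates; analogs stay 2D.
        return ("fragment file", "2D")
    # Neither input is 3D: BROOD generates the query conformer itself.
    return ("generated", "2D")


for frag_3d in (False, True):
    for mol_3d in (False, True):
        print(frag_3d, mol_3d, coordinate_plan(frag_3d, mol_3d))
```

The key point is that a 3D query molecule always determines both the search coordinates and the geometry of the built analogs.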

Molecular File Formats

BROOD can read and write a variety of molecular file formats. The file format is automatically interpreted from the filename suffix.

File Type       Extensions
SMILES          .smi .ism .can .smi.gz .ism.gz .can.gz
SDF             .sdf .mol .sdf.gz .mol.gz
SKC             .skc .skc.gz
CDK             .cdk .cdk.gz
MOL2            .mol2 .mol2.gz
PDB             .pdb .ent .pdb.gz .ent.gz
MacroModel      .mmod .mmod.gz
OEBinary v2     .oeb .oeb.gz
Old OEBinary    .bin

Old OEBinary format can be read but not written by BROOD. Gzipped OEBinary version 2 (oeb.gz) is the recommended output format.
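A script that prepares input for BROOD can apply the same suffix rules to check file names before a run. The sketch below simply encodes the extension table above; it is not how BROOD itself detects formats, and the function name is hypothetical.

```python
# Extension table copied from the list above (without the optional .gz suffix).
FORMATS = {
    ".smi": "SMILES", ".ism": "SMILES", ".can": "SMILES",
    ".sdf": "SDF", ".mol": "SDF",
    ".skc": "SKC", ".cdk": "CDK", ".mol2": "MOL2",
    ".pdb": "PDB", ".ent": "PDB", ".mmod": "MacroModel",
    ".oeb": "OEBinary v2", ".bin": "Old OEBinary",
}


def guess_format(filename):
    """Return (format name, is_gzipped) based on the filename suffix."""
    gzipped = filename.endswith(".gz")
    name = filename[:-3] if gzipped else filename
    for ext, fmt in FORMATS.items():
        if name.endswith(ext):
            if gzipped and ext == ".bin":
                break  # the table lists no gzipped form of the old binary format
            return fmt, gzipped
    raise ValueError(f"unrecognized molecular file extension: {filename}")


print(guess_format("mol.sdf.gz"), guess_format("mol.mol2"))
```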

BROOD is capable of piping formatted input and output. The simple “-” can be used in place of a filename to indicate std::cin or std::cout with the default SMILES format.

prompt> BROOD -in - -out -

This execution will run BROOD with std::cin as the input with SMILES format. It will also open std::cout with SMILES format as output. However, the use of “-” does not allow control of the file format.

To control the file format of std::cin and std::cout, use the file extension alone, without a preceding filename.

prompt> BROOD -in .ism -out .oeb.gz

This executes BROOD with the input from std::cin formatted in isomeric SMILES and the output sent to std::cout in gzipped OEBinary version 2 format.