4.3 chunker

This is a simple command-line utility that takes an input database file and divides it into smaller pieces of similar size. Each piece can then be used as the dbase file in a separate ROCS run. This divide-and-conquer approach is an alternative way to run a single dbase over multiple CPUs without using PVM.

The previous version of chunker operated only on .bin files. This new version will chunk any of the ROCS molecule file formats (except .lst and .rocsdb). Output files have exactly the same format as the input file, except that old OEBinary (.bin) input is converted to OEBinary v2 (.oeb).

4.3.1 Command Line Parameters

-in
Name of input file.

-base
Base name of output files. Output files will be sequentially numbered.

-nchunks N
Create N new files, each containing an equal number of molecules. Chunker reads through the entire file once to count the molecules, then creates the new files.

-chunksize M
Create new files, each containing M molecules.

Note: Only one of -nchunks or -chunksize can be used.
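
The difference between the two modes can be sketched in Python. This only illustrates the arithmetic and is not chunker's implementation: the function names are hypothetical, and the handling of a molecule count that does not divide evenly is an assumption (here the first chunks of an -nchunks split get one extra molecule, and the last chunk of a -chunksize split is smaller).

```python
def sizes_for_nchunks(total, n):
    """Split `total` molecules into `n` chunks whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(total, n)
    # Assumption: the first `extra` chunks receive one additional molecule.
    return [base + 1 if i < extra else base for i in range(n)]


def sizes_for_chunksize(total, m):
    """Split `total` molecules into chunks of `m`; the last chunk may be smaller."""
    if m <= 0:
        raise ValueError("m must be positive")
    full, rest = divmod(total, m)
    # Assumption: leftover molecules go into one final, smaller chunk.
    return [m] * full + ([rest] if rest else [])


print(sizes_for_nchunks(10, 3))
print(sizes_for_chunksize(10, 4))
```

With -nchunks the number of files is fixed and the size per file follows from the molecule count, which is why the file must be counted first. With -chunksize the size per file is fixed and the number of files follows from the molecule count.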

4.3.2 Examples

To break input.oeb.gz into 5 chunks, each with the same number of molecules:

prompt> chunker -in input.oeb.gz -base bar -nchunks 5

would create

bar0000001.oeb.gz
bar0000002.oeb.gz
bar0000003.oeb.gz
bar0000004.oeb.gz
bar0000005.oeb.gz
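
Judging from the names above, each output name appears to be the base name followed by a seven-digit sequence number starting at 1 and the extension of the input file. A minimal Python sketch of that pattern follows. The function name is hypothetical, and the numbering width and extension handling are inferred from this one example rather than from a documented rule.

```python
def chunk_names(base, n, ext):
    """Return n output names: base + 7-digit, 1-based index + extension."""
    # Assumption: zero-padded to seven digits, as in bar0000001.oeb.gz.
    return [f"{base}{i:07d}{ext}" for i in range(1, n + 1)]


for name in chunk_names("bar", 5, ".oeb.gz"):
    print(name)
```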

To break input.oeb.gz into chunks, each with 1000 multi-conformer molecules:

prompt> chunker -in input.oeb.gz -base foo -chunksize 1000