cspy-db command

The cspy-db command is used to process and analyse the SQLite3 database files that are produced by a crystal structure prediction simulation.

cspy-db <command> [<args>]

Available commands are:
    prune       Remove duplicate structures from databases
    cluster     synonymous with 'prune'
    dump        Extract data from databases into other formats

options

  • -h, --help - show this help message and exit

Finding redundant structures via PXRD comparison

After a CSP calculation, the same structure will almost always have been found multiple times. These redundant structures can be identified with the cluster subprogram in cspy-db:

cspy-db cluster *.db

This will find redundant structures within all the database files, combine the unique structures into a new database file (defaulting to output.db), then find unique structures within the combined file.
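Because the databases are ordinary SQLite3 files, a combined file such as output.db can be inspected with any SQLite client before further processing. The sketch below is a minimal example using only the Python standard library; it lists table names and assumes nothing about the mol-CSPy schema, which is not described in this section. The function name list_tables is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file.

    Generic SQLite inspection only; no mol-CSPy schema is assumed.
    """
    # Open read-only so that inspecting a results file cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    import sys

    # Usage: python list_tables.py output.db [more.db ...]
    for path in sys.argv[1:]:
        print(path, list_tables(path))
```

Opening the file read-only is a deliberate choice here, so that a look at the contents cannot interfere with a later cspy-db run on the same file.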

Finding redundant structures with COMPACK

Note

This section is only relevant to those who have a license for the CSD Python API and have followed the installation instructions in CSD Python API.

In addition to PXRD clustering, a CSP database can be clustered with the COMPACK algorithm. This uses the CSD Python API and requires a single conda environment that combines the CSD Python API and mol-CSPy environments.

Clustering crystal structures

To cluster a database using COMPACK, the following command can be used:

cspy-db cluster input.db -m compack

Other optional flags include:

-cdt CLUSTER_DENSITY_THRESHOLD, --cluster-density-threshold CLUSTER_DENSITY_THRESHOLD
                      The density threshold used in clustering, within which
                      structures are considered the same.

-cet CLUSTER_ENERGY_THRESHOLD, --cluster-energy-threshold CLUSTER_ENERGY_THRESHOLD
                      The energy threshold used in clustering, within which
                      structures are considered the same.

-j JOBS, --jobs JOBS  Number of parallel processes/threads to use for
                      xrd/compack clustering

-rms CLUSTER_RMS_THRESHOLD, --cluster-rms-threshold CLUSTER_RMS_THRESHOLD
                      RMS difference threshold used in compack clustering.
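One way to picture how energy and density thresholds can work together is as a cheap pre-filter: only pairs of structures that are close in both energy and density are passed on to a more expensive comparison. The sketch below illustrates that general idea and is not taken from the mol-CSPy source. The Structure class, the candidate_pairs function and the units in the comments are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    name: str
    energy: float   # assumed units: kJ/mol
    density: float  # assumed units: g/cm^3


def candidate_pairs(structures, energy_threshold, density_threshold):
    """Return name pairs that lie within both thresholds of each other.

    Illustrative pre-filter only: a real workflow would still compare each
    returned pair with PXRD or COMPACK before calling them duplicates.
    """
    ordered = sorted(structures, key=lambda s: s.energy)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # The list is sorted by energy, so once the gap exceeds the
            # threshold no later structure can match 'a' either.
            if b.energy - a.energy > energy_threshold:
                break
            if abs(b.density - a.density) <= density_threshold:
                pairs.append((a.name, b.name))
    return pairs


structures = [
    Structure("A", -100.0, 1.50),
    Structure("B", -99.9, 1.51),
    Structure("C", -95.0, 1.50),
]
print(candidate_pairs(structures, energy_threshold=1.0, density_threshold=0.05))
```

Sorting by energy first keeps the number of pairwise checks small for large databases, since most structures differ in energy by more than the threshold.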

A number of COMPACK search settings can be adjusted. Their defaults are recorded in the configuration.py file. Alternatively, user-defined values can be read from a cspy.toml file, for example:

[compack]
angle_tolerance = 30
distance_tolerance = 0.3
packing_shell_size = 60
ignore_hydrogen_counts = true
ignore_hydrogen_positions = true
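Before starting a long clustering run, it can be useful to confirm that a cspy.toml file parses and that each value has the intended type (for example, that true is read as a boolean rather than a string). The sketch below does this with Python's standard TOML parser. It is an independent check and does not show how mol-CSPy itself loads the file; the function name read_compack_settings is hypothetical.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# The example settings from above, inlined so the sketch is self-contained.
EXAMPLE = """
[compack]
angle_tolerance = 30
distance_tolerance = 0.3
packing_shell_size = 60
ignore_hydrogen_counts = true
ignore_hydrogen_positions = true
"""


def read_compack_settings(toml_text):
    """Parse TOML text and return the [compack] table (empty if absent)."""
    return tomllib.loads(toml_text).get("compack", {})


settings = read_compack_settings(EXAMPLE)
for key, value in settings.items():
    # Show each value with its parsed type to catch quoting mistakes.
    print(f"{key} = {value!r} ({type(value).__name__})")
```

To check a file on disk instead, open it in binary mode and pass the handle to tomllib.load.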

Searching the database for a match

To search one or more databases for structures that match a given structure, use the following command:

cspy-db cluster input.db -m compack --compack_exp_str NAME_OF_STRUCTURE

Here NAME_OF_STRUCTURE is the name of the file containing the comparison crystal structure (typically an experimental SCXRD structure). Alternatively, a CSD reference code may be given instead. Be aware that some CSD structures contain disordered atoms or solvent molecules, which will affect the overlay comparison.

Extracting structure information from a database

If you’d prefer to work with a CSV file, you can export the data for the unique structures with the dump subprogram in cspy-db:

cspy-db dump output.db # only unique structures
cspy-db dump output.db --include-duplicates # all structures in the database

This will result in a data table being written to structures.csv, and an archive of SHELX res files being written to structures.zip. By default this will only export unique structures.
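The two output files can be checked quickly with the Python standard library. The sketch below reports the CSV column names, the number of rows, and the names of the files in the archive. It makes no assumption about which columns mol-CSPy writes, since they are not listed here; the function name summarise_dump is hypothetical.

```python
import csv
import zipfile


def summarise_dump(csv_path, zip_path):
    """Return (column names, row count, archive member names)."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = reader.fieldnames or []
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()
    return columns, len(rows), members


if __name__ == "__main__":
    # Assumes 'cspy-db dump' has already been run in the current directory.
    columns, n_rows, members = summarise_dump("structures.csv", "structures.zip")
    print("columns:", columns)
    print("rows:", n_rows)
    print("files in archive:", len(members))
```

Comparing the row count with the number of files in the archive is a simple way to confirm that the table and the structure files describe the same set of structures.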

If you want to dump only the structures within a specified energy range of the global minimum structure, use the -e flag:

cspy-db dump output.db -e 7 # only unique structures within 7 kJ mol^-1 of the global minimum
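The energy window is measured relative to the lowest-energy structure, not as an absolute energy. The sketch below applies the same idea to energies that have already been loaded into Python. The function name within_energy_window and the example values are hypothetical, and treating the cutoff as inclusive is an assumption, since the behaviour of -e at the boundary is not stated here.

```python
def within_energy_window(energies, window):
    """Map structure name -> relative energy for structures inside the window.

    'energies' maps names to absolute energies (kJ/mol assumed); the result
    holds energies relative to the lowest one, keeping those <= window.
    """
    if not energies:
        return {}
    lowest = min(energies.values())
    return {
        name: energy - lowest
        for name, energy in energies.items()
        if energy - lowest <= window
    }


# Hypothetical lattice energies in kJ/mol.
example = {"a": -120.0, "b": -115.0, "c": -110.0}
print(within_energy_window(example, window=7))
```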