cspy-db command
The cspy-db command is used to process and analyse the SQLite3 database files that are produced by a crystal structure prediction (CSP) simulation.
cspy-db <command> [<args>]
Available commands are:
prune Remove duplicate structures from databases
cluster synonymous with 'prune'
dump Extract data from databases into other formats
positional arguments
command - subcommand to run
Finding redundant structures via PXRD comparison
After a CSP calculation, the same structure will almost always have been
found multiple times. These redundant structures can be identified by
using the cluster subprogram in cspy-db:
cspy-db cluster *.db
This will find redundant structures within all the database files,
combine the unique structures into a new database file (defaulting to
output.db), then find unique structures within the combined file.
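Because the databases are ordinary SQLite3 files, their contents can be checked with Python's standard sqlite3 module before and after clustering. The sketch below lists every table with its row count and assumes nothing about the mol-CSPy schema; summarise_db is a hypothetical helper name, not part of cspy-db.

```python
import sqlite3


def summarise_db(path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    conn = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalogue of tables and indexes.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # The names come from the catalogue itself, so quoting them is sufficient.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling summarise_db("output.db") after a clustering run would then show how many rows each table of the combined database holds.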
Finding redundant structures with COMPACK
Note
This section is only relevant to users who have a license for the CSD Python API and have followed the installation instructions in CSD Python API.
In addition to PXRD clustering, clustering can be performed on a CSP database with the COMPACK algorithm. This uses the CSD Python API and requires a single conda environment that combines the CSD Python API and mol-CSPy environments.
Clustering crystal structures
To cluster a database using COMPACK, the following command can be used:
cspy-db cluster input.db -m compack
Other optional flags include:
-cdt CLUSTER_DENSITY_THRESHOLD, --cluster-density-threshold CLUSTER_DENSITY_THRESHOLD
    The density threshold used in clustering, within which
    structures are considered the same.
-cet CLUSTER_ENERGY_THRESHOLD, --cluster-energy-threshold CLUSTER_ENERGY_THRESHOLD
    The energy threshold used in clustering, within which
    structures are considered the same.
-j JOBS, --jobs JOBS
    Number of parallel processes/threads to use for
    xrd/compack clustering.
-rms CLUSTER_RMS_THRESHOLD, --cluster-rms-threshold CLUSTER_RMS_THRESHOLD
    RMS difference threshold used in compack clustering.
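When clustering is driven from a script, the flags listed above can be assembled programmatically. The following is a minimal sketch: build_cluster_command is a hypothetical helper, not part of mol-CSPy, and the value of 4 jobs is arbitrary. It uses only flags documented above and only builds the command line; it does not run it.

```python
import shlex


def build_cluster_command(databases, method=None, jobs=None, rms_threshold=None):
    """Assemble an argv list for `cspy-db cluster` (hypothetical helper)."""
    argv = ["cspy-db", "cluster", *databases]
    # Each flag is added only when a value is given, so cspy-db's own
    # defaults apply to everything left unset.
    if method is not None:
        argv += ["-m", method]
    if jobs is not None:
        argv += ["-j", str(jobs)]
    if rms_threshold is not None:
        argv += ["-rms", str(rms_threshold)]
    return argv


argv = build_cluster_command(["input.db"], method="compack", jobs=4)
print(shlex.join(argv))  # prints: cspy-db cluster input.db -m compack -j 4
```

The resulting list can be passed to subprocess.run(argv, check=True) in an environment where cspy-db is installed.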
Several COMPACK search settings can be adjusted. Their defaults are
recorded in the configuration.py file. Alternatively, user-defined
values can be read from a cspy.toml file, for example:
[compack]
angle_tolerance = 30
distance_tolerance = 0.3
packing_shell_size = 60
ignore_hydrogen_counts = true
ignore_hydrogen_positions = true
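Before starting a long clustering run, it can be useful to confirm that a cspy.toml file parses and holds the intended values. The sketch below parses the example above with Python's standard tomllib module (Python 3.11 or later, with the third-party tomli package as a fallback on older versions). load_compack_settings is a hypothetical helper, and how mol-CSPy itself reads the file is not shown here.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed fallback for older Python versions

EXAMPLE = """
[compack]
angle_tolerance = 30
distance_tolerance = 0.3
packing_shell_size = 60
ignore_hydrogen_counts = true
ignore_hydrogen_positions = true
"""


def load_compack_settings(text):
    """Parse TOML text and return the [compack] table as a dict."""
    return tomllib.loads(text).get("compack", {})


settings = load_compack_settings(EXAMPLE)
for key, value in settings.items():
    print(f"{key} = {value!r}")
```

To check a file on disk, its text can be read with pathlib.Path("cspy.toml").read_text() and passed to the same function.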
Searching the database for a match
To search one or more databases for matches to a given structure, use the following command:
cspy-db cluster input.db -m compack --compack_exp_str NAME_OF_STRUCTURE
Here NAME_OF_STRUCTURE is the name of the file containing the comparison crystal structure (typically an experimental SCXRD structure).
Alternatively, the user may specify a CSD reference code. Be aware that some CSD structures contain disordered atoms or solvent molecules, which
will affect the overlay comparison.
Extracting structure information from a database
If you’d prefer to work with a CSV file, you can export the data for the unique structures by using the
dump subprogram in cspy-db:
cspy-db dump output.db # only unique structures
cspy-db dump output.db --include-duplicates # all structures in the database
This writes a data table to structures.csv and an archive of SHELX res files to structures.zip.
By default, only unique structures are exported.
If you want to dump only the structures within a specified energy range of the global minimum structure, this can be done with the -e flag:
cspy-db dump output.db -e 7 # only unique structures within 7 kJ mol^-1 of the global minimum
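The same kind of energy window can also be applied afterwards to an exported table. The sketch below illustrates the idea on made-up data. The column names id and energy are assumptions and may not match the headers in structures.csv, and treating the window as inclusive is also an assumption rather than documented cspy-db behaviour.

```python
import csv
import io


def within_energy_window(rows, window, energy_column="energy"):
    """Keep rows whose energy lies within `window` of the lowest energy."""
    rows = list(rows)
    if not rows:
        return []
    e_min = min(float(r[energy_column]) for r in rows)
    return [r for r in rows if float(r[energy_column]) - e_min <= window]


# Made-up data standing in for structures.csv; real column names may differ.
sample = io.StringIO("id,energy\na,-120.0\nb,-115.5\nc,-110.0\n")
kept = within_energy_window(csv.DictReader(sample), window=7.0)
print([r["id"] for r in kept])  # prints: ['a', 'b']
```

Structure c lies 10 units above the minimum and is dropped, while b lies 4.5 above and is kept.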