cspy-clg command

The cspy-clg command can be used to generate candidate crystal structures.

cspy-clg [-h] [-s INITIAL_SEED] [-g SPACEGROUPS] [-n NUMBER_STRUCTURES] [--nudge NUDGE]
         [--adaptcell] [--asi] [--clg-chomp CLG_CHOMP | --clg-aut] [--log-level LOG_LEVEL]
         [--keep-files] [--skip-header] [--status-file STATUS_FILE]
         xyz_files [xyz_files ...]

positional arguments

  • xyz_files - Xyz files containing molecules for generation

options

  • -h, --help - show this help message and exit

  • -s INITIAL_SEED, --initial-seed INITIAL_SEED - Initial seed for Sobol sampling (default: 1)

  • -g SPACEGROUPS, --spacegroups SPACEGROUPS - Spacegroup set for structure generation (default: fine10)

  • -n NUMBER_STRUCTURES, --number-structures NUMBER_STRUCTURES - Number of structures for structure generation

  • --nudge NUDGE - Nudge molecules in the asymmetric unit that fail the QR step (default: 0)

  • --adaptcell - Adaptively optimise cell parameters

  • --asi - Allow molecules to have superimposed centroids (set to true for encapsulation)

  • --clg-chomp CLG_CHOMP - Use the chomp CLG with molecular pairs from the provided database

  • --clg-aut - Use the AUT CLG with molecular pairs. If not running a CSP, a database named in the format [seed]-spacegroup-AU.db must exist.

  • --log-level LOG_LEVEL - Log level (default: INFO)

  • --keep-files - Keep the DMACRYS and NEIGHCRYS files; for each structure, these are stored in a new directory in the current working directory.

  • --skip-header - Skip the mol-CSPy header at the start of the job.

  • --status-file STATUS_FILE - Specify output status file (default: status.txt)

Methodology Overview

cspy-clg generates crystal structures by mapping quasi-random, low-discrepancy sequences (generated by the Sobol method) onto structural parameters.
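For readers unfamiliar with Sobol sequences, the following is a minimal pure-Python sketch of the standard Gray-code construction, limited to three dimensions. It is an illustration only, not the generator used inside mol-CSPy; production code would normally rely on a library such as SciPy's `scipy.stats.qmc.Sobol`. The direction-number table, `direction_numbers` and `sobol` are our own names.

```python
# Minimal Sobol sequence generator (Bratley-Fox Gray-code construction).
# Direction numbers for dimensions 2 and 3 follow the published Joe-Kuo tables;
# this is an illustration only, not the generator used inside mol-CSPy.

BITS = 32

# (degree s, coefficient bits a, initial m values) for dimensions 2, 3, ...
JOE_KUO = [
    (1, 0, [1]),     # dimension 2: primitive polynomial x + 1
    (2, 1, [1, 3]),  # dimension 3: primitive polynomial x^2 + x + 1
]


def direction_numbers(dim):
    """Return, per dimension, BITS direction numbers v_k scaled by 2**BITS."""
    if not 1 <= dim <= len(JOE_KUO) + 1:
        raise ValueError("this sketch only supports 1 to 3 dimensions")
    # Dimension 1 uses m_k = 1 for every k, i.e. v_k = 2**-k.
    table = [[1 << (BITS - k) for k in range(1, BITS + 1)]]
    for s, a, m_init in JOE_KUO[: dim - 1]:
        m = list(m_init)
        for k in range(s, BITS):
            # m_k = 2^s m_{k-s} XOR m_{k-s} XOR (2^i a_i m_{k-i} for i = 1..s-1)
            new = m[k - s] ^ (m[k - s] << s)
            for i in range(1, s):
                if (a >> (s - 1 - i)) & 1:
                    new ^= m[k - i] << i
            m.append(new)
        table.append([m[k] << (BITS - k - 1) for k in range(BITS)])
    return table


def sobol(n_points, dim):
    """Yield the first n_points Sobol points in [0, 1)^dim, starting at the origin."""
    v = direction_numbers(dim)
    x = [0] * dim
    for n in range(n_points):
        yield [xi / 2**BITS for xi in x]
        # The rightmost zero bit of n selects which direction number to XOR in.
        c = 0
        while (n >> c) & 1:
            c += 1
        x = [xi ^ v[j][c] for j, xi in enumerate(x)]
```

For example, `list(sobol(4, 3))` gives `[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.75, 0.25, 0.25], [0.25, 0.75, 0.75]]`. Each successive point fills the gaps left by the previous ones, which is what makes the sequence low-discrepancy.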

Currently, there are two methods of generating crystal structures.

Current generator

The methodology is largely in line with that used for our original generator, which was first described in the following publication. However, there are some differences in the detection and alleviation of intermolecular collisions. The summary below outlines the behaviour of the current generator.

For each crystal structure, we map an N-dimensional vector of quasi-random (QR) numbers, \(x_{i}\), onto the structural parameters of the crystal.

The mapping process can be outlined as follows:

  1. Each molecule is initially aligned along the axes at the origin.

  2. Each unit cell angle that is not fixed by space group symmetries is mapped by:

    \[\theta_{j} = \frac{1}{2}\arccos(1 - 2x_{i}) + \frac{\pi}{4}\]

    where \(\theta_{j}\) and \(x_{i}\) are the unit cell angle and QR number, respectively. Each angle is then checked to lie in the range (45°, 135°); if it does not: FAILURE

  3. The angular component of the unit cell volume:

    \[\sqrt{1 + 2\cos\alpha\cos\beta\cos\gamma - \cos^{2}\alpha - \cos^{2}\beta - \cos^{2}\gamma}\]

    where \(\alpha\), \(\beta\) and \(\gamma\) are the unit cell angles, is checked to be > 0.5. If not, the cell is considered flat: FAILURE

  4. Each molecule is subjected to a random rotation.

  5. Unit cell lengths are calculated from space group constraints and from bounds obtained by projecting the rotated molecules onto the unit cell axis frame, using a specified target volume and standard deviation. If the cell volume is greater than the specified maximum volume: FAILURE

  6. Each molecule is subjected to a random translation based on the QR numbers, where each value is scaled into the allowed range for the given space group (e.g. a QR number of 0.5 for the range [0, 0.25] gives 0.125).

  7. The molecules may now overlap (i.e. have unphysically short interatomic distances). Molecules are considered colliding if the distance between any two of their atoms is less than the sum of the atoms' covalent radii plus 0.5 Å. If such a collision is detected, the shortest unit cell length is increased in steps of 1.0 Å until there are no collisions. If the cell volume becomes greater than the maximum volume: FAILURE

  8. Next, the largest cell length is contracted in steps of 1.0 Å until a collision is detected. The generator then reverts to the last collision-free structure and repeats the process with the next-largest cell length. Finally, the shortest cell length is contracted in the same way, and the last crystal structure that does not result in a collision is saved.

If all of these steps are successful, the crystal structure is returned to the user.
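Steps 2, 3 and 6 can be sketched in Python as below, assuming a triclinic cell in which all three angles are free (in higher-symmetry space groups some angles are fixed by symmetry and would not be drawn from QR numbers). The function names are illustrative and are not mol-CSPy's API.

```python
import math


def cell_angle(x):
    """Step 2: map a QR number x in [0, 1] onto a cell angle in degrees."""
    theta = 0.5 * math.acos(1.0 - 2.0 * x) + math.pi / 4.0
    return math.degrees(theta)


def angle_volume_factor(alpha, beta, gamma):
    """Step 3: angular part of the unit cell volume (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    arg = 1.0 + 2.0 * ca * cb * cg - ca**2 - cb**2 - cg**2
    return math.sqrt(max(arg, 0.0))  # clamp tiny negative rounding errors


def scale_to_range(x, low, high):
    """Step 6: scale a QR number x in [0, 1] linearly into [low, high]."""
    return low + x * (high - low)


def map_triclinic_angles(x_alpha, x_beta, x_gamma, min_factor=0.5):
    """Steps 2-3 for a cell whose three angles are all free (assumed triclinic).

    Returns (alpha, beta, gamma) in degrees, or None for a FAILURE.
    """
    angles = tuple(cell_angle(x) for x in (x_alpha, x_beta, x_gamma))
    if not all(45.0 < a < 135.0 for a in angles):
        return None  # FAILURE: an angle lies outside (45, 135)
    if angle_volume_factor(*angles) <= min_factor:
        return None  # FAILURE: flat cell
    return angles
```

Because \(\arccos\) returns values in \([0, \pi]\), the mapping can only produce angles in [45°, 135°], with a QR number of 0.5 giving 90°; the range check in step 2 therefore only rejects the boundary values. Very small QR numbers for all three angles give angles near 47°, which the flat-cell check in step 3 rejects. `scale_to_range(0.5, 0.0, 0.25)` reproduces the 0.125 example from step 6.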

Example usage

For example, to generate 10 crystal structures in space group 14 from the molecule in the xyz file test.xyz:

cspy-clg -g 14 -n 10 test.xyz

This command should print something like the following to stdout:

[I 08:53:39] Loading asymmetric unit molecule from test.xyz
[I 08:53:39] Generated crystals will have 1 molecule in their asymmetric unit
[I 08:53:39] Landscape is complete
[I 08:53:39] Summary:
       Accepted Rejected Success rate
Local        10        5       66.67%
Global       10        5       66.67%
[I 08:53:39] Writing structures to test_14.zip

The resulting structures are written to test_14.zip as SHELX .res files.
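To inspect the output archive programmatically, Python's standard `zipfile` module is sufficient. The sketch below assumes only what is stated above, namely that the archive contains .res files; any further layout (sub-directories, extra files) is not assumed, and `list_res_files` and `read_res` are hypothetical helper names.

```python
import zipfile


def list_res_files(archive_path):
    """Return the sorted names of all .res entries in a cspy-clg output archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(n for n in archive.namelist() if n.lower().endswith(".res"))


def read_res(archive_path, name):
    """Return the text of one .res entry, decoded leniently."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.read(name).decode("utf-8", errors="replace")
```

For the example above, `list_res_files("test_14.zip")` would list the generated structures, and `read_res` returns each one as plain text for further processing.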

cspy-clg takes whatever is in each .xyz file and uses this as an asymmetric unit. For example:

  • using one file containing one single molecule would produce a Z' = 1 crystal with the single molecule as the asymmetric unit

  • using one file containing two molecules in a fixed configuration would produce a Z' = 1 crystal with the 2 molecules in their fixed configuration in the asymmetric unit

  • using two files each containing one single molecule would produce a Z' = 2 crystal with the 2 molecules in a varying configuration in the asymmetric unit

For co-crystals, we would typically explicitly state the stoichiometry (rather than using Z') and whether the components in the asymmetric unit are fixed or free to move:

  • using one file containing two molecules in a fixed configuration would produce a 1:1 cocrystal with the 2 molecules in their fixed configuration in the asymmetric unit

  • using two files each containing one single molecule would produce a 1:1 cocrystal with the 2 molecules in a varying configuration in the asymmetric unit

  • using three files each containing one single molecule, with the first two molecules of the same type, would produce a 2:1 cocrystal with the 3 molecules in a varying configuration in the asymmetric unit

Structures with Z' > 1 are made by passing multiple space-separated xyz files as positional arguments, e.g.:

cspy-clg test.xyz test.xyz

This would yield a Z' = 2 crystal, with two identical molecules in the asymmetric unit in varying configurations.
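The counting rules in the bullets above can be summarised with a small sketch. Here each input file is represented, hypothetically, as a list of molecule labels (cspy-clg itself reads xyz files); the number of independently placed units is the number of files, and the stoichiometry is the reduced ratio of labels across all files. `describe_asymmetric_unit` is our own name.

```python
from collections import Counter
from functools import reduce
from math import gcd


def describe_asymmetric_unit(files):
    """Summarise the counting rules above for a list of input files.

    files: one list of molecule labels per xyz file, e.g. [["A"], ["A"], ["B"]].
    This input format is hypothetical; cspy-clg itself reads xyz files.
    Returns (number of independently placed units, reduced stoichiometry).
    """
    if not files or not all(files):
        raise ValueError("every file must contain at least one molecule")
    n_units = len(files)  # each file is placed as one unit with a fixed internal configuration
    counts = Counter(label for molecules in files for label in molecules)
    divisor = reduce(gcd, counts.values())
    return n_units, {label: n // divisor for label, n in counts.items()}
```

For instance, `[["A", "B"]]` gives one unit with a 1:1 ratio (a fixed-configuration cocrystal), `[["A"], ["A"], ["B"]]` gives three units with a 2:1 ratio, and `[["A"], ["A"]]` gives the Z' = 2 case from the command above.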

Adaptive cell flag

The adaptive cell flag (--adaptcell) is a relatively new feature added to mol-CSPy which is designed to improve the efficiency and success rate of the cspy-clg command.

By default, if a collision is detected during the QR step, the shortest length of the cell is increased by 1 Å and the cell is checked for collisions again. This repeats until no collision is detected or the cell exceeds the maximum volume, in which case the structure is rejected. Once the collisions are relieved, the longest cell length is repeatedly reduced by 1 Å, followed by the other cell lengths, until no cell length can be reduced without inducing a clash.
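The default expand/contract loop can be sketched as follows. The `collides` and `volume` callbacks are hypothetical stand-ins for mol-CSPy's real collision check and cell-volume calculation, and fixing the contraction order from the lengths at the start of contraction is our reading of the text rather than a documented detail.

```python
def default_cell_relief(lengths, collides, volume, max_volume, step=1.0):
    """Sketch of the default 1 Å expand/contract loop.

    lengths: [a, b, c] in Å.
    collides(lengths) -> bool and volume(lengths) -> float are hypothetical callbacks.
    Returns the final lengths, or None for a FAILURE.
    """
    lengths = list(lengths)
    # Expansion: grow the shortest length until no collision remains.
    while collides(lengths):
        i = lengths.index(min(lengths))
        lengths[i] += step
        if volume(lengths) > max_volume:
            return None  # FAILURE: cell exceeds the maximum volume
    # Contraction: longest length first, ending with the shortest.
    for i in sorted(range(3), key=lambda k: lengths[k], reverse=True):
        while lengths[i] > step:
            trial = list(lengths)
            trial[i] -= step
            if collides(trial):
                break  # keep the last collision-free lengths
            lengths = trial
    return lengths
```

With a toy rule in which any length below 5 Å counts as a collision, starting from [3, 4, 10] Å the loop expands to [5, 5, 10] Å and then contracts the longest length to give [5, 5, 5] Å.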

With the adaptive cell flag, collision vectors are computed for all atomic collisions. The direction of each vector is that of the vector between the two colliding atoms, and its magnitude is the displacement required to relieve that collision along that direction. The cell is then expanded in a single move, in proportion to the magnitudes of the collision vectors.
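A minimal sketch of the collision-vector calculation follows, reusing the collision criterion from step 7 (sum of covalent radii plus 0.5 Å). Periodic images are ignored for brevity, and the text does not say exactly how the vectors are converted into a cell expansion, so `axis_expansion` (grow each Cartesian axis of an orthogonal cell by its largest vector component) is one hypothetical rule, not mol-CSPy's.

```python
import math


def collision_vectors(atoms, radii, molecule_ids, tolerance=0.5):
    """Return (i, j, vector) for every colliding pair of atoms in different molecules.

    atoms: Cartesian positions in Å; radii: covalent radii in Å.
    A pair collides when its distance is below r_i + r_j + tolerance (the step 7
    criterion). The vector points from atom i to atom j and its length is the
    extra separation needed to relieve the collision. Periodic images are ignored.
    """
    vectors = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if molecule_ids[i] == molecule_ids[j]:
                continue  # intramolecular pairs are not checked
            d = [b - a for a, b in zip(atoms[i], atoms[j])]
            dist = math.sqrt(sum(c * c for c in d))
            overlap = radii[i] + radii[j] + tolerance - dist
            if overlap > 0 and dist > 0:
                vectors.append((i, j, [overlap * c / dist for c in d]))
    return vectors


def axis_expansion(vectors):
    """Hypothetical single-move rule: grow each Cartesian axis by its largest |component|."""
    growth = [0.0, 0.0, 0.0]
    for _, _, v in vectors:
        growth = [max(g, abs(c)) for g, c in zip(growth, v)]
    return growth
```

Two atoms with 0.5 Å radii placed 1.0 Å apart in different molecules produce a single collision vector of length 0.5 Å along the line joining them.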

Where possible, the cell lengths are then reduced. If there are no collisions, and hence no collision vectors, the following approach is used: for each molecule in the cell, a column centred on the molecule, with a diameter equal to that of its van der Waals sphere, is defined through the cell along each axis. The distance along each axis between the molecule and its neighbours within the column is then computed. For each axis, the shortest such distance across all molecules is taken as the length of empty space along that axis, and the cell is shrunk by half this value. Only half is used to reduce the likelihood of overshooting and to prevent clashes being induced by several axes being optimised simultaneously. Finally, the cell lengths are minimised precisely using the 1 Å step-wise process described above.
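The column-based shrink can be sketched under a simplified model: each molecule is treated as a sphere with a single van der Waals radius, and the cell is orthorhombic with its lengths along the Cartesian axes. The distance between neighbours is taken as the surface-to-surface gap (centre distance minus both radii); this, and the names `column_gaps` and `shrink_by_half_gap`, are assumptions for illustration.

```python
def column_gaps(centres, radii, lengths):
    """Empty space along each axis of an orthorhombic cell (hypothetical sphere model).

    centres: Cartesian molecule centres in Å; radii: van der Waals radii in Å;
    lengths: [a, b, c] in Å. The column through molecule i has radius radii[i].
    """
    gaps = []
    for k in range(3):
        others = [m for m in range(3) if m != k]
        best = float("inf")
        for i, ci in enumerate(centres):
            for j, cj in enumerate(centres):
                # Minimum-image offset perpendicular to the column axis.
                perp = 0.0
                for m in others:
                    dm = (cj[m] - ci[m]) % lengths[m]
                    dm = min(dm, lengths[m] - dm)
                    perp += dm * dm
                if perp >= radii[i] ** 2:
                    continue  # neighbour lies outside the column
                # Distance to the next image along +k; a molecule meets its own image after one cell length.
                axial = (cj[k] - ci[k]) % lengths[k] or lengths[k]
                best = min(best, axial - radii[i] - radii[j])
        gaps.append(best)
    return gaps


def shrink_by_half_gap(lengths, gaps):
    """Shrink each length by half of its empty space (only where the gap is positive)."""
    return [L - 0.5 * g if g > 0 else L for L, g in zip(lengths, gaps)]
```

For one molecule of radius 2 Å in a 10 Å cubic cell, each axis has 6 Å of empty space, so the cell is shrunk to 7 Å before the precise 1 Å step-wise minimisation takes over.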

The adaptive cell method gives a marginal improvement in computational speed over the default method, and a marginal increase in success rate, because its initial cell expansion step has no bias towards cells with equal cell lengths.

Nudge flag

The nudge flag (--nudge) is another relatively new feature added to mol-CSPy which is designed to improve the efficiency and success rate of the cspy-clg command.

By default, if the volume of a cell exceeds the maximum volume threshold during the collision relief step, the crystal is invalidated and a new crystal is initialised. The nudge algorithm attempts to reduce the number of instances of this by displacing molecules into empty space within the cell.

Within the nudge algorithm, collision vectors are computed for the unexpanded cell and stored. If the cell exceeds the maximum volume threshold, the cell is shrunk back to its original size and all molecules are displaced in proportion to their resultant collision vectors in the unexpanded cell. The collision vectors are then updated, and the cell may be expanded again as before (by either the default expansion algorithm or adaptive cell) until a valid crystal is obtained or the cell again exceeds the maximum volume. This process may repeat until each molecule in the cell has individually reached the maximum allowed number of nudges, set by passing an integer number of moves to the --nudge flag. If this occurs and the cell volume is still larger than the maximum threshold, the crystal is invalidated and counts as a FAILURE.
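The control flow of the nudge algorithm can be sketched as below. `relieve` and `resultant_vectors` are hypothetical callbacks standing in for the cell expansion (default or adaptive) and for the per-molecule resultant collision vectors. We assume a molecule's nudge counter only advances when it is actually displaced (a non-zero vector), which is our reading of "individually" in the text.

```python
def nudge_relief(positions, lengths, relieve, resultant_vectors, max_nudges, scale=1.0):
    """Sketch of the nudge loop; relieve and resultant_vectors are hypothetical callbacks.

    positions: molecule centres (Å); lengths: the original, unexpanded cell lengths.
    relieve(positions, lengths): expands the cell (default or adaptive) and returns the
        new lengths, or None if the maximum volume is exceeded.
    resultant_vectors(positions, lengths): one resultant collision vector per molecule,
        evaluated in the unexpanded cell.
    max_nudges: the integer given to --nudge (0 means no nudging).
    Returns (positions, lengths) for a valid crystal, or None for a FAILURE.
    """
    positions = [list(p) for p in positions]
    nudges = [0] * len(positions)
    while True:
        expanded = relieve(positions, lengths)  # always starts from the original cell
        if expanded is not None:
            return positions, expanded
        vectors = resultant_vectors(positions, lengths)
        # Only molecules with nudges left and a non-zero collision vector can move.
        movable = [i for i, v in enumerate(vectors)
                   if nudges[i] < max_nudges and any(c != 0 for c in v)]
        if not movable:
            return None  # FAILURE: no molecule can be nudged any further
        for i in movable:
            positions[i] = [p + scale * c for p, c in zip(positions[i], vectors[i])]
            nudges[i] += 1
```

Because every pass either succeeds, fails, or advances at least one bounded counter, the loop always terminates, and `max_nudges=0` reproduces the default behaviour of rejecting the crystal as soon as the maximum volume is exceeded.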