
# Structure Based Ligand Screening

This workflow and its documentation are under development. Report issues at the project site. Be sure to include the version information, which you can obtain by running:

$ mflow --version

Prior to starting the workflow, be sure to provide the necessary parameters. A sample configuration is provided; you can retrieve it by running:

$ mflow --show-configuration StructureBasedScreening

This not only prints the sample configuration on screen but also saves it to the file StructureBasedScreening.yaml.

Edit the file according to your needs, specifically the following entries:

# Path where docking results are stored
OUTPUT_DIR: "/lustre/project/<project account>/<desired output subdirectory path>"

# Number of best results to be displayed (0<value<=1: percentage)
RESULT_NUMBER: "10"

# specify the targets by their PDB code as a comma-separated list of strings when applying the same ligands to different targets:
ENZYMES: ["PDB_ID1, <CHAIN_1> <CHAIN_2>", "PDB_ID2, <CHAIN_1> <CHAIN_2>"]

# each target requires a grid for VinaLC. This input needs to be provided as a grid parameter file in one directory
GRID_DIR: "/lustre/project/<project account>/<desired path to grid data>"

To obtain the grid parameter file for each target, use AutoDockTools to select the size and center of the grid box.
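The comment on RESULT_NUMBER above distinguishes values in the range 0 < value <= 1 (a percentage) from larger values. The following Python sketch illustrates one possible reading of that rule. The function name `n_best` and the rounding behaviour are assumptions made for illustration, not taken from the workflow itself.

```python
def n_best(result_number: str, n_ligands: int) -> int:
    """Return how many top-ranked ligands a RESULT_NUMBER value would select.

    Assumption: values in (0, 1] are fractions of all screened ligands,
    larger values are absolute counts. The workflow's own rounding may differ.
    """
    value = float(result_number)
    if value <= 0:
        raise ValueError("RESULT_NUMBER must be positive")
    if value <= 1:
        # Fraction of all ligands, rounded to the nearest integer (at least 1)
        return min(n_ligands, max(1, round(value * n_ligands)))
    # Absolute count, capped at the number of available ligands
    return min(n_ligands, int(value))


print(n_best("10", 5000))    # absolute count -> 10
print(n_best("0.25", 5000))  # fraction of 5000 ligands -> 1250
```

Under this reading, a value of "1" selects all ligands (100 %), not a single one.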

Specify your rescreening targets in line 66 of the configuration file. The best percentage of ligands from the first step will be screened against these new targets. The result is an output file containing the union of all best ligands and their corresponding binding enthalpies.

RESCREENING_TARGETS: ["TARGET1,A B C", "TARGET2,A B C", "TARGET3, A B C"]
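Each target entry combines a PDB code and a space-separated list of chains in one string. As a minimal sketch of how such an entry can be read, the hypothetical helper `parse_target` below splits it into its parts. It only illustrates the format shown on this page and is not the workflow's own parser.

```python
def parse_target(entry: str) -> tuple[str, list[str]]:
    """Split an entry such as "TARGET1,A B C" into ("TARGET1", ["A", "B", "C"])."""
    pdb_id, _, chains = entry.partition(",")
    # Whitespace around the comma is ignored, so "TARGET3, A B C" works as well
    return pdb_id.strip(), chains.split()


RESCREENING_TARGETS = ["TARGET1,A B C", "TARGET2,A B C", "TARGET3, A B C"]

for entry in RESCREENING_TARGETS:
    pdb_id, chains = parse_target(entry)
    print(pdb_id, chains)
```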

To start the workflow with compounds from the ZINC database, edit the configuration file accordingly.

# the DATABASE parameter in the configuration file needs to contain "ZINC"
DATABASE: ["ZINC"]

Choose the desired compounds using the ZINC nomenclature and edit the following lines according to the parameters you want to select:

ZINC_INPUT:
  WEIGHT: ["A", "B", "C", "D", "E", "F", "G"]
  LOGP: ["A", "D", "E", "F", "G", "H", "I", "J"]
  REACT: ["A", "B", "C", "E", "G"]
  PURCHASE: ["A", "B", "C", "D", "E"]
  PH: ["M"]
  CHARGE: ["N", "M", "O", "L", "P"]

Before running the workflow on your ZINC selection, check the number of ligands it includes. The current workflow supports screening of fewer than a million ligands. We are working to extend this capability, but naively screening the entire ZINC database in one go is beyond Mogon's capacity.

Briefly, the following codes are used:

• WEIGHT - molecular weight in Daltons. A ≤ 200, B ≤ 250, C ≤ 300, D ≤ 325, E ≤ 350, F ≤ 375, G ≤ 400, H ≤ 425, I ≤ 450, J ≤ 500, K > 500
• LOGP - logP. A: -1, B: 0, C: 1, D: 2, E: 2.5, F: 3, G: 3.5, H: 4, I: 4.5, J: 5, K: > 5
• REACT - reactivity. A = anodyne, B = Bother (e.g. chromophores), C = clean (but PAINS ok), E = mild reactivity ok, G = reactive ok, I = hot chemistry ok
• PURCHASE - purchasability. A and B = in stock, C = in stock via agent, D = make on demand, E = boutique (expensive), F = annotated (not for sale)
• PH - pH range. R = ref (7.4), M = mid (near 7.4), L = low (around 6.4), H = high (around 8.4)
• CHARGE - molecular charge following the InChIKey convention. N = neutral, M = minus 1, L = minus 2 (or greater), O = plus 1, P = plus 2 (or greater)
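To get a feeling for how broad a selection is, the codes above can be tabulated in a few lines of Python. The sketch below uses the sample ZINC_INPUT selection from this page. The helper names `count_combinations` and `heaviest_allowed` are made up for illustration. Note that the number of code combinations is not the number of ligands, which still has to be checked against ZINC itself.

```python
import math
from math import prod

# Upper molecular-weight bound in Daltons per WEIGHT code, copied from the list above
WEIGHT_MAX = {"A": 200, "B": 250, "C": 300, "D": 325, "E": 350, "F": 375,
              "G": 400, "H": 425, "I": 450, "J": 500, "K": math.inf}

# Sample selection as given in the configuration excerpt on this page
zinc_input = {
    "WEIGHT": ["A", "B", "C", "D", "E", "F", "G"],
    "LOGP": ["A", "D", "E", "F", "G", "H", "I", "J"],
    "REACT": ["A", "B", "C", "E", "G"],
    "PURCHASE": ["A", "B", "C", "D", "E"],
    "PH": ["M"],
    "CHARGE": ["N", "M", "O", "L", "P"],
}


def count_combinations(selection):
    """Number of distinct code combinations covered by the selection."""
    return prod(len(codes) for codes in selection.values())


def heaviest_allowed(selection):
    """Largest molecular-weight bound (Daltons) among the selected WEIGHT codes."""
    return max(WEIGHT_MAX[code] for code in selection["WEIGHT"])


print(count_combinations(zinc_input))  # 7 * 8 * 5 * 5 * 1 * 5 = 7000
print(heaviest_allowed(zinc_input))    # code G -> 400
```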

#### Using ZINC subsets

Choose a subset from the ZINC subsets and set line 55 in the configuration file to a valid subset name.

Change line 22 in the configuration file to anything other than ZINC or Enamine to use locally stored ligands:

DATABASE: ["MY_INPUT"]

You will also have to edit these two lines to specify the name of your dataset and the folder from which the input files are taken. This workflow expects ligand files in pdbqt format.

LOC_DATA: ["<SAMPLE_NAME>"]

LOCAL_INPUT_DIR: "/PATH/TO/INPUT/<folder_with_ligands>"
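Since the workflow expects pdbqt files, it can help to check the input folder before submitting anything. The following Python sketch lists the ligand files in a directory. The helper `find_ligands` is hypothetical, and it assumes that the files sit directly in the folder (no subdirectories) and carry a .pdbqt extension.

```python
from pathlib import Path


def find_ligands(input_dir):
    """Return the sorted .pdbqt files located directly inside input_dir."""
    return sorted(
        path for path in Path(input_dir).iterdir()
        # Accept upper- and lower-case extensions, skip directories
        if path.is_file() and path.suffix.lower() == ".pdbqt"
    )
```

Calling `find_ligands` with the path you set as LOCAL_INPUT_DIR and checking that the returned list is not empty catches a wrong path or a wrong file format early.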

After editing the configuration, run the workflow with:

$ mflow --workflow StructureBasedScreening -A <my SLURM account> --configfile <my_config.yaml>

Note that the run time may exceed the lifetime of your terminal session. To avoid aborting the workflow, you can

1. start it in nohup mode (see the mflow documentation)
2. rely on Snakemake to resume unfinished workflow steps at a later time

Jobs that have already been submitted will not be aborted when the terminal running the workflow loses its connection to Mogon.
