AlphaFold

AlphaFold Reference Data

Reference data for AlphaFold are stored in a central location to avoid the overhead of per-user copies.

The path is /lustre/project/alphafold_users.

Note: This path is a symbolic link to the latest AlphaFold database, which will carry a version suffix, for example alphafold_users -> alphafold_users_v2.3.0. Older versions will be kept for a period of time that has not yet been set.
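
Because the path is a symbolic link, it can be useful to record which database version a job actually used. A minimal sketch, assuming GNU `readlink` is available on the cluster; the helper name `resolve_db_version` is hypothetical and not provided by the module:

```shell
#!/bin/bash
# Hypothetical helper: print the name of the directory that a (possibly
# symlinked) database path resolves to, e.g. alphafold_users_v2.3.0.
resolve_db_version() {
    local link_path=$1
    # readlink -f follows all symlinks and prints the canonical path
    basename "$(readlink -f "$link_path")"
}

# Example: log the database version at the top of a job script.
# The path is the one given in this document.
DB_PATH=/lustre/project/alphafold_users
if [ -e "$DB_PATH" ]; then
    echo "AlphaFold database: $(resolve_db_version "$DB_PATH")"
fi
```

Writing this line to the job log makes it possible to tell later which database version produced a result, even after the link has moved on.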

The AlphaFold Module

Generally, software is provided via module files.

Support via module files may not work smoothly; in that case, users can fall back on the containerized version (see below). A template job script for the module:

#!/bin/bash

#SBATCH -J <name of your job>
#SBATCH -o <desired name for log file>.%j.log
#SBATCH -A <account>
#SBATCH -p <m2_gpu|deeplearning>
#SBATCH --gres=gpu:1 # NOTE: AlphaFold is multi-GPU capable, but this is
                     #       apparently not stable.
#SBATCH -c 8         # NOTE: Non-GPU components of AlphaFold are
                     #       hardly able to use more than 8 CPUs.
#SBATCH --mem=20G    # NOTE: For really large protein complexes more
                     #       memory might be needed.
#SBATCH -t 300       # NOTE: 300 minutes is plenty of time for small and
                     #       medium sized problems. Increase the time value
                     #       for bigger predictions.


################################################################################
# load environment

module purge
module load bio/AlphaFold

################################################################################
# variables

INFILE=<path to input FASTA file>
# NOTE: By default, AlphaFold names its output after the input file.
#       To avoid overwriting earlier runs, you can specify your own output
#       directory that contains the unique job ID.
OUTDIR=$PWD/alphafold_test_$SLURM_JOB_ID
mkdir -p "$OUTDIR"

# NOTE: As the environment variable $ALPHA_FOLD_DATA is set by the module,
#       no further data flags are required when starting the program.
srun alphafold \
    --output_dir=$OUTDIR \
    --fasta_paths=$INFILE \
    --max_template_date=<max_template_date, e.g. '2020-05-14'> \
    --db_preset=<full_dbs|reduced_dbs> \
    --model_preset=<monomer|multimer>
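

Since a job may wait in the queue for some time, it can be worth checking the input file before submitting. A sketch under the assumption that a usable input is a non-empty file whose first line is a FASTA header starting with `>`; the function name `check_fasta` is hypothetical and not part of AlphaFold or the module:

```shell
#!/bin/bash
# Hypothetical pre-submission check, not part of AlphaFold:
# succeed only if the file exists, is non-empty, and starts with '>'.
check_fasta() {
    local fasta=$1
    [ -s "$fasta" ] || { echo "missing or empty: $fasta" >&2; return 1; }
    # Read the first line and look at its first character
    local first_line
    IFS= read -r first_line < "$fasta"
    case $first_line in
        '>'*) return 0 ;;
        *)    echo "no FASTA header in first line: $fasta" >&2; return 1 ;;
    esac
}
```

It could be used as `check_fasta <path to input FASTA file> && sbatch <your job script>`, so that the job is only submitted when the check passes.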

AlphaFold via Container

The container location is /lustre/project/alphafold_users/container. Select your container version there and enter its file name in the template script below (in place of <alphafold_container_version>.sif).
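
To see which container versions are available, the directory can simply be listed. A minimal sketch, assuming the images carry the `.sif` suffix used in the template; the function name `list_containers` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the names of all .sif images in a directory,
# one per line, without the directory prefix.
list_containers() {
    local dir=$1
    local image
    for image in "$dir"/*.sif; do
        # Skip the unexpanded pattern when no image matches
        [ -e "$image" ] || continue
        basename "$image"
    done
}

# Path as given in this document
list_containers /lustre/project/alphafold_users/container
```

The function prints nothing if the directory does not exist or holds no images, so it is safe to run on a machine where the project file system is not mounted.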

#!/bin/bash

#SBATCH -J <name of your job>
#SBATCH -o <desired name for log file>.%j.log
#SBATCH -A <account>
#SBATCH -p <m2_gpu|deeplearning>
#SBATCH --gres=gpu:1 # NOTE: AlphaFold is multi-GPU capable, but this is
                     #       apparently not stable.
#SBATCH -c 8         # NOTE: Non-GPU components of AlphaFold are
                     #       hardly able to use more than 8 CPUs.
#SBATCH --mem=20G    # NOTE: For really large protein complexes more
                     #       memory might be needed.
#SBATCH -t 300       # NOTE: 300 minutes is plenty of time for small and
                     #       medium sized problems. Increase the time value
                     #       for bigger predictions.


################################################################################

module purge
module load tools/AppTainer # NOTE: The AppTainer module provides support for
                            #       Singularity containers.

################################################################################

INFILE=<path to input FASTA file>
# NOTE: By default, AlphaFold names its output after the input file.
#       To avoid overwriting earlier runs, you can specify your own output
#       directory that contains the unique job ID.
OUTDIR=$PWD/alphafold_test_$SLURM_JOB_ID
mkdir -p "$OUTDIR"

ALPHAFOLD_DATA_DIR=/lustre/project/alphafold_users
CONTAINERPATH=${ALPHAFOLD_DATA_DIR}/container

# NOTE: The current container version does not accept $ALPHAFOLD_DATA_DIR,
#       therefore the individual database flags need to be set explicitly.

srun singularity run --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=8 \
  -B .:/etc --nv ${CONTAINERPATH}/<alphafold_container_version>.sif \
  --fasta_paths=$INFILE \
  --output_dir=$OUTDIR \
  --max_template_date=<max_template_date, e.g. '2020-05-14'> \
  --data_dir=$ALPHAFOLD_DATA_DIR \
  --uniref90_database_path=$ALPHAFOLD_DATA_DIR/uniref90/uniref90.fasta \
  --mgnify_database_path=$ALPHAFOLD_DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
  --small_bfd_database_path=$ALPHAFOLD_DATA_DIR/small_bfd/bfd-first_non_consensus_sequences.fasta \
  --pdb70_database_path=$ALPHAFOLD_DATA_DIR/pdb70/pdb70 \
  --template_mmcif_dir=$ALPHAFOLD_DATA_DIR/pdb_mmcif/mmcif_files \
  --obsolete_pdbs_path=$ALPHAFOLD_DATA_DIR/pdb_mmcif/obsolete.dat \
  --use_gpu_relax \
  --db_preset=<full_dbs|reduced_dbs> \
  --model_preset=<monomer|multimer>
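

The database flags in the container template all derive from the same base directory, so they can be generated by a small function and collected in a Bash array, which keeps the long command easier to maintain. A sketch using only the flags and relative paths from the template; the function name `alphafold_db_flags` and the array name `DB_FLAGS` are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print one database flag per line for a given data
# directory. Flags and relative paths are copied from the container template.
alphafold_db_flags() {
    local data_dir=$1
    printf '%s\n' \
        "--data_dir=$data_dir" \
        "--uniref90_database_path=$data_dir/uniref90/uniref90.fasta" \
        "--mgnify_database_path=$data_dir/mgnify/mgy_clusters_2018_12.fa" \
        "--small_bfd_database_path=$data_dir/small_bfd/bfd-first_non_consensus_sequences.fasta" \
        "--pdb70_database_path=$data_dir/pdb70/pdb70" \
        "--template_mmcif_dir=$data_dir/pdb_mmcif/mmcif_files" \
        "--obsolete_pdbs_path=$data_dir/pdb_mmcif/obsolete.dat"
}

# Collect the flags in an array, one element per flag
DB_FLAGS=()
while IFS= read -r flag; do
    DB_FLAGS+=("$flag")
done < <(alphafold_db_flags /lustre/project/alphafold_users)
printf 'collected %d database flags\n' "${#DB_FLAGS[@]}"
```

In a job script the array could then be expanded as `"${DB_FLAGS[@]}"` after the container image name in the `singularity run` call, in place of the individual database flags.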