A note on learning to run AlphaFold 3¶
Author: Xiping Gong (xipinggong@hotmail.com, Department of Food Science and Technology, College of Agricultural and Environmental Sciences, University of Georgia, Griffin, GA, USA)
Date: 01/22/2025
Introduction¶
AlphaFold 2 revolutionized biomolecular structure prediction by providing accurate 3D protein structures, which can be used for rapid molecular docking (DOI: https://doi.org/10.1038/s41586-021-03819-2). In 2024, AlphaFold 3 was launched, extending this capability to biomolecule-ligand interactions and potentially offering much higher precision in studying PFAS binding to critical toxicological targets, such as proteins (DOI: https://doi.org/10.1038/s41586-024-07487-w). Its authors report that its predictive accuracy significantly surpasses that of traditional molecular docking tools (e.g., AutoDock Vina), which opens new opportunities for understanding the PFAS-biomolecule binding mechanisms that drive PFAS bioaccumulation and toxicity (DOI: https://doi.org/10.1038/s41586-024-07487-w). The release of the open-source code in November 2024 (Link: https://github.com/google-deepmind/alphafold3) makes high-throughput use possible, so a wide array of biomolecule-ligand interactions can be screened rapidly. These advances provide a foundation for generating high-quality structural features of PFAS-biomolecule interactions.
This note uses the PFOA-human serum albumin interaction as an example to demonstrate how AlphaFold 3 can be utilized for docking. Additionally, I discuss the docking results and compare them to the outcomes obtained using AutoDock Vina from our previous note.
AlphaFold 3: https://github.com/google-deepmind/alphafold3
An example: PFOA - human serum albumin (hSA) protein¶
The goal of this example is to show how AlphaFold 3 can be used to predict the binding of PFOA to the hSA protein. To test it, I combined all scripts (Python and Bash) into a single workflow, so that other potential PFAS molecules can be screened automatically.
Background¶
Reference: Maso, Lorenzo, et al. "Unveiling the binding mode of perfluorooctanoic acid to human serum albumin." Protein Science 30.4 (2021): 830-841. DOI: https://doi.org/10.1002/pro.4036
Figure 1. Structure of hSA in complex with PFOA and Myr. Chemical structure (top) and composite omit maps depicting the (Fo−Fc) electron density (bottom) of PFOA (a) and Myr (b) contoured at 4σ; (c) Crystal structure of hSA-PFOA-Myr complex (white) obtained using a twofold molar excess of PFOA over Myr [PDB identification code: 7AAI]; (d) Superimposition of hSA-PFOA-Myr ternary complex (white) with aligned hSA-Myr binary complex (blue white) [PDB identification code: 7AAE]. The structure of hSA is organized in homologous domains (I, II and III), subdomains (A and B), fatty acid (FA) binding sites and Sudlow's binding sites. The α-helices of hSA are represented by cylinders. Bound PFOA and Myr are shown in a ball-and-stick representation with a semi-transparent van der Waals surface and colored by atom type (PFOA: carbon = dark salmon, oxygen = firebrick, fluorine = palecyan; Myr: carbon = smudge green, oxygen = firebrick). The electron density of PFOA and Myr is shown as grey mesh. (Note: I swapped "7AAE" and "7AAI" relative to the original caption after checking both structures in the PDB database.)
A general script to run the docking¶
# 1. Prepare the input files: input.json and the model parameters file
# The model parameters must be requested; see: https://github.com/google-deepmind/alphafold3
# An example input file is given in the Appendix; the input format is documented here: https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md
# 2. Run the job with a Bash script.
$ bash sub.sh # See the sub.sh script in the Appendix for details.
# Note: the #SBATCH directives in sub.sh are only read by Slurm, so on a Slurm cluster submit with "sbatch sub.sh" if you want them to apply.
# 3. Check the output
# The output format is documented here: https://github.com/google-deepmind/alphafold3/blob/main/docs/output.md
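As a quick way to carry out step 3, the sketch below reads a few headline scores from a `*_summary_confidences.json` file. The key names (`ranking_score`, `iptm`, `ptm`, `has_clash`, `fraction_disordered`) are assumptions taken from my reading of the AlphaFold 3 output documentation, and `read_summary` is a hypothetical helper name; keys that are absent are simply returned as `None`.

```python
import json

# Keys assumed from the AlphaFold 3 output documentation; adjust if your version differs.
SUMMARY_KEYS = ("ranking_score", "iptm", "ptm", "has_clash", "fraction_disordered")


def read_summary(path, keys=SUMMARY_KEYS):
    """Return the selected confidence metrics from a summary_confidences JSON file.

    Keys missing from the file map to None instead of raising an error.
    """
    with open(path) as handle:
        summary = json.load(handle)
    return {key: summary.get(key) for key in keys}
```

For the example in this note, `read_summary("pfoa_hsa_summary_confidences.json")` would return a small dictionary that can be printed or collected into a table across many ligands.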
Analysis & Conclusion¶
Figure 2. Comparison of PFOA-hSA interaction structures obtained experimentally and through AlphaFold 3 docking.
The results show close agreement between the two structures: the head group of PFOA occupies a very similar position in both. Notably, no binding pocket was predefined in this example, which suggests that AlphaFold 3 can locate the PFOA binding pocket in hSA on its own. The orientation of the PFOA tail, however, differs between the predicted and experimental structures. In this example, the AlphaFold 3 result compares favorably with that of AutoDock Vina, whose outcome depends heavily on the predefined docking box (see our previous note for details). Combining the two tools could be a powerful strategy: use AlphaFold 3 to predict the binding pocket, then use AutoDock Vina to refine the binding site prediction. This approach could yield PFOA-protein binding predictions that more closely resemble experimental results.
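The visual comparison above could be made quantitative with a ligand RMSD. The function below is only a minimal sketch, not the analysis used for this note: it assumes the predicted and experimental protein structures have already been superimposed in a separate tool, and that the two coordinate lists contain the same PFOA atoms in the same order. The name `ligand_rmsd` is hypothetical.

```python
import math


def ligand_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) tuples.

    No alignment is performed here; both coordinate sets are assumed to be in
    the same reference frame with identical atom ordering.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("Coordinate lists must have the same length.")
    if not coords_a:
        raise ValueError("Coordinate lists must not be empty.")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))
```

Computing this separately for the head-group atoms and the tail atoms would put numbers on the observation that the head group agrees well while the tail orientation differs.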
Appendix¶
sub.sh¶
#!/bin/bash
#SBATCH --job-name=af3 #Name your job something original
#SBATCH --partition=gpu_p #Use the GPU partition
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32 #If you use the default options, AlphaFold3 will run four simultaneous Jackhmmer processes with 8 CPUs each
#SBATCH --gres=gpu:1 #If you don’t care whether your job uses an A100 node or an H100 node (and there isn’t much difference in run time)…
#SBATCH --constraint=Milan|SapphireRapids #…this is the easiest way to specify either one without accidentally using a P100 or L4, which lack sufficient device memory
#SBATCH --mem=60gb
#SBATCH --time=120:00:00
#SBATCH --output=x_%x.%j.out
#SBATCH --error=x_%x.%j.err
SECONDS=0 # Reset the timer
cd "${SLURM_SUBMIT_DIR:-.}" # Start in the submission directory (stays in the current directory outside Slurm)
af3_param_dir='/xx/alphafold3' # Directory containing the AlphaFold 3 model parameters file (e.g., "af3.bin").
work_dir='/xx/alphafold3' # Directory containing all input files, including "input.json" and "sub.sh".
# No need to change anything below unless you know what you are doing.
singularity exec \
    --nv \
    --bind "$work_dir":/root/af_input \
    --bind "$work_dir":/root/af_output \
    --bind "$af3_param_dir":/root/models \
    --bind /db/AlphaFold3/20241114:/root/public_databases \
    /apps/singularity-images/alphafold-3.0.0-CCDpatched.sif \
    python /app/alphafold/run_alphafold.py \
    --json_path=/root/af_input/input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output
echo "# Elapsed time: $SECONDS seconds"
input.json¶
{
  "name": "pfoa_hsa",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "AHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTKVHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPADLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKCCAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVSTPTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL"
      }
    },
    {
      "ligand": {
        "id": "B",
        "smiles": "C(C(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)(F)F)(=O)[O-]"
      }
    }
  ],
  "modelSeeds": [1],
  "bondedAtomPairs": [],
  "dialect": "alphafold3",
  "version": 2
}
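Because the input is plain JSON, screening further PFAS molecules mostly comes down to generating one such file per ligand. The sketch below mirrors the structure of the input.json above; `make_af3_input`, `write_inputs`, and the `protein_tag` naming scheme are hypothetical helpers introduced for illustration, not part of AlphaFold 3 or of the scripts used for this note.

```python
import json
import os


def make_af3_input(name, protein_sequence, ligand_smiles, seeds=(1,)):
    """Build one AlphaFold 3 job dict: protein chain "A" plus ligand "B"."""
    return {
        "name": name,
        "sequences": [
            {"protein": {"id": "A", "sequence": protein_sequence}},
            {"ligand": {"id": "B", "smiles": ligand_smiles}},
        ],
        "modelSeeds": list(seeds),
        "bondedAtomPairs": [],
        "dialect": "alphafold3",
        "version": 2,
    }


def write_inputs(ligands, protein_sequence, out_dir, protein_tag="hsa"):
    """Write one <ligand>_<protein_tag>.json per entry of `ligands`.

    `ligands` maps a short ligand name to its SMILES string.
    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for ligand_name, smiles in ligands.items():
        job_name = f"{ligand_name}_{protein_tag}"
        path = os.path.join(out_dir, f"{job_name}.json")
        with open(path, "w") as handle:
            json.dump(make_af3_input(job_name, protein_sequence, smiles), handle, indent=2)
        paths.append(path)
    return paths
```

Calling `write_inputs({"pfoa": "<PFOA SMILES from above>"}, hsa_sequence, "inputs")` would reproduce the example file as `inputs/pfoa_hsa.json`; each generated file can then be passed to `--json_path` in sub.sh in turn.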
What does the output directory contain?¶
├── TERMS_OF_USE.md
├── pfoa_hsa_confidences.json
├── pfoa_hsa_data.json
├── pfoa_hsa_model.cif
├── pfoa_hsa_summary_confidences.json
├── ranking_scores.csv
├── seed-1_sample-0
│   ├── confidences.json
│   ├── model.cif
│   └── summary_confidences.json
├── seed-1_sample-1
│   ├── confidences.json
│   ├── model.cif
│   └── summary_confidences.json
├── seed-1_sample-2
│   ├── confidences.json
│   ├── model.cif
│   └── summary_confidences.json
├── seed-1_sample-3
│   ├── confidences.json
│   ├── model.cif
│   └── summary_confidences.json
└── seed-1_sample-4
    ├── confidences.json
    ├── model.cif
    └── summary_confidences.json
# please check out the documentation for the details: https://github.com/google-deepmind/alphafold3/blob/main/docs/output.md
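To find which of the five samples in the listing above scored best, ranking_scores.csv can be read directly. This is a minimal sketch that assumes the file has a header row with the columns `seed`, `sample`, and `ranking_score` (my understanding of the output documentation; check your own file). `best_sample` and `sample_dir_name` are hypothetical helper names.

```python
import csv


def best_sample(ranking_csv_path):
    """Return (seed, sample, ranking_score) for the highest-scoring row.

    Assumes columns named "seed", "sample", and "ranking_score".
    """
    with open(ranking_csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError("No rows found in ranking file.")
    best = max(rows, key=lambda row: float(row["ranking_score"]))
    return int(best["seed"]), int(best["sample"]), float(best["ranking_score"])


def sample_dir_name(seed, sample):
    """Directory name for one sample, following the layout listed above."""
    return f"seed-{seed}_sample-{sample}"
```

For example, `sample_dir_name(*best_sample("ranking_scores.csv")[:2])` gives the subdirectory whose model.cif corresponds to the highest ranking score.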
Additional documentation?¶
Please also check out the documentation from here: https://wiki.gacrc.uga.edu/wiki/AlphaFold3-Sapelo2