Question

alphafold hpc large fasta file job

4 months ago

Kayenat Sheikh • 0

Dear Community Members,

I am currently running AlphaFold jobs on a high-performance computing (HPC) cluster to predict hypothetical protein structures. For sequences ranging from 100 to 500 residues, I have successfully obtained five unrelaxed models per protein within a 24-hour job using 4 GPUs, 16 CPUs per task, and 200 GB of memory.

However, I am encountering issues with longer sequences, specifically those between 1000 and 2000 residues. Despite increasing the memory allocation to 500 GB, I have not been able to produce any models for these sequences.

Could anyone provide insights or suggestions on how to successfully run AlphaFold for proteins with longer sequences? Are there additional parameters or resources that I should consider adjusting?

Thank you for your assistance.

fasta hpc alpahfold prediction • 470 views

ADD COMMENT • link 4 months ago by Kayenat Sheikh • 0

score 0 · Answer 1 · 2024-07-07

4 months ago

Mensur Dlakic ★ 28k

If you are folding on GPUs, GPU memory is most likely the cause of your problems. You didn't tell us how much memory your GPUs have, so I will make my best guess. Empirically, a GPU with 8 GB of memory is enough for proteins of up to 700-800 residues. Since your 100-500 residue proteins fold fine, your GPUs appear to be at least in that class. To fold a protein of 2000 residues you will need higher-end GPUs, I think with 16-24 GB. Maybe you already have them and only need to increase the GPU memory allocation.

Out of curiosity, how did you assign 500 GB on a computer with 200 GB of RAM? That aside, I don't think RAM will be a problem here whether you go with 200 or 500 GB.

ADD COMMENT • link 4 months ago by Mensur Dlakic ★ 28k
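
One way to settle the GPU memory question raised in the answer is to query the cards from inside the job allocation. The following is a minimal sketch, assuming NVIDIA GPUs with `nvidia-smi` available on the compute node; `parse_gpu_memory` and `query_gpus` are hypothetical helper names, not part of AlphaFold.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'name, memory.total' rows (CSV, no header, no units) into (name, MiB) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma inside the GPU name cannot break parsing.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the name and total memory (MiB) of every visible GPU.

    Assumption: nvidia-smi is on PATH on the node where the job runs.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpus()` from within the batch job (not on the login node) shows what the job actually sees; the per-GPU total can then be compared with the rough 8 GB and 16-24 GB figures quoted in the answer.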

Hey, thanks for responding. Could you take a look and tell me what is wrong in the error file?

I0707 10:16:17.114566 22502113127296 templates.py:718] hit 4jz6_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.06842539159109645.
I0707 10:16:17.114745 22502113127296 templates.py:718] hit 6ekk_C did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.114921 22502113127296 templates.py:718] hit 6ekk_D did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.115144 22502113127296 templates.py:718] hit 6if2_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.115316 22502113127296 templates.py:718] hit 6if3_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.115643 22502113127296 templates.py:718] hit 7r7a_E did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.046166529266281946.
I0707 10:16:17.115961 22502113127296 templates.py:718] hit 2pmr_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.02225886232481451.
I0707 10:16:17.116190 22502113127296 templates.py:718] hit 6cdw_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.02225886232481451.
I0707 10:16:17.116366 22502113127296 templates.py:718] hit 6cgk_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.02225886232481451.
I0707 10:16:17.357235 22502113127296 pipeline.py:234] Uniref90 MSA size: 88 sequences.
I0707 10:16:17.357752 22502113127296 pipeline.py:235] BFD MSA size: 1471 sequences.
I0707 10:16:17.357936 22502113127296 pipeline.py:236] MGnify MSA size: 37 sequences.
I0707 10:16:17.358096 22502113127296 pipeline.py:237] Final (deduplicated) MSA size: 1585 sequences.
I0707 10:16:17.358629 22502113127296 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 1.
I0707 10:16:17.496235 22502113127296 run_alphafold.py:216] Running model model_1_multimer_v3_pred_0 on eiag13
I0707 10:16:17.497090 22502113127296 model.py:165] Running predict with shape(feat) = {'aatype': (1213,), 'residue_index': (1213,), 'seq_length': (), 'msa': (1585, 1213), 'num_alignments': (), 'template_aatype': (4, 1213), 'template_all_atom_mask': (4, 1213, 37), 'template_all_atom_positions': (4, 1213, 37, 3), 'asym_id': (1213,), 'sym_id': (1213,), 'entity_id': (1213,), 'deletion_matrix': (1585, 1213), 'deletion_mean': (1213,), 'all_atom_mask': (1213, 37), 'all_atom_positions': (1213, 37, 3), 'assembly_num_chains': (), 'entity_mask': (1213,), 'num_templates': (), 'cluster_bias_mask': (1585,), 'bert_mask': (1585, 1213), 'seq_mask': (1213,), 'msa_mask': (1585, 1213)}

ADD REPLY • link 4 months ago by Kayenat Sheikh • 0
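
The last line of the posted log already states the size of the input: the `msa` feature has shape `(1585, 1213)`, that is, 1585 alignment rows over 1213 residues. A minimal sketch for pulling these numbers out of such a line is shown here; `feature_shapes` and `summarize` are hypothetical helper names, and the code assumes the `shape(feat) = {...}` text is a valid Python literal, as it is in the log above.

```python
import ast
import re


def feature_shapes(log_line):
    """Return the shape dict from a 'Running predict with shape(feat) = {...}' log line."""
    match = re.search(r"shape\(feat\) = (\{.*\})", log_line)
    if match is None:
        return None
    # literal_eval only evaluates literals (dicts, tuples, ints, strings), never arbitrary code.
    return ast.literal_eval(match.group(1))


def summarize(shapes):
    """Extract residue count and MSA depth from the parsed shapes."""
    num_sequences, num_residues = shapes["msa"]
    return {"residues": num_residues, "msa_sequences": num_sequences}
```

Running this over the log of each failed job gives the residue count that the GPU has to accommodate, which is the number to compare against the available GPU memory.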

There is no error in the part of the file you posted; every line there is an informational (I-prefixed) message. Closer to the top of the file, which you are not showing, the log will tell you which device is used for folding.

ADD REPLY • link 4 months ago by Mensur Dlakic ★ 28k
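
For a long log, the check described in the last reply can be partly automated. The lines posted in this thread use the absl-style prefix (severity letter `I`, `W`, `E` or `F`, then date, time, thread id and `file:line]`), so a short script can count severities and surface lines worth reading. This is a sketch only: `triage` is a hypothetical helper, and the `KEYWORDS` list is an assumption about what might be relevant, not a list of messages AlphaFold is known to print.

```python
import re
from collections import Counter

# absl-style prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
LOG_PREFIX = re.compile(r"^([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ \S+:\d+\] ")

# Hypothetical search terms for device and memory messages; adjust as needed.
KEYWORDS = ("gpu", "cuda", "backend", "device", "out of memory", "resource_exhausted")


def triage(lines):
    """Count lines per severity and collect warnings, errors and keyword hits."""
    counts = Counter()
    flagged = []
    for line in lines:
        line = line.rstrip("\n")
        match = LOG_PREFIX.match(line)
        # Lines without the prefix (for example Python tracebacks) are counted as "other".
        level = match.group(1) if match else "other"
        counts[level] += 1
        if level in ("W", "E", "F") or any(k in line.lower() for k in KEYWORDS):
            flagged.append(line)
    return counts, flagged
```

Passing the lines of the full error file to `triage` returns the severity counts and the flagged lines; if only `I` entries appear and nothing is flagged, the log itself records no failure and the job more likely ended for an external reason such as a time or memory limit.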