AlphaFold on HPC: jobs with long FASTA sequences
10 weeks ago

Dear Community Members,

I am currently running AlphaFold jobs on a high-performance computing (HPC) cluster to predict the structures of hypothetical proteins. For sequences of 100 to 500 residues, I have successfully obtained five unrelaxed models per protein within a 24-hour job using 4 GPUs, 16 CPUs per task, and 200 GB of memory.

However, I am running into problems with longer sequences, specifically those between 1000 and 2000 residues. Even after increasing the memory allocation to 500 GB, I have not been able to produce any models for these longer sequences.

Could anyone provide insights or suggestions on how to successfully run AlphaFold for proteins with longer sequences? Are there additional parameters or resources that I should consider adjusting?

Thank you for your assistance.

10 weeks ago
Mensur Dlakic ★ 28k

If you are using GPUs for folding, that is most likely causing your problems. You didn't tell us anything about GPU memory, so I will make an educated guess. Empirically, a GPU with 8 GB of memory is enough for proteins of about 700-800 residues, and your GPUs are probably in that class. To fold a protein of 2000 residues you will need higher-end GPUs, I think with 16-24 GB. Maybe you already have them and only need to increase the GPU memory allocation.
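A back-of-the-envelope calculation shows why length matters so much on the GPU. Assuming, for illustration, that AlphaFold2's pair representation has 128 channels per residue pair and is stored as 4-byte floats, that one tensor grows with the square of the sequence length. Real peak usage is several times larger because of intermediate activations, so treat these numbers as a lower bound for a single tensor, not a total.

```python
def pair_rep_bytes(n_residues: int, channels: int = 128, bytes_per_value: int = 4) -> int:
    """Bytes for one N x N x C pair tensor (assumed 128 channels, 4-byte floats).

    This is a lower bound for a single tensor, not the total GPU memory a run needs.
    """
    return n_residues * n_residues * channels * bytes_per_value


# Compare the sizes mentioned in this thread (1213 is the length in the posted log).
for n in (500, 800, 1213, 2000):
    gib = pair_rep_bytes(n) / 2**30
    print(f"{n:>5} residues: {gib:6.2f} GiB for one pair tensor")

# Quadratic scaling: going from 800 to 2000 residues multiplies this tensor
# by (2000 / 800) ** 2 = 6.25, before counting any activations.
```

The point is the ratio rather than the absolute numbers: a card that is comfortable at 800 residues has to hold tensors several times larger at 2000.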

Out of curiosity, how did you assign 500 GB on a computer with 200 GB of RAM? That aside, I don't think RAM will be a problem here whether you go with 200 or 500 GB.
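If the GPU turns out to be the bottleneck, AlphaFold's own Docker launcher (`docker/run_docker.py`) passes two environment variables meant to let proteins that would not fit in GPU memory spill into host RAM through unified memory. A minimal sketch of doing the same in a non-Docker Python launcher, assuming the variables are set before JAX (and therefore AlphaFold) is imported; expect slower runs when memory spills over.

```python
import os

# Values mirror those passed by AlphaFold's docker/run_docker.py.
# They must be in the environment before JAX initializes its GPU backend.
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"           # allow GPU allocations to spill into host RAM
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"  # let XLA reserve more than the card's physical memory

# Import jax / start the AlphaFold run only after this point.
```

On a cluster it is usually simpler to `export` the same two variables in the batch script before calling `run_alphafold.py`.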


Hey, thanks for responding. Could you take a look and tell me what is wrong in the error file?

I0707 10:16:17.114566 22502113127296 templates.py:718] hit 4jz6_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.06842539159109645.
I0707 10:16:17.114745 22502113127296 templates.py:718] hit 6ekk_C did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.114921 22502113127296 templates.py:718] hit 6ekk_D did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.115144 22502113127296 templates.py:718] hit 6if2_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.115316 22502113127296 templates.py:718] hit 6if3_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.03709810387469085.
I0707 10:16:17.115643 22502113127296 templates.py:718] hit 7r7a_E did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.046166529266281946.
I0707 10:16:17.115961 22502113127296 templates.py:718] hit 2pmr_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.02225886232481451.
I0707 10:16:17.116190 22502113127296 templates.py:718] hit 6cdw_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.02225886232481451.
I0707 10:16:17.116366 22502113127296 templates.py:718] hit 6cgk_A did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.02225886232481451.
I0707 10:16:17.357235 22502113127296 pipeline.py:234] Uniref90 MSA size: 88 sequences.
I0707 10:16:17.357752 22502113127296 pipeline.py:235] BFD MSA size: 1471 sequences.
I0707 10:16:17.357936 22502113127296 pipeline.py:236] MGnify MSA size: 37 sequences.
I0707 10:16:17.358096 22502113127296 pipeline.py:237] Final (deduplicated) MSA size: 1585 sequences.
I0707 10:16:17.358629 22502113127296 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 1.
I0707 10:16:17.496235 22502113127296 run_alphafold.py:216] Running model model_1_multimer_v3_pred_0 on eiag13
I0707 10:16:17.497090 22502113127296 model.py:165] Running predict with shape(feat) = {'aatype': (1213,), 'residue_index': (1213,), 'seq_length': (), 'msa': (1585, 1213), 'num_alignments': (), 'template_aatype': (4, 1213), 'template_all_atom_mask': (4, 1213, 37), 'template_all_atom_positions': (4, 1213, 37, 3), 'asym_id': (1213,), 'sym_id': (1213,), 'entity_id': (1213,), 'deletion_matrix': (1585, 1213), 'deletion_mean': (1213,), 'all_atom_mask': (1213, 37), 'all_atom_positions': (1213, 37, 3), 'assembly_num_chains': (), 'entity_mask': (1213,), 'num_templates': (), 'cluster_bias_mask': (1585,), 'bert_mask': (1585, 1213), 'seq_mask': (1213,), 'msa_mask': (1585, 1213)}
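The last log line already encodes the problem size: the `msa` feature has shape (MSA depth, sequence length). A small stdlib sketch, with hypothetical helper names `feature_shapes` and `summarize`, can pull those numbers out of such a line; it assumes the `shape(feat) = {...}` format shown in this log, which may differ between AlphaFold versions.

```python
import ast
import re

# Captures the dict printed after "Running predict with shape(feat) = ".
SHAPE_RE = re.compile(r"shape\(feat\) = (\{.*\})")


def feature_shapes(log_line: str) -> dict:
    """Return the feature-shape dict logged by AlphaFold, or {} if the line has none."""
    match = SHAPE_RE.search(log_line)
    if not match:
        return {}
    # The dict holds only string keys and integer tuples, so literal_eval can parse it.
    return ast.literal_eval(match.group(1))


def summarize(log_line: str) -> tuple:
    """Return (sequence length, MSA depth) taken from the logged 'msa' shape."""
    shapes = feature_shapes(log_line)
    n_seqs, n_res = shapes["msa"]
    return n_res, n_seqs
```

Applied to the `model.py:165` line above, `summarize` returns `(1213, 1585)`: a 1213-residue query with 1585 MSA sequences.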
There is no error in what you posted. Closer to the top of the file, in the part you are not showing, the log will tell you which device is used for folding.
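A quick way to find those lines in a long log is to scan only the beginning of the file for device-related words. This is a sketch under an assumption: the keyword list below is a guess, since JAX and TensorFlow word their device messages differently across versions, and `device_lines` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Assumed keywords: device messages usually mention one of these words,
# but the exact wording depends on the JAX/TensorFlow version.
DEVICE_PATTERN = re.compile(r"\b(gpu|tpu|cuda|cpu|device)\b", re.IGNORECASE)


def device_lines(lines: Iterable[str], max_lines: int = 200) -> List[str]:
    """Return device-related lines from the first `max_lines` lines of a log."""
    hits = []
    for i, line in enumerate(lines):
        if i >= max_lines:
            break  # device messages appear near the top, so stop early
        if DEVICE_PATTERN.search(line):
            hits.append(line.rstrip("\n"))
    return hits
```

Pass it an open file handle for the job's error log, e.g. `device_lines(open(path))`; if the matches say the run fell back to CPU, the GPU was never used and no amount of host memory will make a 2000-residue job finish in 24 hours.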