Annotating with dbNSFP via SnpSift on a cluster
kanika.151 · 5.1 years ago

Hello all, I know the command itself works, since it ran on the cluster for smaller chromosomes such as 21 or Y (3 nodes, 24 threads). The larger chromosomes are not getting processed, and -Xmx64g is the maximum heap I can give. Should I increase the number of nodes or threads, or do something else? dbNSFP version 4.0a.

SnpSift -Xmx64g dbnsfp -f ref,alt,aaref,aaalt,rs_dbSNP151,aapos,genename,Ensembl_geneid,Ensembl_transcriptid,Ensembl_proteinid,Uniprot_acc,Uniprot_entry,HGVSc_ANNOVAR,HGVSp_ANNOVAR,HGVSc_snpEff,HGVSp_snpEff,HGVSc_VEP,HGVSp_VEP,GENCODE_basic,VEP_canonical,cds_strand,refcodon,codonpos,codon_degeneracy,Ancestral_allele,AltaiNeandertal,Denisova,VindijiaNeandertal,SIFT_score,SIFT_converted_rankscore,SIFT_pred,SIFT4G_score,SIFT4G_converted_rankscore,SIFT4G_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,LRT_Omega,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,MutationAssessor_score,MutationAssessor_rankscore,MutationAssessor_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,VEST4_score,VEST4_rankscore,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,MutPred_score,MutPred_rankscore,MutPred_protID,MutPred_AAchange,MutPred_Top5features,Aloft_Fraction_transcripts_affected,Aloft_prob_Tolerant,Aloft_prob_Recessive,Aloft_prob_Dominant,Aloft_pred,Aloft_Confidence,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore -db dbNSFP4.0a_{chr}_hg19.txt.gz input.{chr}.vcf > output.{chr}.vcf
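For context, a per-chromosome PBS submission along these lines might look like the following (a sketch only: the jar path, queue limits, walltime, and the abbreviated -f field list are assumptions, and the Java heap has to fit inside the memory requested from the scheduler):

#!/bin/bash
#PBS -N snpsift_dbnsfp_chr1
#PBS -l nodes=1:ppn=4,mem=64gb
#PBS -l walltime=24:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
CHR=1   # hypothetical: submit one job per chromosome

# Leave some headroom between the Java heap and the memory granted by PBS
java -Xmx60g -jar SnpSift.jar dbnsfp \
    -db dbNSFP4.0a_${CHR}_hg19.txt.gz \
    -f SIFT_score,Polyphen2_HDIV_score,CADD_phred \
    input.${CHR}.vcf > output.${CHR}.vcf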
next-gen dna-seq annotation dbnsfp snpsift

What error are you getting? What scheduler/workload manager are you submitting to?


Error: java.lang.OutOfMemoryError: Java heap space. Scheduler: PBS

Brice Sarver · 5.1 years ago

If your only issue is running out of memory, and you have already reserved the maximum memory available per node with an -l mem=SIZE request as large as it can be, there's not a whole lot more you can do on that front. Instead, try using something like split to break your VCFs into more manageable files, then combine them back into a single VCF at the end.
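One way to do that (a sketch, assuming an uncompressed per-chromosome VCF, GNU split for the -d option, and made-up chunk/file names):

# Keep the header aside, then split the records into ~1M-line chunks
grep '^#' input.1.vcf > header.txt
grep -v '^#' input.1.vcf | split -l 1000000 -d - chunk_

# Re-attach the header to each chunk and annotate it
for c in chunk_??; do
    cat header.txt "$c" > "${c}.vcf"
    java -Xmx64g -jar SnpSift.jar dbnsfp -db dbNSFP4.0a_1_hg19.txt.gz \
        "${c}.vcf" > "${c}.annotated.vcf"
done

# Recombine: take the header (including the INFO lines SnpSift added) from the
# first annotated chunk, then append the records from every chunk
( grep '^#' chunk_00.annotated.vcf
  for c in chunk_??.annotated.vcf; do grep -v '^#' "$c"; done ) > output.1.annotated.vcf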


I have already set it to the maximum memory size per node, and the VCF files are already split by chromosome. Do you mean I should split them further? :|


Yes - if your data are from gnomAD, as you mentioned below, you're dealing with a ton of sites per chromosome for just the WES dataset and far more for the WGS dataset. I would test one or two partitions of, say, chromosome 1 and see what the memory footprint is. It may also help to bring up an interactive session on your cluster, if your configuration and administrators allow it.
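For example (a sketch; the queue/resource string is specific to your site, and /usr/bin/time needs to be the GNU version for -v to work):

# Ask PBS for an interactive shell on a compute node
qsub -I -l nodes=1:ppn=4,mem=64gb -l walltime=04:00:00

# Run one chunk under GNU time and check "Maximum resident set size" in the report
/usr/bin/time -v java -Xmx60g -jar SnpSift.jar dbnsfp \
    -db dbNSFP4.0a_1_hg19.txt.gz chunk_00.vcf > chunk_00.annotated.vcf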

kanika.151 · 5.1 years ago

Done!

I had forgotten to index each database file, as Pablo said: tabix -s 1 -b 2 -e 2 "$file"
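For anyone else hitting this, indexing all of the per-chromosome database files in one go might look like the following (a sketch; the files must be bgzip-compressed for tabix, and the glob is an assumption about how they are named):

# -s/-b/-e point tabix at the chromosome and position columns of the dbNSFP table
for file in dbNSFP4.0a_*_hg19.txt.gz; do
    tabix -s 1 -b 2 -e 2 "$file"
done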

Thanks.
