I'm trying to annotate my VCF file with wAnnovar and have received the following message:
User input contains 33631 lines
WARNING: 33574 invalid alternative alleles found in the input file
Annovar still produces files with results.
I found a comment on the Annovar author's website saying that this is a memory error and that it can be solved by extracting a single sample from the multi-sample VCF file. How do I do that? Or is there a different problem and solution? Can I simply ignore the warning?
Any help is greatly appreciated. I am new to working with VCF files and bioinformatics in general.
this is a memory error, and it is solved by splitting a vcf file from the multi-sample file for it to contain only one sample.
If it is a memory error, a better way to address it would be splitting the VCF by chromosome instead of by sample. Annotation is per locus and multiple samples might have variants at the same locus. You want to annotate each locus once; so split by locus, not by sample.
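As a rough illustration of splitting by chromosome, here is a minimal Python sketch. It assumes an uncompressed, tab-delimited VCF small enough to hold in memory; the function names and the output naming scheme (`prefix.CHROM.vcf`) are made up for this example and are not part of ANNOVAR. Every output file gets a copy of the header so it remains a valid VCF on its own.

```python
from collections import defaultdict


def split_vcf_by_chromosome(vcf_lines):
    """Group VCF lines by the CHROM column, repeating the header in each group."""
    header = []
    records = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information lines (##...) and the #CHROM column header.
            header.append(line)
        elif line.strip():
            # CHROM is the first tab-separated field of a data line.
            chrom = line.split("\t", 1)[0]
            records[chrom].append(line)
    return {chrom: header + body for chrom, body in records.items()}


def write_chromosome_files(vcf_path, prefix):
    """Write one file per chromosome (hypothetical naming: prefix.CHROM.vcf)."""
    with open(vcf_path) as handle:
        parts = split_vcf_by_chromosome(handle)
    for chrom, lines in parts.items():
        with open(f"{prefix}.{chrom}.vcf", "w") as out:
            out.writelines(lines)
```

For example, `write_chromosome_files("input.vcf", "input")` would produce files such as `input.chr1.vcf`, each of which can be submitted for annotation separately.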
I used wANNOVAR until some time ago, and I remember that it has a file size limit of 50 MB. If your file exceeds the limit, you could try compressing it into a format the site accepts (I don't know what error it gives when the file is too large).
wANNOVAR is also long outdated, and the authors suggest using ANNOVAR instead. If you are going to annotate often, I suggest downloading ANNOVAR and running it offline. The installation steps are documented in detail, and the databases are well maintained.
Lastly, a colleague recently asked me for help because wANNOVAR was down and was not accepting any files. Check that it is up and running again by submitting the first few lines of your file as a sample input.
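To build such a sample input, the header has to be kept along with the first few records, otherwise the result is no longer a valid VCF. A minimal Python sketch, assuming an uncompressed VCF; the function name `vcf_head` and the default of 10 records are arbitrary choices for this example:

```python
def vcf_head(vcf_lines, n_records=10):
    """Return all header lines plus the first n_records data lines."""
    sample = []
    kept = 0
    for line in vcf_lines:
        if line.startswith("#"):
            # Header lines always come before the data lines in a VCF.
            sample.append(line)
        elif line.strip():
            if kept >= n_records:
                break
            sample.append(line)
            kept += 1
    return sample
```

With a file on disk this could be used as `open("sample.vcf", "w").writelines(vcf_head(open("input.vcf")))`, where both file names are placeholders.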
Your input file is in VCF format, correct? Some of the threads found by searching offer alternative explanations for the error: https://www.google.com/search?q=invalid+alternative+alleles+found+in+the+input+file
I'm sorry, I didn't get what you meant by that
They're asking if your input is in VCF format. The question is worded a little oddly.
Yes, it's a VCF file
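Since the warning is about alternative alleles, it may also help to look at what the ALT column (the fifth column) actually contains. The sketch below tallies ALT alleles that are not plain nucleotide strings, such as symbolic alleles in angle brackets, `*`, or `.`. Which of these ANNOVAR counts as "invalid" is an assumption here, not something confirmed in this thread; the function name is made up for the example.

```python
import re
from collections import Counter

# Assumption: an "ordinary" ALT allele is a string of A, C, G, T or N.
PLAIN_ALLELE = re.compile(r"[ACGTNacgtn]+")


def summarize_alt_alleles(vcf_lines):
    """Count plain ALT alleles and tally every other ALT value seen."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            counts["too_few_columns"] += 1
            continue
        # Multiple ALT alleles at one site are comma-separated.
        for allele in fields[4].split(","):
            if PLAIN_ALLELE.fullmatch(allele):
                counts["plain"] += 1
            else:
                counts[allele] += 1
    return counts
```

If nearly every record turns out to carry the same non-nucleotide ALT value, that would point to the content of the file rather than to memory as the source of the warning.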