Hello,
I did variant calling of 200 genotypes with freebayes. I filtered on the DP and GQ values, and the genotypes that did not pass the filter were set to ./. (missing). I now want to impute these filtered VCF files with BEAGLE v5.1, but it gives me the following error:
java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample Sample_1469 at marker [chr4A 305381905 . G A]
What could be the problem? I had a look at the position and it looks like this: .:.:.:.:.:.:.:.:.
This is missing data. Could it be the reason? If yes, how could I deal with this?
Hey, yes, in the end I changed all the . entries to ./. In my case the . really is missing data, while ./. is the missing genotype after I filtered the VCF. I found an answer on how to change it here on BioStars, but I cannot find the thread to give it credit:
Sorry to bring up this issue again, but I was wondering whether anyone has been able to find out why this occurs?
I have the same issue: one sample (out of 800) at one variant site (out of 20k) is called as ".:0,0:3:.:0|1:74380344_TAAAA_T:0,0,0:74380344", which makes it look as if it was called in haploid form.
Rather than apply a fix afterwards, I'd prefer to prevent this from happening.
I got the same problem (.) when running Beagle after using vcftools to thin my data (the entries were previously ./.). The original, un-thinned VCF file works fine. I don't know the reason.
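The workaround mentioned in this thread (rewriting a bare . genotype as ./.) could look roughly like the sketch below. It is not the command from the uncredited BioStars answer, which is not reproduced here. The function names and the `ploidy` parameter are made up for illustration, and the sketch assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT key.

```python
def fix_missing_genotypes(line: str, ploidy: int = 2) -> str:
    """Return a VCF line in which a bare '.' genotype is written as './.'.

    Header lines, and records whose FORMAT column does not start with GT,
    are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10 or fields[8].split(":")[0] != "GT":
        return line
    missing = "/".join(["."] * ploidy)  # './.' for diploid data
    for i in range(9, len(fields)):  # sample columns start at index 9
        parts = fields[i].split(":")
        if parts[0] == ".":
            parts[0] = missing
            fields[i] = ":".join(parts)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


def fix_vcf(in_handle, out_handle, ploidy: int = 2) -> None:
    """Copy a VCF from one text handle to another, fixing each record."""
    for line in in_handle:
        out_handle.write(fix_missing_genotypes(line, ploidy))
```

Calling `fix_vcf(sys.stdin, sys.stdout)` from a small script would let it sit in a pipe before Beagle. Note that this only hides the symptom: it does not explain why the upstream step produced a haploid-looking missing call in the first place.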
Could you paste the pattern that you get?
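To find the pattern in a large file, one option is to list every record where some sample's genotype has fewer alleles than the others at the same site. The sketch below does that for an uncompressed VCF. The function names are hypothetical, and treating "fewer alleles than the maximum at that site" as the thing Beagle complains about is an assumption based on the error text, not on Beagle's source.

```python
import re


def allele_count(genotype: str) -> int:
    """Number of alleles in a GT string: '.' -> 1, './.' or '0|1' -> 2."""
    return len(re.split(r"[/|]", genotype))


def inconsistent_records(lines):
    """Yield (chrom, pos, [(sample, gt), ...]) for sites with mixed allele counts.

    Only the samples whose genotype has fewer alleles than the maximum
    observed at that site are listed.
    """
    samples = []
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "#CHROM":
            samples = fields[9:]  # sample names from the header line
            continue
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue
        genotypes = [f.split(":")[0] for f in fields[9:]]
        counts = [allele_count(g) for g in genotypes]
        expected = max(counts)
        odd = [
            (samples[i] if i < len(samples) else str(i), g)
            for i, (g, n) in enumerate(zip(genotypes, counts))
            if n != expected
        ]
        if odd:
            yield fields[0], int(fields[1]), odd
```

Passing an open file handle, for example `inconsistent_records(open("filtered.vcf"))` with a hypothetical file name, gives the positions and samples to paste into the thread. Genuinely haploid calls would be reported too, so the output still needs a manual look.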
Thank you very much for your answer! I've seen that quite a lot of users have doubts about Beagle but not many people are answering them, so your help is really appreciated right now! I'm going to try what you did right away, but just in case, here is my error message:
As you can see, it is similar to the one you got (NC_041312.1 is one of my chromosomes).
And here is the record itself:
NC_041312.1 1098286 . T C 65.76 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=7.35324;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=7.37776;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=82;QR=0;RO=0;RPL=1;RPP=3.0103;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=3.0103;SAR=1;SF=1;SRF=0;SRP=0;SRR=0;TYPE=snp GT:QA:RO:AO:AD:DP:GL:QR . 1/1:82:0:2:0,2:2:-7.77968,-0.60206,0:0
Let's see if you can help me, I hit a wall...
My solution will not help in your case, since it simply replaces the . with ./. Why is your sample named unknown? How many samples do you have in your VCF file? Also, is there a . between QR and 1/1?
I don't know why, but when I merge the different VCF files, the first one is always called unknown. I was just trying with a total of 2 VCF files this time. And yes, it seems there is a . between them. Do you think that could be the problem?
Do your BAM files have read groups? In my opinion this error has something to do with the previous SNP calling step... I do not think it is OK to have an unknown sample name.
I was following this post and nothing appeared, so I imagine I don't have them: https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups
bwa mem -M -t 10 /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/align/prueba/genome/index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R1_001.fastq.gz /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R2_001.fastq.gz | samtools sort -o /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_alignment/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam
where index.fna is my genome, followed by the paired-end FASTQ files and the output BAM. How can I create my read groups?
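One common way to get read groups is to pass a read-group header line to `bwa mem` with its `-R` option at alignment time, so that the sample name (SM) ends up in the BAM and, later, in the VCF header. The sketch below only builds that header string and the argument list. The helper names and the ID and SM values in the usage note are placeholders, not values taken from this thread.

```python
def read_group_header(rg_id, sample, platform="ILLUMINA", library=None):
    """Build the @RG line expected by `bwa mem -R`.

    bwa expands the two-character sequence backslash-t into a tab itself,
    so the separators are written as literal '\\t' here.
    """
    tags = [("ID", rg_id), ("SM", sample), ("PL", platform)]
    if library is not None:
        tags.append(("LB", library))
    return "@RG" + "".join(f"\\t{key}:{value}" for key, value in tags)


def bwa_mem_command(reference, fastq1, fastq2, rg_header, threads=10):
    """Return the bwa mem argument list, suitable for subprocess without a shell."""
    return [
        "bwa", "mem", "-M", "-t", str(threads),
        "-R", rg_header,
        reference, fastq1, fastq2,
    ]
```

For example, `bwa_mem_command("index.fna", "R1.fastq.gz", "R2.fastq.gz", read_group_header("S97_L001", "S97"))` gives a command whose output can be piped to `samtools sort` as before. For BAM files that are already aligned, Picard's AddOrReplaceReadGroups is the usual alternative to re-running the alignment.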
Could you please describe all the steps you are using to do variant calling? For example, I am following this freebayes protocol.
I followed that or another similar link:
freebayes -f index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_gz/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam > /mnt/CIBIO/homes/gabri.mochales/ecoli_SNP_calling/results_SNP_calling/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.vcf
It is quite straightforward; let's see if it helps you.
Hi again. When I run this:
java -jar picard.jar ValidateSamFile \
    I=input.bam \
    MODE=SUMMARY
I get all these errors, also they warn me that there is a missing read group:
Error Type                          Count
ERROR:INVALID_FLAG_FIRST_OF_PAIR    23510
ERROR:INVALID_FLAG_MATE_UNMAPPED    5459
ERROR:INVALID_FLAG_SECOND_OF_PAIR   17307
ERROR:MISSING_READ_GROUP            1
WARNING:RECORD_MISSING_READ_GROUP   797715
I can only suggest following the best-practice workflow for whichever method you use (freebayes, GATK, bcftools/samtools, etc.) and going through all of its steps. At the moment it is very hard for me to understand what you are doing. Read groups are certainly missing, and that is why your sample is called unknown. If you run into problems, create a new question here on the forum; I am sure you will find a solution.
Just for everybody's information: rather than using samtools + bwa + freebayes, I did everything following GATK, and the phasing is working!
If you do everything the right way, it will also work with freebayes, as it did for me.