Question

Imputation with BEAGLE 5.1 giving an inconsistent number of alleles error

1

4.6 years ago

User000 ▴ 710

Hello,

I called variants for 200 genotypes with freebayes. I filtered on the DP and GQ values, and the genotypes that did not pass the filter were set to missing (./.). I now want to impute these filtered VCF files with BEAGLE v5.1, but it gives me the following error:

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample Sample_1469 at marker [chr4A   305381905   .   G   A]

What could be the problem? I had a look at that position, and the sample field looks like this: .:.:.:.:.:.:.:.:. This is missing data. Could that be the reason? If so, how can I deal with it?

freebayes beagle • 5.7k views

ADD COMMENT • link updated 22 months ago by yzliu01 ▴ 10 • written 4.6 years ago by User000 ▴ 710

score 0 · Answer 1 · 2020-05-01

0

4.6 years ago

gubrins ▴ 350

Hey, I'm in the same situation as you. Did you solve it? In my case it is not missing data, since I don't see the pattern you have. If not, let's see if somebody can help us!

ADD COMMENT • link 4.6 years ago by gubrins ▴ 350

2

Hey, yes, in the end I changed every . genotype to the diploid missing form ./. In my case the . really is missing data, while ./. is the missing genotype left after I filtered the VCF. I found the answer for how to change it here on Biostars, but I cannot find the thread to give it credit:

zcat vcf.gz | perl -pe "s/\s\.:/\t.\/.:/g" | bgzip -c > out.vcf.gz
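The Perl one-liner above matches on the whole line. As a rough alternative, here is a minimal Python sketch that rewrites only the GT subfield of the sample columns. The function name `fix_missing_gt` is made up for illustration, and the sketch assumes a tab-delimited VCF in which every sample should be diploid.

```python
def fix_missing_gt(line: str) -> str:
    """Rewrite a haploid missing genotype '.' as './.' in the sample columns.

    Assumes a tab-delimited VCF line and diploid samples (an assumption,
    not something taken from the BEAGLE documentation).
    """
    if line.startswith("#"):
        return line  # header lines pass through untouched
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns to fix
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return line
    gt_index = format_keys.index("GT")
    for i in range(9, len(fields)):
        parts = fields[i].split(":")
        # Only touch the GT subfield, never ID, INFO, or other subfields.
        if gt_index < len(parts) and parts[gt_index] == ".":
            parts[gt_index] = "./."
            fields[i] = ":".join(parts)
    return "\t".join(fields) + newline


def fix_stream(src, dst):
    """Copy VCF lines from src to dst, fixing missing genotypes on the way."""
    for line in src:
        dst.write(fix_missing_gt(line))
```

To use it on a compressed file, one option is to open the input with `gzip.open(path, "rt")`, pass it to `fix_stream` together with an output file handle, and compress the result with bgzip afterwards.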

ADD REPLY • link 4.6 years ago by User000 ▴ 710

0

Sorry to bring up this issue again, but I was wondering whether anyone has been able to find out why this occurs?

I have the same issue, where one sample (out of 800) at one variant site (out of 20k) is called as ".:0,0:3:.:0|1:74380344_TAAAA_T:0,0,0:74380344", making it look as if it was called as haploid.

Rather than apply a fix afterwards, I'd prefer to prevent this from happening.
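To see where such calls come from, it may help to list every genotype that is not written as diploid before running BEAGLE. The sketch below assumes the error is triggered by GT fields whose allele count differs from the expected ploidy, which is what the examples in this thread suggest but is not confirmed here. The names `gt_allele_count` and `find_non_diploid` are hypothetical.

```python
import re


def gt_allele_count(gt: str) -> int:
    """Number of alleles in a GT string: '0/1' -> 2, '0|1' -> 2, '.' -> 1."""
    return len(re.split(r"[/|]", gt))


def find_non_diploid(lines, expected=2):
    """Return (chrom, pos, sample, gt) for every GT with an unexpected allele count.

    `lines` is any iterable of VCF text lines, headers included.
    """
    sample_names = []
    hits = []
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_names = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for offset, sample_field in enumerate(fields[9:]):
            parts = sample_field.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            if gt_allele_count(gt) != expected:
                if offset < len(sample_names):
                    name = sample_names[offset]
                else:
                    name = f"sample_{offset + 1}"
                hits.append((fields[0], int(fields[1]), name, gt))
    return hits
```

Running this on the VCF before and after each filtering or thinning step should show which step introduces the single-allele `.` genotypes.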

ADD REPLY • link 24 months ago by shaunjc • 0

0

I got the same problem (.) when running Beagle after using vcftools to thin my data (the genotypes were previously ./.). The original un-thinned VCF file works well. I don't know the reason.

ADD REPLY • link 22 months ago by yzliu01 ▴ 10

1

Could you paste the pattern that you get?

ADD REPLY • link 4.6 years ago by User000 ▴ 710

0

Thank you very much for your answer! I've seen that quite a lot of users have doubts about Beagle but not many people are answering them, so your help is really appreciated right now! I'm going to try what you did right away, but just in case, here is my error message:

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample unknown at marker [NC_041312.1  1098286 .       T       C]

As you can see, it is similar to the one you got (NC_041312.1 is one of my chromosomes).

And here is the record itself:

NC_041312.1 1098286 . T C 65.76 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=7.35324;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=7.37776;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=82;QR=0;RO=0;RPL=1;RPP=3.0103;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=3.0103;SAR=1;SF=1;SRF=0;SRP=0;SRR=0;TYPE=snp GT:QA:RO:AO:AD:DP:GL:QR . 1/1:82:0:2:0,2:2:-7.77968,-0.60206,0:0

Let's see if you can help me, I hit a wall...

ADD REPLY • link 4.6 years ago by gubrins ▴ 350

1

My solution will not help in your case, since it simply replaces . with ./. Why is your sample named unknown? How many samples do you have in your VCF file? Also, is there a . between QR and 1/1?

ADD REPLY • link 4.6 years ago by User000 ▴ 710

0

I don't know why, but when I merge the different VCF files, the first sample is always called unknown. This time I was testing with just 2 VCF files. And yes, it seems there is a . between them. Do you think that could be the problem?

ADD REPLY • link 4.6 years ago by gubrins ▴ 350

1

Do your BAM files have read groups? In my opinion this error has something to do with the previous SNP calling step... I do not think it is OK to have an unknown sample name.

ADD REPLY • link 4.6 years ago by User000 ▴ 710

0

I followed this post and nothing appeared, so I assume I don't have them. https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

bwa mem -M -t 10 /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/align/prueba/genome/index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R1_001.fastq.gz /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R2_001.fastq.gz | samtools sort -o /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_alignment/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam

where index.fna is my genome, followed by the paired-end FASTQ files and the output BAM. How can I create my read groups?
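One common way to get read groups is to add them at alignment time with the `-R` option of `bwa mem`, which takes a complete `@RG` header line in which a literal `\t` is converted to a tab. The sketch below only builds the command. The helper names, file names, and the ID/SM/PL values are placeholders for illustration and are not taken from this thread.

```python
def read_group_header(rg_id, sample, platform="ILLUMINA", library=None):
    """Build an @RG line for `bwa mem -R`, using a literal backslash-t
    between tags (bwa converts it to a real tab in the SAM header)."""
    tags = [f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}"]
    if library is not None:
        tags.append(f"LB:{library}")
    return "@RG" + "".join("\\t" + tag for tag in tags)


def bwa_mem_command(reference, fastq1, fastq2, rg_header, threads=10):
    """Return the argument list for a paired-end `bwa mem` run with a read group."""
    return [
        "bwa", "mem", "-M",
        "-t", str(threads),
        "-R", rg_header,
        reference, fastq1, fastq2,
    ]


# Placeholder inputs; replace with the real reference and FASTQ paths.
rg = read_group_header(rg_id="sample1_L001", sample="sample1")
cmd = bwa_mem_command("index.fna", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", rg)
print(" ".join(cmd))
```

The resulting command can be piped into `samtools sort` as in the command above. The SM value is the one that matters for sample naming, so each sample should get its own. For BAM files that already exist, Picard's AddOrReplaceReadGroups is an alternative to realigning.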

ADD REPLY • link 4.6 years ago by gubrins ▴ 350

1

Could you please describe all the steps you are using for variant calling? For example, I am following this freebayes protocol.

ADD REPLY • link 4.6 years ago by User000 ▴ 710

0

I followed that or another similar link:

freebayes -f index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_gz/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam > /mnt/CIBIO/homes/gabri.mochales/ecoli_SNP_calling/results_SNP_calling/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.vcf

It is quite straightforward; let's see if it helps.

ADD REPLY • link 4.5 years ago by gubrins ▴ 350

0

Hey again. When I run this:

java -jar picard.jar ValidateSamFile \
    I=input.bam \
    MODE=SUMMARY

I get all these errors, and it also warns me that a read group is missing:

Error Type                            Count
ERROR:INVALID_FLAG_FIRST_OF_PAIR      23510
ERROR:INVALID_FLAG_MATE_UNMAPPED      5459
ERROR:INVALID_FLAG_SECOND_OF_PAIR     17307
ERROR:MISSING_READ_GROUP              1
WARNING:RECORD_MISSING_READ_GROUP     797715

ADD REPLY • link 4.5 years ago by gubrins ▴ 350

1

I can only suggest following the best practices for each method you use (freebayes, GATK, bcftools/samtools, etc.) and completing all of the steps. At the moment I find it hard to understand what you are doing. Read groups are certainly missing, which is why your sample is named unknown. If you run into problems, ask a new question here on the forum; I am sure you will find a solution.
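A quick way to confirm whether a BAM carries sample names is to look at the `@RG` lines of its header, for example the text printed by `samtools view -H`. The following is a small sketch that pulls the SM values out of such header text; the function name `read_group_samples` is hypothetical.

```python
def read_group_samples(header_text: str) -> list:
    """Return the SM (sample) values of all @RG lines in SAM header text."""
    samples = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for tag in line.split("\t")[1:]:
            if tag.startswith("SM:"):
                samples.append(tag[3:])
    return samples
```

An empty result means the header has no read group with a sample name, which is consistent with the unknown sample seen in the merged VCF.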

ADD REPLY • link 4.5 years ago by User000 ▴ 710

0

Just for everybody's information: rather than using samtools + bwa + freebayes, I did everything following GATK, and the phasing is working!

ADD REPLY • link 4.5 years ago by gubrins ▴ 350

1

If you do everything the right way, it will also work with freebayes, as it did for me.

ADD REPLY • link 4.5 years ago by User000 ▴ 710