kirannbishwa01, 6.5 years ago
I am trying to get both the left and right reference genomes for one sample, selected by name, from a phased VCF file.
This works:
bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 1 > ms01e-left.fa
This doesn't work, though, when I try to make the new reference sequence by applying the genotypes of the right haplotype:
bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 2 > ms01e-right.fa
Broken VCF, too few alts at 1:141
What is the issue?
Hello kirannbishwa01,
could you show us the VCF entry at position 1:141?
fin swimmer
@finswimmer:
Here is my VCF data:
The GT field is empty at POS 141 for SAMPLE ms01e, but in that situation it should pick the REF allele by default, shouldn't it? Also, the problem only occurs with -H 2, not with -H 1.
Thanks,
My guess is that at position 141 the alternate allele is supported by only one sample. The imputation might be considering all samples, or a certain percentage of them, and that is probably why it fails. To cross-check, try filling record 141 with dummy data and rerunning the imputation command.
@cpad: I have already tested that, and it works. But I don't want to add a dummy GT there, since on large data that is a lot of work. Because I am trying to create a personal genome here, an empty GT = "." for any sample of interest should automatically be treated as the reference allele. I believe this is just a bug in the program, or a feature that was not finished in the released software.
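If you do end up using cpad's workaround, the dummy genotype does not have to be typed in by hand. Below is a minimal sketch, assuming an uncompressed, diploid VCF on stdin: `fill_missing_gt` is a hypothetical shell function (not part of bcftools) that uses awk to rewrite a missing GT (".", "./." or ".|.") for one sample as `0|0`, i.e. reference on both haplotypes, and leaves every other record untouched.

```shell
# Hypothetical helper, not a bcftools command: for one sample, rewrite a missing
# genotype (".", "./." or ".|.") as the phased homozygous-reference call "0|0".
# Assumes a diploid sample and an uncompressed VCF on stdin.
# Usage: fill_missing_gt ms01e < phased.vcf > phased.filled.vcf
fill_missing_gt() {
  awk -v s="$1" 'BEGIN { FS = OFS = "\t"; col = 0 }
    /^##/ { print; next }                  # meta-information lines pass through
    /^#CHROM/ {                            # header line: locate the sample column
      for (i = 10; i <= NF; i++) if ($i == s) col = i
      print; next
    }
    {
      if (col > 0) {
        nf = split($9, fmt, ":")           # FORMAT keys, e.g. GT:DP
        gi = 0
        for (k = 1; k <= nf; k++) if (fmt[k] == "GT") gi = k
        if (gi > 0) {
          nv = split($col, val, ":")       # values for this sample, e.g. ./.:7
          g = (gi <= nv) ? val[gi] : "."
          if (g == "." || g == "./." || g == ".|.") {
            for (k = nv + 1; k < gi; k++) val[k] = "."   # pad dropped trailing fields
            val[gi] = "0|0"
            if (gi > nv) nv = gi
            out = val[1]
            for (k = 2; k <= nv; k++) out = out ":" val[k]
            $col = out                     # reassigning a field rebuilds the record with tabs
          }
        }
      }
      print
    }'
}
```

For example, `zcat phasedVCF-short.vcf.gz | fill_missing_gt ms01e | bgzip > ms01e-filled.vcf.gz`; the result would then need to be indexed (e.g. with `bcftools index`) before `bcftools consensus` can read it. Hard-coding `0|0` is a deliberate simplification: it only makes sense for diploid samples.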
This is not stated in the documentation. You could try to include only sites that have genotype information.
fin swimmer
@finswimmer: I am still getting exactly the same error message.
@fin swimmer: I have been trying to fix this issue by playing with -i and -e, following the manual, but I still cannot find the fix.
Hm,
it seems that -i and -e don't look only at the specified sample. What might work is to first filter out the sites where the sample has no genotype using bcftools view, and then run bcftools consensus. According to your question here, this could also work:
fin swimmer
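fin swimmer's suggestion is to drop the sites where the sample has no genotype before running consensus. As a stand-in for that filtering step, here is a sketch under the same assumptions as above (uncompressed VCF on stdin): `drop_missing_gt` is a hypothetical awk-based shell function, not a bcftools option, that removes records where the chosen sample's GT is missing (or absent from FORMAT) and keeps every header line. Because it looks only at the named sample's column, it sidesteps the concern that -i/-e expressions consider other samples too.

```shell
# Hypothetical helper, not a bcftools command: drop records where the given
# sample has no genotype (missing GT, or no GT key in FORMAT), keeping all
# header lines. Assumes an uncompressed VCF on stdin.
# Usage: drop_missing_gt ms01e < phased.vcf > phased.called.vcf
drop_missing_gt() {
  awk -v s="$1" 'BEGIN { FS = "\t"; col = 0 }
    /^#/ {                                  # header lines always pass through
      if ($1 == "#CHROM") for (i = 10; i <= NF; i++) if ($i == s) col = i
      print; next
    }
    col > 0 {
      nf = split($9, fmt, ":")              # FORMAT keys, e.g. GT:DP
      gi = 0
      for (k = 1; k <= nf; k++) if (fmt[k] == "GT") gi = k
      nv = split($col, val, ":")
      g = (gi > 0 && gi <= nv) ? val[gi] : "."
      if (g == "." || g == "./." || g == ".|.") next   # no genotype: skip record
    }
    { print }'
}
```

The output would then be compressed with `bgzip` and indexed, like `phasedVCF-short.vcf.gz`, before running `bcftools consensus ... -s ms01e -H 2` on it. Whether this avoids the error depends on what actually triggers it, which the thread has not settled.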
I will try that tomorrow. I took a different approach to this problem, but another issue came up, so I raised a GitHub issue. I'd appreciate it if you could shed light on that problem. Thanks for being helpful!
What version of bcftools are you running? Can you try the latest GitHub version? I believe it should work; the tests in contain cases with missing genotypes.
Meanwhile, I'm pretty sure that the ploidy is the problem. It could be that just fixing this is enough, and there is no need to replace the missing genotype with 0|0.
fin swimmer
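The ploidy hypothesis can be illustrated with a toy function. This is not bcftools source code and does not reproduce its internals; it only shows that a haploid missing genotype "." has a single allele slot, so asking for the second haplotype has nothing to return, whereas a diploid missing genotype ".|." still has two slots. The function name `pick_haplotype_allele` is hypothetical.

```shell
# Toy illustration only (not bcftools code): return the allele that haplotype
# number HAP (1 = left, 2 = right) would take from a genotype string GT.
# Usage: pick_haplotype_allele GT HAP
pick_haplotype_allele() {
  printf '%s\n' "$1" | awk -v h="$2" '{
    n = split($0, a, "[|/]")     # alleles are separated by | (phased) or / (unphased)
    if (h < 1 || h > n) {        # "." has one slot, so haplotype 2 does not exist
      print "too few alleles for haplotype " h > "/dev/stderr"
      exit 1
    }
    print a[h]
  }'
}
```

Under this reading, `pick_haplotype_allele '.' 1` succeeds while `pick_haplotype_allele '.' 2` fails, which would match -H 1 working and -H 2 erroring if "." at 1:141 is being read as a haploid call.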
@fin swimmer: Did you have a chance to look at this issue?