6.3 years ago
BAGeno
Hi,
I have a VCF of 1000 samples, but the allele number differs from site to site. Some samples have dots instead of 0 and 1 in the genotype column. Can anyone please tell me how I should correct the problem of differing allele numbers?
Hello BAGeno,
You can do this with bcftools:
There are more options available that might be useful. Have a look at:
fin swimmer
EDIT:
I moved my post to a comment, because I first thought that you have "." as the genotype and wanted "./.". If so, the above solution should work (and I can move my post back to an answer). If you already have "./." in your VCF, then see the comment by Ram on what this means.
TIL! I did not know this. What does this do exactly?
I cannot tell you much more than what the help file says:
So one can use it, for example, so that male samples have genotypes on the X chromosome like 0 and 1, while females have 0/0, 0/1, 1/1.
I have ./. in my VCF. Should I remove these calls from my VCF? I have to do different population analyses. I did not call the variants myself, so I cannot do anything at that step.
This mainly depends on what exactly your goal is and how many samples have no calls in regions where that variant was found.
Without knowing this there is no general answer.
fin swimmer
I want to do a population analysis: whether a certain disease variant is present in the population or not. Also, can you please tell me how I should check this?
One way is to use gatk VariantsToTable.
Or with awk (inspired by Kevin):
fin swimmer
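For anyone who wants to see the idea spelled out, here is a minimal Python sketch of the same kind of check: counting, per site, how many samples have no call. It is not the gatk or awk command from this thread. The function name `count_no_calls` and the demo records are made up for illustration, and it assumes an uncompressed, tab-separated VCF with a GT entry in the FORMAT column.

```python
def count_no_calls(vcf_lines):
    """Yield (chrom, pos, n_missing, n_samples) for each variant record.

    Hypothetical helper: assumes plain-text, tab-separated VCF lines.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns on this line
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype information for this record
        gt_idx = fmt.index("GT")
        samples = fields[9:]
        n_missing = 0
        for sample in samples:
            gt = sample.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            # A no-call is a genotype in which every allele is "."
            if all(a == "." for a in alleles):
                n_missing += 1
        yield fields[0], int(fields[1]), n_missing, len(samples)


# Made-up demo records, only to show the shape of the input.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\t1/1\n",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t.|.:0\t0/0:12\n",
]

for chrom, pos, n_missing, n_samples in count_no_calls(demo):
    print(chrom, pos, f"{n_missing}/{n_samples} samples without a call")
```

For a real file you could pass an open file handle instead of the `demo` list. Sites where most samples have no call are the ones to look at before deciding whether to filter.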
./. is where the caller could not confidently call a genotype. I don't think there is much you can do computationally to address that, unless you had stringent filters set in the current call.
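This is also why the allele number differs between sites: AN counts only the alleles in called genotypes, so every ./. lowers it by two for a diploid sample. A rough Python sketch of that bookkeeping follows. `allele_counts` is a hypothetical helper, and as a simplification it lumps all non-reference alleles into one count, whereas a real VCF reports AC per ALT allele.

```python
def allele_counts(genotypes):
    """Return (AC, AN) for one site from a list of GT strings.

    AN counts every called allele; AC counts called non-reference alleles
    (all ALT alleles lumped together, which is a simplification).
    Missing alleles (".") contribute to neither.
    """
    ac = 0
    an = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # no-call: not counted in AN
            an += 1
            if allele != "0":
                ac += 1
    return ac, an


# Four samples: two diploid calls, one no-call, one haploid call.
ac, an = allele_counts(["0/1", "./.", "1/1", "0"])
print(ac, an)  # 3 5
```

With four fully called diploid samples AN would be 8. Here it is 5, because the no-call contributes nothing and the haploid call contributes one allele. A varying AN is therefore expected rather than an error, and frequencies computed as AC/AN already ignore the missing calls.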