I have a VCF file from which I want to extract the genotypes and convert them into a 0, 1, 2 matrix. Is there a way to extract only the unphased genotypes using PLINK or VCFtools, or maybe with grep?
Thank you in advance.
Using my tool BioAlcidae (https://github.com/lindenb/jvarkit/wiki/BioAlcidae):
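// Print one row per variant: contig, position, then one code per sample.
while(iter.hasNext())
    {
    var ctx = iter.next();
    out.print(ctx.contig);
    out.print("\t");
    out.print(ctx.start);
    for(var i = 0; i < ctx.getNSamples(); ++i)
        {
        var g = ctx.getGenotype(i);
        out.print("\t");
        // Only called, unphased genotypes get a 0/1/2 code.
        if(g.isCalled() && !g.isPhased())
            {
            if(g.isHomVar())
                {
                out.print("2");
                }
            else if(g.isHomRef())
                {
                out.print("0");
                }
            else if(g.isHet() && !g.isHetNonRef())
                {
                out.print("1");
                }
            else
                {
                // Called but not codable, e.g. heterozygous non-reference (1/2).
                out.print("9");
                }
            }
        else
            {
            // Missing or phased genotype.
            out.print("9");
            }
        }
    out.println();
    }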
Usage (with the script above saved as filter.js):
java -jar dist-1.133/bioalcidae.jar -F VCF -f filter.js your.vcf
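For readers who want to see the same coding rule without BioAlcidae, here is a minimal standalone sketch in plain JavaScript (Node.js). It is not part of jvarkit: the function names gtToCode and vcfToMatrix and the demo data are made up for illustration. It assumes diploid calls, tab-separated columns, and GT as the first FORMAT key; anything phased ("|"), missing, non-diploid, or heterozygous non-reference is written as 9, mirroring the script above.

```javascript
// Hypothetical helper: map one sample field (e.g. "0/1:35") to 0, 1, 2 or 9.
function gtToCode(sampleField) {
  const gt = sampleField.split(":")[0];    // assumes GT is the first FORMAT key
  if (gt.includes("|")) return "9";        // phased call: excluded
  const alleles = gt.split("/");
  // Assumption: only diploid calls are coded; missing alleles give 9.
  if (alleles.length !== 2 || alleles.includes(".")) return "9";
  const [a, b] = alleles;
  if (a === "0" && b === "0") return "0";  // homozygous reference
  if (a === b) return "2";                 // homozygous variant
  if (a === "0" || b === "0") return "1";  // heterozygous with one reference allele
  return "9";                              // heterozygous non-reference, e.g. 1/2
}

// Hypothetical helper: turn VCF text into rows of "CHROM<TAB>POS<TAB>codes...".
function vcfToMatrix(vcfText) {
  const rows = [];
  for (const line of vcfText.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue;  // skip header lines
    const cols = line.split("\t");
    // Columns 0 and 1 are CHROM and POS; sample columns start at index 9.
    rows.push([cols[0], cols[1], ...cols.slice(9).map(gtToCode)].join("\t"));
  }
  return rows;
}

// Made-up demo data with three samples.
const demo = [
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
  "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t1|1",
  "1\t200\t.\tC\tT,G\t.\t.\t.\tGT:DP\t1/1:12\t1/2:8\t./.:0",
].join("\n");

console.log(vcfToMatrix(demo).join("\n"));
```

A real VCF parser handles more edge cases (haploid calls, mixed ploidy, compressed input), so treat this only as an illustration of the coding rule.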
It's cool, I figured out how to print the names with the example VCF script on the bioalcidae homepage. :) On a different note, though, I think there's a typo in the script above - I believe the "g.isHomRef()" clause should print a 0, not a 1, and vice-versa with the "g.isHet() && !g.isHetNonRef()" statement.
How can I get the filter.js file?
Not clear. What is the result of "I've already extracted all of the genotypes"?
Sorry, I rephrased. Thanks for pointing that out.