How to extract allele, genotype from a VCF file using Python or another language for 23 GB files?
Well, I am able to write a script to get the alleles, but for a large VCF file it's difficult. What other possible ways are there to get allele and genotype information?
See bcftools query.
EDIT: With bcftools query you can print any information you like. So in your case, e.g.:
$ bcftools query -f '%CHROM %POS %REF %ALT [ %GT]\n' input.vcf
The output now looks like this:
chr1 10177 ACC ACCC 0/1
chr1 10327 T C 0/0
chr1 10352 TAC TAAC 1/1
chr1 12783 G A 1/1
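If you then want to work with that output in Python, it can be read one line at a time, so the 23 GB file never has to fit in memory. The following is only a minimal sketch, assuming the whitespace-separated format string used above; the function names `parse_query_line` and `stream_records` are made up for illustration and are not part of bcftools.

```python
import re


def parse_query_line(line):
    # Expects one line produced by the format string
    # '%CHROM %POS %REF %ALT [ %GT]\n' (assumption: fields are whitespace-separated).
    chrom, pos, ref, alt, *gts = line.split()
    # Allele index 0 is REF, 1..n are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    # Replace each allele index with its sequence; "/", "|" and "." are kept as-is.
    decoded = [re.sub(r"\d+", lambda m: alleles[int(m.group())], gt) for gt in gts]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "gt": gts, "alleles": decoded}


def stream_records(lines):
    # Lazily parse any iterable of lines (a file object, sys.stdin, ...).
    for line in lines:
        if line.strip():
            yield parse_query_line(line)
```

One way to use it would be to pipe the bcftools output into a script that loops over `stream_records(sys.stdin)`, which processes one record at a time.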
fin swimmer
Hello Ram,
If an "answer" is only meant to be a full copy-and-paste solution, then my post is indeed more of a comment. But I thought that naming the tool with its subcommand and linking to the good manual was answer enough.
I have now extended my post into a full answer :)
cpad was faster than me, right. I didn't see his answer because I hadn't reloaded the page.
fin swimmer
Extracting genotype information using R.
library(vcfR)
vcf <- read.vcfR(vcf_file, verbose = FALSE )
gt <- extract.gt(vcf, element = c('GT'), as.numeric = TRUE)
For Python, take a look at the following article.
http://alimanfoo.github.io/2017/06/14/read-vcf.html
Genotypes can also be extracted using SnpSift.jar in snpEff with the following command.
java -jar ../snpEff/SnpSift.jar extractFields annotated.vcf CHROM POS REF ALT "GEN[*].GT" > output.tsv
It doesn't look like vcfR does a streaming read, so I would not recommend it: building an in-memory object of an entire VCF file is not a great idea. A better strategy would be to use closer-to-bare-metal tools such as bcftools to extract the information, then use R or Python to compute on the extracted information.
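For comparison, a streaming read is also possible in plain Python without building an in-memory object. The sketch below is an illustration only, using the standard library and assuming a standard tab-separated VCF (optionally gzip/bgzip compressed); `open_vcf` and `iter_genotypes` are hypothetical helper names, not functions from vcfR, bcftools, or any other package.

```python
import gzip


def open_vcf(path):
    # Open a plain or gzip/bgzip-compressed VCF in text mode.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def iter_genotypes(lines):
    # Yield (chrom, pos, ref, alt, {sample: GT}) one record at a time.
    samples = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines and blank lines
        chrom, pos, _id, ref, alt = fields[:5]
        genotypes = {}
        if len(fields) > 9:
            fmt = fields[8].split(":")
            if "GT" in fmt:
                gt_idx = fmt.index("GT")
                genotypes = {s: col.split(":")[gt_idx]
                             for s, col in zip(samples, fields[9:])}
        yield chrom, int(pos), ref, alt, genotypes
```

Something like `with open_vcf("input.vcf.gz") as fh: for rec in iter_genotypes(fh): ...` would then hold only one line in memory at a time, although a compiled tool such as bcftools is likely to be considerably faster on a file of this size.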
Try bcftools query.
How about VCFtools?
Why is this a tool post? A question about tools should be a question-type post, not a tool-type post.
What have you tried?
May help the user (AWK ideas): A: How to get sample names and genotype for SNP in multi-sample VCF file
Actually, I have a Python script that can parse a VCF, in fact: Filtering VCF with python
Why have you replied to my comment, Kevin?
I did not want to create yet another, fourth, independent comment.
You can take a look at these two scripts, written in Python, to split a VCF and select what you want: A: VCF file help and C: parsing vcf file