Extracting information from my VCF file
8 months ago
Chrom Pos ID  REF ALT     QUAL    FILTER  INFO    FORMAT  EPISL_12878 [other samples...]
NC_045512.2 2   NC_045512.2_2_T_C   T   C   .   PASS ANN=C|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01|protein_coding||c.-264T> GT   .   .   .   .   .   1   .   .   .

I'm new to working with VCF data. I merged the VCF files for all my samples into a single combined VCF file, in which each sample's call at a site is shown as '.' or '1'. How can I extract mutations (for example, all mutations in the 'ORF1ab' gene) from this file?

If merging VCF files isn't recommended, how can I efficiently extract information from the individual VCF files, considering I have over 5,000 samples? Thank you.


"If merging VCF files isn't recommended"

Where did you get that idea from?

Please read the bcftools manual for a multitude of ways to work with VCF files.

bcftools view --regions-file bed_for_your_gene.bed indexed.vcf.gz
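If you would rather select by the gene name in the annotation than by coordinates, here is a minimal Python sketch. It assumes the INFO column carries a SnpEff-style ANN field whose fourth pipe-separated entry is the gene name, as in the ORF1ab record in the question, and that each sample column starts with its genotype. The function name `variants_in_gene`, the sample names S1 to S3 and the demo records are made up for illustration; they are not from the poster's data.

```python
import re

def variants_in_gene(vcf_lines, gene):
    """Yield (chrom, pos, ref, alt, carriers) for records whose ANN field names `gene`."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        # Pull the ANN value out of the semicolon-separated INFO column.
        ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), "")
        # Assumption: SnpEff layout, gene name is the 4th '|'-separated entry.
        genes = {a.split("|")[3] for a in ann.split(",") if a.count("|") >= 3}
        if gene not in genes:
            continue
        carriers = []
        for name, call in zip(samples, fields[9:]):
            gt = call.split(":")[0]  # assumption: GT is the first FORMAT key
            # A carrier has at least one non-missing, non-reference allele index.
            if any(a.isdigit() and a != "0" for a in re.split(r"[/|]", gt)):
                carriers.append(name)
        yield chrom, int(pos), ref, alt, carriers

# Made-up demo records in the same shape as the merged file in the question.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "NC_045512.2\t2\t.\tT\tC\t.\tPASS\tANN=C|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01\tGT\t.\t1\t.",
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tANN=G|missense_variant|MODERATE|S|GU280_gp02\tGT\t1\t1\t.",
]
hits = list(variants_in_gene(demo, "ORF1ab"))
for hit in hits:
    print(hit)
```

For a real file you would pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the `demo` list. With thousands of samples, bcftools will usually be much faster; this only shows the logic.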

Just a note - if multiallelic sites are not split in your VCF, then you don't have only "." and "1"; you can have 2, 3... referring to the second, third etc. alt alleles.
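To make the allele numbering concrete, here is a small Python sketch of how a genotype index maps back to an allele: 0 is REF, 1 is the first ALT, 2 the second, and so on, with '.' meaning no call. The helper name `called_alleles` and the example values are hypothetical, and it assumes the genotype is the first colon-separated entry of the sample column.

```python
import re

def called_alleles(ref, alt, sample_field):
    """Translate a sample's GT indices into allele strings (None for a missing call)."""
    alleles = [ref] + alt.split(",")   # index 0 = REF, 1..n = ALT alleles in order
    gt = sample_field.split(":")[0]    # assumption: GT is the first FORMAT key
    return [
        alleles[int(idx)] if idx.isdigit() else None
        for idx in re.split(r"[/|]", gt)  # handles haploid, unphased and phased calls
    ]

# Hypothetical multiallelic site: REF=T, ALT=C,G
print(called_alleles("T", "C,G", "1"))    # index 1 is the first ALT
print(called_alleles("T", "C,G", "2"))    # index 2 is the second ALT
print(called_alleles("T", "C,G", "."))    # missing call
```

If you prefer one ALT allele per line, so that the only non-missing index is 1, `bcftools norm -m -any` splits multiallelic records before you do any filtering.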
