8 months ago
realtreeecat
Chrom Pos ID REF ALT QUAL FILTER INFO FORMAT EPISL_12878 [other samples...]
NC_045512.2 2 NC_045512.2_2_T_C T C . PASS ANN=C|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01|protein_coding||c.-264T> GT . . . . . 1 . . .
I'm new to working with VCF data. I merged the VCF files for all my samples into one combined VCF file, in which each sample's genotype is shown as '.' or '1'. How can I extract mutations from this file (e.g., all mutations in the 'ORF1ab' gene)?
If merging VCF files isn't recommended, how can I efficiently extract this information from the individual VCF files, given that I have over 5,000 samples? Thank you.
Where did you get that idea from?
Please read the bcftools manual for a multitude of ways to work with VCF files.
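As a rough illustration of the kind of filtering being asked about, here is a minimal pure-Python sketch that keeps only records whose SnpEff-style `ANN` annotation names a given gene and lists the samples carrying a non-reference call. It assumes a tab-delimited VCF with a `#CHROM` header line, `GT` as the first FORMAT key, and the gene name in the fourth `|`-separated `ANN` subfield. The function names, sample names (`S1` to `S3`) and demo records are made up for the example; for thousands of samples a dedicated tool such as bcftools will likely be much faster.

```python
import io
import re


def is_carrier(call):
    """True if the genotype (first ':'-separated subfield) has a non-reference allele."""
    gt = call.split(":")[0]
    return any(a not in (".", "0") for a in re.split(r"[/|]", gt))


def variants_in_gene(vcf_lines, gene):
    """Yield (chrom, pos, ref, alt, carriers) for records annotated to `gene`."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        # Pull the ANN value out of the semicolon-separated INFO column.
        ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), "")
        # Assumption: gene name is the 4th '|'-separated subfield of each annotation.
        genes = {a.split("|")[3] for a in ann.split(",") if a.count("|") >= 3}
        if gene not in genes:
            continue
        carriers = [s for s, call in zip(samples, fields[9:]) if is_carrier(call)]
        yield chrom, int(pos), ref, alt, carriers


# Made-up demo data (hypothetical samples S1..S3), not taken from the thread.
DEMO_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "NC_045512.2\t2\t.\tT\tC\t.\tPASS\t"
    "ANN=C|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01\tGT\t.\t1\t.\n"
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\t"
    "ANN=G|missense_variant|MODERATE|S|GU280_gp02\tGT\t1\t1\t.\n"
)

hits = list(variants_in_gene(io.StringIO(DEMO_VCF), "ORF1ab"))
for hit in hits:
    print(hit)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` demo, for example `variants_in_gene(open("merged.vcf"), "ORF1ab")`, where `merged.vcf` is a placeholder name for an uncompressed VCF.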
Just a note - if multiallelic sites are not split in your VCF, then the genotypes are not limited to "." and "1"; you can also see 2, 3, and so on, referring to the second, third, etc. ALT alleles.
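To make the allele-index point concrete, here is a small sketch of how a genotype string maps back to bases when ALT holds several comma-separated alleles: index 0 is REF, 1 is the first ALT, 2 the second, and "." is a missing call. The helper name `called_alleles` and the example values are hypothetical.

```python
import re


def called_alleles(ref, alt, gt):
    """Translate genotype indices into alleles; a missing call ('.') becomes None."""
    alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles in order
    return [None if idx == "." else alleles[int(idx)]
            for idx in re.split(r"[/|]", gt)]


print(called_alleles("T", "C,G", "1"))  # first ALT allele
print(called_alleles("T", "C,G", "2"))  # second ALT allele
print(called_alleles("T", "C,G", "."))  # missing call
```

Checking only for "1" would therefore miss samples that carry the second or later ALT allele at an unsplit multiallelic site.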