Question

Extracting mutation information from a VCF file.

0

9 months ago

realtreeecat • 0

I have generated a VCF file for each of my sequences (around 3500), and I am new to this. From the information in the VCF files, I want to identify mutations (nt, aa, Spike, ORFs, E, M), that is, the SARS-CoV-2 genetic variations that occur frequently in this 3500-sample population. Is there a tool or CLI to do this? I just want to identify mutations that occur frequently, or unique mutations (although I am not sure which mutations can be labeled 'unique'). Below is a snippet of what my VCF file looks like:

POS REF ALT EFFECT  IMPACT  GENE    GENEID  FEATURE FEATUREID   BIOTYPE RANK    HGVS_C  HGVS_P  CDNA_POS    CDS_POS AA_POS  DISTANCE    ERRORS
210 G   T   upstream_gene_variant   MODIFIER    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding      c.-56G>T                    56  
241 C   T   upstream_gene_variant   MODIFIER    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding      c.-25C>T                    25  
707 G   A   missense_variant    MODERATE    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding  1/2 c.442G>A    p.Glu148Lys 442/21291   442/21291   148/7096        
1298    G   T   missense_variant    MODERATE    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding  1/2 c.1033G>T   p.Gly345Cys 1033/21291  1033/21291  345/7096        
3037    C   T   synonymous_variant  LOW ORF1ab  GU280_gp01  transcript  GU280_gp01
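
For tables shaped like the one above, one possible way to find frequent mutations is to count how many samples carry each change. The sketch below is only an illustration under stated assumptions: each sample has its own tab-separated table with a header row containing at least `POS`, `REF`, `ALT`, `GENE` and `HGVS_P`, and the function names `mutations_in_table` and `tally` are hypothetical helpers, not part of snpEff or any other tool.

```python
import csv
import io
from collections import Counter


def mutations_in_table(handle):
    """Return the set of (gene, nucleotide change, protein change) in one table."""
    found = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        nt = f"{row['REF']}{row['POS']}{row['ALT']}"  # e.g. G707A
        aa = row.get("HGVS_P") or ""                   # empty for non-coding changes
        found.add((row["GENE"], nt, aa))
    return found


def tally(handles):
    """Count, for each mutation, how many samples carry it."""
    counts = Counter()
    for handle in handles:
        # A set per sample means each mutation is counted once per sample.
        counts.update(mutations_in_table(handle))
    return counts


# Two tiny made-up samples reusing rows from the table above (columns trimmed).
sample_a = (
    "POS\tREF\tALT\tGENE\tHGVS_P\n"
    "241\tC\tT\tORF1ab\t\n"
    "707\tG\tA\tORF1ab\tp.Glu148Lys\n"
)
sample_b = "POS\tREF\tALT\tGENE\tHGVS_P\n241\tC\tT\tORF1ab\t\n"

counts = tally([io.StringIO(sample_a), io.StringIO(sample_b)])
for (gene, nt, aa), n in counts.most_common():
    print(gene, nt, aa or "-", n, sep="\t")
```

With real data, the handles could come from opening each per-sample table in turn. Mutations whose count is 1 are seen in a single sample, which is one possible reading of 'unique'; the `GENE` column can be used to restrict the tally to a single gene or ORF.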

Biopython VCF phylogenetic UShER • 490 views

ADD COMMENT • link updated 9 months ago by Pierre Lindenbaum 164k • written 9 months ago by realtreeecat • 0

2

Below is the snippet of what my VCF file looks like:

FYI: this is not a VCF file. It is just a table.

ADD REPLY • link 9 months ago by Pierre Lindenbaum 164k

0

Apologies, I pasted the snpEff TSV. Do you know how to identify mutations from the data in this table, as I asked above? This is my VCF file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  EPI_ISL_2235221
NC_045512.2 241 NC_045512.2_241_C_T C   T   .   PASS    ANN=T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01|protein_coding||c.-25C>T|||||25|,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725297.1|protein_coding||c.-25C>T|||||25|WARNING_TRANSCRIPT_NO_STOP_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742608.1|protein_coding||c.-25C>T|||||25|WARNING_TRANSCRIPT_NO_STOP_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01.2|protein_coding||c.-25C>T|||||25|,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725298.1|protein_coding||c.-565C>T|||||565|WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742609.1|protein_coding||c.-565C>T|||||565|WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725299.1|protein_coding||c.-2479C>T|||||2479|WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742610.1|protein_coding||c.-2479C>T|||||2479|WARNING_TRANSCRIPT_NO_START_CODON,T|intergenic_region|MODIFIER|CHR_START-ORF1ab|CHR_START-GU280_gp01|intergenic_region|CHR_START-GU280_gp01|||n.241C>T||||||    GT  1
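
The same kind of count can be made from the VCF records directly by splitting the `ANN` entry of the INFO column. The following is a rough sketch, not a definitive implementation. It assumes tab-delimited records, one ALT allele per record, and the snpEff `ANN` layout seen in the record above (annotations separated by `,`, sub-fields by `|`, with effect, gene, HGVS.c and HGVS.p at positions 2, 4, 10 and 11). It keeps only the first annotation of each record, which is a simplification. The names `first_annotation` and `mutations_in_vcf` are hypothetical.

```python
from collections import Counter


def first_annotation(line):
    """Return (pos, ref, alt, gene, effect, hgvs_c, hgvs_p) from the first ANN entry, or None."""
    cols = line.rstrip("\n").split("\t")
    pos, ref, alt, info = cols[1], cols[3], cols[4], cols[7]
    for item in info.split(";"):
        if item.startswith("ANN="):
            fields = item[4:].split(",")[0].split("|")
            return int(pos), ref, alt, fields[3], fields[1], fields[9], fields[10]
    return None


def mutations_in_vcf(lines):
    """Return the set of (gene, nucleotide change, protein change) in one sample."""
    found = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        ann = first_annotation(line)
        if ann is not None:
            pos, ref, alt, gene, _effect, _hgvs_c, hgvs_p = ann
            found.add((gene, f"{ref}{pos}{alt}", hgvs_p))
    return found


# One record modelled on the line above, with the ANN list shortened to one entry.
record = "\t".join([
    "NC_045512.2", "241", "NC_045512.2_241_C_T", "C", "T", ".", "PASS",
    "ANN=T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript"
    "|GU280_gp01|protein_coding||c.-25C>T|||||25|",
    "GT", "1",
])
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tEPI_ISL_2235221"

# Pretend two samples carry the same record; each sample contributes one set.
counts = Counter()
for sample_lines in ([header, record], [header, record]):
    counts.update(mutations_in_vcf(sample_lines))

for (gene, nt, aa), n in counts.most_common():
    print(gene, nt, aa or "-", n, sep="\t")
```

For the real data, each of the ~3500 VCF files would be read line by line and passed to `mutations_in_vcf`, so that every mutation is counted at most once per sample.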

ADD REPLY • link updated 9 months ago by Pierre Lindenbaum 164k • written 9 months ago by realtreeecat • 0