Extracting mutation information from a VCF file.
9 months ago

I have generated a VCF file for each of my sequences (around 3,500), and I am new to this. From the information in the VCF files, I want to identify mutations (nucleotide and amino acid changes in Spike, the ORFs, E, and M), that is, the SARS-CoV-2 genetic variations that occur frequently in this population of 3,500 samples. Is there a tool or CLI to do this? I just want to identify the mutations with the highest occurrence, as well as unique mutations (although I am not sure which mutations can be labeled 'unique'). Below is a snippet of what my VCF file looks like:

POS REF ALT EFFECT  IMPACT  GENE    GENEID  FEATURE FEATUREID   BIOTYPE RANK    HGVS_C  HGVS_P  CDNA_POS    CDS_POS AA_POS  DISTANCE    ERRORS
210 G   T   upstream_gene_variant   MODIFIER    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding      c.-56G>T                    56  
241 C   T   upstream_gene_variant   MODIFIER    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding      c.-25C>T                    25  
707 G   A   missense_variant    MODERATE    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding  1/2 c.442G>A    p.Glu148Lys 442/21291   442/21291   148/7096        
1298    G   T   missense_variant    MODERATE    ORF1ab  GU280_gp01  transcript  GU280_gp01  protein_coding  1/2 c.1033G>T   p.Gly345Cys 1033/21291  1033/21291  345/7096        
3037    C   T   synonymous_variant  LOW ORF1ab  GU280_gp01  transcript  GU280_gp01  
Biopython VCF phylogenetic UShER • 491 views

Below is the snippet of what my VCF file looks like:

FYI: this is not a VCF file. It is just a table.
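That said, a table like this can still be tallied. Here is a minimal sketch, assuming one tab-separated table per sample with the POS, REF, ALT, GENE and HGVS_P columns shown in the header above. The function name `count_table_mutations` and the toy rows are made up for illustration, and each mutation is counted once per sample:

```python
import csv
import io
from collections import Counter


def count_table_mutations(sample_tables):
    """Count in how many samples each (POS, REF, ALT, GENE, HGVS_P) appears.

    sample_tables: iterable of open text handles, one tab-separated
    table per sample (hypothetical layout, matching the header above).
    """
    counts = Counter()
    for handle in sample_tables:
        seen = set()  # so a mutation is counted at most once per sample
        for row in csv.DictReader(handle, delimiter="\t"):
            seen.add((
                int(row["POS"]),
                row["REF"],
                row["ALT"],
                row.get("GENE") or "",
                row.get("HGVS_P") or "",  # empty for non-coding changes
            ))
        counts.update(seen)
    return counts


# Toy data standing in for two per-sample tables.
header = "POS\tREF\tALT\tGENE\tHGVS_P\n"
sample_a = header + "241\tC\tT\tORF1ab\t\n707\tG\tA\tORF1ab\tp.Glu148Lys\n"
sample_b = header + "241\tC\tT\tORF1ab\t\n"

tables = [io.StringIO(sample_a), io.StringIO(sample_b)]
counts = count_table_mutations(tables)

n_samples = len(tables)
for (pos, ref, alt, gene, aa), n in counts.most_common():
    print(pos, ref, alt, gene, aa or "-", n, f"{n / n_samples:.1%}")
```

With real files, pass open file handles instead of the `io.StringIO` objects. Mutations with a count of 1 are the ones seen in a single sample, which is one possible reading of 'unique'.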


Apologies, I pasted the snpEff TSV. Do you know how we can identify mutations from the data in this table, as I asked above? This is my VCF file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  EPI_ISL_2235221
NC_045512.2 241 NC_045512.2_241_C_T C   T   .   PASS    ANN=T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01|protein_coding||c.-25C>T|||||25|,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725297.1|protein_coding||c.-25C>T|||||25|WARNING_TRANSCRIPT_NO_STOP_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742608.1|protein_coding||c.-25C>T|||||25|WARNING_TRANSCRIPT_NO_STOP_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01.2|protein_coding||c.-25C>T|||||25|,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725298.1|protein_coding||c.-565C>T|||||565|WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742609.1|protein_coding||c.-565C>T|||||565|WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725299.1|protein_coding||c.-2479C>T|||||2479|WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742610.1|protein_coding||c.-2479C>T|||||2479|WARNING_TRANSCRIPT_NO_START_CODON,T|intergenic_region|MODIFIER|CHR_START-ORF1ab|CHR_START-GU280_gp01|intergenic_region|CHR_START-GU280_gp01|||n.241C>T||||||    GT  1
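
One way to approach the counting directly from snpEff-annotated VCFs like the one above is sketched below, using only the Python standard library. It assumes one single-sample VCF per sequence and the standard snpEff `ANN` sub-field order (allele, effect, impact, gene name, ..., HGVS.c at index 9, HGVS.p at index 10). It keeps only the first annotation listed for each ALT allele. The function names and the toy records are hypothetical, not part of any tool:

```python
from collections import Counter


def annotation_for(info, alt):
    """Return (gene, hgvs_c, hgvs_p) from the first ANN entry for `alt`, or None."""
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        for ann in entry[len("ANN="):].split(","):
            fields = ann.split("|")
            if len(fields) > 10 and fields[0] == alt:
                return fields[3], fields[9], fields[10]
    return None


def mutations_in_vcf(lines):
    """Collect the set of mutations found in one single-sample VCF."""
    found = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        pos, ref, alts, info = int(cols[1]), cols[3], cols[4], cols[7]
        for alt in alts.split(","):
            gene, hgvs_c, hgvs_p = annotation_for(info, alt) or ("", "", "")
            # Prefer the protein-level change, fall back to the coding change.
            found.add((pos, ref, alt, gene, hgvs_p or hgvs_c))
    return found


def count_across_samples(vcfs):
    """Count in how many samples each mutation occurs."""
    counts = Counter()
    for lines in vcfs:
        counts.update(mutations_in_vcf(lines))
    return counts


def record(pos, ref, alt, info):
    """Build a toy VCF data line (illustration only)."""
    return "\t".join(["NC_045512.2", str(pos), ".", ref, alt, ".", "PASS", info, "GT", "1"])


# Toy annotations: the first mirrors the record above, the second is made up
# from the 707 G>A row of the table.
ANN_241 = ("ANN=T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|"
           "GU280_gp01|protein_coding||c.-25C>T|||||25|")
ANN_707 = ("ANN=A|missense_variant|MODERATE|ORF1ab|GU280_gp01|transcript|"
           "GU280_gp01|protein_coding|1/2|c.442G>A|p.Glu148Lys|442/21291|442/21291|148/7096||")

sample_1 = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
            record(241, "C", "T", ANN_241), record(707, "G", "A", ANN_707)]
sample_2 = [record(241, "C", "T", ANN_241)]

counts = count_across_samples([sample_1, sample_2])
for mutation, n in counts.most_common():
    print(mutation, n)
```

For the real data, each element passed to `count_across_samples` can be an open file handle for one of the 3500 VCFs. Sorting by count gives the most frequent mutations, and those with a count of 1 occur in a single sample only. Filtering on the gene field would then restrict the table to Spike, E, M or a particular ORF.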