9 months ago
realtreeecat
I have generated a VCF file for each sequence (around 3,500 in total), and I am new to this. From the information in the VCF files, I want to identify SARS-CoV-2 mutations (nt, aa, Spike, ORFs, E, M) that occur frequently, i.e. that have a higher occurrence in this population of 3,500 samples. Is there any tool or CLI to do this? I just want to identify mutations that have higher occurrence, or unique mutations (although I am not sure which mutations can be labeled 'unique'). Below is a snippet of what my VCF file looks like:
POS REF ALT EFFECT IMPACT GENE GENEID FEATURE FEATUREID BIOTYPE RANK HGVS_C HGVS_P CDNA_POS CDS_POS AA_POS DISTANCE ERRORS
210 G T upstream_gene_variant MODIFIER ORF1ab GU280_gp01 transcript GU280_gp01 protein_coding c.-56G>T 56
241 C T upstream_gene_variant MODIFIER ORF1ab GU280_gp01 transcript GU280_gp01 protein_coding c.-25C>T 25
707 G A missense_variant MODERATE ORF1ab GU280_gp01 transcript GU280_gp01 protein_coding 1/2 c.442G>A p.Glu148Lys 442/21291 442/21291 148/7096
1298 G T missense_variant MODERATE ORF1ab GU280_gp01 transcript GU280_gp01 protein_coding 1/2 c.1033G>T p.Gly345Cys 1033/21291 1033/21291 345/7096
3037 C T synonymous_variant LOW ORF1ab GU280_gp01 transcript GU280_gp01
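A minimal sketch (not an established tool) of how per-mutation frequencies could be computed from tables like the one above. It assumes one tab-separated snpEff table per sample, with a header row containing at least `POS`, `REF`, `ALT`, `GENE` and `HGVS_P`. For each mutation it counts how many samples carry it, then lists mutations above a frequency cut-off. The 5% default cut-off, the label format, and the choice to call a mutation "unique" when it appears in exactly one sample are illustrative assumptions, not standard definitions.

```python
import csv
import sys
from collections import Counter
from pathlib import Path


def mutation_key(row):
    """Label one annotated variant, e.g. 'ORF1ab:G707A:p.Glu148Lys'.

    Non-coding or synonymous rows without a protein change are labeled by
    the nucleotide change only, e.g. 'ORF1ab:C241T'.
    """
    gene = (row.get("GENE") or "").strip() or "unannotated"
    nt = f"{row.get('REF') or ''}{row.get('POS') or ''}{row.get('ALT') or ''}"
    aa = (row.get("HGVS_P") or "").strip()
    return f"{gene}:{nt}:{aa}" if aa else f"{gene}:{nt}"


def sample_mutations(lines):
    """Set of mutation labels in one sample's tab-separated snpEff table."""
    reader = csv.DictReader(lines, delimiter="\t")
    # A set, so a mutation is counted at most once per sample.
    return {mutation_key(row) for row in reader}


def count_mutations(tables):
    """Return (Counter: mutation -> number of samples carrying it, sample count)."""
    counts = Counter()
    n_samples = 0
    for lines in tables:
        counts.update(sample_mutations(lines))
        n_samples += 1
    return counts, n_samples


def summarize(counts, n_samples, min_fraction=0.05):
    """Split mutations into 'frequent' (>= min_fraction of samples) and singletons."""
    if n_samples == 0:
        return [], []
    frequent = [(m, c) for m, c in counts.most_common() if c / n_samples >= min_fraction]
    singletons = sorted(m for m, c in counts.items() if c == 1)
    return frequent, singletons


def main(argv):
    # Each argument is one sample's table, e.g. snpeff_tables/*.tsv (hypothetical path).
    tables = (Path(p).read_text().splitlines() for p in argv)
    counts, n = count_mutations(tables)
    frequent, singletons = summarize(counts, n)
    print(f"samples: {n}")
    for mutation, c in frequent:
        print(f"{mutation}\t{c}\t{c / n:.1%}")
    print(f"mutations seen in only one sample: {len(singletons)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run as `python count_mutations.py snpeff_tables/*.tsv` (script and directory names are hypothetical). Counting each mutation once per sample gives a prevalence across the 3,500 sequences rather than a raw row count, which matters because snpEff can emit several rows for one variant.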
FYI: this is not a VCF file. This is just a table.
Apologies, I pasted the snpEff TSV. Do you know how we can identify mutations from the data in this table, as I asked above? This is my VCF file: