How to calculate amino acid coding changes and predict coding outcomes
2.2 years ago
yoser4 ▴ 10

Hello everyone. I have a SNP VCF file for 99 samples. I filtered it for my gene of interest and extracted 14 loci in the CDS region of that gene. I would like to know which software can calculate the amino acid changes from the filtered file and predict the coding consequences. (My main goal is to find software that can directly convert my SNP sites into amino acid sequences. I want to compare the proteins of the 99 samples and, along the way, build an evolutionary tree.)
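As a starting point, the per-sample genotypes at the extracted loci can be read with plain Python. This is only a minimal sketch: it assumes an uncompressed, tab-separated VCF whose FORMAT column contains a `GT` key, and the function name `read_genotypes` and the two-sample records are made up for illustration.

```python
def read_genotypes(vcf_lines):
    """Return (samples, records) parsed from VCF text lines.

    Each record is a dict with chrom, pos, ref, alts and a
    sample -> GT string mapping (for example "0/1").
    """
    samples, records = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        # Locate GT inside the FORMAT column (assumed to be present).
        gt_index = fields[8].split(":").index("GT")
        genotypes = {
            name: call.split(":")[gt_index]
            for name, call in zip(samples, fields[9:])
        }
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alts": alt.split(","),
            "gt": genotypes,
        })
    return samples, records


# Hypothetical two-sample example, not real data.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t101\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:12\t1/1:9",
]
samples, records = read_genotypes(example)
for rec in records:
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alts"], rec["gt"])
```

For a real file the same function could be fed an open file handle, since it only iterates over lines.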

coding acid Amino changes • 1.6k views

Do you want to run something like transeq for each sequence? You could then run multiple sequence alignment and generate a tree from the results.
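For reference, the translation step that a tool like transeq performs can be sketched in plain Python with the standard genetic code. This is an illustration only, assuming an in-frame CDS given on the coding strand; `translate` is a name chosen here and is not transeq's interface.

```python
from itertools import product

# Standard genetic code in the usual compact layout (base order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence; '*' marks a stop codon.

    Codons containing ambiguous bases become 'X'. A trailing partial
    codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("ATGGCCTAA"))  # ATG, GCC, TAA
```

The resulting protein strings, one per sample, are what would then go into the multiple sequence alignment and tree-building step.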


Thank you, Asaf

But I currently only have the SNP information, and I'm trying to figure out how to substitute these sites into the reference sequence and then translate it into an amino acid sequence (I think that is the logic). Because I have a large amount of data, I would prefer software that can derive the amino acid sequence of each sample directly from the information in my SNP VCF file.
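The logic described here (substituting each sample's alleles into the reference CDS) might look like the sketch below. It rests on several assumptions that should be checked against the real data: SNP positions have already been converted to 0-based offsets within the CDS, the gene is on the forward strand, the sites are biallelic, and a sample receives the alternate base whenever its genotype carries an alternate allele, which ignores heterozygosity and phasing. All names and sequences are hypothetical.

```python
def apply_snps(ref_cds, snps, genotype_by_offset):
    """Return a sample-specific CDS.

    snps: list of (cds_offset, ref_base, alt_base), 0-based offsets.
    genotype_by_offset: cds_offset -> GT string such as "0/1" or "1|1".
    Sites missing from the mapping, or with missing calls, keep the
    reference base.
    """
    seq = list(ref_cds)
    for offset, ref, alt in snps:
        if seq[offset] != ref:
            raise ValueError(f"reference mismatch at CDS offset {offset}")
        gt = genotype_by_offset.get(offset, "0/0")
        alleles = gt.replace("|", "/").split("/")
        if "1" in alleles:  # simplification: any alt allele counts
            seq[offset] = alt
    return "".join(seq)


# Invented reference CDS and two SNP sites.
ref_cds = "ATGGCCAAATAA"
snps = [(4, "C", "T"), (8, "A", "G")]
sample_gts = {
    "S1": {4: "0/0", 8: "1/1"},
    "S2": {4: "0/1", 8: "./."},
}
sample_cds = {
    name: apply_snps(ref_cds, snps, gts) for name, gts in sample_gts.items()
}
for name, cds in sample_cds.items():
    print(name, cds)
```

Each sample-specific CDS can then be translated codon by codon to obtain its protein sequence.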


Sounds like a simple Biopython script.


[image not available]


What I imagine ending up with, after analyzing the SNP VCF file, is something similar to the picture. It can be read as follows:

  1. Each line represents an amino acid sequence.
  2. Different lines represent different samples.
  3. The squares represent amino acid variations caused by different SNPs (perhaps also distinguishing synonymous and missense mutations).
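A text version of this kind of picture can be produced by comparing each sample's protein with the reference and listing the substituted positions. The sketch below assumes equal-length proteins (SNPs only, no indels); the function name and the sequences are invented for illustration.

```python
def protein_changes(ref_protein, sample_protein):
    """List substitutions as <ref><1-based position><sample>, e.g. 'A2V'."""
    if len(ref_protein) != len(sample_protein):
        raise ValueError("proteins must have the same length")
    return [
        f"{r}{pos}{s}"
        for pos, (r, s) in enumerate(zip(ref_protein, sample_protein), start=1)
        if r != s
    ]


# Hypothetical reference protein and three sample proteins.
reference = "MAKV"
proteins = {"S1": "MAKV", "S2": "MVKV", "S3": "MAEV"}

table = {name: protein_changes(reference, p) for name, p in proteins.items()}
for name, changes in table.items():
    print(name, ",".join(changes) or "no change")
```

Synonymous SNPs leave the protein unchanged, so they do not appear in this table; distinguishing them from missense changes requires comparing codons at the nucleotide level.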

I am not aware of a tool that takes you directly from the variants in a VCF to the protein sequences of the wild type and all the mutants of a gene, but I could be wrong. If your species is listed on Ensembl, you could solve half of your problem with Ensembl's VEP tool via the REST API, which would help you identify the non-synonymous variants in your variant set. However, to do what you are interested in, you might first need to decide on a representative transcript per gene to obtain a gene-specific protein sequence.


Thank you very much, manaswwm

You are describing one of my current predicaments. I tried some "mainstream" software, which runs quick analyses based on the Ensembl database. But the species I am studying is sheep, for which very little information is available, and Ensembl has not updated its annotation and index information in time (it is one version behind NCBI). As for the representative transcript (gene-specific protein sequence), I think I can look it up on NCBI.


I see. In that case, have you tried SnpEff already? I used it a while ago, but if I remember correctly, this tool uses a VCF file (ideally containing the variants of interest) and an annotation file (GFF) to predict the consequences of variants, and it is designed to work when annotation and variant information is available offline (i.e. not on Ensembl, as in this case). As for translating these changes into protein sequences, I would say the best way is to download the CDS sequence of each transcript and write a short script that creates the variant-specific sequences and translates them individually. AFAIK both Python and R have excellent sequence-processing packages that can also translate nucleotide sequences. Again, I am not aware of any tool that does all of these steps directly (from variants of a non-listed species to protein sequences of the mutants), but I could be wrong too.
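If SnpEff is used, its annotated VCF carries an `ANN` entry in the INFO column, with pipe-separated subfields for each predicted effect. The rough sketch below pulls out the effect type and the protein-level change. The INFO string is a made-up example, and the subfield order assumed in `ANN_KEYS` should be checked against the `##INFO=<ID=ANN` header line of the actual output.

```python
# Leading ANN subfields, in the order assumed here (verify against the
# ANN header line of your own SnpEff output).
ANN_KEYS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p",
]


def parse_ann(info):
    """Return one dict per effect found in the ANN entry of an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            return [
                dict(zip(ANN_KEYS, raw.split("|")))  # extra subfields are dropped
                for raw in entry[len("ANN="):].split(",")
            ]
    return []  # no ANN entry present


# Invented INFO value for illustration only.
info = (
    "AC=2;ANN=G|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|"
    "protein_coding|2/5|c.35A>G|p.Lys12Arg|35/900|35/600|12/199||"
)
effects = parse_ann(info)
for eff in effects:
    print(eff["annotation"], eff["hgvs_c"], eff["hgvs_p"])
```

Filtering on the annotation value (for example keeping only missense effects) would then give the non-synonymous subset of the 14 loci.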
