Question

cDNA to protein conversion

0

6.2 years ago

Yoosef ▴ 60

Hello, we have identified new mutations in the genomic sequence of the ABCD1 gene. I have cDNA mutation results that describe the nucleotide changes, for example c.562 C>A. We need to convert these results to protein sequences so that we can run several bioinformatic analyses on the protein. Can you please point me to a database that shows whether a mutation causes a frameshift? Also, please point me to a database that shows proper variant nomenclature. I know HGVS, but it only offers instructions. Is there a database that can help with variant nomenclature at the protein level? I would also be grateful for a book or article with instructions on DNA-to-protein conversion. I am really stuck on this simple conversion!

Protein • 5.3k views

ADD COMMENT • link updated 6.2 years ago by Joe 22k • written 6.2 years ago by Yoosef ▴ 60

2

A good start would be to find out which transcript was used to describe the nucleotide change. Then you can choose the corresponding transcript, for example from Ensembl, download its sequence from the box on the left (Sequence > cDNA/Protein), and modify the corresponding nucleotide. A good tool for sequence translation is SMS (Sequence Manipulation Suite). There is also the TransVar tool, which can help you determine which transcript was possibly used and also convert the nucleotide change to the corresponding genomic coordinate. (However, for ABCD1:c.562C>A it does not give any result, so I am not sure whether it is just an example, a problem with TransVar, or a mistake in describing the nucleotide change; you had better check it.) To predict what the mutation does to your protein, you can use various tools, for example PredictSNP2 or the other tools mentioned before (VEP, SnpEff or Annovar). Well, hope it helps...
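
As a rough illustration of the "modify the corresponding nucleotide" step, here is a minimal Python sketch. It assumes the sequence passed in is the coding sequence (CDS) starting at the ATG, so that c. positions map directly onto string positions. The function names and the toy sequence are made up for this example; they do not come from Ensembl, SMS or TransVar. Checking the reference base first is a cheap way to notice that the wrong transcript was picked.

```python
def apply_substitution(cds: str, pos: int, ref: str, alt: str) -> str:
    """Return cds with the 1-based coding position `pos` changed from ref to alt."""
    cds = cds.upper()
    if not 1 <= pos <= len(cds):
        raise ValueError(f"c.{pos} lies outside a CDS of length {len(cds)}")
    found = cds[pos - 1]
    if found != ref.upper():
        # A mismatch usually means a different transcript or a typo in the variant
        raise ValueError(f"c.{pos}: expected {ref}, found {found}")
    return cds[:pos - 1] + alt.upper() + cds[pos:]


def codon_number(pos: int) -> int:
    """1-based codon (amino acid) number that contains coding position `pos`."""
    return (pos - 1) // 3 + 1


# Toy CDS (hypothetical, not ABCD1): Met-Ala-Gln-Stop
toy_cds = "ATGGCCCAGTAA"
mutant = apply_substitution(toy_cds, 7, "C", "A")
print(mutant)            # ATGGCCAAGTAA
print(codon_number(7))   # 3
```

With the same arithmetic, c.562 falls in codon 188 and c.1978 in codon 660 of whichever transcript the numbering refers to.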

ADD REPLY • link 6.2 years ago by pristanna ▴ 750

0

Thanks for your answer. What about protein nomenclature? Is there a database that can help me with proper protein naming? Yes, the mutation I gave was just an example; c.1978 C>T is one of the mutations we identified in the ABCD1 gene.

ADD REPLY • link 6.2 years ago by Yoosef ▴ 60

1

It is unclear which data you have, please elaborate.

ADD REPLY • link 6.2 years ago by WouterDeCoster 47k

0

Thanks for your attention, I have edited as you wish.

ADD REPLY • link 6.2 years ago by Yoosef ▴ 60

1

We still don't know which format your data is in. Do you have VCF files? If so, the most straightforward way is to annotate your results with tools like VEP, SnpEff or Annovar.

ADD REPLY • link 6.2 years ago by WouterDeCoster 47k

0

I have cDNA mutation results that list nucleotide changes such as c.562 C>A.

ADD REPLY • link 6.2 years ago by Yoosef ▴ 60

1

So you just have a list of changes that looks like this?

c.562 C>A
t.712 C>G
etc.

Do you have the FASTA sequence of the unmutated gene?

ADD REPLY • link 6.2 years ago by ahaswer ▴ 150

0

Yes, I only have the list of changes. I can get the FASTA sequence of the unmutated gene from the Ensembl database.

ADD REPLY • link 6.2 years ago by Yoosef ▴ 60

0

Do you know to which annotation these cDNA mutations correspond?

It would really be a lot easier if you could get the original data.

ADD REPLY • link 6.2 years ago by WouterDeCoster 47k

0

Those coordinates look useless without a transcript ID. (And I hope you are not broadcasting real novel results here)

ADD REPLY • link 6.2 years ago by swbarnes2 14k

0

I have no access to the NGS result data; I only know which nucleotides have been changed.

ADD REPLY • link 6.2 years ago by Yoosef ▴ 60

1

6.2 years ago

pltbiotech_tkarthi ▴ 180

If you have the cDNA or the complete transcript, you can just use ExPASy Translate and choose the correct frame from the three forward frames reported for the sense strand.

ADD COMMENT • link 6.2 years ago by pltbiotech_tkarthi ▴ 180

0

Thanks, yes, I have used ExPASy, but I don't know whether I'm using it correctly. First, I enter my wild-type sequence and it gives me three frames; then I check the correct frame against the Ensembl database. Next, I enter my mutant sequence, choose the same frame, and compare it to the wild-type sequence. Am I doing this right?
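
The comparison step in this workflow can also be scripted. The sketch below takes the wild-type and mutant protein strings (one-letter code, with `*` for stop, as most translation tools print them) and reports the first difference in an HGVS-style predicted form. The function names and toy proteins are made up for the example, and the output is only meant for single-nucleotide substitutions; it is no substitute for a proper annotation tool.

```python
# One-letter to three-letter amino acid codes; '*' (stop) is written as Ter
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def up_to_stop(protein: str) -> str:
    """Keep the protein up to and including its first stop, if it has one."""
    return protein.split("*")[0] + "*" if "*" in protein else protein


def first_difference(wt: str, mut: str):
    """Return (1-based position, wt residue, mutant residue) or None."""
    for pos, (a, b) in enumerate(zip(wt, mut), start=1):
        if a != b:
            return pos, a, b
    return None


def describe(wt: str, mut: str) -> str:
    """HGVS-style predicted description of the first changed residue."""
    wt, mut = up_to_stop(wt), up_to_stop(mut)
    diff = first_difference(wt, mut)
    if diff is None:
        return "p.(=)" if wt == mut else "length change, inspect manually"
    pos, ref, alt = diff
    return f"p.({THREE.get(ref, 'Xaa')}{pos}{THREE.get(alt, 'Xaa')})"


# Toy proteins (hypothetical, not ABCD1)
print(describe("MAQ*", "MAK*"))  # p.(Gln3Lys)  missense
print(describe("MAQ*", "MA*"))   # p.(Gln3Ter)  nonsense
print(describe("MAQ*", "MAQ*"))  # p.(=)        silent
```

A single-nucleotide substitution such as c.562 C>A cannot shift the reading frame; frameshifts come from insertions or deletions whose length is not a multiple of three, and for those this sketch only shows where the protein first diverges.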

ADD REPLY • link 6.2 years ago by Yoosef ▴ 60

1

6.2 years ago

pltbiotech_tkarthi ▴ 180

You can also use tblastx to search your cDNA query against a translated nucleotide database to find any paralogs or orthologs that match your wild-type or mutant sequences. https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=tblastx&PAGE_TYPE=BlastSearch&BLAST_SPEC=&LINK_LOC=blasttab

ADD COMMENT • link 6.2 years ago by pltbiotech_tkarthi ▴ 180

score 3 · Accepted Answer · 2019-02-01

Here's a simple workflow for converting all your sequences (I have no idea about variant nomenclature databases or any of that).

You can use your WT sequence, and my code here to generate all of the FASTA sequences that correspond to your mutations.

You'll need a 'map file' which lists the Sequence ID, and the switch that's made:

 SequenceID,A123B
 SequenceID2,X234Y

You'll need to convert your format c.562 C>A to SequenceID,C562A (for example).
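
That reformatting step is easy to script. Below is a minimal sketch that turns a description like c.562 C>A into a map-file line in the SequenceID,C562A shape shown above. It only handles simple substitutions, the function name is made up, and "ABCD1" is just a placeholder ID; the ID you use presumably has to match the header of your wild-type FASTA record.

```python
import re

# Matches simple substitutions such as "c.562 C>A" or "c.1978C>T"
SUBSTITUTION = re.compile(r"^c\.\s*(\d+)\s*([ACGT])\s*>\s*([ACGT])$", re.IGNORECASE)


def to_map_line(sequence_id: str, change: str) -> str:
    """Convert 'c.562 C>A' into 'sequence_id,C562A'."""
    match = SUBSTITUTION.match(change.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    position, ref, alt = match.groups()
    return f"{sequence_id},{ref.upper()}{position}{alt.upper()}"


# "ABCD1" is a placeholder sequence ID for this example
for change in ["c.562 C>A", "c.1978 C>T"]:
    print(to_map_line("ABCD1", change))
# ABCD1,C562A
# ABCD1,C1978T
```

Anything that is not a plain substitution (deletions, insertions, intronic positions) raises an error here on purpose, so it can be handled by hand.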

It will generate a mutated FASTA sequence for each input sequence/mutation.

You can then use this Biopython snippet to read in a file of mutated sequences and translate them to proteins.

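from Bio import SeqIO

# Translate every mutated sequence in the FASTA file
for record in SeqIO.parse('single.fasta', 'fasta'):
    # SeqRecord.translate() returns a new SeqRecord holding the protein
    print(repr(record.translate()))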
SeqRecord(seq=Seq('MSTTADQIAVQYPIPTYRFVVTIGDEQMCFQSVSGLDISYDTIEYRDGVGNWLQ...FH*', HasStopCodon(ExtendedIUPACProtein(), '*')), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])