Hello,
I have 200 human DNA fasta files from a region of chr6. Each sequence is 5,500 bp long. I've combined these fasta files and uploaded them into Clustal Omega to generate a multiple sequence alignment and a phylogenetic tree.
It worked well. However, I would now like to translate these sequences into protein and highlight the epitopes present in them. What is the best tool to use for this purpose? What output format do I need to select?
Thank you so much for your advice.
I am not sure how many sequences you have but you can achieve this with MEGA
MEGA is cool, thanks
Could you please explain what MEGA is and how to find it? Are there other alternatives? Thanks
Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.
MEGA is a phylogenetic data analysis package. You can find it here. I am assuming this is the MEGA that @sridhar56 was referring to.
Doing alignments at the nucleotide level can be very different from doing them at the protein level. I hope you were planning to translate the sequences and then redo the MSA? You could use one of the EMBOSS tools for doing the translation, if you need a web-based option.
@genomax2 Thanks for the reply. You are referring to EMBOSS Transeq, correct? Should I combine all fasta files into one and then translate them, or do I have to do this one-by-one?
Could you please suggest whether I should use one frame for the HLA-DPB1 gene, or multiple frames? Should I select 'Standard Code'?
I really appreciate your advice.
If you know the frame you are interested in (and if all the files are in the same frame) then this may be easier to do. Transeq appears to accept multiple sequences from the web interface so you should be able to use a single multi-fasta format file (keeping the frame consideration in mind). Standard code should be fine.
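If the web tools are unavailable, translating a multi-fasta file in one known frame can also be sketched locally. The following is only a minimal illustration, not what Transeq does internally: the function names `read_fasta` and `translate` and the demo records are made up for this example, and it assumes the input is uninterrupted coding sequence (a genomic region containing introns would not translate into the real protein this way). The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def translate(seq, frame=1):
    """Translate DNA in forward frame 1, 2, or 3.

    Codons containing ambiguous bases become 'X'; stops become '*'.
    A trailing partial codon is ignored.
    """
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical two-record input, standing in for the combined multi-fasta file.
demo = ">allele_1\nATGGCCTGA\n>allele_2\nATGGCGTGA\n"
for name, seq in read_fasta(demo):
    # Every record is translated in the same frame, as discussed above.
    print(f">{name}\n{translate(seq, frame=1)}")
```

The output is itself in fasta format, so it could be saved to a text file and uploaded to an alignment tool.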
Are the 200 files for the same region/gene or are there multiple locations present?
ExPASy has a simple Web interface translation tool with support for multiple tables if that is what you need.
I'm guessing you want something command line though?
Yes, these files are for the same region. Thanks
Then give transeq a try.
Unfortunately, EMBL is down today. :( Hopefully it will come back soon.
Try the translate tool at ExPASy.
So far, I've tried EMBOSS Transeq, and the run aborted before generating anything. I had combined multiple fasta files (the total size was under 1 MB). It worked when I tried it with a really small fasta file, so I'm not sure what the problem is. I ran cat *.fas > output.txt to combine the fasta files and uploaded this file to Transeq. The message said it was processing the data, but a few minutes later I received an email about a failure. I will try again tomorrow.
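One thing worth ruling out with the cat step: if any input file lacks a final newline, cat glues the next file's header onto the previous sequence line, and mixed Windows/Unix line endings can also upset some tools. Whether that caused this particular failure is only a guess. Below is a minimal sketch of a more defensive merge; `combine_fasta` is a hypothetical helper name, not part of any existing package.

```python
import glob
import os


def combine_fasta(pattern, out_path):
    """Concatenate the FASTA files matching `pattern` into `out_path`.

    Each line is written with a single Unix newline, blank lines are
    dropped, and the number of '>' header lines written is returned.
    """
    n_records = 0
    with open(out_path, "w", newline="\n") as out:
        for path in sorted(glob.glob(pattern)):
            # Skip the output file itself if it happens to match the pattern.
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue
            with open(path, "r") as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line.strip():
                        continue
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line + "\n")
    return n_records


# Example call (file names are placeholders):
# n = combine_fasta("*.fas", "combined.fasta")
# print(n)  # compare against the number of sequences you expect
```

If the returned count does not match the number of sequences expected, at least one input file is probably malformed.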
I've also tried the ExPASy tool. It generated output on the screen, but it's not clear to me how to download the result. Also, my final goal is to import the result into Clustal Omega to do the alignment and generate a tree, so the output format has to be compatible. Should I stick with ExPASy?
As far as I understand, the translation has to happen first, followed by the alignment in Clustal Omega, correct?
EBI Web sites appear to be undergoing maintenance at the moment. Use the ExPASy result.
Option 1: Highlight and copy/paste the result data into a separate text file (I assume result is already in fasta format). Be sure to save the file in text format.
Option 2: Choose "file" --> "Save Page as" from your browser window. Be sure to select format as "text file" for the file being saved.
First option may be cleaner. You can then open the file in MEGA or upload to Clustal Omega.
The ExPASy output contains three frames, each provided in both the 5'3' and 3'5' directions. Do I need to select a particular frame and direction (5'3' or 3'5') to upload into Clustal Omega? All sequences should be in the same frame since they represent different alleles of the same gene, but perhaps I'm wrong. Also, I chose the 'compact' output. Thanks
I hesitate to give you a blanket answer without being able to see the data you are using.
You should choose the frame that actually encodes the protein you are interested in. You can determine the frame you need by doing a blastp search with the translated proteins.
If you don't get this right (choose the correct protein) then you could end up doing a lot of work for nothing.
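As a rough first pass before the blastp check, the six frames can be ranked by their longest stretch without a stop codon. This is only a heuristic sketch and not a substitute for blastp: it assumes uninterrupted coding sequence (introns in a genomic region would break it), the helper names are invented for this example, and the frame labels simply imitate the 5'3'/3'5' naming mentioned in this thread.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; ambiguous codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return {label: protein} for three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(seq[offset:])
        frames[f"3'5' Frame {offset + 1}"] = translate(rc[offset:])
    return frames


def longest_open_stretch(protein):
    """Length of the longest run of residues between stop codons ('*')."""
    return max(len(part) for part in protein.split("*"))


def best_frame(seq):
    """Label of the frame with the longest stop-free stretch (heuristic)."""
    frames = six_frames(seq)
    return max(frames, key=lambda label: longest_open_stretch(frames[label]))


# Toy sequence for illustration only.
toy = "ATGTTATTATTATTA"
print(best_frame(toy))
```

The protein from the top-ranked frame is then the natural candidate to paste into blastp for confirmation.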
Hello yelekley7!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=204732
This is typically not recommended as it runs the risk of annoying people in both communities.
Sorry, I'm new to this. However, this forum is far superior, since no one bothered to reply to my question on the other forum.
The number of responses does not imply superiority or inferiority of one forum versus another. Your question is hard to answer, and it's not clear to me that any of the responses in this thread will resolve it. But once you have resolved it, it would be helpful to post the resolution in all forums in which you have posted the question.
Many participate in both forums and that is precisely the reason why cross-posting leads to duplication of effort. Since I answered your question here I did not do so on SeqAnswers.
And for those only active in one forum, it's also duplication of effort, since it doesn't make sense for two people to each spend time answering the same question on their favourite forum.