I did blastn and blastx for my sequences (~400,000 sequences). How do I find and label the start and stop codons for each sequence in a fasta file?
I suggest that you read about the genetic code to find the codons relevant to your organism.
You'll want to search for codons, perhaps with a tool like fasgrep. You might write your own script if you have a particular output format in mind.
On second glance, it seems that fasgrep is only useful for searching for sequence identifiers, not the sequences themselves.
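If you do end up searching the sequences themselves, a few lines of Python may be enough. The sketch below is only an illustration: the function name `codon_hits` is made up here, and the start/stop codons are those of the standard genetic code, which may not apply to your organism. It reports every occurrence of a codon together with the reading frame (0, 1, or 2) it falls in, since a raw string match alone says nothing about frame.

```python
import re

START = "ATG"
STOPS = ("TAA", "TAG", "TGA")  # assumption: standard genetic code

def codon_hits(seq, codons):
    """Return (position, frame, codon) for every occurrence of the given codons.

    Positions are 0-based; frame is position % 3 on the forward strand.
    A lookahead is used so that overlapping matches are also reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + "|".join(codons) + "))")
    return [(m.start(), m.start() % 3, m.group(1)) for m in pattern.finditer(seq)]

seq = "CCATGAAATAGGATGTGA"
print(codon_hits(seq, [START]))  # [(2, 2, 'ATG'), (12, 0, 'ATG')]
print(codon_hits(seq, STOPS))    # [(3, 0, 'TGA'), (8, 2, 'TAG'), (15, 0, 'TGA')]
```

Note that this only scans the forward strand, and a hit is a candidate, not proof that the codon belongs to a real coding sequence.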
Depending on how you obtained these sequences, the start and/or the stop codon is likely to be missing.
BlastX can find a homologous protein sequence from the translation of an internal part of your sequence, even if it lacks the start and stop codons.
For an individual sequence, you can try services like:
Thank you for the information, Kamil. So fasgrep works like ExPASy? It picks the longest possible translated sequence?
fasgrep is like grep. It searches for a string in a body of text. In your question, you ask about finding codons. I'd recommend using a search tool like grep to find codons. If you have a different goal, you should edit your question. For example, if you wish to find possible coding sequences within a nucleotide sequence, you might consider other tools designed for this purpose:
As you mentioned, ExPASy is a nice portal to find other tools that might meet your needs.
Yeah, I want to identify the start and stop codons for each sequence, but how do I know the codons found by fasgrep are the correct ones for the coding sequence? I mean, there are multiple "ATG"s or "TAG"s. Does this program take frame shifts into consideration?
Besides, how do I label those codons when I grep them in a fasta file?
If existing programs do not meet your needs, then you should write your own scripts to achieve your goals. If you're familiar with Python, this looks like a good starting point: Identifying open reading frames
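To give an idea of what such a script might look like, here is a minimal sketch of a six-frame ORF scan. It is not the code from the linked post; the function names are invented for illustration, and it assumes the standard genetic code (ATG start; TAA, TAG, TGA stops). It walks each of the three frames on both strands codon by codon, so only an ATG and a stop that sit in the same frame are paired, and it keeps the longest such pair.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code

def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (other characters pass through)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def orfs_in_strand(seq):
    """Yield (start, end) for each ATG...stop ORF in the three frames of one strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                 # first in-frame start codon
            elif start is not None and codon in STOPS:
                yield (start, i + 3)      # first in-frame stop closes the ORF
                start = None

def longest_orf(seq):
    """Return (strand, start, end) of the longest complete ORF, or None.

    For strand "-", coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, end in orfs_in_strand(s):
            if best is None or end - start > best[2] - best[1]:
                best = (strand, start, end)
    return best

print(longest_orf("CCATGAAATAGGATGTGA"))  # ('+', 2, 11)
```

Picking the longest ORF is only a heuristic. It also reports nothing for fragments that lack a start or a stop codon, which, as noted above, is likely for many of your sequences; in that case the frame of the BlastX hit is probably a better guide.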
Consider providing an example of your input and an example of your desired output. That might increase the clarity of your question.
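As an example of what such input and output could look like, here is a small sketch that appends codon positions to FASTA headers. Everything about the output format (`strand=`, `start=`, `stop=`, 1-based coordinates) is a hypothetical choice made for this example, as are the function names and the hard-coded coordinates; you would substitute whatever your ORF search produces.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def label_record(header, seq, orf):
    """Return one FASTA record with codon positions appended to the header.

    orf is (strand, start, end), 0-based and end-exclusive, or None.
    Positions are written 1-based and inclusive (a hypothetical format).
    """
    if orf is None:
        return f">{header} start=NA stop=NA\n{seq}\n"
    strand, start, end = orf
    return (f">{header} strand={strand} "
            f"start={start + 1}-{start + 3} stop={end - 2}-{end}\n{seq}\n")

# Example input; the ORF coordinates here are supplied by hand for illustration.
fasta = io.StringIO(">seq1 example\nCCATGAAATAG\nGATGTGA\n>seq2\nCCCCCC\n")
orfs = {"seq1": ("+", 2, 11)}

for header, seq in read_fasta(fasta):
    seq_id = header.split()[0]
    print(label_record(header, seq, orfs.get(seq_id)), end="")
```

With this input, the first header becomes `>seq1 example strand=+ start=3-5 stop=9-11` and the second becomes `>seq2 start=NA stop=NA`.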
Great! I'll take a look at the code. Thank you, Kamil!