Hello,
I would like to know how to remove the comments from a list of FASTA sequences. I think awk could provide a good solution, but I can't figure out how to use it for this purpose. I welcome all possible solutions, but Bash-based ones are preferred.
A little bit of history for those who are interested: the FASTA/Pearson sequence format, as described in the FASTA documentation, refers to both the contents of the commonly used header line (starting with '>') and additional comment lines (starting with ';') as "comments". In common usage only the header line is used, and most programs don't support the comment lines. See the Wikipedia article (http://en.wikipedia.org/wiki/FASTA_format) for a description of the full format.
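If the ';' comment lines of the original Pearson format ever do show up in a file, dropping them is a plain line filter. A minimal Python sketch, assuming a comment line is any line whose first character is ';' (the function name `drop_comment_lines` is hypothetical):

```python
from typing import Iterable, Iterator


def drop_comment_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except Pearson-style comment lines.

    Assumption: a comment line is any line whose first character is ';'.
    Header ('>') and sequence lines are passed through unchanged.
    """
    for line in lines:
        if not line.startswith(";"):
            yield line


# Example: filter an in-memory FASTA record.
record = [">seq1 some description\n", ";an old-style comment\n", "ACGTACGT\n"]
print("".join(drop_comment_lines(record)), end="")
```

From the shell, `grep -v '^;' in.fa > out.fa` does the same job.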
To my knowledge, FASTA format doesn't include "comments": it has a header (ID + description), followed by the sequence. Can you give an example of what you want to remove?
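To make the ID/description distinction concrete, here is a small Python sketch (the helper `split_header` is hypothetical) that splits a header line at the first run of whitespace, which is the usual convention for where the ID ends:

```python
from typing import Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header line into (ID, description).

    Assumed convention: the ID is the first whitespace-delimited word
    after '>', and the description is the rest of the line (possibly empty).
    """
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    parts = header[1:].rstrip("\r\n").split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


seq_id, description = split_header(">seq1 some free-text description\n")
print(seq_id)       # seq1
print(description)  # some free-text description
```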
Maybe what I call "comment" is what you call "description", sorry. An example could be:
>mm9_knownGene_uc007aet.1 range=chr1:3815714-3825863 5'pad=0 3'pad=0 strand=- repeatMasking=none
I would like to remove the whole description after the ID, i.e. everything from " range=" to the end of the line. It could also be useful to know how to remove just the part " 5'pad=0 3'pad=0 strand=- repeatMasking=none", and how to remove the ID (but I agree if you prefer to discuss this in a separate question).
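The three header edits asked about can be sketched as small string functions plus a filter that touches only '>' lines. This is a minimal Python sketch, not a definitive tool: the helper names are hypothetical, and the field names (5'pad, 3'pad, strand, repeatMasking) are assumed from the example header, so headers from other sources will need a different pattern.

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical helpers. Field names are assumed from the example header.
_PAD_FIELDS = re.compile(r"\s+(?:5'pad|3'pad|strand|repeatMasking)=\S*")


def keep_id_only(header: str) -> str:
    """'>ID anything else' -> '>ID' (drop the whole description)."""
    parts = header[1:].split(maxsplit=1)
    return ">" + (parts[0] if parts else "")


def drop_pad_fields(header: str) -> str:
    """Remove the 5'pad/3'pad/strand/repeatMasking fields, keep ID and range."""
    return _PAD_FIELDS.sub("", header)


def drop_id(header: str) -> str:
    """'>ID description' -> '>description' (drop only the ID)."""
    parts = header[1:].split(maxsplit=1)
    return ">" + (parts[1] if len(parts) > 1 else "")


def rewrite_headers(lines: Iterable[str],
                    transform: Callable[[str], str]) -> Iterator[str]:
    """Apply `transform` to header lines; pass sequence lines through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield transform(line.rstrip("\r\n")) + "\n"
        else:
            yield line


header = (">mm9_knownGene_uc007aet.1 range=chr1:3815714-3825863 "
          "5'pad=0 3'pad=0 strand=- repeatMasking=none")
print(keep_id_only(header))     # >mm9_knownGene_uc007aet.1
print(drop_pad_fields(header))  # >mm9_knownGene_uc007aet.1 range=chr1:3815714-3825863
print(drop_id(header))          # >range=chr1:3815714-3825863 5'pad=0 3'pad=0 strand=- repeatMasking=none
```

To process a whole file, `sys.stdout.writelines(rewrite_headers(sys.stdin, keep_id_only))` (after `import sys`) reads FASTA on stdin and writes the edited version to stdout. The helpers compose, so `drop_id(drop_pad_fields(header))` leaves just `>range=chr1:3815714-3825863`. For the Bash route, the keep-the-ID-only case is the usual awk one-liner `awk '/^>/ {print $1; next} {print}' in.fa > out.fa`, which prints only the first field of each header line and copies every other line unchanged.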
PS: Thanks for the edit, neilfws: indeed, it is very hard to deal with a Phocoenidae using awk :D.