Hi, I have a file that includes both FASTA sequences and non-FASTA lines, like this:
454 - PolA 2436284 1.88
454 - 1 CDSl 2436471 - 2436637 17.09 2436471 - 2436635 165
454 - 2 CDSf 2436688 - 2436928 18.36 2436689 - 2436928 240
454 - TSS 2437349 -1.10
455 + TSS 2439215 5.09
455 + 1 CDSf 2439438 - 2439570 13.30 2439438 - 2439569 132
Predicted protein(s):
>FGENESH: 1 3 exon (s) 37 - 4224 154 aa, chain +
MCLADYAIICHREGTLHEVVDPIIRDQIAPQCLRKFAEMTEQCVNEVGTTGGASALRAPG
AGPEAEAREKMCLADYAIICHREGTLHEAVDPIIRDQTRRNASGNSLRRQSNLINGHEAY
TTTARTHETRVVEETGDELANSAAFSQLVRPIGR
>FGENESH: 2 5 exon (s) 5130 - 6247 229 aa, chain -
MAPCQDIVDEGWGWERLVPCRFDGCVKWPDFKRYLVHYYHKNADKKVGELVGMRKPYPVE
QPDGATDDSLHAIVNQCIEAEYRFIRTCREKFTIDDFLLSRDITDRAKQLLQSGCESSIA
TVALLCITKEDELLCELFACQDISKALAFANVIRRSASNLMLFKGSESDAAGGGIMLGLA
REAEVALLAMHSGDEYAIANYITAVDARMRVPWCRCPVAMTTVSEVAAM
How can I extract all the FASTA sequences? I want to get a file that includes only this:
>FGENESH: 1 3 exon (s) 37 - 4224 154 aa, chain +
MCLADYAIICHREGTLHEVVDPIIRDQIAPQCLRKFAEMTEQCVNEVGTTGGASALRAPG
AGPEAEAREKMCLADYAIICHREGTLHEAVDPIIRDQTRRNASGNSLRRQSNLINGHEAY
TTTARTHETRVVEETGDELANSAAFSQLVRPIGR
>FGENESH: 2 5 exon (s) 5130 - 6247 229 aa, chain -
MAPCQDIVDEGWGWERLVPCRFDGCVKWPDFKRYLVHYYHKNADKKVGELVGMRKPYPVE
QPDGATDDSLHAIVNQCIEAEYRFIRTCREKFTIDDFLLSRDITDRAKQLLQSGCESSIA
TVALLCITKEDELLCELFACQDISKALAFANVIRRSASNLMLFKGSESDAAGGGIMLGLA
REAEVALLAMHSGDEYAIANYITAVDARMRVPWCRCPVAMTTVSEVAAM
Thanks in advance.
Can you reformat your post? The site interprets ">" as the beginning of a quotation, so enclose your sequence information in code form using the button with "101010" on it.

Are there always the same number of header lines before the sequences start?
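
If the number of leading lines is not fixed, one possible approach is to key on the record structure instead. The following is only a minimal Python sketch, not a tested tool. It assumes that every FASTA header starts with ">" and that sequence lines contain only letters, "*" or "-" (no spaces or digits), which holds for the example above but may not hold for every FGENESH output. The function name `extract_fasta` is made up for illustration.

```python
import re

# Assumption: a sequence line is made up only of letters, '*' or '-'.
# Annotation lines such as "454 - TSS 2437349 -1.10" contain digits and
# spaces, so they do not match.
SEQ_LINE = re.compile(r"^[A-Za-z*-]+$")


def extract_fasta(lines):
    """Yield the header and sequence lines of FASTA records, skipping the rest."""
    in_record = False
    for raw in lines:
        line = raw.rstrip()  # drop the newline and any trailing whitespace
        if line.startswith(">"):
            # A header always opens a new record.
            in_record = True
            yield line
        elif in_record and SEQ_LINE.match(line):
            # Sequence line that belongs to the current record.
            yield line
        else:
            # Anything else (annotation, blank line) closes the record.
            in_record = False


# Small in-memory demonstration using lines shaped like the ones in the question.
sample = [
    "454 - TSS 2437349 -1.10",
    "Predicted protein(s):",
    ">FGENESH: 1 3 exon (s) 37 - 4224 154 aa, chain +",
    "TTTARTHETRVVEETGDELANSAAFSQLVRPIGR",
    "455 + TSS 2439215 5.09",
]
for kept in extract_fasta(sample):
    print(kept)
```

To run it on a real file, pass an open file handle instead of the list, for example `extract_fasta(open(path))`, and write the yielded lines to a new file. One caveat: a non-FASTA line made up only of letters that directly follows a sequence would be kept as well, so the pattern may need tightening for other inputs.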