Hi all,
I have a genome sequence file in FASTA format, and I need to remove the headers that are not followed by any sequence, because the tool I want to run requires valid FASTA input.
My sequences are pasted below. Can anyone suggest a script that removes these empty FASTA headers? I have tried online tools, but I have too many sequences, so I cannot do it manually. Thanks in advance.
>Throat_LANL_12_orf00001 begin=81 end=9 rf=-3 score=2.57
>Throat_LANL_14_orf00002 begin=45 end=212 rf=-3 score=4.34
CTTTACGTAGCGGAAAATTAGATACGGACAGATAAATGTTAGAAGAATTAAATATCGATC
TATCTAGCTTAAAAGCAAATAAAGTAGATAACAGAATATTAGGGGTTGTTNNTTTGACGA
TCTCCGCTCAAAAGAAAAAGAAGAGATCATTCAATGTGTTTACAATGT
>Throat_LANL_15_orf00001 begin=407 end=125 rf=-2 score=6.05
>Throat_LANL_19_orf00001 begin=394 end=7 rf=2 score=1.93
>Throat_LANL_20_orf00001 begin=651 end=445 rf=2 score=2.78
>Throat_LANL_20_orf00003 begin=455 end=393 rf=-2 score=2.95
>Throat_LANL_37_orf00001 begin=116 end=41 rf=-2 score=1.40
>Throat_LANL_47_orf00001 begin=328 end=139 rf=2 score=0.63
>Throat_LANL_52_orf00001 begin=514 end=148 rf=-1 score=0.31
>Throat_LANL_58_orf00001 begin=208 end=40 rf=2 score=0.56
>Throat_LANL_59_orf00001 begin=316 end=262 rf=2 score=5.28
>Throat_LANL_59_orf00003 begin=338 end=266 rf=-2 score=4.74
>Throat_LANL_64_orf00001 begin=619 end=602 rf=-1 score=2.92
>Throat_LANL_73_orf00002 begin=91 end=222 rf=1 score=0.18
GATCGTCAAANNAACAACCCCTAATATTCTGTTATCTACTTTATTTGCTTTTAAGCTAGA
TAGATCGATATTTAATTCTTCTAACATTTATCTGTCCGTATCTAATTTTCCGCTACGTAA
AGCGTCAAGTAA
Example of the expected output (manual edit by moderator):
>Throat_LANL_14_orf00002 begin=45 end=212 rf=-3 score=4.34
CTTTACGTAGCGGAAAATTAGATACGGACAGATAAATGTTAGAAGAATTAAATATCGATC
TATCTAGCTTAAAAGCAAATAAAGTAGATAACAGAATATTAGGGGTTGTTNNTTTGACGA
TCTCCGCTCAAAAGAAAAAGAAGAGATCATTCAATGTGTTTACAATGT
>Throat_LANL_73_orf00002 begin=91 end=222 rf=1 score=0.18
GATCGTCAAANNAACAACCCCTAATATTCTGTTATCTACTTTATTTGCTTTTAAGCTAGA
TAGATCGATATTTAATTCTTCTAACATTTATCTGTCCGTATCTAATTTTCCGCTACGTAA
AGCGTCAAGTAA
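One possible way to get output like the example above is a short Python script. This is only a minimal sketch, assuming the file is plain-text FASTA where every header starts with `>` and a record counts as empty when no non-blank line follows its header. The function name `drop_empty_records` and the script name in the usage comment are made up for illustration.

```python
import sys


def drop_empty_records(lines):
    """Yield FASTA lines, skipping headers that have no sequence lines."""
    pending_header = None  # last header seen that has not been written yet
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header replaces any pending header that never got sequence.
            pending_header = line
        elif line.strip():
            # First sequence line of a record: write its header before it.
            if pending_header is not None:
                yield pending_header
                pending_header = None
            yield line
        # Blank lines are dropped.


# Hypothetical usage: python drop_empty_fasta.py input.fasta > output.fasta
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for out_line in drop_empty_records(handle):
            print(out_line)
```

The script reads the file line by line, so it should cope with large files without loading everything into memory. A header is only written once its first sequence line appears, which is why headers followed directly by another header (or by the end of the file) are left out.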
Can you please add an example of the output you would expect from the file you have posted? If I have understood correctly, it would contain only the sequences Throat_LANL_73 and Throat_LANL_14. Can you please confirm?
Thank you for responding to the query. You are right, it would contain only Throat_LANL_73 and Throat_LANL_14. I have given a detailed explanation in the answers.
You have received many answers now. Please upvote the ones that you consider useful.