Hi,
I have several FASTA files. I want to keep part of each header and add a name to simplify downstream analysis. The IDs in the files are not consecutive, so simply renumbering them in series with awk won't help. Some of my FASTA headers look like this (AUGUSTUS output file):
>g1134t1 geneg1134
I want to keep the header and just add the species_genus name after >
or better like this
>Species_genus gene1134
Similarly, for a file with headers like this,
>AG1IA_00006 contig1:1338:4722:+ [translate_table: standard]
I want to keep >AG1IA_00006
P.S. My OS is Ubuntu 16.04.
p.p.s. I couldn't find a suitable command in the other similar posts and I also asked there but couldn't get any help. It's a bit urgent.
Thanks in advance.
On Ubuntu you can use the sed command to remove everything after the first space in each header.
For future reference you can use this book to learn basic Unix and Perl.
http://korflab.ucdavis.edu/Unix_and_Perl/current.pdf
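A minimal sketch of the sed approach mentioned above, assuming GNU sed as shipped with Ubuntu. The file names and the sequence lines are made up for illustration; only the header layout comes from the question.

```shell
# Hypothetical two-record FASTA file, only for illustration.
printf '%s\n' \
  '>AG1IA_00006 contig1:1338:4722:+ [translate_table: standard]' \
  'MKTAYIAKQR' \
  '>AG1IA_00007 contig1:5000:6200:- [translate_table: standard]' \
  'MSLLTEVETP' > example.fasta

# On header lines only (those starting with ">"), delete the first space
# and everything after it. Sequence lines are left untouched.
sed '/^>/ s/ .*//' example.fasta > example.ids.fasta

cat example.ids.fasta
```

Writing to a new file rather than editing in place keeps the original headers available for other tools.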
@Sej
Thank you very much for the document and the answer. Let me try the command.
That is not necessarily a good idea: a lot of tools need a unique sequence identifier. By the way, where do you get the species name from?
Well, we sequenced and assembled a few genomes, so for ease of identification I want to add the respective species_genus name. Right now I want to name the sequences this way for OrthoFinder and related analyses. It will be easier to visualize the orthologs/paralogs. I am keeping the original files for other analyses/tools.
I would try something like
sed -e 's/>/>species_name_/g'
The > is not supposed to occur anywhere else in a FASTA file, so that way you get both the species name and a unique ID.
Thanks Michael. I'll try tomorrow and let you know.
@Michael
Hi, it did work, thank you. But what if I want to keep only one of the two terms here? For
and for
I just want to keep >AG1IA_00006
I did search for sed, including in the PDF Sej sent above. But all I could find was that the ‘s’ part of the sed command puts sed in ‘substitute’ mode, where you specify one pattern (between the first two forward slashes) to be replaced by another pattern (specified between the second set of forward slashes). I couldn't find an option to delete some parts selectively. I am a newbie and would be grateful if you could help. Thanks.
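On deleting parts selectively: sed has no separate option for this. One common way is to substitute the unwanted part with nothing, or to capture the part you want in parentheses and reuse it as \1. A rough sketch, assuming GNU sed with -E (extended regular expressions); the file names, the sequence line, and the Species_genus prefix are placeholders:

```shell
# Hypothetical AUGUSTUS-style record, only for illustration.
printf '%s\n' '>g1134t1 geneg1134' 'MKTAYIAKQR' > augustus.fasta

# 1) Keep only the first term: replace the first space and everything
#    after it with nothing.
sed '/^>/ s/ .*//' augustus.fasta > keep_first.fasta
# header becomes: >g1134t1

# 2) Keep only the second term: replace ">", the first term and the
#    space after it with just ">".
sed -E '/^>/ s/^>[^ ]+ />/' augustus.fasta > keep_second.fasta
# header becomes: >geneg1134

# 3) Capture the second term in ( ) and put it back as \1 behind a prefix.
sed -E '/^>/ s/^>[^ ]+ +([^ ]+).*/>Species_genus_\1/' augustus.fasta > renamed.fasta
# header becomes: >Species_genus_geneg1134
```

The third command joins the prefix and the gene name with an underscore rather than a space. Many tools treat the text up to the first space as the sequence ID, so a header like ">Species_genus geneg1134" would give every sequence the same ID, which is the uniqueness concern raised earlier in the thread.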