How to extract a specific column from the sequence header identifiers of a FASTA file?
My headers look like this:
>PGM0100236.1 [Candida] scaffold00238
>PGM0100236.1 [Candida] scaffold00239
>PGM0100236.1 [Candida] scaffold00240
>PGM0100236.1 [Candida] scaffold00241
I would like to extract only the third column, i.e. scaffold00238, for all the headers in my FASTA file. Please suggest a simple command. I am new to bioinformatics and Linux scripting.
Thank you.
This solution also prints the scaffold words, losing all other information. What OP wants.
If your file only contains the headers and not the sequence, another easy solution is
If it does contain the sequence then
This assumes that the delimiter between columns is a tab (\t). If it is a space, you need to define the delimiter with the -d option:
cut -d " " -f3
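As a minimal sketch of the extraction described above, the commands below build a small test file, keep only the header lines with grep, and take the third space-separated field with cut. The file names example.fa and scaffold_names.txt and the sequence lines are made up for illustration, and the sketch assumes single spaces between columns, as in the headers shown in the question.

```shell
# Hypothetical sample file; the name and the sequences are illustrative only.
printf '%s\n' \
  '>PGM0100236.1 [Candida] scaffold00238' \
  'ACGTACGT' \
  '>PGM0100236.1 [Candida] scaffold00239' \
  'TTGACCAA' > example.fa

# Keep only header lines (those starting with ">"), then print field 3,
# using a single space as the delimiter.
grep '^>' example.fa | cut -d ' ' -f3 > scaffold_names.txt

cat scaffold_names.txt
```

If the columns might be separated by tabs or by several spaces, awk '/^>/ {print $3}' example.fa is a possible alternative, since awk splits on any run of whitespace by default.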
Neither of these solutions does what OP wants, as far as I can tell.
OP wants to use a word to modify the header of a multi-fasta file.
palani : Please confirm that you want to change
to
Yes, exactly like that. Thanks for all the responses. This is my first time on Biostars. I am happy for all the suggestions. Thank you all.
Thank you all for your suggestions, I will try them. I am grateful for all your support.
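For the header rewrite confirmed here, one possible sketch uses awk. It assumes the intended change is from the full header (for example >PGM0100236.1 [Candida] scaffold00238) to just the third field (>scaffold00238); the exact before and after text is not shown in the thread, so treat that as an assumption. The file names input.fa and renamed.fa and the sequence lines are hypothetical.

```shell
# Hypothetical sample file; the name and the sequences are illustrative only.
printf '%s\n' \
  '>PGM0100236.1 [Candida] scaffold00238' \
  'ACGTACGT' \
  '>PGM0100236.1 [Candida] scaffold00239' \
  'TTGACCAA' > input.fa

# On header lines, print ">" followed by the third whitespace-separated field;
# sequence lines are passed through unchanged.
awk '/^>/ {print ">" $3; next} {print}' input.fa > renamed.fa

cat renamed.fa
```

Writing to a new file rather than overwriting the input keeps the original headers available in case the other columns are needed later.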