Dear all, I would like to change my FASTA file header from:
>gi|51039021|ref|NC_006130.1| Streptococcus pyogenes 71-724 plasmid pDN571, complete sequence
to
> Streptococcus pyogenes 71-724 plasmid pDN571
Could somebody please help?
Many thanks
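For the single header shown, here is a minimal awk sketch. It assumes that every header has the NCBI `gi|<number>|ref|<accession>|` prefix followed by a space, and that every header ends with `, complete sequence`. Both assumptions come from this one example only. Headers that do not match are passed through unchanged, as are the sequence lines. The function name `strip_ncbi_header` is hypothetical.

```shell
# Hypothetical helper: rewrite NCBI-style headers such as
#   >gi|51039021|ref|NC_006130.1| Streptococcus ... pDN571, complete sequence
# into
#   > Streptococcus ... pDN571
# Assumes the "gi|...|ref|...|" prefix and the ", complete sequence" suffix.
strip_ncbi_header() {
  awk '
    /^>/ {
      sub(/^>[^ ]*[|]/, ">")              # drop everything up to the last "|" before the first space
      sub(/, complete sequence$/, "")     # drop the trailing description
    }
    { print }                             # print headers (modified) and sequence lines (as is)
  ' "$@"
}
```

Usage: `strip_ncbi_header input.fa > output.fa` (file names are placeholders).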
Do you want a general-purpose solution for multiple files, or do you just want this exact FASTA record modified?
@genomax and @sej, to run BLAT I need:
to
Thank you
The question is whether ALL of your FASTA headers follow exactly this format (e.g. where the spaces are). That is why we need to see more than one record.
BTW, this request already differs from what you originally asked.
Try this in the meantime:
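To answer that question for a whole multi-FASTA file, here is a quick sketch that lists every header whose space-separated layout differs from the single example record. It assumes, based on that one record only, that a matching header has 8 space-separated fields with `plasmid` as the 5th. The helper name `check_header_layout` is hypothetical.

```shell
# Hypothetical helper: report headers whose space-separated layout differs
# from the example record (assumed: 8 fields, 5th field "plasmid").
check_header_layout() {
  awk '
    /^>/ {
      total++
      if (NF != 8 || $5 != "plasmid") {
        bad++
        print "line " NR ": " $0          # show the offending header and its line number
      }
    }
    END {
      printf "%d of %d headers differ from the expected layout\n", bad + 0, total + 0
      exit (bad > 0)                      # exit status 1 if any header differs
    }
  ' "$@"
}
```

`check_header_layout input.fa` prints each differing header with its line number, followed by a summary. The exit status is 1 if any header differs.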
awk -F " " '{ if ($0 ~ /^>/) { print ">"$6;} else { print $0}}' input.fa | sed -e 's/,//' > output.fa
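The same idea can be written as a single awk pass. On header lines, this sketch keeps the 6th space-separated field (`pDN571,` in the example record) and strips the trailing comma from that field only, so the sequence lines are never touched by a `sed` substitution. Like the command above, it assumes that every header has the plasmid name in field 6. `header_to_field6` is a hypothetical name.

```shell
# Hypothetical helper: one-pass version of the awk | sed pipeline.
# Header lines become ">" + field 6 without its trailing comma;
# sequence lines are printed unchanged.
header_to_field6() {
  awk '
    /^>/ { name = $6; sub(/,$/, "", name); print ">" name; next }
    { print }
  ' "$@"
}
```

Usage: `header_to_field6 input.fa > output.fa`.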
@genomax2, sorry, my request was not very clear. It is in fact a multi-FASTA file for running BLAT.
No need to be sorry, but we need additional information to make sure the solutions provided so far will work.
I have to transform:
and so on...
to:
and so on...
Can you try the awk solution I posted above? It should work, assuming the sequence lines are to be left as they are. A simple regex matching a lowercase p followed by one or more uppercase letters/digits should also work, I think.
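Here is a sketch of the regex idea. On each header line, awk's `match()` looks for the first lowercase `p` followed by one or more uppercase letters or digits and keeps only that match, so it does not depend on field positions. It assumes that the first such match is the plasmid name. That holds for the example `pDN571` but is not guaranteed for other headers. Headers without a match are left unchanged. `plasmid_name_headers` is a hypothetical name.

```shell
# Hypothetical helper: keep only the first "p" + [A-Z0-9]+ match (e.g. pDN571)
# on header lines. Headers without a match and all sequence lines are printed as is.
plasmid_name_headers() {
  awk '
    /^>/ && match($0, /p[A-Z0-9]+/) { print ">" substr($0, RSTART, RLENGTH); next }
    { print }
  ' "$@"
}
```

The explicit `[A-Z0-9]` range is used instead of `[[:upper:][:digit:]]` because some older awk builds (e.g. old mawk versions) do not support POSIX character classes.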