Hi
I have the following headers for my fasta files downloaded from IMG/JGI
>2648318750 Ga0098755_14192 DNA gyrase subunit B [Microbacterium sp. GCS4 : Ga0098755_14]
I would like this:
>Microbacterium sp. GCS4 : Ga0098755_14
The strings/characters are all different for each header. I found this to try:
sed 's/.*\[\([^]]*\)\].*/\1/g'
It works, but I need to keep the '>' at the start to denote each sequence in the fasta file. Are there some parentheses I can add to keep this character alongside my current command?
Cheers in advance!
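Yes, you can use another pair of parentheses to capture the '>' at the beginning of the line and then reference both groups in the replacement. On a Debian system I also have to use the '-r' option (extended regular expressions) so that unescaped parentheses are treated as capture groups that can be referenced.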
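Something along these lines should work. This is only a sketch tested against the one header shown in the question, and it assumes GNU sed and that every header starts with '>' and contains a [...] block. The variable names are just for illustration.

```shell
# Sample header from the question (leading '>' assumed, as in a normal fasta file).
header='>2648318750 Ga0098755_14192 DNA gyrase subunit B [Microbacterium sp. GCS4 : Ga0098755_14]'

# Group 1 captures the leading '>'.
# Group 2 captures the text inside the last [...] pair on the line.
# The replacement \1\2 glues the two groups back together.
result=$(printf '%s\n' "$header" | sed -r 's/^(>).*\[([^]]*)\].*/\1\2/')

printf '%s\n' "$result"
```

To run it on a whole file, pass the file name to the same command, e.g. sed -r 's/^(>).*\[([^]]*)\].*/\1\2/' input.fasta > output.fasta (both file names are placeholders). Sequence lines do not start with '>', so the pattern does not match them and they are left unchanged. If '-r' is not available, '-E' is the equivalent option on GNU and BSD sed.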