Hello, I'm trying to remove the - symbol and everything after it from the FASTA sequence headers in the gene 1 alignment below. I have ~500 gene alignments that need the same change. I got this working for one alignment, but I need some help repeating it for all ~500. I'd like the results as .FNA alignments without the number; writing new output files or changing the original files are both fine. Can someone help me figure this out? I would appreciate an explanation of what each symbol does, so that I can learn. Sorry for the bad format in my example alignment.
Thanks!
gene 1
> P_dilatata-COMP100028
ACTGTCTTG
> P_limo-COMP100028
ACTGTCTTC
>P_leuco-COMP100028
ACTGTCTTA
I tried the following, which worked for a single file:
sed '/>/ s/\(.*\)-.*$/\1/g' test.FNA
This loop didn't work; it just keeps running:
for filename in *.FNA; do
sed '/>/ s/\(.*\)-.*$/\1/g';
done
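The loop above keeps running because sed is never told which file to read: with no file name it waits for text on standard input, so it looks like it hangs. Passing "$filename" to sed fixes that. Below is a minimal sketch, assuming the alignments sit in the current directory and end in .FNA; clean_headers and the output folder renamed are hypothetical names chosen for illustration. The sed expression is the one that already worked on test.FNA.

```sh
# Hypothetical helper: print one alignment with everything from the last "-"
# onward removed from each header line. "$1" is the file name, which is
# exactly what the original loop forgot to pass to sed.
clean_headers() {
    sed '/>/ s/\(.*\)-.*$/\1/g' "$1"
}

# Write cleaned copies to a separate folder (hypothetical name "renamed"),
# so the originals stay untouched and the *.FNA glob never matches the output.
mkdir -p renamed
for filename in *.FNA; do
    # With no matching files the shell leaves "*.FNA" as a literal string; skip it
    [ -e "$filename" ] || continue
    clean_headers "$filename" > "renamed/$filename"
done
```

If overwriting the originals is preferred, GNU sed can edit in place with sed -i '/>/ s/\(.*\)-.*$/\1/g' "$filename" inside the loop; the BSD sed shipped with macOS needs sed -i '' instead. Trying that on a copy of the folder first is safer.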
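One can even simplify the sed expression like this:
'/>/ s/-.*//g'
Which means: on lines containing > (/>/), substitute (s/) a - followed by zero or more characters (-.*) with nothing (//), globally (g). Here "." matches any single character and "*" repeats it zero or more times, so -.* runs from the first - to the end of the line. Because that match already reaches the end of the line, there is only one match per line and the g flag makes no difference.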
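One way to see what each version does is to run both on a header with two hyphens. The second header below is made up for illustration; the file name example.FNA is also hypothetical. Because .* is greedy, \(.*\) in the original command captures everything up to the last -, while -.* in the simplified command starts at the first -. For headers with a single - (like the gene 1 example) both give the same result.

```sh
# Sample alignment; the second header is invented to show the two-hyphen case.
printf '> P_dilatata-COMP100028\nACTGTCTTG\n>P_made_up-COMP1-extra\nACTGTCTTA\n' > example.FNA

# Original command: greedy \(.*\) keeps everything before the LAST "-".
# Headers become "> P_dilatata" and ">P_made_up-COMP1".
sed '/>/ s/\(.*\)-.*$/\1/g' example.FNA

# Simplified command: -.* starts at the FIRST "-" and runs to the end of the line.
# Headers become "> P_dilatata" and ">P_made_up".
sed '/>/ s/-.*//g' example.FNA
```

Writing the address as /^>/ instead of />/ would restrict the edit to lines that start with >; with FASTA files both behave the same, since sequence lines never contain >.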
Thank you very much for all the solutions, JC and Dave. I tried the first one and it worked perfectly!