Removing the last part of FASTA headers in many alignment files
5.2 years ago
Badh2 • 0

Hello, I'm trying to remove the - symbol and everything after it from the FASTA sequence headers in the gene 1 alignment below. I have ~500 gene alignments that need the same change. I managed to do this for a single alignment, but I need help repeating it across all ~500. I would prefer .FNA alignments without the number; writing new output files or changing the original files are both fine. Can someone help me figure this out? I would appreciate an explanation of what each symbol does, so that I can learn. Sorry for the bad formatting in my example alignment.

Thanks!

gene 1

>P_dilatata-COMP100028
ACTGTCTTG
>P_limo-COMP100028
ACTGTCTTC
>P_leuco-COMP100028
ACTGTCTTA

I tried the following, which worked for a single file:

sed '/>/ s/\(.*\)-.*$/\1/g' test.FNA

This loop didn't work; it just keeps running.

for filename in *.FNA; do
   sed '/>/ s/\(.*\)-.*$/\1/g';
done
5.2 years ago
JC 13k

You need to write the for loop like this:

for filename in *.FNA; do
   sed '/>/ s/\(.*\)-.*$/\1/g' "$filename" > "${filename%.FNA}_new.FNA"
done

The loop iterates over all *.FNA files. On each pass, the current file name is stored in the filename variable, so when you run sed you pass it that value as the input file and redirect the output to a new file.
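Since the question asks what each symbol does, here is an annotated sketch of the same loop. It runs in a scratch directory on a made-up file (gene1.FNA and its contents are assumptions modeled on the example headers), so no real alignment is touched.

```shell
# Scratch directory and sample file are hypothetical, for illustration only.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '>P_dilatata-COMP100028\nACTGTCTTG\n>P_limo-COMP100028\nACTGTCTTC\n' > gene1.FNA

for filename in *.FNA; do
    # />/      act only on lines that contain ">" (the FASTA headers)
    # s/A/B/   substitute the text matching A with B
    # \(.*\)   capture as many characters as possible ...
    # -.*$     ... up to a "-" followed by the rest of the line
    # \1       put back only the captured part, dropping "-" and what follows
    # ${filename%.FNA} is the file name with the trailing ".FNA" removed
    sed '/>/ s/\(.*\)-.*$/\1/' "$filename" > "${filename%.FNA}_new.FNA"
done

cat gene1_new.FNA
```

Because the output names also end in .FNA, running the loop a second time in the same directory would pick up the _new files as well; writing the results to a separate directory avoids that.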

One can even simplify the sed command like this:

sed '/>/ s/-.*//g'

Which means:

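  • in every line that contains a > (/>/)
  • substitute (s/)
  • a - followed by zero or more characters (-.*)
  • with nothing (//)
  • and repeat this for every match on the line (g); because .* already runs to the end of the line there is only ever one match, so the g can be dropped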
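One caveat when choosing between the two expressions: they only give the same result when a header contains a single hyphen. A small sketch, using a made-up header with two hyphens (the name P_sp-nov is an assumption, not taken from the alignments), shows the difference:

```shell
# Hypothetical header with two hyphens, to expose the difference.
header='>P_sp-nov-COMP100028'

# The greedy \(.*\) swallows everything up to the LAST hyphen.
last=$(printf '%s\n' "$header" | sed '/>/ s/\(.*\)-.*$/\1/')

# -.* starts matching at the FIRST hyphen on the line.
first=$(printf '%s\n' "$header" | sed '/>/ s/-.*//')

printf '%s\n%s\n' "$last" "$first"
```

The first command leaves >P_sp-nov and the second leaves >P_sp, so the longer expression is the safer choice if species names may themselves contain hyphens.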

Thank you very much for all the solutions JC and Dave. I tried the first one and it worked perfectly!!

5.2 years ago
Dave Carlson ★ 2.1k

Your loop doesn't supply sed with a file to modify, so sed waits for input on standard input, which is why it appears to keep running. This should work:

for filename in *.FNA; do
   sed '/>/ s/\(.*\)-.*$/\1/g' "$filename"
done
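As written, this loop prints the edited sequences to the terminal and leaves the files unchanged. If overwriting the originals is acceptable, one portable option is to write to a temporary file and then move it over the input. The sketch below does this on a made-up file in a scratch directory (demo_dir and gene2.FNA are assumptions for illustration). GNU sed also offers -i for in-place editing, but its syntax differs on BSD/macOS sed, so it is not used here.

```shell
# Scratch directory and sample file are hypothetical, for illustration only.
demo_dir=$(mktemp -d)
printf '>P_leuco-COMP100028\nACTGTCTTA\n' > "$demo_dir/gene2.FNA"

for filename in "$demo_dir"/*.FNA; do
    tmp=$(mktemp)
    # Replace the original only if sed finished without error.
    sed '/>/ s/\(.*\)-.*$/\1/' "$filename" > "$tmp" && mv "$tmp" "$filename"
done

cat "$demo_dir/gene2.FNA"
```

Keeping a backup copy of the alignments before overwriting them is a sensible precaution.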