Rename .fasta headers and save parsed files in new folder [bash]
2.0 years ago
Jimpix ▴ 10

Hi!

I have a folder with multiple FASTA files; each file contains a few sequences like this:

>KLTH0E08624g KLTH0E08624g
MAREITDIKEFLELARRADVKTATVKINKKLNKSGKAFRQTKFKVRGSRYLYTLIVNDAG

I need to write a bash script that parses these files to give each FASTA record a new header (the first 4 letters):

>KLTH
MAREITDIKEFLELARRADVKTATVKINKKLNKSGKAFRQTKFKVRGSRYLYTLIVNDAG

and save these files in another folder. I am new to bash and cannot handle it by myself. For now I have:

for f in $(ls path_to_folder/GL3*.fasta)
do
    # here bash command to correct that headers and save in:
    "/corrected/$f"
done

Kindly help

fasta bash • 1.1k views

This is probably the most asked question on the forum - have you looked at other threads for ideas?


I do not know how to extract exactly the first 4 letters. I have checked other posts, but it is not clear to me.


Here's a hint: you want the first 5 characters if you intend to keep the header marker (>) too:

"${string:0:5}"

https://stackoverflow.com/questions/8928224/trying-to-retrieve-first-5-characters-from-string-in-bash-error
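
To make the hint concrete, here is a minimal sketch of that substring expansion applied to the header from the question. The variable names `header` and `short_header` are made up for illustration, and the `${var:offset:length}` syntax is a Bash feature, so this assumes the script runs under bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Example header line copied from the question.
header='>KLTH0E08624g KLTH0E08624g'

# ${var:offset:length} is Bash substring expansion (not POSIX sh).
# Offset 0 with length 5 keeps ">" plus the first four letters.
short_header="${header:0:5}"

printf '%s\n' "$short_header"
```

Use `"${header:1:4}"` instead if you want the four letters without the leading `>`.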

2.0 years ago
barslmn ★ 2.3k

You can use the cut command with the -c option. https://colab.research.google.com/drive/1O3KUjo7qwV5bLUjy5eAqfgUu3wriJQLE#scrollTo=l7KgHme0vjeY

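# Create a one-record example file.
printf ">KLTH0E08624g KLTH0E08624g\nMAREITDIKEFLELARRADVKTATVKINKKLNKSGKAFRQTKFKVRGSRYLYTLIVNDAG\n" > example.fasta
# Read line by line; IFS='' and -r keep whitespace and backslashes intact.
while IFS='' read -r line; do
  case $line in
    # Header line: keep ">" plus the first 4 characters.
    ">"*) printf '%s\n' "$line" | cut -c -5;;
    # Sequence line: print unchanged.
    *) printf '%s\n' "$line";;
  esac
done < example.fasta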

Results in:

>KLTH
MAREITDIKEFLELARRADVKTATVKINKKLNKSGKAFRQTKFKVRGSRYLYTLIVNDAG
2.0 years ago
5heikki 11k

With awk:

awk '{if(/^>/){print substr($0,1,5)}else{print $0}}' input.fasta > output.fasta
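
To cover the "save in another folder" part of the question, the awk one-liner can be wrapped in a loop over the input files. This is only a sketch under assumptions: `path_to_folder` and the `GL3*.fasta` pattern come from the question, while the relative output folder `corrected`, the variable names, and the demo file `GL3_demo.fasta` are made up so the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical folder names; adjust them to your own layout.
in_dir="path_to_folder"
out_dir="corrected"

# Demo input so the sketch is runnable as-is (drop these two lines for real data).
mkdir -p "$in_dir"
printf '>KLTH0E08624g KLTH0E08624g\nMAREITDIKEF\n' > "$in_dir/GL3_demo.fasta"

# Make sure the output folder exists before writing into it.
mkdir -p "$out_dir"

# Glob directly instead of parsing ls output, which breaks on names with spaces.
for f in "$in_dir"/GL3*.fasta; do
    # Skip the literal pattern if the glob matched nothing.
    [ -e "$f" ] || continue
    # Header lines: keep ">" plus 4 characters. Other lines pass through unchanged.
    awk '/^>/ { print substr($0, 1, 5); next } { print }' "$f" \
        > "$out_dir/$(basename "$f")"
done
```

Each output file keeps the name of its input file, so the originals in `path_to_folder` are left untouched.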