13 months ago
sumitra.20
Hi everyone,
I've been trying to edit the headers of my FASTA file, which I intend to upload to NCBI TSA. I can't seem to upload the file successfully, and if I'm not mistaken it could be because of the header format. The headers of my FASTA file look like this:
>TRINITY_DN1078649_c1_g1_i1 len=235 path=[0:0-234]
>TRINITY_DN1078643_c0_g1_i1 len=204 path=[0:0-203]
and I would like to change them to the format below, as suggested by the guidelines:
>TRINITY_DN1078696_c0_g1_i1 [modifier=soil metatranscriptome] [moltype=mRNA] [tech=TSA]
I tried the command below, but it failed:
sed ‘s/len.*$/[modifier=soil metatranscriptome] [moltype=mRNA] [tech=TSA]/g' trinity_OUT.Trinity.fsa' > trinity_OUT2.Trinity.fsa
I'm new to coding and I can't seem to understand where I'm going wrong. Any help would be appreciated.
Thank you
Have a look at SEDA (https://www.sing-group.org/seda/). Under the rename header operation (https://www.sing-group.org/seda/manual/operations.html#rename-header) you have several options to achieve what you want: first keep only the fields you need (using multipart header) and then add the suffix you want (using add prefix/suffix).
Something is off regarding the quotes: right after sed you have ‘ and the expression closes with ' after /g. They should have been the same character. Moreover, you would not need the extra ' after trinity_OUT.Trinity.fsa. And copy-pasting the error message would be helpful for understanding the nature of the failure.
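For reference, here is a minimal sketch of the command with both quote problems fixed. It is not tested against the real data: demo.fsa and demo_renamed.fsa are made-up file names, and the two records are a tiny stand-in built from the headers in the question (the sequences are invented). The /^>/ address is an optional addition that limits the substitution to header lines, and the trailing g is dropped because the pattern can match only once per line.

```shell
# Hypothetical two-record sample; swap in the real file names afterwards.
printf '%s\n' \
  '>TRINITY_DN1078649_c1_g1_i1 len=235 path=[0:0-234]' \
  'ACGTACGT' \
  '>TRINITY_DN1078643_c0_g1_i1 len=204 path=[0:0-203]' \
  'TTGGCCAA' > demo.fsa

# Plain ASCII single quotes open and close the sed expression.
# On lines starting with ">", replace everything from "len" to the end
# of the line with the modifiers requested in the question.
sed '/^>/ s/len.*$/[modifier=soil metatranscriptome] [moltype=mRNA] [tech=TSA]/' \
  demo.fsa > demo_renamed.fsa

# Show the rewritten headers.
grep '^>' demo_renamed.fsa
```

If the output looks right on the sample, the same sed line should work with trinity_OUT.Trinity.fsa as input and trinity_OUT2.Trinity.fsa as output. Typing the quotes directly in the terminal, rather than pasting from a word processor, avoids the curly quote that caused the original failure.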