Hi all,
Can anyone advise me on how to rename the header in my fasta file? Say I have a seq.fa file with transcript sequences:
>TR1|c0_g1_i1
GTCGAGCATGGTCTTGGTCATCTTCCTTTCAAAGAA
>TR6|c0_g1_i1
GTGGAATATCGCCAGTGACCATCACTGATTAACCTG
I also have a file with contigs matching the transcripts - names.txt:
TR1|c0_g1_i1 scaf0432344_50037.734_wgs
TR6|c0_g1_i1 scaf0159424_10142.072_wgs
How do I add contig names to the fasta file headers so that the "scaf0..." identifier comes before the "TR..."?
Desired output:
>scaf0432344_50037.734_wgs|TR1|c0_g1_i1
GTCGAGCATGGTCTTGGTCATCTTCCTTTCAAAGAA
>scaf0159424_10142.072_wgs|TR6|c0_g1_i1
GTGGAATATCGCCAGTGACCATCACTGATTAACCTG
Cheers!
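For anyone who lands here and prefers plain Python, below is a minimal sketch of one way to do this, not a definitive solution. It assumes names.txt has two whitespace-separated columns (transcript ID, then contig name) and that the transcript ID is the first word of each header. The function names `load_names` and `prefix_headers` are made up for this illustration.

```python
def load_names(lines):
    """Build {transcript_id: contig_name} from whitespace-separated pairs."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or incomplete lines
            mapping[parts[0]] = parts[1]
    return mapping


def prefix_headers(fasta_lines, mapping, sep="|"):
    """Yield FASTA lines, putting the contig name in front of known IDs."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            if seq_id in mapping:
                # Keep the original header text and prepend the contig name.
                line = ">" + mapping[seq_id] + sep + line[1:]
        yield line


# The two example files from the question, inlined as lists of lines.
names = [
    "TR1|c0_g1_i1 scaf0432344_50037.734_wgs",
    "TR6|c0_g1_i1 scaf0159424_10142.072_wgs",
]
fasta = [
    ">TR1|c0_g1_i1",
    "GTCGAGCATGGTCTTGGTCATCTTCCTTTCAAAGAA",
    ">TR6|c0_g1_i1",
    "GTGGAATATCGCCAGTGACCATCACTGATTAACCTG",
]

renamed = list(prefix_headers(fasta, load_names(names)))
print("\n".join(renamed))
```

With real files you could pass open file handles instead of the lists, since iterating over a file yields its lines. Headers whose ID is not in names.txt are left unchanged.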
There are a lot of posts regarding this issue. I'd suggest you have a look before asking, because I think this topic has been widely covered:
I have been looking for a couple of hours now but didn't manage to find anything that matches exactly what I need to do. I'm not trained in this, so posts that don't match my case exactly were unfortunately not much help. Since I don't need to replace the header but rather modify it, this particular post wasn't helpful to me.
Did you try anything? If so, please post what you tried that isn't working.
No, sorry, I have no clue how to go about it.
Your fasta headers look like Trinity output. Did you run Trinity on Linux? Or were you given the fasta?
I am trying to use the above-mentioned Perl script and have modified it to get the desired output, but I can't get it to work. My input fasta file looks like:
and the text file:
The expected output I need is:
I need to rename the part of the header before the first underscore and keep the rest of the header as is.
Any suggestions?
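In case it helps, here is a rough Python sketch of the "rename only what comes before the first underscore" idea. The example files did not survive in this post, so the header and mapping below are invented purely for illustration, and `rename_prefix` is a hypothetical helper name.

```python
def rename_prefix(header, mapping):
    """Replace the text before the first underscore if it is in mapping."""
    body = header[1:]  # drop the leading ">"
    prefix, sep, rest = body.partition("_")
    new_prefix = mapping.get(prefix, prefix)  # unknown prefixes stay as they are
    return ">" + new_prefix + sep + rest


# Invented example data (NOT the poster's real files).
mapping = {"contig1": "sampleA"}
fasta = [">contig1_len500_cov3", "ACGTACGT", ">other_len20", "TTGA"]

renamed = [
    rename_prefix(line, mapping) if line.startswith(">") else line
    for line in fasta
]
print("\n".join(renamed))
```

Using `partition` means everything after the first underscore is carried over untouched, even if it contains more underscores.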
Use SeqKit:
Can you please explain this command, including what each part of it does, so that I can adapt it to my task? I just need to add a unique set of strings that I created right after the accession number, separated by a #. For better understanding, I need my files to look like this:
That works. Thanks!
seqkit is a life saver
Dear Shenwei, as I'm a biologist who is totally new to programming, would you please tell me what the SeqKit command would be if I want to do the following?
My input fasta looks like:
and the text file is:
The expected output is:
I would greatly appreciate your help.
First, prepare the mapping file of accession and GI:
If you have Unix/Linux, it's simple:
If not, you may need the help of csvtk, which has a Windows version:
Then you can replace AFA46815.1 with gi|222528058|ref|AFA46815.1|
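As a cross-check on the idea, here is a small Python sketch of the same replacement. It is not the SeqKit command, just one way to express the logic. It assumes the accession is the first word of the header and that you already have an accession-to-GI mapping. `add_gi` is a hypothetical name, and the description text in the example is made up.

```python
def add_gi(header, acc_to_gi):
    """Turn '>ACC desc' into '>gi|GI|ref|ACC| desc' when ACC has a known GI."""
    acc, sep, desc = header[1:].partition(" ")
    gi = acc_to_gi.get(acc)
    if gi is None:
        return header  # no GI known for this accession: leave header alone
    return f">gi|{gi}|ref|{acc}|{sep}{desc}"


# Accession and GI taken from the example above; the description is invented.
acc_to_gi = {"AFA46815.1": "222528058"}
new_header = add_gi(">AFA46815.1 hypothetical protein", acc_to_gi)
print(new_header)
```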
Hi Shenwei,
Can you modify this command for me?
This is my fasta header
My output header should look like
Thanks a lot
What is the relationship between current fasta header and expected one?
I want to replace orf05188 with geneA
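If it is really just swapping one ID for another inside the header, a minimal Python sketch could look like the one below. The real header did not survive in this post, so the headers here are invented and `replace_id` is a hypothetical helper. The pattern only matches the ID as a whole token, so a longer ID such as orf051880 is not touched.

```python
import re


def replace_id(header, old, new):
    """Replace `old` with `new` in a header, matching whole tokens only."""
    # Lookarounds stop the match when a letter or digit sits right next to
    # `old`, so an ID that merely contains `old` is left unchanged.
    pattern = r"(?<![A-Za-z0-9])" + re.escape(old) + r"(?![A-Za-z0-9])"
    return re.sub(pattern, lambda m: new, header)


# Invented example headers (NOT the poster's real data).
fasta = [">orf05188 some description", "ATGC", ">orf051880 other", "GGCC"]
renamed = [
    replace_id(line, "orf05188", "geneA") if line.startswith(">") else line
    for line in fasta
]
print("\n".join(renamed))
```

For many IDs at once you would loop over a mapping of old to new names instead of hard-coding one pair.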