Hi,
I have a FASTA file in which some records share the same header, like the ones below. The sequences are different but the headers are identical. How can I merge them, or what else should I do? I want to run orthoMCL, but it requires unique headers.
>c12358_g1_i9
>c12358_g1_i9
I don't know orthoMCL, but if you just want to change the headers to make them unique, do the following (on Linux, or install GnuWin32 on Windows to get the gawk command). It appends a running count (_1, _2, ...) to the first word of each header:
gawk '{if ($0 ~/^>/) {h[$1]++; $1=$1 "_" h[$1]} print}' myfasta.fa > updatedIDs_myfasta.fa
# myfasta.fa is your fasta file.
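If it helps, here is a small sketch for checking which IDs are actually duplicated, before and after renaming. The function name list_duplicate_ids is my own and not part of any tool; it treats the first word of each header (without the '>') as the ID, which is an assumption about how your downstream program reads headers.

```shell
# List every FASTA ID that occurs more than once, with its count.
# The ID is taken to be the first whitespace-delimited word of a header
# line, minus the leading ">" (an assumption, adjust if your tool differs).
# Reads the file(s) given as arguments, or stdin if none are given.
list_duplicate_ids() {
  awk '/^>/ { id = substr($1, 2); count[id]++ }
       END  { for (id in count) if (count[id] > 1) print id, count[id] }' "$@" | sort
}
```

Run it as list_duplicate_ids myfasta.fa to see the repeated IDs, then again on updatedIDs_myfasta.fa; the second run should print nothing if the renaming worked.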
>c10047_g1_i1|m.4145 c10047_g1_i1|g.4145 ORF c10047_g1_i1|g.4145 c10047_g1_i1|m.4145 type:complete len:387 (-) c10047_g1_i1:511-1671(-)
>c10047_g2_i1|m.4146 c10047_g2_i1|g.4146 ORF c10047_g2_i1|g.4146 c10047_g2_i1|m.4146 type:5prime_partial len:589 (+) c10047_g2_i1:2-1768(+)
These are the headers in my FASTA file. I want to merge or remove the records with identical headers before my next step. The records with identical headers have different sequences.
Oh, this is different from what you gave in the question. In your FASTA file, the tool that generated it already forms unique headers, for example c10047_g1_i1|m.4145 and c10047_g2_i1|m.4146, but orthoMCL probably only considers the part of the header before the pipe sign '|'. Therefore you can do this, which replaces every '|' in the first word of each header:
gawk 'BEGIN{FS=" "}{if ($0 ~/^>/){gsub("\\|", "pp", $1)} print}' myfasta.fa >updatedIDs_myfasta.fa
Change "pp" to anything you like, but keep it distinguishable.
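To check whether truncation at the pipe really is the cause, a sketch like the one below lists the prefixes that would collide. The function name list_prefix_collisions is made up for illustration, and the idea that only the text before '|' is read is my guess about orthoMCL, not something I have confirmed.

```shell
# List header prefixes (text before the first "|" in the first word of
# each header) that occur more than once, with their counts.
# Assumption: the downstream tool keeps only this prefix as the ID.
# Reads the file(s) given as arguments, or stdin if none are given.
list_prefix_collisions() {
  awk '/^>/ {
         id = substr($1, 2)            # drop the leading ">"
         cut = index(id, "|")          # position of the first pipe, 0 if none
         if (cut > 0) id = substr(id, 1, cut - 1)
         count[id]++
       }
       END { for (id in count) if (count[id] > 1) print id, count[id] }' "$@" | sort
}
```

If list_prefix_collisions myfasta.fa prints anything, those are the IDs that become identical once everything after the pipe is dropped; after replacing the pipe, the same check on updatedIDs_myfasta.fa should come back empty.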
It seems that your upstream tool spit out different fragments of the same sequence. Merging them with 'N' padding between the fragments may work, but the quicker and better method is to make the headers unique.