How to merge multi-FASTA sequences into a single sequence with only one header?
I have a multi-FASTA file and want to merge all of its sequences into a single sequence file.
I mean that the ">ID" header lines should be removed so that the sequences form one super-sequence. Doing this manually would take a long time.
How can this be done in Linux?
Thanks
fasta
merge
Using the union command from the EMBOSS package:
$ cat test.fasta
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG
$ union -filter test.fasta
>seq1
AAAATTGGGGGCCCTTTTAAATGGGG
grep -v "^>" test.fasta | awk 'BEGIN { ORS=""; print ">My_New_Sequence_name\n" } { print } END { print "\n" }' > new.fasta
test.fasta:
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG
new.fasta:
>My_New_Sequence_name
AAAATTGGGGGCCCTTTTAAATGGGG
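A quick sanity check after any merge like this is to confirm that no residues were lost: the number of sequence characters should be identical before and after. The following is only a sketch that rebuilds the small example above; the file names check_in.fasta and check_out.fasta are placeholders, not part of the original answer.

```shell
# Recreate the three-record example; file names are placeholders.
printf '>seq1\nAAAATTGGG\n>seq2\nGGCCCTTTT\n>seq3\nAAATGGGG\n' > check_in.fasta

# Same grep/awk merge idea, ending the output file with a newline.
grep -v "^>" check_in.fasta \
  | awk 'BEGIN { ORS=""; print ">My_New_Sequence_name\n" } { print } END { print "\n" }' \
  > check_out.fasta

# Count residues: drop header lines, remove newlines, count what is left.
before=$(grep -v '^>' check_in.fasta | tr -d '\n' | wc -c | tr -d ' ')
after=$(grep -v '^>' check_out.fasta | tr -d '\n' | wc -c | tr -d ' ')

if [ "$before" -eq "$after" ]; then
  echo "OK: $after residues"
else
  echo "Mismatch: $before before, $after after" >&2
fi
```

If the input came from Windows, carriage returns would be counted as well, so stripping them first (for example with tr -d '\r') may be needed.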
cat multifasta.fa | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g' > output.fa
E.g:
$ cat ~/test/seqs.fasta
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNN
>tpg|Pyricularia_pennisetigena|AB818016
NNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>tpg|Inocybe_sororia|EU525947
NNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN
$ cat ~/test/seqs.fasta | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g'
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNNNNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAANNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN
(retains just the header of the first seq in the multifasta)
Bonus:
If you also want to hard-wrap the sequence at 80 characters (or any other width), the command becomes:
cat "$1" | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g' | sed '1!s/.\{80\}/&\n/g'
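The sed commands above rely on GNU sed behaviour (the 2g flag and \n in the replacement). Where that is not available, a similar result can probably be had with grep, tr and fold. This is a sketch under assumptions: the file names are placeholders, the width is set to 10 only to keep the demo short, and the first line starting with ">" is taken as the header to keep.

```shell
# Hypothetical demo input; file names are placeholders.
printf '>seq1\nAAAATTGGG\n>seq2\nGGCCCTTTT\n>seq3\nAAATGGGG\n' > wrap_demo.fasta

{
  # Keep only the first header line as the single header.
  grep -m 1 '^>' wrap_demo.fasta
  # Drop all headers, join the sequence lines, then re-wrap at 10 columns.
  grep -v '^>' wrap_demo.fasta | tr -d '\n' | fold -w 10
  # The joined stream has no final newline, so add one.
  echo
} > wrap_demo_out.fasta

cat wrap_demo_out.fasta
```

For real data, replace 10 with 80 (or whatever width is wanted).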
grep -v '^>' in.fa > out.fa
If in.fa is:
>chr1
ttttccccaaaagggg
>chr2
ACTGACTGnnnnACTG
>chr3.1
ACTGACTGaaaac
>chr3.2
ACTGACTGaaaacc
>chr3.3
ACTGACTGaaaaccc
>chr4
ACTGnnnn
>chr5
nnACTG
then out.fa becomes:
ttttccccaaaagggg
ACTGACTGnnnnACTG
ACTGACTGaaaac
ACTGACTGaaaacc
ACTGACTGaaaaccc
ACTGnnnn
nnACTG
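Note that the output above has no header and still has one line per original record. If a single-line sequence under one header is wanted instead, a one-pass awk sketch along these lines may do; the header name ">merged" and the file names are made up for illustration, and the carriage-return stripping only matters for files with Windows line endings.

```shell
# Hypothetical demo input; file names are placeholders.
printf '>chr1\nttttcccc\n>chr2\nACTGnnnn\n>chr3\nnnACTG\n' > join_demo.fa

awk '
  BEGIN { print ">merged" }                 # placeholder header
  /^>/  { next }                            # skip the original headers
        { sub(/\r$/, ""); printf "%s", $0 } # strip a trailing CR, append without newline
  END   { printf "\n" }                     # finish the sequence line
' join_demo.fa > join_demo_out.fa

cat join_demo_out.fa
```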
If I may ask, for what need?
@majeedaasim, please use the "accept answer" option if it works for you. It helps keep us motivated. Good luck!