How to merge multi-FASTA sequences into a single sequence with only one header?
I have a multi-FASTA sequence file. I want to merge all the sequences together to create a single-sequence file.
I mean that the ">IDs" in the sequences should be removed to create one super-sequence. This would take a lot of time to do manually.
How can it be done in Linux?
Thanks
fasta
merge
Using the union command from the EMBOSS package:
$ cat test.fasta
> seq1
AAAATTGGG
> seq2
GGCCCTTTT
> seq3
AAATGGGG
$ union -filter test.fasta
> seq1
AAAATTGGGGGCCCTTTTAAATGGGG
grep -v "^>" test.fasta | awk 'BEGIN { ORS=""; print ">My_New_Sequence_name\n" } { print }' > new.fasta
test.fasta
> seq1
AAAATTGGG
> seq2
GGCCCTTTT
> seq3
AAATGGGG
new.fasta
> My_New_Sequence_name
AAAATTGGGGGCCCTTTTAAATGGGG
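A variant of the same idea using awk alone, in case it helps. This is only a sketch: the file names and the header text are placeholders, and it assumes the input has Unix line endings. Unlike the one-liner above, it also ends the output with a newline, which some downstream tools expect.

```shell
# Build a small test input (same sequences as the example above).
printf '>seq1\nAAAATTGGG\n>seq2\nGGCCCTTTT\n>seq3\nAAATGGGG\n' > test.fasta

# Print one new header, then every non-header line with no separator,
# and finish with a single newline so the file ends cleanly.
awk 'BEGIN { print ">My_New_Sequence_name" }
     !/^>/ { printf "%s", $0 }
     END   { print "" }' test.fasta > new.fasta

cat new.fasta
```

The result is a two-line file: the new header followed by the concatenated sequence.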
cat multifasta.fa | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba ;s/\n//2g' > output.fa
E.g:
$ cat ~/test/seqs.fasta
> tpg| Magnaporthiopsis_incrustans| JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNN
> tpg| Pyricularia_pennisetigena| AB818016
NNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
> tpg| Inocybe_sororia| EU525947
NNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN
$ cat ~/test/seqs.fasta | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba ;s/\n//2g'
> tpg| Magnaporthiopsis_incrustans| JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNNNNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAANNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN
(retains just the header of the first seq in the multifasta)
Bonus:
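If you also want to hard-wrap the merged sequence at 80 characters (or any other width), the command becomes the following, where $1 is the input file passed as the first argument to a script:
cat "$1" | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba ;s/\n//2g' | sed '1!s/.\{80\}/&\n/g'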
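If the sed syntax is hard to follow, the wrapping step can also be sketched with standard tools (head, grep, tr, fold). This is an alternative to the sed pipeline above, not a replacement for it; the file names multi.fa and wrapped.fa and the generated test sequences are made up for illustration.

```shell
# Hypothetical input: two records of 100 and 70 bases.
{
  echo '>seq1'
  printf '%0100d\n' 0 | tr '0' 'A'
  echo '>seq2'
  printf '%070d\n' 0 | tr '0' 'C'
} > multi.fa

# Keep the first header, drop all other headers, join the sequence
# lines into one, then wrap the joined sequence at 80 columns.
{
  head -n 1 multi.fa
  grep -v '^>' multi.fa | tr -d '\n' | fold -w 80
  echo    # fold leaves the last line unterminated here, so add a newline
} > wrapped.fa

cat wrapped.fa
```

Changing the value given to fold -w changes the line width.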
grep -v '^>' in.fa > out.fa
If in.fa is:
> chr1
ttttccccaaaagggg
> chr2
ACTGACTGnnnnACTG
> chr3.1
ACTGACTGaaaac
> chr3.2
ACTGACTGaaaacc
> chr3.3
ACTGACTGaaaaccc
> chr4
ACTGnnnn
> chr5
nnACTG
then out.fa becomes (headers removed, but still one line per original record):
ttttccccaaaagggg
ACTGACTGnnnnACTG
ACTGACTGaaaac
ACTGACTGaaaacc
ACTGACTGaaaaccc
ACTGnnnn
nnACTG
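Since the grep output still has a line break after each record, a possible extra step is to delete the newlines so that everything ends up on one line. A rough sketch, using a shortened copy of the example input and a simple base count as a sanity check (the count check is an addition for illustration, not part of the original answer):

```shell
# Shortened copy of the example input.
printf '>chr1\nttttccccaaaagggg\n>chr2\nACTGACTGnnnnACTG\n>chr4\nACTGnnnn\n' > in.fa

# Strip the headers, then delete newlines so the records form one line;
# the echo restores a single trailing newline.
{ grep -v '^>' in.fa | tr -d '\n'; echo; } > out.fa

# Sanity check: the number of bases should be the same before and after.
before=$(awk '!/^>/ { n += length($0) } END { print n }' in.fa)
after=$(awk '{ n += length($0) } END { print n }' out.fa)
echo "bases before: $before, after: $after"
```

Note that the output has no header line at all, so add one if the downstream tool expects valid FASTA.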
If I may ask, for what need?
@majeedaasim, please use the "accept answer" option if it works for you. It will help keep us motivated. Good luck!