Remove duplicate FASTA sequences based on header name
3.8 years ago
harry ▴ 40
>exon18_ENST00000194900|exon19_ENST00000194900
GCTATCAAGAGACTGCAGCAAGCACAACTTTACCCCATTGCCATTTTCATCAAGCCCAAGTCCATTGAAGCCCTTATTACTACCCGGCCTCGACGTGATAATGAGGTGGATGGACAAGACTACCACTTTGTGGTGTCCCGAGAACAAATG
>exon18_ENST00000194900|exon21_ENST00000194900
TCTAACAGAGGAAGAACTGTATTATCAAACCTTATGGTCTACCCATGAATAAATAAAATATTTGTTCAAGCACAAATACTACCCGGCCTCGACGTGATAATGAGGTGGATGGACAAGACTACCACTTTGTGGTGTCCCGAGAACAAATGG
>exon18_ENST00000194900|exon20_ENST00000194900
CAGACATATGAACAAGCAAATAAGATCTATGACAAAGCCATGAAACTGGAGCAGGAATTTGGAGAGTACTTTACAGTACTACCCGGCCTCGACGTGATAATGAGGTGGATGGACAAGACTACCACTTTGTGGTGTCCCGAGAACAAATGG
>exon18_ENST00000194900|exon21_ENST00000194900
CTCTAACAGAGGAAGAACTGTATTATCAAACCTTATGGTCTACCCATGAATAAATAAAATATTTGTTCAAGCACAAATACTACCCGGCCTCGACGTGATAATGAGGTGGATGGACAAGACTACCACTTTGTGGTGTCCCGAGAACAAATG
>exon17_ENST00000194900|>exon17_ENST00000194900
ATGATGACCTGATCTCCGAATTTCCACATAAATTTGGATCCTGTGTGCCACTCACTATGCAAGGCCTGTGATCATCCTGGGCCCAATGAAGGACCGAGTCA
>exon19_ENST00000194900|exon21_ENST00000194900
TCTAACAGAGGAAGAACTGTATTATCAAACCTTATGGTCTACCCATGAATAAATAAAATATTTGTTCAAGCACAAAGCAAGCACTGCATCTTAGATGTTTCCGGCAATGCTATCAAGAGACTGCAGCAAGCACAACTTTACCCCATTGCC
>exon19_ENST00000194900|exon20_ENST00000194900
CAGACATATGAACAAGCAAATAAGATCTATGACAAAGCCATGAAACTGGAGCAGGAATTTGGAGAGTACTTTACAGGCAAGCACTGCATCTTAGATGTTTCCGGCAATGCTATCAAGAGACTGCAGCAAGCACAACTTTACCCCATTGCC
>exon19_ENST00000194900|exon21_ENST00000194900
CTCTAACAGAGGAAGAACTGTATTATCAAACCTTATGGTCTACCCATGAATAAATAAAATATTTGTTCAAGCACAAAGCAAGCACTGCATCTTAGATGTTTCCGGCAATGCTATCAAGAGACTGCAGCAAGCACAACTTTACCCCATTGC

In this example FASTA file, some headers appear more than once, for example exon19_ENST00000194900|exon21_ENST00000194900 and exon18_ENST00000194900|exon21_ENST00000194900. I want to remove every record whose header already occurs in the file, so that only one sequence is kept per header. The deduplication should be based on the header, not on the sequence. Thanks in advance.

fasta header • 785 views

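Try seqkit rmdup. By default it removes duplicates by sequence ID; the -n (--by-name) option compares the full header instead.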

3.8 years ago
5heikki 11k
paste -d $'\t' - - <fastaFileWithNoLinebreaksInSeq | sort -t $'\t' -uk1,1 | awk 'BEGIN{FS="\t";OFS="\n"}{print $1,$2}'
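
The pipeline above needs each sequence on a single line and returns the records sorted by header. If the sequences are wrapped over several lines, or the original record order matters, a single awk pass may be enough. The following is only a sketch: the function name dedup_fasta_by_header is made up for illustration, the whole header line is used as the key, and it assumes the first record seen for each header is the one to keep.

```shell
# dedup_fasta_by_header FILE
# Hypothetical helper: prints FILE keeping only the first record
# for each distinct header line, in the original order.
dedup_fasta_by_header() {
  awk '
    /^>/ { keep = !seen[$0]++ }  # on a header line: keep the record only if this header is new
    keep                         # print the header and sequence lines of kept records
  ' "$1"
}

# Usage (input.fasta and dedup.fasta are placeholder names):
#   dedup_fasta_by_header input.fasta > dedup.fasta
```

Because the decision is made once per header and then applied to every following line until the next header, multi-line sequences stay intact.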