23 months ago
martta95
Hello,
I would like to remove duplicates from a FASTA file based on the sequence, not the header. The file is large.
For example:
>A01968:16:HJM3MDSX3:1:1101:7654:1125 1:N:0:ATCACG
GCGTCTGTAGTCCAACGGTTAGGATAATTGCCTTCC
>A01968:16:HJM3MDSX3:1:1101:31096:1141 1:N:0:ATCACG
CTCAGTTTTGTAGTAGGACTCCCACTCTGACATGTT
>A01968:16:HJM3MDSX3:1:1101:27552:1204 1:N:0:ATCACG
CTCAGTTTTGTAGTAGGACTCCCACTCTGACATGTT
>A01968:16:HJM3MDSX3:1:1101:29830:1297 1:N:0:ATCACG
CTCAGTTTTGTAGTAGGACTCCCACTCTGACATGTT
>A01968:16:HJM3MDSX3:1:1101:6017:1329 1:N:0:ATCACG
ACGGGGCATTGTAAGTGAGATCGGAAGAGCCACGTC
I would like to obtain a file containing only:
>A01968:16:HJM3MDSX3:1:1101:7654:1125 1:N:0:ATCACG
GCGTCTGTAGTCCAACGGTTAGGATAATTGCCTTCC
>A01968:16:HJM3MDSX3:1:1101:31096:1141 1:N:0:ATCACG
CTCAGTTTTGTAGTAGGACTCCCACTCTGACATGTT
>A01968:16:HJM3MDSX3:1:1101:6017:1329 1:N:0:ATCACG
ACGGGGCATTGTAAGTGAGATCGGAAGAGCCACGTC
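One possible way to do this is sketched below in Python. It streams the file record by record and keeps the first record seen for each distinct sequence, which matches the desired output above. The function names `read_fasta` and `dedup_fasta` are made up for illustration. The sketch assumes sequences should match exactly (case-sensitive), and it stores a 16-byte BLAKE2b digest of each sequence rather than the sequence itself. That helps memory only for long sequences, and carries a negligible but non-zero chance of a hash collision.

```python
import hashlib
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedup_fasta(src: TextIO, dst: TextIO) -> int:
    """Write the first record of each distinct sequence; return how many were kept."""
    seen = set()
    kept = 0
    for header, seq in read_fasta(src):
        # Assumption: exact, case-sensitive match. Use seq.upper() here to ignore case.
        key = hashlib.blake2b(seq.encode("ascii"), digest_size=16).digest()
        if key in seen:
            continue
        seen.add(key)
        dst.write(f"{header}\n{seq}\n")
        kept += 1
    return kept
```

To run it on real files, open the input for reading and the output for writing and pass both handles to `dedup_fasta`. The set of digests still grows with the number of distinct sequences, so for very large inputs a dedicated tool may be a better fit.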