How to remove redundant sequences from a FASTA file?
3.2 years ago
ANAM • 0

I have a FASTA file containing nucleotide sequences. How can I remove the redundant sequences? I tried to use CD-HIT, but its web server is not available. Is there any other tool for removing redundancy? I would really appreciate any help or suggestions!

fasta cd-hit • 1.2k views

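seqkit rmdup can remove duplicated sequences from a FASTA file. With -s it compares records by sequence instead of by ID, and -P restricts the comparison to the positive strand.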

printf '> A1\nATTG\n> A2\nTTTA\n> A3\nATTG' | seqkit rmdup -sP

> A1
ATTG
> A2
TTTA
[INFO] 1 duplicated records removed
$ printf '>A1\nATTG\n>A2\nTTTA\n>A3\nATTG\n' | awk '/^>/ {getline seq; print $0, seq}' | sort -uk2,2 | tr -s " " "\n"

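This works only if each sequence is on a single line and the headers contain no spaces. Note that the output is ordered by sequence, not by the original record order.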
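For files where sequences wrap across several lines, a plain awk variant might look like the sketch below. It joins the wrapped lines of each record, then keeps only the first record seen for each distinct sequence, preserving input order. The function name dedup_fasta is made up for illustration, and the comparison is an assumption on my part: exact and case-sensitive, with no reverse-complement check.

```shell
# Hypothetical helper: drop FASTA records whose sequence was already seen.
# Reads from the files given as arguments, or from stdin if there are none.
dedup_fasta() {
  awk '
    # Print the buffered record unless its sequence is a repeat.
    function emit() {
      if (header != "" && !(seq in seen)) {
        seen[seq] = 1
        print header
        print seq
      }
    }
    /^>/ { emit(); header = $0; seq = ""; next }  # a new record starts
    { seq = seq $0 }                               # join wrapped sequence lines
    END { emit() }                                 # do not forget the last record
  ' "$@"
}
```

Usage would be something like dedup_fasta input.fasta > output.fasta (file names are placeholders). Each kept sequence is written on a single line, so the original line wrapping is lost.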
