Remove Duplicate Reads From Fasta File
12.2 years ago

Hi all,

I want to remove duplicate reads from my FASTA file. I tried fastx_collapser, but it failed because my reads contain lowercase letters and hyphens.

Please help.

Thanks, D.

fasta read • 11k views

It's like everybody wants to remove duplicates here!

12.2 years ago

Try the sequniq tool from the GenomeTools suite:

gt sequniq -o output.fasta input.fasta

I tried this command. Could you please explain how it should be applied?

12.2 years ago
Rm 8.3k

Try CD-HIT or UCLUST.

You can remove unwanted hyphens and convert to uppercase using sed (the \U case conversion is a GNU sed extension):

echo FaSta-TEst | sed "s/-//g ; s/.*/\U&/"
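
On a real FASTA file the same substitutions would also rewrite the header lines, which may contain hyphens or lowercase text that should stay as they are. A minimal sketch that restricts the edit to sequence lines, assuming GNU sed and that every header starts with `>` (the function name `normalize_fasta` is only illustrative):

```bash
# normalize_fasta: hypothetical helper name, not part of any tool in this thread.
# Reads FASTA on stdin and writes it to stdout with hyphens removed and
# letters uppercased on sequence lines only. Requires GNU sed for \U.
normalize_fasta() {
  # /^>/! applies the braces only to lines that do NOT start with ">"
  sed '/^>/!{s/-//g;s/.*/\U&/;}'
}
```

Usage would look like `normalize_fasta < input.fasta > normalized.fasta`, after which the normalized file could be passed to a collapsing tool.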


Or just tr: echo FaSta-TEst | tr -d - | tr 'a-z' 'A-Z'
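
If no dedicated tool is at hand, the normalization and the duplicate removal can be combined in one pass. The following is only a sketch under some assumptions: duplicates are defined as exact sequence matches after uppercasing and removing hyphens, the first header seen for each sequence is kept, and the file fits comfortably in memory. The function name `dedup_fasta` is made up for this example.

```bash
# dedup_fasta: hypothetical helper, not part of any tool mentioned in this thread.
# Reads FASTA on stdin, writes one record per unique normalized sequence.
dedup_fasta() {
  awk '
    # Print the buffered record unless its sequence was already seen.
    function emit() {
      if (header != "" && !(seq in seen)) {
        seen[seq] = 1
        print header
        print seq
      }
    }
    # A header line closes the previous record and starts a new one.
    /^>/ { emit(); header = $0; seq = ""; next }
    {
      line = toupper($0)     # lowercase -> uppercase
      gsub(/-/, "", line)    # drop hyphens
      seq = seq line         # join wrapped sequence lines
    }
    END { emit() }
  '
}
```

Usage would look like `dedup_fasta < input.fasta > unique.fasta`. Note that the output sequences are written in normalized form on a single line, so the original case, hyphens, and line wrapping are not preserved.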

7.8 years ago
Eslam Samir ▴ 110

Here is my free program on GitHub, Sequence database curator (https://github.com/Eslam-Samir-Ragab/Sequence-database-curator).

It is a very fast program and it can deal with:

  1. Nucleotide sequences
  2. Protein sequences

It runs on the following operating systems:

  1. Windows
  2. Mac
  3. Linux

It supports the following formats:

  1. FASTA format
  2. FASTQ format

Best Regards
