Question

How to find identical or similar sequences within a FASTA file

0


10.2 years ago

Kurban ▴ 230

Hello, I am trying to find the best way to search for sequences that are identical or similar to a given sequence within a transcriptome in a FASTA file, which was assembled from RNA-seq data. I know there are many tools, but I don't know which one was developed for this purpose. Could anyone give me some tips? Thanks!

similar-sequence • 3.5k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Kurban ▴ 230

0


I'm not clear on what exactly you are looking to do: compare sequences from different samples, or within the same sample? There are many different strategies for either, from clustering (usearch/UPARSE/CD-HIT, etc.) to alignment (BLAST, etc.). Can you please clarify your original post with your research question?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Josh Herr 5.8k

0


Sorry @Josh Herr, I was not clear.

I have a FASTA file which contains around 144,000 transcripts/sequences (the transcriptome of an insect). My boss gave me several nucleotide sequences and asked me whether the FASTA file contains any sequences identical or similar to them and, if so, which ones and how similar they are.

I want to align those sequences one by one against the transcriptome (the FASTA file).

I am new to this kind of analysis.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Kurban ▴ 230

0


Sounds like BLAST would be a good solution. You can install it locally and use it from the command line.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Devon Ryan 105k

0


Thank you @Josh Herr, @Siva and @Geek_y.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Kurban ▴ 230

Ram · Answer 1 · 2014-11-19

1


10.2 years ago

Siva ★ 1.9k

You can create a BLAST database from those 144,000 transcripts and run a BLASTN search against it, using the nucleotide sequences as queries.
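As a rough sketch of those two steps, the Python snippet below builds the `makeblastdb` and `blastn` command lines and parses the default tabular output (`-outfmt 6`). It assumes BLAST+ is installed and on the `PATH`. The file names (`transcriptome.fasta`, `queries.fasta`, `hits.tsv`), the database name, the helper function names, and the e-value cutoff of `1e-5` are illustrative assumptions, not something given in the thread.

```python
import subprocess

# Column names of BLAST+ tabular output when "-outfmt 6" is used with defaults.
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def makeblastdb_cmd(fasta, db_name):
    """Command that turns a nucleotide FASTA file into a BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db_name, out_tsv, evalue="1e-5"):
    """Command that searches the query sequences against the database."""
    return [
        "blastn", "-query", query, "-db", db_name,
        "-outfmt", "6", "-evalue", evalue, "-out", out_tsv,
    ]


def parse_outfmt6(lines):
    """Turn tabular BLAST lines into dicts with numeric identity and score."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        rec["pident"] = float(rec["pident"])      # percent identity
        rec["length"] = int(rec["length"])        # alignment length
        rec["evalue"] = float(rec["evalue"])
        rec["bitscore"] = float(rec["bitscore"])
        hits.append(rec)
    return hits


def run_search(transcriptome, queries, db_name="transcriptome_db",
               out_tsv="hits.tsv"):
    """Build the database, run blastn, and return the parsed hits.

    All file and database names here are hypothetical placeholders.
    """
    subprocess.run(makeblastdb_cmd(transcriptome, db_name), check=True)
    subprocess.run(blastn_cmd(queries, db_name, out_tsv), check=True)
    with open(out_tsv) as handle:
        return parse_outfmt6(handle)
```

Calling `run_search("transcriptome.fasta", "queries.fasta")` would then return one record per hit. The `sseqid` field names the matching transcript, and `pident` together with `length` gives a first impression of how similar it is to the query.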

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Siva ★ 1.9k

Ram · Answer 2 · 2014-11-19

0


10.2 years ago

GouthamAtla 12k

You can use cd-hit-est or usearch for this purpose. They cluster similar sequences and keep one representative sequence per cluster, based on a user-defined percent-similarity threshold.

If you need to compare them against another set of sequences, you need to run BLAST or a similar alignment tool.
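To illustrate what "one representative sequence from similar sequences" means, here is a toy Python sketch of greedy clustering at a similarity threshold. It is not how cd-hit-est or usearch are implemented: the `identity` function uses `difflib.SequenceMatcher` as a crude stand-in for a real alignment-based identity, and the function names and example sequences are made up for illustration.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Group sequences around representatives.

    seqs maps a sequence name to its sequence. The result maps each
    representative's name to the names of all members of its cluster.
    """
    clusters = {}
    # Longer sequences are visited first, so they become the representatives.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # similar enough: join this cluster
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
    return clusters


# Made-up example sequences, only for demonstration.
example = {
    "t1": "ACGTACGTAC",
    "t2": "ACGTACGTAA",
    "t3": "TTTTGGGGCC",
}
print(greedy_cluster(example, threshold=0.85))
```

In this example `t1` and `t2` differ only at the last base and end up in one cluster represented by `t1`, while `t3` forms its own cluster. Raising or lowering `threshold` plays the role of the user-defined percent similarity.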

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by GouthamAtla 12k

0


The free version of usearch (32-bit) will be very slow.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by GouthamAtla 12k