Question

Is there a good service for annotating transcripts?

0

11.2 years ago

will • 0

After getting, say, 100k transcripts from an RNA-seq project, one generally wants to annotate them against a database like nr, using, say, blastx. The problem is that this is very slow, taking e.g. a week with 24 CPUs. What have people done to overcome this?

blast rna-seq • 2.4k views

updated 9.9 years ago by Biostar 20 • written 11.2 years ago by will • 0

score 1 · Answer 1 · 2014-06-10

1

11.2 years ago

Philipp Bayer 8.8k

I use GNU Parallel to run several BLAST jobs at once, with each job getting one CPU. Have a look here: Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them

That way it should only take a day or so.
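To make the idea concrete, here is a minimal Python sketch of the same pattern (it is not GNU Parallel itself, just the split-and-run-concurrently approach written out). It assumes BLAST+ is installed with `blastx` on the PATH and that a formatted protein database is available. The function names, the chunk file names, and the `1e-5` E-value cutoff are illustrative choices, not something from this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_records(lines):
    """Group FASTA lines into records; each record starts at a '>' header line."""
    records, current = [], []
    for line in lines:
        if line.startswith(">"):
            if current:
                records.append("".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        records.append("".join(current))
    return records


def split_round_robin(records, n_chunks):
    """Deal records into n_chunks lists so each chunk gets a similar share."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


def blastx_command(query, db, out):
    """Build a single-threaded blastx call with tabular output (assumed settings)."""
    return ["blastx", "-query", str(query), "-db", db, "-out", str(out),
            "-outfmt", "6", "-evalue", "1e-5", "-num_threads", "1"]


def run_chunks(fasta_path, db, workdir, n_jobs):
    """Write one FASTA file per chunk and run one blastx process per chunk."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with open(fasta_path) as handle:
        records = read_records(handle)
    commands = []
    for i, chunk in enumerate(split_round_robin(records, n_jobs)):
        query = workdir / f"chunk_{i}.fa"
        query.write_text("".join(chunk))
        commands.append(blastx_command(query, db, workdir / f"chunk_{i}.tsv"))
    # Threads are enough here: each one only waits on an external blastx process.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return commands
```

A call such as `run_chunks("transcripts.fa", "nr", "blast_chunks", 24)` (hypothetical file and directory names) would start 24 single-CPU jobs; the per-chunk `.tsv` files can be concatenated afterwards. How much wall-clock time this actually saves depends on the machine and on disk I/O.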

score 1 · Answer 2 · 2014-06-10

1

11.2 years ago

mikhail.shugay 3.5k

Check out these guidelines: http://trinotate.sourceforge.net/ . In some cases blastp + domain prediction are quite enough for annotation. By the way, have you considered using cloud services?

updated 5.9 years ago by Ram 45k • written 11.2 years ago by mikhail.shugay 3.5k

score 0 · Answer 3 · 2014-06-10

0

11.2 years ago

Prakki Rama ★ 2.7k

Another possibility is to reduce the size of the database you search. Instead of the complete nr database, you can use sequences from species that are close to yours in the tree, plus some more distant species that are comprehensively studied and well annotated, such as human and mouse.

This reduces the search space by orders of magnitude, and most of your sequences should still get annotated. However, a small fraction of your 100k transcripts may end up without an annotation.
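As a rough illustration of building such a reduced database, the sketch below keeps only the FASTA records whose header names one of the chosen organisms. It assumes nr-style headers that give the organism in square brackets, e.g. `[Homo sapiens]`; the function names and file names are hypothetical, and headers with nested brackets would need more careful handling.

```python
import re

# Matches bracketed text such as "[Homo sapiens]" in an nr-style header (assumed format).
ORGANISM = re.compile(r"\[([^\[\]]+)\]")


def organisms_in_header(header):
    """Return every bracketed name found in a FASTA header line."""
    return set(ORGANISM.findall(header))


def filter_fasta(lines, wanted):
    """Yield the lines of records whose header mentions a wanted organism."""
    wanted = set(wanted)
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = bool(organisms_in_header(line) & wanted)
        if keep:
            yield line


def makeblastdb_command(fasta, name):
    """Build the BLAST+ call that formats the subset as a protein database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", name]
```

The filtered lines can be written to a file such as `subset.fa`, formatted with the `makeblastdb` command above, and then passed to blastx via `-db` in place of nr.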

~Rama.

0

Expanding on this answer: have a look at SwissProt. It's manually curated, so you get less noisy results, and it's also much smaller than nr, so you'll get fewer results in much less time.

Reply • 11.2 years ago by Philipp Bayer 8.8k