Annotating RNA-Seq Data Using a Reference Genome
13.2 years ago
Linda ▴ 160

I have RNA-seq reads from a non-model organism, and I used Cufflinks to identify transcripts. Is there an existing pipeline to BLAST these transcripts against a model organism's proteins to identify orthologs?

rna blast • 4.8k views
13.2 years ago
Zhidkov ▴ 600

Hi Linda, regarding BLAST usage: you can download a local copy of BLAST here: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

The available databases can be found here: ftp://ftp.ncbi.nlm.nih.gov/blast/db/

Most likely you will want to run blastx against nr, or you can download UniRef90 instead. I suggest using the tabular output format (it saves some time and space). You can then filter the results by alignment length and/or E-value, for example keeping only hits with an E-value below 1e-5.
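As a rough sketch of that filtering step, assuming the default tabular layout (-outfmt 6, where column 4 is the alignment length and column 11 is the E-value): the function names, the thresholds and the sample rows below are made up for illustration and are not real BLAST results.

```shell
#!/bin/sh
# Keep tabular BLAST hits with E-value <= max_e (column 11) and
# alignment length >= min_len (column 4).
# filter_hits and its two arguments are illustrative names.
filter_hits() {
    # usage: filter_hits <max_evalue> <min_length> < hits.tsv
    # "+ 0" forces a numeric comparison, so values like 1e-50 are
    # compared as numbers rather than as strings.
    awk -F '\t' -v max_e="$1" -v min_len="$2" \
        '($11 + 0) <= (max_e + 0) && ($4 + 0) >= (min_len + 0)'
}

# Made-up rows in the 12-column default layout, only to exercise the filter.
sample_hits() {
    printf 'tx1\tprotA\t85.0\t250\t30\t2\t1\t750\t10\t259\t1e-50\t200\n'
    printf 'tx2\tprotB\t40.0\t60\t30\t1\t1\t180\t5\t64\t1e-6\t50\n'
    printf 'tx3\tprotC\t35.0\t150\t90\t4\t1\t450\t1\t150\t0.001\t40\n'
}

# Show query and subject IDs of the hits that pass both cutoffs.
sample_hits | filter_hits 1e-5 100 | cut -f 1,2
```

With these made-up rows only tx1 passes: tx2 is too short and tx3 has too high an E-value. On real data you would feed your own tabular file to filter_hits instead of sample_hits.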

Ilia


Hi @Zhidkov, I have a similar question. To assess the sequence conservation of our de novo assembled transcriptome of a non-model plant, we ran blastx (NCBI BLAST+ 2.2.26) against the proteome databases of several plants, with tabular output. As you mentioned, the results can be filtered by alignment length and/or E-value. Since I have already set the E-value to 1e-5, how should I set the alignment length filter? What value is generally used for the alignment length? Thank you. Regards


Hi, I don't really understand where the problem is. If you used the default tabular output, column 4 corresponds to the alignment length. Do you run BLAST from the command line (terminal) or through the web interface? In any case, I don't think a "minimum alignment length" can be set as a parameter of the BLAST search itself. If you run BLAST from the command line, you can use something like this (replace yourlength with your minimum alignment length):

blastx -query <File_In> -db <your_database> -evalue <your_favorite_evalue> -outfmt 6 | awk -F "\t" '{if ($4>=yourlength) print}' > Tabularblastx.txt


Hi @Zhidkov, sorry for the late reply. I did run blastx from the command line. I only set the E-value to 1e-5 and left the alignment length undefined. Is that OK?

11.8 years ago
Zhidkov ▴ 600

Hi,
The alignment length is an additional filter: if you get too many results using the E-value cutoff alone, you can tighten your filter by removing alignments that are too short, by filtering on query coverage, etc.
Your data, your goals, your filters.
Just as an example: you have a transcript Y that is 2 kb long, and after BLASTX you get a hit to protein X with an E-value of 1e-7 and an alignment length of 200 bp with several indels. Can you conclude that transcript Y is protein X?
What would your filters be for reliable annotation in that case?
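To put a number on that example, here is a small back-of-the-envelope sketch. The helper name query_coverage_pct is made up; the second call assumes the 200 is counted in amino-acid residues rather than nucleotides, which is worth checking for translated searches such as blastx.

```shell
#!/bin/sh
# Percentage of the query covered by the aligned region.
# usage: query_coverage_pct <aligned_query_nt> <query_length_nt>
query_coverage_pct() {
    awk -v a="$1" -v q="$2" 'BEGIN { printf "%.0f\n", 100 * a / q }'
}

query_coverage_pct 200 2000   # 200 nt aligned out of a 2 kb transcript
query_coverage_pct 600 2000   # 200 residues = 600 nt (assumed reading)
```

Either way the hit covers well under half of the transcript (10% or 30%), which is the kind of case a coverage filter is meant to catch.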

Ilia


Hi Ilia,

Thank you very much.

I have a query file containing ~200,000 sequences of various lengths, and the tabular blastx output also contains hits to many sequences of different lengths. Is it possible to use a single alignment length value in the command "...($4>=yourlength)"? If I am wrong, please point it out.

Regards, lzsph


Yes, it is possible (you set a minimum length >= some value). If that doesn't feel right to you, you can filter on coverage instead; for example, you can require that at least 50% of your query sequence is covered. I suggest performing a small test (you'll feel much more confident afterwards): take several known transcripts, run BLASTX against all plant proteins, and check which alignments give you unreliable results (e.g. you used SOD1 as the query but got p53 as a hit), then set your filters accordingly.
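A coverage filter could look roughly like the sketch below. It assumes the tabular output carries the query length as a 13th column (for example by requesting -outfmt "6 std qlen" in BLAST+), and it measures coverage per HSP from the query start and end coordinates (columns 7 and 8), so it does not add up several HSPs of the same hit. The function names and sample rows are made up for illustration.

```shell
#!/bin/sh
# Keep HSPs that cover at least min_cov (a fraction, e.g. 0.5) of the query.
# Expects 13 tab-separated columns: the 12 default ones plus qlen.
filter_by_query_coverage() {
    # usage: filter_by_query_coverage <min_cov> < hits_with_qlen.tsv
    awk -F '\t' -v min_cov="$1" '{
        # Columns 7 and 8 are qstart and qend in query coordinates.
        # A reverse-frame blastx hit has qstart > qend, so use the
        # absolute span.
        span = $8 - $7
        if (span < 0) span = -span
        span += 1
        if ($13 > 0 && span / $13 >= min_cov + 0) print
    }'
}

# Made-up rows (12 default columns + qlen), only to exercise the filter.
sample_hits_with_qlen() {
    printf 'tx1\tprotA\t85.0\t250\t30\t2\t1\t750\t10\t259\t1e-50\t200\t900\n'
    printf 'tx2\tprotB\t40.0\t67\t30\t1\t1\t200\t5\t71\t1e-7\t55\t2000\n'
    printf 'tx3\tprotC\t60.0\t200\t70\t3\t1200\t601\t1\t200\t1e-30\t150\t1200\n'
}

# Query IDs whose hit covers at least half of the query.
sample_hits_with_qlen | filter_by_query_coverage 0.5 | cut -f 1
```

Working from query coordinates rather than from column 4 avoids having to worry about whether the alignment length is counted in nucleotides or residues. In the made-up rows, tx2 (200 of 2000 nt covered) is dropped, while tx1 and the reverse-strand hit tx3 are kept.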

Ilia


OK, Ilia, I'll give it a try.

Thank you!

Regards,lzsph
