Question

Sequence Blast Against Rfam Database

3

11.4 years ago

liran0921 ▴ 150

Hi All,

I am using blastn to search my reads against tRNA sequences extracted from the Rfam database. I then picked one match from the BLAST results and tried to confirm it by searching for my read directly in the Rfam online search (http://rfam.sanger.ac.uk/search).

Here is my read, which has a 100% match with a tRNA: CCCATTCTTGCGACCCGGGTTCGATTCCCGGGCGGCGC

This is the matched tRNA from RFam: GCGCCGCTGGTGTAGTGGTATCATGCAAGATTCCCATTCTTGCGACCCGGGTTCGATTCCCGGGCGGCGCA

Clearly it's a perfect match. But when I search for my query in Rfam, there is no hit, even though the subject tRNA can be found in Rfam.
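As a quick sanity check independent of BLAST, the match can be confirmed with a plain substring search. This is only a minimal sketch: the two sequences are copied from the post, and the variable names are made up for illustration.

```python
# Sequences copied verbatim from the post.
read = "CCCATTCTTGCGACCCGGGTTCGATTCCCGGGCGGCGC"
trna = "GCGCCGCTGGTGTAGTGGTATCATGCAAGATTCCCATTCTTGCGACCCGGGTTCGATTCCCGGGCGGCGCA"

# str.find returns the 0-based offset of the first exact match, or -1 if absent.
start = trna.find(read)
coverage = len(read) / len(trna)

if start >= 0:
    # Report 1-based, inclusive coordinates, as sequence tools usually do.
    print(f"read matches tRNA positions {start + 1}-{start + len(read)} "
          f"of {len(trna)} ({coverage:.0%} of the tRNA)")
else:
    print("read is not an exact substring of the tRNA")
```

The read is an exact substring covering positions 33-70 of the 71-nt tRNA, so it spans only a little more than the 3' half of the molecule.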

Could anybody tell me what's the problem?

blast • 6.4k views

ADD COMMENT • link updated 3.1 years ago by Michael 55k • written 11.4 years ago by liran0921 ▴ 150

0

Could you paste the exact command you used for the BLAST search, and the exact result line? Also, are you aware that Rfam provides a script for just this purpose? Here's the relevant part of the code:

$blastcmd = "blastall -p blastn -i $fafile -d $blastdb -e $blastcut -W7 -F F -b 1000000 -v 1000000 -m 8"; (By default $blastcut is 0.01)

ADD REPLY • link 11.4 years ago by David Westergaard ★ 1.5k

0

Hi David, I will try the script you mentioned. My BLAST command is: blastn -query mapped_reads.fa -task megablast -db tRNA_for_blast -out mapped_vs_tRNA.txt -outfmt "6 qseqid sseqid qlen evalue pident nident mismatch qcovs qcovhsp"

The matched record is: 10002-681 2341-1 38 2e-15 100.00 38 0 100 100

The blastn result shows that it is indeed a perfect match. I am just wondering why this query read cannot be found in the Rfam database. Is it too short?
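For anyone reading the record against the -outfmt string, here is a minimal sketch that pairs each value with its field name and checks that the hit is a full-length exact match. The field list and the record are copied from this post; the variable names and the "full-length exact" criterion are my own assumptions, not part of BLAST.

```python
# Field names from the -outfmt "6 ..." string and the record quoted in the post.
fields = "qseqid sseqid qlen evalue pident nident mismatch qcovs qcovhsp".split()
line = "10002-681 2341-1 38 2e-15 100.00 38 0 100 100"

# Real outfmt 6 output is tab-separated; split() with no argument handles
# both tabs and the spaces shown in the post.
record = dict(zip(fields, line.split()))

qlen = int(record["qlen"])
nident = int(record["nident"])
mismatch = int(record["mismatch"])
qcovhsp = int(record["qcovhsp"])

# Assumed criterion: every query base is identical and the HSP covers the whole query.
full_length_exact = nident == qlen and mismatch == 0 and qcovhsp == 100

for name in fields:
    print(f"{name:>9}: {record[name]}")
print("full-length exact match:", full_length_exact)
```

All 38 query bases are identical (nident equals qlen) and the HSP covers 100% of the query, so the BLAST hit itself is not in doubt.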

ADD REPLY • link 11.4 years ago by liran0921 ▴ 150

0

I just skimmed through the script and the associated notes which I forgot to link to. Sorry!

It seems that blastn is just an initial step when querying the Rfam database, used to narrow down the search. In the README, try jumping to section (3), "General comments on what rfam_scan.pl is doing". It seems that Rfam then performs an additional step of matching against the covariance models with cmsearch (the run_multi_infernal_search() subroutine in the Perl script).

ADD REPLY • link 11.4 years ago by David Westergaard ★ 1.5k

0

Thanks. It would be better to use rfam_scan.pl. But I suppose that if a read is a 100% match with a tRNA, then it should come from a tRNA regardless of whether it matches the CM. I still want to know why my read cannot be found by the Rfam online search.

ADD REPLY • link 11.4 years ago by liran0921 ▴ 150

0

Have you tried contacting the lovely Rfam folks with this question? (http://rfam.sanger.ac.uk/help#tabview=tab12)

ADD REPLY • link 11.4 years ago by sarahhunter ▴ 600

0

Thanks for your advice. I really should try to contact them.

ADD REPLY • link 11.4 years ago by liran0921 ▴ 150

score 4 · Accepted Answer · 2013-08-13

I think it is because your tRNA sequence is truncated, since you are working with short reads. Covariance models use (among other things) information about the internal base pairing of the RNA, in this case the cloverleaf structure.

[Figure: tRNA cloverleaf secondary structure. By Yikrazuul (own work), CC-BY-SA-3.0, via Wikimedia Commons]

If you imagine an extreme case, where your tRNA fragment is exactly divided in half (this is almost the case for your example!), or where you by chance get only a large part of the variable loop, then you see that you lose a large portion of the most significant base pairing (acceptor stem, anticodon stem and loop). That will possibly drop the score sharply, because the structure left would be a single hairpin loop, and such hairpins are quite frequent and not specific to a tRNA. That means that for truncated sequences, BLAST might be more sensitive than Rfam or tRNAscan-SE.
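To make the argument concrete, here is a rough sketch that counts how many base pairs survive when only part of a cloverleaf is present. The 76-nt dot-bracket layout below is a hypothetical textbook-style schematic, not the structure of the read in question, and counting intact pairs is only an illustration of the idea, not how Infernal actually scores a sequence.

```python
def base_pairs(dot_bracket):
    """Return (i, j) index pairs for a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs


def intact_pairs(pairs, start, end):
    """Keep pairs whose two partners both lie in the half-open window [start, end)."""
    return [(i, j) for i, j in pairs if start <= i and j < end]


# Hypothetical schematic cloverleaf (assumed stem and loop lengths, 76 nt).
cloverleaf = (
    "(((((((" ".."              # acceptor stem 5' strand, spacer
    "((((" "........" "))))"    # D arm
    "."                         # spacer
    "(((((" "......." ")))))"   # anticodon arm
    "....."                     # variable loop
    "(((((" "......." ")))))"   # T arm
    ")))))))" "...."            # acceptor stem 3' strand, 3' tail
)

pairs = base_pairs(cloverleaf)
half = len(cloverleaf) // 2

# Pairs that remain if only the 5' half or only the 3' half is sequenced.
five_prime_half = intact_pairs(pairs, 0, half)
three_prime_half = intact_pairs(pairs, half, len(cloverleaf))

print("total pairs:", len(pairs))
print("intact in 5' half:", len(five_prime_half))
print("intact in 3' half:", len(three_prime_half))
```

On this toy layout, the full molecule has 21 base pairs, while the 3' half alone keeps only the 5 pairs of the T arm: the acceptor and anticodon stems each lose one strand, leaving a single hairpin.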