Question

Blast output query

10.7 years ago

The Last Word ▴ 230

I ran a local standalone BLAST of pre-miRNAs against a genome, with tabular output (-m 8), and got the results. My next steps for refining the results include GC content analysis, RepeatMasker, etc. I am currently developing a Perl program to extract the part of the genome sequence that matched any pre-miRNA, using the columns of the tabular output. My logic includes:

Matching the supercontig name with a specific sequence block name in the genome file.
Extracting the matched region between the match start and end points given in the tabular file.
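
The two steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than Perl (the logic ports directly); the function names `read_fasta` and `extract`, the sequence names, and the demo sequences are all hypothetical. It assumes the name in the BLAST table equals the first word of the FASTA header and that coordinates are 1-based and inclusive.

```python
import io

def read_fasta(handle):
    """Return {sequence_name: sequence} from an open FASTA handle."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Assumption: the first word of the header is the ID
            # that appears in the BLAST table.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences

def extract(sequences, name, start, end):
    """Return the 1-based, inclusive region start..end of the named sequence."""
    return sequences[name][start - 1:end]

# Hypothetical two-record genome file, held in memory for the demo.
demo = io.StringIO(
    ">supercontig_1 demo\nACGTACGTAC\nGGGTTT\n>supercontig_2\nTTTTCCCC\n"
)
genome = read_fasta(demo)
print(extract(genome, "supercontig_1", 3, 8))
```

Loading the whole genome into a dictionary is the simplest approach; for a very large genome an indexed FASTA reader would likely be a better fit.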

In the tabular output, there is query sequence match start and end point as well as subject sequence start and end point. Which should I be using as a start and end point for sequence extraction? Query sequence or subject sequence?

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node93.html

I have read published papers on miRNA where they mention a sliding window of 70 or 100 nucleotides on either side of the match area. I presume that these researchers extract 70 nucleotides before the start of the match area as well as 70 nucleotides after the end of the match area. Am I right in presuming this and should I be doing the same thing?
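
If the flanking interpretation described above is the intended one, the coordinate arithmetic might look like the sketch below. The function name `flanked_window` and the numbers are hypothetical, and this is only one reading of "sliding window"; the papers may mean something different. The one detail worth noting is clamping the window to the ends of the supercontig.

```python
def flanked_window(seq_length, start, end, flank=70):
    """Extend a 1-based, inclusive region by `flank` bases on each side,
    without running past either end of the sequence."""
    left = max(1, start - flank)
    right = min(seq_length, end + flank)
    return left, right

# Hypothetical hits on a 1000-base supercontig.
print(flanked_window(1000, 300, 380))  # interior hit
print(flanked_window(1000, 50, 120))   # clipped at the left end
print(flanked_window(1000, 900, 980))  # clipped at the right end
```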

Please help

Ram · Answer 1 · 2014-07-26

10.7 years ago

Torst ▴ 980

In response to 2): "In the tabular output, there is query sequence match start and end point as well as subject sequence start and end point. Which should I be using as a start and end point for sequence extraction? Query sequence or subject sequence?"

The "query" is the input file you specified with the "-i" option, which is your pre-miRNA sequences.

The "subject" is the database you specified with the "-d" option, which is your "genome" (assuming human/mouse).

So the query coordinates will be small numbers (as the pre-miRNAs are short), and the subject numbers will be big (as they are genome coordinates).
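
As a rough illustration of the distinction, the sketch below parses one line of -m 8 output, whose twelve tab-separated columns are query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value and bit score. Since the sequence is pulled from the genome, the sketch uses the subject coordinates. The example hit line, the sequence, and the names `Hit`, `parse_hit` and `extract_hit` are hypothetical. For a hit on the minus strand the subject start is larger than the subject end, so the coordinates are swapped and the region is reverse-complemented.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query subject q_start q_end s_start s_end strand")

# Lookup table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def parse_hit(line):
    """Parse one tab-separated line of BLAST -m 8 output."""
    fields = line.rstrip("\n").split("\t")
    q_start, q_end, s_start, s_end = (int(x) for x in fields[6:10])
    # Minus-strand hits are reported with subject start > subject end.
    strand = "+" if s_start <= s_end else "-"
    low, high = min(s_start, s_end), max(s_start, s_end)
    return Hit(fields[0], fields[1], q_start, q_end, low, high, strand)

def extract_hit(subject_seq, hit):
    """Return the matched subject region, oriented like the query."""
    region = subject_seq[hit.s_start - 1:hit.s_end]
    if hit.strand == "-":
        region = region.translate(COMPLEMENT)[::-1]
    return region

# Hypothetical minus-strand hit against a made-up supercontig.
line = "mir-demo\tsupercontig_1\t100.00\t6\t0\t0\t1\t6\t8\t3\t1e-05\t12.0"
hit = parse_hit(line)
print(hit.strand, hit.s_start, hit.s_end)
print(extract_hit("ACGTACGTACGGGTTT", hit))
```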

It would be helpful if you could just answer any one part of the question as well.

ADD REPLY • link 10.7 years ago by The Last Word ▴ 230