Question

Extract data from BLAST results

6.8 years ago

Janey

Hi

By running these commands:

makeblastdb -in Total.assembly.fasta -parse_seqids -dbtype nucl -out my_db

blastn -db my_db -query X.fasta -out results.out

The following results were obtained:

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.

Database: Total.assembly.fasta
           87,103 sequences; 164,122,436 total letters

Query= c41837_g1_i1

Length=1353
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

c41837_g1_i1  len=1353 path=[1:0-297 299:298-304 @306@!:305-511 5...  2499    0.0


>c41837_g1_i1 len=1353 path=[1:0-297 299:298-304 @306@!:305-511 513:512-532
534:533-730 732:731-733 @735@!:734-1164 1166:1165-1223 1225:1224-1352]
Length=1353

 Score = 2499 bits (1353),  Expect = 0.0
 Identities = 1353/1353 (100%), Gaps = 0/1353 (0%)
 Strand=Plus/Plus

Query  1     CaaaaacaaaaacaaagaaaacttaagaaaaaaTGCGCGCAATCCTCGCTCTTGCATTCA  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1     CAAAAACAAAAACAAAGAAAACTTAAGAAAAAATGCGCGCAATCCTCGCTCTTGCATTCA  60

Query  61    TAGGCGCTGTCTTTGCTCAAACCACCGTCACTGACGTCCTTCAATCATACCGTGTCACCT  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  61    TAGGCGCTGTCTTTGCTCAAACCACCGTCACTGACGTCCTTCAATCATACCGTGTCACCT  120

How can I filter these data based on the positive and negative numbers or low or high numbers of Scores and Values?

Also, how can I extract the IDs from these data?

Tag: RNA-Seq

Written 6.8 years ago by Janey; updated 6.8 years ago by Pierre Lindenbaum

Can you elaborate on what exactly you mean by "based on the positive and negative numbers or low or high numbers of Scores and Values"?

Reply by lieven.sterck, 6.8 years ago

Answer 1 · 2018-03-21

6.8 years ago

Buffo

Write your results in tabular format, which is much easier to parse:

blastn -db my_db -query X.fasta -out results.out -outfmt 6

-outfmt 6 = tabular format: the first column corresponds to the query ID, the second to the subject ID.

OUTFMT 6 COLUMNS (default order; the output itself has no header line):

  1.  qseqid    query (e.g., gene) sequence id
  2.  sseqid    subject (e.g., reference genome) sequence id
  3.  pident    percentage of identical matches
  4.  length    alignment length
  5.  mismatch  number of mismatches
  6.  gapopen   number of gap openings
  7.  qstart    start of alignment in query
  8.  qend      end of alignment in query
  9.  sstart    start of alignment in subject
 10.  send      end of alignment in subject
 11.  evalue    expect value
 12.  bitscore  bit score

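For the second question (extracting IDs), a minimal sketch, assuming the -outfmt 6 output was saved as results.out: since the query and subject IDs sit in columns 1 and 2, the standard cut and sort tools are enough. The function names query_ids and subject_ids are hypothetical helpers, not part of BLAST+.

```shell
#!/bin/sh
# Hypothetical helpers for tab-separated BLAST output (-outfmt 6).
# Both read the table on stdin and print each distinct ID once, sorted.

# Column 1 (qseqid): the query IDs.
query_ids() {
    cut -f1 | sort -u
}

# Column 2 (sseqid): the subject IDs, i.e. the database sequences that were hit.
subject_ids() {
    cut -f2 | sort -u
}
```

For example, subject_ids < results.out > hit_ids.txt writes one hit ID per line to hit_ids.txt (the file names are placeholders). cut splits on tabs by default, which matches the -outfmt 6 separator.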

Thank you very much, Buffo. That answers my second question, but is there an answer to my first question?

Reply by Janey, 6.8 years ago

How about copying the results into Excel and sorting by column 3, 11, or 12?

Reply by Buffo, 6.8 years ago
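
Instead of Excel, thresholds on those same columns (3 = pident, 11 = evalue, 12 = bitscore) can be applied on the command line. A minimal sketch using awk, assuming the default -outfmt 6 column order listed in the answer; filter_hits is a hypothetical helper and the threshold values in the example are arbitrary, so choose them for your own data.

```shell
#!/bin/sh
# Hypothetical helper: keep only rows of tab-separated -outfmt 6 output (stdin)
# that pass all three thresholds, printing the matching rows unchanged.
#   $1 = maximum E-value     (column 11, evalue;   lower is better)
#   $2 = minimum bit score   (column 12, bitscore; higher is better)
#   $3 = minimum % identity  (column 3,  pident)
filter_hits() {
    awk -F "$(printf '\t')" -v maxe="$1" -v minb="$2" -v minp="$3" '
        # "+ 0" forces numeric comparison, so E-values such as 1e-50 compare as numbers
        ($11 + 0) <= (maxe + 0) && ($12 + 0) >= (minb + 0) && ($3 + 0) >= (minp + 0)
    '
}
```

For example, filter_hits 1e-10 100 90 < results.out > filtered.out keeps hits with an E-value of at most 1e-10, a bit score of at least 100 and at least 90% identity.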

"copying to excel" ????

BLASPHEMY! :) Just use Linux sort to sort the data on the relevant columns.

Reply by lieven.sterck, 6.8 years ago
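
Following the suggestion to use Linux sort, a minimal sketch, assuming tab-separated -outfmt 6 output and a sort that supports -g (general numeric order; GNU coreutils and BSD sort have it, but POSIX does not require it). -g matters because BLAST often writes E-values in scientific notation such as 1e-50. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that sort tab-separated -outfmt 6 output read from stdin.
# -g compares general numbers, so "0.0", "1e-50" and "2e-10" are ordered numerically.
# LC_ALL=C makes "." the decimal point regardless of the user's locale.
TAB=$(printf '\t')

# Column 11 (evalue), ascending: best (smallest) E-values first.
sort_by_evalue() {
    LC_ALL=C sort -t "$TAB" -k11,11g
}

# Column 12 (bitscore), descending: highest bit scores first.
sort_by_bitscore() {
    LC_ALL=C sort -t "$TAB" -k12,12gr
}

# Column 3 (pident), descending: highest percent identity first.
sort_by_pident() {
    LC_ALL=C sort -t "$TAB" -k3,3gr
}
```

For example, sort_by_bitscore < results.out | head shows the ten highest-scoring hits.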

You're probably right, but if Janey doesn't know how to parse a BLAST output, I think sorting columns on the command line would be a more complicated issue. By the way, I'm an enthusiastic reader of Biostars because it has helped me with my bioinformatics problems, but do you really consider it necessary to waste time writing answers like that? Do you really consider it helpful? What is your suggestion for Janey? Blasphemy is criticizing without demonstrating any ability.

Reply by Buffo, 6.8 years ago

It was just a joke - he even used :).

I personally prefer :-), as I have a rather beautiful nose, but I guess ugly-nosed people will go for :), or :(.

Reply by h.mon, 6.8 years ago

To me, parsing BLAST output is a more complicated task than sorting columns.

I do agree that any approach that helps resolve the issue is a good answer, so to some extent I can follow your reasoning. On the other hand, we're here to help (and to teach!) others, so in that context I feel it's common sense to at least provide the better alternatives in your answers.

There is no better help than learning to do things on the command line! ;-)

Reply by lieven.sterck, 6.8 years ago

Thanks to all my friends here for their suggestions, especially dear Buffo.

Reply by Janey, 6.8 years ago