Blast reads classification
0
0
Entering edit mode
17 months ago

I am using MagicBLAST to classify reads obtained from NGS analysis on highly degraded DNA samples.

I chose to use MagicBLAST because I read that it is optimal for short sequences and it allows me to use Fastq format and perform paired-end analysis, unlike blast+.

The command I am running is as follows:

magicblast -query sample.Q20.ml30.pair1.truncated.gz -query_mate sample.Q20.ml30.pair2.truncated.gz -db plantdb-refseq-release-partition1 plantdb-refseq-release-partition2 -infmt fastq -outfmt tabular -out ../../CG/blast/T1.ml30.q20.magicblast.plantdb.partition1-2.txt -splice F

Where plantdb-refseq-release-partition1 and plantdb-refseq-release-partition2 are to alias of a custom database I constructed using makeblastdb, starting from all plant sequences contained in RefseqRelease.

My doubt is regarding the output I obtained. Below are two examples where you can see that the identity is reported as 100. However, if we consider the actual alignment of the query sequences to the reference sequences, it is only partial. In the first case, we can see that the total length of the read is 35, but the alignment only occurs from base 8 to 30, and the same applies to the second query.

Fields: query acc.    reference acc.  % identity      not used        not used        not used        query start     query end       reference start       reference end   not used        not used        score   query strand    reference strand        query length    BTOP   num placements        not used        compartment     left overhang   right overhang  mate reference  mate ref. start composite score

A00619:170:HJ5HVDRXY:2:2101:17662:1016  NW_025225066.1  100     0      0       0       8       30      166621639       166621661       0    99       23      plus    plus    35      23      2       -       1:0   CAGCGGN ATCGA   -       166621661       46 
A00619:170:HJ5HVDRXY:2:2101:17662:1016  NW_025225066.1  100     0      0       0       6       28      166621661       166621639       0    99       23      plus    minus   35      23      2       -       1:0   ATCGA   CAGCGGC -       166621639       46 
A00619:170:HJ5HVDRXY:2:2101:17662:1016  NC_057802.1     100     0      0       0       8       30      166621639       166621661       0    99       23      plus    plus    35      23      2       -       1:1   CAGCGGN ATCGA   -       166621661       46 
A00619:170:HJ5HVDRXY:2:2101:17662:1016  NC_057802.1     100     0      0       0       6       28      166621661       166621639       0    99       23      plus    minus   35      23      2       -       1:1   ATCGA   CAGCGGC -       166621639       46 
A00619:170:HJ5HVDRXY:2:2101:8504:1031   NW_015379182.1  100     0      0       0       2       21      9442622 9442641 0       99      20   plus     plus    31      20      4       -       1:2     A       TCGACGACGG      -       9442641 40 
A00619:170:HJ5HVDRXY:2:2101:8504:1031   NW_015379182.1  100     0      0       0       11      30      9442641 9442622 0       99      20   plus     minus   31      20      4       -       1:2     TCGACGACGG    A       -       9442622 40 
A00619:170:HJ5HVDRXY:2:2101:8504:1031   NC_068334.1     100     0       0       0       2       21      8380236 8380255 0       99      20   plus     plus    31      20      4       -       1:3     A       TCGACGACGG      -       8380255 40 
A00619:170:HJ5HVDRXY:2:2101:8504:1031   NC_068334.1     100     0      0       0       11      30      8380255 8380236 0       99      20   plus     minus   31      20      4       -       1:3     TCGACGACGG    A       -       8380236 40 
A00619:170:HJ5HVDRXY:2:2101:8504:1031   NC_029264.1     100     0       0       0       2       21      9442622 9442641 0       99      20   plus     plus    31      20      4       -       1:4     A       TCGACGACGG      -       9442641 40 
A00619:170:HJ5HVDRXY:2:2101:8504:1031   NC_029264.1     100     0      0       0       11      30      9442641 9442622 0       99      20   plus     minus   31      20      4       -       1:4     TCGACGACGG    A       -       9442622 40

Additionally, I tried taking the sequence and running it on the nt database using blastweb, specifying the taxon 'viridiplantae,' but I didn't get any matches.

Can anyone help me understand the situation?

Thank you for your help

NGS magicblast blast • 798 views
ADD COMMENT
1
Entering edit mode

couple of things:

  • BLAST: basic LOCAL alignment search tool , hence it only does local alignments and will as such report the best hit, which is often not at all a full alignment of your input. You can try to re-calculate the scores yourself and compensate for the partial input.

  • magicblast uses different settings then the 'normal' blast and as such you can (& will) get hits from magicblast that the normal one will not report as they do not pass the minimum scores for normal blast

  • your custom DB will also have other statistics (mainly number of sequences) than the full NR , and those will also give rise to other scores , such that a given query sequence will have different alignment scores (and again might or might not pas the thresholds)

ADD REPLY
1
Entering edit mode

Your input file names seem to indicate these data are quality filtered, scanned and trimmed reads. If that was not the case I would have suspected adapter contamination.

While it is tempting to use magicblast you may still want to use a tool like kraken if you are interested in classifying things. I don't recall if magicblast can output SAM/BAM files (I think it does). Examine these alignments in detail via that mechanism.

ADD REPLY

Login before adding your answer.

Traffic: 1953 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6