I am using MagicBLAST to classify reads obtained from NGS analysis on highly degraded DNA samples.
I chose to use MagicBLAST because I read that it is optimal for short sequences and it allows me to use Fastq format and perform paired-end analysis, unlike blast+.
The command I am running is as follows:
magicblast -query sample.Q20.ml30.pair1.truncated.gz -query_mate sample.Q20.ml30.pair2.truncated.gz -db plantdb-refseq-release-partition1 plantdb-refseq-release-partition2 -infmt fastq -outfmt tabular -out ../../CG/blast/T1.ml30.q20.magicblast.plantdb.partition1-2.txt -splice F
Where plantdb-refseq-release-partition1 and plantdb-refseq-release-partition2 are to alias of a custom database I constructed using makeblastdb, starting from all plant sequences contained in RefseqRelease.
My doubt is regarding the output I obtained. Below are two examples where you can see that the identity is reported as 100. However, if we consider the actual alignment of the query sequences to the reference sequences, it is only partial. In the first case, we can see that the total length of the read is 35, but the alignment only occurs from base 8 to 30, and the same applies to the second query.
Fields: query acc. reference acc. % identity not used not used not used query start query end reference start reference end not used not used score query strand reference strand query length BTOP num placements not used compartment left overhang right overhang mate reference mate ref. start composite score
A00619:170:HJ5HVDRXY:2:2101:17662:1016 NW_025225066.1 100 0 0 0 8 30 166621639 166621661 0 99 23 plus plus 35 23 2 - 1:0 CAGCGGN ATCGA - 166621661 46
A00619:170:HJ5HVDRXY:2:2101:17662:1016 NW_025225066.1 100 0 0 0 6 28 166621661 166621639 0 99 23 plus minus 35 23 2 - 1:0 ATCGA CAGCGGC - 166621639 46
A00619:170:HJ5HVDRXY:2:2101:17662:1016 NC_057802.1 100 0 0 0 8 30 166621639 166621661 0 99 23 plus plus 35 23 2 - 1:1 CAGCGGN ATCGA - 166621661 46
A00619:170:HJ5HVDRXY:2:2101:17662:1016 NC_057802.1 100 0 0 0 6 28 166621661 166621639 0 99 23 plus minus 35 23 2 - 1:1 ATCGA CAGCGGC - 166621639 46
A00619:170:HJ5HVDRXY:2:2101:8504:1031 NW_015379182.1 100 0 0 0 2 21 9442622 9442641 0 99 20 plus plus 31 20 4 - 1:2 A TCGACGACGG - 9442641 40
A00619:170:HJ5HVDRXY:2:2101:8504:1031 NW_015379182.1 100 0 0 0 11 30 9442641 9442622 0 99 20 plus minus 31 20 4 - 1:2 TCGACGACGG A - 9442622 40
A00619:170:HJ5HVDRXY:2:2101:8504:1031 NC_068334.1 100 0 0 0 2 21 8380236 8380255 0 99 20 plus plus 31 20 4 - 1:3 A TCGACGACGG - 8380255 40
A00619:170:HJ5HVDRXY:2:2101:8504:1031 NC_068334.1 100 0 0 0 11 30 8380255 8380236 0 99 20 plus minus 31 20 4 - 1:3 TCGACGACGG A - 8380236 40
A00619:170:HJ5HVDRXY:2:2101:8504:1031 NC_029264.1 100 0 0 0 2 21 9442622 9442641 0 99 20 plus plus 31 20 4 - 1:4 A TCGACGACGG - 9442641 40
A00619:170:HJ5HVDRXY:2:2101:8504:1031 NC_029264.1 100 0 0 0 11 30 9442641 9442622 0 99 20 plus minus 31 20 4 - 1:4 TCGACGACGG A - 9442622 40
Additionally, I tried taking the sequence and running it on the nt database using blastweb, specifying the taxon 'viridiplantae,' but I didn't get any matches.
Can anyone help me understand the situation?
Thank you for your help
couple of things:
BLAST: basic LOCAL alignment search tool , hence it only does local alignments and will as such report the best hit, which is often not at all a full alignment of your input. You can try to re-calculate the scores yourself and compensate for the partial input.
magicblast uses different settings then the 'normal' blast and as such you can (& will) get hits from magicblast that the normal one will not report as they do not pass the minimum scores for normal blast
your custom DB will also have other statistics (mainly number of sequences) than the full NR , and those will also give rise to other scores , such that a given query sequence will have different alignment scores (and again might or might not pas the thresholds)
Your input file names seem to indicate these data are quality filtered, scanned and trimmed reads. If that was not the case I would have suspected adapter contamination.
While it is tempting to use magicblast you may still want to use a tool like
kraken
if you are interested in classifying things. I don't recall if magicblast can output SAM/BAM files (I think it does). Examine these alignments in detail via that mechanism.