Differences in unique mapping between STAR and BLAST?
0
0
Entering edit mode
2.8 years ago
steel1990 ▴ 20

I am working on a viral alignment project. Using STAR, I aligned RNA data against some viral genomes, then filtered for uniquely mapped reads re STARs default value of MAPQ=255.

Then I built a custom BLAST database using the same viral genome reference sequences, and blasted the uniquely mapped reads against these. I found there are a couple of reads (which uniquely mapped to a single virus with STAR) which match with 100% identity (without any read clipping), to more than one viral genome. How can this happen?

alignment RNAseq BLAST STAR • 2.3k views
ADD COMMENT
0
Entering edit mode

do add some more detail to your question.

what are the exact command lines used? what is the input data? paired-end/single-end ....

ADD REPLY
0
Entering edit mode

Thanks for your response. Paired-end data.. My Star command: (vgenome is a file containing many virus genomes)

STAR --runThreadN {threads} 
        --genomeDir {input.vgenome} 
        --outFileNamePrefix {wildcards.sra}_virus_ 
        --readFilesIn {input.fq1} {input.fq2} 
        --outFilterMismatchNmax 4 
        --outFilterMultimapNmax 1000 
        --limitOutSAMoneReadBytes 1000000

I then grepped for "NH:i:1[[:blank:]]" in the output, which outputs lines of unique mapping (MAPQ=255)...

Made a blastdb (using same virus genomes) and performed blasntn with all standard parameters using snakemake BLASTN and BLAST MAKEBLASTDB wrappers (specifying for tabular output)

https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast.html

Then filtered for 100 matches (pident), were length is equal to query length..

Then found some reads (only two or three) which mapped to uniquely to more than one viral genome.. which is surprising, because my inputs were reads which STAR told me mapped with to just one loci/genome..

Is there something in the algorithms that STAR and BLAST use which could cause this? I would like to understand what's going on here..

ADD REPLY
1
Entering edit mode

BLAST as the name indicates is a local aligner. Most NGS aligners are global aligners.

ADD REPLY
0
Entering edit mode

I appreciate that.. But If you read my description, I filtered BLAST hits for length of hit equal to query length... So in this way, my BLAST hits are global.

ADD REPLY
1
Entering edit mode

I think you are confused about the definition of global vs local alignment. The blast results are still local irrespective of filtering.

https://biology.stackexchange.com/questions/11263/what-is-the-difference-between-local-and-global-sequence-alignments

I think STAR is the incorrect tool for alignment against viral genomes. STAR is a splice-aware aligner, what you doing would be better served with bwa or bowtie2. But without knowing what your objectives are, we're just guessing here.

ADD REPLY
0
Entering edit mode

I think perhaps you may be confused? Or maybe I didn't explain clearly enough. So using blast, aligning a sequence, and then looking at results where the query length (the total length of input sequence) is equal to the hit length (the length of matched nucleotides) and including only results in which there is 100% mapping (no mismatches) is in effect global alignment. Although I do appreciate that BLAST wasn't designed for this purpose. re STAR, many viral transcripts undergo splicing, hence the reason for using a splice aware aligner.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7121103/

ADD REPLY
1
Entering edit mode

not sure but it could also be STAR takes the paired-info into account and blast will not (hence the difference)

ADD REPLY
0
Entering edit mode

Interesting.. I didn't think of this.. Do you think you might be able to go into a bit more detail on why you think this might happen?

Many thanks

ADD REPLY
1
Entering edit mode

well, it can be that STAR labels it as unique as both the pairs are unique aligned (or, that only the pair mapping is unique but not either one of the reads of a pair)

ADD REPLY
0
Entering edit mode

So in a bam file the sequences listed are single ends of a pair, and not the mates reconstituted?

Thanks

ADD REPLY
1
Entering edit mode

should know that (but would have to look it up though :) ) but I think so indeed.

ADD REPLY

Login before adding your answer.

Traffic: 2638 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6