Hi All
I have some samples with low mapping rates in Salmon (40% or less) that have higher alignment rates in TopHat, and I am trying to troubleshoot.
I picked some of the unmapped reads (from Salmon's writeUnmappedNames parameter) and BLATed them against human.
Some have 2 or more matches with 99% to 100% identity, and some have so many matches that I have to scroll a long way down the page. Many of these matches are 100%, and some range between 85% and 100% identity.
I also looked into "ambig_info.tsv" and found some records with 0 unique mappings and more than 100 ambiguous mappings, but couldn't relate them to the unmapped reads.
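For anyone wanting to do the same check programmatically, here is a minimal sketch that lists targets with 0 unique and more than 100 ambiguous mappings. It assumes that ambig_info.tsv has a header row with UniqueCount and AmbigCount columns and that its rows are in the same order as quant.sf. Please verify both against your own Salmon output. The function name and the toy data are made up for illustration.

```python
import csv

def flag_ambiguous_only(ambig_lines, quant_lines, min_ambig=100):
    """Return (name, ambig_count) for targets with no unique reads
    and more than min_ambig ambiguous reads.

    Assumes ambig_info.tsv rows line up one-to-one with quant.sf rows.
    """
    ambig = csv.DictReader(ambig_lines, delimiter="\t")
    quant = csv.DictReader(quant_lines, delimiter="\t")
    flagged = []
    for a, q in zip(ambig, quant):
        unique = int(a["UniqueCount"])
        ambiguous = int(a["AmbigCount"])
        if unique == 0 and ambiguous > min_ambig:
            flagged.append((q["Name"], ambiguous))
    return flagged

# Toy input (invented values, not from a real run)
toy_ambig = ["UniqueCount\tAmbigCount\n", "5\t10\n", "0\t250\n", "0\t3\n"]
toy_quant = [
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n",
    "tx1\t1000\t850\t1.0\t15\n",
    "tx2\t1200\t1050\t2.0\t40\n",
    "tx3\t900\t750\t0.1\t1\n",
]
print(flag_ambiguous_only(toy_ambig, toy_quant))
```

With real files you would pass open file handles (e.g. `open("aux_info/ambig_info.tsv")` and `open("quant.sf")`) instead of the toy lists. Note that this only describes reads that did map, so it will not by itself explain the unmapped ones.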
This is how one match for one of them looks, in one mate and in the other:
ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
browser details YourSeq 22 100 122 151 100.0% 10 + 37378275 37378303 29
browser details YourSeq 22 52 74 151 100.0% 10 - 37378275 37378303 29
So why is this not counted as mapped, for example? Any hint or clue?
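One thing that can be read off the two rows above is how much of each 151-base read the hit actually covers. The sketch below parses rows in that layout and computes the covered fraction. It assumes the query START/END columns are 1-based and inclusive, as the BLAT web output appears to be; the helper names are hypothetical. Whether low coverage is the reason Salmon left these reads unmapped is not established here; it is just a quick number to look at.

```python
def parse_blat_row(line):
    """Parse one 'browser details ...' row from the BLAT web results table."""
    f = line.split()
    # f[0], f[1] = 'browser', 'details' (the ACTIONS column); f[2] = query name
    return {
        "query": f[2],
        "score": int(f[3]),
        "q_start": int(f[4]),
        "q_end": int(f[5]),
        "q_size": int(f[6]),
        "identity": float(f[7].rstrip("%")),
        "chrom": f[8],
        "strand": f[9],
        "t_start": int(f[10]),
        "t_end": int(f[11]),
        "span": int(f[12]),
    }

def query_coverage(hit):
    """Fraction of the read covered by the hit (1-based inclusive coordinates assumed)."""
    return (hit["q_end"] - hit["q_start"] + 1) / hit["q_size"]

rows = [
    "browser details YourSeq 22 100 122 151 100.0% 10 + 37378275 37378303 29",
    "browser details YourSeq 22 52 74 151 100.0% 10 - 37378275 37378303 29",
]
for row in rows:
    hit = parse_blat_row(row)
    print(hit["strand"], round(query_coverage(hit), 3))
```

Both hits cover 23 of 151 bases, so roughly 15% of each read, even though the identity over that stretch is 100%.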
Thanks
Take a look at: Salmon very low mapping
Even if they are aligning/mapping, you don't want to use them for counting, since you can't be sure where the read came from.
bbmap.sh has an option of placing the read at one of all best locations (ambig=random). You could try using that option to recover some of this data.
So 3 questions please:
You are not fixing it. It is one way of handling multi-mappers. Take a look at the ambig= options to see others. If you want to be strict about it, then you throw the multi-mappers away/do not count them.
You can use the BBMap-generated alignment for featureCounts and then DESeq2/edgeR.
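To make the difference between these choices concrete, here is a toy sketch of what each policy does with a read that has several equally good sites. The policy names mirror the ambig= values as I understand them (best, all, random, toss), but this is only an illustration of the idea, not BBMap's implementation, and the function and site labels are invented.

```python
import random

def assign_multimapper(best_sites, policy="random", rng=None):
    """Return the sites a read is reported at under a multi-mapper policy.

    best_sites: list of equally top-scoring locations for one read.
    """
    if len(best_sites) <= 1:
        return list(best_sites)       # unique (or unmapped) read: nothing to decide
    if policy == "random":
        rng = rng or random.Random()
        return [rng.choice(best_sites)]  # pick one top site at random
    if policy == "toss":
        return []                     # strict: treat the read as unmapped
    if policy == "all":
        return list(best_sites)       # report every top site
    if policy == "best":
        return [best_sites[0]]        # keep the first top site found
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical read with three equally good hits
sites = ["chr10:37378275", "chr13:1000", "chr21:500"]
for policy in ("random", "toss", "all", "best"):
    print(policy, assign_multimapper(sites, policy, rng=random.Random(0)))
```

With random, each multi-mapper still contributes exactly one count downstream (e.g. in featureCounts), but at a location chosen by chance; with toss it contributes nothing.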
There are genes with multiple copies (e.g. the rDNA repeat, of which there are ~400-500 copies in the human genome). If your reads are short(er), there is a chance that they may spuriously map to multiple locations (besides the multi-copy example).
TopHat will by default place a multi-mapper in up to 20 top spots where the read aligns well (you can check on that number).
Got it and many many thanks for the explanation genomax. Much appreciated !
Another one with a high span that doesn't map either: