Hello everyone, I am new here. I have recently been learning RNA-seq analysis with some mouse cells, but one of my samples had a very low alignment rate with TopHat. I want to use BLAST to check the original FASTA file for contamination from bacteria or other species. However, the original file is huge, so I took a random sample (100 reads) and uploaded that FASTA file to BLAST.
However, the results from the batch BLAST were scattered, and I don't know how to summarize them into a species distribution. I have looked at local BLAST and other alignment software, but none of them seems to provide taxonomic profiling information. Are there tools that can help with this problem, or do I need to change my strategy? :)
Thanks for your attention to this problem.
If you have a "species distribution" instead of just mouse, then you may have a much bigger problem on your hands. It may not be wise to use these samples if the predominant data (> 95%) is not mouse. If you have a small amount of contamination, you can bin the mouse reads using bbsplit.sh from the BBMap suite, like this: How to remove contamination from NGS data
First of all, you should not use TopHat anymore for read alignment! These are not my words but those of the developer of TopHat; see also here
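On the original question of turning scattered BLAST hits into a species summary, here is a minimal Python sketch, not a definitive method. It assumes you ran BLAST+ locally with a tabular output such as -outfmt "6 qseqid sscinames bitscore" (the sscinames column needs the NCBI taxdb files installed next to the database), so each line has three tab-separated fields. The function names and the "no hit" label are made up for illustration. It keeps the best-scoring hit per read and counts reads per species.

```python
from collections import Counter


def best_hit_species(lines):
    """Map each read ID to the species of its highest-bitscore hit.

    Assumes three tab-separated columns per line: read ID, species
    name, bitscore (an assumed layout, see the note above).
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, species, bitscore = line.split("\t")[:3]
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][1]:
            best[qseqid] = (species, score)
    return {read: hit[0] for read, hit in best.items()}


def species_distribution(lines, total_reads=None):
    """Count reads per species; optionally add reads with no hit at all."""
    per_read = best_hit_species(lines)
    counts = Counter(per_read.values())
    if total_reads is not None and total_reads > len(per_read):
        counts["no hit"] = total_reads - len(per_read)
    return counts


# Tiny made-up example standing in for a real BLAST output file.
example = [
    "read1\tMus musculus\t180\n",
    "read1\tRattus norvegicus\t150\n",
    "read2\tEscherichia coli\t200\n",
    "read3\tMus musculus\t90\n",
]
for species, n in species_distribution(example, total_reads=5).most_common():
    print(f"{n}\t{species}")
```

For a real run you would pass an open file handle instead of the example list. With only 100 sampled reads the counts are rough, so treat the result as a hint about contamination rather than a measurement.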