I am doing RNAseq expression analysis. To count reads, I mapped them to contigs with BWA and STAR.
I read that STAR is very popular for RNAseq mapping, but I was curious how well BWA can handle it.
As a result, BWA mapped a higher percentage of reads than STAR (78.24% vs 65.8%), so I would like to ask: are there any reasons why I should not use BWA for mapping RNAseq data?
BWA isn't splice-aware, so it is not appropriate if you are mapping RNAseq reads to the genome, unless you are dealing with bacteria, which have no introns.
If you are mapping to the transcriptome, you can use either, but I've seldom seen BWA used for this purpose, only Bowtie2 and STAR.
However, your mapping rate, especially with STAR, is not very good. Are you mapping to the genome or the transcriptome? What is the quality of the reference?
edit:
"I read that STAR is very popular in RNAseq assembly"
You mean popular in RNAseq mapping, no?
edit 2: you probably have to tweak the STAR index build step to account for the small genome size. Set --genomeSAindexNbases to min(14, log2(GenomeLength)/2 - 1), where GenomeLength is in base pairs.
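To make the formula concrete, here is a minimal Python sketch of the calculation. The function names are my own, and rounding the result down to an integer is an assumption on my part, since the option takes an integer value. The FASTA helper simply sums the sequence lengths of the reference (here, the super transcripts) to get GenomeLength.

```python
import math


def total_fasta_length(lines):
    """Sum the sequence lengths in FASTA-formatted lines, skipping header lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def genome_sa_index_nbases(genome_length_bp):
    """Value for --genomeSAindexNbases: min(14, log2(GenomeLength)/2 - 1).

    Rounding down to an integer is an assumption of this sketch.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return min(14, math.floor(math.log2(genome_length_bp) / 2 - 1))


if __name__ == "__main__":
    # Hypothetical example: a reference of 5 million base pairs in total.
    print(genome_sa_index_nbases(5_000_000))
```

For a real reference you could pass an open file handle to total_fasta_length and feed the result into genome_sa_index_nbases, then use that number when building the STAR index.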
Yes, mapping. I am mapping to a bacterial transcriptome. The reference includes super transcripts. How can I check the "quality of the reference"?
Although I am not sure, I think assembling super-transcripts: 1) is not necessary, as bacteria don't have differential splicing, 2) could lead to artifacts, such as creating chimeric genes.
I got confused, so I asked the author of SuperTranscripts, and the answer is here: https://groups.google.com/forum/#!category-topic/oshlack-lab/supertranscript-analysis/aoSwVvo4IMQ
I used it to create long genes. After Trans-ABySS I have 163684 contigs. I wanted to create some kind of scaffolds and ended up with super transcripts, since there is no reference to assemble against. In the end I have ~3000 super transcripts, of which 70% have a blastp hit.
I was following this tutorial:
https://github.com/Oshlack/Lace/wiki/Example%3A-Differential-Transcript-Usage-on-a-non-model-organism
If you want to check the quality of your reference, you can use several tools.
For a completeness check:
BUSCO: it estimates how complete your reference is by reporting what percentage of a set of expected, well-annotated conserved genes it contains. It provides several lineage datasets; find the one that best matches your organism and use it.
For a contamination check:
Kraken: it checks whether your reference is contaminated with sequences from other organisms (bacteria, viruses, etc.).
You don't have a genomic DNA reference? Since this is bacterial RNAseq you should be able to use any NGS aligner as noted by @h.mon above. Perhaps you need to tell STAR not to look for splicing and that will improve alignment results.
I don't have a reference; I am doing de novo RNAseq for a non-model organism. I used Bowtie2, which gave me results similar to BWA and is more popular for this kind of analysis. I checked STAR with option
but it didn't change anything - still 65.8%.
Thank you very much for help!
Have you checked your transcriptome assembly before moving on to mapping the RNA reads to it?