Hi Everyone
When I used the comprehensive transcript set gencode.v27.transcripts.fa
for Salmon mapping, the mapping rate increased from 43% to 46.6% for one sample and from 26% to 37.7% for another sample.
Other samples have ~75% mapping.
Can I proceed with the differential expression analysis now, or is the mapping rate still too low?
How low a mapping rate is too low to proceed with differential expression analysis?
Thanks
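For comparing samples like the ones quoted above, the mapping rate can be collected programmatically. The following is a minimal sketch, assuming each Salmon output directory contains the usual `aux_info/meta_info.json` with a `percent_mapped` field. The function names and the 50% threshold are hypothetical and purely illustrative; the threshold is not a recommended cutoff for differential expression.

```python
import json
from pathlib import Path


def salmon_mapping_rates(quant_dirs):
    """Return {directory_name: percent_mapped} read from Salmon's meta_info.json."""
    rates = {}
    for quant_dir in map(Path, quant_dirs):
        meta_path = quant_dir / "aux_info" / "meta_info.json"
        with open(meta_path) as handle:
            meta = json.load(handle)
        rates[quant_dir.name] = float(meta["percent_mapped"])
    return rates


def flag_low_samples(rates, threshold=50.0):
    """List samples below an arbitrary, illustrative threshold (an assumption)."""
    return sorted(name for name, rate in rates.items() if rate < threshold)
```

Calling `flag_low_samples(salmon_mapping_rates(dirs))` on a list of quantification directories only highlights which samples stand out from the rest; whether they are usable still depends on why the reads fail to map.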
Thanks so much, Devon. Any hint on how I can find out whether the library is enriched in pre-mRNA? For example, after aligning with STAR, what should I look for to conclude that it is pre-mRNA enriched?
Look at the BAM files in IGV. You should be able to visually see evidence for pre-mRNAs or other issues. Also make sure to align against rRNA, if it's not present in your reference genome.
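As a rough numeric complement to the visual check in IGV, one could estimate what fraction of the reads inside a gene fall outside its annotated exons. This is only a sketch under stated assumptions: the read midpoints, gene span, and exon coordinates are hypothetical inputs on a single chromosome (in practice they would come from the BAM file and a GTF), intervals are half-open, and the function names are made up for illustration. A high intronic fraction across many genes is suggestive of pre-mRNA, not proof of it.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals, sorted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intronic_fraction(read_midpoints, gene_span, exons):
    """Fraction of reads inside gene_span whose midpoint lies outside every exon."""
    exons = merge_intervals(exons)
    starts = [start for start, _ in exons]
    gene_start, gene_end = gene_span
    in_gene = intronic = 0
    for pos in read_midpoints:
        if not (gene_start <= pos < gene_end):
            continue  # read lies outside the gene (intergenic); ignore it here
        in_gene += 1
        i = bisect_right(starts, pos) - 1  # last exon starting at or before pos
        if i < 0 or pos >= exons[i][1]:
            intronic += 1
    return intronic / in_gene if in_gene else 0.0
```

Comparing this fraction between the problematic samples and the samples that map at ~75% would show whether the low-mapping libraries carry noticeably more intronic signal.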
Thanks Devon, much appreciated.
@Devon: This is an ongoing issue (there are multiple earlier posts by @Tania on this, e.g. "Does insert size short length affect mapping rate?").
@Tania: Can you remind us if you ever looked at the unmapped reads using blast? There have been so many similar questions from you (and one other person of late) that I have lost track.
I was checking insert sizes to see whether the reads run through into the adapters. I am still not sure how badly my insert size distribution is affecting the mapping. But FastQC shows no adapters, so I am thinking of moving forward for now.
I checked the rRNA content, as mentioned in this post: it is 3% in one sample and 10% in another. I aligned those samples with TopHat, and they reach ~68-70% alignment, much higher than with Salmon. Other samples aligned badly with TopHat but much better with Salmon, so I can't simply rely on one aligner/mapper and discard the other. One post, which I have also lost track of, suggested that TopHat aligns more reads because it aligns to the genome, which includes introns and intergenic regions. I took some unmapped reads and blasted them, and they align well, so I think they are not contaminants. So I am confused and don't know how to proceed.
Where and to what genome?
At this point you should be taking the trimmed reads and aligning them to the genome with STAR or BBMap to create aligned BAMs. Count the reads with featureCounts and a GTF annotation, and then proceed to DE analysis with DESeq2 or edgeR.
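The alignment and counting steps suggested above could be scripted roughly as follows. This is a sketch, not a definitive pipeline: it only builds the command lines, the function names and default thread count are hypothetical, and it assumes a STAR index has already been generated. The flags used (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--readFilesCommand`, `--outSAMtype`, `--outFileNamePrefix` for STAR; `-T`, `-a`, `-o`, `-p` for featureCounts) are standard options of those tools.

```python
def star_command(genome_dir, fastq_files, out_prefix, threads=8):
    """Build a STAR command that writes a coordinate-sorted BAM."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastq_files,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    # Gzipped FASTQ files need a decompression command.
    if all(name.endswith(".gz") for name in fastq_files):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


def featurecounts_command(gtf, bam_files, out_table, threads=8, paired=False):
    """Build a featureCounts command that counts reads per gene from a GTF."""
    cmd = ["featureCounts", "-T", str(threads), "-a", gtf, "-o", out_table]
    if paired:
        # -p marks the input as paired-end; newer Subread releases may also
        # need --countReadPairs to count fragments (check your version's docs).
        cmd.append("-p")
    return cmd + list(bam_files)
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. With the prefix `sample1_`, STAR writes `sample1_Aligned.sortedByCoord.out.bam`, which is the file to hand to featureCounts; the resulting count table is then the input for DESeq2 or edgeR in R.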
I used BLAT; they are human samples. Could this be related to the samples being tumors? The two problematic samples are human tumors. Do you mean I should switch to BBMap and then edgeR?
That is new information. Are the rest of the samples normal? Was there anything particular about the two bad libraries? Not enough material? If you already have the salmon files/other alignment files then you can proceed to the next step of analysis. If those libraries are bad quality then nothing you do is going to "fix" this problem, short of trying to re-make the libraries.
Ah, nothing like stringing along a story.