How low a mapping rate is too low to proceed with differential expression analysis?
7.0 years ago
Tania ▴ 180

Hi Everyone

When I used the comprehensive GENCODE transcript set (gencode.v27.transcripts.fa) for Salmon mapping, the mapping rate increased from 43% to 46.6% for one sample and from 26% to 37.7% for another. The other samples have ~75% mapping. Can I proceed with differential expression now, or is this still too low? How low a mapping rate is too low to proceed with differential expression analysis?

Thanks

RNA-Seq salmon mapping • 2.2k views
7.0 years ago

Regardless of the exact numbers themselves, it'd be useful to know exactly why such a disparity in mapping percentages exists. If this is simply rRNA inclusion then you can proceed. However, if there seems to be some other cause (e.g., the problematic samples are enriched in pre-mRNA), that might lead you to just exclude the problematic samples. It's probably best to align to the genome with something like STAR so you can try to figure this out.
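
To see the disparity at a glance, the per-sample mapping rates can be pulled out of the Salmon output and compared with the rest of the batch. The following is a minimal sketch, assuming each sample has its own Salmon output directory containing `aux_info/meta_info.json` with a `percent_mapped` field. The sample names and the 20-point gap are hypothetical choices for illustration, not a recommended cutoff.

```python
import json
from pathlib import Path


def read_percent_mapped(quant_dir):
    """Return Salmon's reported mapping rate (in percent) for one sample."""
    meta_path = Path(quant_dir) / "aux_info" / "meta_info.json"
    with open(meta_path) as handle:
        meta = json.load(handle)
    return float(meta["percent_mapped"])


def flag_low_mappers(rates, max_gap=20.0):
    """Return samples whose rate is more than max_gap points below the median.

    max_gap is an arbitrary illustrative value, not an established threshold.
    """
    ordered = sorted(rates.values())
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return sorted(name for name, rate in rates.items() if median - rate > max_gap)


if __name__ == "__main__":
    # Rates for the two low samples come from the question; the names and
    # the three ~75% values are placeholders for the remaining samples.
    rates = {"tumor_1": 46.6, "tumor_2": 37.7,
             "other_1": 75.0, "other_2": 76.0, "other_3": 74.0}
    print(flag_low_mappers(rates))
```

This only identifies which samples stand out. It says nothing about why, which is what the genome alignment is for.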


Thanks so much, Devon. Any hint on how I can find out whether a sample is enriched in pre-mRNA or not? With STAR, what should I look for to conclude that it is pre-mRNA enriched?


Look at the BAM files in IGV. You should be able to visually see evidence for pre-mRNAs or other issues. Also make sure to align against rRNA, if it's not present in your reference genome.
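
What shows up in IGV for a pre-mRNA-rich sample is coverage spread across introns instead of being confined to exons. As a rough numeric counterpart to the visual check, here is a minimal sketch that classifies read positions against one gene model and reports the intronic fraction. The coordinates are toy values, the function names are made up for illustration, and a real analysis would take positions from the BAM and exons from the GTF.

```python
def classify_position(pos, gene_start, gene_end, exons):
    """Label a 1-based position as 'exonic', 'intronic' or 'intergenic'."""
    if pos < gene_start or pos > gene_end:
        return "intergenic"
    for exon_start, exon_end in exons:
        if exon_start <= pos <= exon_end:
            return "exonic"
    return "intronic"


def intronic_fraction(read_midpoints, gene_start, gene_end, exons):
    """Fraction of within-gene reads that fall in introns."""
    labels = [classify_position(p, gene_start, gene_end, exons)
              for p in read_midpoints]
    in_gene = [label for label in labels if label != "intergenic"]
    if not in_gene:
        return 0.0
    return in_gene.count("intronic") / len(in_gene)


# Toy gene spanning 100-600 with two exons and one intron (201-499).
toy_exons = [(100, 200), (500, 600)]
toy_reads = [150, 300, 550, 700, 400]
fraction = intronic_fraction(toy_reads, 100, 600, toy_exons)
```

A sample whose intronic fraction is clearly higher than that of the other samples from the same library prep would be a candidate for the pre-mRNA explanation. What counts as "clearly higher" depends on the protocol, so compare within the batch and do not rely on a fixed number.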


Thanks Devon, much appreciated.


@Devon: This is an ongoing issue (there are multiple earlier posts by @Tania on this, e.g. "Does insert size short length affect mapping rate?").

@Tania: Can you remind us whether you ever looked at the unmapped reads using BLAST? There have been so many similar questions from you (and one other person of late) that I have lost track.


I was checking insert sizes to see whether the reads run through into the adapters. I am still not sure how badly my insert size distribution is affecting the mapping, but FastQC shows no adapters, so I am thinking of moving forward for now.

I worked on finding the rRNA inclusion, as mentioned in this post: it is 3% in one sample and 10% in another. I aligned those samples with TopHat and they reach ~68-70%, much higher than with Salmon. Other samples aligned badly with TopHat but much better with Salmon, so I can't rely on one aligner/mapper and discard the other. One post, which I have also lost track of, suggested that TopHat aligns more reads because it aligns to the genome, which includes introns and intergenic regions. I took some unmapped reads and BLASTed them, and they align well, so I think they are not contaminants. So I am actually confused and don't know how to proceed.
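
For anyone repeating the unmapped-read check, BLAST and BLAT take FASTA input, so a handful of reads has to be converted from FASTQ first. A minimal sketch, assuming plain four-line FASTQ records (no wrapped sequences); the function name is made up for illustration, and it takes the first reads in the file, which is not a random sample.

```python
import itertools


def fastq_to_fasta(fastq_lines, max_reads=100):
    """Convert the first max_reads four-line FASTQ records to FASTA text."""
    records = []
    lines = iter(fastq_lines)
    while len(records) < max_reads:
        chunk = list(itertools.islice(lines, 4))
        if len(chunk) < 4:
            break  # end of input or truncated final record
        header = chunk[0].strip()
        sequence = chunk[1].strip()
        # Replace the leading '@' of the FASTQ header with '>'.
        records.append(">" + header[1:] + "\n" + sequence)
    return "\n".join(records) + ("\n" if records else "")


example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
fasta_text = fastq_to_fasta(example)
```

It is worth recording which database and organism the hits come from, since "aligns well" means different things for human, rRNA, and contaminant hits.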


"I took some unmapped reads and blast them they align well."

Where and to what genome?

At this point you should be taking the trimmed reads and aligning them to the genome with STAR/BBMap to create aligned BAM files. Count the reads with featureCounts and a GTF annotation, and then proceed to differential expression analysis with DESeq2/edgeR.
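
The alignment and counting steps could be scripted roughly as follows. This is a minimal sketch that only builds the command lines and does not run them. Every path, the thread count, and the helper function names are hypothetical, and the options shown should be checked against the STAR and featureCounts versions actually installed.

```python
import shlex


def star_command(genome_dir, fastq_files, out_prefix, threads=8):
    """Build a STAR call that writes a coordinate-sorted BAM."""
    cmd = ["STAR", "--runThreadN", str(threads),
           "--genomeDir", genome_dir,
           "--readFilesIn", *fastq_files,
           "--outSAMtype", "BAM", "SortedByCoordinate",
           "--outFileNamePrefix", out_prefix]
    # Gzipped FASTQ files need a decompression command.
    if all(name.endswith(".gz") for name in fastq_files):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


def featurecounts_command(gtf, bam_files, out_table, threads=8, paired=False):
    """Build a featureCounts call over one or more BAM files."""
    cmd = ["featureCounts", "-T", str(threads), "-a", gtf, "-o", out_table]
    if paired:
        # Marks the data as paired-end; recent Subread releases may also
        # need --countReadPairs to count fragments (check your version).
        cmd.append("-p")
    return cmd + list(bam_files)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    align = star_command("star_index", ["s1_R1.fastq.gz", "s1_R2.fastq.gz"], "s1_")
    count = featurecounts_command("annotation.gtf",
                                  ["s1_Aligned.sortedByCoord.out.bam"],
                                  "counts.txt", paired=True)
    print(shlex.join(align))
    print(shlex.join(count))
```

The resulting count table is what would then be loaded into DESeq2 or edgeR in R.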


I used BLAT; they are human samples. Could this be related to tumor issues? Those 2 problematic samples are human tumors. Do you mean I should switch to BBMap and then edgeR?


That is new information. Are the rest of the samples normal? Was there anything particular about the two bad libraries? Not enough material? If you already have the Salmon files or other alignment files, then you can proceed to the next step of the analysis. If those libraries are of bad quality, then nothing you do is going to "fix" this problem, short of trying to re-make the libraries.


Ah, nothing like stringing along a story.
