I am wondering what could cause the jump in the plot?
The plot was generated using Qualimap. Both samples that show the jump also have weird expression data, which I am trying to understand and maybe troubleshoot. I am beginning to believe that the samples have DNA contamination, but I am wondering how to explain this jump.
I do find some over-represented sequences. However, this is a gene body coverage plot. If a transcript is over-represented, why would it cause a jump at a single position that is only about 10 bp wide?
If an over-represented sequence is located in the middle of a transcript, then you will see a jump in the middle.
Transcript lengths vary widely, and some transcripts are shorter than 100 bp. A 10 bp over-represented sequence therefore occupies 10% or more of such a transcript.
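To make the length argument concrete, here is a minimal sketch. It assumes the plot rescales every transcript to 100 bins, as gene body coverage plots commonly do; the function name `bins_covered` and the example lengths are hypothetical and are not taken from Qualimap.

```python
def bins_covered(transcript_len, seq_start, seq_len, n_bins=100):
    """Return the 0-based bins touched by a sequence occupying
    [seq_start, seq_start + seq_len) once the transcript is rescaled
    to n_bins equal-width bins."""
    first = seq_start * n_bins // transcript_len
    last = (seq_start + seq_len - 1) * n_bins // transcript_len
    return list(range(first, last + 1))


# A 10 bp sequence centred in transcripts of different lengths.
for length in (100, 1000, 5000):
    start = length // 2 - 5
    bins = bins_covered(length, start, 10)
    print(f"{length} bp transcript: {len(bins)} bin(s), from {bins[0]} to {bins[-1]}")
```

On a 100 bp transcript the same 10 bp stretch spans ten bins, while on a long transcript it falls into only one or two bins around the middle, so a handful of short or very highly covered transcripts can dominate a narrow part of the averaged profile.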
So you are saying that only a few sequences from a few genes are over-represented in this case, and that this causes the overall jump? I have checked the alignment distribution across the genome, which looks normal. Do you think removing the over-represented sequences would be a way to correct the data?
One possibility is high expression of a small RNA that is embedded in a longer sequence. For example, we still see this sort of pattern caused by snoRNAs when we do iCLIP or NET-seq analysis.
Yes, I have seen some data with this problem, but there could be other reasons that I am not aware of.
I don't know how you determined that the alignment distribution is normal. Whether to remove the over-represented sequences should be decided only after you know precisely why this jump happens.
I still recommend looking at the coverage per gene first before making a decision.
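As a rough sketch of what looking at coverage per gene could involve: assuming you have already extracted per-base depth for each transcript (for example from `samtools depth` output), you can rescale each transcript to 100 bins and rank genes by their depth in the bin where the jump appears. The names `binned_profile`, `top_contributors`, and the toy `gene_depths` data are hypothetical and only illustrate the idea.

```python
def binned_profile(depth, n_bins=100):
    """Rescale a per-base depth list to n_bins by averaging the depth
    of the bases that fall into each bin (empty bins get 0.0)."""
    n = len(depth)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, d in enumerate(depth):
        b = pos * n_bins // n
        sums[b] += d
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def top_contributors(gene_depths, bin_index, n_bins=100, top=5):
    """Rank genes by their mean depth in one bin of the rescaled profile."""
    scores = {
        gene: binned_profile(depth, n_bins)[bin_index]
        for gene, depth in gene_depths.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy data: geneB carries a 10 bp spike in the middle of a 200 bp transcript.
gene_depths = {
    "geneA": [10] * 200,
    "geneB": [5] * 95 + [500] * 10 + [5] * 95,
}
print(top_contributors(gene_depths, bin_index=50))
```

If only a few genes account for most of the signal in the affected bin, that supports the over-represented sequence explanation; if many genes show the same bump, another cause is more likely.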