Hi,
I'm seeing strange mappings on miRNA precursors from a paired-end (PE) RNA-seq library. I have checked the library and almost 100% of the reads are 100 nt, so bowtie/tophat is probably doing some trimming before mapping. I mapped with bowtie only (via Galaxy, with the end-to-end fast option); the result is in the picture below. Strangely, I see this only on miRNAs. The mapped fragments are 22-24 nt. (PS: this is a general TruSeq RNA-seq library, 100 nt PE.) I'm confused about what this could be.
Yes, but the question is how, as there are no small RNAs in the sample library, and no reads <30 nt in the file.
I assume these are clipped alignments. If you hover over one of these reads, it will display the CIGAR string; I suggest you post that here. More importantly, I recommend you adapter-trim your reads prior to mapping, so they don't need to be clipped.
They are adapter-trimmed:
According to that, <1% of reads had adapters. Doesn't that seem odd?
It doesn't, to me... as long as the average insert size is substantially greater than read length, and particularly if the reads are size-selected to a longer insert size, you can get pretty close to zero percent of reads having adapters. To validate this, an insert-size histogram from mapping or merging is quite useful.
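In case it helps, here is a minimal sketch of how such an insert-size histogram could be built from a SAM file of the mapped pairs. It assumes uncompressed, tab-separated SAM text and uses the TLEN column of properly paired records; the function name `insert_size_histogram` and the 50 nt bin width are just illustrative choices, not part of any tool. Note that for spliced alignments TLEN also spans introns, so this is only meaningful for unspliced mappings like the bowtie end-to-end run discussed here.

```python
from collections import Counter


def insert_size_histogram(sam_lines, bin_width=50):
    """Count properly paired templates per insert-size bin.

    Uses the SAM TLEN field (column 9). Only records with the
    'properly paired' flag (0x2) and a positive TLEN are counted,
    so each pair contributes exactly once.
    """
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & 0x2 and tlen > 0:
            hist[(tlen // bin_width) * bin_width] += 1
    return hist


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    demo = [
        "@HD\tVN:1.6",
        "p1\t99\tchr1\t100\t42\t100M\t=\t193\t193\t*\t*",
        "p1\t147\tchr1\t193\t42\t100M\t=\t100\t-193\t*\t*",
        "p2\t99\tchr1\t500\t42\t100M\t=\t660\t260\t*\t*",
        "p2\t147\tchr1\t660\t42\t100M\t=\t500\t-260\t*\t*",
    ]
    for bin_start, count in sorted(insert_size_histogram(demo).items()):
        print(f"{bin_start}-{bin_start + 49} nt: {count}")
```

If almost all pairs fall in bins above the read length, a very low adapter rate is what you would expect.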
Valid point. I agree that a histogram of insert sizes would be useful.
The RNA was size-selected to 200-400 nt, following the general TruSeq protocol.
I used Picard to estimate the mean insert size: ~193 nt. With ~100 nt read from each end, that means the gap between the two mates is around zero (they just about touch or slightly overlap).
It looks to me like these are proper small RNAs, and only a small fraction (<1%) of the reads map to them. Why are you doubting this result? Did you do some chemical process that you expected to remove all of the small RNAs?
I'm doubting it because, according to the length histogram, there are no small RNA reads after adapter removal (100% of reads fall into the 100 nt group):
There's something wrong here... you started with 101bp reads. Adapter removal trimmed 0.79% of them and discarded 0.12%. Therefore, you should still have 0.67% of reads remaining that are shorter than 101bp, but the histogram shows zero. Are you sure you ran it on the trimmed reads?
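To make that check concrete, here is a rough sketch that tallies read lengths straight from a FASTQ file and compares the observed fraction of short reads with what the trimming report implies. It assumes plain 4-line FASTQ records (no wrapped sequence lines); `read_length_histogram` and `fraction_shorter` are illustrative helper names, and the demo reads are made up.

```python
from collections import Counter


def read_length_histogram(fastq_lines):
    """Return a Counter mapping read length -> number of reads.

    Assumes strict 4-line FASTQ records, so the sequence is always
    the second line of each record.
    """
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            lengths[len(line.strip())] += 1
    return lengths


def fraction_shorter(lengths, full_length):
    """Fraction of reads that are shorter than the untrimmed read length."""
    total = sum(lengths.values())
    short = sum(n for length, n in lengths.items() if length < full_length)
    return short / total if total else 0.0


# Expected share of short reads from the trimming report quoted above:
# 0.79% of reads were trimmed and 0.12% were discarded.
expected_short_pct = 0.79 - 0.12

if __name__ == "__main__":
    # Hypothetical records: one trimmed 23 nt read and three full-length reads.
    demo = []
    for name, seq in [("r1", "A" * 23), ("r2", "C" * 101),
                      ("r3", "G" * 101), ("r4", "T" * 101)]:
        demo += [f"@{name}", seq, "+", "I" * len(seq)]
    hist = read_length_histogram(demo)
    print(sorted(hist.items()))
    print(f"observed short fraction: {fraction_shorter(hist, 101):.2%}")
    print(f"expected from report:    ~{expected_short_pct:.2f}%")
```

On a real file you would pass the opened trimmed FASTQ (for example `open("trimmed_R1.fastq")`, a hypothetical filename) instead of the demo list. If the observed fraction is exactly zero, the histogram was most likely made from the untrimmed file.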
Also, the CIGAR string of the mapped read is 23M, indicating that there is no clipping and that the read is 23 bp long.
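For reference, a small sketch of how a CIGAR string can be read programmatically to tell a clipped alignment from a genuinely short read. The helper names (`parse_cigar`, `is_clipped`, `read_length`) are made up for illustration; the operation letters follow the SAM specification, where S and H are soft and hard clips and M, I, S, = and X consume read bases.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def is_clipped(cigar):
    """True if the alignment contains soft (S) or hard (H) clipping."""
    return any(op in "SH" for _, op in parse_cigar(cigar))


def read_length(cigar):
    """Length of the stored read sequence implied by the CIGAR (M, I, S, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


if __name__ == "__main__":
    # "23M" is the CIGAR reported above; "39S23M38S" is a made-up
    # example of what a clipped 100 nt read would look like.
    for cigar in ["23M", "39S23M38S"]:
        print(cigar, "clipped:", is_clipped(cigar), "read length:", read_length(cigar))
```

A 23M alignment has no S or H operations and implies a 23-base read, which is why it points to a real short read rather than a clipped 100 nt one.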
Oh, my bad, I mixed up the filenames. After re-running on the trimmed reads from the R1 mate:
So it seems that these are actual small RNAs, probably contamination in the RNA-seq library? Do you think I can use them to look for some differentially expressed miRNAs, or are the mapped reads too few? I already have some values from cuffdiff for the miRNA genes.
The differential expression program will have a statistical model in it that determines whether the number of reads is sufficient to be significant. However, if you did some selection process that was supposed to remove small RNAs, then you should ignore them because that will bias the results.