I observe weird distribution of intronic reads in my single-nuclei data where specific parts of introns and intergenic regions have unexpectedly high coverage. First I though I did something wrong, but I checked other single cell and single nuclei RNAseq experiments (including publicly available) and apparently it's very common issue. On the picture below I've plotted side by side coverage tracks for my snRNAseq experiment, another snRNAseq, single cell RNAseq and bulk RNAseq experiments. It turns out that other single-cell/single-nuclei datasets reproduce the same picture that I see in my data. This plot shows Wt1 gene but many other genes behave similarly.
https://drive.google.com/file/d/1ulq9yAGS7MAJOl_-Wx_qo_tHE6S0_0gn/view?usp=sharing
Details:
- I mapped reads with STARsolo and then visualised pseudo-bulk read coverage in IGV browser.
- I double-checked genome and gene annotation versions.
- To rule out that STAR does something wrong with single cell data I've inspected also coverage in bam files produced with cell-ranger and it was virtually identical with STAR bams.
To me it seems like single-cell/single-nuclei RNAseq protocols introduce a bias in distribution of intronic reads that was perhaps not well documented previously. Has anyone described this before? How would you explain these results?
Thanks! this seems like a plausible explanation.
UPD: I checked my data and found that spikes in the distribution of intronic reads indeed correspond to polyA/polyT signal!!