I have an experimental group where I observed the presence of a handful of genes (5 genes) that I was not expecting to see. To assess the validity, I look at the read alignments on IGV and the read phred scores from the SAM files - they appear to be real; the reads are mapping to exonal regions in the gene body and I'm observing corresponding phred scores of 60. I did perform a stranded library protocol, though these reads (and many others) are not colored corresponding to strandedness.
Question 1: Are there any other ways to assess/sanity check to assess whether these reads are "real"?
Upon looking at their read alignments on IGV, there appears to be a number of splice junctions and I'm curious to explore whether these genes are differentially spliced in a group.These genes are NOT differentially expressed at the counts level.
Questions 2 + 3
2. Assuming the reads are real, can I do differential expression on the transcripts (eg ENST) vs. the genes (ENSG) counts? Would that suffice?
3. More specifically: are there recommendations for assessing if a handful of genes are differentially spliced vs. using a software package that considers all possible measured splice junctions that immediately necessitates MH correction? EG for RNA-seq at the counts level upon normalizing for variability in sequencing depth (eg TMM) I could two-tail mannwhitney u a couple genes without having to go through the whole diffex framework (assuming no major covariates).
It does. The reads are displayed as arrows, pointing into the direction of their orientation with regard to strand, or am I missing something?
I meant with the coverage plots, which are a lot more convenient to look at.
Yes, that's true. Misunderstanding from my side. Coverage is indeed unstranded by the individual reads are displayed with orientation. Deeptools can produce bigwigs with stranded coverage, see for example BigWig file of strand-specific RNA-seq data
the reads are typically displayed as arrows and are colored red or blue.
im wondering if something went awry here since the strand specific info is not present for many of the reads in my lib (but is there for some). i did account for the strandedness when counting alignments w htseq-count.
@ATpoint what do you mean by "Coverage is indeed unstranded by the individual reads are displayed with orientation. "
Trivas do you have recs for the quick and dirty analysis/good places to look? I'm currently getting ensnared by conda environments / trying to do the miso index with hg38 ...