Question

differential splicing between two groups for 5 genes

0

Entering edit mode

23 months ago

wiscoyogi ▴ 40

I have an experimental group where I observed the presence of a handful of genes (5 genes) that I was not expecting to see. To assess the validity, I look at the read alignments on IGV and the read phred scores from the SAM files - they appear to be real; the reads are mapping to exonal regions in the gene body and I'm observing corresponding phred scores of 60. I did perform a stranded library protocol, though these reads (and many others) are not colored corresponding to strandedness.

Question 1: Are there any other ways to assess/sanity check to assess whether these reads are "real"?

Upon looking at their read alignments on IGV, there appears to be a number of splice junctions and I'm curious to explore whether these genes are differentially spliced in a group.These genes are NOT differentially expressed at the counts level.

Questions 2 + 3

2. Assuming the reads are real, can I do differential expression on the transcripts (eg ENST) vs. the genes (ENSG) counts? Would that suffice?

3. More specifically: are there recommendations for assessing if a handful of genes are differentially spliced vs. using a software package that considers all possible measured splice junctions that immediately necessitates MH correction? EG for RNA-seq at the counts level upon normalizing for variability in sequencing depth (eg TMM) I could two-tail mannwhitney u a couple genes without having to go through the whole diffex framework (assuming no major covariates).

diffex genes IGV • 2.2k views

ADD COMMENT • link updated 23 months ago by LauferVA 4.7k • written 23 months ago by wiscoyogi ▴ 40

score 0 · Answer 1 · 2023-06-21

0

Entering edit mode

23 months ago

Trivas ★ 1.9k

IGV doesn't split BAM/SAM files by strand, so you would have to manually split your file into +/- strands then load both of them into IGV separately.
You might be able to do some quick and dirty analysis via sashimi plots/PSI using something like MISO, but I'm also interested in whether a transcriptome-wide splicing analysis is always necessary.

ADD COMMENT • link 23 months ago by Trivas ★ 1.9k

1

Entering edit mode

IGV doesn't split BAM/SAM files by strand

It does. The reads are displayed as arrows, pointing into the direction of their orientation with regard to strand, or am I missing something?

ADD REPLY • link 23 months ago by ATpoint 88k

1

Entering edit mode

I meant with the coverage plots, which are a lot more convenient to look at.

ADD REPLY • link 23 months ago by Trivas ★ 1.9k

0

Entering edit mode

Yes, that's true. Misunderstanding from my side. Coverage is indeed unstranded by the individual reads are displayed with orientation. Deeptools can produce bigwigs with stranded coverage, see for example BigWig file of strand-specific RNA-seq data

ADD REPLY • link 23 months ago by ATpoint 88k

0

Entering edit mode

the reads are typically displayed as arrows and are colored red or blue.

im wondering if something went awry here since the strand specific info is not present for many of the reads in my lib (but is there for some). i did account for the strandedness when counting alignments w htseq-count.

@ATpoint what do you mean by "Coverage is indeed unstranded by the individual reads are displayed with orientation. "

ADD REPLY • link 23 months ago by wiscoyogi ▴ 40

0

Entering edit mode

Trivas do you have recs for the quick and dirty analysis/good places to look? I'm currently getting ensnared by conda environments / trying to do the miso index with hg38 ...

ADD REPLY • link 23 months ago by wiscoyogi ▴ 40

score 0 · Answer 2 · 2023-06-23

in addition to whats been said, a very different (but pretty effective) way of going about it is to mine cell-type specific data (from your phenotype or similar phenotypes in the same cells, if available). In humans, you can usually get pretty close to this depending on the transcript isoforms involved, and whether they contain sequence differences (rather than just differential inclusion of exons). If the latter, see jeremy leipzig's post, below.

in the former case, its still not easy, per se. you would want to not only be certain that the sequences you have are aligning fairly-to-entirely uniquely, but you'd also want to be pretty sure you are calling variants with fairly high accuracy from your RNA-seq data (there are several approaches to doing this now, in fact for certain reasons in the right hands it can be as or more accurate than variant calling by DNA sequencing).

Yet another way to go about this is to download lots and lots of paired DNA RNA samples in the same cell type(s) that you have, and see if you can detect differential splicing by genotype for the genotypes that you think you have in hand from the DNA data and confirmed in your RNA.

score 0 · Answer 3 · 2023-06-23

Assuming the reads are real, can I do differential expression on the transcripts (eg ENST) vs. the genes (ENSG) counts? Would that suffice?

unless you have very long reads, you can't know the actual transcript isoforms, just the exons, and maybe a few junction spanning reads. that's why so many of the differential splicing tools are focused on percent-spliced-in (PSI) of exons.

you could definitely roll your own PSI measurements but a reviewer might ask you why you decided to reinvent the wheel