Question

Transcriptional noise detection and Salmon TPMs

3.3 years ago

karlaarz ▴ 110

Hello,

I'm analysing RNA-seq data from two datasets (from healthy samples) and used StringTie to create a single GTF file in order to identify new isoforms. I then used Salmon to estimate their TPMs, and I have some questions that I hope someone can help me with:

1) Besides PCR, how do I know that these putative novel transcripts are not "transcriptional noise"?

2) I used tximport to import my Salmon outputs as follows:

library(tximport)
txi <- tximport(files, type = "salmon", txOut = TRUE,
                countsFromAbundance = "scaledTPM")
cts <- txi$counts
cts <- cts[rowSums(cts) > 0, ]

This generates a matrix with the TPM value per sample. From that matrix I calculated the median across all samples to get a "general TPM value" for each novel transcript, just as a reference. Is this approach correct?
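A minimal base-R sketch of the median step. One tximport detail matters here: with `countsFromAbundance = "scaledTPM"`, `txi$counts` holds TPMs scaled up to each sample's library size, so those values are on a count scale rather than being TPMs; for Salmon input, the TPMs themselves are stored in `txi$abundance`. The matrix below is a hypothetical stand-in for `txi$abundance` (transcript names use StringTie's default `MSTRG` prefix for illustration only), and the 1 TPM threshold is an illustrative choice, not a standard cut-off.

```r
# Toy TPM matrix standing in for txi$abundance (hypothetical values):
# rows = transcripts, columns = samples.
tpm <- matrix(c(12, 10, 15,
                 0,  0,  0,
                 3,  0,  8),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("MSTRG.1.1", "MSTRG.1.2", "MSTRG.2.1"),
                              c("s1", "s2", "s3")))

# Drop transcripts with zero abundance in every sample.
tpm <- tpm[rowSums(tpm) > 0, , drop = FALSE]

# Per-transcript median TPM across samples.
median_tpm <- apply(tpm, 1, median)

# Fraction of samples in which each transcript reaches 1 TPM (illustrative
# threshold); a median alone can hide how many samples express a transcript.
frac_expressed <- rowMeans(tpm >= 1)

print(data.frame(median_tpm, frac_expressed))
```

Reporting the fraction of expressing samples next to the median makes it easier to see whether a transcript is recurrent or driven by a few samples.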

I'm not interested in DGE or DTU: I don't have any "condition" to compare against, since my goal is to identify novel isoforms of my gene of interest. Do you have any other feedback you can share?

Thanks!

salmon isoforms tximport rna-seq tpms

3.2 years ago by karlaarz ▴ 110

score 4 · Answer 1 · 2021-08-25

There is no real way to distinguish novel transcripts from transcriptional noise purely bioinformatically.

Filters that might help are:

The transcript is highly expressed.
The transcript is spliced.
The same transcript appears in several independent samples at roughly equivalent levels.
Cognate transcripts exist in other species.
The splice sites are conserved.
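The first three filters (expression level, splicing, recurrence across samples) can be applied mechanically; the two conservation filters need cross-species data and are not covered here. A minimal base-R sketch, assuming a TPM matrix such as `txi$abundance` and a StringTie GTF. The helpers `count_exons` and `passes_filters`, the thresholds `min_tpm` and `min_frac`, and the toy GTF lines and TPM values are all hypothetical; "roughly equivalent levels" is only approximated by requiring expression in a minimum fraction of samples.

```r
# Hypothetical helper: number of exon records per transcript in GTF lines
# (column 3 = feature type, column 9 = attributes incl. transcript_id).
count_exons <- function(gtf_lines) {
  gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                    comment.char = "#", stringsAsFactors = FALSE)
  exons <- gtf[gtf$V3 == "exon", ]
  tx_id <- sub('.*transcript_id "([^"]+)".*', "\\1", exons$V9)
  counts <- table(tx_id)
  setNames(as.integer(counts), names(counts))
}

# Hypothetical helper: TRUE for transcripts that are multi-exonic, have a
# median TPM >= min_tpm, and reach min_tpm in at least min_frac of samples.
passes_filters <- function(tpm, n_exons, min_tpm = 1, min_frac = 0.5) {
  spliced   <- rownames(tpm) %in% names(n_exons)[n_exons > 1]
  expressed <- apply(tpm, 1, median) >= min_tpm
  recurrent <- rowMeans(tpm >= min_tpm) >= min_frac
  setNames(spliced & expressed & recurrent, rownames(tpm))
}

# Toy StringTie-style GTF lines (hypothetical coordinates and IDs).
gtf_lines <- c(
  'chr1\tStringTie\texon\t100\t200\t.\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";',
  'chr1\tStringTie\texon\t700\t900\t.\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";',
  'chr1\tStringTie\texon\t2000\t2500\t.\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1";',
  'chr2\tStringTie\texon\t100\t200\t.\t+\t.\tgene_id "MSTRG.3"; transcript_id "MSTRG.3.1";',
  'chr2\tStringTie\texon\t400\t500\t.\t+\t.\tgene_id "MSTRG.3"; transcript_id "MSTRG.3.1";'
)

# Toy TPM matrix (rows = transcripts, columns = samples).
tpm <- matrix(c( 5, 6, 4,  7,    # spliced, expressed in every sample
                 8, 9, 7, 10,    # expressed, but single-exon
                20, 0, 0,  0),   # spliced, but seen in one sample only
              nrow = 3, byrow = TRUE,
              dimnames = list(c("MSTRG.1.1", "MSTRG.2.1", "MSTRG.3.1"),
                              paste0("s", 1:4)))

keep <- passes_filters(tpm, count_exons(gtf_lines))
print(keep)
```

Passing these filters only makes a transcript a more plausible candidate; it does not show that the transcript is functional.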

But in the end it's difficult to distinguish transcriptional noise from a functional transcript, even experimentally. Generally, the only good way is to mutate the transcript and observe a phenotype that is rescued by transgenically expressing the transcript from a different location.