Question

Looking for transcription evidence in pooled-tissue RNA-seq data


Dear All,

I have a set of genes, and I want to check whether they are transcribed in different individuals of the same species. I have RNA-seq data in which total mRNA was pooled from different tissues (with no replicates). By checking the coverage of these genes in the data, how reliably can I tell whether a gene is transcribed or not? What further experiments could confirm this? And what threshold should I use for "no coverage": exactly 0 reads, or should I be a bit more relaxed?

Thanks, RT

Tags: transcription-evidence, RNA-Seq

Written 10.2 years ago by GR; updated 2.9 years ago by Ram.

Ram · Answer 1 · 2014-09-08


It's probably easiest to make a histogram of each sample's FPKM distribution and pick a threshold from it (you'll probably see two peaks, with the right-most one being the "expressed" genes). This won't give 100% certain results, but then again nothing will. You can get much fancier than this, but I don't really know if it's worth it.
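A minimal Python/NumPy sketch of that idea, assuming the histogram is drawn on log2(FPKM + 0.01) (the 0.01 pseudocount is an arbitrary choice to keep zero-FPKM genes finite) and that the two peaks sit at least `min_sep` log2 units apart. `valley_threshold` is a hypothetical helper, not part of any package: it takes the tallest peak, then the tallest bin at least `min_sep` away from it, and returns the FPKM at the lowest (lightly smoothed) bin between the two. The data are simulated.

```python
import numpy as np


def valley_threshold(fpkm, bins=50, pseudocount=0.01, min_sep=3.0, window=5):
    """Hypothetical helper: FPKM cut-off at the dip between the two main peaks
    of a log2(FPKM + pseudocount) histogram. Assumes a bimodal distribution."""
    x = np.log2(np.asarray(fpkm, dtype=float) + pseudocount)
    counts, edges = np.histogram(x, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    # Moving-average smoothing so single noisy bins are not mistaken for peaks
    smooth = np.convolve(counts, np.ones(window) / window, mode="same")
    p1 = int(np.argmax(smooth))  # tallest peak
    far = np.abs(centers - centers[p1]) >= min_sep
    if not far.any():
        raise ValueError("no bins at least min_sep away from the main peak")
    far_idx = np.flatnonzero(far)
    p2 = int(far_idx[np.argmax(smooth[far_idx])])  # second peak
    lo, hi = sorted((p1, p2))
    valley = lo + int(np.argmin(smooth[lo:hi + 1]))  # lowest bin between peaks
    # Back-transform the bin centre from log2 to the FPKM scale
    return 2 ** centers[valley] - pseudocount


# Simulated sample: a low "noise" peak and a higher "expressed" peak (log2 scale)
rng = np.random.default_rng(0)
log2_fpkm = np.concatenate([rng.normal(-4, 1, 3000), rng.normal(3, 1.5, 7000)])
threshold = valley_threshold(2 ** log2_fpkm)
print(f"FPKM threshold for this sample: {threshold:.2f}")
```

This would be run once per sample, and the histogram should still be checked by eye: if a sample is not clearly bimodal, the returned value is not meaningful.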

For follow-up, qPCR is pretty common. Note that there's a difference between being below the detection threshold and not being expressed (though this is the case for RNA-seq as well). Alternatively, you could run some Western blots, use a protein array, etc. None of these are perfect.

Genes with zero coverage are that way only because of your sequencing depth: there's enough noise in biology to assume that everything is transcribed at some level in a given cell type (at least if you look at enough cells).
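To make the depth argument concrete, here is a small sketch that treats a gene's read count as Poisson-distributed (a simplifying assumption; real RNA-seq counts are usually overdispersed). Inverting the FPKM definition gives the expected read count, lambda = FPKM * length_bp * mapped_reads / 1e9, and the chance of seeing zero reads is exp(-lambda). The gene length and FPKM below are hypothetical.

```python
import math


def expected_reads(fpkm, length_bp, mapped_reads):
    # Invert FPKM = reads * 1e9 / (length_bp * mapped_reads)
    return fpkm * length_bp * mapped_reads / 1e9


def prob_zero_reads(fpkm, length_bp, mapped_reads):
    # Poisson model: P(X = 0) = exp(-lambda)
    return math.exp(-expected_reads(fpkm, length_bp, mapped_reads))


# Hypothetical 2 kb gene transcribed at a true FPKM of 0.05
for depth in (10e6, 50e6, 200e6):
    p = prob_zero_reads(0.05, 2000, depth)
    print(f"{depth / 1e6:.0f}M mapped reads: P(zero reads) = {p:.4f}")
```

At 10M mapped reads such a gene is expected to get about one read (lambda = 1), so it has roughly a 37% chance of zero coverage; at 50M reads that chance drops below 1%.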

Written 10.2 years ago by Devon Ryan.

Hi Devon, thanks a lot for this. I have now computed FPKMs for all my samples. Here is a bit more about my experiment and objective:

I have data from 30 individuals and a set of 2000 genes, and I want to check (a) whether there is transcriptional evidence for these genes and (b) which of the 2000 genes form a core set that is expressed in all samples. Can I decide this on the basis of an FPKM threshold, e.g. by calling genes with FPKM below 0.5 not expressed? If so, what should the threshold be?
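Both questions can be answered from a genes x samples FPKM matrix once a cut-off is chosen. A minimal NumPy sketch, assuming rows are genes and columns are individuals; `classify_genes` is a hypothetical helper, 0.5 is simply the example value from the question rather than a recommendation, and the toy matrix is made up.

```python
import numpy as np


def classify_genes(fpkm, gene_ids, threshold=0.5):
    """Hypothetical helper. fpkm: genes x samples matrix.
    threshold: a scalar, or one value per sample (broadcast across columns)."""
    expressed = np.asarray(fpkm, dtype=float) >= np.asarray(threshold, dtype=float)
    # (a) evidence of transcription: at or above threshold in at least one sample
    evidence = [g for g, ok in zip(gene_ids, expressed.any(axis=1)) if ok]
    # (b) core set: at or above threshold in every sample
    core = [g for g, ok in zip(gene_ids, expressed.all(axis=1)) if ok]
    return evidence, core


genes = ["g1", "g2", "g3", "g4"]
fpkm = [[2.0, 5.1, 0.9],   # above 0.5 in all samples
        [0.0, 0.3, 1.2],   # above 0.5 in one sample only
        [0.1, 0.0, 0.2],   # never above 0.5
        [12.0, 8.4, 0.6]]  # above 0.5 in all samples
evidence, core = classify_genes(fpkm, genes)
print(evidence)  # ['g1', 'g2', 'g4']
print(core)      # ['g1', 'g4']
```

Since a per-sample threshold array broadcasts across columns, cut-offs picked from each sample's own histogram can be passed in directly. The core set is sensitive to the cut-off: a single low sample is enough to drop a gene from it.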

Written 10.2 years ago by GR.

I feel like I answered a similar question a couple of days ago but can't find it at the moment. Have a look at a histogram of the FPKMs. If you're lucky, it will be bimodal, in which case you can set a reasonable threshold (or, better yet, fit two curves and then assign each gene a probability of being expressed).
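One way to "fit two curves" is a two-component Gaussian mixture on log2 FPKM, fitted by expectation-maximisation (EM). The sketch below is a minimal NumPy implementation written for illustration (a tested library such as R's mixtools or scikit-learn's GaussianMixture would be the usual choice in practice), and it assumes log2 FPKM really is a mix of two roughly normal peaks. For each gene it returns a posterior probability of belonging to the upper ("expressed") component, which is not a p-value in the strict sense. The data are simulated.

```python
import numpy as np


def _responsibilities(x, w, mu, sd):
    # Weighted normal density of every point under each component (n x 2)
    dens = w / (sd * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2)
    total = dens.sum(axis=1)
    return dens / total[:, None], np.log(total).sum()


def fit_two_gaussians(x, n_iter=500, tol=1e-8):
    """Minimal EM for a 1-D, two-component Gaussian mixture.
    Returns weights, means, SDs and P(upper component | x) for every point."""
    x = np.asarray(x, dtype=float)
    w = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75])  # start the components at the quartiles
    sd = np.array([x.std(), x.std()])
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp, ll = _responsibilities(x, w, mu, sd)  # E-step
        nk = resp.sum(axis=0)                       # M-step
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    resp, _ = _responsibilities(x, w, mu, sd)
    return w, mu, sd, resp[:, int(np.argmax(mu))]


rng = np.random.default_rng(0)
log2_fpkm = np.concatenate([rng.normal(-4, 1, 3000), rng.normal(3, 1.5, 7000)])
w, mu, sd, p_expressed = fit_two_gaussians(log2_fpkm)
print("means:", np.round(mu, 2), "weights:", np.round(w, 2))
print("genes with P(expressed) > 0.95:", int((p_expressed > 0.95).sum()))
```

Genes with intermediate probabilities (say between 0.05 and 0.95) are the ones that any single hard threshold classifies somewhat arbitrarily.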

Written 10.2 years ago by Devon Ryan.

Hi Devon, thanks for the prompt help. I am new to RNA-seq, so I thought I'd double-check with you. I have attached a figure for one sample. For this sample, can I say that genes with FPKM values < 0.2 should be considered not expressed? (The figure shows log2-transformed values.) I had already discarded genes with FPKM < 0.5 to get this graph.
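Because the plot is on a log2 scale, it may help to convert between the two axes explicitly. Assuming no pseudocount was added before taking log2, the FPKM >= 0.5 filter puts the left edge of the plot at log2(0.5) = -1, and a cut-off of 0.2 read off the log2 axis corresponds to an FPKM of 2^0.2, about 1.15. A cut-off of 0.2 on the FPKM scale itself would remove nothing, since every remaining gene already has FPKM >= 0.5.

```python
import math

filter_fpkm = 0.5          # genes below this FPKM were already discarded
cutoff_on_log2_axis = 0.2  # the value read off the log2-scaled plot

left_edge = math.log2(filter_fpkm)      # nothing is plotted below this
cutoff_fpkm = 2 ** cutoff_on_log2_axis  # back-transform to the FPKM scale
print(f"plot starts at log2 = {left_edge}")            # plot starts at log2 = -1.0
print(f"log2 cut-off 0.2 = FPKM {cutoff_fpkm:.3f}")    # log2 cut-off 0.2 = FPKM 1.149
```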

Written 10.2 years ago by GR.

What happens if you just use hist() and specify a higher number of breaks? If you already discarded genes with FPKM < 0.5, then it looks like the kernel smoothing is making the density plot harder to interpret.
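To see why a density plot can mislead after a hard filter, the sketch below simulates log2 FPKMs, drops everything below log2(0.5) = -1 (mimicking the FPKM < 0.5 filter), and compares a histogram with a Gaussian kernel density estimate. The bandwidth follows the rule of thumb behind R's default density() bandwidth (bw.nrd0); the simulated data and helper names are assumptions for illustration. The histogram has no bars below -1, whereas the KDE places some of its mass there, so the sharp edge at -1 is drawn as a smooth slope.

```python
import math

import numpy as np


def bw_nrd0(x):
    # Rule-of-thumb bandwidth, as in R's bw.nrd0 (density()'s default)
    x = np.asarray(x, dtype=float)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(x.std(ddof=1), iqr / 1.34) * len(x) ** -0.2


def kde_mass_below(x, cut, bw):
    # A Gaussian KDE's CDF is the average of normal CDFs centred on each point
    z = (cut - np.asarray(x, dtype=float)) / (bw * math.sqrt(2))
    return float(np.mean([0.5 * (1 + math.erf(v)) for v in z]))


rng = np.random.default_rng(1)
log2_fpkm = rng.normal(0, 2.5, 20000)
kept = log2_fpkm[log2_fpkm >= math.log2(0.5)]  # mimic the FPKM >= 0.5 filter

counts, edges = np.histogram(kept, bins=50)  # roughly hist(..., breaks=50)
print("first histogram bin starts at", round(float(edges[0]), 3))
print("KDE mass below the filter:", round(kde_mass_below(kept, -1.0, bw_nrd0(kept)), 4))
```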

Written 10.2 years ago by Devon Ryan.

Hi Devon, if I plot a histogram of my data, it does not show much.

Written 10.2 years ago by GR.

Do something like hist(something, breaks=50, ylim=c(0,50))
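For readers working in Python rather than R, matplotlib's `plt.hist(x, bins=50)` followed by `plt.ylim(0, 50)` is the closest plotting analogue. The dependency-light sketch below does the same thing in text: it bins the values into 50 equal-width bins and clips every bar at 50 genes, so one dominant bin no longer squashes the smaller ones. R treats `breaks=50` only as a suggestion and may pick a slightly different number of "pretty" breaks. The helper name and the simulated data are hypothetical.

```python
import numpy as np


def text_hist(x, bins=50, ymax=50, width=40):
    """Hypothetical helper: print a histogram with bar heights clipped at ymax,
    mimicking hist(x, breaks=bins, ylim=c(0, ymax)) in R. Returns the raw counts."""
    counts, edges = np.histogram(x, bins=bins)
    for count, left in zip(counts, edges[:-1]):
        height = min(int(count), ymax) * width // ymax
        clipped = ">" if count > ymax else ""  # marks bars cut off by ymax
        print(f"{left:8.2f} | {'#' * height}{clipped} ({count})")
    return counts


# Simulated FPKMs on the linear scale: most genes near zero, a long right tail
rng = np.random.default_rng(2)
fpkm = rng.exponential(scale=5.0, size=2000)
counts = text_hist(fpkm)
```

On the linear FPKM scale most genes land in the first few bins, so applying the same clipping to log2(FPKM + pseudocount) is usually more informative.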

Written 10.2 years ago by Devon Ryan.

Here is the histogram. Most of the genes have very low FPKM. Is there anything wrong with my dataset? Can you help me with this further?

Written 10.2 years ago by GR.

Try changing xlim and ylim so that the plot shows more than an exponential-looking decay, and then post that.

Written 10.2 years ago by Devon Ryan.