Looking for transcription evidence in pooled-tissue RNA-seq data
10.2 years ago
GR ▴ 400

Dear All,

I have a set of genes, and I want to check whether these genes are transcribed in different individuals (from the same species). I have RNA-seq data where total mRNA was pooled from different tissues (without any replicates). By checking the coverage of each gene in this data, how reliably can I tell whether a gene is transcribed or not? What further experiments could confirm this? And what should the threshold for "no coverage" be (exactly 0 reads, or should I be a bit more relaxed)?

Thanks, RT

10.2 years ago

It's probably easiest to make a histogram of each sample's FPKM distribution and just threshold it (you'll probably see two peaks, with the right-most one being the "expressed" genes). This won't give 100% certain results... but then again, nothing will. You can get much fancier than this, but I don't really know if it's worth it.
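
A minimal sketch of the histogram-thresholding idea, written in Python rather than R. Instead of eyeballing the valley between the two peaks, it uses Otsu's criterion (choose the split that maximises the between-group variance of the log2 values), which is a swapped-in, standard way to cut a bimodal distribution in two. The function name, the pseudocount of 0.01 and the simulated data are illustrative assumptions, not part of the original answer.

```python
import math
import random

def otsu_fpkm_cutoff(fpkm, pseudocount=0.01):
    """Return an FPKM cutoff that splits log2(FPKM + pseudocount) into two groups.

    Otsu's criterion: choose the split that maximises the between-group variance
    w0 * w1 * (mean0 - mean1) ** 2. The pseudocount keeps log2 defined at FPKM == 0.
    """
    logs = sorted(math.log2(x + pseudocount) for x in fpkm)
    n = len(logs)
    total = sum(logs)
    best_score, best_k = -1.0, None
    left_sum = 0.0
    for k in range(1, n):  # candidate split: logs[:k] | logs[k:]
        left_sum += logs[k - 1]
        if logs[k] == logs[k - 1]:
            continue  # no cutoff fits between tied values
        w0, w1 = k / n, (n - k) / n
        mean0 = left_sum / k
        mean1 = (total - left_sum) / (n - k)
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_k = score, k
    if best_k is None:
        raise ValueError("need at least two distinct FPKM values")
    # place the cutoff halfway between the two groups on the log2 scale
    cutoff_log = (logs[best_k - 1] + logs[best_k]) / 2
    return 2 ** cutoff_log - pseudocount

# Simulated sample (not real data): 3000 low/background genes, 5000 expressed genes
random.seed(1)
sim = ([2 ** random.gauss(-4, 1) for _ in range(3000)]
       + [2 ** random.gauss(4, 1.5) for _ in range(5000)])
cutoff = otsu_fpkm_cutoff(sim)
print(f"suggested FPKM cutoff: {cutoff:.2f}")
```

Otsu's split assumes two reasonably separated groups; if a sample's histogram is not bimodal, the number it returns is not meaningful and the plot should be checked first.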

For follow-up, qPCR is pretty common. Note that there's a difference between "below the detection threshold" and "not expressed" (this is true for RNA-seq as well). Alternatively, you could run some Western blots, use a protein array, etc. None of these are perfect.

Genes with zero coverage have zero reads only because of your sequencing depth. There's enough noise in biology to assume that everything is transcribed at some level in a given cell type (at least if you look at enough cells).
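
The sequencing-depth point can be made concrete with a simple Poisson sampling model (an assumption; real counts are overdispersed). FPKM is fragments per kilobase of transcript per million mapped fragments, so the expected count is λ = FPKM × length (kb) × mapped fragments (millions), and the chance of seeing zero fragments is exp(-λ). The transcript length and library size below are hypothetical.

```python
import math

def expected_fragments(fpkm, length_kb, mapped_millions):
    """Expected fragment count implied by an FPKM value (fragments per kb per million mapped)."""
    return fpkm * length_kb * mapped_millions

def prob_zero(fpkm, length_kb, mapped_millions):
    """P(no fragments) under a Poisson sampling model. This is an assumption:
    real RNA-seq counts are overdispersed, which makes zeros even more likely
    at the same mean."""
    return math.exp(-expected_fragments(fpkm, length_kb, mapped_millions))

# Hypothetical 1.5 kb transcript in a library with 20 million mapped fragments
for fpkm in (1.0, 0.1, 0.01):
    print(f"FPKM {fpkm:>5}: P(zero reads) = {prob_zero(fpkm, 1.5, 20):.3f}")
```

For this hypothetical transcript, FPKM 0.1 gives λ = 3 and roughly a 5% chance of zero reads, while FPKM 0.01 gives λ = 0.3 and roughly a 74% chance, so a zero mostly says the gene is below what this depth can detect.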


Hi Devon, thanks a lot for this. I now have FPKMs for all my samples. Here is a bit more about my experiment and objective:

I have data from 30 individuals and a set of 2000 genes, and I want to check (a) transcriptional evidence for these genes and (b) a core set (out of these 2000 genes) that shows expression in all samples. Is it possible to decide this from an FPKM threshold, e.g. by saying that genes with FPKM below 0.5 are not expressed? If so, what should the threshold be?
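
Once a cutoff is chosen, both (a) and (b) are simple filters. This sketch assumes the FPKMs are held in a plain dict mapping a gene ID to one value per individual; the gene names are hypothetical, and the default threshold of 0.5 only mirrors the example in the question, not a recommended value.

```python
def core_genes(fpkm_table, threshold=0.5):
    """Genes with FPKM >= threshold in every sample (the 'core set').

    fpkm_table: dict mapping gene ID -> list of FPKM values, one per individual.
    """
    return sorted(g for g, vals in fpkm_table.items()
                  if vals and all(v >= threshold for v in vals))

def expressed_anywhere(fpkm_table, threshold=0.5):
    """Genes with FPKM >= threshold in at least one sample."""
    return sorted(g for g, vals in fpkm_table.items()
                  if any(v >= threshold for v in vals))

# Hypothetical table with three genes measured in three individuals
table = {
    "geneA": [2.1, 0.9, 5.0],
    "geneB": [0.0, 0.3, 1.2],
    "geneC": [0.1, 0.2, 0.0],
}
print(core_genes(table))          # ['geneA']
print(expressed_anywhere(table))  # ['geneA', 'geneB']
```

If each individual ends up with its own cutoff (for example, one read off each sample's histogram), the comparison would use that sample's value instead of a single global threshold.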


I feel like I answered a similar question a couple of days ago, but I can't find it at the moment. Have a look at a histogram of the FPKMs. If you're lucky, they'll be bimodal, in which case you can set a reasonable threshold (or, better yet, fit two curves and then assign each gene a probability of being expressed).
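
One way to do the "fit two curves" step: fit a two-component Gaussian mixture to the log2 FPKM values with expectation-maximisation (EM), then report the posterior probability that each gene belongs to the higher component. This is a self-written sketch, not any particular package's implementation, and it assumes both populations look roughly normal on the log2 scale. Note that the output is a posterior probability rather than a p-value.

```python
import math
import random
import statistics

def _log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd**2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def _inv_one_plus_exp(d):
    """Compute 1 / (1 + exp(d)) without overflowing for large |d|."""
    if d > 0:
        e = math.exp(-d)
        return e / (1 + e)
    return 1 / (1 + math.exp(d))

def prob_expressed(v, w, mu, sd):
    """Posterior probability that log2 FPKM value v belongs to component 1."""
    d = (math.log(w[0]) + _log_normal_pdf(v, mu[0], sd[0])) \
        - (math.log(w[1]) + _log_normal_pdf(v, mu[1], sd[1]))
    return _inv_one_plus_exp(d)

def fit_two_gaussians(x, n_iter=100, tol=1e-6):
    """EM fit of a two-component 1-D Gaussian mixture; returns (weights, means, sds).

    Component 1 is returned as the higher-mean ("expressed") component.
    """
    xs = sorted(x)
    n = len(xs)
    mu = [xs[n // 4], xs[3 * n // 4]]  # start from the lower/upper quartiles
    sd = [statistics.pstdev(xs)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each value
        r = [prob_expressed(v, w, mu, sd) for v in xs]
        # M-step: re-estimate weights, means and standard deviations
        n1 = sum(r)
        n0 = n - n1
        new_mu = [sum((1 - ri) * v for ri, v in zip(r, xs)) / n0,
                  sum(ri * v for ri, v in zip(r, xs)) / n1]
        sd = [max(1e-3, math.sqrt(sum((1 - ri) * (v - new_mu[0]) ** 2
                                      for ri, v in zip(r, xs)) / n0)),
              max(1e-3, math.sqrt(sum(ri * (v - new_mu[1]) ** 2
                                      for ri, v in zip(r, xs)) / n1))]
        w = [n0 / n, n1 / n]
        converged = max(abs(a - b) for a, b in zip(mu, new_mu)) < tol
        mu = new_mu
        if converged:
            break
    if mu[0] > mu[1]:  # make component 1 the higher-mean one
        w, mu, sd = w[::-1], mu[::-1], sd[::-1]
    return w, mu, sd

# Simulated log2 FPKM values (not real data): 600 background and 1000 expressed genes
random.seed(2)
log_fpkm = ([random.gauss(-4, 1) for _ in range(600)]
            + [random.gauss(4, 1.5) for _ in range(1000)])
w, mu, sd = fit_two_gaussians(log_fpkm)
print("weights", [round(a, 2) for a in w], "means", [round(m, 2) for m in mu])
print("P(expressed | log2 FPKM = 0):", round(prob_expressed(0.0, w, mu, sd), 3))
```

Genes could then be called expressed when this probability exceeds a chosen level (0.5, or something stricter); that level is a judgement call, not something the fit decides.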


Hi Devon, thanks for the prompt help. I am new to RNA-seq, so I wanted to double-check with you. I have attached a figure for one sample. For this sample, can I say that genes with FPKM values < 0.2 should be considered not expressed (the figure shows log2-transformed values)? I had already discarded genes with FPKM < 0.5 before making this graph.


What happens if you just use hist() and specify a higher number of breaks? If you already discarded genes with FPKM < 0.5, then it looks like the kernel smoothing is making the density plot harder to interpret.
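
Why kernel smoothing can mislead after a hard filter: a kernel density estimate spreads every point over its neighbourhood, so it assigns density below the cutoff even though no values lie there, and it turns the sharp edge into a sloping shoulder that can look like a peak or valley. A minimal sketch with a Gaussian kernel; the bandwidth of 0.5 and the example values are arbitrary.

```python
import math

def gaussian_kde(data, x, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at point x."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm

# Hypothetical log2 FPKM values left after discarding FPKM < 0.5 (i.e. log2 < -1)
kept = [-1.0, -0.9, -0.7, -0.2, 0.5, 1.3, 2.8, 4.0, 5.1]
cutoff = math.log2(0.5)

# No value lies below the cutoff, yet the smoothed curve still has density there
below = cutoff - 0.5
print(f"density at log2 FPKM = {below}: {gaussian_kde(kept, below, bandwidth=0.5):.3f}")
```

A histogram with many narrow bins has no such spill-over: it shows the hard edge at the filter cutoff directly.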


Hi Devon, if I plot a histogram of my data, it does not show much.


Do something like hist(something, breaks=50, ylim=c(0,50))
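
A rough Python analogue of that call, printing a text histogram so the effect of many bins plus a capped y-axis is easy to see. The function and data are hypothetical. Also note that in R a single number passed as `breaks` is only a suggestion (R picks "pretty" break points), whereas this sketch uses exactly `breaks` equal-width bins.

```python
def text_hist(values, breaks=50, ymax=50, width=40):
    """Print an ASCII histogram with exactly `breaks` equal-width bins.

    Bars taller than `ymax` are clipped and marked with '+', similar in spirit to
    ylim=c(0, 50) in R: the huge low-FPKM bar is cut off so the small bars show up.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / breaks or 1.0  # guard against all-identical values
    counts = [0] * breaks
    for v in values:
        # the maximum value falls exactly on the upper edge; keep it in the last bin
        counts[min(int((v - lo) / step), breaks - 1)] += 1
    for i, c in enumerate(counts):
        bar = "#" * round(min(c, ymax) / ymax * width)
        flag = "+" if c > ymax else ""
        print(f"{lo + i * step:8.2f} {bar}{flag} ({c})")
    return counts

# Hypothetical FPKM values for a handful of genes
example = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 1.5, 2.0, 4.8, 9.7]
counts = text_hist(example, breaks=10, ymax=3)
```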


Here is the histogram. Most of the genes have very low FPKM. Is there anything wrong with my dataset? Can you help me with this further?


Try changing xlim and ylim so you see more than an exponential-looking decay, and then post that.
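
Zooming in can be done two ways: cropping the axis (in R, `xlim` on `hist()` only changes the plotting region, while the bins are still computed over the full range of the data) or subsetting the data first so that all the bins fall inside the window. This sketch does the latter; the function name and example values are hypothetical.

```python
def zoomed_counts(values, xlim, breaks=50):
    """Bin only the values inside xlim = (xmin, xmax) into `breaks` equal-width bins.

    Returns (edges, counts, n_outside) so it is clear how many values were left out.
    """
    xmin, xmax = xlim
    inside = [v for v in values if xmin <= v <= xmax]
    step = (xmax - xmin) / breaks
    counts = [0] * breaks
    for v in inside:
        # values equal to xmax go into the last bin
        counts[min(int((v - xmin) / step), breaks - 1)] += 1
    edges = [xmin + i * step for i in range(breaks + 1)]
    return edges, counts, len(values) - len(inside)

# Hypothetical right-skewed FPKM values: most genes low, a few very high
fpkm = [0.05, 0.1, 0.3, 0.6, 0.8, 1.2, 1.9, 2.5, 3.3, 4.1, 12.0, 85.0, 430.0]
edges, counts, n_outside = zoomed_counts(fpkm, xlim=(0, 5), breaks=10)
for lo_edge, c in zip(edges, counts):
    print(f"{lo_edge:4.1f} {'#' * c}")
print(f"{n_outside} values outside the window")
```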
