Dear All,
I have a set of genes and want to check whether they are transcribed in different individuals (from the same species). I have RNA-seq data where total mRNA was pooled from different tissues (without any replicates). By checking the coverage of different genes in this data, how reliably can I tell whether a gene is transcribed or not? And what further experiments could confirm this? What should the threshold for no coverage be (0 read coverage, or should I be a bit more relaxed)?
Thanks, RT
Hi Devon, Thanks a lot for this. I now have FPKMs for all my samples. The following explains a bit more about my experiment and objective:
I have data from 30 individuals and a set of 2000 genes, where I am interested in checking a) transcriptome evidence for these genes and b) the core set (out of these 2000 genes) that shows expression in all the samples. Is it possible to determine this on the basis of an FPKM threshold, e.g. by saying that genes with FPKM below 0.5 are not expressed? If yes, what should this threshold be?
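As a rough illustration of the "core set" idea in the question above, here is a minimal R sketch. The matrix `fpkm`, the gene and individual names, and the 0.5 cutoff are all made up for the example; the cutoff in particular is an assumption, not a recommended value.

```r
# Hypothetical FPKM matrix: rows are genes, columns are individuals.
fpkm <- matrix(c(12.0, 3.1, 8.4, 5.0,
                  0.0, 0.1, 0.3, 0.0,
                  4.2, 0.2, 6.1, 7.7,
                  1.5, 0.9, 2.2, 0.6),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                               c("ind1", "ind2", "ind3", "ind4")))

threshold <- 0.5                      # assumed cutoff, for illustration only

expressed   <- fpkm >= threshold      # logical matrix: gene x individual
n_expressed <- rowSums(expressed)     # number of individuals passing per gene

# Genes with any transcriptome evidence, and genes passing in every individual.
any_evidence <- rownames(fpkm)[n_expressed > 0]
core_set     <- rownames(fpkm)[n_expressed == ncol(fpkm)]
print(core_set)
```

With real data the result depends heavily on the chosen threshold, so it is probably worth checking how the size of `core_set` changes across a few plausible cutoffs.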
I feel like I answered a question similar to this a couple days ago but can't find it at the moment. Have a look at a histogram of the FPKMs. If you're lucky, they'll be bimodal, in which case you can set a reasonable threshold (or better yet, fit with two curves and then assign a p-value for the probability of being expressed).
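One possible reading of "fit with two curves" is a two-component Gaussian mixture on log2-transformed FPKMs, which gives each gene a posterior probability of belonging to the higher ("expressed") component. The sketch below uses base R only and simulated values; `fit_two_gaussians` is a hypothetical helper written for this example, not a function from any package, and the Gaussian shape is an assumption.

```r
# Simulated log2(FPKM) values standing in for real data (assumption):
# a low "background" mode and a higher "expressed" mode.
set.seed(42)
log_fpkm <- c(rnorm(300, mean = -3, sd = 1), rnorm(700, mean = 4, sd = 1.5))

# Minimal EM for a two-component Gaussian mixture (illustrative helper).
fit_two_gaussians <- function(x, n_iter = 200) {
  mu    <- as.numeric(quantile(x, c(0.25, 0.75)))  # crude starting means
  sigma <- rep(sd(x), 2)
  w     <- c(0.5, 0.5)
  for (i in seq_len(n_iter)) {
    # E-step: posterior probability of the high component for each value
    d1 <- w[1] * dnorm(x, mu[1], sigma[1])
    d2 <- w[2] * dnorm(x, mu[2], sigma[2])
    post2 <- d2 / (d1 + d2)
    # M-step: update weights, means and standard deviations
    w  <- c(mean(1 - post2), mean(post2))
    mu <- c(weighted.mean(x, 1 - post2), weighted.mean(x, post2))
    sigma <- c(sqrt(sum((1 - post2) * (x - mu[1])^2) / sum(1 - post2)),
               sqrt(sum(post2 * (x - mu[2])^2) / sum(post2)))
  }
  # prob_high comes from the final E-step
  list(weights = w, means = mu, sds = sigma, prob_high = post2)
}

fit <- fit_two_gaussians(log_fpkm)
print(fit$means)
```

Genes could then be called expressed when `fit$prob_high` exceeds some cutoff (say 0.95, again an arbitrary choice). If the histogram is not clearly bimodal, a fit like this is unlikely to be meaningful.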
Hi Devon. Thanks for the prompt help. I am new to RNA-seq so I thought I would double check with you. I have attached a figure for one sample. For this sample, can I say that genes with FPKM values < 0.2 should be considered not expressed (the figure shows log2-transformed values)? I have already discarded genes with FPKM < 0.5 to get this graph.
What happens if you just use hist() and specify a higher number of breaks? If you already discarded genes with FPKM < 0.5 then it looks like the kernel smoothing is making the density plot harder to interpret.
Hi Devon, If I plot a histogram of my data then it does not say much.
Do something like
hist(something, breaks=50, ylim=c(0,50))
Here is the histogram. Most of the genes have very low FPKM. Is there anything wrong with my dataset? Can you help me with this further?
Try changing xlim and ylim so the plot shows more than what looks like an exponential distribution, and then post that.
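A small sketch of what adjusting the axis limits might look like, using simulated FPKMs rather than the poster's data. The distributions, the limits of 50 and the pseudocount of 0.01 are all assumptions chosen for illustration.

```r
# Simulated FPKM values (assumption): many near-zero genes plus a smaller
# group with clearly higher expression.
set.seed(7)
fpkm <- c(rexp(1500, rate = 5), rlnorm(500, meanlog = 3, sdlog = 1))

# Full range: the spike near zero dominates the plot.
hist(fpkm, breaks = 50)

# Restrict the x and y ranges so the low-count bins become visible;
# bars taller than ylim are simply cut off at the top.
hist(fpkm, breaks = 500, xlim = c(0, 50), ylim = c(0, 50))

# A log2 scale with a small pseudocount (0.01, arbitrary) can make
# a second mode easier to see.
h <- hist(log2(fpkm + 0.01), breaks = 50)
```

Note that `xlim` and `ylim` only change the visible window; the bins are still computed from all values.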