How To Determine 'How Complete The RNA-Seq Data Is'?
3
13.1 years ago
Ken ▴ 170

Hi all, just wondering if anyone has an idea of how to judge how complete an RNA-seq data set is? Of course, this should depend on which genome it is from. Thanks in advance.

rna • 4.0k views
1

Perhaps you could give a definition of what you mean by 'complete'. All known loci covered by some number of reads? All splice variants represented? If that's your definition, you'll require billions of reads for a mammalian genome. What are you actually looking for?

0

This question makes no sense as is. Please clarify.

0

Hi seidel and neilfws, by 'complete' I mean 'all known loci covered by some number of reads', as seidel pointed out. Thanks.
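To make that definition concrete, here is a minimal sketch of how 'all known loci covered by some number of reads' could be measured. It assumes you already have a per-locus read count table from whatever counting tool you use. The function name, the input layout and the threshold of 5 reads are illustrative assumptions, not something from this thread.

```python
def fraction_loci_covered(read_counts, known_loci, min_reads=5):
    """Fraction of known loci that have at least `min_reads` reads.

    read_counts: dict mapping locus ID -> number of reads assigned to it
                 (hypothetical input; loci with no reads may be absent).
    known_loci:  iterable of all annotated locus IDs for the genome.
    min_reads:   arbitrary threshold for calling a locus 'covered'.
    """
    loci = list(known_loci)
    if not loci:
        raise ValueError("known_loci must not be empty")
    covered = sum(1 for locus in loci if read_counts.get(locus, 0) >= min_reads)
    return covered / len(loci)


# Toy example with made-up counts; g4 is annotated but has no reads at all.
toy_counts = {"g1": 12, "g2": 3, "g3": 0}
toy_loci = ["g1", "g2", "g3", "g4"]
strict = fraction_loci_covered(toy_counts, toy_loci, min_reads=5)
lenient = fraction_loci_covered(toy_counts, toy_loci, min_reads=1)
```

Note that not every annotated locus is expressed in a given tissue or condition, so a value below 1.0 does not by itself mean the sequencing was too shallow.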

5
13.1 years ago

You can look at gene representation in some fraction of your reads (say 50%) and compare how coverage changes as you add another 10% or 25% of the reads, for example. You can do this in terms of the total number of genes or mRNA isoforms observed, as well as the representation of a few select genes that are expressed at high, moderate and low levels. Basically, you would do this to see where discovery (of expressed genes) starts to plateau.

I have seen this approach presented at genome conferences.
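The subsampling idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is a hypothetical list with one gene ID per aligned read, the function name and parameters are made up for this example, and a gene counts as 'detected' once it has at least `min_reads` reads in the subsample.

```python
import random
from collections import Counter


def saturation_curve(read_genes, fractions, min_reads=1, seed=0):
    """Return [(fraction, n_genes_detected), ...] for nested random subsamples.

    read_genes: one gene ID per aligned read (hypothetical input; in practice
                this would come from assigning alignments to annotated genes).
    fractions:  fractions of the total reads to evaluate, e.g. 0.5, 0.75, 1.0.
    """
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)  # shuffle once so each subsample extends the previous one
    curve = []
    for frac in sorted(fractions):
        n_reads = int(round(frac * len(shuffled)))
        counts = Counter(shuffled[:n_reads])
        detected = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((frac, detected))
    return curve


# Toy data: one highly, one moderately and one lowly expressed gene.
toy_reads = ["high"] * 900 + ["moderate"] * 90 + ["low"] * 10
curve = saturation_curve(toy_reads, [0.1, 0.25, 0.5, 0.75, 1.0])
for frac, n_genes in curve:
    print(f"{frac:.0%} of reads: {n_genes} genes detected")
```

If the number of detected genes stops rising well before 100% of the reads, adding more depth is unlikely to reveal many more expressed genes at that threshold. If it is still climbing at 100%, the data set is probably not saturated.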

Edit (6 Oct 2011): I don't recall seeing data from the group who authored the paper Istvan mentioned, but their results are indeed similar to those I have heard others present and discuss. I suggest taking a good look at their figure 1, which shows saturation curves. There is, however, much more in this paper that is worth exploring for those facing similar issues of gene coverage and saturation.

0

Do you have a link to an example presentation that uses this? It would be nice to see what it looks like.

0

No, I don't have anything on hand. If I can find the time, I will try to redraw someone else's data, but that is risky...

0

Oh, no worries then. Thank you though.

0

@GWW, I think the paper Istvan suggests is the one.

4
13.1 years ago

For some ideas, consult the paper titled "Differential expression in RNA-seq: A matter of depth" (Tarazona et al., Genome Research, 2011).

4
13.1 years ago

Ken, the GenePattern software has a tool that can help you to determine coverage by gene, locus, transcript, etc. - it is called RNAseqMetrics and is available on the GenePattern server at http://genepattern.broadinstitute.org. A publication on this tool is in process. For general information you can go to http://www.genepattern.org.

Best, Michael

0

Thank you! Always great when an expert is available to give advice. Welcome to BioStar Michael!
