Question

How Does ENCODE Data Change the Design of NGS Experiments?

3

12.2 years ago

Alex Paciorkowski 3.5k

Now that we've had a week or so to digest the ENCODE publications (nice summary here), this is a question for those groups engaged in next-gen sequencing projects for gene discovery in human disorders. Most of you have probably focused on whole exome.

What elements of the ENCODE data set are ready or near-ready to include in future experiments that capture the "exome-plus"? Are groups designing targets for some of these regions for capture? Which ones? Enhancers? Promoters? Other long-range functional elements? Or do you suspect it's more efficient to just target the whole genome, so that data can be re-analyzed as functional annotations of the non-coding regions continue to improve? Interested in your responses.
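
To make the "exome-plus" idea concrete, here is a minimal sketch of how an expanded capture target could be sized: take the union of existing exome target intervals and a set of regulatory intervals, then count the extra bases. The coordinates, the variable names, and the helper functions `merge_intervals` and `footprint` are hypothetical and only for illustration; they are not real ENCODE annotations or any vendor's target file. Intervals are assumed to be half-open, as in BED.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (chrom, start, end) intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is None or start > current_end:
                # Gap before this interval: flush the previous run, start a new one.
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
            else:
                # Overlaps or touches the current run: extend it.
                current_end = max(current_end, end)
        merged.append((chrom, current_start, current_end))
    return merged


def footprint(intervals):
    """Total number of bases covered after merging."""
    return sum(end - start for _, start, end in merge_intervals(intervals))


# Hypothetical coordinates, for illustration only.
exome_targets = [("chr1", 1000, 1200), ("chr1", 5000, 5300)]
regulatory_regions = [("chr1", 900, 1050), ("chr1", 8000, 8500)]

exome_plus = merge_intervals(exome_targets + regulatory_regions)
extra_bases = footprint(exome_targets + regulatory_regions) - footprint(exome_targets)
```

Here `extra_bases` is the additional capture footprint that the regulatory regions would add on top of the exome, which is the quantity that drives extra probe design and sequencing cost.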

encode • 3.7k views

ADD COMMENT • link updated 12.2 years ago by swbarnes2 14k • written 12.2 years ago by Alex Paciorkowski 3.5k

score 5 · Answer 1 · 2012-09-12

5

12.2 years ago

Istvan Albert 101k

Allow me to demonstrate

0

so, are we going down from the big peak?

ADD REPLY • link 12.2 years ago by JC 13k

1

oh I think we are at the technology trigger state

ADD REPLY • link 12.2 years ago by Istvan Albert 101k

0

:) Point taken. What're the values on the time axis? Minutes? Hours? Days?

ADD REPLY • link 12.2 years ago by Alex Paciorkowski 3.5k

0

I've noticed that even though your proposed time steps span orders of magnitude, they are all of lengths that a person could easily tolerate. ;-) I have no idea.

I do think that the closer to release, the more of a race it is to find that low-hanging fruit. I have already heard a few talks by people who are interested in reverse engineering the data to find patterns with little concern for the origins or meaning of it all. It is all binding, baby!

ADD REPLY • link 12.2 years ago by Istvan Albert 101k

score 0 · Answer 2 · 2012-09-12

0

12.2 years ago

JC 13k

I consider whole genome sequencing more reliable than exome capture techniques, because of capture bias and other missing factors. Besides, as you mentioned, it leaves you able to reanalyze regions later.

0

But how would you use the ENCODE data?

ADD REPLY • link 12.2 years ago by Pierre Lindenbaum 164k

0

I probably wouldn't. Many of the sequences come from immortalized cell lines, and I don't have a direct application in mind for that. For my current projects I prefer the 1000 Genomes data set.

ADD REPLY • link 12.2 years ago by JC 13k

0

Thanks, JC. Sure, whole genome seq provides more consistent coverage of the exome, but at what trade-off? In my neighborhood, WGS of 1 sample is about 4x the cost of whole exome of 1 sample, so you can exome a whole trio for less than WGS of 1 sample. Unfortunately funding influences experimental design, especially when data analysis has traditionally focused on the coding regions. My question was more about what elements of the ENCODE data can be incorporated into current analysis workflows.
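
To put rough numbers on that trade-off, here is a minimal sketch in relative cost units. The only input taken from the post is the "about 4x" ratio between WGS and whole exome per sample; the unit cost of 1.0 and the helper `design_cost` are assumptions for illustration, not real prices.

```python
WES_COST = 1.0              # relative cost of one whole-exome sample (assumed unit)
WGS_COST = 4.0 * WES_COST   # "about 4x" the exome cost, per the post


def design_cost(n_samples, per_sample_cost):
    """Total sequencing cost for a design, ignoring analysis and storage."""
    return n_samples * per_sample_cost


trio_exome = design_cost(3, WES_COST)   # proband plus both parents, exome
single_wgs = design_cost(1, WGS_COST)   # one sample, whole genome
trio_wgs = design_cost(3, WGS_COST)     # the same trio, whole genome

print(f"trio exome: {trio_exome}, single WGS: {single_wgs}, trio WGS: {trio_wgs}")
```

Under that ratio the exome trio costs 3 units against 4 for a single genome, and sequencing the same trio by WGS costs 12.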

ADD REPLY • link 12.2 years ago by Alex Paciorkowski 3.5k

0

I agree about the money limitation, but I don't think people will expand their exome capture probes with ENCODE data right now. The problem is how much it can cost to design and produce specific probes for your regions; at some point, whole genome sequencing will be cheaper.

ADD REPLY • link 12.2 years ago by JC 13k

0

True, changing the target capture can be expensive. So let's modify the question -- for those who focus on exome capture, at what point will the possible incorporation of ENCODE data into analysis justify the switch to WGS?

ADD REPLY • link 12.2 years ago by Alex Paciorkowski 3.5k

score 0 · Answer 3 · 2012-09-13

Wasn't ENCODE highly permissive in what they were labeling as biologically active? I think people will have to validate that this stuff is biologically relevant. And maybe some of it will be, but probably not all of it.

As someone on another blog pointed out, the % of non-coding DNA differs widely among species. If so much of our non-coding DNA was important, how are some species getting on with so much less of it?

For instance, there are two closely related onions, and one has a genome 5x as large as the other. Does it make sense to think that one onion really has 5x more going on in its genome than another onion in the same genus?

One group took a mouse and deleted a 1 Mb region of intergenic DNA, and the mouse was phenotypically indistinguishable from wild-type. So if there was active stuff in that region, it wasn't doing much, at least in a lab setting.

http://www.nrcresearchpress.com/doi/abs/10.1139/g05-017

http://www.ncbi.nlm.nih.gov/pubmed/15496924