Is It Possible To Infer Population Genetics Parameters Like Ne Using De-Novo Sequencing Data Of Pooled Samples?
14.0 years ago
Lhl ▴ 760

Hi there,

I have used 454 GS FLX to sequence (de novo) two plant ecotypes (two divergent populations, each adapted to its own habitat) by pooling 16 individuals from each ecotype. So far I have finished the assembly and the SNP and indel detection. I have also calculated population parameters such as Watterson's theta (θ = 4Neμ) and pi (which is expected to equal theta under neutral equilibrium). However, I am not sure whether it is possible to infer other parameters, such as Ne (effective population size) or the divergence time of the two ecotypes.

By the way, I would like to know how you identify outlier SNPs in your data, if you have done or are doing the same thing. Is it better to use an Fst-based approach or Fisher's exact test?

Elzed
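
To make the Fst vs. Fisher's exact test comparison concrete, here is a minimal Python sketch for a single biallelic SNP. It assumes you already have read counts for the two alleles in each pool; the function names and the example counts are made up for illustration. Note that reads from a pool are not independent chromosome samples (16 diploid individuals give only 32 chromosomes, however deep the coverage), so treat both numbers as rough screening statistics rather than calibrated tests.

```python
from math import comb


def fst_two_pools(p1, p2):
    """Wright's Fst for one biallelic SNP, from the allele frequency in each
    of two pools (equal weights). Returns 0.0 if the site is monomorphic."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-pool heterozygosity
    return 0.0 if h_total == 0 else (h_total - h_within) / h_total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]],
    e.g. rows = pools, columns = reference / alternative allele read counts."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


# Hypothetical read counts (ref, alt) at one SNP in the two pools.
pool1, pool2 = (30, 10), (12, 28)
p1 = pool1[0] / sum(pool1)
p2 = pool2[0] / sum(pool2)
print("Fst:", fst_two_pools(p1, p2))
print("Fisher p-value:", fisher_exact_two_sided(*pool1, *pool2))
```

Fst gives an effect size that can be ranked across SNPs to look for outliers, while Fisher's exact test gives a per-SNP p-value that depends strongly on coverage, so the two are often looked at together.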

sequencing population analysis • 9.9k views
ADD COMMENT
0

What system did you use to detect SNPs and INDELs in your dataset?

ADD REPLY
0

Sorry for the late reply. By system, do you mean software? I tried Mosaik and BWA-SW for alignment, followed by SAMtools for SNP calling.

ADD REPLY
4
14.0 years ago

Yes, there has been some recent effort to solve this problem; see Futschik & Schlötterer (2010), Genetics.

EDIT: see associated code base at PoPOOLation (Hat tip to RaghuM's answer on this related thread)

ADD COMMENT
4

As you probably are aware, under the standard neutral model you can infer Ne from theta if you assume a mutation rate. You'd have to dig deeper or contact the authors about more complex demographic scenarios. You may want to post your question to evoldir (http://evol.mcmaster.ca/evoldir.html) for a more community-specific response to this question.
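
As a rough illustration of that first point, the relationship θ = 4Neμ for a diploid population can simply be rearranged to Ne = θ / (4μ). The sketch below assumes a per-site theta and a per-site, per-generation mutation rate; the function name and the numbers in the example are placeholders, not values for any particular species.

```python
def ne_from_theta(theta_per_site, mu_per_site_per_gen, ploidy=2):
    """Effective population size under the standard neutral model.

    For diploids theta = 4*Ne*mu, for haploids theta = 2*Ne*mu, so in general
    Ne = theta / (2 * ploidy * mu). Both inputs must be on the same per-site scale.
    """
    if mu_per_site_per_gen <= 0:
        raise ValueError("mutation rate must be positive")
    return theta_per_site / (2 * ploidy * mu_per_site_per_gen)


# Placeholder values: theta estimated from the data, mu assumed from the literature.
theta = 0.008   # per site
mu = 7e-9       # per site per generation (an assumption, not a measured value)
print("Ne is roughly", round(ne_from_theta(theta, mu)))
```

The estimate is only as good as the assumed mutation rate and the assumption of neutral equilibrium, so it is best read as an order of magnitude.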

ADD REPLY
0

Yes, thanks. I read the paper. But do you have any ideas about inferring demographic history, such as Ne?

ADD REPLY
0

Thanks a lot, I will try that.

ADD REPLY
0

Thanks Casey, that's a very cool community.

ADD REPLY
0

And does that mean I have to identify regions that are evolving neutrally? Could I define a neutral region simply as one where theta is close to 0?

ADD REPLY
0

And does that mean I have to identify regions that are evolving neutrally? Could I define a neutral region simply as one where theta is close to pi?
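
For reference, the standard way to formalise "theta close to pi" is Tajima's D, which is the difference between pi and Watterson's theta divided by its expected standard deviation. Below is a minimal sketch of the classical formula (Tajima 1989). It assumes n individually sequenced chromosomes, with pi and the number of segregating sites both given as totals for the region (not per site); the function name is just for illustration. Pooled reads violate the sampling assumption, so for pooled data the corrected estimators from the pooled-sequencing literature would be needed instead.

```python
from math import sqrt


def tajimas_d(n, num_seg_sites, pi):
    """Classical Tajima's D for a region.

    n             -- number of sampled chromosomes (n >= 2)
    num_seg_sites -- number of segregating sites S in the region
    pi            -- mean number of pairwise differences in the region (total, not per site)
    Returns NaN when there are no segregating sites.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    s = num_seg_sites
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1                       # Watterson's estimator for the region
    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


# Example: 10 chromosomes, 16 segregating sites, pi of about 3.89.
print(tajimas_d(10, 16, 3.888889))
```

A value of D near zero is consistent with neutral equilibrium but does not prove it, since demography (growth, bottlenecks, structure) also shifts D, so this is a screening statistic rather than a definition of a neutral region.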

ADD REPLY
3
11.3 years ago

The software PSMC can infer how the effective population size of a species has changed over time, using only a single diploid genome sequence.

[Figure: estimated history of effective population size in human populations, from Li and Durbin 2011]

ADD COMMENT
0

I am sorry if I am interrupting the discussion above.

How can I scale down the Y axis (effective population size)?

The Y-axis scale on my PSMC plot is too large, so the changes in effective population size over time cannot be made out.

ADD REPLY
2
14.0 years ago
David W 4.9k

Have you considered the Extended Bayesian Skyline Plot (Heled and Drummond 2008, tutorial here)?

Presuming you have aligned sequences, you should be able to infer changes in population size. Unless you have an estimate of the mutation rate for some of your genes you won't be able to express it in 'real' numbers, but that's not always the goal anyway.

ADD COMMENT
1

Just be aware that the skyline assumes no recombination within each locus. To counteract this, we should have a sufficient number of loci, I think.

ADD REPLY
1

I don't know about ms (isn't that a simulation program?). To do the Bayesian analysis you'll need to give each 'partition' in your data a substitution model (so non-coding seqs probably don't need the full GTR, for instance).

One of the problems with using massive multi-locus datasets in this sort of analysis is deciding what a partition is. It's tempting to set each locus as one partition, but that can be a PITA computationally and probably over-fits the data. (I don't have a solution to that problem, by the way, just a warning ;)

ADD REPLY
0

And should I distinguish between coding and non-coding regions when using this software?

ADD REPLY
0

Thanks. That is a good point. However, should I distinguish between coding and non-coding regions when processing my datasets?

ADD REPLY
0

And do you think it is possible to use ms to solve the same problem?

ADD REPLY
0

Thanks David. ms is a coalescent simulation program written by Richard R. Hudson at the University of Chicago. It is available at https://webshare.uchicago.edu/xythoswfs/webui/users/rhudson1/Public/ms.folder?action=frameset&subaction=print&uniq=yzld0b&stk=2B23BE1D462EA92

ADD REPLY
1
13.8 years ago

Hello!

I have stepped into this thread, which is really interesting. It seems to me that nobody has mentioned what looks to me like a very important matter. Elzed's data are from 16 pooled individuals, without tagging, right? Is it possible to retrieve the true frequency of each haplotype in each sample? If not, how is it possible to use coalescent-based methods like BEAST (EBSP)?

I hope this thread is still active, since I guess I am not grasping something and I would really like to know what.

Paolo

ADD COMMENT
0

Thanks for your interest in this topic, Paolo, and sorry for the late reply; I was travelling abroad. I have two pools, each consisting of 16 individuals. Each pool has a unique tag. Futschik and Schlötterer (2010) proposed a method to estimate population genetics parameters from pooled data: http://www.genetics.org/cgi/content/full/186/1/207

I would like to continue our discussion on this topic.

ADD REPLY
