Question

Analyse Population Genomics Data With Different Coverage

3


13.2 years ago

Plantae ▴ 390

Hi, all

We have sequenced multiple individuals of one species on the Illumina platform.

The sequencing depth of our data is:

7 individuals: 60X
32 individuals: 2-10X

I have called SNPs for all of these individuals, and now I want to use the SNP data for further analyses, e.g. population structure, LD, FST, etc.

I got strange results when I used all individuals in the population structure analyses: the high-coverage individuals clustered together, although they belong to different sub-populations (some are cultivated, others wild), and all the other (low-coverage) individuals clustered together.

I checked the SNP results and found that the high-coverage individuals contain many more SNPs than the low-coverage individuals.

So, should I exclude all of the high-coverage individuals from further analysis?

next-gen population coverage • 3.1k views

ADD COMMENT • link updated 10.4 years ago by Biostar 20 • written 13.2 years ago by Plantae ▴ 390

0


Unfortunately, aside from excluding the higher-coverage individuals, the only thing I would try is downsampling all individuals to your lowest coverage and then repeating the analysis.
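As a rough sketch of the downsampling idea, the fraction of reads to keep per individual is the target depth divided by that individual's depth. The sample names and depth values below are hypothetical, and the function name is just for illustration; the resulting fraction would then be handed to whatever read subsampler you use (for example the `-s` option of `samtools view`).

```python
def downsample_fractions(depths, target=None):
    """Return, per sample, the fraction of reads to keep to reach `target` depth.

    depths: dict mapping sample name -> mean sequencing depth (X).
    target: depth to downsample to; defaults to the lowest depth in `depths`.
    """
    if target is None:
        target = min(depths.values())
    # Samples already at or below the target keep all of their reads.
    return {sample: min(1.0, target / depth) for sample, depth in depths.items()}


# Hypothetical depths, chosen only to mirror the 60X vs 2-10X situation.
depths = {"high_1": 60.0, "low_1": 2.0, "low_2": 10.0}
fractions = downsample_fractions(depths)

for sample in sorted(fractions):
    print(f"{sample}\t{fractions[sample]:.4f}")
```

Downsampling everything to the very lowest depth throws away a lot of data, so passing an explicit `target` (and dropping the few individuals below it) may be a reasonable compromise.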

ADD REPLY • link 10.4 years ago by Adrian Pelin ★ 2.7k

score 0 · Answer 1 · 2012-05-08

For many analyses you do not need all the markers (structure/admixture comes to mind). Indeed, you might have to remove markers in LD for some analyses. For these analyses, the alternative is to use only the markers that overlap all your sets: exclude the markers that exist only in the high-coverage individuals and keep all individuals.

I have used this approach with genotyped data and it worked like a charm (i.e. closely related populations clustered together, irrespective of the original number of markers).
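A minimal sketch of that marker-overlap filter, assuming the genotypes have already been loaded into a dictionary keyed by site, with `None` for a missing call. The data layout, the function name and the `min_call_rate` parameter are assumptions for illustration, not part of any particular tool.

```python
def sites_called_in_all(genotypes, samples, min_call_rate=1.0):
    """Keep sites whose genotype is called in enough of the listed samples.

    genotypes: dict mapping (chrom, pos) -> dict of sample -> genotype or None.
    samples: list of sample names that must be considered.
    min_call_rate: fraction of samples that must have a call (1.0 = all).
    """
    kept = []
    for site, calls in genotypes.items():
        n_called = sum(1 for s in samples if calls.get(s) is not None)
        if n_called / len(samples) >= min_call_rate:
            kept.append(site)
    return kept


# Hypothetical toy data: one high-coverage and two low-coverage individuals.
samples = ["high_1", "low_1", "low_2"]
genotypes = {
    ("chr1", 100): {"high_1": "0/1", "low_1": "0/0", "low_2": "1/1"},
    ("chr1", 250): {"high_1": "0/1", "low_1": None, "low_2": None},
    ("chr1", 400): {"high_1": "1/1", "low_1": "0/1", "low_2": None},
}

shared = sites_called_in_all(genotypes, samples)
mostly_shared = sites_called_in_all(genotypes, samples, min_call_rate=0.6)
print(shared)
print(mostly_shared)
```

Requiring a call in every individual is the strictest reading of "markers that overlap all your sets"; relaxing `min_call_rate` keeps more sites at the cost of more missing data.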

Of course, for analyses where marker density is to be maximised, other strategies need to be considered.