Question

Combining own and public samples from different sequencing strategies to perform GWAS

2.8 years ago

antmantras ▴ 80

Hi all.

I was asked to perform a GWAS on 50 whole-genome sequencing (WGS) samples from several plant varieties that we keep in the lab. However, I think a sample size of 50 is probably too small for a GWAS. To increase the sample size, I am thinking of merging our data at the VCF level with public data from a paper (300 samples genotyped by genotyping-by-sequencing, GBS), applying the usual filtering steps, and then running the GWAS.

The number of SNPs detected will differ substantially between the GBS and WGS samples. However, I think enough data will remain after keeping only the SNPs shared by both sets and removing samples with missing genotypes. In this paper (link), the authors argue that GBS and WGS data can be combined, as they did in their study, provided that good downstream filtering is applied. Their data showed that samples grouped by their relatedness to each other rather than by the sequencing platform used. However, I do not have matched data between my samples and those in the paper, as the authors of the linked paper had, to perform this comparison. Would it be valid to combine the publicly available GBS data with my WGS samples? Additionally, should I run any further analysis to address batch effects between the two datasets?
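To make the matching and filtering step concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under stated assumptions: genotypes are already parsed from the VCFs into plain dictionaries keyed by (chrom, pos, ref, alt), calls are diploid alternate-allele dosages (0/1/2, `None` for missing), and the function names and the 0.9 call-rate threshold are hypothetical choices, not taken from the paper. The last function is a crude per-site batch check: a large allele-frequency gap between WGS and GBS may flag a platform artifact, but it can equally reflect real differences between varieties.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout (hypothetical): site -> per-sample alt-allele dosage.
Site = Tuple[str, int, str, str]              # (chrom, pos, ref, alt)
Genotypes = Dict[Site, List[Optional[int]]]   # 0/1/2, None = missing call


def call_rate(calls: List[Optional[int]]) -> float:
    """Fraction of samples with a non-missing genotype at one site."""
    return sum(g is not None for g in calls) / len(calls)


def alt_freq(calls: List[Optional[int]]) -> Optional[float]:
    """Alternate-allele frequency among called diploid genotypes."""
    called = [g for g in calls if g is not None]
    return sum(called) / (2 * len(called)) if called else None


def merge_shared_sites(wgs: Genotypes, gbs: Genotypes,
                       min_call_rate: float = 0.9) -> Genotypes:
    """Keep sites present in both sets (same position and alleles) that
    pass the call-rate threshold in each batch separately."""
    merged: Genotypes = {}
    for site in sorted(wgs.keys() & gbs.keys()):
        if (call_rate(wgs[site]) >= min_call_rate
                and call_rate(gbs[site]) >= min_call_rate):
            merged[site] = wgs[site] + gbs[site]  # WGS samples first
    return merged


def batch_freq_gap(wgs: Genotypes, gbs: Genotypes,
                   sites: List[Site]) -> Dict[Site, float]:
    """Absolute WGS-vs-GBS allele-frequency difference per site."""
    gaps: Dict[Site, float] = {}
    for site in sites:
        f_wgs, f_gbs = alt_freq(wgs[site]), alt_freq(gbs[site])
        if f_wgs is not None and f_gbs is not None:
            gaps[site] = abs(f_wgs - f_gbs)
    return gaps


# Toy data, invented purely for illustration.
wgs_toy: Genotypes = {
    ("chr1", 100, "A", "G"): [0, 1, 2, 0],
    ("chr1", 200, "C", "T"): [0, 0, None, None],  # fails call rate in WGS
    ("chr2", 50, "G", "A"): [1, 1, 0, 2],         # absent from GBS
}
gbs_toy: Genotypes = {
    ("chr1", 100, "A", "G"): [0, 0, 1, 2, 1],
    ("chr1", 200, "C", "T"): [0, 1, 1, 0, 0],
    ("chr3", 10, "T", "C"): [0, 0, 0, 0, 0],      # absent from WGS
}

merged_toy = merge_shared_sites(wgs_toy, gbs_toy)
gaps_toy = batch_freq_gap(wgs_toy, gbs_toy, list(merged_toy))
```

Filtering on call rate within each batch, rather than on the merged set, is a deliberate choice in this sketch: with 300 GBS samples against 50 WGS samples, a site that is poorly covered in one platform could otherwise pass on the strength of the other.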

Thanks in advance.

gwas gbs wgs • 599 views
