combining sequencing variants
10.2 years ago
hmk123 • 0

I have targeted sequencing data from two different projects. One project was sequenced on SOLiD and the other on Illumina. I have called variants independently for each project using GATK. Is it possible to combine the called variants (VCF files) to allow for an association analysis with a larger sample size? Can you recommend a tool for combining the files?

Thanks!

10.2 years ago
Emily 24k

VCFtools?

10.2 years ago
Katie D'Aco ★ 1.1k

You should first normalize the variants if they aren't already (left-align indels and split multiallelic sites, so the same variant is written the same way in both files). Then use bedtools intersect to find the regions where the targets of the two projects overlap, and VCFtools' vcf-merge to combine the VCFs from the two projects into a single VCF. Before running any association tests, do LD pruning and PCA to check for batch effects: since the two projects were sequenced with different technologies, combining them might not be valid.
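The intersect-and-merge steps can be sketched in Python. This is a minimal in-memory sketch, assuming the variants have already been normalized and parsed into plain tuples; the helper names (`intersect_intervals`, `in_targets`, `merge_calls`) and the genotype-list layout are hypothetical, and on real files you would run bedtools intersect and vcf-merge rather than this code.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]      # (chrom, start, end): BED-style, 0-based, half-open
Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt): VCF-style, 1-based pos
Calls = Dict[Variant, List[str]]     # variant -> one genotype string per sample


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping pieces of two target lists (the job bedtools intersect does)."""
    shared = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:  # all-pairs comparison: fine for a sketch
            start, end = max(start_a, start_b), min(end_a, end_b)
            if chrom_a == chrom_b and start < end:
                shared.append((chrom_a, start, end))
    return sorted(shared)


def in_targets(variant: Variant, targets: List[Interval]) -> bool:
    """True if the 1-based VCF position lies inside some 0-based half-open interval."""
    chrom, pos = variant[0], variant[1]
    return any(c == chrom and start < pos <= end for c, start, end in targets)


def merge_calls(calls_a: Calls, n_a: int, calls_b: Calls, n_b: int,
                targets: List[Interval]) -> Calls:
    """Combine two call sets on the shared targets, samples of project A first.

    A variant absent from one project is filled with './.' (missing), not
    '0/0': a variant-only VCF cannot tell "not covered" from "reference".
    """
    merged: Calls = {}
    for variant in sorted(set(calls_a) | set(calls_b)):
        if in_targets(variant, targets):
            gts_a = calls_a.get(variant, ["./."] * n_a)
            gts_b = calls_b.get(variant, ["./."] * n_b)
            merged[variant] = gts_a + gts_b
    return merged


if __name__ == "__main__":
    solid_targets = [("chr1", 100, 200), ("chr2", 0, 50)]
    illumina_targets = [("chr1", 150, 300)]
    shared = intersect_intervals(solid_targets, illumina_targets)
    print(shared)  # [('chr1', 150, 200)]

    solid = {("chr1", 160, "A", "G"): ["0/1", "0/0"], ("chr2", 10, "C", "T"): ["1/1", "0/1"]}
    illumina = {("chr1", 160, "A", "G"): ["0/0"], ("chr1", 250, "G", "A"): ["0/1"]}
    print(merge_calls(solid, 2, illumina, 1, shared))
    # {('chr1', 160, 'A', 'G'): ['0/1', '0/0', '0/0']}
```

Filling absent calls with missing rather than homozygous reference is a deliberate, conservative assumption; treating them as `0/0` would silently inflate apparent allele-frequency differences between the two platforms.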

Could you explain in more detail what you mean by the LD-pruning and PCA checks?

I've used EIGENSTRAT for PCA, but other tools can do this as well. If you plot your subjects along the first two principal components (PCs) and they cluster by sequencing technology, you should not use the combined data set for association tests. Before running PCA, do LD pruning (this can also be done with EIGENSTRAT), so that blocks of strongly correlated SNPs don't dominate the PCs.
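Both steps can be illustrated with a minimal numpy sketch on a samples × SNPs dosage matrix (0/1/2 copies of the alternate allele). The greedy windowed pruning below is a simplified stand-in, not EIGENSTRAT's exact algorithm; the function names `ld_prune` and `top_pcs`, the window size, the r² threshold and the simulated "platform" shift are all hypothetical illustration choices, and numpy is assumed to be installed.

```python
import numpy as np


def ld_prune(genotypes: np.ndarray, window: int = 50, r2_max: float = 0.2) -> np.ndarray:
    """Greedy windowed LD pruning on a samples x SNPs dosage matrix (0/1/2).

    Columns are assumed to be sorted by position on one chromosome. A SNP is
    kept only if its r^2 with every already-kept SNP at most `window` columns
    earlier is <= r2_max. Monomorphic SNPs are dropped. Returns kept indices.
    """
    kept = []
    for j in range(genotypes.shape[1]):
        col = genotypes[:, j]
        if col.std() == 0:
            continue  # no variation, so no information for PCA
        independent = True
        for k in reversed(kept):
            if j - k > window:
                break  # earlier kept SNPs are even further away
            r = np.corrcoef(col, genotypes[:, k])[0, 1]
            if r * r > r2_max:
                independent = False
                break
        if independent:
            kept.append(j)
    return np.array(kept, dtype=int)


def top_pcs(genotypes: np.ndarray, n_pcs: int = 2) -> np.ndarray:
    """Sample scores on the top principal components of the standardized matrix.

    Expects pruned input in which every column has nonzero variance.
    """
    x = genotypes - genotypes.mean(axis=0)
    x = x / x.std(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


if __name__ == "__main__":
    # Simulated data: 40 "SOLiD" and 40 "Illumina" samples at 200 SNPs. The
    # alternate-allele frequency of the first 150 SNPs is shifted by 0.3 in the
    # second batch, mimicking a platform artifact.
    rng = np.random.default_rng(0)
    freq_a = rng.uniform(0.2, 0.5, 200)
    freq_b = freq_a.copy()
    freq_b[:150] += 0.3
    geno = np.vstack([rng.binomial(2, freq_a, size=(40, 200)),
                      rng.binomial(2, freq_b, size=(40, 200))]).astype(float)

    keep = ld_prune(geno)
    pcs = top_pcs(geno[:, keep])
    print("kept SNPs:", keep.size)
    print("mean PC1, batch 1:", pcs[:40, 0].mean().round(2))
    print("mean PC1, batch 2:", pcs[40:, 0].mean().round(2))
```

In this simulation the shifted SNPs should pull the two batches apart along PC1; with real data, that kind of clustering by platform on a PC plot is the warning sign described in the reply. Note that real cohorts can also separate because of genuine ancestry differences, so the plot needs to be colored by both platform and any known population labels.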
