Merging population VCF files without gVCFs
6.0 years ago
Myo Naung

Hi Everyone,

I have two separate raw VCF datasets processed with GATK version 3.5 (one from a population of ~2,600 samples and one from a population of ~160). Since the upstream data cleaning and processing were done elsewhere, I do not have access to the gVCF files. To combine these two populations without joint genotyping from gVCFs, is it possible to simply merge the VCF files using existing tools? Do you think this will introduce batch effects, and if so, how can I minimise them? I could rerun everything from scratch (from the BAM files), but since these are whole-genome data it would take a lot of computational resources. Feel free to contact me if my question is unclear.

Sincerely,

Tags: Variant Calling, next-gen, genome


I have edited your title to make it more specific about what you are asking, because "Genomics and VCF files" is meaningless.

6.0 years ago

is it possible to just merge the vcf files using existing tools

Definitely. Googling "merge vcf files" will give you plenty of ideas, such as bcftools merge and vcftools.
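To see why a plain merge is not enough on its own, here is a toy Python sketch of what merging two cohorts' VCFs does at the genotype level. It assumes simplified, uncompressed, biallelic records keyed by (CHROM, POS, REF, ALT); the function names are hypothetical, and for real data you would use bcftools merge, which also handles headers, multiallelic sites and much more.

```python
# Toy illustration only: merge two cohorts' genotypes by site.
# Assumes simplified, uncompressed, biallelic VCF lines (no real-world edge cases).

def parse_vcf(lines):
    """Return (sample_names, {(chrom, pos, ref, alt): [GT, ...]})."""
    samples, sites = [], {}
    for line in lines:
        if line.startswith("##"):          # skip meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):      # header line carries sample names
            samples = fields[9:]
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_idx = fields[8].split(":").index("GT")   # locate GT in FORMAT
        sites[(chrom, int(pos), ref, alt)] = [s.split(":")[gt_idx] for s in fields[9:]]
    return samples, sites


def merge(vcf_a, vcf_b):
    samples_a, sites_a = parse_vcf(vcf_a)
    samples_b, sites_b = parse_vcf(vcf_b)
    missing_a = ["./."] * len(samples_a)
    missing_b = ["./."] * len(samples_b)
    merged = {}
    for key in sorted(set(sites_a) | set(sites_b)):
        # A site absent from one cohort's VCF gets no-calls for that cohort,
        # even if those samples are really homozygous reference (0/0).
        merged[key] = sites_a.get(key, missing_a) + sites_b.get(key, missing_b)
    return samples_a + samples_b, merged


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
pop_a = [header + "\tA1\tA2",
         "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1"]
pop_b = [header + "\tB1",
         "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
         "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:12"]
samples, merged = merge(pop_a, pop_b)
print(samples)                           # ['A1', 'A2', 'B1']
print(merged[("chr1", 200, "C", "T")])   # ['./.', './.', '0/1']
```

The site at chr1:200 was only called in population B, so both population A samples end up as ./. after the merge, which is exactly the problem described in the next paragraph.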

Do you think it will introduce batch effects, or how can I minimise them?

After merging, the samples from population A will have no-calls (./.) at positions where a variant was called only in population B, even though many of those genotypes should actually be homozygous reference (0/0). So your merged VCF will not be perfect. With GATK or samtools/bcftools there should be a way to "force" genotyping at these missing positions to "fill in the blanks". That way you don't need to re-call the whole genome for all samples, only these sites...
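A minimal sketch of the first step of that "fill in the blanks" approach: work out, for each cohort, which sites were called only in the other cohort, since those are the positions that need forced genotyping from that cohort's BAMs. The helper names are hypothetical. The CHROM, POS, REF,ALT tab-separated layout is meant as input for a "genotype given alleles" step, for example bcftools' constrained calling (bcftools call -C alleles with a targets file) or GATK 3's GENOTYPE_GIVEN_ALLELES mode; check the respective documentation for the exact options before relying on it.

```python
# Hypothetical helpers: list the sites each cohort must be re-genotyped at,
# so merge-induced no-calls can be replaced with real genotypes (often 0/0).

def sites_to_force(sites_a, sites_b):
    """sites_a / sites_b: iterables of (chrom, pos, ref, alt) tuples."""
    set_a, set_b = set(sites_a), set(sites_b)
    return {
        "A": sorted(set_b - set_a),  # called only in B: genotype cohort A here
        "B": sorted(set_a - set_b),  # called only in A: genotype cohort B here
    }


def to_targets_lines(sites):
    # One tab-separated line per site: CHROM, POS, REF,ALT.
    # Note: the toy sort above is lexicographic on chromosome names,
    # not reference order; real target files should follow the reference.
    return [f"{chrom}\t{pos}\t{ref},{alt}" for chrom, pos, ref, alt in sites]


called_a = [("chr1", 100, "A", "G")]
called_b = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
todo = sites_to_force(called_a, called_b)
print(to_targets_lines(todo["A"]))  # ['chr1\t200\tC,T']
print(todo["B"])                    # []
```

Because only the union of variant sites is re-genotyped, this stays much cheaper than re-running whole-genome calling from the BAMs, which is the point of the suggestion above.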

Thanks for the suggestions.
