I am working with Complete Genomics data from pipeline version 2.5. I need to add 1000 Genomes data to my sample and make a multi-genome VCF file. Since the 1000 Genomes Project data were processed with pipeline version 2.0.0, is this something I should be concerned about? If there is a batch effect, what would you normally expect to differ in CG data between pipeline versions 2.0.0 and 2.5?
Additionally, I would like to know whether mkvcf is the right tool to merge multi-genome data into a combined VCF. Is there a proper tool to annotate that VCF?
Thanks, Dhana. However, I need to merge all the genomes, and it looks like the join tool only takes two files at a time.
Yes, the join tool takes only two files as input. But that does not limit its use, since the tool can read input from stdin and write output to stdout. You can write a loop in bash/Python that merges all the files, feeding each intermediate result back in as one input of the next join.
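To illustrate the looping idea, here is a minimal Python sketch of folding a two-input join over any number of tables. It does not call the join tool itself: `join_two`, `join_all`, and the toy genotype tables are hypothetical stand-ins, and in practice the body of `join_two` would be replaced by one invocation of the real two-file join.

```python
from functools import reduce


def join_two(left, right):
    # Hypothetical stand-in for a single two-file join step: keep the keys
    # present in both tables and concatenate their value columns.
    return {key: left[key] + right[key] for key in left if key in right}


def join_all(tables):
    # Fold the two-input join across the list: ((t1 + t2) + t3) + ...
    if not tables:
        raise ValueError("need at least one table")
    return reduce(join_two, tables)


# Toy per-genome tables keyed by position (made-up values for illustration).
genomes = [
    {"chr1:100": ["A/G"], "chr1:200": ["C/C"]},
    {"chr1:100": ["A/A"], "chr1:200": ["C/T"]},
    {"chr1:100": ["G/G"], "chr1:200": ["T/T"]},
]

merged = join_all(genomes)
for position, calls in merged.items():
    print(position, *calls, sep="\t")
```

Each pass joins the running result with the next table, so a tool that only accepts two inputs can still combine an arbitrary number of genomes.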