Question

Complete Genomics data analysis, pipeline version and batch effect



9.1 years ago

MAPK ★ 2.1k

I am working with Complete Genomics data from pipeline version 2.5. I need to add 1000 Genomes data to my sample and make a multi-genome VCF file. Since the 1000 Genomes Project data are from pipeline version 2.0.0, should I be concerned about this? If there is a batch effect, what would you normally expect to see in CG data processed with pipeline 2.0.0 vs. 2.5?

I would also like to know whether the mkvcf tool is the right tool to merge multi-genome data into a combined VCF. Is there a proper tool to annotate that VCF?

sequencing batch-effect complete-genomics • 2.0k views

Updated 2.3 years ago by Ram 44k • written 9.1 years ago by MAPK ★ 2.1k

Ram · Answer 1 · 2015-11-12



9.1 years ago

Dhana ▴ 110

For the annotation part, you can use the cgatools join command. Since the data are also from Complete Genomics Inc., it will be easier to use cgatools for most of the work.

You can use it as follows:

cgatools join --beta \
--input <file1> <file2> \
--match <specifications> \
--overlap <specifications> \
--select <output_fields_required> \
--output-mode <arg> \
--always-dump

These are the minimum options you have to provide to run the tool.

Updated 5.0 years ago by Ram 44k • written 9.1 years ago by Dhana ▴ 110



Thanks, Dhana. However, I need to merge all the genomes, and it looks like the join tool only takes two files at a time.

9.1 years ago by MAPK ★ 2.1k



Yes, the join tool takes only two files as input. That does not limit its usefulness, though: the tool can read input from stdin and write output to stdout, so you can write a loop in bash or Python that merges all the files one join at a time.
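Here is a minimal Python sketch of such a loop. It assumes `cgatools` is on the `PATH` and that every file can be joined against the previous result using the same options. It chains two-input joins: file 1 with file 2, then that result with file 3, and so on. Rather than piping through stdin, this variant writes each intermediate result to a file, which makes individual steps easier to inspect. The helper names (`join_command`, `plan_pairwise_joins`, `run_steps`), the file names, and the `--match`/`--overlap`/`--select`/`--output-mode` values in the commented usage are hypothetical placeholders. Check them against the cgatools join documentation before relying on them.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence, Tuple

# (argv for one cgatools join call, path that its stdout is written to)
Step = Tuple[List[str], str]


def join_command(left: str, right: str, *, match: str, overlap: str,
                 select: str, output_mode: str) -> List[str]:
    """Build the argv for one two-input `cgatools join` call (same options as above)."""
    return [
        "cgatools", "join", "--beta",
        "--input", left, right,
        "--match", match,
        "--overlap", overlap,
        "--select", select,
        "--output-mode", output_mode,
        "--always-dump",
    ]


def plan_pairwise_joins(files: Sequence[str], workdir: str, **options: str) -> List[Step]:
    """Chain N files into N-1 two-input joins: step i joins the previous result with files[i]."""
    if len(files) < 2:
        raise ValueError("need at least two input files to join")
    steps: List[Step] = []
    current = files[0]
    for i, right in enumerate(files[1:], start=1):
        out_path = str(Path(workdir) / f"joined_{i}.tsv")
        steps.append((join_command(current, right, **options), out_path))
        current = out_path  # the next join uses this step's output as its left input
    return steps


def run_steps(steps: Sequence[Step]) -> str:
    """Run each join in order, redirecting cgatools' stdout into that step's output file."""
    for argv, out_path in steps:
        Path(out_path).parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w") as handle:
            subprocess.run(argv, stdout=handle, check=True)  # raises if cgatools exits non-zero
    return steps[-1][1]  # path of the fully merged file


# Hypothetical usage (file names and option values are placeholders, not verified specs):
# merged = run_steps(plan_pairwise_joins(
#     ["my_sample.tsv", "1kg_genome_1.tsv", "1kg_genome_2.tsv"], "join_work",
#     match="chromosome:chromosome", overlap="begin,end:begin,end",
#     select="A.*,B.*", output_mode="full"))
```

Planning the commands separately from running them lets you print and inspect the whole chain before launching anything. I have not checked how cgatools names the columns of a joined file. From the second step onward the left input already carries columns from earlier files, so the `--select` specification may need adjusting per step.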

9.1 years ago by Dhana ▴ 110