Complete Genomics data analysis, pipeline version and batch effect
9.1 years ago
MAPK ★ 2.1k

I am working with Complete Genomics data from pipeline version 2.5. I need to add 1000 Genomes data to my sample and make a multi-genome VCF file. Since the 1000 Genomes Project data are from pipeline version 2.0.0, I was wondering whether this is something I should be concerned about. If there is a batch effect, what would you normally expect between CG data from pipeline versions 2.0.0 and 2.5?

Additionally, I would like to know whether mkvcf is the right tool to merge multi-genome data into a combined VCF. Is there a proper tool to annotate that VCF?

sequencing batch-effect complete-genomics • 2.0k views
9.1 years ago
Dhana ▴ 110

For the annotation part, you can use the cgatools join command. Since the data also come from Complete Genomics Inc., it will be easier to use cgatools for the most part.

You can use it as follows:

cgatools join --beta \
    --input <file1> <file2> \
    --match <specifications> \
    --overlap <specifications> \
    --select <output_fields_required> \
    --output-mode <arg> \
    --always-dump

These are the minimum specifications you have to provide to run the tool.


Thanks, Dhana. However, I need to merge all the genomes, and it looks like the join tool only takes two files at a time.


Yes, the join tool takes only two files as input. That does not limit its use, though, since the tool can read input from stdin and write output to stdout. You can write a loop in bash/python to merge all the files.
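As a rough illustration of such a loop, here is a minimal Python sketch that chains pairwise joins: the result of each step becomes the first input of the next. It uses intermediate files rather than stdin/stdout to keep things simple. The helper names (build_join_chain, run_join_chain), the joined_N.tsv file names, and the use of a single match specification for every step are assumptions made for illustration; the column names in an intermediate result may differ from the originals, so the --match value may need adjusting per step. Only the --beta, --input, --match and --output options are used.

```python
import subprocess
from pathlib import Path


def build_join_chain(inputs, match_spec, work_dir):
    """Build one `cgatools join` command per pairwise step.

    Hypothetical helper: the output of step N (joined_N.tsv) is used as
    the first input of step N+1, so N files need N-1 joins.
    """
    if len(inputs) < 2:
        raise ValueError("need at least two input files")
    work_dir = Path(work_dir)
    commands = []
    current = str(inputs[0])
    for step, next_file in enumerate(inputs[1:], start=1):
        out = str(work_dir / f"joined_{step}.tsv")  # assumed naming scheme
        commands.append([
            "cgatools", "join", "--beta",
            "--input", current, str(next_file),
            "--match", match_spec,  # placeholder; may differ per step
            "--output", out,
        ])
        current = out  # feed this result into the next join
    return commands


def run_join_chain(commands):
    """Run the joins in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Printing the list returned by build_join_chain before passing it to run_join_chain is a cheap way to check the commands against your own files and match specification.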
