Combining Data of Multiple VCFs into One
11.5 years ago
Sheila ▴ 460

I have a number of VCF files, where each VCF file contains variant data for a single patient (this is the way Illumina provides their data). Is it possible to combine the data for all patients into one VCF file? If so, how? Can I use PLINK/SEQ to do this?

Any suggestions and leads would be extremely helpful.

11.5 years ago
William ★ 5.3k

Use GATK CombineVariants; see its documentation page.

Usage examples from that page:

Merge two separate callsets

java -jar GenomeAnalysisTK.jar \
   -T CombineVariants \
   -R reference.fasta \
   --variant input1.vcf \
   --variant input2.vcf \
   -o output.vcf \
   -genotypeMergeOptions UNIQUIFY

Get the union of calls made on the same samples

java -jar GenomeAnalysisTK.jar \
   -T CombineVariants \
   -R reference.fasta \
   --variant:foo input1.vcf \
   --variant:bar input2.vcf \
   -o output.vcf \
   -genotypeMergeOptions PRIORITIZE \
   -priority foo,bar
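The two examples differ in how genotypes are handled for a sample that appears in more than one input. The Python sketch below is a toy model of the two documented `-genotypeMergeOptions` values, not GATK's implementation: UNIQUIFY keeps every genotype by renaming samples per input (the `sample.track` naming follows the documentation's description, but treat the exact format as an assumption), while PRIORITIZE takes each sample's genotype from the first input listed in `-priority`. The function name `merge_genotypes` and its dict layout are hypothetical.

```python
# Toy model of CombineVariants' -genotypeMergeOptions for ONE site.
# Not GATK code; "tracks" stands in for the tagged --variant inputs
# (e.g. --variant:foo, --variant:bar).

def merge_genotypes(tracks, option, priority=None):
    """Combine per-track genotypes for one site.

    tracks:   dict of track name -> {sample name: genotype string}
    option:   "UNIQUIFY" or "PRIORITIZE"
    priority: track names, highest priority first (PRIORITIZE only)
    """
    merged = {}
    if option == "UNIQUIFY":
        # Keep every genotype; suffix each sample with its track name so
        # that samples shared between inputs no longer collide.
        for track, genotypes in tracks.items():
            for sample, gt in genotypes.items():
                merged[f"{sample}.{track}"] = gt
    elif option == "PRIORITIZE":
        if priority is None or sorted(priority) != sorted(tracks):
            raise ValueError("priority must list every track exactly once")
        # Walk tracks from highest to lowest priority; the first track
        # that has a genotype for a sample wins.
        for track in priority:
            for sample, gt in tracks[track].items():
                merged.setdefault(sample, gt)
    else:
        raise ValueError(f"unsupported option: {option}")
    return merged
```

For example, with `tracks = {"foo": {"NA1": "0/1"}, "bar": {"NA1": "1/1", "NA2": "0/0"}}`, UNIQUIFY yields three columns (`NA1.foo`, `NA1.bar`, `NA2.bar`), whereas PRIORITIZE with `["foo", "bar"]` yields `NA1` from `foo` and `NA2` from `bar`.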
11.5 years ago

Related duplicate post

Use vcf-merge (part of VCFtools).


Thanks. Is it possible to do this with PLINK/SEQ too?


I keep getting the following error when I try to use this command (even after loading the module). Do you have any idea what the problem might be?

perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
sh: tabix: command not found
which: no tabix in ...
The command "tabix" not found, please add it to your PATH


Please read the error message. It says you don't have tabix in your PATH. Do you have tabix/bgzip installed and accessible?
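The error shows vcf-merge invoking `tabix` itself, and vcf-merge expects bgzip-compressed, tabix-indexed input, so both `bgzip` and `tabix` must be on your PATH. The Python sketch below first checks for the tools and only then runs the usual sequence: compress each VCF with `bgzip -c`, index it with `tabix -p vcf`, and merge with `vcf-merge`. The helper names, the `<input>.gz` output naming and the `combined.vcf` default are assumptions for illustration; `main` is defined but not called.

```python
import shutil
import subprocess

# Tools this pipeline shells out to (vcf-merge also calls tabix internally).
REQUIRED_TOOLS = ["bgzip", "tabix", "vcf-merge"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def plan_merge(vcfs, merged_out):
    """Build (argv, stdout_path) steps: compress, index, then merge.

    stdout_path is None when the command writes its own output file.
    Compressed files use the hypothetical naming convention <input>.gz.
    """
    steps = []
    compressed = []
    for vcf in vcfs:
        gz = vcf + ".gz"
        steps.append((["bgzip", "-c", vcf], gz))           # -c: write to stdout
        steps.append((["tabix", "-p", "vcf", gz], None))   # writes gz + ".tbi"
        compressed.append(gz)
    steps.append((["vcf-merge", *compressed], merged_out))  # VCF on stdout
    return steps


def run(steps):
    """Execute the planned steps, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)


def main(vcfs, merged_out="combined.vcf"):
    """Check PATH first so a missing tabix is reported up front."""
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        raise SystemExit("not on PATH: " + ", ".join(missing))
    run(plan_merge(vcfs, merged_out))
```

Calling `main(["patient1.vcf", "patient2.vcf"])` (hypothetical file names) either stops with a clear "not on PATH" message or produces `combined.vcf`.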

11.5 years ago

Another option is joinx:

joinx vcf-merge [OPTIONS] file1.vcf file2.vcf [file3.vcf ...]

Hi Malachi,

How will joinx behave when score annotations are absent for reference calls? I have a rather large set of VCF files with calls for all positions (gVCF?), but the FORMAT fields differ between positions with and without a variant call: GT:DP versus GT:AD:DP:GQ:PL.

I tried the latest version of bcftools, and it seems to merge or report multiple lines at random.

Will joinx use the SNP DP as the GT DP for reference calls?

Thanks!

Jack
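The mismatch described above amounts to two FORMAT layouts for the same sample column. The toy Python sketch below shows one way to pad the shorter layout with the VCF missing value `.` so that both record types share a single FORMAT; it is not a description of what joinx or bcftools actually do, and it does not copy any SNP DP into reference calls. The function names are hypothetical.

```python
def union_format(formats):
    """Union of FORMAT keys across records, keeping first-seen order.

    GT stays first as long as every input FORMAT starts with GT.
    """
    keys = []
    for fmt in formats:
        for key in fmt.split(":"):
            if key not in keys:
                keys.append(key)
    return keys


def reformat_sample(fmt, sample, target_keys):
    """Re-express one sample column under a different FORMAT layout.

    fmt:         the record's FORMAT string, e.g. "GT:DP"
    sample:      that record's sample column, e.g. "0/0:12"
    target_keys: FORMAT keys to emit, in order
    Keys the record lacks are written as "." (VCF missing value).
    """
    values = dict(zip(fmt.split(":"), sample.split(":")))
    return ":".join(values.get(key, ".") for key in target_keys)
```

With `keys = union_format(["GT:DP", "GT:AD:DP:GQ:PL"])`, a reference call `0/0:12` under `GT:DP` becomes `0/0:12:.:.:.`.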

11.5 years ago
ewre ▴ 250

Since you are working with VCF files, VCFtools would be a good choice. Try:

vcf-merge a.vcf.gz b.vcf.gz ... | bgzip -c > combined.vcf.gz
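Conceptually, merging one-patient-per-file VCFs means taking the union of sites and writing one genotype column per patient. The toy Python sketch below illustrates that idea only; it is not how vcf-merge is implemented. Assumptions: sites are matched on (CHROM, POS, REF, ALT), only GT is kept, sample names come from the dict keys rather than the VCF headers, and chromosomes sort lexically. A site absent from a patient's file is written as `./.` because a single-sample VCF that omits a site does not say whether that patient was homozygous reference or simply not covered. The function name is hypothetical.

```python
def merge_single_sample_vcfs(vcf_texts, missing_gt="./."):
    """Merge single-sample VCF bodies into multi-sample rows (toy model).

    vcf_texts: dict of sample name -> full VCF text for that sample.
    Returns (sample names, list of tab-separated data rows).
    """
    sites = {}  # (chrom, pos, ref, alt) -> {sample: genotype}
    for sample, text in vcf_texts.items():
        for line in text.splitlines():
            if not line or line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.split("\t")
            chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
            fmt_keys = fields[8].split(":")
            values = fields[9].split(":")
            gt = values[fmt_keys.index("GT")]  # keep only the genotype
            sites.setdefault((chrom, pos, ref, alt), {})[sample] = gt

    samples = list(vcf_texts)
    rows = []
    # Site keys are unique, so sorting compares only the key tuples.
    for (chrom, pos, ref, alt), gts in sorted(sites.items()):
        # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, then one GT per sample
        row = [chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT"]
        row += [gts.get(s, missing_gt) for s in samples]
        rows.append("\t".join(row))
    return samples, rows
```

Real tools also reconcile headers, multi-allelic sites and the remaining FORMAT fields, which this sketch deliberately ignores.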
11.5 years ago
zx8754 12k

You can load multiple VCFs into one PLINK/SEQ project, then output the project as one VCF.

pseq /path/to/project load-vcf

Given that a project file has been created (/path/to/project) and contains one or more VCF files, this command loads these VCF files into the variant database.


Thanks! This is helpful! I'm having trouble loading the VCFs into a project. These are my commands and output. Can you provide any help?

MY COMMANDS:

pseq testproject new-project --resources hg18
pseq /path/to/project/testproject load-vcf --vcf /path/to/TestVCFs/*.vcf

OUTPUT:

pseq error : database (/ifs/adni/pbhatt/ADNI/testproject_out/vardb) error (5) database is locked
plinkseq warning: database is locked (repeated 6 times)
plinkseq warning: preparing query database is locked
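The "error (5) database is locked" wording matches SQLite's result code 5 (SQLITE_BUSY). That suggests, though the thread does not confirm it, that the vardb file is a SQLite database that another connection is holding locked, or that it sits on a network filesystem where SQLite's file locking is unreliable. The Python sketch below only reproduces that class of error with the standard-library `sqlite3` module: one connection holds an exclusive lock while a second tries to write without waiting. The file name and function name are hypothetical, and the sketch says nothing about PLINK/SEQ internals.

```python
import os
import sqlite3
import tempfile


def demonstrate_lock():
    """Return the message SQLite raises when a write is blocked by a lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vardb.sqlite")  # hypothetical file name
        holder = sqlite3.connect(path, isolation_level=None)  # autocommit
        holder.execute("CREATE TABLE variants (id INTEGER)")
        holder.execute("BEGIN EXCLUSIVE")  # keep an exclusive lock open
        writer = sqlite3.connect(path, timeout=0)  # fail instead of waiting
        try:
            writer.execute("INSERT INTO variants VALUES (1)")
            return None  # only reached if the lock did not block the write
        except sqlite3.OperationalError as err:
            return str(err)
        finally:
            writer.close()
            holder.execute("ROLLBACK")
            holder.close()
```

Here `demonstrate_lock()` returns the same "database is locked" text that appears in the pseq output.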

The PLINK/SEQ documentation is not well maintained; it took me several hours of trial and error to load the data. Try creating a new project with the resources and scratch folders defined, and make sure you have read/write access to those folders.

pseq proj1 new-project --resources /share/data/hg19 --scratch /tmp/myfolder

Try loading one VCF file; if that works, expand from there. There is also a Google Group for PLINK/SEQ users.


Thanks! Yes, I've tried posting in the Google Group but have received more responses here. I agree about the PLINK/SEQ documentation: it's very difficult to understand when you're new to the software.

I loaded one VCF and it works fine; the problem seems to appear when I try to load more than one VCF together. I will also try creating a new project with a scratch folder. Just so I know, what is the purpose of a scratch folder? I couldn't find it on the PLINK/SEQ website.


My guess is that the scratch folder is where PLINK/SEQ creates temporary files before committing them to the database.
