bcftools isec output for multiple vcf files interpretations
1
0
Entering edit mode
23 months ago
minoo ▴ 10

I have four vcf files resulting from mutect2 function. There is one treatment and three control samples. I need to get variants that are unique to the treatment plus variants that are common in all four vcf files files(treatment and control samples). To do so, I first the following command:

bcftools isec -n~1000 -c both treatment.vcf.gz control1.vcf.gz control2.gz control3.vcf.gz -p result

The output contaigns four files from 0000 to 0003. And then

bcftools isec -n~1111 -c both treatment.vcf.gz control1.vcf.gz control2.gz control3.vcf.gz -p result2

And then merge the results of these commands to single vcf file. But as there are four vcf files from each commadn I dont know which one shpould I choose. Any idea

bcftools mutect2 variant-calling gatk • 3.8k views
ADD COMMENT
0
Entering edit mode

There should be a README.txt file in the results directory that describes the files. Did you check it?

ADD REPLY
0
Entering edit mode

Yes, it tells each file is stripped from comparing pairwise of treatment sample with each control seperately. I thought it would return a strict vcf file including the comparison between treatment and all three controls

ADD REPLY
4
Entering edit mode
23 months ago
cfos4698 ★ 1.1k

Your commands are correct, and bcftools is working as it's meant to. As GenoMax said, the info you need is all in the README.txt file. The four files created correspond, in order, with your input files. So, 0000.vcf = treatment.vcf.gz, and so on.

For command 1: 0000.vcf should have the variants found only in treatment.vcf.gz. The other three vcf files should be empty.

For command 2: all output VCF files should contain only the variants found across all input VCF files.

You can either merge the output VCF files, or merge the input files and use the sites in the ouput sites.txt as a filter.

I tested this on toy data and it works.

ADD COMMENT
0
Entering edit mode

Sorry for the naive question: could you let me know what the difference between using ~ 1000 and using ~1111 ? how this relfect on the output. So If I want to obtain common variants among 6 VCF file, so do I need to make the command like this

bcftools isec -p /dir -n=6 file1.vcf file2.vcf file3.vcf file4.vcf file5.vcf file6.vcf 

In this case do I expect a file containing common variants in all VCF files ?

May be another question: if I added -c all to that command, will this produce variants that share (or unique) in position based on only position or position on whatever chromosome ?

Thanks

ADD REPLY
1
Entering edit mode

From the manpage online:

-n, --nfiles [+-=]INT|~BITMAP
output positions present in this many (=), this many or more (+), this many or fewer (-), or the exact same (~) files

-n~1000: extract the variants in VCF file #1 but not in VCF files #2, #3, and #4 -n~1111: extract the variants found in all four VCF files

To find the variants in common in six different VCF files:

bcftools isec -p dir -n=6 file1.vcf.gz file2.vcf.gz file3.vcf.gz file4.vcf.gz file5.vcf.gz file6.vcf.gz

or

bcftools isec -p dir -n~111111 file1.vcf.gz file2.vcf.gz file3.vcf.gz file4.vcf.gz file5.vcf.gz file6.vcf.gz

or

bcftools isec -p dir -n+6 file1.vcf.gz file2.vcf.gz file3.vcf.gz file4.vcf.gz file5.vcf.gz file6.vcf.gz

As you can see, there are multiple ways to achieve what you want. Just carefully follow the instructions in the manual. You'll get many output files - read the output README.txt file.

As for the -c option, I haven't needed to use it before. The easiest way for you to check would be to make a small dummy file with multiple 'chromosomes' and see what happens.

ADD REPLY

Login before adding your answer.

Traffic: 4105 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6