Odd output format for bcftools isec
1
0
Entering edit mode
5 months ago

I have 19 samples, and I wish to get the common loci shared by 8 of them. For this I use the code below:

    # List of VCF files 

vcf_files=(
        "Beg1_1.g.vcf" "Beg1_2.g.vcf" "Beg1_4.g.vcf" "Beg1_5.g.vcf"
        "Beg2_2.g.vcf" "Beg2_5.g.vcf" "Beg3_3.g.vcf" "Beg3_5.g.vcf"
        "Beg3_7.g.vcf" "Beg5_1.g.vcf" "Beg5_7.g.vcf" "Beg6_4.g.vcf"
        "Beg6_6.g.vcf" "Beg6_7.g.vcf" "Beg7_3.g.vcf" "Beg8_7.g.vcf"
        "Beg9_2.g.vcf" "Beg9_7.g.vcf" "hill.g.vcf" )

    # Compress VCF files 

for file in "${vcf_files[@]}"; do
        bgzip -c "$file" > "${file}.gz" done

    # Index the compressed VCF files 

for file in "${vcf_files[@]}"; do
        tabix -p vcf "${file}.gz" done

    # Intersect VCF files 

bcftools isec -n=8 "${vcf_files[@]/%/.gz}" > intersected.vcf

But looking at the result I realized that the intersected.vcf is odd. There is no vcf header, and the file is not really in vcf format. Would anyone know what I am doing wrong?

1185_polypolish 59005   C   <NON_REF>   1110010011011000000
1185_polypolish 59011   T   <NON_REF>   1011001011011000000
1185_polypolish 59020   A   <NON_REF>   1011001011011000000
1185_polypolish 59025   C   T,<NON_REF> 1010011011011000000
1185_polypolish 59028   G   A,<NON_REF> 1010011011011000000
1185_polypolish 59031   T   C,<NON_REF> 1010011011011000000
1185_polypolish 59061   T   C,<NON_REF> 1011001011011000000
1185_polypolish 59062   A   <NON_REF>   1011001011011000000
1104_polypolish 108092  T   G,<NON_REF> 0001000110011110010
1104_polypolish 108095  T   G,<NON_REF> 0001000110011110010
bcftools vcf intersection • 380 views
ADD COMMENT
1
Entering edit mode

Check the README.txt file that should be present in the output folder. You also seem to be using gvcf's.

ADD REPLY
2
Entering edit mode
5 months ago
 T,<NON_REF> 

it looks like your trying to handle G.VCF files https://gatk.broadinstitute.org/hc/en-us/articles/360035531812-GVCF-Genomic-Variant-Call-Format . Bcftools isec is not the right tool to do this. You should use gatk combinegvcf and gatk genotypegvcfs

ADD COMMENT
0
Entering edit mode

Thank you, I was completely unaware of the difference between VCF and GVCF file.

ADD REPLY

Login before adding your answer.

Traffic: 2543 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6