Question

Unexpected output from bcftools view -S

2.3 years ago

yoser4 ▴ 10

Hello everyone

I have a VCF file with 99 samples. I want to split it into subsets (multiple VCFs, one per variety).

The code I use is as follows (I use a loop; ${id} is the name of the variety. Each sampleID.txt file contains a single column with the sample names):

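# Loop over the per-variety sample lists; the variety name is the part before "_sampleID.txt".
for list in *_sampleID.txt
do
        id="${list%_sampleID.txt}"
        # Keep only the samples named in the list and write a bgzip-compressed VCF.
        bcftools view -S "${list}" -Oz -o "${id}.vcf.gz" snps.vcf.gz
done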

However, the output is not what I expected. The original file has about 40 million lines. I expected each output file to have the same number of lines as the original and to differ only in the number of columns, but the output files I get have only about 10 million lines. I don't know what the problem is.
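Before comparing the two numbers, it may help to make sure both were counted the same way: running `wc -l` directly on a `.vcf.gz` file counts newlines in the compressed bytes, not VCF records. A minimal sketch of a like-for-like count is below. The helper name `count_records` is hypothetical, and it assumes a `gzip` that passes uncompressed input through unchanged with `-cdf` (GNU gzip does).

```shell
# Hypothetical helper: count data records (non-header lines) in a VCF.
# Works on plain, gzip- or bgzip-compressed files, assuming `gzip -cdf`
# copies uncompressed input through unchanged.
count_records() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    gzip -cdf "$1" | grep -vc '^#' || true
}

# Usage with the file names from the question, only if they exist here.
if [ -f snps.vcf.gz ]; then
    echo "input records: $(count_records snps.vcf.gz)"
    for out in *.vcf.gz; do
        [ "$out" = snps.vcf.gz ] && continue
        echo "$out records: $(count_records "$out")"
    done
fi
```

If the record counts really differ by this much, the next step would be to find out which records are missing from the output.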

Any help will be appreciated^-^

bcftools • 719 views

updated 2.1 years ago by Ram 44k • written 2.3 years ago by yoser4 ▴ 10

score 1 · Answer 1 · 2022-09-21

Isn't that because there are positions where your samples do not carry a variant, so those positions are not output?

The converse is also true. If you have two single-row VCF files with different positions, after merging you would have two rows, each of which would show no variant in one of the sample columns.

I also want to note that output you don't fully understand is not necessarily an "error". Thinking of it as an error works against understanding it later.
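One way to test this idea, without relying on any particular bcftools default, would be to count the sites in the original file at which a given sample carries a non-reference allele and compare that with the size of the subset file. The sketch below is only an illustration: the function name `count_variant_sites` is made up, it assumes tab-separated VCF with `GT` as the first FORMAT field (as the VCF specification requires when `GT` is present), and it assumes `gzip -cdf` passes uncompressed input through unchanged.

```shell
# Hypothetical helper: count sites where one sample has a non-reference genotype.
# $1 = VCF file (plain, gzip or bgzip), $2 = sample name from the #CHROM line.
count_variant_sites() {
    gzip -cdf "$1" | awk -F'\t' -v s="$2" '
        /^##/     { next }                      # skip meta-information lines
        /^#CHROM/ {                             # locate the sample column
            for (i = 10; i <= NF; i++) if ($i == s) col = i
            next
        }
        col {
            split($col, f, ":")                 # f[1] is GT when GT is present
            if (f[1] ~ /[1-9]/) n++             # any non-zero allele index
        }
        END { print n + 0 }'
}

# Usage with the file name from the question; "SAMPLE1" is a placeholder.
if [ -f snps.vcf.gz ]; then
    count_variant_sites snps.vcf.gz SAMPLE1
fi
```

Genotypes such as `0/0` and `./.` are not counted. For a multi-sample subset, the same check would have to consider every sample in the list.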