Hello everyone
I have a VCF file with 99 samples. I want to split it into subsets (multiple VCFs, one for each variety).
The code I use is as follows. It loops over the sample lists, where ${id} is the name of the variety. Each ${id}_sampleID.txt file has only one column, which contains the sample names:
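# Take the variety name (the part before the first "_") from each sample list
ls *_sampleID.txt | cut -d "_" -f 1 | while read -r id
do
# Keep only the samples listed for this variety and write a compressed VCF
bcftools view -S "${id}_sampleID.txt" -Oz -o "${id}.vcf.gz" snps.vcf.gz
done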
However, the output does not look right. The original file has about 40 million lines. Since I am only subsetting samples, I would expect each output file to have the same number of lines as the original and to differ only in the number of columns. Instead, the output file I get has only about 10 million lines. I don't know what the problem is.
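For reference, here is a rough sketch of how the record counts could be compared. It counts only the variant records (lines not starting with "#"), so differences in header length do not affect the result. The helper names count_records and compare_counts are made up for illustration and are not part of bcftools, and the sketch assumes gzip and grep are available:

```shell
#!/usr/bin/env bash

# Count variant records (non-header lines) in a gzip/bgzip-compressed VCF.
# Hypothetical helper, not a bcftools command.
count_records() {
    gzip -dc "$1" | grep -vc '^#'
}

# Print the record counts of the original VCF ($1) and a subset VCF ($2).
# Returns success only when both counts are equal.
compare_counts() {
    local orig_n sub_n
    orig_n=$(count_records "$1")
    sub_n=$(count_records "$2")
    echo "original=${orig_n} subset=${sub_n}"
    [ "$orig_n" -eq "$sub_n" ]
}
```

It could be called inside the loop as compare_counts snps.vcf.gz "${id}.vcf.gz" to see which output files are shorter than the original.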
Any help will be appreciated^-^