Dear Friends,
I have a list of 8000 samples in a file "samples.txt":
samples.txt:
TCGA..barcode..
TCGA..barcode..
.
.
I want to use bcftools to keep only these samples in a vcf.gz file. The vcf.gz file has 10000 samples, so I am trying to keep the 8000 samples listed in "samples.txt" and remove the remaining 2000. I ran:
bcftools -S samples.txt vcf.gz -o filtered-vcf.vcf
it gives me error:
[E::main] unrecognized command -S
Could you please suggest what the issue might be, and how I can do this? Thanks much.
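The error comes from calling `bcftools` without a subcommand: `-S` (`--samples-file`) is an option of `bcftools view`, not of `bcftools` itself. Below is a minimal sketch of the usual fix, which is not necessarily the exact command suggested in this thread. It is wrapped in a small helper; the name `subset_vcf` and all file names are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the samples listed (one per line) in a file.
# Usage: subset_vcf SAMPLES.txt INPUT.vcf.gz OUTPUT.vcf.gz
subset_vcf() {
  if [ "$#" -ne 3 ]; then
    echo "usage: subset_vcf SAMPLES.txt INPUT.vcf.gz OUTPUT.vcf.gz" >&2
    return 2
  fi
  local samples="$1" input="$2" output="$3"
  # 'view' is the subcommand; -S reads sample names from a file.
  # -Oz writes BGZF-compressed VCF, so the .vcf.gz name matches the content.
  bcftools view -S "$samples" -Oz -o "$output" "$input" || return 1
  # Build a tabix (.tbi) index for the new file.
  bcftools index -t "$output"
}
```

For example: `subset_vcf samples.txt input.vcf.gz filtered.vcf.gz`. For a handful of samples, lowercase `-s` accepts a comma-separated list directly on the command line instead of a file.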
Thanks much, Pierre! I ran this; however, it showed one error:
Error: subset called for sample that does not exist in header "TCGA..."
If I am right, this means that the "TCGA..." sample in "samples.txt" is not present in the vcf.gz file? So I used "--force-samples" to ignore this error, and it runs now.
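Before relying on `--force-samples`, it can help to list exactly which requested samples are absent from the VCF header (for example, because of a typo or a different barcode format). A minimal sketch, assuming both lists contain one sample name per line; `missing_samples` is a hypothetical helper, while `bcftools query -l` is the standard way to print the sample names in a VCF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print names in WANTED that are absent from PRESENT.
# Both arguments are files with one sample name per line.
missing_samples() {
  local wanted="$1" present="$2"
  # comm needs sorted input; -23 keeps only lines unique to the first file.
  comm -23 <(sort -u "$wanted") <(sort -u "$present")
}

# Example (file names are placeholders):
#   bcftools query -l input.vcf.gz > vcf_samples.txt
#   missing_samples samples.txt vcf_samples.txt
```

If the reported names are genuinely not in the VCF, re-running with `--force-samples` simply skips them; if they differ only in formatting, fixing the list keeps those samples in the output.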
I used a similar command, bcftools view -S samplelist.txt input_file.vcf.gz -o newfiltered.vcf.gz, to subset samples from a compressed VCF file, but got the message [w::bcf_sr_add_reader] No BGZF EOF markers; file 'input_file.vcf.gz' may be truncated. I had to abort the execution because I don't understand what this message means. Could someone please help?
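Your input file is probably truncated or corrupted: input_file.vcf.gz is missing the BGZF end-of-file (EOF) marker, the empty 28-byte block that bgzip writes at the end of a complete file. This usually means the file was not fully written or downloaded. The message is a warning rather than a fatal error, so bcftools keeps going, but the output will be incomplete if the file really is truncated.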
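To confirm the diagnosis, you can check whether the file ends with the standard BGZF EOF block (as defined in the SAM/BGZF specification). A minimal sketch using only coreutils; `has_bgzf_eof` is a hypothetical helper name, not part of bcftools or htslib.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) if FILE ends with the 28-byte
# empty BGZF block that bgzip writes as its end-of-file marker.
has_bgzf_eof() {
  local file="$1"
  # gzip header with the 'BC' extra subfield (BSIZE = 27)...
  local expected="1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00"
  expected+=" 03 00"          # ...an empty deflate block
  expected+=" 00 00 00 00"    # CRC32 of no data
  expected+=" 00 00 00 00"    # ISIZE (uncompressed size) = 0
  local actual
  # Dump the last 28 bytes as hex, dropping spaces and newlines.
  actual=$(tail -c 28 "$file" | od -An -v -tx1 | tr -d ' \n')
  [ "$actual" = "${expected// /}" ]
}
```

For example: `has_bgzf_eof input_file.vcf.gz || echo "BGZF EOF marker missing"`. If the marker is missing, `gzip -t input_file.vcf.gz` should also report whether the compressed stream ends prematurely; in that case, re-downloading or regenerating the file is usually the safest fix.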