Hi,
I'm having a problem subsetting a VCF file (from here: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ ) using the VCFtools script vcf-subset.
I downloaded the file for chromosome 11 from the link above and wanted to extract 11 samples using:
vcf-subset -c HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00105,HG00106,HG00107,HG00108 -e chr11.vcf.gz > chr11_subset.vcf
I've used the exact same command on many other chromosomes before; for chromosomes 22 down to 12 it worked perfectly fine, extracting the same samples.
What I now get is an error with the message
Wrong number of fields in vcf_files/chr11.vcf.gz; expected 2513, got 1529. The offending line was:
[11 107608645 rs556912820 ATTTG A 100 PASS AC=9;AF=0.00179712;AN=5008;NS=2504;DP=20206;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.0092;VT=INDEL (then follows a list of genotypes, here all 0|0 )
Does anybody know how to deal with this? I already googled the error but I couldn't find a related problem that was actually solved.
Thanks in advance for any advice!
Is this the last line of the chr11.vcf file? If so, the file might be corrupted, or you might have run out of disk space while it was being written. You are missing an awful lot of fields: the error reports 1529 instead of the expected 2513, so about 1000 are gone.
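One way to check this without opening the huge file in an editor is sketched below. It is only a rough sketch: the function name check_vcf_tail is made up for illustration, and it assumes gzip, awk and tail are available. It tests the compressed stream with gzip -t and compares the number of tab-separated fields in the #CHROM header line with the number in the last line that can still be decompressed.

```shell
# Hypothetical helper (name is illustrative, not part of VCFtools):
# report whether the gzip stream is intact and compare the column count
# of the #CHROM header line with that of the last readable line.
check_vcf_tail() {
    vcf="$1"

    # gzip -t exits non-zero if the compressed stream is truncated or corrupt
    if gzip -t "$vcf" 2>/dev/null; then
        echo "gzip stream: OK"
    else
        echo "gzip stream: truncated or corrupt"
    fi

    # Number of tab-separated columns in the header line
    expected=$( { gzip -dc "$vcf" 2>/dev/null || true; } |
        awk -F'\t' '/^#CHROM/ {print NF; exit}' )

    # Number of columns in the last line that could be decompressed
    # (this reads the whole file, so it takes a while on large inputs)
    last=$( { gzip -dc "$vcf" 2>/dev/null || true; } |
        tail -n 1 | awk -F'\t' '{print NF}' )

    echo "header fields: $expected, last line fields: $last"
}
```

Called as check_vcf_tail chr11.vcf.gz, a truncated download would be expected to show a corrupt stream and a last line with fewer fields than the header.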
I'm not sure whether it really is the last line (it might be); I couldn't open the unsplit file yet due to its enormous size. Still, I'm confused about where I would have run out of space: it worked fine before, and I'm doing nothing different. Yes, the error says about 1000 fields are missing; I just don't know why or where, or how to cope with this.
If that happened, the only thing you can do is re-run the script. A quick check is to use ls -lh to see whether the size of the file is correct.
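Since ls -lh rounds the size, an exact byte comparison can be a bit more reliable. Here is a minimal sketch, assuming you read the expected byte count off the server's directory listing by hand; the function name check_size is hypothetical.

```shell
# Hypothetical helper: compare a local file's exact byte count with the
# size taken from the server's directory listing (passed in by hand).
check_size() {
    file="$1"
    expected_bytes="$2"

    # wc -c gives the byte count; tr strips padding some wc versions add
    actual_bytes=$(wc -c < "$file" | tr -d ' ')

    if [ "$actual_bytes" -eq "$expected_bytes" ]; then
        echo "size OK ($actual_bytes bytes)"
    else
        echo "size mismatch: have $actual_bytes bytes, expected $expected_bytes"
        return 1
    fi
}
```

If the sizes differ, the file has to be downloaded again. Something like wget -c with the file's URL may be able to resume a partial download instead of starting over, provided the server supports it.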
Thanks for that tip with the size check - the size is indeed smaller than it should be according to the given download link, so I guess it's just a download error! Thank you again!