Hi, I have a 4 GB *.vcf file and would like to keep only the chromosome locations that I need and write them to a new file.
For example, these:
17 7571720 7590868c
3 10141635 10153670
I saved them to a *.bed file and tried this command:
vcftools --gzvcf /home/user/Documents/*.vcf --bed /home/user/Documents/list.bed --out /home/sentinel/Documents/test
It returns: No data left for analysis!
VCFtools - 0.1.17
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf /home/user/Documents/*.vcf
--out /home/user/Documents/test
--recode
--bed list.bed
Using zlib version: 1.2.11
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in FORMAT entry: ID=GQX,Number=1,Type=Integer,Description="Empirically calibrated genotype quality score for variant sites, otherwise minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
Warning: Expected at least 2 parts in FORMAT entry: ID=GQX,Number=1,Type=Integer,Description="Empirically calibrated genotype quality score for variant sites, otherwise minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
Warning: Expected at least 2 parts in FORMAT entry: ID=FT,Number=1,Type=String,Description="Sample filter, 'PASS' indicates that all filters have passed for this sample">
Warning: Expected at least 2 parts in FORMAT entry: ID=DPI,Number=1,Type=Integer,Description="Read depth associated with indel, taken from the site preceding the indel">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
After filtering, kept 1 out of 1 Individuals
Outputting VCF file...
Read 2 BED file entries.
After filtering, kept 0 out of a possible 41203829 Sites
No data left for analysis!
Run Time = 40.00 seconds
Any ideas?
Use bedtools intersect (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html). But make sure that your VCF is well formatted. @dev.info.2021
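A common reason for "kept 0 out of ... Sites" is that the chromosome names in the BED file do not match the ones in the VCF (for example `17` versus `chr17`), or that the BED file is separated by spaces instead of tabs. Below is a minimal sketch for checking that before running bedtools intersect. The function names `bed_chroms_missing_from_vcf` and `filter_vcf_by_bed` are made up for illustration, the file names in the usage comments are placeholders, and it assumes the VCF is uncompressed plain text, so adapt it to your data.

```shell
# Hypothetical helper: print every chromosome name from the BED file that
# never appears in the first column of the VCF body.
# Assumes an uncompressed VCF and a tab-separated BED file. A BED line that
# uses spaces instead of tabs is printed whole, which exposes that problem too.
bed_chroms_missing_from_vcf() {
    vcf="$1"
    bed="$2"
    awk -F'\t' '
        # First file (the VCF): remember the chromosome of each record line.
        NR == FNR { if ($0 !~ /^#/) seen[$1] = 1; next }
        # Second file (the BED): report each unknown name once.
        !($1 in seen) && !($1 in reported) { print $1; reported[$1] = 1 }
    ' "$vcf" "$bed"
}

# Hypothetical wrapper around bedtools intersect (requires bedtools installed).
#   -a       the VCF to filter
#   -b       the regions to keep
#   -header  copy the VCF header into the output
#   -u       write each VCF record once, even if it overlaps several regions
filter_vcf_by_bed() {
    bedtools intersect -a "$1" -b "$2" -header -u > "$3"
}

# Usage (placeholder file names):
#   bed_chroms_missing_from_vcf input.vcf list.bed
#   filter_vcf_by_bed input.vcf list.bed filtered.vcf
```

If the first function prints anything, those names do not occur in the VCF as written, so rename them in the BED file (or fix the separators) before filtering. Note that it scans the whole VCF once, which takes a while on a 4 GB file.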