Hi everybody,
I need to filter on DP and GQ (or GQX) for genotypes in the FORMAT field, across several (380) VCF files, each belonging to a different patient. Then I have to merge them into a single file to analyze the data, so for me it makes no difference whether I filter before merging or on the merged file. I tried vcftools with
--gzvcf *.vcf.gz --minGQX 30 --minDP 30 --recode --recode-INFO-all --out fltD30GQ30
It works with one file, but I would need to change both the input and output options for each of the other 379 files. It does not work for the merged vcf (multisample).
Does anyone have hints? Different tools or some kind of loop would be fine too.
I also tried vcffilter,
but I did not succeed in installing it: command "vcffilter" not found.
Thank you all in advance,
Diego
Have you tried bcftools?
Also, you say "It does not work for the merged vcf (multisample)."
Can you elaborate on this? What do you mean by "does not work", and how do you know it does not work?
Thank you RamRS,
I tried bcftools, which I already use for merging VCFs. I used
As with vcftools, it worked with one file but not with all of them. With *.vcf as the input, since I don't know how to give multiple output names, the result is a single VCF with the full header and the sample name but, unfortunately, no data or variants. When I use bcftools filter on a merged VCF, some genotypes with DP<30 and GQX<30 are still present. Furthermore, bcftools did not report how many variants were filtered.
About vcftools on the merged VCF: as with bcftools, the file is created correctly, but genotypes with DP<30 or GQ<30 are retained. These are the warnings from vcftools when filtering the merged VCF
You'll need to either merge the VCF files before running the bcftools filter operation or run bcftools once per file. When you use shell globs (such as *.vcf), the shell will first expand them before anything else happens.
The way to give "multiple output filenames" is to run bcftools once per input file (using a loop, xargs, or GNU parallel). I'd recommend using a loop, as it is the easiest of the options. Of course, if the loop was the problem in the first place, you don't need to switch to bcftools if you prefer vcftools.
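A minimal sketch of both options, not a definitive recipe. The function names, the .flt suffix, and the output directory are hypothetical; the DP and GQ thresholds of 30 come from the question, and FMT/GQX can be swapped in for FMT/GQ if the files carry that tag. The merged-file variant assumes the --set-GTs (-S) option of bcftools filter, which sets failing genotypes to missing instead of dropping the whole site.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the output path for one input file,
# e.g. in/P001.vcf.gz + filtered -> filtered/P001.flt.vcf.gz
filtered_name() {
    local base
    base=$(basename "$1" .vcf.gz)
    printf '%s/%s.flt.vcf.gz\n' "$2" "$base"
}

# Option 1: run bcftools once per single-sample file, one output per input.
# Outputs go to a separate directory so a re-run does not pick them up again.
filter_each() {
    local indir=$1 outdir=$2 f
    mkdir -p "$outdir"
    for f in "$indir"/*.vcf.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        bcftools filter -i 'FMT/DP>=30 & FMT/GQ>=30' \
            -Oz -o "$(filtered_name "$f" "$outdir")" "$f"
    done
}

# Option 2: filter an already merged multi-sample file.
# Assumption: -S . masks each failing genotype as missing (./.)
# rather than deciding per site whether to keep the record.
filter_merged() {
    bcftools filter -S . -e 'FMT/DP<30 | FMT/GQ<30' -Oz -o "$2" "$1"
}
```

Usage would look like: filter_each vcfs filtered, or filter_merged merged.vcf.gz merged.flt.vcf.gz. A site-level filter on a merged file keeps every genotype at a site that passes, which may explain low-DP genotypes surviving there.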
Thank you as always, RamRS. I don't know if I got it right; I tried with a loop:
It results in a single VCF file, with the name given in the -o string, containing data from only the last sample processed. Without the -o option, it writes all the data to the terminal.
I FOUND a WAY!
That's wonderful! Congratulations. You had to build an output file for each input file, and that is indeed what you've done. Plus, you figured out shell parameter expansion syntax, which will be quite useful going forward.
I don't understand the ${i//.vcf.gz} though - did you mean to remove .vcf.gz from $i? If so, while what you're using works, it's better to use the correct syntax, which puts the .vcf.gz at the search spot and not the replace spot in ${string/search/replace} syntax.
Yeah! Each day some new achievement! =D
I don't know exactly what the things I wrote do (where can I find some info?), but without it the filtered VCFs have messy names like: "sample-number.vcf.gz_out.vcf"
This is the page I use as my parameter expansion go-to: http://wiki-dev.bash-hackers.org/syntax/pe
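For reference, a small sketch of the expansions discussed here, assuming bash and a hypothetical file name. For a name that contains .vcf.gz only once, at the end, the first three forms give the same result; the suffix form is the narrowest, since it only ever touches the end of the string.

```shell
#!/usr/bin/env bash
i="sample-01.vcf.gz"               # hypothetical input file name

removed_all="${i//.vcf.gz}"        # delete every match of the pattern
removed_first="${i/.vcf.gz/}"      # search/replace with an empty replacement
removed_suffix="${i%.vcf.gz}"      # strip the pattern only from the end

echo "$removed_all"                # sample-01
echo "$removed_first"              # sample-01
echo "$removed_suffix"             # sample-01
echo "${i%.vcf.gz}_out.vcf"        # sample-01_out.vcf
echo "${i}_out.vcf"                # sample-01.vcf.gz_out.vcf (the "messy" name)
```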