Entering edit mode
7.0 years ago
victoria_aleks
▴
60
Dear all, I need to know the "coverage" of sites in vcf files, but the files i am working with do not contain DP info, only AD. Most of the tools for vcf parcing use DP parameter. I understand that I can use AD to estimate depth, too. Is there any known ways of how to? Will be happy to any help:)
and what if i dont have neither bams nor reference fasta? I only downloaded vcf files from the source
you cannot get the depth without the bams...
a reference sequence is always needed to run gatk. https://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference
ah, sorry, now I see why you're asking: you already have AD...
Ok here is a solution using vcfilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html
First I use awk to insert the new VCF header for DP . Then I loop over the genotypes and I create a new variant.
thank you very much. unfortunately vcffilterjdk constantly gives errors, in my hands :) Is it possible just write a home-made script to extract AD values for each line in vcf-file? I read that its better not to mess with vcf parcing and rather use the ready software. but right now it seems easier just code something by our own...
unfortunately vcffilterjdk constantly gives errors, in my hands
:) basically my question was: does it worth it to write a script myself or better not to mess? anyway, the error is: 2 errors
I do understand that.
But it's a pity for the developer (aka 'me') to see that his tool is not used just because 'it doesn't work'. :-)
It's hard to debug without the context. what is exactly your command line please ?
i see:) honestly i repeated exactly what you wrote. except, first i ran awk command (i checked the resulting file, and DP appeared there. only: i do not understand why do I need incorporate DP in the header if I want to use AD anyway?). then I set this awk-mdified file as the input for vcffilterjdk:
I also tried to use -f instead -e; the error was:
(in case, i tried to pipe awk into vcffilterjdk, as you have shown - didnt help)
I hope it gives some useful info
because my script produces a new attribute DP and it must be declared in the VCF header if it wasn't declared before.
you say
it should be something like
its a "wrapper", as our IT guy likes to use short words:) so, "java -jar ..." is included in "vcffilterjdk"
but I never use it and your messages show me that there is an error in the script (thanks for pointing this out!) the script ends with
while I think I should I written it as (with double quotes)
you mean that you want to correct the script?
I've just fixed this https://github.com/lindenb/jvarkit/commit/7f811475a7648d24289702f49d53c89fb53761c9#diff-b67911656ef5d18c4ae36cb6741b7965R133
but anyway, I've never said you should use this wrapper, use the
java -jar ...
syntax please.now it worked:) as I understand the script summated AD for ref and alt and wrote it into DP parameter. but now i try to use vcftools to extract this DP (for example, --depth or --geno-depth) and it reports on 'missing entries'. and i still cannot use gatk, i think, as i have only vcf files, no fasta or bam
is there something that can be done here?:)
I'm glad it's working now !
piping my example into 'bcftools view' produces no error on my side
you can use another program bioalcidaejdk similar to vcffilterjdk:
thank you a lot:) it all finally seem to work now!
but also: can I extract exactly AD values, not only DP?? I want to know how many reads contain ref and how many contain alt nucleotides.
ok, at the end we had to write our own script:)