6.5 years ago
princy149
Hi, I have two kinds of VCF file: one merged variant-call VCF covering 50 samples, and 50 individual genomic VCF (gVCF) files, one per sample. I want to fill in the missing genotypes in the merged VCF using the "REF" and "ALT" information from the 50 individual gVCF files, with a program written in Perl or similar. Does anyone have an idea how to do this?
Thank you,
see:
It sounds like you want to do an 'imputation'. Can you confirm this?
Hi Kevin, thank you for your reply. I have no idea whether "imputation" applies to this case, so let me explain my process. First I made a variant-call VCF file for each of the 50 samples separately using bcftools, and then I merged all 50 files. In the merged file, many samples have missing genotype calls at various loci, shown as "./.:.", and I need to fill these in using the "REF" and "ALT" information. For this purpose I made a genomic VCF (gVCF) file for each sample from its sorted BAM file and the reference genome FASTA using samtools.
Having made the gVCF files, I want to write a program in Perl or Python that matches the 50 per-sample gVCF files (as separate files) against the merged 50-sample variant-call VCF and fills in a genotype wherever the GT is missing in the merged file. Please help me with this.
Thank you,
Did you look at the bcftools annotate function?
Thank you Pierre for the valuable links. I read about your program at https://github.com/lindenb/jvarkit/wiki/FixVcfMissingGenotypes, but is it possible to use genomic VCF files instead of BAM files, and is there any way to write the program in Perl, Python or R instead of Java? My requirement is that the program fills in the missing GT information using both types of VCF file.
Ah, I see; I would first use GATK CombineVariants to merge both VCFs, to get both genotypes for the same variant.
Thank you Pierre for reply.
Does anyone have an idea how to solve this problem using Perl or Python (e.g. PyVCF)?
Can you paste a quick example of exactly what you want to do? Just reading textual descriptions is not enough to grasp what one wants, at times.
Paste what you have right now, and then what you would like to have.
Hi Kevin,
Sorry for my late reply.
This is the data from my merged variant-call VCF file. As you can see, many sample genotypes are "./.:.". I want to fill these in using the "REF/ALT" information from each individual sample's gVCF.
Did you not try Pierre's program: A: Back-filling missing genotypes in merged VCF ?
This is the gVCF of one sample; it contains both variant and non-variant sites.
Thank you Kevin for your prompt reply. I want to use the individual gVCF files instead of BAM files to fill in the GT information where it is missing, and I also want to use Perl instead of Java. This is a requirement for me.
I do not doubt that this is possible using perl or some customised solution, but you should not heavily restrict yourself to any one particular language. I don't know if there are many Perl programmers on Biostars.
If I get free time, I may do this in AWK or Python for you, but I am also pressed for time.
Thank you very much Kevin for your reply. I appreciate your help and the time you are giving to this query. I look forward to your solution.
Just to be aware: most of the genotypes that you've listed in the gVCF have an asterisk. These can potentially be removed.
Please read here: What does <*> mean in a vcf file?
A solution for you is to just merge your data with the gVCF using bcftools merge, and use the following flag:
By the way, here is a simple AWK script for parsing VCFs, in case you want to learn:
It just looks up positions from test.vcf in Ref.vcf. If the positions match, it displays the reference genotype that's available (i.e. the one that could be added to missing genotypes).
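The same kind of lookup can be sketched in Python. This is only an illustration of the idea, not the AWK script referred to above: the function names `load_genotypes` and `lookup` are hypothetical, it assumes plain-text, tab-delimited, single-sample VCFs, and it matches on CHROM and POS only (gVCF blocks that span several positions via an END tag are not expanded).

```python
def load_genotypes(vcf_lines):
    """Map (CHROM, POS) -> GT string of the first sample column."""
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        sample_values = fields[9].split(":")
        genotypes[(fields[0], fields[1])] = sample_values[fmt_keys.index("GT")]
    return genotypes


def lookup(test_lines, ref_genotypes):
    """Yield (CHROM, POS, GT) for every test position found in the reference."""
    for line in test_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.rstrip("\n").split("\t")[:2]
        if (chrom, pos) in ref_genotypes:
            yield chrom, pos, ref_genotypes[(chrom, pos)]


if __name__ == "__main__":
    # File names follow the post; both are assumed to be uncompressed.
    with open("Ref.vcf") as ref_handle:
        ref_gt = load_genotypes(ref_handle)
    with open("test.vcf") as test_handle:
        for chrom, pos, gt in lookup(test_handle, ref_gt):
            print(chrom, pos, gt, sep="\t")
```

Keeping the reference genotypes in a dictionary keyed by (CHROM, POS) makes each lookup constant-time, at the cost of holding one gVCF in memory.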
To be honest, though, I think that all that you need is the BCFtools command that I mentioned (above)
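If a scripted route is still required, the back-filling itself could look roughly like the sketch below. It is written under several assumptions that are not from this thread: the helper name `fill_missing` is hypothetical, the per-sample gVCF genotypes have already been loaded into a dictionary of the form `{sample_name: {(CHROM, POS): GT}}`, and only homozygous-reference calls are copied across. The last restriction is deliberate: allele index 0 always means REF, whereas non-zero indices in a gVCF refer to that file's own ALT list, which may not match the ALT list of the merged record.

```python
MISSING = {"./.", ".|.", "."}
HOM_REF = {"0/0", "0|0"}


def fill_missing(merged_lines, gvcf_gt_by_sample):
    """Yield merged VCF lines with missing GTs set to 0/0 where the
    sample's gVCF reports a homozygous-reference call at that position."""
    sample_names = []
    for line in merged_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_names = fields[9:]  # sample columns start at column 10
            yield line
            continue
        fmt_keys = fields[8].split(":") if len(fields) > 8 else []
        if "GT" not in fmt_keys:
            yield line
            continue
        gt_idx = fmt_keys.index("GT")
        key = (fields[0], fields[1])
        for col, name in enumerate(sample_names, start=9):
            values = fields[col].split(":")
            if values[gt_idx] not in MISSING:
                continue  # genotype already called, leave untouched
            known = gvcf_gt_by_sample.get(name, {}).get(key)
            if known in HOM_REF:
                values[gt_idx] = "0/0"
                fields[col] = ":".join(values)
        yield "\t".join(fields)
```

Other FORMAT fields of a filled sample (for example PL) are left as they were, so downstream tools will still see them as missing.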
Thank you so much Kevin, you're really very helpful. I very much appreciate the hard work you put into solving my problem.
I will surely try this.
Thank you again :)
Please use ADD COMMENT or ADD REPLY to answer previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.
Good luck using gVCFs and Perl for this purpose, but if you have requirements like that I suggest you figure it out on your own.