9.0 years ago
hellbio
Hi,
I have a multi-sample VCF file; an example variant is shown below:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 03-071 04-051 04-071 06-044 07-085 10-009
chr1 6526093 . T C 197.77 . AC1=1;AC=1;AF1=0.5 GT:GQ:DP:PL:AD 0/1 1/1 0/0 1/1 1/1 0/1
For each variant, I need to retrieve the sample names by genotype. If the genotype is 0/1, it should output the first 5 columns, with the sample names in the 6th column:
chr1 6526093 . T C 03-071,10-009
If the genotype is 1/1, it should output:
chr1 6526093 . T C 04-051,06-044,07-085
The original file has >500 samples and I need the output in the above format. Are there any tools that can do this, at least in part, so that I can tweak the result into the desired output format?
Thanks. This looks like it prints each variant once per sample, which would produce very large output, especially for a WGS VCF file with 500 samples, since it would print 500 rows for 1 variant. Instead, it would be helpful if it could print each variant in a single row and add additional columns with the sample names to represent hom/het variants.
so the script:
output:
Thanks for your time. Could we achieve it this way?
The script need not bother about multiple alternate alleles. It only has to consider 0/1, 1/1, 1/2 or 2/2. The above output has the heterozygous sample names in the 4th column and the homozygous sample names in the 5th column.
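For comparison, the same per-variant loop can be sketched in plain Python without any extra tool. This is only an illustration under several assumptions, not the script discussed above: the function names `classify` and `summarize` are made up here, the input is assumed to be tab-separated with GT as the first FORMAT field, and the choice to keep the first five VCF columns and append the het and hom sample lists as two extra columns is a guess at the layout.

```python
import re

def classify(gt):
    """Return 'het', 'hom' (non-reference) or None for a diploid GT such as 0/1 or 1|1."""
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or "." in alleles:
        return None          # missing or non-diploid call
    if alleles[0] != alleles[1]:
        return "het"         # e.g. 0/1 or 1/2
    if alleles[0] != "0":
        return "hom"         # e.g. 1/1 or 2/2
    return None              # 0/0, homozygous reference

def summarize(lines):
    """Yield one tab-separated row per variant: first 5 columns, het samples, hom samples."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue         # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]   # sample names follow the FORMAT column
            continue
        groups = {"het": [], "hom": []}
        for name, call in zip(samples, fields[9:]):
            kind = classify(call.split(":")[0])  # GT is assumed to be the first FORMAT field
            if kind:
                groups[kind].append(name)
        het = ",".join(groups["het"]) or "."
        hom = ",".join(groups["hom"]) or "."
        yield "\t".join(fields[:5] + [het, hom])

# The example variant from the question, written with tabs.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t03-071\t04-051\t04-071\t06-044\t07-085\t10-009",
    "chr1\t6526093\t.\tT\tC\t197.77\t.\tAC1=1;AC=1;AF1=0.5\tGT:GQ:DP:PL:AD\t0/1\t1/1\t0/0\t1/1\t1/1\t0/1",
]
for row in summarize(example):
    print(row)
```

To run it on a real file, an open file handle can be passed to `summarize` instead of the in-memory list. Each variant stays on a single row regardless of the number of samples, so the output grows with the number of variants only.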
Could we just save the updated script and run it, or do we need to download and install the tool?
uh ??
This is just a simple loop. I leave this as an exercise...
I tried to download and install it, but was unsuccessful.
It terminated with the error below:
Could you help me work through the tool?
ant is required https://github.com/lindenb/jvarkit/wiki/Compilation#requirements--dependencies