We are working on a project to check concordance between genotype-array samples and NGS samples. We are using the MEGA genotyping array. We processed the genotype samples with GenomeStudio from Illumina, following the procedure given in "Is there a tool to transform GenomeStudio genotype format to VCF?".
The generated genotype VCF has some very odd entries with chromosome 0 and position 0, and I am not sure why they appear. We are using this genotype file to check concordance against the sequencing VCF, and we are getting only around 80-85% concordance, which ideally should be more than 90%. Could someone suggest improvements, and especially explain why we are getting chromosome 0 and position 0? Apart from this, none of the VCF entries has a QUAL score.
If you do not have to work on the VCF data, you could convert the GenomeStudio data into PLINK format. Then you can use PLINK's --vcf option to convert the VCF into PLINK format as well. That will let you check the concordance easily. However, for the SNPs with chromosome 0 and position 0, something might be wrong. How did you call the VCF file from the NGS samples?
Side note: vassialk, it is not really helpful to keep posting commercial software here without actually answering the question.
written 9.0 years ago by Sam (4.8k) • updated 5.0 years ago by Ram (44k)
We used the PLINK format and then converted it to VCF format.
So you don't need to transform the GenomeStudio format into VCF then? If you already have the VCF files, you should follow Robert's suggestion and use bcftools to estimate the concordance.
Chr0 probes in the GenomeStudio project are problem probes (mapping to multiple locations, etc.) and should be omitted before calculating concordance.
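As a rough illustration of omitting those probes, here is a minimal Python sketch that drops VCF records placed at chromosome 0 or position 0. The function name `drop_unmapped_records` and the example records (including the `rs_hypothetical_*` IDs) are made up for this sketch, and it assumes a plain tab-delimited VCF; in practice the same filter can be applied in PLINK or bcftools.

```python
def drop_unmapped_records(vcf_lines):
    """Yield VCF lines, skipping data records at chromosome 0 or position 0."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta and header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], fields[1]
        # Assumption: unmapped probes are written as CHROM "0" (or "chr0") and/or POS "0".
        if chrom in ("0", "chr0") or pos == "0":
            continue
        yield line


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "0\t0\trs_hypothetical_1\tA\tG\t.\t.\t.\tGT\t0/1\n",
    "1\t12345\trs_hypothetical_2\tC\tT\t.\t.\t.\tGT\t1/1\n",
]
kept = list(drop_unmapped_records(example))
print(len(kept))
```

The two header lines and the chromosome 1 record are kept; the chromosome 0 record is dropped.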
After that, you can use bcftools to intersect your array.vcf with sequencing.vcf and create union.vcf. Then compare concordance between array.vcf and sequencing.vcf only at the locations in union.vcf.
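To make the idea of comparing only at shared locations concrete, here is a small Python sketch. It is not what bcftools does internally, just an illustration under simplifying assumptions: single-sample, biallelic records with a GT field, identical REF/ALT representation in both files, and no strand or allele flips. The helper names `read_genotypes` and `genotype_concordance` and the example records are hypothetical.

```python
def read_genotypes(vcf_lines, sample_index=0):
    """Map (CHROM, POS, REF, ALT) to an unphased genotype, skipping no-calls."""
    calls = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        fmt = fields[8].split(":")
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt.index("GT")]  # assumes GT is present in FORMAT
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # no-call: leave this site out of the comparison
        calls[(chrom, pos, ref, alt)] = tuple(sorted(alleles))
    return calls


def genotype_concordance(array_calls, seq_calls):
    """Return (matching, shared) counts over sites called in both inputs."""
    shared = array_calls.keys() & seq_calls.keys()
    matching = sum(1 for key in shared if array_calls[key] == seq_calls[key])
    return matching, len(shared)


# Hypothetical records, for illustration only.
array_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\n",
    "1\t200\t.\tC\tT\t.\t.\t.\tGT\t1/1\n",
    "1\t300\t.\tG\tA\t.\t.\t.\tGT\t0/0\n",
]
seq_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1|0:30\n",
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/1:25\n",
    "1\t400\t.\tT\tC\t40\tPASS\t.\tGT:DP\t0/1:20\n",
]
matching, shared = genotype_concordance(
    read_genotypes(array_vcf), read_genotypes(seq_vcf)
)
print(matching, shared)
```

Sites present in only one file (positions 300 and 400 here) do not count against concordance, which is the point of restricting the comparison to shared locations.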