Comparing VCF with dbsnp
8.1 years ago

Hi,

I want to compare my VCF file with dbSNP, so I used RTG Tools for the comparison. While running it, I encountered this error:

Error: Record did not contain enough samples: 1 69552 rs55874132 G C. CFL;HD;OTHERKG;REF;RS=55874132;RSPOS=69552;S3D;SAO=0;SSR=0;SYN;VC=SNV;VP=0x050200000309000402000100;WGT=1;dbSNPBuildID=129

Command used for comparing:

./Tools/rtg-tools-3.6.1/rtg vcfeval --all-records -b dbsnp_sort.vcf.gz -c Gatk_bowtie_sure_select.vcf.gz -T 3 -t Reference/RTG/HG37 -o Gatk

When I compared the same files using vcf-compare instead, I got the results below, but I don't know how to interpret them. Can anyone help me with the RTG error and with reading this output?

Results from vcf-compare

This file was generated by vcf-compare. The command line vcf-compare dbsnp_sort.vcf.gz Gatk_bowtie_sure_select.vcf.gz

'Venn-Diagram Numbers'. Use grep ^VN | cut -f 2- to extract this part.

VN The columns are:

VN 1 .. number of sites unique to this particular combination of files

VN 2- .. combination of files and space-separated number, a fraction of sites in the file

VN 4423 Gatk_bowtie_sure_select.vcf.gz (11.6%)

VN 33680 Gatk_bowtie_sure_select.vcf.gz (88.4%) dbsnp_sort.vcf.gz (14.0%)

VN 206183 dbsnp_sort.vcf.gz (86.0%)

SN Summary Numbers. Use grep ^SN | cut -f 2- to extract this part.

SN Number of REF matches: 33394

SN Number of ALT matches: 32073

SN Number of REF mismatches: 286

SN Number of ALT mismatches: 1321

SN Number of samples in GT comparison: 0

Number of sites lost due to grouping (e.g. duplicate sites): lost, %lost, read, reported, file

SN Number of lost sites: 1573 0.7% 241436 239863 dbsnp_sort.vcf.gz

SN Number of lost sites: 2 0.0% 38105 38103 Gatk_bowtie_sure_select.vcf.g
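
For anyone reading the output above, here is a minimal Python sketch of how the Venn-diagram (VN) percentages and the summary (SN) counts relate to each other. It only re-does the arithmetic on the numbers already shown in the post; the variable names and the helper `pct` are my own, and the reading of the REF/ALT lines is an inference from the fact that the numbers add up, not something stated by vcf-compare itself.

```python
# Counts copied from the VN lines of the vcf-compare output above.
only_calls = 4423     # sites found only in Gatk_bowtie_sure_select.vcf.gz
shared = 33680        # sites found in both files
only_dbsnp = 206183   # sites found only in dbsnp_sort.vcf.gz

# Per-file totals; these match the "reported" column of the lost-sites lines.
calls_total = only_calls + shared   # 38103
dbsnp_total = only_dbsnp + shared   # 239863


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print("calls not in dbSNP:", pct(only_calls, calls_total), "%")   # 11.6
print("calls also in dbSNP:", pct(shared, calls_total), "%")      # 88.4
print("dbSNP sites called:", pct(shared, dbsnp_total), "%")       # 14.0

# Counts copied from the SN lines.
ref_match, ref_mismatch = 33394, 286
alt_match, alt_mismatch = 32073, 1321

# The shared sites split into REF matches and REF mismatches, and the
# REF-matching sites split again into ALT matches and ALT mismatches.
print(ref_match + ref_mismatch == shared)      # True
print(alt_match + alt_mismatch == ref_match)   # True
```

Read this way, about 88.4% of the called sites are at positions present in dbSNP, and 32073 of those also agree with dbSNP on both REF and ALT alleles.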

next-gen dbsnp VCF RTG • 3.1k views
7.8 years ago
Len Trigg ★ 1.6k

rtg vcfeval performs a pairwise comparison of the (usually diploid) haplotypes asserted by the GT field in the sample column of each VCF. In your case, you are supplying a VCF (the dbSNP file) that does not contain a sample column, which is why you get the "Record did not contain enough samples" error.

Comparing against a "database-style" VCF like dbSNP is something that will be added to vcfeval in the future. For now, you could add a synthetic sample to your dbSNP VCF that includes a GT field with a value of 1 (i.e. referring to the ALT allele), and then run vcfeval with --squash-ploidy to tell it to do a haploid comparison only when it compares against your calls.
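
As a rough illustration of the workaround described above (adding a synthetic sample whose GT is 1), here is a minimal Python sketch. The function names `add_synthetic_sample` and `convert` and the sample name `SYNTHETIC` are placeholders of my own, not part of RTG Tools. It assumes a tab-delimited VCF with the 8 standard fixed columns and no existing FORMAT or sample columns; for multi-allelic records, GT 1 refers only to the first ALT allele.

```python
import gzip


def add_synthetic_sample(lines, sample_name="SYNTHETIC"):
    """Yield VCF lines with a FORMAT column and one haploid GT=1 sample added.

    `sample_name` is an arbitrary placeholder chosen for this sketch.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Declare GT just before the column header line.
            yield '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
            cols = line.split("\t")[:8]
            yield "\t".join(cols + ["FORMAT", sample_name])
        elif line:
            # Data record: keep the 8 fixed columns, append FORMAT and GT.
            cols = line.split("\t")[:8]
            yield "\t".join(cols + ["GT", "1"])


def convert(src_gz, dst_vcf):
    """Read a gzip/bgzip-compressed VCF and write an uncompressed copy
    with the synthetic sample column added."""
    with gzip.open(src_gz, "rt") as fin, open(dst_vcf, "w") as fout:
        for out_line in add_synthetic_sample(fin):
            fout.write(out_line + "\n")
```

Calling something like `convert("dbsnp_sort.vcf.gz", "dbsnp_with_sample.vcf")` (the output name is hypothetical) writes a plain-text VCF, which would still need to be bgzip-compressed and tabix-indexed before being passed to vcfeval as the baseline together with --squash-ploidy.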

(Edit: since version 3.7, vcfeval supports comparison against database-style VCFs, using the special sample identifier "ALT".)

See the example in the RTG manual.
